
Friday, 17 September 2010

Presenting at SUGUK London - October 14th

Yep, the next SharePoint User Group UK (SUGUK) London session has been announced, and I'm going to be presenting on Sandbox Development. (yay!)

The event is on October 14th 2010 at the Mostyn Hotel (Marble Arch, Bryanston Street, London, W1H 7BY) and the evening starts at 18:00.

I have the second slot and will be talking about Sandbox Development, a subject I've been keenly interested in, what with Content and Code's products and the expectation of SharePoint 2010 supposedly coming to BPOS (Microsoft's cloud services).
"This development focused session will walk through the new SharePoint 2010 Sandbox Solutions framework, including the architecture, configuration and development of Sandboxed solutions. Although covering off a few administrative functions do not be fooled .. we will be opening Visual Studio 2010, stepping through code and debugging processes!"
The session kicks off with Jaap Vossers talking about SharePoint and jQuery. I've been working with Jaap for several months now and some of the jQuery stuff he's done is fantastic, so it is well worth coming along for that session alone!
"This session will cover everything you need to know about harnessing the power of jQuery in your SharePoint sites. The introduction will look at getting started, the syntax and the plugins before we look at jQuery in a SharePoint context; the benefits, examples, calling web services, loading scripts and deployment. We will then cover the various frameworks and utilities (SPServices, jQueryLoader) before rounding off with a look at further integration opportunities in SP2010 and specifically the Client OM and REST. This is a technical session."
If you want to register then please sign up at the SUGUK Forum Post (http://suguk.org/forums/thread/25091.aspx).

I'm really looking forward to it, so please come along, and hopefully I'll see you there!

Thursday, 16 September 2010

Forays into SharePoint 2010 Performance Testing with Visual Studio 2010

Over the past six months I have increasingly become an evangelist for Performance Testing. It was always an area that I was aware of but never got massively involved in. Recently, though, I've seen it become an increasingly important part of my work, especially on larger scale projects where the web front ends are load balanced (for performance, not just redundancy) and you start hitting I/O limits on SQL. I suppose this may have been triggered by the SharePoint Conference 2009, and one of my follow up blog posts "Load Testing SharePoint 2010 with Visual Studio Team Test".

So in this post I firstly wanted to look at why you should do Performance Testing.

It sounds like a bit of a stupid question (with an obvious answer) but it really is surprising how many people don't do it. How many of you have ever asked the following questions on a project?
"How many users can the production system support?"
"What would be the impact of doubling the number of users?"
"What impact will backups have on performance?"
"How fast will the solution perform during peak hours?"
"What is the most cost-effective way of improving performance?"
All of these are questions that you absolutely HAVE to be able to answer. The client (whether it is your organisation, or another organisation who you are running a project for) deserves to know the answers to these, and without them how can you have any idea whether your solution is going to be fit for purpose?

Sure, you can read up on Estimating Performance and Capacity Planning in SharePoint, but all that gives you is some rough guidelines.. we need to be able to apply some science to the process!

The last question is probably the most compelling. Re-configuring farms and buying new hardware is an expensive process, and the consultancy alone can cost thousands of pounds. You don't want your client coming back asking why they just spent tens of thousands of pounds on a new state of the art iSCSI SAN array that had zero impact on performance ("hey .. we thought it would help .. but we didn't really know!") because the bottleneck was actually the CPU on the Web Front End (WFE).

The story often gets even worse when things do start going wrong. If you have ever been in the unfortunate position where you are troubleshooting a system that is performing badly, these kinds of questions are quite common:
"What is causing the poor performance?"
"How can we fix this?"
"Why did you not notice this during development?"

Again, the last two questions are the killers.. if you don't do any Performance Testing then you won't know that you have a problem until it is too late. The earlier you can get some metrics on this, the faster you will be able to react to performance issues (in some cases finding them and fixing them before the client even knows about it!)

Equally, without performance testing you won't know WHY the problems are occurring. If you don't know why then you can't know HOW best to fix them!

So the key messages are these:

  • Early Warning .. catch problems early on and they will be easier to fix. There is no point waiting until users are hitting the system to find out the solution can't cope with the load!
  • Knowledge ... what is causing the problems, and how do you fix them?
  • Confidence ... not just that you know what you are doing, but you can prove it. This instils confidence in your sales, confidence in your delivery, and confidence from your clients too!
Performance Testing with Visual Studio 2010
I've been using Visual Studio 2010 Ultimate edition. It is the only "2010" product that incorporates Web Performance Tests and Load Tests, the two critical pieces that you will use to test the performance on SharePoint 2010 (or any other web based system). It also integrates tightly with Team Foundation Server and provides "Lab Management" capability, but that is out of the scope of this blog post.

In order to do comprehensive testing you really need 4 different software packages:
  1. Visual Studio 2010 Ultimate: This is where you create your tests and control the execution of them.
  2. Visual Studio 2010 Test Controller: Part of the Visual Studio Agents 2010 ISO, this allows you to co-ordinate tests executed by several "agents", as well as collecting results and storing all of the test results (and performance counters) in a database. The license for this is included in Visual Studio 2010 Ultimate.
  3. Visual Studio 2010 Test Agent: Part of the Visual Studio Agents 2010 ISO, this can be installed on machines that will simulate load and execute tests. They are connected to a "Controller" which gives them instructions. The license for this is included in Visual Studio 2010 Ultimate.
  4. Visual Studio 2010 Virtual User Pack: This is a license that allows you to increase the number of virtual "users" you can simulate by 1,000 (for each pack that you purchase). This license must be purchased separately (there is no trial version!)
If you need any help installing these and getting them running then there is a great MSDN article which you should read: Installing and Configuring Visual Studio Agents and Test and Build Controllers or the equally awesome article from Visual Studio Magazine: Load Testing with Visual Studio 2010.

So what about actually creating the tests?

Well, the interface is pretty simple. You can create your "Web Performance Tests" using a simple Browser Recorder (literally using a Web Browser which records all of your actions, and then click "stop" when you are finished). This works great, but there are a few caveats:
  • You might want to use the "Generate Code" option if you are adding documents or list items. This converts your recorded web test into a code file, allowing you to programmatically change document names, or field values .. useful to make sure you are not just overwriting the same document over and over again
  • Web Service tests require a bit more "knowledge" of how they work, needing the SOAP envelope (in XML) and the SOAPAction header.
It is worth noting that there is an excellent CodePlex project available: "SharePoint Performance Tests". Although this was written for Visual Studio 2008 (you can convert it to 2010 if you want) it contains a number of configurable tests (via XML) that allow you to dynamically create tests for generic SharePoint platforms .. well worth a look!

You can then very easily create a "Load Test" which allows you to pick'n'mix tests, and a distribution of which tests you want to run.

My personal favourite is the "Tests Per User Per Hour". For this you would sit down with your client and work out "what would a typical user do in an hour of using the system..". One such exercise resulted in this kind of activity distribution:
  • Hit the site home page 50 times
  • Execute 10 searches
  • Upload 5 documents
  • Respond to 20 workflow tasks
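To make the arithmetic concrete, here is a rough JavaScript sketch that turns the per-user hourly figures above into a test mix (percentages) and an hourly volume for a given number of users. It is not something Visual Studio generates; the `buildLoadProfile` function, the test names and the 1,000 user figure are all made up for illustration.

```javascript
// Hypothetical per-user hourly activity profile, using the figures from the list above.
const testsPerUserPerHour = {
  "Home page": 50,
  "Search": 10,
  "Upload document": 5,
  "Workflow task": 20,
};

// Returns each test's share of the overall mix (as a percentage) and the
// total number of executions per hour for the given number of simulated users.
function buildLoadProfile(profile, userCount) {
  const totalPerUser = Object.values(profile).reduce((sum, n) => sum + n, 0);
  return Object.entries(profile).map(([test, perUser]) => ({
    test: test,
    percentOfMix: (perUser / totalPerUser) * 100,
    testsPerHour: perUser * userCount,
  }));
}

// Assumed user count, purely for illustration.
const loadProfile = buildLoadProfile(testsPerUserPerHour, 1000);
for (const row of loadProfile) {
  console.log(row.test + ": " + row.percentOfMix.toFixed(1) + "% of mix, " + row.testsPerHour + " per hour");
}
```

Numbers like these are handy for sanity checking the mix with the client before you plug anything into the Load Test itself.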


This kind of valuable information allows you to build your tests and then distribute them using the Load Test. All you do then is plug in how many users you want to simulate and away you go!

Counting the Counters?
All of this so far is great stuff, but without the performance counters you really aren't going to get much legs from Visual Studio. You might get the WHAT is going on (i.e. do the tests complete very quickly?) but you certainly won't get the WHY information which is oh-so important (i.e. is it the CPU, RAM or Disk?)

For this you need to add Performance Counters... thankfully this is ridiculously simple. You have something called "Counter Sets" which you can configure to collect from the computers that operate in your farm.
There are a bunch of pre-defined counter-sets you can choose from:
  • Application
  • ASP.Net (I pick this for my WFE Servers)
  • .Net Application (I pick this for my Application Servers)
  • IIS
  • SQL (I pick this for my SQL Servers)


I won't go into any more detail than that. A step-by-step walkthrough of the options (including screenshots) can be found at the Load Testing with Visual Studio 2010 article at Visual Studio Magazine.

What about the Results?
Well, there isn't a really simple answer to this. You really need to have a good understanding of how the different hardware components interact, and what limits you should be looking for.

The big hardware counters (CPU usage, Available Memory) are the obvious ones. Any server which exceeds 80% CPU usage for any sustained period is going to be in trouble and is close to a bottleneck. Equally any server which starts to run out of memory (or more importantly .. slowly loses memory, suggesting a memory leak!) should be identified.
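As a rough illustration of the "sustained" part, here is a small JavaScript sketch that scans a series of CPU samples for a run of consecutive readings above a threshold. The `hasSustainedLoad` function, the sample values and the "3 samples in a row" definition of sustained are my own assumptions; they are not anything Visual Studio defines.

```javascript
// Returns true if at least `minConsecutive` samples in a row exceed `threshold`.
function hasSustainedLoad(samples, threshold, minConsecutive) {
  let run = 0;
  for (const value of samples) {
    // extend the current run while above the threshold, otherwise reset it
    run = value > threshold ? run + 1 : 0;
    if (run >= minConsecutive) {
      return true;
    }
  }
  return false;
}

// Hypothetical CPU usage samples (%), e.g. one reading per sampling interval.
const cpuSamples = [55, 62, 85, 91, 88, 70, 64];
console.log("Sustained high CPU:", hasSustainedLoad(cpuSamples, 80, 3));
```

A single spike over 80% will not trip this check, which is the point: it is the prolonged periods that suggest a bottleneck.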

But it's the deeper, more granular analysis that proves most useful. On a recent client project I was looking at a Proof of Concept environment. We knew that we had a bottleneck in our WFE (CPU was averaging around 90%) and it was extremely workflow heavy, but the page performance was far too bad to put down to just the CPU.

On closer inspection we found a direct correlation between the Page Response Time and the Disk Queue Length in SQL Server:



The top left is the Disk Queue Length in SQL Server, and the top right is the Page Response Time for the Document Upload operation (bottom right is the overall Test Response time). Clearly the spikes happened at the same time.

This is the true power of using Visual Studio. All of the tests and performance counters are time-stamped, allowing you to drill into any specific instance and see exactly what was happening at that moment in time!
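If you wanted to put a number on that kind of relationship rather than just eyeballing the graphs, one option is a Pearson correlation coefficient over two series sampled at the same timestamps. The JavaScript sketch below is only an illustration: the `pearson` function is hand-rolled, and the sample values are invented (they are not the results from the client project).

```javascript
// Pearson correlation coefficient between two equally sized series.
// Returns a value between -1 and 1 (NaN if either series is constant).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("Series must have the same length (at least 2 samples).");
  }
  const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
  const meanX = mean(xs);
  const meanY = mean(ys);
  let covariance = 0;
  let varianceX = 0;
  let varianceY = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    covariance += dx * dy;
    varianceX += dx * dx;
    varianceY += dy * dy;
  }
  return covariance / Math.sqrt(varianceX * varianceY);
}

// Hypothetical samples taken at the same timestamps (invented for illustration).
const diskQueueLength = [1, 2, 8, 3, 12, 2];
const responseTimeSeconds = [0.5, 0.7, 3.1, 1.0, 4.8, 0.6];
console.log("Correlation:", pearson(diskQueueLength, responseTimeSeconds).toFixed(2));
```

A value close to 1 would back up what the graphs suggest, although correlation on its own does not prove the disk is the cause.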

Looking closer at the SQL Disk usage, the Write Time (%) and Read Time (%) show us even more interesting results:


The top of the graph shows the Disk Write Usage (%) and the bottom half shows the Disk Read Usage (%). Clearly, the disk is very busy writing (often being at 100%) while it does very little reading. This fits perfectly with our test results as most of the "read" operations (like viewing the home page, or executing a search) were extremely fast ... but most of the "write" operations (like uploading a document) were much slower.

So the WHAT is slow write performance (uploading of documents).
The WHY is now very simple: the disks on the SQL Server need looking at (possibly upgrading to faster disks, or some optimisation in the configuration of the databases).

Conclusion
To be honest I could talk about this subject all day, but hopefully this gives you some indication of just how crucial Performance Testing is .. and how powerful Visual Studio can be as a testing tool.

The ease of creating test scripts, the vast flexibility and power of the enormous range of performance counters available, and the ability to drill into a single second of activity and see (simultaneously) what was going on in all of the other servers .. it's an awesome combination.

I'll probably be posting more blog posts on this in the future, but for now good luck, and hope you get as much of a kick out of VS2010 as I have :)

Tuesday, 7 September 2010

HTC Twitter "Peep" and OAuth

[UPDATE - 27/09/2010]
HTC Peep on my HTC HD2 is now working!
I found that I needed to login using my email address instead of my username, but it is now working again!
(hurrah)
[/UPDATE]

If you are using a Windows Mobile device from HTC (like me .. I have an HTC) then you've probably run into the same Twitter issue that I have.

Twitter recently shut down their Basic Authentication method and this hosed a whole range of Twitter applications which were not appropriately using the "OAuth" method that Twitter preferred.

One of those applications is the HTC "Peep" application. Now, I'm a quite avid consumer of Twitter, and although Twitter did recently release an announcement that it was working, it seems that HTC Windows Mobile clients (such as the HD2) are still not working.

Well, I submitted a question to HTC Support and they very kindly sent me a response back (in under 1 hour, very impressive). Their email response was as follows:

"We are currently investigating an issue with our Peep/Twitter/Friend Stream client that has stopped working and hope to have a solution soon. Please monitor the support pages for updates, or if you prefer we can record your details and contact you again once a solution is available"
So hopefully it will all be working again soon. Fingers crossed ... It's not exactly a mission-critical application for me, but I do hate it when things that I use just "stop working".

Thursday, 2 September 2010

How to: Achieve Count(*) on a large SharePoint list

This has been a mission of mine for a while now (before I went on holiday and took a 2 week hiatus from all things SharePoint :)).

One of the clients I've been working with has been trying to replicate a pretty simple operation (by normal development standards). They have a SharePoint list with a LOT of items in it (we are talking 200,000 list items and above) which includes some Choice fields.

They want to return a count of how often each choice value is being used. Now, if you were using SQL Server you would simply do the following pseudo-SQL:
select myChoiceField, count(*) from myList group by myChoiceField
At first look in SharePoint this is not possible:
  • There is no "count" operation in CAML, nor any other kind of aggregation function
  • SharePoint Search "full text query" does not support the count(*) operator (or anything similar)
  • The only reference to aggregations is in the SPView.Aggregations property .. this is only used by the rendered HTML and the values are not returned in the result set.
Now .. I know that you can get count values on a list: if you create a View with a Group By then it shows you the number of items in each group, so it MUST be possible! So my mission started.

List view with groups
We want to replicate this behaviour,
but programmatically!

First.. we need a test environment
The first thing I did was create a really big list. We are talking about 200,000 list items, so you can't just pull all the items out in an SPQuery (as it would be far too slow!).

I generated a simple custom list. I added a choice field (with optional values of 1-20) and then generated 200,000 list items with a randomly assigned choice value (and a bunch of them without any choice value at all .. just for laughs).

Now I could play with my code

Attempt Number 1 - Retrieve all list items and programmatically calculate the counts (fail)
I kinda knew this wouldn't work .. but I needed a sounding board to know HOW bad it really was. There are 200,000 items after all, so this was never going to be fast.
  • Use SPQuery to retrieve 2 fields (the ID, and my "choice" field).
  • Retrieve the result set and iterate through it, incrementing an integer value to get each "group" count value
This was a definite #fail. To retrieve all 200,000 list items in a single SPQuery took about 25 seconds to execute ... FAR too slow.
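For reference, the counting step itself is trivial; it is retrieving the items that costs the time. Here is a minimal JavaScript sketch of that "iterate and increment" logic over items already in memory. The `countByField` function, the `MyChoice` field name and the sample items are all hypothetical and are not the SharePoint object model.

```javascript
// Counts how many items hold each value of the given field.
// Items with no value are grouped under "(no value)".
function countByField(items, fieldName) {
  const counts = new Map();
  for (const item of items) {
    const value = item[fieldName];
    const hasValue = value !== undefined && value !== null && value !== "";
    const key = hasValue ? String(value) : "(no value)";
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical items with just the two fields retrieved (ID and the choice field).
const sampleItems = [
  { ID: 1, MyChoice: "1" },
  { ID: 2, MyChoice: "2" },
  { ID: 3, MyChoice: "1" },
  { ID: 4, MyChoice: null },
];

const choiceCounts = countByField(sampleItems, "MyChoice");
for (const [choice, count] of choiceCounts) {
  console.log(choice + ": " + count);
}
```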

Attempt Number 2 - Execute separate query for each "group" (fail)
I was a little more positive with this one ... smaller queries execute much faster so this had some legs (and this is certainly a viable option if you only want the count for a SINGLE group).
  • Create an SPQuery for each of the "choice" values we want to group by (there are 20 of them!)
  • Execute each query, and use SPListItemCollection.Count to get the value
Unfortunately this was another spectacular #fail. Each query executed in around 2 seconds .. which would be fine if we didn't have to do it 20 times! :( (i.e. 40 second page load!!)

Attempt Number 3 - Use the SPView object (success!)
Ok .. so I know that the SPView can render extremely fast. With my sample list, and creating a streamlined "group by" view it was rendering in about 2 seconds (and that's on my laptop VM! I'm sure a production box would be much much quicker).

The main problem is ... how do you get these values programmatically?

The SPView class contains a "RenderAsHtml" method which returns the full HTML output of the List View (including all of the group values, javascript functions, the lot). My main question was how did it actually work? (and how on earth did it get those values so quickly!)

I started off poking into the SPView object using Reflector (tsk tsk). The chain I ended up following was this:
  • SPView.RenderAsHtml() -->
    • SPList.RenderAsHtml() (obfuscated ... arghhhh)
So that was a dead end .. I did some more poking around and found out that SPContext also has a view render method ...
  • SPContext.RenderViewAsHtml() -->
    • SPContextInternalClass.RenderViewAsHtml() -->
      • COM object ! (arghhhh)
Now .. the fact that we just hit a COM object suggests that we are starting to wander towards the SQL queries that get executed to retrieve the view data .. I didn't want to go anywhere NEAR that one, so I decided to leave it there and perhaps try using the output HTML instead (nasty .. but not much of a choice left!).
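// requires a reference to Microsoft.SharePoint.dll (namespace Microsoft.SharePoint)
using (SPSite site = new SPSite("http://myspsite"))
{
    // "TestList" is the test list, "GroupedView" is the streamlined "group by" view
    SPList list = site.RootWeb.Lists["TestList"];
    string strViewHtml = list.Views["GroupedView"].RenderAsHtml();
}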
Having done this we now have the HTML output of our view (and this code takes about 2-3 seconds to execute ... fast enough for my laptop .. we can always cache the value if needed).
 
Looking through the DOM output in the browser, it was possible to identify the "group" elements by their attributes. Each one is a TBody node with both an ID attribute and a "groupString" attribute (the GroupString is the important one, as it tells us the view is configured to "Group By").
 
What I needed next was a way of getting the actual values out of the HTML. For this I used the extremely awesome "HTML Agility Pack" from CodePlex. This is a set of libraries that allow you to parse DOM elements, including both plain "poorly formed" HTML as well as XHTML, and then use XPath queries to extract any values you want (much in the same way you would normally use the XML namespace for XHTML).
 
This gave me the TBODY node, and from there I could use string manipulation on the "InnerText" to pull out the group name and the count value :)
// Using HTML Agility Pack - CodePlex (namespace HtmlAgilityPack)
// load the HTML into the HtmlDocument object
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(strViewHtml);

// retrieve all TBODY elements which have both
// an ID and groupString attribute
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tbody[@id][@groupstring]");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        string text = node.InnerText;

        // extract the Group Name: it follows the last "&nbsp;" entity
        // (6 characters long, hence the + 6) and ends just before the "&#" entity
        string strGroupName = text.Substring(text.LastIndexOf("&nbsp;") + 6);
        strGroupName = strGroupName.Substring(0, strGroupName.IndexOf("&#") - 1);
        Console.Write("Group: " + strGroupName + ", ");

        // extract the number of items: the text between the last "(" and the closing ")"
        string strValueText = text.Substring(text.LastIndexOf("(") + 1);
        Console.WriteLine("Number of Items: " + strValueText.Substring(0, strValueText.Length - 1));
    }
}
As you can see I'm doing some rather nasty SubString statements.. there may well be a quicker and cleaner way to do this using Regex .. this was more a proof of concept than anything else :)
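For what it's worth, here is a rough idea of what a Regex version might look like, written in JavaScript so it can be run on its own. It assumes the group header text has the shape "...&nbsp;GroupValue &#8206;(count)", that is, the value follows the last "&nbsp;" entity and the count sits in brackets after a numeric entity. That layout, the `parseGroupHeader` function and the sample strings are assumptions based on the Substring logic, so check them against your own view HTML before relying on it.

```javascript
// Captures the group name (text after the last "&nbsp;") and the item count
// (digits inside the trailing brackets). The text layout is an assumption.
const groupPattern = /&nbsp;([^&]*?)\s*&#\d+;\((\d+)\)\s*$/;

// Returns { name, count } for a group header, or null if the text does not match.
function parseGroupHeader(innerText) {
  const match = groupPattern.exec(innerText);
  if (match === null) {
    return null;
  }
  return { name: match[1], count: Number(match[2]) };
}

// Hypothetical header text, as it might appear in the un-decoded inner text.
const sampleHeader = "&nbsp;MyChoice :&nbsp;Choice 5 &#8206;(10342)";
const parsed = parseGroupHeader(sampleHeader);
console.log("Group: " + parsed.name + ", Number of Items: " + parsed.count);
```

Items with no choice value simply come back with an empty group name, and anything that does not fit the pattern returns null rather than throwing.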

Result!
Console output, showing group names and counts.
3 seconds isn't bad, running on a "single server" laptop VM image :)

The end result was a 2-3 second bit of code, retrieving Group By and Count values for a list with 200,000 list items.

Not bad for an afternoon's work :)

Attempt 4 - Do the same thing in jQuery (kudos to Jaap Vossers)
This was actually the original solution. I asked Jaap if he could look at this if he had spare time, as I knew he had a lot of jQuery experience (and he blew me away by having it all working in under 30 minutes!).

Basically it uses pretty standard jQuery to go off and retrieve the HTML content from another page, scraping the HTML and pulling back the values. Same as the C#, it grabs the group TBody, then walks down the DOM to retrieve the text value that it outputs.

The speed is roughly the same as the actual view itself. I'm sure some more jQuery could be employed to pull out the specific values and do more with them, but the concept appears to be sound:
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>

<script type="text/javascript">
$(document).ready(function () {

    // you will need to change this URL
    var url = "http://myspsite/Lists/MyList/GroupedView.aspx";

    var groupings = [];

    // fetch the grouped view page and scrape the group header cells
    $.get(url, function (data) {
        $(data).find("tbody[id^=titl][groupString] > tr > td").each(
            function (index, value) {
                groupings.push($(this).text());
            }
        );

        // render each group header text as a list item
        $("#placeholder").append("<ul></ul>");

        $.each(groupings, function (index, value) {
            $("#placeholder ul").append("<li>" + value + "</li>");
        });
    });
});
</script>
<div id="placeholder"></div>
tada .. (thanks Jaap!)


Result from JQuery output,
dropped into a Content Editor Web Part

Summary
Well .. I know doing HTML scraping isn't pretty, but seeing as it is MUCH faster than anything else I've tried (and the logic that actually works out the counts is stuck in the middle of a COM object) there didn't seem to be much choice.

By all means, feel free to let me know if you have any alternatives to this.