One of the clients I've been working with has been trying to replicate a pretty simple operation (by normal development standards). They have a SharePoint list with a LOT of items in it (we are talking 200,000 list items and above) and includes some Choice fields.
They want to return a count of how often each choice value is being used. Now, if you were using SQL Server you would simply do the following pseudo-SQL:
select count(*) from myList group by myChoiceFieldAt first look in SharePoint this is not possible:
- There is no "count" operation in CAML, nor any other kind of aggregation function
- SharePoint Search "full text query" does not support the count(*) operator (or anything similar)
- The only reference to aggregations is in the SPView.Aggregations property .. this is only used by the rendered HTML and the values are not returned in the result set.
List view with groups
We want to replicate this behaviour,
but programmatically!
|
First.. we need a test environment
The first thing I did was create a really big list. We are talking about 200,000 list items, so you can't just pull all the items out in an SPQuery (as it would be far too slow!).
I generated a simple custom list. I add a choice field (with optional values of 1-20) and then generated 200,000 list items with a randomly assigned choice value (and a bunch of them without any choice value at all .. just for laughs).
Now I could play with my code
Attempt Number 1 - Retrieve all list and programmatically calculate the counts (fail)
I kinda knew this wouldn't work .. but I needed a sounding board to know HOW bad it really was. There are 200,000 items after all, so this was never going to be fast.
- Use SPQuery to retrieve 2 fields (the ID, and my "choice" field).
- Retrieve the result set, and iterate through them, incremementing an integer value to get each "group" count value
Attempt Number 2 - Execute separate query for each "group" (fail)
I was a little more positive with this one ... smaller queries execute much faster so this had some legs (and this is certainly a viable option if you only want the count for a SINGLE group).
- Create an SPQuery for each of the "choice" values we want to group by (there are 20 of them!)
- Execute each query, and use SPListItemCollection.Count to get the value
Attempt Number 3 - Use the SPView object (success!)
Ok .. so I know that the SPView can render extremely fast. With my sample list, and creating a streamlined "group by" view it was rendering in about 2 seconds (and thats on my laptop VM! I'm sure a production box would be much much quicker).
The main problem is ... how do you get these values programmatically?
The SPView class contains a "RenderAsHtml" method which returns the full HTML output of the List View (including all of the group values, javascript functions, the lot). My main question was how did it actually work? (and how on earth did it get those values so quickly!)
I started off poking into the SPView object using Reflector (tsk tsk). The chain I ended up following was this:
- SPView.RenderAsHtml() -->
- SPList.RenderAsHtml() (obfuscated ... arghhhh)
- SPContext.RenderViewAsHtml() -->
- SPContextInternalClass.RenderViewAsHtml() -->
- COM object ! (arghhhh)
using (SPSite site = new SPSite(http://myspsite))Having done this we now have the HTML output of our view (and this code takes about 2-3 seconds to execute ... fast enough for my laptop .. we can always cache the value if needed).
{
SPList list = site.RootWeb.Lists["TestList"];
string strViewHtml = list.Views["GroupedView"].RenderAsHtml();
}
Looking through the DOM output in the browser, it was possible to identify the "group" element by their attributes. It is a TBody node with both an ID attribute and a "groupString" attribute (the GroupString is the important one, as it tells us the view is configured to "Group By").
What I needed next was a way of getting the actual values out of the HTML. For this I used the extremely awesome "HTML Agility Pack" from Codeplex. This is a set of libraries that allow you to parse DOM elements, including both plain "poorly formed" HTML as well as XHTML, and then use XPath queries to extract any values you want (much in the same way you would normally use the XML namespace for XHTML).
This gave me the TBODY node, and from there I could use string manipulation on the "InnerText" to pull out the group name and the count value :)
// Using HTML Agility Pack - Codeplex
// load the HTML into the HtmlDocument object
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(strViewHtml);
// retrieve all TBODY elements which have both
// an ID and groupString attribute
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tbody[@id][@groupstring]");
if (nodes != null)As you can see I'm doing some rather nasty SubString statements.. there may well be a quicker and cleaner way to do this using Regex .. this was more a proof of concept than anything else :)
{
foreach (HtmlNode node in nodes)
{
// extract the Group Name
string strGroupName = node.InnerText.Substring(node.InnerText.LastIndexOf(" ")+6);
strGroupName = strGroupName.Substring(0, strGroupName.IndexOf("&#")-1);
Console.Write ("Group: " + strGroupName + ", ");
// extract the number of items
string strValueText = node.InnerText.Substring(node.InnerText.LastIndexOf("(") + 1);
Console.WriteLine("Number of Items: " + strValueText.Substring(0, strValueText.Length - 1));
}
}
Result!
Console output, showing group names and counts.
3 seconds isn't bad, running on a "single server" laptop VM image :)
|
The end result was 2-3 second bit of code, retreiving Group By, Count values for a list with 200,000 list items.
Not bad for an afternoons work :)
Attempt 4 - Do the same thing in JQuery (kudos to Jaap Vossers)
This was actually the original solution, I asked Jaap if he could look at this if he had spare time, as I knew he had a lot of JQuery experience (and he blew me away by having it all working in under 30 minutes!).
Basically it uses pretty standard JQuery to go off and retrieve the HTML content from another page, scraping the HTML and pulling back the values. Same as the C# it grabs the group TBody, then walks down the DOM to retrieve the text value that it outputs.
The speed is roughly the same as the actual view itself. I'm sure some more JQuery could be employed to pull out the specific values and do more with them, but the concept appears to be sound:
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>tada .. (thanks Jaap!)
<script type="text/javascript">
$(document).ready(function(){
// you will need to change this URL
var url = http://myspsite/Lists/MyList/GroupedView.aspx;
var groupings = [];
$.get(url, function(data) {
$(data).find("tbody[id^=titl][groupString] > tr > td").each(
function(index, value){
groupings.push($(this).text());
}
);
$("#placeholder").append("<ul></ul>");
$.each(groupings, function(index, value){
$("#placeholder ul").append("<li>" + value + "</li>")
});
});
});
</script>
<div id="placeholder"></div>
Result from JQuery output,
dropped into a Content Editor Web Part
|
Summary
Well .. I know doing HTML scraping isn't pretty, but seeing as the code is MUCH faster than anything else I've seen (and is stuck in the middle of a COM object) there didn't seem to be much choice.
By all means, feel free to let me know if you have any alternatives to this.
Haah I was working on top contributors web part. thought of doing it using datatable because getting group count using object model appeared almost impossible.
ReplyDeleteYou made my day. We survive because of people like you.
Thanks a lot
Any suggestion of what to try if this doesn't work for me right away? Everything seems great, I verified the paths to the grouped view and jquery, as well as verified that the view was set up in groups, but this script doesn't output anything. Any pointers would be greatly appreciated.
ReplyDeletehey Martin,
ReplyDeleteAs you mentioned, you had 200k items in the list. Did you create items in different folders. I have a similar requirement to have 100k items in a task list and I am not sure of the best way to do this. Can you please show me the way? I would reaaly really aprpeciate if you can please take out time to answer me.
Thanks.
Nisha
nisha.xclusiv@gmail.com
Hi Martin,
ReplyDeleteAS you mentioned you 200k items in a List. I have similar required to put 100k items in a task list. I am not sure of the best way to approach it. Can you please advise?
I would really appreciate if you can please take out some time to answer me.
Thanks
I've done this a little bit different and HTML Scrapped the 'All site content' page.
ReplyDelete/_layouts/viewlsts.aspx
This page shows the item count and doesn't require a new page to be specially created.
Adam Larner
Adam,
ReplyDeleteYes, although that page actually calls the SPList.ItemCount property which gives the same information.
My example above was providing counts for filtered items within a list (i.e. a Group By operator)
Like Skeene said, I did not get an output. I replaced the var url with mines but nothing was displayed... any ideas? anything else that needs to be replaced?
ReplyDeletethe url may need quotes
ReplyDelete// you will need to change this URL
var url = "http://myspsite/Lists/MyList/GroupedView.aspx";
blah. bot checks... some day only robots will be able to prove they're not robots
Dude. This is so unbelievably awesome. Thank you!
ReplyDeleteHi, from your screenshots you were using SharePoint 2007.
ReplyDeleteHave you tried that with SharePoint 2010 and its famous list view threshold?
As your list is above the 5,000 items threshold (200,000 items) your grouped view shouldn't have been able to return all items... I'm interested to know as I need to do something similar with SharePoint 2010. Thanks in advance for your help.