Content Summaries with HtmlAgilityPack

Recently I had a client ask me to modify their Blog page to list only the first paragraph and the first image found in each of their posts instead of the full post on the default Blog Post List.

The site (http://deancambray.com.au) was built with Umbraco and utilises the Blog4Umbraco package along with our Extensions package.  The solution that we came up with was to use the already available HtmlAgilityPack that's included in the Umbraco distribution and write our own Razor Script to list the Blog Posts.

While the entire script also incorporates other features such as a custom numeric pager, I wanted to focus this time on just extracting certain elements out of each post and displaying them in a customised format.

The helper: RenderSummary

First things first: make sure you have a reference to the HtmlAgilityPack library near the top of the script:

@using HtmlAgilityPack;

Our helper looks like this:

@helper RenderSummary(dynamic node) {
    var doc = new HtmlDocument();
    doc.LoadHtml(node.BodyText.ToString());
    var imgNode = doc.DocumentNode.SelectSingleNode("//img[@src]"); 
    if (imgNode != null) {
        var url = imgNode.Attributes["src"].Value;
        string alt = string.Empty;
        string title = string.Empty;
        if (imgNode.Attributes["alt"] != null) { alt = imgNode.Attributes["alt"].Value; }
        if (imgNode.Attributes["title"] != null) { title = imgNode.Attributes["title"].Value; }
        <a href="@node.Url" title="Permalink to @node.Name"><img src="@url" alt="@alt" title="@title" /></a>
    }
    var para = doc.DocumentNode.SelectNodes("//p");
    if (para != null) {
        foreach (var p in para) {
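            // Skip paragraphs that contain only whitespace or &nbsp; entities (InnerText leaves entities encoded).
            if (string.IsNullOrWhiteSpace(p.InnerText.Replace("&nbsp;", ""))) { continue; }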
            
            <p>@Html.Raw(p.InnerText)</p>
            break;                                         
        }
    }
}

Our script uses the helper to render the summary of each post it finds.  For each article, the helper instantiates a new HtmlAgilityPack.HtmlDocument and loads the article content using LoadHtml.  Once that's done, we can use standard XPath queries to select the content that we want.  In this case, we want to find the first image that may be contained in the article and the first non-empty paragraph.

Both SelectSingleNode and SelectNodes return null when nothing matches, so we can check whether an image or paragraph exists by testing the return value.  That makes it very easy to conditionally display the image, or a placeholder if desired.
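
As a minimal sketch of that null check, the method below returns the src of the first image, or a placeholder path when the post contains no image.  The class name, method name and placeholder URL are hypothetical and not part of the original script:

```csharp
using HtmlAgilityPack;

public static class SummaryImages
{
    // Hypothetical helper: returns the src of the first <img> in the HTML,
    // or the supplied placeholder URL when no image is present.
    public static string FirstImageOrPlaceholder(string html, string placeholderUrl)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // SelectSingleNode returns null when the XPath query matches nothing.
        var imgNode = doc.DocumentNode.SelectSingleNode("//img[@src]");

        return imgNode != null
            ? imgNode.GetAttributeValue("src", placeholderUrl)
            : placeholderUrl;
    }
}
```

Inside RenderSummary, the same check could drive an else branch that renders the placeholder image instead of skipping the image entirely.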

Once we have our image, it's a trivial matter to extract the source URL and other attributes using the Attributes collection on the returned HtmlNode and build our custom <img> tag.

Because it is very easy to insert paragraphs through TinyMCE that are empty, we want to find the first paragraph that actually has visible content in it. Otherwise our summary will look very empty indeed.  Once we have found the right paragraph, we can use the InnerText property to extract just the textual elements and ignore things like embedded images, lists and line breaks.  This results in a cleaner display and guarantees that the image (which may be found within the first paragraph) is not shown twice.

Note that you could also use the InnerHtml property instead if you wanted to include the extra format elements and other bits and pieces.
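
To make the difference concrete, here is a rough sketch that returns the first non-empty paragraph either as plain text or with its inner markup.  The ParagraphContent class, the FirstParagraph method and the keepMarkup flag are made up for illustration:

```csharp
using HtmlAgilityPack;

public static class ParagraphContent
{
    // Hypothetical helper: returns the first paragraph that has visible text,
    // as plain text (InnerText) or with its inner markup (InnerHtml).
    public static string FirstParagraph(string html, bool keepMarkup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null rather than an empty collection when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null) { return string.Empty; }

        foreach (var p in paragraphs)
        {
            // Treat paragraphs holding only whitespace or &nbsp; entities as empty.
            if (string.IsNullOrWhiteSpace(p.InnerText.Replace("&nbsp;", ""))) { continue; }

            return keepMarkup ? p.InnerHtml : p.InnerText;
        }

        return string.Empty;
    }
}
```

Keep in mind that with InnerHtml an image embedded in the first paragraph would be rendered again alongside the one the summary already shows.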

Tying it together

Our BlogListPosts script is intended to replace the XSLT counterpart provided with Blog4Umbraco, so I've taken the basic structure of that script and tidied it up somewhat for clarity.  I've removed the part of it that does the filtering and paging of the list items based on category and/or archive folder.  I wanted to focus on just the Summary rendering, so here's a condensed version of the body of the script featuring the use of the RenderSummary helper defined above:

@{
    var list = Current.DescendantsOrSelf("BlogPost").Items.OrderByDescending(n => n.GetPropertyValue("PostDate"));

    foreach (dynamic post in list)
    {
        <div class="post">
            <h2 class="entry-title"><a href="@post.Url" title="Permalink to @post.Name">@post.Name</a></h2>

            <div class="entry-date">
                <small class="published">@post.PostDate.ToString("dddd, MMM dd, yyyy")</small>
            </div>

            <div class="entry-content summary">
                @RenderSummary(post)
            </div>
            <div class="footer">
                <small class="more"><a href="@post.Url" title="Permalink to @post.Name">Read More...</a></small>
            </div>
        </div>
    }
    
}

Find this post helpful?  Why don't you drop us a line in the comments below...

Retrieving DropDownList Values in Razor or C#

Recently I needed to update the value of an Umbraco DropDownList property in code based on a value instead of the key that's automatically assigned by the Prevalue Editor.  I came across this but it discusses retrieving values for XSLT specifically.  In my scenario I needed to find a specific key.

The simplest way to do this is with LINQ to XML.  In the example below I'm using the property's DataTypeDefinition to retrieve the relevant prevalue collection instead of hard-coding the Id of the DropDownList.  This means I have full flexibility in case something changes in the future:

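var status = p.getProperty("status");
if (status != null)
{
    // Find the prevalue whose text is "On Offer" in the property's own data type.
    var preValue = XElement.Parse(library.GetPreValues(status.PropertyType.DataTypeDefinition.Id).Current.OuterXml)
                           .Descendants("preValue").FirstOrDefault(pv => pv.Value == "On Offer");

    // FirstOrDefault returns null when there is no match, so only assign when one was found.
    if (preValue != null) { status.Value = preValue.Attribute("id").Value; }

    p.Save();
}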

Note I could also have written it like this in LINQ query syntax:

status.Value = (from pv in XElement.Parse(library.GetPreValues(status.PropertyType.DataTypeDefinition.Id).Current.OuterXml).Descendants("preValue")
                       where pv.Value == "On Offer"
                       select pv.Attribute("id").Value).FirstOrDefault();

That's all there is to it.
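
If you need the lookup in more than one place, it could be wrapped in a small method along these lines.  This is only a sketch: the PreValueLookup class and FindId method are hypothetical names, and the XML shape in the usage example (preValue elements carrying an id attribute) is inferred from the queries above rather than taken from Umbraco documentation:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class PreValueLookup
{
    // Hypothetical helper: returns the id of the prevalue whose text matches,
    // or null when no prevalue has that text.
    public static string FindId(string preValuesXml, string text)
    {
        return XElement.Parse(preValuesXml)
                       .Descendants("preValue")
                       .Where(pv => pv.Value == text)
                       // Casting an XAttribute to string yields null if the attribute is missing.
                       .Select(pv => (string)pv.Attribute("id"))
                       .FirstOrDefault();
    }
}
```

With a made-up XML string such as `<preValues><preValue id="11">On Offer</preValue></preValues>`, calling `PreValueLookup.FindId(xml, "On Offer")` gives "11", and a value that isn't in the list gives null instead of throwing.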

Introducing the Umbraco View Counter

Over the last couple of days we've been busy creating an Umbraco package that deals with Content View Counters - it enables the web master to track the number of times content has been viewed on the site.

The documentation and package have just been uploaded to the Umbraco Project Repository and can be downloaded from here.  This post deals with a few of the features of the package, which was built against Umbraco 4.7 and .NET 4.0.

Introduction

The Refactored Content Views package is essentially a content views (number of times viewed) counter.  The current functionality offered by this package includes:

  • Optional Data Type that allows for configuring view counters with various categories and the ability to instruct Macros etc. to "hide" the view Count yet still increment it.
  • Optional incrementing when displaying the view count (useful when you want to display the view count in a content listing, for example)
  • Example Razor Script and Macro.
  • Library methods to manipulate the counters and retrieve details as an XML fragment for use with XSLT.

Basic Usage

To simply retrieve and/or increment the counter for a specific content item, call the following library method.  The category and increment parameters are optional; the call shows their default values:

ViewCount.GetViewCount(nodeId, category: "<empty string>", increment: false);

There is no requirement to configure a DataType; supplying the node id of any valid Content-based node (Member, Document, Media, etc.) will create the Views record in the database if it doesn't exist.  However, configuring and using a DataType will allow you to control the advanced features of the counter.

Out of the box

Out of the box you get a default DataType (View Count) and a sample Razor Macro that displays the current View Count of the node being displayed.  If you have set up the Document Type with the View Count DataType, the macro will check whether the View Count should be displayed or not.

Macro Parameters for Page Views:

  • Category (text) - optionally specifies the Category to record the Page Count against.
  • Increment (bool) - set to true to increment the Page Count when the macro is called.

Macro Script Contents:

@inherits umbraco.MacroEngines.DynamicNodeContext
@using umbraco.MacroEngines;
@using umbraco.NodeFactory;
@using Refactored.UmbracoViewCounter;

@if (!ViewCount.HideCounter(Model.Id, category: Parameter.Category)) {
  <span># Views:@ViewCount.GetViewCount(@Model.Id, category: @Parameter.Category, increment: @Parameter.Increment == "1").ToString("N0")</span>
} else {
  ViewCount.Increment(Model.Id, category: Parameter.Category);
}

Setting up a Data Type

The Data Type has the following Parameters:

View Count DataType

  • Category - Specifying a different category for multiple DataTypes allows you to differentiate between multiple View Counts in a single content item.  You can then render the content in different views and have a different View Count for each rendering.
  • Hide View Count - Allows you to control (in conjunction with the API and Razor or XSLT macros, for example) whether to hide or show the view count at a Data Type level.
  • Enable View History - Turns on recording of View Count History data, including the time the view was incremented.  Reset command events are also recorded.  This data is stored in the refViewCountHistory table and persists even if the current view count is reset.  This is off by default.
  • Disable Counter Reset - Turning this on disables the Reset action on Content configured with a View Count DataType.