How to set HtmlAgilityPack Timeout

HtmlAgilityPack is a great HTML parser library that I often use for scraping. It makes web requests on your behalf via the HtmlWeb.Load methods, but it doesn’t expose the HttpWebRequest.Timeout property. I see a lot of people recommending that you fetch the page yourself with HttpWebRequest or WebClient and then hand the HTML to HtmlAgilityPack to query the DOM, but there’s an easier way.

If you look at the source for HtmlWeb, you can see that it exposes a PreRequest delegate:

public delegate bool PreRequestHandler(HttpWebRequest request);

And they call that delegate right before making the GetResponse call:

if (PreRequest != null)
{
    // allow our user to change the request at will
    if (!PreRequest(req))
    {
        return HttpStatusCode.ResetContent;
    }
}

HttpWebResponse resp;

try
{
    resp = req.GetResponse() as HttpWebResponse;
}

So all you have to do is assign a delegate to PreRequest, set your timeout within that delegate, and return true so the request goes ahead:

var web = new HtmlWeb();
web.PreRequest = delegate(HttpWebRequest webRequest)
{
    // Timeout is in milliseconds, so 4000 = 4 seconds
    webRequest.Timeout = 4000;
    return true;
};
var doc = web.Load("http://www.msn.com/");
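If you want to reuse this in more than one place, here is a rough sketch of how I might wrap it up. The class and method names (TimeoutExample, CreateWebWithTimeout, TryLoad) are my own inventions, not part of HtmlAgilityPack. It also sets ReadWriteTimeout, since Timeout only covers getting the response and not reading the body, and it assumes a timeout surfaces as a WebException with Status set to Timeout, which is how HttpWebRequest.GetResponse reports it on the .NET Framework.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class TimeoutExample
{
    // Hypothetical helper: builds an HtmlWeb whose requests use the given timeout.
    public static HtmlWeb CreateWebWithTimeout(TimeSpan timeout)
    {
        var web = new HtmlWeb();
        web.PreRequest = request =>
        {
            // Both properties are expressed in milliseconds.
            // Timeout covers the GetResponse call itself.
            request.Timeout = (int)timeout.TotalMilliseconds;
            // ReadWriteTimeout covers reads from the response stream.
            request.ReadWriteTimeout = (int)timeout.TotalMilliseconds;
            // Returning false here would make HtmlWeb skip the request.
            return true;
        };
        return web;
    }

    // Hypothetical helper: returns the parsed document, or null on a timeout.
    public static HtmlDocument TryLoad(HtmlWeb web, string url)
    {
        try
        {
            return web.Load(url);
        }
        catch (WebException ex) when (ex.Status == WebExceptionStatus.Timeout)
        {
            // Assumption: the timeout is reported as WebExceptionStatus.Timeout.
            return null;
        }
    }
}
```

Usage would look something like TimeoutExample.TryLoad(TimeoutExample.CreateWebWithTimeout(TimeSpan.FromSeconds(4)), "http://www.msn.com/"), with a null check on the result.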

Yep, it’s that easy.

Jon