A C# class library for parsing HTML into a searchable, filterable DOM tree
var html = "<html><head></head><body><div>Hello World!</div></body></html>";
var dom = Html.Parse(html,
new ParserOptions()
{
ReplaceNbsp = " ",
TrimText = TrimType.OneTrailingSpace
});
var elemText = dom.Elements.Where(el => el.Text.Contains("Hello")).FirstOrDefault();
// select all child nodes from all DIV tags
var nodes = dom.Elements.Where(el => el.TagName == "div").SelectMany(el => el.Children());
foreach(var elem in dom.Elements.Where(el => el.ClassNames.Contains("button")))
{
var button = elem.AllChildren.First(el => el.TagName == "a");
elem.ReplaceWith(button);
}
In the example above, every element whose ClassNames contains "button" is replaced with the first <a> tag found anywhere in that element's child node hierarchy, which is searched through the AllChildren property.
Property | Default | Description |
---|---|---|
ReplaceNbsp | | Replaces HTML encoded spaces with the provided string |
TrimText | TrimType.None | Trims spaces from #text nodes. NOTE: The parser automatically removes any duplicate spaces from #text nodes. |
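The text handling described in the table above can be pictured with a small standalone sketch. It is not the library's implementation: the class `TextNormalizationSketch` and its `Normalize` method are hypothetical names, and it assumes that "HTML encoded spaces" means the `&nbsp;` entity (or the U+00A0 character) and that the replacement happens before duplicate spaces are collapsed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the library.
public static class TextNormalizationSketch
{
    // Mimics the documented behavior of ReplaceNbsp plus the automatic
    // removal of duplicate spaces from #text nodes.
    public static string Normalize(string text, string replaceNbsp)
    {
        if (replaceNbsp != null)
        {
            // Assumption: both the entity and the decoded character are replaced.
            text = text.Replace("&nbsp;", replaceNbsp)
                       .Replace("\u00A0", replaceNbsp);
        }

        // Collapse any run of two or more spaces into a single space.
        return Regex.Replace(text, " {2,}", " ");
    }
}
```

Under these assumptions, `TextNormalizationSketch.Normalize("Hello&nbsp;&nbsp;World!", " ")` returns `Hello World!`.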
An enum used as the value of ParserOptions.TrimText
Label | Value | Description |
---|---|---|
None | 0 | Parser does not trim spaces from the beginning or end of any #text node |
Right | 1 | Parser trims all spaces from the end of all #text nodes |
Left | 2 | Parser trims all spaces from the beginning of all #text nodes |
Both | 3 | Parser trims all spaces from the beginning and end of all #text nodes |
OneTrailingSpace | 4 | Parser trims all spaces from the beginning and end of all #text nodes, except that if a #text node ends with a space, one trailing space is kept |
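The trimming modes in the table can be compared side by side with a minimal sketch. This is an illustration of the documented behavior, not the parser's own code: `TrimTypeSketch` is a local mirror of the enum so the example compiles without the library, `TrimSketch.Apply` is a hypothetical helper, and the handling of `OneTrailingSpace` reflects one reading of its description (trim both ends, then keep a single space if the original text ended with one).

```csharp
using System;

// Local mirror of the documented enum values; the real enum lives in the library.
public enum TrimTypeSketch { None = 0, Right = 1, Left = 2, Both = 3, OneTrailingSpace = 4 }

// Hypothetical helper for illustration only; not part of the library.
public static class TrimSketch
{
    public static string Apply(string text, TrimTypeSketch mode)
    {
        switch (mode)
        {
            case TrimTypeSketch.Right:
                return text.TrimEnd(' ');   // spaces removed from the end
            case TrimTypeSketch.Left:
                return text.TrimStart(' '); // spaces removed from the beginning
            case TrimTypeSketch.Both:
                return text.Trim(' ');      // spaces removed from both ends
            case TrimTypeSketch.OneTrailingSpace:
            {
                // Assumed reading: trim both ends, then restore exactly one
                // trailing space when the original text ended with a space.
                bool endedWithSpace = text.Length > 0 && text[text.Length - 1] == ' ';
                string trimmed = text.Trim(' ');
                return endedWithSpace ? trimmed + " " : trimmed;
            }
            default:
                return text;                // None: text is left untouched
        }
    }
}
```

For the input `" Hello World! "`, this sketch yields `"Hello World! "` with `OneTrailingSpace` and `"Hello World!"` with `Both`, which matches the option used in the parsing example at the top of this document.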