Getting HtmlElement Based On HtmlAgilityPack.HtmlNode
Solution 1:
There seems to be no way to change the document directly in the WebBrowser control. You can, however, extract the HTML from it, manipulate it, and write it back like this:
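// requires: using System.IO; using System.Text;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webBrowser1.DocumentText);
// Example manipulation: add a TEST attribute to each top-level node
foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.ChildNodes) {
    node.Attributes.Add("TEST", "TEST");
}
StringBuilder sb = new StringBuilder();
using (StringWriter sw = new StringWriter(sb)) {
    doc.Save(sw);
}
// Assigning DocumentText reloads the control with the modified HTML
webBrowser1.DocumentText = sb.ToString();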
For direct manipulation you might be able to use webBrowser1.Document.DomDocument, which exposes the underlying unmanaged (COM) document object, but this is outside my knowledge.
Solution 2:
HtmlAgilityPack definitely can't provide direct access to nodes in the live HTML. Since you said the element has no distinctive style, class, or id, you have to walk through the nodes manually and find matches.
Assuming the HTML is reasonably valid (so that the browser and HtmlAgilityPack normalize it in similar ways), you can walk both trees in parallel, starting from their roots and selecting the same child node at each level.
Basically, you build a "position-based" XPath to the node in one tree and select it in the other tree. Depending on whether you care about positions only, or about positions and node names, the XPath would look something like one of these:
"/*[1]/*[4]/*[2]/*[7]" (positions only)
"/body/div[2]/span[1]/p[3]" (positions and names)
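A minimal sketch of how both path forms can be built, assuming a hypothetical Node class (name, parent, children) that stands in for either tree; it is not an HtmlAgilityPack or WinForms type. PositionPath counts all siblings, NamedPath counts only siblings with the same name, and both use 1-based positions as XPath does. The root node itself (a document-like node) is not part of the path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal tree node standing in for HtmlNode or HtmlElement.
public class Node
{
    public string Name { get; }
    public Node Parent { get; private set; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name) { Name = name; }

    // Appends a child and returns it, so trees can be built fluently.
    public Node Add(Node child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }
}

public static class PathBuilder
{
    // "/*[1]/*[4]/..." form: 1-based position among all siblings.
    public static string PositionPath(Node node)
    {
        var steps = new List<string>();
        for (var n = node; n.Parent != null; n = n.Parent)
            steps.Add("*[" + (n.Parent.Children.IndexOf(n) + 1) + "]");
        steps.Reverse(); // collected leaf-first, XPath is root-first
        return "/" + string.Join("/", steps);
    }

    // "/body/div[2]/..." form: 1-based position among same-name siblings.
    public static string NamedPath(Node node)
    {
        var steps = new List<string>();
        for (var n = node; n.Parent != null; n = n.Parent)
        {
            var sameName = n.Parent.Children.Where(c => c.Name == n.Name).ToList();
            steps.Add(n.Name + "[" + (sameName.IndexOf(n) + 1) + "]");
        }
        steps.Reverse();
        return "/" + string.Join("/", steps);
    }
}
```

For a span inside the second div of the body, with a p between the two divs, PositionPath returns /*[1]/*[1]/*[3]/*[1] and NamedPath returns /html[1]/body[1]/div[2]/span[1].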
Steps:
- Starting from the HtmlNode you've found, collect all its parent nodes up to the root.
- Get the root element of the HTML in the browser.
- For each level, find the position of the corresponding node (from the chain collected in the first step) within its parent, then find the live HtmlElement at that position among the children of the current live node.
- Move to the newly found child and repeat the previous step until you reach the node you are looking for.
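The steps above can be sketched as follows, again over a hypothetical Node class rather than the real HtmlNode/HtmlElement types. It collects the node's ancestors, then descends the other tree by the same child positions, comparing names case-insensitively (the live DOM may report tag names in upper case) and returning null if the two trees diverge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal node type; in practice one tree would be
// HtmlAgilityPack HtmlNodes and the other live WinForms HtmlElements.
public class Node
{
    public string Name { get; }
    public Node Parent { get; private set; }
    public List<Node> Children { get; } = new List<Node>();

    public Node(string name) { Name = name; }

    public Node Add(Node child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }
}

public static class ParallelWalk
{
    public static Node FindCorresponding(Node source, Node otherRoot)
    {
        // Step 1: collect the node and its ancestors (excluding the root), root side first.
        var chain = new List<Node>();
        for (var n = source; n.Parent != null; n = n.Parent) chain.Add(n);
        chain.Reverse();

        // Step 2: start from the root of the other tree.
        var current = otherRoot;

        // Steps 3-4: take the child at the same position, check its name, descend.
        foreach (var step in chain)
        {
            int pos = step.Parent.Children.IndexOf(step);
            if (pos >= current.Children.Count) return null; // other tree has fewer children
            current = current.Children[pos];
            if (!string.Equals(current.Name, step.Name, StringComparison.OrdinalIgnoreCase))
                return null; // trees diverge at this level
        }
        return current;
    }
}
```

When applying this to real trees, keep in mind that HtmlAgilityPack's ChildNodes includes text and comment nodes, whereas the live HtmlElement.Children collection is made up of elements, so positions should be counted over element nodes only.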
Solution 3:
The XPath property of the HtmlAgilityPack.HtmlNode lists the nodes on the path from the root to the node, for example /html[1]/body[1]/div[1]/div[2]/table[1] (positions are 1-based). You can traverse this path in the live document to find the corresponding live element. However, this path may not be precise, because by default HtmlAgilityPack does not nest the content of some tags such as <form> inside them, while the browser does. Before loading the document with HtmlAgilityPack, remove that special handling:
HtmlNode.ElementsFlags.Remove("form");
// Structure to hold the name and position of each node in the path
// (public, because it appears in the signature of the public GetChild method)
public struct DocNode
{
    public string Name;
    public int Pos;
}
The following method finds the live element that corresponds to the node's XPath:
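// requires: using System; using System.Collections.Generic;
// using System.Text.RegularExpressions; using System.Windows.Forms;
public static HtmlElement GetLiveElement(HtmlAgilityPack.HtmlNode node, System.Windows.Forms.HtmlDocument doc)
{
    // Parse the XPath (e.g. /html[1]/body[1]/div[2]) to extract the nodes on the path
    var pattern = @"/([^/\[]+)\[(\d+)\]"; // one step, like /div[1]
    var matches = Regex.Matches(node.XPath, pattern);
    List<DocNode> pathToNode = new List<DocNode>();
    foreach (Match m in matches) // Make a path of nodes
    {
        DocNode n = new DocNode();
        n.Name = m.Groups[1].Value;
        n.Pos = Convert.ToInt32(m.Groups[2].Value) - 1; // XPath positions are 1-based
        pathToNode.Add(n); // add the node to the path
    }
    if (pathToNode.Count == 0 || doc.Body == null)
    {
        return null;
    }
    // Begin from the body; if the path starts at <html>, begin from the
    // live <html> element (the parent of the body) and skip that step
    HtmlElement elem = doc.Body;
    int start = 0;
    if (pathToNode[0].Name.Equals("html", StringComparison.OrdinalIgnoreCase))
    {
        elem = doc.Body.Parent;
        start = 1;
    }
    // Traverse to the element, stopping early if a step has no matching child
    for (int i = start; i < pathToNode.Count && elem != null; i++)
    {
        // Find the corresponding child by its name and position
        elem = GetChild(elem, pathToNode[i]);
    }
    return elem;
}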
The code for the GetChild method used above:
public static HtmlElement GetChild(HtmlElement el, DocNode node)
{
    // Find the corresponding child of the element
    // based on the name and position of the node
    int childPos = 0;
    foreach (HtmlElement child in el.Children)
    {
        // Only children with the same tag name count toward the position
        if (child.TagName.Equals(node.Name,
            StringComparison.OrdinalIgnoreCase))
        {
            if (childPos == node.Pos)
            {
                return child;
            }
            childPos++;
        }
    }
    return null; // no matching child at that position
}