How Do I Find The String Index Of A Tag (an Element) Without Counting Expanded Entities?

May 30, 2024

I've got a large piece of text that I want to be able to select, storing the selected part by its start index and end index. (For example, selecting or in word would give me startin

Solution 1:

If you really want to keep your code and just modify it a little, you could replace the escaped special characters with their visible equivalents while keeping the HTML tags. Change your declaration of startIndex to this:

// Decode &quot; before &amp;, so an escaped entity such as &amp;quot; is not decoded twice
var startIndex = $('#textBlock').html().replace(/&quot;/g, "\"").replace(/&amp;/g, "&").indexOf('<strike>');

You can chain further replace() calls for any other special characters you want counted as single characters rather than as their HTML entity form. In my example I replaced the & and " characters.
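To make the "chain more replace() calls" idea concrete, here is a minimal sketch that runs on a plain html() string without jQuery. The helper name `indexOfTagInHtml` and the `ENTITIES` list are hypothetical; the list is just an example of entities you might choose to count as one character each. It keeps `&amp;` last so an escaped entity is not decoded twice, and it deliberately leaves `&lt;`/`&gt;` encoded so escaped markup in the text cannot be mistaken for a real tag.

```javascript
// Hypothetical list of entities to count as a single character each.
// "&lt;" and "&gt;" are deliberately NOT decoded, so text that merely
// describes a tag is not turned into something that looks like a tag
// (the trade-off: each of them still counts as 4 characters).
// "&amp;" must stay last, so "&amp;quot;" decodes to "&quot;", not to '"'.
const ENTITIES = [
  ["&quot;", '"'],
  ["&nbsp;", "\u00a0"],
  ["&amp;", "&"],
];

// Decode the listed entities in an html() string, then find the opening tag.
function indexOfTagInHtml(html, tag) {
  let decoded = html;
  for (const [entity, ch] of ENTITIES) {
    decoded = decoded.split(entity).join(ch);
  }
  return decoded.indexOf("<" + tag + ">");
}

console.log(indexOfTagInHtml("This is a cool test &amp; <strike>stuff like</strike> that", "strike")); // 22
```

Like the one-liner above, this still counts the characters of any tags that come before the match, and it only handles the entities you list.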

More optimizations are possible in your code; this is just a simple way to fix your problem.

Hope this helps a bit. See the forked fiddle here: http://jsfiddle.net/vQNyv/

Solution 2:

The Problem

Using html() returns:

This is a cool test &amp; <strike>stuff like</strike> that

Using text(), however, would return:

This is a cool test & stuff like that

So html() is necessary in order to see the <strike> string, but then, of course, all special characters are escaped as entities, as they should be. There are ways to hack around this problem, but imagine what would happen if, say, the text were describing HTML itself:

Use the <strike></strike> tags to strike out text.

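In this case, the correct interpretation is the escaped form, because the tags are part of the text rather than real elements:

Use the &lt;strike&gt;&lt;/strike&gt; tags to strike out text.

If you decode the entities back into characters, this text becomes indistinguishable from a real <strike> element. That's why the only reliable approach is to iterate through the DOM nodes.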
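The ambiguity can be reproduced without a browser. The sketch below uses a simplified model of how innerHTML (and therefore jQuery's html()) escapes a text node, where "&", "<", ">" and the no-break space become entities, plus a naive decoder of the kind a string-based workaround would use. Both `escapeText` and `naiveDecode` are hypothetical helpers written for illustration, not jQuery functions.

```javascript
// Simplified model of how a text node is escaped when serialized to HTML.
function escapeText(s) {
  return s
    .replace(/&/g, "&amp;") // first, so the entities added below aren't re-escaped
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Naive string workaround: turn the entities back into characters.
function naiveDecode(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&nbsp;/g, "\u00a0")
    .replace(/&amp;/g, "&"); // last, to avoid decoding "&amp;lt;" twice
}

// A single text node that only *describes* a tag; there is no <strike> element.
const textNode = "Use the <strike></strike> tags to strike out text.";
const html = escapeText(textNode);
console.log(html); // Use the &lt;strike&gt;&lt;/strike&gt; tags to strike out text.
console.log(naiveDecode(html).indexOf("<strike>")); // 8, a false match
```

Once the string is decoded, the information about which "<" came from markup and which came from text is gone, which is why walking the nodes is needed.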

The jQuery/DOM Solution

Here's a jsFiddle of my solution, and here's the code:

jQuery.fn.indexOfTag = function(tag) {
    var nodes = this[0].childNodes;
    var chars = 0;
    for (var i = 0; nodes && i < nodes.length; i++) {
        var node = nodes[i];
        var type = node.nodeType;
        // 3 = text, 4 = CDATA section, 5 = entity reference (legacy XML DOM)
        if (type === 3 || type === 4 || type === 5) {
            // alert('advancing ' + node.nodeValue.length + ' chars');
            // nodeValue holds the decoded text, so "&" counts as one character
            chars += node.nodeValue.length;
        } else if (type === 1) { // 1 = element
            if (node.tagName === tag.toUpperCase()) {
                // alert('found <' + node.tagName + '> at ' + chars + ', returning');
                return chars;
            } else {
                // alert('found <' + node.tagName + '>, recursing');
                var subIndexOfTag = $(node).indexOfTag(tag);
                if (subIndexOfTag === -1) {
                    // alert('did not find <' + tag.toUpperCase() + '> in <' + node.tagName + '>');
                    chars += $(node).text().length;
                } else {
                    // alert('found <' + tag.toUpperCase() + '> in <' + node.tagName + '>');
                    chars += subIndexOfTag;
                    return chars;
                }
            }
        }
    }
    return -1;
};

Uncomment the alert()s to gain insight into what's going on. Here's a reference on the nodeTypes.
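To try the same walk outside a browser, here is a sketch over plain objects standing in for DOM nodes. The `text`/`el` helpers, `textLength` and the object shape are hypothetical; only the property names `nodeType`, `nodeValue`, `tagName` and `childNodes` mirror the real DOM. `textLength` plays the role of `$(node).text().length`, and only text and element nodes are modelled.

```javascript
// Minimal stand-ins for DOM nodes (hypothetical, for running without a browser).
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;
const text = (value) => ({ nodeType: TEXT_NODE, nodeValue: value });
const el = (tagName, ...childNodes) => ({
  nodeType: ELEMENT_NODE,
  tagName: tagName.toUpperCase(),
  childNodes,
});

// Total length of the text inside a node, like jQuery's .text().length.
function textLength(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue.length;
  if (node.nodeType !== ELEMENT_NODE) return 0;
  return node.childNodes.reduce((sum, child) => sum + textLength(child), 0);
}

// Same walk as indexOfTag: count decoded text characters until the first
// element whose tagName matches; return -1 if there is none.
function indexOfTag(parent, tag) {
  const wanted = tag.toUpperCase();
  let chars = 0;
  for (const node of parent.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      chars += node.nodeValue.length;
    } else if (node.nodeType === ELEMENT_NODE) {
      if (node.tagName === wanted) return chars;
      const sub = indexOfTag(node, tag);
      if (sub !== -1) return chars + sub;
      chars += textLength(node);
    }
  }
  return -1;
}

// The example from the question: the text node holds the decoded "&",
// so it counts as one character, not five.
const block = el("div", text("This is a cool test & "), el("strike", text("stuff like")), text(" that"));
console.log(indexOfTag(block, "strike")); // 22
```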

The jQuery/DOM Solution counting outerHTML

Based on your comments, I think you're saying you do want to count HTML tags (character-wise), but just not the HTML entities. Here's a new jsFiddle of the function itself, and here's a new jsFiddle of it applied to your problem.
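The linked fiddles for this variant are not reproduced here, so the following is only a guess at the idea and not the author's code. It counts each element's opening and closing tags character by character, as outerHTML would, but counts each text character once, so "&" adds 1 rather than 5. The helpers `markupLength` and `indexOfTagCountingTags` are hypothetical, and the sketch assumes elements with no attributes and no void elements (such as <br>), so every tag is simply "<name>" and "</name>".

```javascript
// Minimal stand-ins for DOM nodes (hypothetical; no attributes, no void elements).
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;
const text = (value) => ({ nodeType: TEXT_NODE, nodeValue: value });
const el = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });

// Length of a node's markup, with tags counted but text left unescaped.
function markupLength(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue.length;
  if (node.nodeType !== ELEMENT_NODE) return 0;
  const name = node.tagName.toLowerCase();
  const inner = node.childNodes.reduce((sum, child) => sum + markupLength(child), 0);
  return `<${name}>`.length + inner + `</${name}>`.length;
}

// Index of the first <tag>, counting preceding tags but not expanding entities.
function indexOfTagCountingTags(parent, tag) {
  const wanted = tag.toLowerCase();
  let chars = 0;
  for (const node of parent.childNodes) {
    if (node.nodeType === ELEMENT_NODE) {
      const name = node.tagName.toLowerCase();
      if (name === wanted) return chars;
      const sub = indexOfTagCountingTags(node, tag);
      // Step over this element's opening tag, then add the offset inside it.
      if (sub !== -1) return chars + `<${name}>`.length + sub;
    }
    chars += markupLength(node);
  }
  return -1;
}

// "<b>a & b</b> <strike>": 3 + 5 + 4 + 1 characters precede <strike>.
console.log(indexOfTagCountingTags(el("div", el("b", text("a & b")), text(" "), el("strike", text("x"))), "strike")); // 13
```

Real elements with attributes would also need their attribute text counted, which this sketch leaves out.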

Html5 stackoverflow Examples