Jsoup - How To Extract Every Elements

I'm trying to get font information using Jsoup. For example: Below is my code: result = rtfToHtml(new StringReader(streamToString((InputStream)contents.getTransferData(dfRTF

Solution 1:

If you only need to extract the text from a document, plus any <b> or <i> tags (as per your example), consider using jsoup's Whitelist class (named Safelist from jsoup 1.14.1 onward; see the jsoup docs):

String html = "<body><p class='default'><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><b>Hello World</b></span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> , Testing </span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><i><b>Font </b></i></span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'> Style </span><span style='color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;'><i>Check</i></span><span style='color: #000000; font-size: 10pt; font-family: MyriadPro-Bold;'></span></p></body>";

Whitelist wl = Whitelist.simpleText();
wl.addTags("b", "i"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);  

Which will output (as per your example):

11-07 19:04:45.738: I/System.out(318): <b>Hello World</b>   , Testing
11-07 19:04:45.738: I/System.out(318): <i><b>Font </b></i>Style
11-07 19:04:45.738: I/System.out(318): <i>Check</i>

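Update: to collect the inner HTML of every <span> in a list:

// doc is the parsed document, e.g. Document doc = Jsoup.parse(html);
List<String> elements = new ArrayList<String>();

for (Element span : doc.select("span")) {
    elements.add(span.html()); // inner HTML only, e.g. <b>Hello World</b>
}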
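
The loop above keeps each span's inner HTML, but the font details in the question sit in the span's style attribute, which html() does not return. Once that string is in hand (in jsoup, Element.attr("style") returns it), it can be split into individual properties. The following is a minimal sketch using only the Java standard library; InlineStyle and parse are hypothetical names chosen for illustration, and the splitting is deliberately naive, so a value that itself contains a semicolon (such as a quoted string or a url(...)) would be cut short.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class InlineStyle {
    // Hypothetical helper: splits "name: value; name: value;" into an
    // ordered map of property name -> value. Not a full CSS parser.
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed pieces
            }
            String name = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Style string copied from the sample HTML in this answer.
        Map<String, String> font =
                parse("color: #000000; font-size: 21pt; font-family: MyriadPro-Bold;");
        System.out.println(font.get("font-family") + " at " + font.get("font-size"));
    }
}
```

For the sample style string, main prints MyriadPro-Bold at 21pt. Calling parse(e.get(i).attr("style")) inside the loop would give one such map per span.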

Solution 2:

You need to change your selector to the <p> tag, like so:

Element all = doc.select("p").first();

Then you need to get all the children of that element.

String myString = "";
for(Element item : all.children()) {
    myString += item.text();
}

I am assuming you want the text inside the tags, and not the tags themselves.
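
One thing to watch with this loop: Element.text() returns whitespace-normalized, trimmed text, so appending the children back to back can fuse neighbouring words. The sketch below is one possible variant, assuming jsoup is on the classpath; ChildText and joinChildText are hypothetical names. It joins the non-empty child texts with a single space and guards against a document that has no <p> at all.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.StringJoiner;

public class ChildText {
    // Hypothetical helper: text of each direct child of the first <p>,
    // joined with single spaces. Returns "" when there is no <p>.
    static String joinChildText(String html) {
        Document doc = Jsoup.parse(html);
        Element paragraph = doc.select("p").first(); // null if nothing matched
        if (paragraph == null) {
            return "";
        }
        StringJoiner joined = new StringJoiner(" ");
        for (Element child : paragraph.children()) {
            String text = child.text(); // trimmed text of the child and its descendants
            if (!text.isEmpty()) {
                joined.add(text); // skip empty spans so no double spaces appear
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String html = "<p><span><b>Hello</b></span><span><i>World</i></span></p>";
        System.out.println(joinChildText(html));
    }
}
```

Using a StringJoiner also avoids repeated string concatenation inside the loop, though for a handful of spans the difference is negligible.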

Alternatively, you could do this:

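Elements all = doc.select("b");
all.addAll(doc.select("i"));
all.addAll(doc.select("span"));
// Note: text() combines the text of every matched element, so text inside
// nested tags (e.g. a <b> inside a <span>) appears more than once.
String myString = all.text();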
