Regular expression for HTML content extraction
Hi guys, I have an HTML page and want to extract all the content of the <p> tag that follows a specific <h2> tag.
I am using the Extract Information component with Regular Expression as the query type. I have tried extracting the
content of the <h2> tag (regex: <h2>(.+?)</h2>), which gives me the right result, the "Specific 1" text (the HTML snippet is listed below).
But when I try to extract the <p>blabla...</p> content after this specific <h2> tag using
regex: <h2>Specific 1</h2><p>(.+?)</p>, it doesn't work.
Can someone tell me why, and what the right regex is to get the <p> content?
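
One possible cause, sketched below in Python because the actual HTML is not shown here: the second pattern requires <p> to follow </h2> with nothing in between, so a line break or indentation between the two tags is enough to make it fail. The sample HTML and the variable names are assumptions made up for illustration, not the real page. The tolerant pattern adds \s* between the tags and the (?s) flag so that . also matches line breaks inside the paragraph; Java's regex syntax accepts both constructs as well.

```python
import re

# Hypothetical HTML: the real snippet is not shown, so this layout is an assumption.
html = """<h2>Specific 1</h2>
<p>blabla first
paragraph</p>
<h2>Specific 2</h2>
<p>other text</p>"""

# Pattern from the question: <p> must follow </h2> with no characters in between.
strict = re.compile(r"<h2>Specific 1</h2><p>(.+?)</p>")

# Tolerant variant: \s* permits whitespace or line breaks between the tags,
# and (?s) lets . match line breaks inside the paragraph body.
tolerant = re.compile(r"(?s)<h2>Specific 1</h2>\s*<p>(.+?)</p>")

strict_match = strict.search(html)
tolerant_match = tolerant.search(html)

print("strict pattern matched:", strict_match is not None)
if tolerant_match:
    print("tolerant pattern captured:", repr(tolerant_match.group(1)))
```

If the real page has other markup between the heading and the paragraph (attributes on the tags, a wrapping <div>, comments), this sketch would need to be loosened further, and an HTML parser or XPath query may be a more robust choice than a regex.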