Regular expression for HTML content extraction
Hi guys, I have an HTML page and want to extract the content of the <p> tag that follows a specific <h2> tag.
I am using the Extract Information operator with Regular Expression as the query type. Extracting the content of the <h2> tag with the regex <h2>(.+?)</h2> works and gives me the right result, the text "Specific 1" (the HTML snippet is listed below).
But when I try to extract the <p>blabla...</p> content that follows this specific <h2> tag with the regex <h2>Specific 1</h2><p>(.+?)</p>, nothing is returned.
Can someone tell me why, and what the right regex is to get the <p> content?
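One common cause of this symptom is that the page has a line break or indentation between </h2> and <p>, while the pattern requires the two tags to be directly adjacent. The sketch below assumes that is the case here (the sample HTML is hypothetical, since the real snippet is not reproduced) and compares the original pattern with a more tolerant one. It uses Python's re module only to demonstrate the pattern; the (?s) inline flag and \s are also supported by Java's java.util.regex, which RapidMiner is built on, but whether Extract Information applies the pattern exactly this way is an assumption worth checking.

```python
import re

# Hypothetical sample HTML: assumes a line break and indentation
# between </h2> and <p>, which the original pattern does not allow for.
html = """
<h2>Specific 1</h2>
    <p>blabla...</p>
<h2>Specific 2</h2>
<p>other text</p>
"""

# Pattern from the question: </h2> must be immediately followed by <p>.
strict = r"<h2>Specific 1</h2><p>(.+?)</p>"

# More tolerant sketch:
#   \s*  allows any whitespace (including newlines) between the two tags
#   (?s) lets "." match newlines too, in case the paragraph spans lines
tolerant = r"(?s)<h2>Specific 1</h2>\s*<p>(.+?)</p>"


def first_group(pattern: str, text: str):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(first_group(strict, html))    # None
print(first_group(tolerant, html))  # blabla...
```

Note that this pattern still assumes a bare <p> tag; if the real page uses something like <p class="...">, the <p> part would need to become <p[^>]*> as well.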