Read data from HTML tables on web pages

Flixport Member Posts: 33 Contributor II
Hey all,

Has the HTML Reader operator been removed in the new version, or is there another reason I cannot find it?
I would appreciate it if someone could answer, thanks.

Answers

  • Flixport Member Posts: 33 Contributor II
    Hello @varunm1

    As I understand it, the Web Table Extraction extension extracts data from an HTML table. But the data we are interested in is often not in a table. Is there a solution for this?

    thanks

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @Flixport

    I'm not sure about this. @Telcontar120 or @mschmitz may be able to advise.

    Thanks
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    There are definitely ways to get data from web pages into RapidMiner, but depending on the page structure it is not necessarily simple or straightforward (that's why there's a whole expert training class just on web mining!). It's also complicated by the fact that some of the web mining operators have not been updated in some time, so there are some "quirks" you need to be aware of. If you are interested in this topic, download the free Web Mining extension from the Marketplace and start with the Get Page operator. It lets you pull in any HTML page, and you can then try to extract the information you need from the underlying HTML with some of the other text mining operators.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
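
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same two steps: take the raw HTML of a page, then pull out non-tabular values by looking at the markup around them. It is not how the Get Page operator or the text mining operators work internally. The sample markup, the class names `name` and `price`, and the helper `extract_by_class` are all made up for illustration, and the HTML is a literal string so that the example runs without network access.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # finished text values, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # child of a matching element
        elif self.target_class in classes:
            self.depth = 1              # entering a matching element
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:             # matching element just closed
            text = " ".join("".join(self.current).split())
            self.results.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_by_class(html, target_class):
    """Hypothetical helper: return the text of all elements with target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up page fragment: the data is in divs and spans, not in a table.
SAMPLE_HTML = """
<div class="product"><h2 class="name">Widget</h2><span class="price">9.99 EUR</span></div>
<div class="product"><h2 class="name">Gadget <b>Pro</b></h2><span class="price">19.50 EUR</span></div>
"""

names = extract_by_class(SAMPLE_HTML, "name")
prices = extract_by_class(SAMPLE_HTML, "price")
print(list(zip(names, prices)))
```

For a live page, the HTML string could be read with `urllib.request.urlopen(url).read().decode()` from the standard library. How well this kind of extraction works depends on how consistent the page markup is.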
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Yes, so just to be clear, there are actually two extensions we're talking about here: the Web Mining extension and the Web Table Extraction extension.

    The Web Mining extension is a rather dated one and the advice from @Telcontar120 should help you there.

    The Web Table Extraction extension was developed out of RapidMiner Research in Dortmund; my colleague @ey wrote the extension and an accompanying Knowledge Base article about a year ago that may help.

    Scott
  • Flixport Member Posts: 33 Contributor II
    edited March 2019
    Hey all,

    Thanks for the answers. Could another solution be to convert the HTML document into an XML document, or is that not possible?

    thanks
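
As a rough illustration of what such a conversion involves, the Python sketch below turns a small HTML string into well-formed XML using only the standard library. It says nothing about which RapidMiner operators can do this. The class `HtmlToXml` and the sample markup are made up, the input is assumed to have a single root element such as `<html>`, and unclosed tags are simply closed when an enclosing tag ends, which is cruder than what a browser does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that have no closing tag; XML needs them closed explicitly.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Naive converter: replay HTML parse events into an XML element tree."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []   # stack of elements that are still open

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "disabled") get their name as value.
        attributes = {k: (v if v is not None else k) for k, v in attrs}
        self.builder.start(tag, attributes)
        if tag in VOID_TAGS:
            self.builder.end(tag)       # close void elements immediately
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return                      # ignore stray end tags
        # Close any unclosed children first, then the tag itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.open_tags:              # drop text outside the root element
            self.builder.data(data)

    def to_element(self):
        while self.open_tags:           # close whatever is still open
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


# Made-up input: unclosed <br> and <p>, which is legal HTML but not XML.
SAMPLE_HTML = ('<html><body><p class="intro">5 &lt; 10<br>second line'
               '<p>unclosed paragraph</body></html>')

converter = HtmlToXml()
converter.feed(SAMPLE_HTML)
converter.close()
root = converter.to_element()
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Once the document is well-formed XML, path-style queries such as `root.find("body/p")` become possible, which is the usual reason for wanting the conversion.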
