"Web crawl - lightbox content / overlay"

mobmob Member Posts: 37 Contributor I
edited June 14 in Help
I'm trying to crawl a website that shows some of its content in a lightbox (the content is overlaid on the page I crawl rather than served as a separate webpage). Can the web crawl extensions in RapidMiner 5 handle this?

If I use the crawl operator, I can get the URLs that trigger the overlays in a web browser session, but the HTML source saved to a directory doesn't include the overlay. Crawling the URLs directly returns only the base webpage, even though each URL is different (no overlaid content is shown if you visit the URLs directly, which seems like a bad user experience to me).

If I use Selenium running in Firefox with a click-and-wait step, the overlay is displayed and the source code can be saved manually.

Can RapidMiner handle this use case, or has some clever coding by the web designers put it a step too far for RapidMiner?
If it's not possible with the native operators, what's the best way to integrate Selenium code into a RapidMiner process so that it passes back the HTML, overlay included?

Answers

  • mobmob Member Posts: 37 Contributor I
    In case it's useful to others: I used Python with the BeautifulSoup4 and Selenium packages to extract the text and process the URLs, then saved the results to text files so they could be processed by RapidMiner.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,155  RM Data Scientist
    That sounds cool.
    Can you share the processes/scripts?

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • mobmob Member Posts: 37 Contributor I
    Possibly, after I've submitted them for an assignment I'm doing.
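
The scripts themselves were not posted, so the following is only a rough sketch of the workflow described in the first answer, not the poster's actual code. It assumes Selenium with Firefox, and it assumes the overlay opens when an element on the base page is clicked; `trigger_css` and `overlay_css` are hypothetical CSS selectors you would need to find by inspecting the site. To keep the sketch free of extra dependencies, it uses the standard library's `html.parser` for text extraction in place of BeautifulSoup4. All function names are made up for illustration.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def url_to_filename(url):
    """Turn a URL into a file name by collapsing non-alphanumeric runs to '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".txt"


def fetch_overlay_html(driver, page_url, trigger_css, overlay_css, timeout=10):
    """Open the page, click the overlay trigger, wait, and return the DOM source.

    trigger_css and overlay_css are assumptions: site-specific selectors.
    """
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(page_url)
    wait = WebDriverWait(driver, timeout)
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, trigger_css))).click()
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, overlay_css)))
    return driver.page_source


def save_pages(page_urls, out_dir, trigger_css, overlay_css):
    """Write one text file per page into out_dir."""
    from selenium import webdriver

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    driver = webdriver.Firefox()
    try:
        for url in page_urls:
            html = fetch_overlay_html(driver, url, trigger_css, overlay_css)
            text = html_to_text(html)
            (out / url_to_filename(url)).write_text(text, encoding="utf-8")
    finally:
        driver.quit()
```

A call such as `save_pages(urls, "crawl_output", "a.lightbox-link", "#overlay")` (selectors invented for the example) would leave a directory of text files that a RapidMiner process can then read. Because Selenium returns the page source after the click, the saved text includes the overlay content that a plain crawl of the URL misses.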