"Web Crawl wikipedia"
Hi all,
I hope you can help me.
I have a Wikipedia page that I would like to crawl: https://en.wikipedia.org/wiki/2012_in_film
I want to extract the details that determined whether each film was a success or a bust, i.e. text mine the Wikipedia page of each film.
Based on this information, I would like to predict whether the movies due to be released in 2013 will be a success.
My problem is that I'm unable to crawl this top-level page and the pages it links to (i.e. each movie): the crawl returns no records or files.
I can crawl wikipedia.org without any issues.
Does anyone know what the problem is?
Thanks for your time.
Answers
As (almost) always, we can't help you if we don't know how you configured your operators. Please post your process XML as described in my signature.
Best regards,
Marius
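
For reference, the link-following step described in the question (start at the list page, then visit each film's page) can be sketched outside RapidMiner. This is only an illustration in Python using the standard library, not the configuration of the Crawl Web operator, and it does not diagnose why the process returns no records. The names `WikiLinkCollector` and `extract_article_links` are hypothetical, and the rule "keep same-host `/wiki/` links whose title contains no colon" is an assumption about which links are article pages.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class WikiLinkCollector(HTMLParser):
    """Collects article-style links (/wiki/Title) found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any "#section" fragment.
        absolute, _ = urldefrag(urljoin(self.base_url, href))
        parsed = urlparse(absolute)
        # Stay on the same host and skip the list page itself.
        if parsed.netloc != self.base_host or absolute == self.base_url:
            return
        if not parsed.path.startswith("/wiki/"):
            return
        title = parsed.path[len("/wiki/"):]
        # Assumption: titles with a colon (File:, Category:, ...) are not articles.
        if title and ":" not in title and absolute not in self.links:
            self.links.append(absolute)


def extract_article_links(html, base_url):
    """Return the unique article links found in `html`, in document order."""
    collector = WikiLinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The function takes HTML that has already been downloaded, so the fetching step (and any user-agent or rate-limit handling) is left out on purpose. Comparing the URLs it returns with the URL rules set in the crawler may help show whether the film pages match those rules at all.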