I have been a big fan of RapidMiner and I keep trying to explore more of this tool. Today, I wanted to scrape data from a review site. My plan has two steps:
1. Download the pages from the site
2. Crawl through each page to extract the data
I want my process to loop not only over multiple files but also within each file, over its multiple reviews. One file has approximately 8 reviews, and I want to loop through this file as well as 7 other files, so about 64 reviews in all (8 files x 8 reviews). I am using "Process Documents from Files" --> "Extract Information".
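To make the intended logic concrete outside RapidMiner, here is a minimal Python sketch of the nested loop: an outer loop over files and an inner loop over the review blocks in each file. It is only an illustration under stated assumptions, not how the RapidMiner operators work internally. The element and class names (div.review, span.author, p.text) are hypothetical placeholders, and the standard-library parser used here only accepts well-formed markup, so real pages may need a more tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_reviews(markup, source="<memory>"):
    """Return one dict per review block found in a single page."""
    root = ET.fromstring(markup)
    rows = []
    # Outer query: select every review block on the page.
    # The class name 'review' is a hypothetical placeholder.
    for block in root.findall(".//div[@class='review']"):
        # Inner queries are relative to the block, so each review
        # produces its own row rather than one row per page.
        rows.append({
            "file": source,
            "author": block.findtext(".//span[@class='author']", default="").strip(),
            "text": block.findtext(".//p[@class='text']", default="").strip(),
        })
    return rows


def extract_from_directory(directory):
    """Loop over every file in a directory and collect all reviews."""
    rows = []
    for path in sorted(Path(directory).glob("*")):
        if path.is_file():
            rows.extend(extract_reviews(path.read_text(encoding="utf-8"), path.name))
    return rows


# Tiny in-memory page with two reviews, standing in for a downloaded file.
SAMPLE = """<html><body>
<div class="review"><span class="author">Ann</span><p class="text">Good</p></div>
<div class="review"><span class="author">Bob</span><p class="text">Bad</p></div>
</body></html>"""

rows = extract_reviews(SAMPLE, "page1.html")
for row in rows:
    print(row)
```

The key point is the two-level query: one XPath selects the repeating review container, and the field-level XPaths are evaluated relative to each container. With 8 files of 8 reviews each, extract_from_directory would return about 64 rows.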
Settings for "Process Documents from Files":
File from a list of directories; file pattern: *; use file extension; add metadata information
Settings for "Extract Information":
Query type: XPath; attribute type: nominal; XPath queries as below; namespace: none; "Ignore CDATA" and "Assume HTML": checked
But when I use this in the tool, I am not able to configure it for some reason, and it fails. Can anyone please advise me here?
Here is my XPath in the "Extract Information" operator: