RapidMiner

Get Pages

Is there a way to restrict Get Pages (or some other operator) to retrieving only the first page from a website? I'm trying to compare the content of different pages from the same website. When Get Pages loads each of them, it drills down through all of the links on the page, which skews the word count.

Re: Get Pages

Hi,

Are you sure that you are using Get Pages and not Crawl Web? Get Pages loads only the exact link that you provide in the link attribute of the input data; it does not follow any links on that page.

Best regards,
Marius
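
To cross-check word counts outside RapidMiner, the behaviour described above (load exactly one URL, follow no links) can be sketched in Python. This is only an illustration using the standard library, not how the Get Pages operator is implemented; the names `TextExtractor`, `count_words` and `fetch_single_page` are made up for this example.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def count_words(html):
    """Count words in the text of one HTML document (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return len(re.findall(r"\w+", text))


def fetch_single_page(url):
    """Download exactly one URL; links inside the page are never followed."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `count_words(fetch_single_page(url))` once per URL gives a per-page count. The text of a link is counted, but the page it points to is never loaded, so a much larger count in RapidMiner would suggest that a crawler rather than a single-page loader is in use.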