"Crawl Web Link-Page pairs are incorrect"
When I use the Crawl Web operator with the "add pages as attribute" parameter checked, the result consists of Link-Page pairs (the number of examples depends on the "max pages" parameter). But if I open the URL stored in the Link attribute (Url 1) and look at its HTML content, I see that the real HTML content is different from the content of the Page attribute. How can this be?
If I don't store the page in Crawl Web but instead use a Get Pages operator, with its "link attribute" parameter set to the Link attribute (from Crawl Web) and its "page attribute" parameter set to Page, the Link and Page pairs are mismatched too (as in Crawl Web). In the output of Get Pages I can also see a URL attribute next to the Link and Page attributes (and some more attributes). The URL attribute contains the real URL (Url 2) that the Page attribute's value belongs to. So the HTML content at the URL attribute's value (Url 2) is the same as the Page attribute's content, but it is different from the content at the Link attribute's value (Url 1).
But I don't understand why the Link attribute and the URL attribute are different, or why the Page attribute's values don't belong to the Link attribute's values.
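To show exactly which rows are affected, the Get Pages output could be exported and checked outside RapidMiner. The following is only a rough Python sketch, not anything RapidMiner provides: it assumes the rows are available as dictionaries with "Link" and "URL" keys, the example URLs are made up, and `normalize` and `find_mismatches` are hypothetical helper names. It ignores harmless differences (letter case in the host, a trailing slash) and lists only the rows where the two attributes really point to different addresses.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Lowercase scheme and host, drop a trailing slash and any fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def find_mismatches(rows):
    """Return (row index, Link, URL) for rows whose Link and URL differ."""
    return [
        (i, row["Link"], row["URL"])
        for i, row in enumerate(rows)
        if normalize(row["Link"]) != normalize(row["URL"])
    ]


# Made-up rows standing in for an exported Get Pages result (assumption).
rows = [
    {"Link": "http://example.com/a", "URL": "http://example.com/a/"},
    {"Link": "http://example.com/b", "URL": "http://example.com/c"},
]

for index, link, url in find_mismatches(rows):
    print(f"row {index}: Link={link} but URL={url}")
```

With the sample rows, only the second row is reported, because the first pair differs only by a trailing slash. A list like this would make it easier to see whether the pairs are shifted by a fixed number of rows or differ in some other pattern.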