Here is my project; I hope someone can help me ;D ;D ;D

I need to go to numerous websites (like cabelas.com, gandermountain.com, etc.) and find pages with products similar to ours. To start, I would like to price out all shirts from brand "X" that these two companies sell. So I need to configure RapidMiner to go to specific URLs, like cabelas.com/shirts/X, that list all of their X products, and record the item name and the retail price. Then I would like to export that info into something readable, preferably Excel.
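
For illustration only, here is a rough Python sketch (outside RapidMiner) of what "record the item name and the retail price" could look like. The HTML snippet, the tag names, and the class names are all made up; the real pages will almost certainly be structured differently, so treat this as a picture of the idea rather than something that works on those sites as-is.

```python
from html.parser import HTMLParser

# Hypothetical markup: real retailer pages will use different tags and class names.
SAMPLE_HTML = """
<div class="product"><span class="name">X Flannel Shirt</span><span class="price">$39.99</span></div>
<div class="product"><span class="name">X Fishing Shirt</span><span class="price">$54.50</span></div>
"""


class ProductParser(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) tuples
        self._field = None   # which span we are currently inside, if any
        self._name = None    # last product name seen

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            # Strip the currency sign and thousands separators before converting.
            price = float(data.strip().lstrip("$").replace(",", ""))
            self.products.append((self._name, price))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Return a list of (name, price) tuples found in the given HTML text."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products


if __name__ == "__main__":
    for name, price in extract_products(SAMPLE_HTML):
        print(name, price)
```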

Step 2 would be to START with a list of our products and somehow, through keywords or some kind of configurable attribute, import it into RapidMiner, then have it scrape the same info. When it imports the results, I would like the competitors' products to be matched up with ours.
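
To make the matching idea in step 2 a bit more concrete, here is a small Python sketch (again outside RapidMiner) that pairs one of our product names with the closest competitor name by plain string similarity. The function names, the 0.6 threshold, and the example product names are assumptions chosen for illustration; fuzzy name matching like this will produce some wrong pairs, so a shared product id would be more reliable where one exists.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def best_match(our_name, their_names, threshold=0.6):
    """Return the most similar name from their_names, or None if nothing
    reaches the (arbitrarily chosen) similarity threshold."""
    best, best_score = None, 0.0
    for candidate in their_names:
        score = SequenceMatcher(
            None, normalize(our_name), normalize(candidate)
        ).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    # Made-up competitor listing.
    competitor = ["X Men's Flannel Shirt", "Y Rain Jacket"]
    print(best_match("X Flannel Shirt", competitor))
```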

So, as the final product, I would like a spreadsheet with the product, our price, then company 1's price, company 2's price, etc.
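
As a sketch of that layout, the following Python snippet (not RapidMiner) builds one row per product with our price followed by one column per competitor, and writes it as CSV, which Excel can open. The column names, company labels, and prices are invented for the example, and it assumes the competitor products have already been matched to our product names.

```python
import csv
import io


def build_comparison(our_prices, competitor_prices):
    """our_prices: {product: price}
    competitor_prices: {company: {product: price}}
    Returns a header row followed by one row per product."""
    companies = sorted(competitor_prices)
    rows = [["product", "our_price"] + companies]
    for product, price in our_prices.items():
        row = [product, price]
        for company in companies:
            # Leave the cell empty when a company does not carry the product.
            row.append(competitor_prices[company].get(product, ""))
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    ours = {"X Flannel Shirt": 39.99}
    theirs = {"cabelas": {"X Flannel Shirt": 42.0}, "gander": {}}
    print(to_csv(build_comparison(ours, theirs)))
```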

Does this make sense? If I have to go about it a different way to get the same result, that's fine, but I need a result that easily allows us to compare prices.

I have been through the video tutorials, and they have gotten me a little comfortable with it, but they were hard to follow because I had no clue what they were importing or what the data meant, so I didn't get the info I needed.

Thank you all, I look forward to chatting with you this summer about RapidMiner!!



    Hi Drew,

    As far as I understand it, this makes perfect sense. I assume the reason nobody has answered so far is that it is hard to tell you much more than this without getting deeply involved in setting up the necessary processes, which might take some time.

    May I ask what you have achieved so far? Do you already have a crawling process collecting the web pages from the sites you mentioned? Do you already have processes for extracting the necessary information from those pages? I assume you will need at least the product name, the description, and the price. The name and description might become useful if you have to calculate a similarity in step 2. Do the products have a unique product id or something similar, so that the matching will definitely be correct for at least a certain fraction of the products?

    Or have you even started on the second stage already, where you try to match your products with the others?

    It is probably easier to help you with very specific technical questions, otherwise this will become more of a consultancy - which I usually charge people for  ;D

