RapidMiner

Contributor I nourhan_taya

Merging textual data from different web pages

I am using text mining for financial market prediction. I have around 6 articles daily, in the form of hyperlinks in an Excel sheet. I want to merge each day's articles into one document automatically, and I do not know how to do this. Thanks in advance.
4 REPLIES
RM Certified Expert

Re: Merging textual data from different web pages

If those files are in different directories, you could use the Process Documents from Files operator. This way you can tag each directory with a label, so that when you build a model (e.g. Naive Bayes) you can see how well specific documents classify.

 

Since they are links in an XLS file, you could use the Get Pages operator in conjunction with a loop to extract each URL, get the page, and save it.
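
If it helps to see the loop outside of RapidMiner, here is a rough Python sketch of the same idea: read each URL, fetch the page, and keep the text next to its date. It is not the Get Pages operator itself. It assumes the sheet was exported to CSV with columns named `date` and `url`; those column names and the helper names `fetch_page` and `fetch_all` are made up for illustration.

```python
import csv
import io
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(csv_text, fetch=fetch_page):
    """Return a list of (date, url, page_text), one entry per CSV row.

    Assumes a header row with 'date' and 'url' columns (hypothetical names).
    The fetch argument can be swapped out, e.g. for offline testing.
    """
    pages = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["url"].strip()
        pages.append((row["date"].strip(), url, fetch(url)))
    return pages
```

Keeping the date alongside each page at this stage is what makes a later merge by day straightforward.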

Contributor I nourhan_taya

Re: Merging textual data from different web pages

Prof Thomas,
Yes, I already did that using Get Pages, but this process extracts each page as a separate document. Is there a way to merge these documents by their date, for example?
RM Certified Expert

Re: Merging textual data from different web pages

You could append the macro %{t} to them, which gives you a timestamp. You would then have to build a process that converts that timestamp to a date, say 2017-05-16, and aggregate the documents on that date.
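
To make the aggregation step concrete, here is a minimal Python sketch of the logic, not a RapidMiner process: truncate each timestamp to its calendar day, then join all documents that share a day. The function name `merge_by_day` and the timestamp format are assumptions for illustration; adjust the format string to whatever %{t} actually produces in your setup.

```python
from collections import defaultdict
from datetime import datetime


def merge_by_day(documents, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Group (timestamp, text) pairs by calendar day and join the texts.

    The default timestamp_format is an assumption, not the documented
    output format of the %{t} macro.
    """
    by_day = defaultdict(list)
    for timestamp, text in documents:
        # Reduce the full timestamp to a date key such as "2017-05-16".
        day = datetime.strptime(timestamp, timestamp_format).date().isoformat()
        by_day[day].append(text)
    # One merged document per day, in chronological order.
    return {day: "\n\n".join(texts) for day, texts in sorted(by_day.items())}


docs = [
    ("2017-05-16 09:00:00", "First article"),
    ("2017-05-16 17:30:00", "Second article"),
    ("2017-05-17 08:15:00", "Third article"),
]
merged = merge_by_day(docs)
```

Here `merged` has two entries, one for 2017-05-16 containing both of that day's articles and one for 2017-05-17.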

Contributor I nourhan_taya

Re: Merging textual data from different web pages

Many thanks, Prof. Thomas. I really appreciate your help and I will try it.