Merging textual data from different web pages

nourhan_taya Member Posts: 11 Contributor I
edited November 2018 in Help
I am using text mining for financial market prediction. I have around 6 articles per day, in the form of hyperlinks in an Excel sheet. I want to merge each day's articles into one document automatically, and I do not know how to do this. Thanks in advance.

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    If those files are in different directories, you could use the Process Documents from Files operator. This way you can tag each directory with a label, so that when you build a model (e.g. Naive Bayes) you can see how well specific documents classify.

     

    Since the links are in an XLS file, you could use the Get Pages operator in conjunction with a loop to extract each URL, get the page, and save it.

  • nourhan_taya Member Posts: 11 Contributor I
    Prof. Thomas,
    Yes, I already did that using Get Pages, but this process extracts each page as a separate document. Is there a way to merge these documents by their date, for example?
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You could append them with the macro %{t}, which will give you a timestamp. You'd then have to build a process that converts that timestamp to a date such as 2017-05-16, which you can then aggregate the documents on.

  • nourhan_taya Member Posts: 11 Contributor I
    Many thanks, Prof. Thomas. I really appreciate your help and I will try it.
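
For readers who want to see the "convert the timestamp to a date, then aggregate" idea outside RapidMiner, here is a minimal Python sketch. It is not the process described in the thread, only an illustration of the same grouping logic. The function name merge_by_day, the (timestamp, text) record layout, and the timestamp format "%Y-%m-%d %H:%M:%S" are all assumptions made for the example; adjust them to whatever your extracted pages actually carry.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout; change this to match your own data.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def merge_by_day(records, separator="\n\n"):
    """Merge article texts that share the same calendar day.

    `records` is an iterable of (timestamp_string, article_text) pairs.
    Returns a dict mapping "YYYY-MM-DD" to the concatenated text of
    every article stamped on that day, with days in ascending order.
    """
    by_day = defaultdict(list)
    for stamp, text in records:
        # Reduce the full timestamp to a date such as 2017-05-16.
        day = datetime.strptime(stamp, STAMP_FORMAT).date().isoformat()
        by_day[day].append(text.strip())
    # Join each day's articles into a single document.
    return {day: separator.join(texts) for day, texts in sorted(by_day.items())}


if __name__ == "__main__":
    sample = [
        ("2017-05-16 09:00:00", "First article of the day."),
        ("2017-05-16 15:30:00", "Second article of the day."),
        ("2017-05-17 08:00:00", "Article from the next day."),
    ]
    for day, document in merge_by_day(sample).items():
        print(day)
        print(document)
        print("-" * 20)
```

With around 6 articles per day, each key of the returned dict would hold one merged document, which could then be written to a file named after the date.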