RapidMiner

Contributor I nourhan_taya

How to know the needed time for each operator to run in rapidminer

Hi, 

I am applying text mining to financial market prediction, and I need to extract 1,600 articles from their links. When I use the "Get Pages" operator, the process has been running for 18 hours without producing results, and I do not know when it will finish. So I would like to ask whether RapidMiner is running normally or not.

(Note: I am using RapidMiner 7.5. My PC runs Windows 10 and has a 7th-generation Core i7 processor and 16 GB of RAM. The memory is 300 GB.)

6 REPLIES
Community Manager

Re: How to know the needed time for each operator to run in rapidminer

Hello @nourhan_taya, process time varies a lot depending on many factors, including your machine and the size and scope of the documents. One thing I can definitely tell you is that RapidMiner loves RAM and multi-core processors. FWIW, I just upgraded to 64 GB of RAM with my 6-core Intel Xeon E5 to keep things humming along.

 

If I were you, I'd use the Sample operator and grab a small sample of your documents first. Benchmark the sample and then gently increase its size so you can get a sense of whether the full number of docs is going to take 2 days or 2 years. :)
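Outside RapidMiner, the same benchmarking idea can be sketched in a few lines of Python. This is only an illustration, not something from the thread: `estimate_total_seconds` and its parameters are hypothetical names, `fetch` stands for whatever function downloads one page, and the estimate assumes the time per page stays roughly constant, which may not hold if the site starts throttling.

```python
import random
import time


def estimate_total_seconds(urls, fetch, sample_size=20, seed=0):
    """Time `fetch` on a random sample of `urls` and extrapolate linearly.

    `fetch` is any callable that downloads one URL (an assumption here).
    """
    if not urls:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = rng.sample(urls, min(sample_size, len(urls)))

    start = time.perf_counter()
    for url in sample:
        fetch(url)
    elapsed = time.perf_counter() - start

    seconds_per_page = elapsed / len(sample)
    # Linear extrapolation: assumes every page costs about the same.
    return seconds_per_page * len(urls)
```

Running it with a sample of 20, then 50, then 100 links shows whether the per-page time stays flat or grows as more requests are made.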

 

[copying from this thread]

 

Scott

Scott Genzer
Senior Community Manager
RapidMiner, Inc.
Contributor I nourhan_taya

Re: How to know the needed time for each operator to run in rapidminer

Many thanks, Prof. sgenzer, for the reply. I will try this solution 😀
RM Certified Expert
Solution

Re: How to know the needed time for each operator to run in rapidminer

To add to this, if you are calling many URLs on the same site, the site may slow down its responses when you make too many calls in a certain period of time.
Solutions for this include:

  • Limiting the number of calls to batches
  • Increasing the number of individual IPs making the call
  • Adding a delay between each call

However, without knowing the site I don't know how easy or difficult this might be.  
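As a rough illustration of the first and third options (batches and a delay between calls), here is a minimal Python sketch. It is not how the Get Pages operator works internally. The function names, batch size, and pause lengths are all assumptions for the example, and suitable values depend on the site.

```python
import time
from urllib.request import urlopen


def fetch_page(url, timeout_s=30):
    """Download one page; the timeout stops a stalled request from hanging forever."""
    with urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def fetch_in_batches(urls, fetch=fetch_page, batch_size=50,
                     delay_s=1.0, batch_pause_s=30.0, sleep=time.sleep):
    """Fetch `urls` in batches, pausing between calls and between batches.

    All default values are illustrative assumptions, not recommendations.
    """
    results = {}
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        for i, url in enumerate(batch):
            results[url] = fetch(url)
            if i < len(batch) - 1:
                sleep(delay_s)       # short delay between individual calls
        if start + batch_size < len(urls):
            sleep(batch_pause_s)     # longer pause before the next batch
    return results
```

Passing `sleep` and `fetch` as parameters keeps the pacing logic easy to try out with stand-in functions before pointing it at a real site.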

-- Training, Consulting, Sales in China, Hong Kong & Taiwan --
www.RapidMinerChina.com
Contributor I nourhan_taya

Re: How to know the needed time for each operator to run in rapidminer

Hi Mr JEdward,

Many thanks for your reply. I am retrieving the links from the Daily Mail archives. I have already used the third solution and maximized the delay time to 1000. I will try the other solutions, but I didn't understand the second one. Does it mean increasing the number of computers doing the process?

Thanks for the help

RM Certified Expert

Re: How to know the needed time for each operator to run in rapidminer

Yes, that's correct: increasing the number of computers used to do the process. However, sometimes these computers aren't using different IPs, because all their outside connections go through a single pipe (common in small companies), so the site still sees a single address.
To manage this, you can allocate the links among the computers (crawlers) and have each one download its share individually.
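Allocating the links can be as simple as dealing them out round-robin, one share per crawler. A small Python sketch, where `split_links` is a hypothetical helper and each share would be saved to its own file or example set for one machine:

```python
def split_links(urls, n_crawlers):
    """Deal `urls` out round-robin into `n_crawlers` roughly equal shares."""
    if n_crawlers < 1:
        raise ValueError("n_crawlers must be at least 1")
    # Share i gets items i, i + n_crawlers, i + 2 * n_crawlers, ...
    return [urls[i::n_crawlers] for i in range(n_crawlers)]
```

With 1,600 links and 4 crawlers, each share holds 400 links, and no link appears in more than one share.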

 

There are also paid web-crawling services that will scale out to get around any restrictions the web host might have.

Contributor I nourhan_taya

Re: How to know the needed time for each operator to run in rapidminer

Many thanks, Mr. JEdward. I really appreciate your help :)
