Querying TERADATA takes too long

RapidMinerUser12 Member Posts: 11 Learner I
Hi all,

I have a very big TERADATA database with more than 35 000 000 rows.
When I query 1 000 000 rows, the Read Database operator finishes in 12 s. When I tried to select all rows, the process ran for more than 40 minutes and I had to stop it.

My question is: is this waiting time normal? If not, how can I shorten it and import all of the data from TERADATA into RapidMiner?
I want to do the ETL in RapidMiner.

Thank you in advance.

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    This is probably a memory issue. RapidMiner works by reading complete data sets into the main memory of your computer. If you overwhelm the available memory with data (and it sounds like you are doing that), everything gets slow, e.g. because of swapping.

    It's better to process the 35 million rows in batches, like you did with the 1 million rows. For example, you could use one of the Loop operators.

    With this kind of big data, always try to do as much as possible inside the database. It is better at filtering, joining and sorting than a separate in-memory process can ever be. 

    You don't even have to learn SQL for this if you use the In-Database Processing extension.

    Regards,

    Balázs
  • RapidMinerUser12 Member Posts: 11 Learner I
    Hi, 

    Thanks for your swift answer.

    We have 256 GB of memory on our machines. The In-Database Processing extension doesn't work with Teradata.
    Our requirement is that we do all of the ETL in RapidMiner, not with queries.

    Can you explain further how we can process the data in batches? We have to have some pointer-like indicator that tells the database where to start the next batch of data.

    Thank you in advance.
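
One common way to get the "pointer-like indicator" asked about above is keyset pagination: each batch is ordered by a unique key, and the last key of one batch becomes the starting point of the next. The following is only a minimal sketch of that idea, not a RapidMiner process and not taken from this thread. It uses Python's built-in sqlite3 as a stand-in database, and the table `big_table`, the columns `id` and `payload`, and the helpers `iter_batches` and `demo` are all hypothetical. It also assumes the table has a unique, increasing integer key. Against Teradata the row-limiting syntax would differ (Teradata typically uses `TOP n` rather than `LIMIT`).

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(conn: sqlite3.Connection,
                 batch_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows of the hypothetical big_table in batches, ordered by id."""
    last_id = 0  # the "pointer"; assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM big_table "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break  # no rows left after the pointer
        yield rows
        last_id = rows[-1][0]  # move the pointer to the last key seen


def demo() -> List[int]:
    """Build a small in-memory table and return the size of each batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO big_table VALUES (?, ?)",
        [(i, f"row-{i}") for i in range(1, 26)],
    )
    sizes = [len(batch) for batch in iter_batches(conn, 10)]
    conn.close()
    return sizes


if __name__ == "__main__":
    print(demo())
```

In a RapidMiner process, the same pattern would presumably map to a Loop operator around Read Database, with the last key kept in a macro and substituted into the query on each iteration. Filtering on the key avoids rescanning rows that were already read, which an offset-based approach would do.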