Performance Issues

agurruchagaagurruchaga Member Posts: 11 Contributor I
edited December 2018 in Help

Good morning,

 

I am having serious performance issues when it comes to handle large databases. I configured an oracle connection and when I try to filter a table (about 15 Million rows) by a simple attribute, I get "Retrieve [tablename]..." message and it just does nothing. When I access the table with a standard sql editor, I get the results almost instantly.  I don't know if this common when handling this large volumes of data or it should just work ok.

 

Thanks in advance.

Tagged:

Best Answers

  • Edin_KlapicEdin_Klapic Posts: 259  RM Data Scientist
    Solution Accepted

    Hi @agurruchaga,

     

    From your problem description where it states that the process outputs "Retrieve [tablename]" I assume you are using the Retrieve Operator and apply the Filter Examples Operator afterwards. The Retrieve Operator always fetches the full dataset and the filters are applied afterwards.

    Thus, I suggest to use the Read Database Operator and write the SQL query with your desired filter option in there. This way the filtering is executed directly in the database and only the relevant Examples are loaded in RapidMiner.

     

    Best,

    Edin

  • BalazsBaranyBalazsBarany Posts: 272   Unicorn
    Solution Accepted

    Hi!

     

    No need to create a temporary table, you can use SQL IN. 

    You'd aggregate the IDs from your Excel table using the Aggregate operator and the concatenation function. Then you would use Replace to change the separator to a comma. 

    For example, you get a value like "13,25,45,133". You then use Extract Macro to convert this to a macro named "filter" that you can use in the SQL query.

     

    The SQL query would look like this:

    SELECT ...

    FROM bigtable

    WHERE key IN (%{filter})

     

    Regards,

     

    Balázs

Answers

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,172   Unicorn

    I am interested in any of the developers' comments on this issue.  Ever since the "Stream Database" operator was deprecated, I have also had problems with large datasets.  Are there any suggested remedies or plans to bring back "Stream Database"?

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
    agurruchaga
  • agurruchagaagurruchaga Member Posts: 11 Contributor I

    Hello Edin,

     

    This worked like a charm! Thank you!

     

    Greetings.

    sgenzerEdin_Klapic
  • agurruchagaagurruchaga Member Posts: 11 Contributor I

    Hello again,

     

    I was thinking about a similar but not same situation : Imagine I import a 100 row Excel sheet to Rapidminer that I wan't to join with a Table(15.000.000 rows) in my DB. If i wan't to perform a Join, the first step is to retrieve the database, so I would have the same problem I stated before, but can't use sql in this case since the 100 row table is not part of the DB. Is there a similar operator that allows to join without having to retrieve the entire DB?

     

    Thanks! 

  • SGolbertSGolbert RapidMiner Certified Analyst, Member Posts: 323   Unicorn

    Hi,

     

    the solution to the second problem is actually very similar to the first one. You can use Read Excel and Write Database to import your excel sheet into the DB. Then you can simply join with a sql command.

     

    Best,

    Sebastian

    sgenzeragurruchaga
  • agurruchagaagurruchaga Member Posts: 11 Contributor I

    Hi Sebastian,

     

    I thought about that solution but the problem is that i do not have privileges to write in that database.

     

    I will have to think about another solution.

     

    Thanks!

  • SGolbertSGolbert RapidMiner Certified Analyst, Member Posts: 323   Unicorn

    You might be able to create a temporary table.

     

    http://devzone.advantagedatabase.com/dz/webhelp/advantage9.0/adssql/using_temporary_tables_in_sql_statements.htm

     

    To be honest, I'm not sure how to implement this solution (maybe @BalazsBarany has a -better- idea). My guess is that you could convert the excel table to a CREATE TABLE statement, then use the Execute SQL operator to create the temporary table, make the join and finally retrieve the data. 

     

    If you make it work, I'll be very interested in seeing the solution :D

  • agurruchagaagurruchaga Member Posts: 11 Contributor I

    Sorry for the late reply, I think this solution is super! Fast and easy to use, thank you very much guys!

     

    Edit : It seems my oracle db has a 1.000 elements limitation in a list, so this might not be a solution in some cases.

     

    Greetings

    sgenzer
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 272   Unicorn

    Hi,

     

    you can use Loop Batches or something similar in that case ;-) Then you would make as many queries as necessary and union or append the results.

     

    Regards,

    Balázs

    sgenzer
Sign In or Register to comment.