The Altair Community is migrating to a new platform to provide a better experience for you. The RapidMiner Community will merge with the Altair Community at the same time. In preparation for the migration, both communities are on read-only mode from July 15th - July 24th, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here.
Options

Unable to read large table from oracle database - is there a table size limit?

syedghouri68syedghouri68 Member Posts: 6 Contributor I
edited December 2018 in Help

Hello

I was able to connect to oracle database and view list of tables in rapidminer studio. However when I am trying to view the content of one table which has around 5M rows (by double clicking on table name) nothing happens. I have also tried using Read Database operator and it takes ages, nothing happens. Even after two hours the Read Database operator is still running.

 

I also have small tables in oracle database which I was able to read and view one without any trouble.

 

I even have that same table with 5M rows in a csv format stored locally and rapidminer studio was able to read it successfully, although it took around 20 seconds to open.

For those who guessed it right - yes I have studio license.

 

Can someone please tell me is this a bug or is there any better way to read from oracle?

 

Kind regards,

Syed

 

Tagged:

Answers

  • Options
    JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    What column formats are the rows of this table in when stored in Oracle?  
    If you are selecting all columns from the table & it has some BLOB fields, then it's going to take a while.  Try running it as a query with 
    SELECT <id column of table> FROM tablename; 

    and seeing how quickly it brings back 1 row.  

     

    Next thing to look at are the Indexes on that Oracle table, are they as optimized as they could be?  
    Also check the packet size on the database is set to the maximum and that the driver matches that packet size. That should reduce the number of total packets sent between client & server when fetching larger result sets.  

    Other options to try are:

    • Stream Database operator (not used in a while so would need someone else to jump in for configuration tips)
    • Batching your data with SQL LIMIT statements & macros.  (this would also help to highlight if a certain record has an issue causing a lockup.  It isn't going to be as efficient as the above, but it is going to allow you to check regularly to see the progress).  

     

  • Options
    KostasBonikosKostasBonikos Member Posts: 25 Maven

    Hi Syed,

     

    A colleague and I tried to replicate the issue with an instance of Oracle DB and RapidMiner Studio and the load took under a minute to read 5m rows and 10 attributes (columns), however, it took up more than 25GB of RAM; I doubt that is the issue if you can load the CSV.

    - Is there any restriction or timeout on your network or VPN? We hear of data access issues a lot from people behind VPN.

    - How strict is your corporate firewall? is it a problem accessing the DB from other tools like DB front ends and extracting the same table?

    - Are there any restrictions in the amount of data you can query out of the Database?

    - How many columns does your table have?

     

    I look forward to hearing from you.

     

    Kind regards,

    Kostas

Sign In or Register to comment.