RapidMiner 9.7 is Now Available
Lots of amazing new improvements including true version control! Learn more about what's new here.
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
Update Database is unexpectedly slow
It was much slower than it should have been: when testing an update, it finished in milliseconds, but in RapidMiner it took seconds instead.
Having already seen this kind of unexpected slowness with other software, I checked the access statistics in the database. The table showed many (and growing) full table scans, but a constant number of index scans. So the index of the primary key field was not being used.
Creating a second index on the id field with the double precision (floating point number) type helped, the throughput of Update Database improved thousand fold, the index was being used according to the table statistics, and the number of full table scans stayed constant. This is of course not a nice workaround, but for me it helped.
Update Database uses "prepared statements". In these, all columns get an assigned data type. Depending on the compatibility between types in the database, an index can or can't be used if the data type doesn't match. Here, the integer index was not used with the floating point data type.
Hopefully this will be fixed in one of the next releases. Until then, you can work around the problem by creating a second index with the appropriate data type in your database to speed up Update Database.
This is on PostgreSQL 10, your database might work differently.