ALL FEATURE REQUESTS HERE ARE MONITORED BY OUR PRODUCT TEAM.
VOTING MATTERS!
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
VOTING MATTERS!
IDEAS WITH HIGH NUMBERS OF VOTES (USUALLY ≥ 10) ARE PRIORITIZED IN OUR ROADMAP.
NOTE: IF YOU WISH TO SUGGEST A NEW FEATURE, PLEASE POST A NEW QUESTION AND TAG AS "FEATURE REQUEST". THANK YOU.
The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Update Database is unexpectedly slow
BalazsBarany
Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
I had a situation today where a big table (nicely set up with indexes) has to be updated with Update Database, with the integer ID field as the key (attribute filter type: single, attribute: id). The ID is displayed by RapidMiner as an integer, too.
It was much slower than it should have been: when testing an update, it finished in milliseconds, but in RapidMiner it took seconds instead.
Having already seen this kind of unexpected slowness with other software, I checked the access statistics in the database. The table showed many (and growing) full table scans, but a constant number of index scans. So the index of the primary key field was not being used.
Creating a second index on the id field with the double precision (floating point number) type helped, the throughput of Update Database improved thousand fold, the index was being used according to the table statistics, and the number of full table scans stayed constant. This is of course not a nice workaround, but for me it helped.
Update Database uses "prepared statements". In these, all columns get an assigned data type. Depending on the compatibility between types in the database, an index can or can't be used if the data type doesn't match. Here, the integer index was not used with the floating point data type.
Hopefully this will be fixed in one of the next releases. Until then, you can work around the problem by creating a second index with the appropriate data type in your database to speed up Update Database.
This is on PostgreSQL 10, your database might work differently.
It was much slower than it should have been: when testing an update, it finished in milliseconds, but in RapidMiner it took seconds instead.
Having already seen this kind of unexpected slowness with other software, I checked the access statistics in the database. The table showed many (and growing) full table scans, but a constant number of index scans. So the index of the primary key field was not being used.
Creating a second index on the id field with the double precision (floating point number) type helped, the throughput of Update Database improved thousand fold, the index was being used according to the table statistics, and the number of full table scans stayed constant. This is of course not a nice workaround, but for me it helped.
Update Database uses "prepared statements". In these, all columns get an assigned data type. Depending on the compatibility between types in the database, an index can or can't be used if the data type doesn't match. Here, the integer index was not used with the floating point data type.
Hopefully this will be fixed in one of the next releases. Until then, you can work around the problem by creating a second index with the appropriate data type in your database to speed up Update Database.
This is on PostgreSQL 10, your database might work differently.
Tagged:
1