Performance Issues
Good morning,
I am having serious performance issues when handling large databases. I configured an Oracle connection, and when I try to filter a table (about 15 million rows) by a simple attribute, I get the "Retrieve [tablename]..." message and it just does nothing. When I access the table with a standard SQL editor, I get the results almost instantly. I don't know if this is common when handling such large volumes of data, or if it should just work.
Thanks in advance.
Best Answers
Edin_Klapic Employee-RapidMiner, RMResearcher, Member Posts: 299 RM Data Scientist
Hi @agurruchaga,
From your problem description, which states that the process outputs "Retrieve [tablename]", I assume you are using the Retrieve operator and apply the Filter Examples operator afterwards. The Retrieve operator always fetches the full dataset; the filters are only applied afterwards.
Thus, I suggest using the Read Database operator and writing the SQL query with your desired filter condition in it. This way the filtering is executed directly in the database, and only the relevant Examples are loaded into RapidMiner.
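To illustrate the difference, here is a minimal sketch using Python's `sqlite3` as a stand-in for the Oracle connection; the table and column names are illustrative, not from the original process:

```python
import sqlite3

# Stand-in for the large Oracle table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO bigtable VALUES (?, ?)",
                 [(i, "open" if i % 2 else "closed") for i in range(10_000)])

# What Retrieve + Filter Examples effectively does: fetch everything,
# then filter on the client side.
all_rows = conn.execute("SELECT id, status FROM bigtable").fetchall()
filtered_in_client = [r for r in all_rows if r[1] == "open"]

# What Read Database with a WHERE clause does: the database applies
# the filter, so only matching rows are transferred.
filtered_in_db = conn.execute(
    "SELECT id, status FROM bigtable WHERE status = ?", ("open",)
).fetchall()

assert filtered_in_client == filtered_in_db
```

Both paths return the same rows, but the second one never moves the non-matching millions of rows out of the database.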
Best,
Edin
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
No need to create a temporary table; you can use SQL's IN clause.
You'd aggregate the IDs from your Excel table using the Aggregate operator and the concatenation function. Then you would use Replace to change the separator to a comma.
For example, you get a value like "13,25,45,133". You then use Extract Macro to convert this to a macro named "filter" that you can use in the SQL query.
The SQL query would look like this:
SELECT ...
FROM bigtable
WHERE key IN (%{filter})
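The same Aggregate-then-substitute idea can be sketched in Python with `sqlite3` standing in for Oracle; the table name, key values, and the `filter_macro` variable are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (key INTEGER, val TEXT)")
conn.executemany("INSERT INTO bigtable VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(200)])

# Key values from the small Excel table (illustrative).
ids = [13, 25, 45, 133]

# Equivalent of Aggregate (concatenation) + Replace: one
# comma-separated string, like the %{filter} macro.
filter_macro = ",".join(str(i) for i in ids)   # "13,25,45,133"

# The macro text is substituted into the query, as %{filter} would be.
query = f"SELECT key, val FROM bigtable WHERE key IN ({filter_macro})"
rows = conn.execute(query).fetchall()
```

Only the rows whose key appears in the list come back, so the 15-million-row table is never loaded into the client.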
Regards,
Balázs
Answers
I am interested in any of the developers' comments on this issue. Ever since the "Stream Database" operator was deprecated, I have also had problems with large datasets. Are there any suggested remedies or plans to bring back "Stream Database"?
Hello Edin,
This worked like a charm! Thank you!
Greetings.
Hello again,
I was thinking about a similar but not identical situation: imagine I import a 100-row Excel sheet into RapidMiner that I want to join with a table (15,000,000 rows) in my DB. If I want to perform a join, the first step is to retrieve the database table, so I would have the same problem I stated before, but I can't use SQL in this case since the 100-row table is not part of the DB. Is there a similar operator that allows joining without having to retrieve the entire DB?
Thanks!
Hi,
the solution to the second problem is actually very similar to the first one. You can use Read Excel and Write Database to import your Excel sheet into the DB. Then you can simply join with an SQL command.
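A minimal sketch of that idea, again with `sqlite3` standing in for Oracle and illustrative table names and values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the existing 15M-row table (names illustrative).
conn.execute("CREATE TABLE bigtable (key INTEGER, val TEXT)")
conn.executemany("INSERT INTO bigtable VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1000)])

# Stand-in for the 100-row Excel sheet, written into the same DB
# (the effect of Read Excel + Write Database).
excel_rows = [(13, "a"), (25, "b"), (45, "c")]
conn.execute("CREATE TABLE excel_keys (key INTEGER, label TEXT)")
conn.executemany("INSERT INTO excel_keys VALUES (?, ?)", excel_rows)

# The join now runs entirely inside the database; only the
# matching rows are returned to the client.
joined = conn.execute(
    "SELECT b.key, b.val, e.label "
    "FROM bigtable b JOIN excel_keys e ON b.key = e.key"
).fetchall()
```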
Best,
Sebastian
Hi Sebastian,
I thought about that solution, but the problem is that I do not have privileges to write to that database.
I will have to think about another solution.
Thanks!
You might be able to create a temporary table.
http://devzone.advantagedatabase.com/dz/webhelp/advantage9.0/adssql/using_temporary_tables_in_sql_statements.htm
To be honest, I'm not sure how to implement this solution (maybe @BalazsBarany has a better idea). My guess is that you could convert the Excel table to a CREATE TABLE statement, use the Execute SQL operator to create the temporary table, perform the join, and finally retrieve the data.
If you make it work, I'll be very interested in seeing the solution.
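That guess can be sketched as follows, with `sqlite3` standing in for Oracle; note that temporary tables typically require no write privilege on the main schema and disappear when the session ends, though the exact rules vary by database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the large read-only table (names illustrative).
conn.execute("CREATE TABLE bigtable (key INTEGER, val TEXT)")
conn.executemany("INSERT INTO bigtable VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1000)])

# Session-scoped temporary table holding the Excel keys; this is
# what the Execute SQL operator would create.
conn.execute("CREATE TEMPORARY TABLE excel_keys (key INTEGER)")
conn.executemany("INSERT INTO excel_keys VALUES (?)",
                 [(13,), (25,), (45,)])

# Join against the temporary table inside the database, then
# retrieve only the result.
joined = conn.execute(
    "SELECT b.key, b.val FROM bigtable b "
    "JOIN excel_keys e ON b.key = e.key"
).fetchall()
```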
Sorry for the late reply. This solution is super: fast and easy to use. Thank you very much, guys!
Edit: It seems my Oracle DB has a 1,000-element limit on IN lists, so this might not be a solution in some cases.
Greetings
Hi,
you can use Loop Batches or something similar in that case ;-) Then you would make as many queries as necessary and union or append the results.
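The batching idea can be sketched in Python (with `sqlite3` standing in for Oracle); the batch size of 1,000 matches Oracle's IN-list limit, and the key values are interpolated as literals here purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigtable (key INTEGER)")
conn.executemany("INSERT INTO bigtable VALUES (?)",
                 [(i,) for i in range(5000)])

ids = list(range(0, 5000, 2))   # 2500 keys: more than one IN list allows
BATCH = 1000                    # Oracle allows at most 1000 items per IN list

# Equivalent of Loop Batches + Append: query each chunk separately,
# then collect (union) the partial results.
results = []
for start in range(0, len(ids), BATCH):
    chunk = ids[start:start + BATCH]
    in_list = ",".join(str(k) for k in chunk)
    results.extend(conn.execute(
        f"SELECT key FROM bigtable WHERE key IN ({in_list})"
    ).fetchall())
```

Each query stays under the limit, and appending the partial results reproduces the single large IN query.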
Regards,
Balázs