Stream Database operator: metadata?
camielcoenen
Member Posts: 4 Contributor I
Hi,
I am working with a large dataset (approx. 250,000 rows and 300+ columns) which is loaded in a MySQL database table, and I would like to use the Stream Database operator to use this dataset in a process. However, unlike the Read Database operator, the Stream Database operator doesn't output the metadata information, which makes it impossible to use other operators like Select Attributes in the steps following Stream Database. I am using RapidMiner 5.1.
Answers
I think none of the Import Data operators can provide the metadata information directly, because RapidMiner can only read the metadata once the process is actually running.
The easiest way is to save the dataset to the repository with the Store operator. Then you have fast access to the dataset via the Retrieve operator, and the metadata information is always available.
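Roughly, such a one-off preparation process could look like the sketch below in RapidMiner 5 process XML (what you would see in the XML view after wiring Read Database into Store). The connection name, query, and repository path are placeholders, and the exact attribute values may differ on your installation; treat this as illustrative only:

```xml
<!-- Illustrative sketch: read the MySQL table once, then store it in the repository -->
<process version="5.1.000">
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_database" expanded="true" name="Read Database">
        <parameter key="connection" value="MyMySQLConnection"/>
        <parameter key="query" value="SELECT * FROM my_large_table"/>
      </operator>
      <operator activated="true" class="store" expanded="true" name="Store">
        <parameter key="repository_entry" value="//Local Repository/data/my_large_table"/>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Store" to_port="input"/>
    </process>
  </operator>
</process>
```

After running this once, every later process can start with a Retrieve operator pointing at the same repository entry, and the metadata is available at design time.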
Greetings
Matthias
Greetings,
Camiel
Let me put it this way: do you use the Community Edition?
Greetings,
Sebastian
Thanks,
Camiel
Currently not, but as a Community Edition user you simply have to wait until someone has idle time to fix it. As an Enterprise customer your wishes would carry a "little" bit more weight with us. Not to mention that we could hire more people to help us with the coding if you became an Enterprise customer.
Anyway, I think that handling large amounts of data will become an Enterprise feature sooner or later, so I wouldn't bet that the improvements to Stream Database will make it into the Community Edition.
Greetings,
Sebastian
Is it a JDBC connection issue that needs to be fixed? The Read Database operator, on the other hand, works fine.
Nevertheless, I would like to know how to handle a large dataset in the RapidMiner Community Edition. What kind of operators can be used to make the dataset more manageable? Are there tutorials or samples on how to do this?
Greetings,
Camiel
Aggregate it before loading it. Split the dataset before loading it. Try to cluster beforehand using samples where possible, apply in batches...
Well, everything depends on your problem, but the basic idea is to use only samples or batches where possible, or to compress the data even before loading it.
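Outside of RapidMiner, the same batch idea can be sketched in a few lines of Python. This is a minimal illustration, not RapidMiner code: it uses the standard-library sqlite3 module as a stand-in for your MySQL table (with MySQL you would use a MySQL driver, but the fetchmany() batching pattern is the same), and the table name, column, and batch size are made up:

```python
# Sketch: compute a statistic over a large table in fixed-size batches,
# so the whole table never has to fit in memory at once.
# sqlite3 is used here as a stand-in for MySQL; the fetchmany() pattern
# works the same way with any Python DB-API driver.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE big (id INTEGER, value REAL)")
cur.executemany(
    "INSERT INTO big VALUES (?, ?)",
    [(i, float(i % 10)) for i in range(1000)],  # toy data: 1000 rows
)
conn.commit()

BATCH_SIZE = 100  # stream this many rows per round trip
total, count = 0.0, 0
cur.execute("SELECT value FROM big")
while True:
    rows = cur.fetchmany(BATCH_SIZE)  # one batch at a time
    if not rows:
        break
    total += sum(r[0] for r in rows)
    count += len(rows)

print(count, total / count)  # 1000 rows, mean 4.5
```

The same shape works for sampling (process only every n-th batch) or for applying a model batch by batch instead of on the full table.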
Greetings,
Sebastian