Integrating the client API of the non-SQL database MongoDB

turicum · January 2010

Hi everybody,

I'd like to use RapidMiner (RM) with a non-SQL database, namely MongoDB (www.mongodb.org). A Java client API for the database is already available. Would it be difficult to integrate it into RM?

Thanks!
Alex

land · January 2010

Hi Alex,
I think it depends.

Without looking deeper into the matter, it would probably take us no more than one or two days to integrate it. But if you are not familiar with how RapidMiner's operators are constructed, especially the InputReader, it might take you some time. Unfortunately, there is no dedicated tutorial for implementing new input operators, and I haven't managed to finish the "Extending RapidMiner" tutorial yet either...

Greetings,
Sebastian

turicum · January 2010

Hi Sebastian,

thank you for your reply!

Is there any .java file I can look at as an example and/or extend to integrate MongoDB's API?

Thanks!
Alex

land · January 2010

Hi Alex,
the best approach would be to build an extension for it. That way you can easily add these operators to various RapidMiner 5 installations. Take a look at the source code of any of our extensions to get an impression of how this works and what is needed.

Your class should extend AbstractExampleSource, which has only two methods you need to implement: createExampleSet and getGeneratedMetaData. The latter is optional, but without it none of the meta data transformations available in RapidMiner 5 will work.
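Because MongoDB documents have no fixed schema, both methods would involve the same data-mapping step: deciding which attributes exist (meta data) and turning each document into a row (the example set). The following is a minimal sketch of that step only, assuming documents arrive as plain Java maps (for example after converting the driver's objects). SchemaSketch, inferSchema, toRows and sampleDocuments are hypothetical names, and RapidMiner's own ExampleSet/MetaData classes are deliberately not used because their signatures are not shown in this thread; "numerical"/"nominal" are plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: derive a flat attribute schema from schemaless documents,
 * then turn the documents into rows. Plain Java maps stand in for MongoDB documents;
 * no RapidMiner or MongoDB classes are used.
 */
public class SchemaSketch {

    /** Column name -> "numerical" or "nominal", in the order keys are first seen. */
    public static Map<String, String> inferSchema(List<Map<String, Object>> documents) {
        Map<String, Boolean> allNumeric = new LinkedHashMap<>();
        for (Map<String, Object> doc : documents) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                allNumeric.putIfAbsent(field.getKey(), Boolean.TRUE);
                Object value = field.getValue();
                // Nulls do not change a column's type; any non-number makes it nominal.
                if (value != null && !(value instanceof Number)) {
                    allNumeric.put(field.getKey(), Boolean.FALSE);
                }
            }
        }
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> column : allNumeric.entrySet()) {
            schema.put(column.getKey(), column.getValue() ? "numerical" : "nominal");
        }
        return schema;
    }

    /** One Object[] per document; fields missing from a document become null. */
    public static List<Object[]> toRows(List<Map<String, Object>> documents, List<String> columns) {
        List<Object[]> rows = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            Object[] row = new Object[columns.size()];
            for (int i = 0; i < columns.size(); i++) {
                row[i] = doc.get(columns.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Two small example documents with different sets of fields. */
    public static List<Map<String, Object>> sampleDocuments() {
        Map<String, Object> alice = new LinkedHashMap<>();
        alice.put("name", "alice");
        alice.put("age", 31);
        Map<String, Object> bob = new LinkedHashMap<>();
        bob.put("name", "bob");
        bob.put("city", "Zurich");
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(alice);
        docs.add(bob);
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = sampleDocuments();
        Map<String, String> schema = inferSchema(docs);
        System.out.println(schema); // {name=nominal, age=numerical, city=nominal}
        for (Object[] row : toRows(docs, new ArrayList<>(schema.keySet()))) {
            System.out.println(Arrays.toString(row));
        }
        // [alice, 31, null]
        // [bob, null, Zurich]
    }
}
```

Taking the union of keys means a field that appears in only some documents still becomes an attribute, with missing values elsewhere; whether that is desirable for a given collection is a design decision, not something RapidMiner dictates.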

Take a look at any of the subclasses of AbstractExampleSource to get an impression of how things work.

Greetings,
Sebastian

turicum · January 2010

Hi Sebastian,

thank you for the suggestion, I'll check out those classes!

Cheers,
Alex

TobiasMalbrecht · January 2010

Hi Alex,

in fact, it might be even easier to extend [tt]AbstractDataReader[/tt] (a subclass of [tt]AbstractExampleSource[/tt]). There you only need to implement one method, which returns a [tt]DataSet[/tt]: a kind of iterator over the data that resembles a [tt]ResultSet[/tt]. While constructing the DataSet, you might also want to call [tt]setColumnNames(String[] columnNames)[/tt] to name the columns according to your data. With this mechanism you don't need to take care of meta data generation or of building the data itself; only the extraction of values from the data source has to be implemented. Have a look at the [tt]CSVDataReader[/tt] or the [tt]DatabaseDataReader[/tt] and you will see how it works. Be aware, however, that using this mechanism, your database will be accessed also for the generation of the meta data.
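To make the ResultSet-like idea concrete, here is a minimal, self-contained sketch of a cursor that walks over documents and exposes each one as a row of named columns. The RowCursor interface, the DocumentCursor class, and the doc and drain helpers are hypothetical; they are not RapidMiner's actual DataSet API. Plain Java maps stand in for MongoDB documents, and in a real reader the iterator would presumably come from the driver's query cursor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CursorSketch {

    /** Hypothetical ResultSet-like contract; NOT RapidMiner's real DataSet class. */
    public interface RowCursor {
        String[] getColumnNames();
        boolean next();                    // advance to the next row; false when exhausted
        Object getValue(int columnIndex);  // value in the current row, or null if missing
    }

    /** Wraps an iterator over documents; each document becomes one row. */
    public static final class DocumentCursor implements RowCursor {
        private final Iterator<Map<String, Object>> documents;
        private final String[] columnNames;
        private Map<String, Object> current;

        public DocumentCursor(Iterator<Map<String, Object>> documents, String[] columnNames) {
            this.documents = documents;
            this.columnNames = columnNames.clone();
        }

        @Override
        public String[] getColumnNames() {
            return columnNames.clone();
        }

        @Override
        public boolean next() {
            if (!documents.hasNext()) {
                current = null;
                return false;
            }
            current = documents.next();
            return true;
        }

        @Override
        public Object getValue(int columnIndex) {
            if (current == null) {
                throw new IllegalStateException("call next() first");
            }
            return current.get(columnNames[columnIndex]);
        }
    }

    /** Builds a document from alternating key/value arguments (test helper). */
    public static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    /** Reads all rows from a cursor, the way a consumer would drain a ResultSet. */
    public static List<Object[]> drain(RowCursor cursor) {
        List<Object[]> rows = new ArrayList<>();
        int width = cursor.getColumnNames().length;
        while (cursor.next()) {
            Object[] row = new Object[width];
            for (int i = 0; i < width; i++) {
                row[i] = cursor.getValue(i);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(doc("name", "alice", "age", 31), doc("name", "bob"));
        RowCursor cursor = new DocumentCursor(docs.iterator(), new String[] {"name", "age"});
        for (Object[] row : drain(cursor)) {
            System.out.println(Arrays.toString(row));
        }
        // [alice, 31]
        // [bob, null]
    }
}
```

The column names are fixed up front, mirroring the role of setColumnNames in the post; a document that lacks a field simply yields null for that column.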

Kind regards,
Tobias

turicum · January 2010

Hi Tobias,

what do you mean by "your database will be accessed also for the generation of the meta data"?

Thanks
Alex

TobiasMalbrecht · January 2010

Hi Alex,

throughout process design, meta data is propagated through the process to support the user in designing it. Therefore, some data readers (those that extend [tt]AbstractDataReader[/tt]) pre-read some data (a couple of rows) already during the design phase, that is, once you have added the operator to the process and configured its settings, in order to generate meta data. Hence, if you implement your operator using [tt]AbstractDataReader[/tt], data will be read from your database twice: once to generate meta data during process design and a second time during process execution.
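A toy illustration of that consequence: a source that counts the rows it serves shows the preview rows being fetched once at design time and again at execution time. CountingSource, previewForMetaData, execute, and the preview size of 10 are all hypothetical; the sketch does not reflect how many rows RapidMiner actually pre-reads or how it schedules the reads.

```java
import java.util.ArrayList;
import java.util.List;

public class DoubleReadSketch {

    /** Stand-in for a database collection that counts how many rows it has served. */
    public static final class CountingSource {
        private final List<String> rows;
        private int rowsServed = 0;

        public CountingSource(List<String> rows) {
            this.rows = new ArrayList<>(rows);
        }

        /** Returns up to 'limit' rows from the start; a negative limit means all rows. */
        public List<String> read(int limit) {
            int n = limit < 0 ? rows.size() : Math.min(limit, rows.size());
            rowsServed += n;
            return new ArrayList<>(rows.subList(0, n));
        }

        public int rowsServed() {
            return rowsServed;
        }
    }

    /** Design time: read only a few rows to derive meta data (hypothetical preview size). */
    public static List<String> previewForMetaData(CountingSource source, int previewRows) {
        return source.read(previewRows);
    }

    /** Execution time: read everything. */
    public static List<String> execute(CountingSource source) {
        return source.read(-1);
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add("row" + i);
        }
        CountingSource source = new CountingSource(data);

        previewForMetaData(source, 10); // while designing the process
        execute(source);                // when the process runs
        System.out.println(source.rowsServed()); // 1010: the first 10 rows were read twice
    }
}
```

For a small preview the extra cost is usually negligible, but it does mean the database must be reachable while the process is being designed, not only when it runs.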

Kind regards,
Tobias

turicum · March 2010

Hi!

Once I have extended AbstractDataReader to access the database, what's the easiest way to integrate the new class into RM?

Thanks
Alex

land · March 2010

Hi Alex,
the easiest way is to buy the "How to Extend RapidMiner 5.0" tutorial in our web shop. It explains in detail, over 40 pages, what you have to do. Additionally, it comes with a sample project for Eclipse that makes it very easy to deploy a new extension.

Greetings,
Sebastian


Altair RapidMiner Community
