Apache Derby

smackdown33 · September 2009

Hi,

I was wondering does anyone know how to use Apache Derby database with Rapidminer.

Thanks for your help.

smackdown33 · September 2009

The Problem:

I've created a database of attributes with java through Apache Derby, the Derby side of things is then shutdown. I then give rapidminer the URL address to the database but it cannot connect.

Any ideas????

Thanks

keith · September 2009

Hi,

I'm not familiar with Derby, but have some experience connecting to other databases through RM. Can you post the XML for the RM process you're actually trying? That would help remove some of the ambiguities in your question.

Does Derby have a JDBC driver? If so, then in theory at least RM should be able to read data through a DatabaseExampleSource operator.

Also, maybe this is a wording problem, but when you say "the Derby side of things is then shutdown", that makes me think that the Derby database process itself isn't running, in which case connecting to it is obviously a problem that would have nothing to do with RM :-) Can you describe what you mean another way?

Keith

smackdown33 · September 2009

Hi Keith,

thanks for the reply, ive got rapidminer to recognise Apache Derby by including the Derby client in the lib folder of rapidminer and restarting it.

However I am now faced with another issue regarding loading the data, when i try to run a simple test i get the following error:

[Fatal] IndexOutOfBoundsException occured in 1st application of DatabaseExampleSource (DatabaseExampleSource)
[Fatal] Process failed: operator cannot be executed (Index: 10, Size: 10). Check the log messages...

Would anyone be able to help with this please.

Thanks

smackdown33 · September 2009

Ive subsequently encountered another problem which is associated with selecting "work_on_database", which I am trying to use in order to save memory.

[Fatal] UserError occured in 1st application of DatabaseExampleSource (DatabaseExampleSource)
[Fatal] Process failed: Database error occurred: Lexical error at line 1, column 15. Encountered: "`" (96), after : "".

Does anyone have any ideas or know of this problem.

Thanks

keith · September 2009

As I mentioned, it's easier to help if you include your actual RM process, rather than just the error. A description of the table(s) you're selecting from would also be useful.

Keith

smackdown33 · September 2009

Hi,

pretty new to this so ive been trying to run the feature selection process in the wizard. The database table consists of 801 columns and 1300 rows.

I hope this is the information your after.

Thanks

land · September 2009

Hi,
what Keith was trying to say is, that you post the complete XML description of your process. Then it only takes seconds for us, loading your process in rapidminer, to take closer look.
To post it here, open your process in rapidMiner and switch to the XML Tag. Then mark the complete text and copy it into the clip board.
Here in the Forum, you should press the # button and paste the complete process with the inserted code tag.

Greetings,
Sebastian

smackdown33 · September 2009

Hi,

ok sorry, was a bit confused there, thanks for the reply. Please see process below:

<operator name="Root" class="Process" expanded="yes">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
    </operator>
    <operator name="GA" class="GeneticAlgorithm" expanded="yes">
        <operator name="XVal" class="XValidation" expanded="yes">
            <operator name="Learner" class="LibSVMLearner">
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluator" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

Any help would be greatly appreciated, thanks.

keith · September 2009

If that's the actual process you're trying to run, it looks like you added a DatabaseExampleSource, but didn't t actually tell RM anything about where the database is or how to connect.

In the RM GUI, select the DatabaseExampleSource operator in your process chain, and click the Parameters tab. You should see a bunch of entries including, most critically, "database_system", "database_url", "username", and "password". This is how to tell RM what kind of database you're running, how to find it, and what credentials to use to connect. There are also "query" or "table_name" parameters that specify either a SQL query to execute, or the name of the table to retrieve.

The database_url parameter will be specific to your driver+database, so you may need to consult the Derby documentation to find out what the proper JDBC connection string looks like.

When you specify the parameters, the XML for that operator should look something like the following (I'm using SQL Server, not Derby, but it should give you an idea).


        <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
            <parameter key="database_system"	value="Microsoft SQL Server (Microsoft)"/>
            <parameter key="database_url"	value="jdbc:sqlserver://KEITHS_DB"/>
            <parameter key="username"	value="keith"/>
            <parameter key="password"	value="BTs7lFuaeWCRLJzv8cWy6g=="/>
            <parameter key="query"	value="select * from mytable"/>
        </operator>

Hope that helps,
Keith

smackdown33 · September 2009

Hi Keith, sorry I copied the wrong source for the process, thanks alot for your help thus far. Here is the source for the process below:

<operator name="Root" class="Process" expanded="yes">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource" breakpoints="after">
        <parameter key="work_on_database"	value="true"/>
        <parameter key="database_url"	value="jdbc:derby://localhost:1527/reducedFeatureDB"/>
        <parameter key="username"	value="user1"/>
        <parameter key="password"	value="e7QsjGOrCv4="/>
        <parameter key="query"	value="select * from location"/>
        <parameter key="table_name"	value="LOCATION"/>
        <parameter key="label_attribute"	value="NUM"/>
    </operator>
    <operator name="GA" class="GeneticAlgorithm" expanded="yes">
        <operator name="XVal" class="XValidation" expanded="yes">
            <operator name="NearestNeighbors" class="NearestNeighbors">
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluator" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

If I place a breakpoint on the DatabaseExampleSource and check whether the connection is ok and whether the data is loading correctly, all is fine. However when i try to run the process from then on I receive either an out of bounds exception or a lexical error, such as the one below:

[Fatal] UserError occured in 1st application of DatabaseExampleSource (DatabaseExampleSource)
[Fatal] Process failed: Database error occurred: Lexical error at line 1, column 15. Encountered: "`" (96), after : "".

Any help would be greatly appreciated

Thanks to those who have already given their time and knowledge.

haddock · September 2009

Hi there,

I'm not familiar with Derby, but am used to hoovering stuff to and from databases, and it looks like column 15 of the Location table is presenting an orphaned text delimiter, leaving unterminated text. To test just the connection try selecting just one column, other than 15! If that works you'll then need to SQueLch clean column 15.

smackdown33 · September 2009

Thanks for the help haddock, I changed my source to only input one column but still the same lexical error as before:

<operator name="Root" class="Process" expanded="yes">
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource" breakpoints="after">
        <parameter key="work_on_database"	value="true"/>
        <parameter key="database_url"	value="jdbc:derby://localhost:1527/reducedFeatureDB"/>
        <parameter key="username"	value="user1"/>
        <parameter key="password"	value="e7QsjGOrCv4="/>
        <parameter key="query"	value="select feat0 from location"/>
        <parameter key="table_name"	value="LOCATION"/>
        <parameter key="label_attribute"	value="NUM"/>
    </operator>
    <operator name="GA" class="GeneticAlgorithm" expanded="yes">
        <operator name="XVal" class="XValidation" expanded="yes">
            <operator name="NearestNeighbors" class="NearestNeighbors">
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Evaluator" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

Error:
[Fatal] UserError occured in 1st application of DatabaseExampleSource (DatabaseExampleSource)
[Fatal] Process failed: Database error occurred: Lexical error at line 1, column 15. Encountered: "`" (96), after : "".

keith · September 2009

I think if you have "work_on_database" checked, RM only reads the table_name, and ignores the "query" parameter, so your one-column case wasn't actually being tested. With DatabaseExampleSource you set either table_name or query, but not both.

Does the problem still occur if you uncheck "work_on_database", for either the full table, or the 1 column case? 800 columns and 1300 rows isn't really a lot of data, so I wouldn't think it was necessary to run with work_on_database on to save memory.

If you are successfully loading the data, perhaps you can try immediately writing it back out to a CSV file using CSVExampleSetWriter. Then alter a process that reads the data in from the CSV rather than reading from the database. If you still get a problem, then the problem isn't the database connection itself, but something in the data.

Keith

smackdown33 · September 2009

Hi Keith, ive removed the query and unchecked the work_on_database and the problem does indeed disappear, but it is this part of the process that i need to keep as the 800 by 1300 is only just a small test set that i wrote to the database. This was in aid of getting RM and the database to connect properly and to ensure working order.

My real dataset will still be around 1300 examples but the number of attributes will range from 7200 - 108000 and when running the feature selection algorithm on such a dataset i run out of memory and im currently running 12Gb 1866MHz DDR3 on an I7.

Any Ideas?

Thanks for your help, much appreciated.

keith · September 2009

I have no experience with the "work_on_database" setting, so unfortunately I'm unable to offer any further suggestions. Sorry.

Keith

land · September 2009

Hi,
to my knowledge no row-oriented database can handle so many columns anyway. But if derby is able to, please tell me...

You should not use work_on_database, but instead use the CachedDatabaseExampleSource. It will only hold a subset in memory. But this does not neglect the problem with the number of columns...

Greetings,
Sebastian

keith · September 2009

Sebastian is correct. Oracle (1000) and SQL Server (1024) and Derby 10.5 (1012) all allow a max of about 1000 columns per table.

smackdown33 · September 2009

Thanks keith and sebastian, much appreciated. Im slowly exhausting all my options.

Ive subsequently reduced my dataset to only 1 row and 4 columns and used both the "DatabaseExampleSource" with "work_on_database" selected and I still received the lexical error. With work_on_database unchecked everything works fine.

Ive also tried this with cachedDatabaseExampleSource and it too comes back with a lexical error but for line 1 column 13.

Any Ideas.

Thanks

land · September 2009

Hi,
to put an end to this discussion:
a) Your database doesn't support your targeted number of attributes anyway. So it doesn't make sense to waste time in trying to "work on database". You simply have to load it via file or put it together in rapid miner storing all data in main memory.
b) If you have such a great number of attributes you cannot use LinearRegression. For computing the coefficients, a quadratical matrix with the dimension of the number of attributes must be created. This will definitively not fit into any memory, if you have 100.000 attributes.

To come to an conclusion: Use another learner like the SVM. Use its linear kernel and the results will be comparable. This should solve your problems.

Greetings,
Sebastian

smackdown33 · October 2009

Thanks alot for your answers guys, much appreciated.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Apache Derby

Answers