Custom jdbc connection with file uri uses Rapidminer program directory

MaartenK · March 2023

I am trying to make a connection to a DuckDB using the DuckDB jdbc driver. In rapidminer i choose the driver manually (duckdb_jdbc-0.1.7.jar.
I then configure a manual URI: jdbc:duckdb:/duckdb/test_de.duckdb. This is a file URI.
Rapidminer does not take this as an absolute path but prefixes the RM program directory. When I click "test connection" I get this error:

IO Error: Cannot open file "C:\Program Files\RapidMiner\RapidMiner Studio\/duckdb/test_de.duckdb": Het systeem kan het opgegeven pad niet vinden.

How can I use this driver and pass an absolute path?

jwpfau · March 2023

The DuckDB JDBC Driver isn't in great shape, here is a hack to at least read some data from a DuckDB that gives you a "GetTypeInfo" error.

It's not suitable for production, but might help to extract some data once.

<?xml version="1.0" encoding="UTF-8"?><process version="10.1.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="10.1.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="10.1.002" expanded="true" height="68" name="Retrieve DuckDB" width="90" x="45" y="34">
        <parameter key="repository_entry" value="/Connections/DuckDB"/>
      </operator>
      <operator activated="true" class="execute_script" compatibility="10.1.002" expanded="true" height="82" name="Execute Script" width="90" x="246" y="34">
        <parameter key="script" value="import com.rapidminer.connection.ConnectionInformationContainerIOObject;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.StatementCreator;&#10;import com.rapidminer.extension.jdbc.tools.jdbc.PostgreSQLUtils;&#10;&#10;ConnectionInformationContainerIOObject inputData = input[0];&#10;String jdbcURL = inputData.getConnectionInformation().getConfiguration().getValue(&quot;db_config.manual_url&quot;);&#10;&#10;//String jdbcURL = &quot;jdbc:duckdb:C:\\duckdb\\test_de.duckdb&quot;;&#10;// Reuse the jdbc data types of postgres, since they mostly fit&#10;StatementCreator.CACHE.put(jdbcURL, PostgreSQLUtils.POSTGRES_DATA_TYPES);&#10;&#10;&#10;// This line returns the first input as the first output&#10;return inputData;"/>
        <parameter key="standard_imports" value="true"/>
      </operator>
      <operator activated="true" class="jdbc_connectors:read_database" compatibility="10.1.002" expanded="true" height="82" name="Read Database" width="90" x="380" y="34">
        <parameter key="define_connection" value="repository"/>
        <parameter key="connection_entry" value=""/>
        <parameter key="database_system" value="MySQL"/>
        <parameter key="define_query" value="table name"/>
        <parameter key="query" value="SELECT * FROM titanic;"/>
        <parameter key="use_default_schema" value="true"/>
        <parameter key="table_name" value="titanic"/>
        <parameter key="prepare_statement" value="false"/>
        <enumeration key="parameters"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <connect from_op="Retrieve DuckDB" from_port="output" to_op="Execute Script" to_port="input 1"/>
      <connect from_op="Execute Script" from_port="output 1" to_op="Read Database" to_port="connection"/>
      <connect from_op="Read Database" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

If you need to write, make sure to set the batch size to 0, since batch is also not implemented in this driver.

Greetings,
Jonas

jwpfau · March 2023

Hi Maarten,

Since you are using Windows you would have to use a Windows style file path i.e.

jdbc:duckdb:C:\duckdb\test_de.duckdb

Edit: I just tried the DuckDB JDBC Driver and it's missing some required functionality:

https://github.com/duckdb/duckdb/issues/6759

Greetings,
Jonas

MaartenK · March 2023

I tried. It will give another error: GetTypeInfo with an exclamation mark. However, it also gives this error if I point to an non existing file name. But maybe the path issue is solverd and I ran into an other error.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Custom jdbc connection with file uri uses Rapidminer program directory

Best Answer

Answers