The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
"Timestamps from PostgreSQL: Different results in RapidMiner and RapidAnalytics"
Hi everyone,
I am facing a strange behavior of timestamps (or big numbers) in RapidMiner and RapidAnalytics when reading them from a PostgreSQL Database. I set up a very simple process, which only reads from the DB, sorts the resulting example set and stores it in the RA repository.
When running the same process on the RA server, I get a different and obviously wrong result
I'm using Rapidminer 5.3.007, RapidAnalytics CE 1.3.007 and PostgreSQL 8.4.4. RapidMiner uses the DB connection I've defined in RapidAnalytics.
Does anyone have an idea of what is going on here?
Thanks for help,
LarsH
I am facing a strange behavior of timestamps (or big numbers) in RapidMiner and RapidAnalytics when reading them from a PostgreSQL Database. I set up a very simple process, which only reads from the DB, sorts the resulting example set and stores it in the RA repository.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>When I execute
<process version="5.3.007">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_database" compatibility="5.3.007" expanded="true" height="60" name="Read Database (2)" width="90" x="246" y="30">
<parameter key="connection" value="mypsqldb"/>
<parameter key="query" value="select to_char('01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:01:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:01:00+00') as zeit_epoch union select to_char('01.08.2012 01:02:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:02:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:02:00+00') as zeit_epoch union select to_char('01.08.2012 01:03:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string, '01.08.2012 01:03:00+00'::timestamptz as zeit_ts, extract(epoch from timestamp with time zone '01.08.2012 01:03:00+00') as zeit_epoch;"/>
<enumeration key="parameters"/>
</operator>
<operator activated="true" class="sort" compatibility="5.3.007" expanded="true" height="76" name="Sort" width="90" x="447" y="30">
<parameter key="attribute_name" value="zeit_string"/>
</operator>
<operator activated="true" class="store" compatibility="5.3.007" expanded="true" height="60" name="Store" width="90" x="648" y="30">
<parameter key="repository_entry" value="../ExampleSets/Test2"/>
</operator>
<connect from_op="Read Database (2)" from_port="output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
selectdirectly from RapidMiner, everything seems fine and I get
to_char('01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
'01.08.2012 01:01:00+00'::timestamptz as zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:01:00+00') as zeit_epoch
union
select to_char('01.08.2012 01:02:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
'01.08.2012 01:02:00+00'::timestamptz as zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:02:00+00') as zeit_epoch
union
select to_char('01.08.2012 01:03:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ') as zeit_string,
'01.08.2012 01:03:00+00'::timestamptz as zeit_ts,
extract(epoch from timestamp with time zone '01.08.2012 01:03:00+00') as zeit_epoch;
When running the same process on the RA server, I get a different and obviously wrong result
I'm using Rapidminer 5.3.007, RapidAnalytics CE 1.3.007 and PostgreSQL 8.4.4. RapidMiner uses the DB connection I've defined in RapidAnalytics.
Does anyone have an idea of what is going on here?
Thanks for help,
LarsH
Tagged:
0
Answers
do you have the same SQL driver for RapidMiner and RapidAnalytics?
Best,
Nils
Yes, I think so. I didn't change the driver that comes with RM and RA (postgresql-9.1-901.jdbc4.jar).
Over the weekend, I found out that the problem occurs on Linux and Windows. I've tested the following combinations:
RM: Ubuntu 12.04 -> RA: Ubuntu 12.04
RM: PC-BSD 9.1 -> RA: Ubuntu 10.04
RM: Windows 7 -> RA: Windows 7
All of them show the same behavior as described above.
Best,
Lars
I created an example set called 'Test' via "Import CSV" with the following content (automatically identified as 'integer' by RM): Then I executed the following process to get an example set that contains the number from 'Test' as date '01.08.2012 03:01:00 MESZ'. Doing this directly in Rapidminer, everything works well. But in Rapidanalytics, I get the same wrong result like in my PostgreSQL example.
And I found out another interesting detail: When I declare my CSV number 1343782860 to be 'numeric', Rapidminer imports it as 1343782900. Thats '01.08.2012 03:01:40 MESZ'! Perhaps that's the error that happens during the PostgreSQL import? But how can I fix it?
and ran it with Rapidminer and Rapidanalytics. Lots of 16-digit random numbers.
With Rapidminer, I get
but with Rapidanalytics, the last 7 digits are always zero?!
Of course, the numbers are "random", but is this behavior really correct?
No number is concatenated when storing the ExampleSet on disk. But when you run it in RapidAnalytics the results are stored in the Postgres SQL database you selected for RapidAnalytics.
And there must be a weird problem. Currenlty I have no Postgres DB available, so I can't reproduce the error, but could you please have a look at the setup of your database tables?
Is there something suspicious?
Best,
Nils
I've re-executed my 'random numbers' process and found out that RA stores the 16-digit numbers as 'real's, instead of 'double' or 'numeric'. Even timestamps are stored as 'real'! That's why some digits (and seconds...) get lost!
Best,
Lars
If you run a process from the RapidMiner GUI the result you see in the end is produced on the machine you run RapidMiner on.
It does not matter if the process is stored an RA or in a local Repository. Nevertheless the 'Store' operator will transfer the locally generated data to your RA instance and save it in the selected RA database if the process is stored on RapidAnalytics.
Could you please have a look what the data types the data tables have? For me it looks like this on MySQL: As you can see att_1 and the label values are stored as doubles.
Thus retrieving the values yields exactly the same results as executing the process from RapidMiner locally does.
Best,
Nils
assuming that your original problem with the timestamps is still current:
to_char('01.08.2012 01:01:00+00'::timestamptz,'DD.MM.YYYY HH24:MI:SS TZ')
Don't do this, it is not portable. It depends on the PostgreSQL configuration setting "datestyle" and also on the settings in the client session. That again can be set by the client and maybe the JDBC connector or RapidAnalytics set it up differently.
You can always check the value by putting "SHOW datestyle;" into a Read Database operator.
Your example outputs "08.01.2012" instead of "01.08.2012" in my RapidMiner because my PostgreSQL installation has "iso, mdy" configured - the default configuration.
See the "Date input" topic in the PostgreSQL documentation for safe syntax variants:
http://www.postgresql.org/docs/9.1/static/datatype-datetime.html