Data table generated by Execute Python displayed incorrectly
lionelderkrikor
RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi all,
Once again, I'm reporting strange behaviour in RapidMiner:
I'm following a tutorial on time series that uses RapidMiner.
For that, I'm using the Python library Quandl (via the Execute Python operator) to retrieve from the web
the stock prices that serve as the input dataset.
However, the first column (the date-time column) contains only missing values ("?").
Here is the process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="246" y="85">
<parameter key="script" value="import pandas as pd&#10;import quandl&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;    data = quandl.get(&quot;WIKI/GE&quot;, start_date=&quot;2016-01-04&quot;, end_date=&quot;2016-03-26&quot;, collapse=&quot;daily&quot;, column_index=11, returns=&quot;numpy&quot;)&#10;    data = pd.DataFrame(data)&#10;    # connect 2 output ports to see the results&#10;    return data"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="85">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_op="Execute Python" from_port="output 1" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
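For anyone who wants to look at the column types without calling Quandl, here is a minimal offline sketch. It assumes that `quandl.get(..., returns="numpy")` yields a NumPy record array with a `datetime64` `Date` field; the `Value` column and its numbers are hypothetical dummies, not real stock prices. Wrapping the array in `pd.DataFrame`, as the script above does, keeps `Date` as a `datetime64[ns]` column, which is presumably the column type that ends up as "?" in RapidMiner.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for quandl.get(..., returns="numpy"): a NumPy record
# array whose "Date" field is datetime64 (assumed layout, dummy values).
records = np.array(
    [("2016-01-04", 1.0), ("2016-01-05", 2.0)],
    dtype=[("Date", "datetime64[ns]"), ("Value", "float64")],
)

# Same conversion as in the Execute Python script.
data = pd.DataFrame(records)

# "Date" is a datetime64[ns] column, "Value" is float64.
print(data.dtypes)
```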
I then ran the same Python code in a notebook, and there it works perfectly fine (the data table is displayed correctly).
Can you help me figure out what's going on?
Thank you for your answers,
Best regards,
Lionel
Comments
I think this question came up before in the Community. I seem to remember it was related to formatting the date-time.
Hi all,
After searching the community, it seems there is no solution to this problem.
Concretely, what is being done to solve this problem?
Thank you for your answers,
Best regards,
Lionel
I have found a temporary solution:
Basically, it consists of converting the dates to strings in Python and then parsing them in RapidMiner Studio.
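A minimal sketch of that workaround inside `rm_main`. The dummy DataFrame stands in for the Quandl result (its `Value` column and numbers are hypothetical), and the `%Y-%m-%d` string format is an assumption; any format works as long as RapidMiner parses it with the matching pattern.

```python
import pandas as pd


def rm_main():
    # Hypothetical stand-in for the Quandl result: a datetime64 "Date" column
    # plus a dummy numeric column.
    data = pd.DataFrame({
        "Date": pd.to_datetime(["2016-01-04", "2016-01-05"]),
        "Value": [1.0, 2.0],
    })
    # Workaround: hand the dates to RapidMiner as ISO-8601 strings instead of
    # datetime64 values, so they arrive as nominal values rather than "?".
    data["Date"] = data["Date"].dt.strftime("%Y-%m-%d")
    return data
```

On the RapidMiner side, the resulting nominal `Date` attribute can then be parsed back into a date, for example with the Nominal to Date operator and the date format `yyyy-MM-dd`.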
Should RM be able to convert from pandas' dates automatically?
@SGolbert I think it should do that automatically, because the Execute Python operator translates between the RM ExampleSet and a pandas DataFrame. I think this needs to be investigated by the RM dev team.
Hi,
Thank you, @SGolbert. This solution works well and helps me a lot.
Best regards,
Lionel
moving to Product Feedback.
Scott
Dear all,
I agree with Thomas. Some tasks that cannot be performed in RM can be performed with Python scripts.
It's frustrating not to be able to display the associated results in RM.
Regards,
Lionel
tagging @bhupendra_patil
Starting with Python Scripting Extension version 9.3.1, date values no longer become missing values when using Execute Python.
Thanks for reporting this problem and for coming up with workarounds.
Best,
Peter