Missing dates
Hi,
I have been given a dataset to predict the employee termination rate. One attribute is the date of termination, and another is binominal and states whether the employee has been terminated in the past. If the employee has not been terminated, the date of termination is a missing value. Furthermore, the dates in the termination attribute come in two forms, for example 11/02/15 and 12/05/2012, so RM identifies the attribute as polynominal. Hence I cannot convert it into a date, and I don't know how to treat these missing values.
I wanted to insert something like N/A in the missing values, but I reckon that is not advisable.
In such a case, is it advisable to use one of these as my label?
What is my best course of action? I hope I can get a quick response. Thank you, community.
cheers
Ryan
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
it depends on how malformed they are. One option would be to filter them out up front. Another is to replace them with a missing value using Generate Attributes with
if(matches(...), MISSING_NOMINAL, date)
or so.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
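As an illustration of the replace-with-missing idea outside RapidMiner, here is a minimal Python sketch. It is not RapidMiner's expression language: the function name `blank_if_malformed` and the regular expression are assumptions made for this example, and the check is inverted relative to the expression above (it keeps values that look well formed and blanks everything else).

```python
import re
from typing import Optional

# Assumed pattern for the two forms mentioned in the question:
# 11/02/15 (two-digit year) and 12/05/2012 (four-digit year).
WELL_FORMED = re.compile(r"\d{1,2}/\d{1,2}/(\d{2}|\d{4})")


def blank_if_malformed(value: Optional[str]) -> Optional[str]:
    """Return the value unchanged if it looks like a date, else None (missing)."""
    if value is None:
        return None
    value = value.strip()
    # fullmatch requires the whole string to fit the pattern
    return value if WELL_FORMED.fullmatch(value) else None
```

Values such as "N/A" or an empty string come back as `None`, so they stay missing instead of becoming an extra nominal category.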
Answers
cheers
Ryan
<?xml version="1.0" encoding="UTF-8"?><process version="9.7.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.7.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.7.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="85">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="date&#10;11/02/15&#10;"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="true"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="nominal_to_date" compatibility="9.7.002" expanded="true" height="82" name="Nominal to Date" width="90" x="246" y="85">
        <parameter key="attribute_name" value="date"/>
        <parameter key="date_type" value="date"/>
        <parameter key="date_format" value="MM/dd/yy"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="keep_old_attribute" value="false"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
      <connect from_op="Nominal to Date" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
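For comparison, the same conversion can be sketched in plain Python. This is not part of the process above and does not use any RapidMiner API. The function name `parse_termination_date` and the list of formats are assumptions for this example, and it assumes month-first dates, as the `MM/dd/yy` setting in the process suggests. Anything that fits neither form, including blanks, becomes `None` (missing).

```python
from datetime import date, datetime
from typing import Optional

# Assumed formats: four-digit year first, then two-digit year (month/day/year).
FORMATS = ("%m/%d/%Y", "%m/%d/%y")


def parse_termination_date(raw: Optional[str]) -> Optional[date]:
    """Parse a termination date in either form; return None if missing or malformed."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # This format did not fit; try the next one
            continue
    return None


if __name__ == "__main__":
    for sample in ["11/02/15", "12/05/2012", "N/A", ""]:
        print(repr(sample), "->", parse_termination_date(sample))
```

Keeping non-terminated employees as missing dates, rather than filling in a placeholder such as N/A, lets the column stay a true date type.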
How do I deal with malformed dates?