After Nominal to Text and Loop Collection, the date column is gone
Hi everyone,
when doing some text mining, I would like to know the date of each article after tokenization. However, I only receive the text columns in the end, on which the sentiment dictionary is applied. Is there any possibility to keep the date column or add it again on the way?
Here is my code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="9.3.001">
<context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34"> <parameter key="repository_entry" value="../Data/Lexis_Nexis_PAT"/> </operator> <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve (2)" width="90" x="45" y="289"> <parameter key="repository_entry" value="../Data/GRESD"/> </operator> <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.0.001" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="246" y="289"> <parameter key="value_attribute" value="Klassifizierung"/> <parameter key="key_attribute" value="Wort"/> <parameter key="negation_attribute" value="Negationen"/> <parameter key="negation_window_size" value="5"/> <parameter key="use_symmetric_negation_window" value="true"/> </operator> <operator activated="true" class="set_role" compatibility="9.3.001" expanded="true" height="82" name="Set Role" width="90" x="112" y="187"> <parameter key="attribute_name" value="Datum"/> <parameter key="target_role" value="Datum"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="9.3.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value="Body Teil 1"/> <parameter key="attributes" value="|Body Teil 1|Body Teil 2"/> <parameter 
key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="text:data_to_documents" compatibility="8.2.000" expanded="true" height="68" name="Data to Documents" width="90" x="313" y="34"> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> </operator> <operator activated="true" class="loop_collection" compatibility="9.3.001" expanded="true" height="82" name="Loop Collection" width="90" x="447" y="34"> <parameter key="set_iteration_macro" value="false"/> <parameter key="macro_name" value="iteration"/> <parameter key="macro_start_value" value="1"/> <parameter key="unfold" value="false"/> <process expanded="true"> <operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="45" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34"> <parameter key="transform_to" value="lower case"/> </operator> <operator activated="true" class="text:filter_stopwords_german" compatibility="8.2.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="313" y="34"> <parameter key="stop_word_list" value="Standard"/> </operator> <operator activated="true" class="text:filter_by_length" compatibility="8.2.000" 
expanded="true" height="68" name="Filter Tokens (2)" width="90" x="514" y="34"> <parameter key="min_chars" value="3"/> <parameter key="max_chars" value="10000"/> </operator> <connect from_port="single" to_op="Tokenize (2)" to_port="document"/> <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/> <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/> <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/> <connect from_op="Filter Tokens (2)" from_port="document" to_port="output 1"/> <portSpacing port="source_single" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.0.001" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="581" y="187"> <list key="application_parameters"/> </operator> <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="187"> <list key="function_descriptions"> <parameter key="#Pos_Wörter/(#Pos_Wörter+#Neg_Wörter)" value="Positivity/(Positivity-Negativity)"/> <parameter key="#Neg_Wörter/(#Pos_Wörter+#Neg_Wörter)" value="Negativity*-1/(Negativity*-1+Positivity)"/> <parameter key="Pos_Score" value="if(Positivity>(Negativity*-1),1,0)"/> <parameter key="Neg_Score" value="if((Negativity*-1)>Positivity,-1,0)"/> </list> <parameter key="keep_all" value="true"/> </operator> <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="849" y="187"> <list key="function_descriptions"> <parameter key="Sentiment_Score" value="if(Pos_Score>0,1,if(Neg_Score<0,-1,0))"/> </list> <parameter key="keep_all" value="true"/> </operator> 
<operator activated="true" class="write_excel" compatibility="9.3.001" expanded="true" height="103" name="Write Excel" width="90" x="983" y="187"> <parameter key="excel_file" value="D:\Franziska C. Weis\Masterarbeit\03 Datenanalyse\Rapid_Miner_Analysis_IZ.xlsx"/> <parameter key="file_format" value="xlsx"/> <enumeration key="sheet_names"/> <parameter key="sheet_name" value="RapidMiner Data"/> <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/> <parameter key="number_format" value="#.0"/> <parameter key="encoding" value="SYSTEM"/> </operator> <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/> <connect from_op="Retrieve (2)" from_port="output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/> <connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/> <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/> <connect from_op="Nominal to Text" from_port="example set output" to_op="Data to Documents" to_port="example set"/> <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/> <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/> <connect from_op="Apply Model (Documents)" from_port="exa" to_op="Generate Attributes" to_port="example set input"/> <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/> <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Write Excel" to_port="input"/> <connect from_op="Write Excel" from_port="through" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Answers
MarlaBot
I can't run the process since you retrieve data from a repository on your computer.
You could add a Generate ID operator right after the Retrieve and then use that ID to join your results back to your first data set.
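For illustration, here is a minimal, untested sketch of that wiring in RapidMiner process XML. The operator classes, parameter names, and version numbers are assumptions based on a 9.3 install (they may differ in your version), and the Multiply branch and port connections are omitted for brevity:

```xml
<!-- Hypothetical sketch: tag each example with an ID before the text-mining branch -->
<operator activated="true" class="generate_id" compatibility="9.3.001" expanded="true" name="Generate ID">
  <parameter key="create_nominal_ids" value="false"/>
  <parameter key="offset" value="0"/>
</operator>
<!-- ... Multiply the example set here: one copy goes through Data to Documents,
     tokenization, and sentiment scoring; the other copy keeps Datum ... -->
<!-- After scoring, join the sentiment results back to the original data on the ID -->
<operator activated="true" class="concurrency:join" compatibility="9.3.001" expanded="true" name="Join">
  <parameter key="remove_double_attributes" value="true"/>
  <parameter key="join_type" value="inner"/>
  <parameter key="use_id_attribute_as_key" value="true"/>
  <list key="key_attributes"/>
</operator>
```

With this pattern the date column never has to survive the document branch at all; it rides along on the untouched copy and is reattached at the end.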
Unfortunately, when I put the Generate ID operator directly after the Retrieve, the ID also does not survive the rest of the process (the text mining and tokenization).
How can I solve that problem?
Many thanks!
Thank you for your reply!
The date is kept all the way up to the "Apply Model (Documents)" operator; there, it gets lost.
Any suggestions here?
Thanks a lot!
This is almost certainly a bug. @sgenzer, can you please create a ticket and assign it to me?
BR,
Martin
Dortmund, Germany