The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
WordList to Data gives me example set with no attribute names
mizunooto
Member Posts: 9 Contributor II
I'm running Process Documents to get a word list which I then convert to data using WordList to Data. All goes well until I try to select, filter or otherwise use the dataset thus created. I cannot see any attribute names in the data. I can manually type them in (e.g. in Select Attributes, but not all operators allow this), but subsequent operators in the chain do not see the names.
I have tried Synchronize Metadata and Materiaize Data, but neither of these helps.
In the example, the final Select Attributes operator does not allow me to select attributes from a list, although I can type "word" into the box when the filter type is "single".
Any ideas?
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.6.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
<list key="attribute_values">
<parameter key="text" value=""hello world out there""/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="7.6.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="34">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:wordlist_to_data" compatibility="7.5.000" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="34"/>
<operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="715" y="34">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<connect from_op="WordList to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Tagged:
0
Comments
Yes, this is an annoying RapidMiner bug. It's purely a metadata propagation problem. If you tried to use the "Synchronize Meta Data with Real Data" option but it didn't solve the problem, you can still add the attributes you want manually when selecting the "subset" option. Just type the name of the attributes into the box in the right hand side and then click the green plus button (see screenshot). They won't autopopulate but they will actually work when you run the process.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks - glad it's not just me!
Given that it's a bug, I've got a workaround which is to generate new attributes (equal to the existing attributes) after the dataset conversion. I can then select the new attributes as per normal and throw away the old, hidden attributes. A bit of a faff, but as the issue propagates all through the workflow, it's easier this way.
@mizunooto I'd love to see the process you are using to do that---I am assuming you are doing it in some kind of automated fashion and not just typing in all the new attribute names manually?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Imagination is a wonderful thing... at present I'm only using a couple of attributes so a manual operation is fine.
e.g. newword = word
Automation sounds like a great idea but I'll need to learn more to make it work.
yes that's a metadata propagation error for sure. Pushing this thread to Product Feedback.
Scott
Metadata problem recognized and reported to dev team. Please watch this board for updates.
SG
just to confirm with people watching this issue - toggling the "Synchronize Metadata with Real Data" feature does not resolve this, correct?
Correct, the "Sychronize Metadata" option doesn't fix this particular problem. I've found that it doesn't correct it 100% of the time, there are a few other cases where I have found similar behavior.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts