Which Amazon instance to choose for a "loop in loop" process requiring a huge amount of memory
Hi everyone,
I have a "loop in loop" process:
- a Loop Values operator whose inner process loads an example set of 1,000 reviews to filter;
- the above nested inside a Loop Attributes operator that loads a dictionary: a dataset of 15 columns containing all the words to look for in the reviews. The largest attribute column has 2,500 values (rows).
It's impossible to run this process in RapidMiner Studio, which freezes after a while because of the number of columns created by the Loop Values operator (one column per word, for each word of each attribute column of the dictionary: 12,660 columns in total).
I first launched the process on RapidMiner AI Hub with an r4.xlarge instance, but it crashed; then I tried a more powerful one, an r4.4xlarge (16 vCPUs and 122 GiB of memory), but it crashed again after a few minutes.
Is there a way to size the instance based on the number of columns?
Thanks in advance for any suggestion!
cheers
Answers
Nested loops have a high computational complexity, roughly O(n^2), which is undesirable in any programming language, not just RapidMiner Studio. Perhaps there is a way to simplify the search with a few tricks? It also sounds like you would benefit from tokenizing the words rather than using one column per search term.
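To illustrate the tokenization idea, here is a minimal sketch; `reviews` and `dictionary` below are made-up stand-ins, not taken from the actual process. Building one token set per review turns each word lookup into a constant-time membership test, instead of re-scanning every review string for every dictionary word:

```python
# Toy stand-ins for the real data (illustrative only)
reviews = ["the kid shared a burger", "great service and fries"]
dictionary = {"EATEN": ["burger", "fries"], "EATERS": ["kid", "guest"]}

# Tokenized approach: one set per review, then O(1) membership tests per word
labels = []
for review in reviews:
    tokens = set(review.lower().split())
    labels.append({cat: int(any(word in tokens for word in words))
                   for cat, words in dictionary.items()})
# labels[0] -> {"EATEN": 1, "EATERS": 1}; labels[1] -> {"EATEN": 1, "EATERS": 0}
```

The total work is then proportional to (tokens + dictionary words) per review, rather than reviews × words.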
Would you mind sharing your process with us, so we can check whether there is anything we can do?
About your question (is there a way to size the instance based on the number of columns?): the number of columns alone is not a real measure of memory consumption unless you also know exactly how large each column is and how it is composed. That said, I think your big issue isn't memory but optimization (I may be wrong, but it's worth a shot).
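A rough back-of-envelope check supports this (assuming worst-case dense storage of one 8-byte double per cell; sparse storage would be far smaller):

```python
# Worst-case dense storage for the final table: 8 bytes per double cell
rows, cols = 1000, 12660
bytes_dense = rows * cols * 8
print(f"{bytes_dense / 1e6:.1f} MB")  # about 101.3 MB for one dense copy
```

A single copy of the final table is only on the order of 100 MB, so crashing an instance with 122 GiB of memory suggests the nested loops are accumulating many intermediate copies, rather than the final width itself being the problem.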
All the best,
Rod.
Thanks a lot for your reply.
Enclosed are the process file and two Excel files (the dictionary and the dataset).
Normally I use local data repositories to optimize access time (and to share the project via RapidMiner AI Hub), but for sharing with you I've changed the process to use Excel files and Read Excel operators.
The goal of this work is to create an automatic labelling process that builds a validation dataset for a deep learning classification task; the dataset will then be validated manually. The output must therefore be one column per dictionary category, containing ones and zeros.
The process contains notes regarding pending questions.
Thanks a lot for any suggestion!
Have a good day!
I love RapidMiner's capabilities and the RapidMiner community.
Best,
Please check if this is what you are trying to achieve.
You'll need to install the Text Mining extension if you don't already have it.
In the process, pay special attention to the vector creation parameter, and to the prune method (especially for memory handling): it keeps only the columns that actually matter, discarding those with no appearances at all.
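What pruning buys you can be sketched like this (a made-up, minimal document-term table; the effect is comparable to dropping terms with zero occurrences):

```python
# Toy document-term table: one list of per-document counts per term
doc_term = {
    "burger": [1, 0, 1],
    "fries":  [0, 0, 0],  # never occurs -> a wasted column
    "kid":    [0, 1, 0],
}

# Keep only terms that actually appear somewhere
pruned = {term: counts for term, counts in doc_term.items() if sum(counts) > 0}
# "fries" is dropped; only columns that carry information remain
```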
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.002"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.9.002" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34"> <parameter key="excel_file" value="C:\Users\MarcoBarradas\Downloads\dictionary.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="EATERS.true.polynominal.attribute"/> <parameter key="1" value="EATEN.true.polynominal.attribute"/> <parameter key="2" value="OTHERS.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> </operator> <operator activated="true" class="read_excel" compatibility="9.9.002" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="289"> <parameter key="excel_file" value="C:\Users\MarcoBarradas\Downloads\dataset.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter 
key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Review ID.true.integer.attribute"/> <parameter key="1" value="Date.true.date.attribute"/> <parameter key="2" value="App Name.true.polynominal.attribute"/> <parameter key="3" value="App Store.true.polynominal.attribute"/> <parameter key="4" value="Language.true.polynominal.attribute"/> <parameter key="5" value="Country.true.polynominal.attribute"/> <parameter key="6" value="Rating.true.integer.attribute"/> <parameter key="7" value="Sentiment.true.polynominal.attribute"/> <parameter key="8" value="Version.true.polynominal.attribute"/> <parameter key="9" value="Subj&amp;Body.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="9.9.002" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="289"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Subj&amp;Body"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="9.9.002" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value="EATEN"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="9.3.001" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="447" y="34"> <parameter key="create_word_vector" value="true"/> <parameter key="vector_creation" value="Binary Term Occurrences"/> <parameter key="add_meta_information" value="false"/> <parameter key="keep_text" value="false"/> <parameter key="prune_method" value="none"/> <parameter key="prune_below_percent" value="3.0"/> <parameter key="prune_above_percent" value="30.0"/> <parameter key="prune_below_rank" value="0.05"/> <parameter key="prune_above_rank" value="0.95"/> <parameter key="datamanagement" value="double_sparse_array"/> <parameter key="data_management" value="auto"/> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> <process expanded="true"> <operator activated="true" class="text:transform_cases" compatibility="9.3.001" expanded="true" height="68" name="Transform Cases (2)" width="90" x="112" y="34"> <parameter key="transform_to" value="lower case"/> </operator> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize (2)" width="90" x="246" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" 
value="English"/> <parameter key="max_token_length" value="3"/> </operator> <connect from_port="document" to_op="Transform Cases (2)" to_port="document"/> <connect from_op="Transform Cases (2)" from_port="document" to_op="Tokenize (2)" to_port="document"/> <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="9.3.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="136"> <parameter key="create_word_vector" value="true"/> <parameter key="vector_creation" value="Term Occurrences"/> <parameter key="add_meta_information" value="true"/> <parameter key="keep_text" value="true"/> <parameter key="prune_method" value="none"/> <parameter key="prune_below_percent" value="3.0"/> <parameter key="prune_above_percent" value="30.0"/> <parameter key="prune_below_rank" value="0.05"/> <parameter key="prune_above_rank" value="0.95"/> <parameter key="datamanagement" value="double_sparse_array"/> <parameter key="data_management" value="auto"/> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> <process expanded="true"> <operator activated="true" class="text:transform_cases" compatibility="9.3.001" expanded="true" height="68" name="Transform Cases" width="90" x="45" y="34"> <parameter key="transform_to" value="lower case"/> </operator> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <connect from_port="document" to_op="Transform Cases" to_port="document"/> 
<connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/> <connect from_op="Tokenize" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Nominal to Text (2)" to_port="example set input"/> <connect from_op="Read Excel (2)" from_port="output" to_op="Nominal to Text" to_port="example set input"/> <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/> <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/> <connect from_op="Process Documents from Data (2)" from_port="word list" to_op="Process Documents from Data" to_port="word list"/> <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
I made a change to output the result you expect for the first point.
For the second case, could you share an example of the dictionary and a few example rows? I suspect we could use a replace dictionary with some regex magic to accomplish that.
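The "replace dictionary with regex magic" idea could look roughly like this (hypothetical patterns and category tokens, not taken from the shared files): each dictionary term is rewritten into its category token in one substitution pass per category.

```python
import re

# Hypothetical replace dictionary: regex pattern -> category token
replacements = {
    r"\b(burger|fries)\b": "EATEN",
    r"\b(kid|guest)\b": "EATERS",
}

text = "my kid shared a burger"
for pattern, token in replacements.items():
    text = re.sub(pattern, token, text)
# text is now "my EATERS shared a EATEN"
```

Counting the category tokens afterwards would then give the per-category totals directly.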
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.002"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="-1"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.9.002" expanded="true" height="68" name="Dictionary" width="90" x="45" y="34"> <parameter key="excel_file" value="C:\Users\MarcoBarradas\Downloads\dictionary.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="EATERS.true.polynominal.attribute"/> <parameter key="1" value="EATEN.true.polynominal.attribute"/> <parameter key="2" value="OTHERS.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> </operator> <operator activated="true" class="read_excel" compatibility="9.9.002" expanded="true" height="68" name="DataSet" width="90" x="45" y="238"> <parameter key="excel_file" value="C:\Users\MarcoBarradas\Downloads\dataset.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" 
value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"> <parameter key="0" value="Review ID.true.integer.attribute"/> <parameter key="1" value="Date.true.date.attribute"/> <parameter key="2" value="App Name.true.polynominal.attribute"/> <parameter key="3" value="App Store.true.polynominal.attribute"/> <parameter key="4" value="Language.true.polynominal.attribute"/> <parameter key="5" value="Country.true.polynominal.attribute"/> <parameter key="6" value="Rating.true.integer.attribute"/> <parameter key="7" value="Sentiment.true.polynominal.attribute"/> <parameter key="8" value="Version.true.polynominal.attribute"/> <parameter key="9" value="Subj&amp;Body.true.polynominal.attribute"/> </list> <parameter key="read_not_matching_values_as_missings" value="false"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role" width="90" x="179" y="238"> <parameter key="attribute_name" value="Review ID"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="9.9.002" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="238"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Subj&amp;Body"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter
key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="remember" compatibility="9.9.002" expanded="true" height="68" name="Remember" width="90" x="447" y="238"> <parameter key="name" value="Original_File"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <operator activated="true" class="concurrency:loop_attributes" compatibility="9.9.002" expanded="true" height="103" name="Loop Attributes" width="90" x="246" y="34"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="EATEN|EATERS|OTHERS"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="attribute_name_macro" value="loop_attribute"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="false"/> <process expanded="true"> <operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Select Attributes (3)" width="90" x="179" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="%{loop_attribute}"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> 
<parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="nominal_to_text" compatibility="9.9.002" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="%{loop_attribute}"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="nominal"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="file_path"/> <parameter key="block_type" value="single_value"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="single_value"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="9.3.001" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="447" y="34"> <parameter key="create_word_vector" value="true"/> <parameter key="vector_creation" value="Binary Term Occurrences"/> <parameter key="add_meta_information" value="false"/> <parameter key="keep_text" value="false"/> <parameter key="prune_method" value="none"/> <parameter key="prune_below_percent" value="3.0"/> <parameter key="prune_above_percent" value="30.0"/> <parameter key="prune_below_rank" value="0.05"/> <parameter key="prune_above_rank" value="0.95"/> <parameter key="datamanagement" value="double_sparse_array"/> <parameter key="data_management" value="auto"/> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> <process expanded="true"> <operator 
activated="true" class="text:transform_cases" compatibility="9.3.001" expanded="true" height="68" name="Transform Cases (2)" width="90" x="112" y="34"> <parameter key="transform_to" value="lower case"/> </operator> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize (2)" width="90" x="246" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <connect from_port="document" to_op="Transform Cases (2)" to_port="document"/> <connect from_op="Transform Cases (2)" from_port="document" to_op="Tokenize (2)" to_port="document"/> <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="9.3.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="289"> <parameter key="create_word_vector" value="true"/> <parameter key="vector_creation" value="Term Occurrences"/> <parameter key="add_meta_information" value="true"/> <parameter key="keep_text" value="true"/> <parameter key="prune_method" value="none"/> <parameter key="prune_below_percent" value="3.0"/> <parameter key="prune_above_percent" value="30.0"/> <parameter key="prune_below_rank" value="0.05"/> <parameter key="prune_above_rank" value="0.95"/> <parameter key="datamanagement" value="double_sparse_array"/> <parameter key="data_management" value="auto"/> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> <process expanded="true"> <operator activated="true" class="text:transform_cases" compatibility="9.3.001" expanded="true" height="68" name="Transform Cases" width="90" x="45" y="34"> 
<parameter key="transform_to" value="lower case"/> </operator> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <connect from_port="document" to_op="Transform Cases" to_port="document"/> <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/> <connect from_op="Tokenize" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Keep Numeric" width="90" x="447" y="289"> <parameter key="attribute_filter_type" value="value_type"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Remove The Rating" width="90" x="581" y="289"> <parameter key="attribute_filter_type" value="single"/> <parameter key="attribute" value="Rating"/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" 
value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="generate_aggregation" compatibility="9.9.002" expanded="true" height="82" name="Total_Column" width="90" x="715" y="289"> <parameter key="attribute_name" value="Total_%{loop_attribute}"/> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="aggregation_function" value="sum"/> <parameter key="concatenation_separator" value="|"/> <parameter key="keep_all" value="true"/> <parameter key="ignore_missings" value="true"/> <parameter key="ignore_missing_attributes" value="false"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.9.002" expanded="true" height="82" name="Keep_ID_and_Total" width="90" x="849" y="289"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="Review ID|Total_%{loop_attribute}"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> 
<parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="true"/> </operator> <connect from_port="input 1" to_op="Select Attributes (3)" to_port="example set input"/> <connect from_port="input 2" to_op="Process Documents from Data" to_port="example set"/> <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/> <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/> <connect from_op="Process Documents from Data (2)" from_port="word list" to_op="Process Documents from Data" to_port="word list"/> <connect from_op="Process Documents from Data" from_port="example set" to_op="Keep Numeric" to_port="example set input"/> <connect from_op="Keep Numeric" from_port="example set output" to_op="Remove The Rating" to_port="example set input"/> <connect from_op="Remove The Rating" from_port="example set output" to_op="Total_Column" to_port="example set input"/> <connect from_op="Total_Column" from_port="example set output" to_op="Keep_ID_and_Total" to_port="example set input"/> <connect from_op="Keep_ID_and_Total" from_port="example set output" to_port="output 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="source_input 3" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="loop_collection" compatibility="9.9.002" expanded="true" height="68" name="Join with Previous DS" width="90" x="447" y="34"> <parameter 
key="set_iteration_macro" value="false"/> <parameter key="macro_name" value="iteration"/> <parameter key="macro_start_value" value="1"/> <parameter key="unfold" value="true"/> <process expanded="true"> <operator activated="true" class="recall" compatibility="9.9.002" expanded="true" height="68" name="Recall" width="90" x="179" y="34"> <parameter key="name" value="Original_File"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="remove_from_store" value="true"/> </operator> <operator activated="true" class="concurrency:join" compatibility="9.9.002" expanded="true" height="82" name="Join" width="90" x="246" y="187"> <parameter key="remove_double_attributes" value="true"/> <parameter key="join_type" value="inner"/> <parameter key="use_id_attribute_as_key" value="true"/> <list key="key_attributes"/> <parameter key="keep_both_join_attributes" value="false"/> </operator> <operator activated="true" class="remember" compatibility="9.9.002" expanded="true" height="68" name="Remember (2)" width="90" x="380" y="34"> <parameter key="name" value="Original_File"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_port="single" to_op="Join" to_port="right"/> <connect from_op="Recall" from_port="result" to_op="Join" to_port="left"/> <connect from_op="Join" from_port="join" to_op="Remember (2)" to_port="store"/> <portSpacing port="source_single" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <operator activated="true" class="recall" compatibility="9.9.002" expanded="true" height="68" name="Final Data Set" width="90" x="648" y="34"> <parameter key="name" value="Original_File"/> <parameter key="io_object" value="ExampleSet"/> <parameter key="remove_from_store" value="true"/> </operator> <connect from_op="Dictionary" from_port="output" to_op="Loop Attributes" to_port="input 1"/> <connect from_op="DataSet" 
from_port="output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/> <connect from_op="Nominal to Text" from_port="example set output" to_op="Remember" to_port="store"/> <connect from_op="Remember" from_port="stored" to_op="Loop Attributes" to_port="input 2"/> <connect from_op="Loop Attributes" from_port="output 1" to_op="Join with Previous DS" to_port="collection"/> <connect from_op="Final Data Set" from_port="result" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Perhaps there is no relationship with that, but it seems to be correlated with the value of the "verbatim size" column.
You just need to adjust the Remove The Rating step (a Select Attributes operator) inside the Loop Attributes operator.
In it, I removed all the numeric attributes that existed before the new counting attributes were created.
If you change the attribute filter type to subset, you can remove as many numeric attributes (e.g. Rating and verbatim size) as you need, so that no other numbers are added into your totals.
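The point of that subset filter is simply to keep pre-existing numeric attributes out of the per-category sum. A minimal sketch (made-up row; assuming Rating and verbatim size are the numeric columns to exclude):

```python
# One labelled row: ID, pre-existing numerics, and the new term counts
row = {"Review ID": 7, "Rating": 5, "verbatim size": 120,
       "burger": 1, "fries": 0, "kid": 1}

# Exclude everything that is not a term count before totalling
exclude = {"Review ID", "Rating", "verbatim size"}
total = sum(v for k, v in row.items() if k not in exclude)
# Only the term-count columns contribute to the total
```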