ExampleSet: Flatten
I have an ExampleSet that is the output of applying a Word2Vec model to a document (technically, a collection containing one document). As expected, the result has one row per word and one column per embedding dimension. In my case, the W2V model was trained with a layer size of 100.
My question: I simply want to average all of the rows together so that I end up with a 1x100 ExampleSet. What is the best way to do that in RM 9.9? In Python, this would be a simple numpy or pandas operation along an axis.
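For reference, the operation being asked for is a column-wise mean: every output column is the average of that column over all word rows. A minimal stdlib-only Python sketch, assuming the word vectors are held as a list of equal-length rows (the function name average_rows and the sample values are hypothetical):

```python
from statistics import fmean


def average_rows(rows: list[list[float]]) -> list[float]:
    """Average equal-length rows column by column (the equivalent of a mean along axis 0)."""
    if not rows:
        raise ValueError("need at least one row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # zip(*rows) turns the rows into columns; fmean averages each column.
    return [fmean(column) for column in zip(*rows)]


# Hypothetical 3-word, 4-dimension embedding matrix (values are made up).
word_vectors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.0, 0.1, 0.2],
    [0.2, 0.4, 0.2, 0.0],
]
doc_vector = average_rows(word_vectors)
print(len(doc_vector))  # 4: one averaged value per dimension
```

In numpy the same result is np.mean(X, axis=0) for a 2-D array X, and for an all-numeric pandas DataFrame df, df.mean() averages each column by default.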
Best Answer
jwpfau (Employee-RapidMiner, RM Engineering)
Hi,
I hope I've got it right this time. The "Apply Word2Vec" subprocess trains a Word2Vec model (layer size 200) on generated example documents and applies it to the two-token document "Lorem ipsum". The Aggregate operator then collapses all rows into one: it uses the default aggregation function average, selects every attribute matching the regular expression dimension.*, and has no group-by attributes, so the output is a single row. Finally, Rename by Replacing strips the average(...) wrapper that Aggregate adds to each column name (replace average\((.*)\) by $1).
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.9.000" expanded="true" height="82" name="Apply Word2Vec" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" origin="GENERATED_TUTORIAL" width="90" x="45" y="187"> <parameter key="number_of_iterations" value="5"/> <parameter key="iteration_macro" value="iteration"/> <parameter key="reuse_results" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="text" value="Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. 
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Nam liber tempor **** soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. 
Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, At accusam aliquyam diam diam dolore dolores duo eirmod eos erat, et nonumy sed tempor et et invidunt justo labore Stet clita ea et gubergren, kasd magna no rebum. sanctus sea sed takimata ut vero voluptua. est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur"/> <parameter key="add label" value="false"/> <parameter key="label_type" value="nominal"/> </operator> <connect from_op="Create Document" from_port="output" to_port="output 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> <description align="center" color="transparent" colored="false" width="126">Get a Collection of documents</description> </operator> <operator activated="true" class="loop_collection" compatibility="9.9.000" expanded="true" height="82" name="Loop Collection" origin="GENERATED_TUTORIAL" width="90" x="179" y="187"> <parameter key="set_iteration_macro" value="false"/> <parameter key="macro_name" value="iteration"/> <parameter key="macro_start_value" value="1"/> <parameter key="unfold" value="false"/> <process expanded="true"> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" origin="GENERATED_TUTORIAL" width="90" x="112" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <connect from_port="single" to_op="Tokenize" to_port="document"/> <connect from_op="Tokenize" from_port="document" to_port="output 1"/> <portSpacing port="source_single" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> <description align="center" color="transparent" colored="false" 
width="126">Tokenize</description> </operator> <operator activated="true" class="word2vec:Word2Vec_Learner" compatibility="1.0.000" expanded="true" height="68" name="Word2Vec " origin="GENERATED_TUTORIAL" width="90" x="313" y="187"> <parameter key="Minimal Vocab Frequency" value="1"/> <parameter key="Layer Size" value="200"/> <parameter key="Window Size" value="7"/> <parameter key="Use Negative Samples" value="0"/> <parameter key="Iterations" value="5"/> <parameter key="Down Sampling Rate" value="1.0E-4"/> </operator> <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document (2)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34"> <parameter key="text" value="Lorem ipsum"/> <parameter key="add label" value="false"/> <parameter key="label_type" value="nominal"/> </operator> <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize (2)" origin="GENERATED_TUTORIAL" width="90" x="179" y="34"> <parameter key="mode" value="non letters"/> <parameter key="characters" value=".:"/> <parameter key="language" value="English"/> <parameter key="max_token_length" value="3"/> </operator> <operator activated="true" class="collect" compatibility="9.9.000" expanded="true" height="82" name="Collect" origin="GENERATED_TUTORIAL" width="90" x="313" y="34"> <parameter key="unfold" value="false"/> </operator> <operator activated="true" class="word2vec:Apply_Word2Vec" compatibility="1.0.000" expanded="true" height="103" name="Apply Word2Vec (Documents) " origin="GENERATED_TUTORIAL" width="90" x="514" y="85"/> <connect from_op="Loop" from_port="output 1" to_op="Loop Collection" to_port="collection"/> <connect from_op="Loop Collection" from_port="output 1" to_op="Word2Vec " to_port="doc"/> <connect from_op="Word2Vec " from_port="mod" to_op="Apply Word2Vec (Documents) " to_port="mod"/> <connect from_op="Create Document (2)" from_port="output" to_op="Tokenize (2)" 
to_port="document"/> <connect from_op="Tokenize (2)" from_port="document" to_op="Collect" to_port="input 1"/> <connect from_op="Collect" from_port="collection" to_op="Apply Word2Vec (Documents) " to_port="doc"/> <connect from_op="Apply Word2Vec (Documents) " from_port="exa" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="aggregate" compatibility="9.9.000" expanded="true" height="82" name="Aggregate" width="90" x="179" y="34"> <parameter key="use_default_aggregation" value="true"/> <parameter key="attribute_filter_type" value="regular_expression"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="regular_expression" value="dimension.*"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="default_aggregation_function" value="average"/> <list key="aggregation_attributes"/> <parameter key="group_by_attributes" value=""/> <parameter key="count_all_combinations" value="false"/> <parameter key="only_distinct" value="false"/> <parameter key="ignore_missings" value="true"/> </operator> <operator activated="true" class="rename_by_replacing" compatibility="9.9.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="34"> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter 
key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="replace_what" value="average\((.*)\)"/> <parameter key="replace_by" value="$1"/> </operator> <connect from_op="Apply Word2Vec" from_port="out 1" to_op="Aggregate" to_port="example set input"/> <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/> <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
Greetings,
Jonas
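To make the two key operators in the process above easier to follow, here is a rough Python analogue of Aggregate (no group-by attributes, default aggregation function average, attribute filter dimension.*) followed by Rename by Replacing. It is a sketch under assumptions, not RapidMiner's implementation: the function names, the "word" and "dimension_N" column names, and the sample values are hypothetical, the filter is assumed to match whole attribute names, and Python's re.sub uses \1 where the RapidMiner parameter uses $1 for the captured group.

```python
import re
from statistics import fmean


def aggregate_average(rows: list[dict], pattern: str = r"dimension.*") -> dict:
    """Rough analogue of Aggregate with no group-by and default function 'average'.

    Only columns whose whole name matches `pattern` are averaged; the result is a
    single row whose keys use the 'average(<name>)' form.
    """
    if not rows:
        raise ValueError("need at least one row")
    selected = [name for name in rows[0] if re.fullmatch(pattern, name)]
    return {f"average({name})": fmean(row[name] for row in rows) for name in selected}


def rename_by_replacing(row: dict, replace_what: str = r"average\((.*)\)", replace_by: str = r"\1") -> dict:
    """Rough analogue of Rename by Replacing: apply a regex substitution to every column name."""
    return {re.sub(replace_what, replace_by, name): value for name, value in row.items()}


# Hypothetical Apply Word2Vec output for a two-token document with three dimensions.
rows = [
    {"word": "Lorem", "dimension_0": 0.5, "dimension_1": -1.0, "dimension_2": 2.0},
    {"word": "ipsum", "dimension_0": 1.5, "dimension_1": 1.0, "dimension_2": 0.0},
]
print(rename_by_replacing(aggregate_average(rows)))
# {'dimension_0': 1.0, 'dimension_1': 0.0, 'dimension_2': 1.0}
```

The nominal "word" column is dropped because it does not match the dimension.* filter, which is also why the process restricts Aggregate to those attributes.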
Answers
Do you mean something like this?
Greetings,
Jonas
Each dimension in the 1x200 result would be the average across the rows from the original document. In this case, each dimension would be the average of the two rows, since the document had only two tokens.
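Written out, with n token rows, d dimensions, and v_{ij} the value of dimension j for token i (symbols introduced here for illustration), the requested single row is the column-wise mean:

```latex
\bar{v}_j = \frac{1}{n} \sum_{i=1}^{n} v_{ij}, \qquad j = 1, \dots, d
```

For the two-token document, n = 2 and each entry reduces to \bar{v}_j = (v_{1j} + v_{2j}) / 2, giving a 1 x d row (d = 200 with the layer size used in the process above).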