
Exampleset : Flatten

btibert Member, University Professor Posts: 146 Guru
I have an ExampleSet that is the output of applying a Word2Vec model to a document (technically, a collection with 1 document). The result is one row per word and one column per dimension, as expected. In my case, the W2V model was created with a layer size of 100.

My question: I simply want to average all of the rows together so that I end up with a 1x100 ExampleSet. What is the best way to do that in RM 9.9? In Python, this would be a simple NumPy or pandas operation along an axis.
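
For reference, the NumPy operation alluded to here can be sketched as follows. The array is random placeholder data standing in for the Word2Vec output (one row per word, 100 columns); the names `word_vectors` and `doc_vector` and the word count of 12 are illustrative assumptions, not part of the actual process.

```python
import numpy as np

# Placeholder for the Word2Vec output: 12 words, 100 dimensions each.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(12, 100))

# Average over the rows (axis=0); keepdims=True keeps a 2-D result of shape (1, 100).
doc_vector = word_vectors.mean(axis=0, keepdims=True)

print(doc_vector.shape)
```

The question is how to get the equivalent of `mean(axis=0)` with RapidMiner operators.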


Best Answer

  • jwpfau Employee-RapidMiner, Member Posts: 303 RM Engineering
    Solution Accepted
    Hi,

    I hope I've got it right this time.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.9.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="9.9.000" expanded="true" height="82" name="Apply Word2Vec" width="90" x="45" y="34">
            <process expanded="true">
              <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" origin="GENERATED_TUTORIAL" width="90" x="45" y="187">
                <parameter key="number_of_iterations" value="5"/>
                <parameter key="iteration_macro" value="iteration"/>
                <parameter key="reuse_results" value="false"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
                    <parameter key="text" value="Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.   &#10;&#10;Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.   &#10;&#10;Nam liber tempor **** soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. 
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis.   &#10;&#10;At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, At accusam aliquyam diam diam dolore dolores duo eirmod eos erat, et nonumy sed tempor et et invidunt justo labore Stet clita ea et gubergren, kasd magna no rebum. sanctus sea sed takimata ut vero voluptua. est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur"/>
                    <parameter key="add label" value="false"/>
                    <parameter key="label_type" value="nominal"/>
                  </operator>
                  <connect from_op="Create Document" from_port="output" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <description align="center" color="transparent" colored="false" width="126">Get a Collection of documents</description>
              </operator>
              <operator activated="true" class="loop_collection" compatibility="9.9.000" expanded="true" height="82" name="Loop Collection" origin="GENERATED_TUTORIAL" width="90" x="179" y="187">
                <parameter key="set_iteration_macro" value="false"/>
                <parameter key="macro_name" value="iteration"/>
                <parameter key="macro_start_value" value="1"/>
                <parameter key="unfold" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
                    <parameter key="mode" value="non letters"/>
                    <parameter key="characters" value=".:"/>
                    <parameter key="language" value="English"/>
                    <parameter key="max_token_length" value="3"/>
                  </operator>
                  <connect from_port="single" to_op="Tokenize" to_port="document"/>
                  <connect from_op="Tokenize" from_port="document" to_port="output 1"/>
                  <portSpacing port="source_single" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <description align="center" color="transparent" colored="false" width="126">Tokenize</description>
              </operator>
              <operator activated="true" class="word2vec:Word2Vec_Learner" compatibility="1.0.000" expanded="true" height="68" name="Word2Vec " origin="GENERATED_TUTORIAL" width="90" x="313" y="187">
                <parameter key="Minimal Vocab Frequency" value="1"/>
                <parameter key="Layer Size" value="200"/>
                <parameter key="Window Size" value="7"/>
                <parameter key="Use Negative Samples" value="0"/>
                <parameter key="Iterations" value="5"/>
                <parameter key="Down Sampling Rate" value="1.0E-4"/>
              </operator>
              <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document (2)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
                <parameter key="text" value="Lorem ipsum"/>
                <parameter key="add label" value="false"/>
                <parameter key="label_type" value="nominal"/>
              </operator>
              <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize (2)" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <operator activated="true" class="collect" compatibility="9.9.000" expanded="true" height="82" name="Collect" origin="GENERATED_TUTORIAL" width="90" x="313" y="34">
                <parameter key="unfold" value="false"/>
              </operator>
              <operator activated="true" class="word2vec:Apply_Word2Vec" compatibility="1.0.000" expanded="true" height="103" name="Apply Word2Vec (Documents) " origin="GENERATED_TUTORIAL" width="90" x="514" y="85"/>
              <connect from_op="Loop" from_port="output 1" to_op="Loop Collection" to_port="collection"/>
              <connect from_op="Loop Collection" from_port="output 1" to_op="Word2Vec " to_port="doc"/>
              <connect from_op="Word2Vec " from_port="mod" to_op="Apply Word2Vec (Documents) " to_port="mod"/>
              <connect from_op="Create Document (2)" from_port="output" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_op="Collect" to_port="input 1"/>
              <connect from_op="Collect" from_port="collection" to_op="Apply Word2Vec (Documents) " to_port="doc"/>
              <connect from_op="Apply Word2Vec (Documents) " from_port="exa" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.9.000" expanded="true" height="82" name="Aggregate" width="90" x="179" y="34">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="regular_expression" value="dimension.*"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes"/>
            <parameter key="group_by_attributes" value=""/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.9.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="average\((.*)\)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <connect from_op="Apply Word2Vec" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
    Jonas
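
The Rename by Replacing step in the process above strips the `average(...)` wrapper from each attribute name produced by Aggregate, so the columns get their original names back. The same regular expression can be tried out in Python as a rough sketch; the attribute names below (`dimension_0` and so on) are assumed for illustration, and Python writes the capture-group reference as `\1` where the process uses `$1`.

```python
import re

# Assumed attribute names as they might look after Aggregate with the "average" function.
aggregated_names = [
    "average(dimension_0)",
    "average(dimension_1)",
    "average(dimension_2)",
]

# Same pattern as the replace_what parameter in the process.
pattern = re.compile(r"average\((.*)\)")

# Replace each match with the text captured inside the parentheses.
restored_names = [pattern.sub(r"\1", name) for name in aggregated_names]

print(restored_names)  # ['dimension_0', 'dimension_1', 'dimension_2']
```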

Answers

  • jwpfau Employee-RapidMiner, Member Posts: 303 RM Engineering
    Hi,

    Do you mean something like this?

    <?xml version="1.0" encoding="UTF-8"?><process version="9.9.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="9.9.000" expanded="true" height="82" name="Apply Word2Vec" width="90" x="45" y="34">
            <process expanded="true">
              <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" origin="GENERATED_TUTORIAL" width="90" x="45" y="187">
                <parameter key="number_of_iterations" value="5"/>
                <parameter key="iteration_macro" value="iteration"/>
                <parameter key="reuse_results" value="false"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
                    <parameter key="text" value="Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.   &#10;&#10;Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.   &#10;&#10;Nam liber tempor **** soluta nobis eleifend option congue nihil imperdiet doming id quod mazim placerat facer possim assum. 
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.   &#10;&#10;Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis.   &#10;&#10;At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, At accusam aliquyam diam diam dolore dolores duo eirmod eos erat, et nonumy sed tempor et et invidunt justo labore Stet clita ea et gubergren, kasd magna no rebum. sanctus sea sed takimata ut vero voluptua. est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur"/>
                    <parameter key="add label" value="false"/>
                    <parameter key="label_type" value="nominal"/>
                  </operator>
                  <connect from_op="Create Document" from_port="output" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <description align="center" color="transparent" colored="false" width="126">Get a Collection of documents</description>
              </operator>
              <operator activated="true" class="loop_collection" compatibility="9.9.000" expanded="true" height="82" name="Loop Collection" origin="GENERATED_TUTORIAL" width="90" x="179" y="187">
                <parameter key="set_iteration_macro" value="false"/>
                <parameter key="macro_name" value="iteration"/>
                <parameter key="macro_start_value" value="1"/>
                <parameter key="unfold" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
                    <parameter key="mode" value="non letters"/>
                    <parameter key="characters" value=".:"/>
                    <parameter key="language" value="English"/>
                    <parameter key="max_token_length" value="3"/>
                  </operator>
                  <connect from_port="single" to_op="Tokenize" to_port="document"/>
                  <connect from_op="Tokenize" from_port="document" to_port="output 1"/>
                  <portSpacing port="source_single" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <description align="center" color="transparent" colored="false" width="126">Tokenize</description>
              </operator>
              <operator activated="true" class="word2vec:Word2Vec_Learner" compatibility="1.0.000" expanded="true" height="68" name="Word2Vec " origin="GENERATED_TUTORIAL" width="90" x="313" y="187">
                <parameter key="Minimal Vocab Frequency" value="1"/>
                <parameter key="Layer Size" value="200"/>
                <parameter key="Window Size" value="7"/>
                <parameter key="Use Negative Samples" value="0"/>
                <parameter key="Iterations" value="5"/>
                <parameter key="Down Sampling Rate" value="1.0E-4"/>
              </operator>
              <operator activated="true" class="text:create_document" compatibility="9.3.001" expanded="true" height="68" name="Create Document (2)" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
                <parameter key="text" value="Lorem ipsum"/>
                <parameter key="add label" value="false"/>
                <parameter key="label_type" value="nominal"/>
              </operator>
              <operator activated="true" class="text:tokenize" compatibility="9.3.001" expanded="true" height="68" name="Tokenize (2)" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <operator activated="true" class="collect" compatibility="9.9.000" expanded="true" height="82" name="Collect" origin="GENERATED_TUTORIAL" width="90" x="313" y="34">
                <parameter key="unfold" value="false"/>
              </operator>
              <operator activated="true" class="word2vec:Apply_Word2Vec" compatibility="1.0.000" expanded="true" height="103" name="Apply Word2Vec (Documents) " origin="GENERATED_TUTORIAL" width="90" x="514" y="85"/>
              <connect from_op="Loop" from_port="output 1" to_op="Loop Collection" to_port="collection"/>
              <connect from_op="Loop Collection" from_port="output 1" to_op="Word2Vec " to_port="doc"/>
              <connect from_op="Word2Vec " from_port="mod" to_op="Apply Word2Vec (Documents) " to_port="mod"/>
              <connect from_op="Create Document (2)" from_port="output" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_op="Collect" to_port="input 1"/>
              <connect from_op="Collect" from_port="collection" to_op="Apply Word2Vec (Documents) " to_port="doc"/>
              <connect from_op="Apply Word2Vec (Documents) " from_port="exa" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="9.9.000" expanded="true" height="82" name="Generate Aggregation" width="90" x="179" y="34">
            <parameter key="attribute_name" value="dim_avg"/>
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attribute" value="word"/>
            <parameter key="attributes" value=""/>
            <parameter key="regular_expression" value="dimension.*"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="aggregation_function" value="average"/>
            <parameter key="concatenation_separator" value="|"/>
            <parameter key="keep_all" value="true"/>
            <parameter key="ignore_missings" value="true"/>
            <parameter key="ignore_missing_attributes" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.9.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="word|dimension_aggregated"/>
            <parameter key="regular_expression" value="(dimension|document).*"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Apply Word2Vec" from_port="out 1" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Greetings,
    Jonas


  • btibert Member, University Professor Posts: 146 Guru
    edited April 2021
    Not quite. Your Word2Vec model had a layer size of 200. In your example, I would have expected the output to be a 1-row by 200-feature ExampleSet.

    Each dimension in the 1x200 result would be the average across the rows from the original document. In this case, that is the average of each dimension across the two rows, since the document had only two tokens.
  • btibert Member, University Professor Posts: 146 Guru
    That's it!  I was playing around with Aggregate earlier to no success.  Thank you for this.
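
The difference between the two attempts in this thread appears to come down to the aggregation axis: Generate Aggregation averages across the selected attributes within each row, whereas Aggregate with no group-by attribute averages each attribute across all rows. A small NumPy sketch with made-up numbers (two tokens and four dimensions instead of 200) shows the contrast; the values and variable names are purely illustrative.

```python
import numpy as np

# Two tokens with 4 made-up dimensions each (the thread's example would have 200).
vectors = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
])

# Row-wise mean: one value per token, analogous to the first attempt.
per_token = vectors.mean(axis=1)

# Column-wise mean: one value per dimension, i.e. a single document vector.
per_dimension = vectors.mean(axis=0)

print(per_token.shape)      # (2,)
print(per_dimension.shape)  # (4,)
```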