
Flatten an Exampleset

btibert Member, University Professor Posts: 146 Guru
edited February 2021 in Help
Suppose I have a dataset that is 28x28. I want to take this 2D matrix and make it 1D with 784 attributes.

The obvious analogy here is the MNIST dataset in taking the 28x28 image to 1-row-per-image with the 784 features per image.

Admittedly I very rarely have the need to work with collections, and flows to assemble a dataset, but it's not jumping out to me as to how I can actually flatten an ExampleSet.

Context: With the image tools, we get a collection of length 3, one ExampleSet per color channel.


Best Answer

  • pschlunder Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research
    Solution Accepted
    Thanks for the details. Currently we don't have a reshape operator, though you can use "Execute Python Script" to call numpy's reshape.

    The process I've shared, though, should flatten any ExampleSet for you.

    You did not miss anything: right now there's no option to go from Tensor to ExampleSet, besides applying a model and then getting the result as an ExampleSet or a Collection of ExampleSets. Both Tensor to ExampleSet and reshape are definitely operators we'd need to add in the future, as well as visualizations for Tensors.

    Please let me know if you have any problems with the flattening process provided.
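
For anyone who does want the Python route mentioned above, here is a minimal sketch. It assumes the script operator hands the ExampleSet to a function named rm_main as a pandas DataFrame and turns the returned DataFrame back into an ExampleSet, and that every attribute is numeric. The output attribute names (att1, att2, ...) are made up for illustration.

```python
import numpy as np
import pandas as pd


def rm_main(data):
    """Flatten an ExampleSet (received as a pandas DataFrame) into one row."""
    # Assumption: all attributes are numeric, e.g. pixel intensities.
    values = data.to_numpy()
    # -1 lets numpy infer the length, so a 28x28 input becomes 1x784.
    flat = np.reshape(values, (1, -1))
    # Hypothetical attribute names; any unique names would do.
    columns = [f"att{i + 1}" for i in range(flat.shape[1])]
    return pd.DataFrame(flat, columns=columns)
```

The values come out in row-major order: all attributes of the first example, then all attributes of the second, and so on.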

Answers

  • pschlunder Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research
    edited February 2021

    I'm assuming your question is how to flatten an ExampleSet. I'll explain in a moment, but let me first mention that you might not need to flatten your data here. Assuming you're working with both the Deep Learning and Image Handling extensions, you don't necessarily need to transform the image. You can read in images as is with the "Read Image Meta Data" operator, perform transformations like normalization inside the "Pre-Process Images" operator, and then pipe the output to the Deep Learning (Tensor) operator to train ANNs directly on the given images. The tutorial process of the "Pre-Process Images" operator of the Image Handling Extension might be of interest to you.
    The "Read Image as ExampleSet" Operator is mostly there for doing manual manipulations of image data that go beyond transformations we're currently supporting with the included transformer operators (the ones going into the pre-processing operator).

    Coming back to your question: if you're targeting the training of an ANN and you're sure that you want to flatten your image, then the goal is to have one image per Example (so per row). You can achieve this by still using the "Read Image Meta Data" and "Pre-Process Images" operators to obtain a tensor of all images you want to use for training. But follow it up with a "Deep Learning (Tensor)" operator, disable the advanced option "infer input shape" and select "Convolutional flattened". Then you can manually input the dimensions, and internally the images will be flattened before being used as input.

    But if you want to manually convert a matrix into a row vector for something else, then you can do it like this (requires the merge operator from the operator toolbox):

    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="loop_examples" compatibility="9.8.001" expanded="true" height="103" name="Loop Examples" width="90" x="179" y="34">
            <parameter key="iteration_macro" value="example"/>
            <process expanded="true">
              <operator activated="true" class="filter_example_range" compatibility="9.8.001" expanded="true" height="82" name="Filter Example Range" width="90" x="112" y="34">
                <parameter key="first_example" value="%{example}"/>
                <parameter key="last_example" value="%{example}"/>
                <parameter key="invert_filter" value="false"/>
              </operator>
              <connect from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="operator_toolbox:merge" compatibility="2.9.000" expanded="true" height="82" name="Merge Attributes" width="90" x="380" y="34">
            <parameter key="handling_of_duplicate_attributes" value="rename"/>
            <parameter key="handling_of_special_attributes" value="keep_first_special_other_regular"/>
            <parameter key="handling_of_duplicate_annotations" value="rename"/>
          </operator>
          <connect from_port="input 1" to_op="Loop Examples" to_port="example set"/>
          <connect from_op="Loop Examples" from_port="output 1" to_op="Merge Attributes" to_port="example set 1"/>
          <connect from_op="Merge Attributes" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
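
For readers who find it easier to follow the logic in code, here is a rough pandas analogue of what the process above does: split the ExampleSet into single-row pieces, then merge those pieces side by side. This is only a sketch of the idea, not the operators' actual implementation. The function name flatten_by_merging and the "_1", "_2" suffixes are assumptions for illustration; the names that Merge Attributes produces when it renames duplicates may look different.

```python
import pandas as pd


def flatten_by_merging(example_set: pd.DataFrame) -> pd.DataFrame:
    """Turn an n x m table into a 1 x (n*m) table, row by row."""
    single_rows = []
    for i in range(len(example_set)):
        # Like Filter Example Range with first_example == last_example.
        row = example_set.iloc[[i]].reset_index(drop=True)
        # Stand-in for the "rename" handling of duplicate attributes
        # (the suffix scheme here is an assumption).
        row = row.add_suffix(f"_{i + 1}")
        single_rows.append(row)
    # Like Merge Attributes: join the single-row tables column-wise.
    # Assumes the input has at least one row.
    return pd.concat(single_rows, axis=1)


demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
flat = flatten_by_merging(demo)
print(list(flat.columns))    # ['a_1', 'b_1', 'a_2', 'b_2']
print(flat.iloc[0].tolist()) # [1, 3, 2, 4]
```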

    Hope this helps,
    Philipp
  • btibert Member, University Professor Posts: 146 Guru
    edited February 2021
    Thanks.  I do want to flatten it, but this is not for a deep learning application admittedly.  My need is to help build the intuition of PCA, which is why I am aiming to have this as an ExampleSet.  I see the ability to go from ExampleSet to Tensor, but the inverse does not seem to exist, though I could be missing something obvious.

    And perhaps to ask my question differently: is it possible to change the shape of an ExampleSet? For example, we could take the 28x28 and make it 1x784, or take that same 1x784 and reshape it to 28x28. I am thinking about numpy.reshape.
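
For reference, the numpy behaviour being described looks roughly like this. It is a small sketch with made-up data (a 28x28 array filled with the numbers 0 to 783), not an actual image:

```python
import numpy as np

# Made-up stand-in for one 28x28 image.
image = np.arange(28 * 28).reshape(28, 28)

# 28x28 -> 1x784: rows are laid end to end (row-major order).
row = image.reshape(1, 784)

# 1x784 -> 28x28: the same ordering is used to rebuild the matrix.
restored = row.reshape(28, 28)

# The round trip changes the shape but loses nothing.
assert np.array_equal(image, restored)
```

Because the ordering is row-major, position 28 in the flattened row holds the first value of the second image row.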
  • btibert Member, University Professor Posts: 146 Guru
    Thanks, I will give it a spin.  For additional context, this is an MBA course, where we intentionally do not write code.  I love RapidMiner for this reason, as I often describe what we do as visual programming with no-code tools.  As such, I don't want to introduce Python for this task but will review your process above.  Thanks again.
  • pschlunder Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research
    Great, let me know if there are any more problems :)