
Preprocessing: Select attributes from the knowledge of another table

MoWei Member Posts: 18 Maven
edited November 2018 in Help

Hello everybody,

I have only recently started working with RapidMiner and am quite new to the field of machine learning. For a project at my university I'm mainly concerned with the preprocessing of data.

At the moment I am wondering whether it is possible to link two tables so that one table can take "knowledge" from the other. I don't want to join them; afterwards I would still like to have two separate tables.

A concrete example: I load a large dataset (from now on called dataset 1) that contains numerous attributes and examples. I also load a table that lists all attributes available in dataset 1 (from now on called the process value table). For the analysis of a certain question (from now on called analysis 1), however, I usually need only a small number of these attributes. Each row of the process value table corresponds to one attribute of dataset 1, and there is a column called "Required for analysis 1". If the value in this column is "Yes" for an attribute, that attribute is needed for analysis 1. Now I would like to build my process so that all attributes marked "Yes" in the column "Required for analysis 1" of the process value table are selected from dataset 1 and returned, so that I can then start the analysis with the selected data.

I tried to illustrate this with some pictures, but I can't upload pictures at the moment. The system says I have to spend more time here first :(

Does anyone have an idea how to implement this in RapidMiner? Maybe via a workaround, such as transforming the process value table, or something else?

I would be very grateful if someone can help me.

Many thanks in advance

Best regards

Moritz

Best Answer

  • MoWei Member Posts: 18 Maven
    edited November 2018 Solution Accepted
    Hey Jan,

    many thanks for your answer.

    It took me a long time to work out how to import your XML code into RM and see what is really happening in the process, since I had never done it before. First I tried to read the XML and recreate the process in RM by hand, but I made a mistake somewhere :D Now I know how to import someone else's XML code into RM, thx. That should help me with my next problems :D

    Your process does almost exactly what I wanted. Thank you very much! One assumption is not quite right, though: the two tables do not have the same attribute names. In the table that says "which attribute to use", the attribute names of the data set are listed one below the other in the rows. So in this table "Use for analysis 1?" is an attribute, and the actual attributes of the data set are the "examples". I think I can fix this with the "Transpose" operator, don't you? Or do you have a better idea? Sry, I hope you understand what I mean; my English is not the best. I wanted to illustrate it with pictures, but as you have read, uploading does not work at the moment.

    My next step is to understand exactly what your process does and what each operator does in detail. My plan is to add more columns, e.g. "Use for analysis 1", "Use for analysis 2" and so on. Later I want to use concrete values (e.g. min or max values) from the "which attribute to use" table, which should then hold more information than "Yes, I need them" or "No, I don't need them".

    All in all, thank you very much. It would be awesome if you could help me on the way to reaching my goal :)

    Best regards

    Moritz





Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    I am a bit confused, but it sounds like you want to do a Join operation first and then filter?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MoWei Member Posts: 18 Maven

    Hi Martin,

    thanks for your answer.

    No, I don't want to join them. Two separate tables should remain. I only want to select the attributes from the data set (table 1) that are marked with "Yes, I need them" in the process value table (table 2). So I want to use the "knowledge" from the process value table to work with the data set, without merging the two.

    Do you know how long I have to be a member here before I can upload pictures?

    Thank you

    Greetings

    Moritz

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    @MoWei sorry about the upload pic problem. It's a new community site and I'm trying to work out all the small kinks. It should be fixed soon. Scott
  • jczogalla Employee-RapidMiner, Member Posts: 144 RM Engineering
    Hi @MoWei!
    If I understood you correctly, you can do something similar to the following sample process:
    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="9.0.003" expanded="true" height="68" name="Generate Data" width="90" x="45" y="187">
            <description align="center" color="transparent" colored="false" width="126">Your data with all attributes</description>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="9.0.003" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
            <list key="attribute_values">
              <parameter key="att1" value="&quot;no&quot;"/>
              <parameter key="att2" value="&quot;yes&quot;"/>
              <parameter key="att3" value="&quot;yes&quot;"/>
              <parameter key="att4" value="&quot;no&quot;"/>
              <parameter key="att5" value="&quot;yes&quot;"/>
            </list>
            <list key="set_additional_roles"/>
            <description align="center" color="transparent" colored="false" width="126">Your information which attributes to use</description>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.0.003" expanded="true" height="103" name="Loop Attributes" width="90" x="313" y="34">
            <parameter key="reuse_results" value="true"/>
            <process expanded="true">
              <operator activated="true" class="extract_macro" compatibility="9.0.003" expanded="true" height="68" name="Extract Macro" width="90" x="112" y="34">
                <parameter key="macro" value="att_in_use"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="%{loop_attribute}"/>
                <parameter key="example_index" value="1"/>
                <list key="additional_macros"/>
                <description align="center" color="transparent" colored="false" width="126">Get the value of the attribute information</description>
              </operator>
              <operator activated="true" class="branch" compatibility="9.0.003" expanded="true" height="103" name="Branch" width="90" x="380" y="34">
                <parameter key="condition_type" value="expression"/>
                <parameter key="condition_value" value="#{loop_attribute}==&quot;no&quot;"/>
                <parameter key="expression" value="%{att_in_use}==&quot;no&quot;"/>
                <process expanded="true">
                  <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="112" y="85">
                    <parameter key="attribute_filter_type" value="single"/>
                    <parameter key="attribute" value="%{loop_attribute}"/>
                    <parameter key="invert_selection" value="true"/>
                    <description align="center" color="transparent" colored="false" width="126">Remove current attribute &lt;br/&gt;(single selection + invert)</description>
                  </operator>
                  <connect from_port="condition" to_port="input 1"/>
                  <connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
                  <connect from_op="Select Attributes" from_port="example set output" to_port="input 2"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <portSpacing port="sink_input 3" spacing="0"/>
                </process>
                <process expanded="true">
                  <connect from_port="condition" to_port="input 1"/>
                  <connect from_port="input 1" to_port="input 2"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                  <portSpacing port="sink_input 3" spacing="0"/>
                  <description align="center" color="transparent" colored="true" height="104" resized="true" width="233" x="100" y="22">Output = Input, nothing to do</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">If the information says &amp;quot;no&amp;quot;, remove that attribute from your data</description>
              </operator>
              <connect from_port="input 1" to_op="Extract Macro" to_port="example set"/>
              <connect from_port="input 2" to_op="Branch" to_port="input 1"/>
              <connect from_op="Extract Macro" from_port="example set" to_op="Branch" to_port="condition"/>
              <connect from_op="Branch" from_port="input 1" to_port="output 1"/>
              <connect from_op="Branch" from_port="input 2" to_port="output 2"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="source_input 3" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Loop attributes of your information table</description>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Loop Attributes" to_port="input 2"/>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 2" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    It is my understanding that both tables have the same attribute names, correct? One table has the data, the other has one (or more) examples that say if that attribute is needed. You can extract that value as a macro and branch on that macro value. If the value says "no", just remove this attribute from the dataset (this does not change the original dataset). At the end you get a table with all needed attributes.
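
    For readers who find the process XML hard to follow, the same logic can be sketched outside RapidMiner. This is only an illustration in plain Python, not RapidMiner code: the two dictionaries are made-up stand-ins for the two example sets, and `drop_unused` is a hypothetical helper name. The comments map each step to the operator that plays the same role in the sample process.

```python
# Hypothetical stand-in for "Generate Data": the data with all attributes.
data = {
    "att1": [1.0, 2.0],
    "att2": [3.0, 4.0],
    "att3": [5.0, 6.0],
    "att4": [7.0, 8.0],
    "att5": [9.0, 10.0],
}

# Stand-in for the single-row information table of the sample process.
# It uses the same attribute names as the data and holds "yes" or "no".
use_info = {"att1": "no", "att2": "yes", "att3": "yes", "att4": "no", "att5": "yes"}


def drop_unused(data, use_info):
    """Return a copy of `data` without the attributes flagged "no"."""
    selected = dict(data)  # work on a copy; the original data stays unchanged
    for attribute, flag in use_info.items():  # role of Loop Attributes
        # Role of Extract Macro + Branch: test the flag of this attribute.
        if flag == "no":
            # Role of Select Attributes (single selection + invert).
            selected.pop(attribute, None)
    return selected


selected = drop_unused(data, use_info)
print(list(selected))
```

    As in the process, the original data is not modified; only the returned copy has the unneeded attributes removed.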

    I hope this helps! Cheers
    Jan
  • MoWei Member Posts: 18 Maven
    edited November 2018
    Hi @jczogalla,

    thank you for your answer. It took me a long time to work out how to import your XML code into RM, since I had never done it before. First I tried to read the XML and recreate your process by hand, but I made a mistake somewhere. Now I know how to do it, which should help me with my next problems. :D

    Your process does almost exactly what I wanted. Thank you very much! Only the assumption that both tables have the same attribute names is not completely correct. In the table "which attributes to use", the attributes of the data set are listed below each other in the rows, and "Use for analysis 1" is a column. That means in this table the attributes of the data set are the "examples" and "Use for analysis 1" is the attribute. But I think I can fix this with the "Transpose" operator, don't you? Or do you have a better idea? Sry, I hope you understand what I mean; my English is not the best. I wanted to illustrate it all with pictures, but as you can read, it is not possible to upload pictures at the moment.
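
    The row-oriented layout described above (one row per attribute, one flag column per analysis) can be sketched in the same illustrative way. This is a minimal plain-Python sketch under assumptions, not RapidMiner code: the key name "attribute", the sample values, and the helper names `attributes_for` and `select_attributes` are all hypothetical.

```python
# Hypothetical row-oriented "which attributes to use" table:
# one row per attribute of the data set, one flag column per analysis.
# The key "attribute" is an assumed name for the column holding attribute names.
process_values = [
    {"attribute": "att1", "Use for analysis 1": "No", "Use for analysis 2": "Yes"},
    {"attribute": "att2", "Use for analysis 1": "Yes", "Use for analysis 2": "No"},
    {"attribute": "att3", "Use for analysis 1": "Yes", "Use for analysis 2": "Yes"},
]

# Made-up stand-in for the data set, stored column by column.
dataset_1 = {"att1": [1, 2], "att2": [3, 4], "att3": [5, 6]}


def attributes_for(table, flag_column):
    """List the attribute names whose flag in `flag_column` is "yes"."""
    return [
        row["attribute"]
        for row in table
        if row[flag_column].strip().lower() == "yes"
    ]


def select_attributes(dataset, names):
    """Return a new column mapping with only the named attributes."""
    return {name: dataset[name] for name in names if name in dataset}


analysis_1 = select_attributes(dataset_1, attributes_for(process_values, "Use for analysis 1"))
analysis_2 = select_attributes(dataset_1, attributes_for(process_values, "Use for analysis 2"))
print(list(analysis_1), list(analysis_2))
```

    Reading the names straight from the rows avoids transposing the table first, and adding another "Use for analysis N" column only changes the `flag_column` argument. Both tables remain separate throughout.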

    My next step is to understand exactly what your process does and what each operator does in detail. As you said, I have more attributes in the "which attributes to use" table, e.g. "Use for analysis 2?", "Use for analysis 3?" and so on. In the end I want to use concrete values (e.g. min or max values) from the "which attributes to use" table to analyse my data. Hopefully it will work.

    All in all, thank you very much. It would be awesome if you could help me with the next problems and with reaching my goals.

    Best regards

    Moritz
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Select by Weights will help you to do what you are asking about. You give it an exampleset as an input along with a set of attribute weights (which you can easily create in RapidMiner as another exampleset and then Set Role) and it will return only the attributes that have a weight in the range you specify.
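
    The idea behind selecting by weights can be illustrated with a small plain-Python sketch. It only approximates the concept and is not the operator's implementation: the mapping of "yes"/"no" to 1.0/0.0, the `threshold` parameter, and the function names are assumptions made for this example.

```python
# Assumed encoding of the flags as numeric weights (not prescribed by RapidMiner).
FLAG_TO_WEIGHT = {"yes": 1.0, "no": 0.0}


def flags_to_weights(flags):
    """Turn a {attribute: "yes"/"no"} mapping into {attribute: weight}."""
    return {name: FLAG_TO_WEIGHT[flag.strip().lower()] for name, flag in flags.items()}


def select_by_weights(dataset, weights, threshold=1.0):
    """Keep only attributes whose weight is at least `threshold`.

    Attributes without a weight are treated as weight 0.0 (an assumption).
    """
    return {
        name: values
        for name, values in dataset.items()
        if weights.get(name, 0.0) >= threshold
    }


# Made-up example data and flags.
dataset = {"att1": [1, 2], "att2": [3, 4], "att3": [5, 6]}
flags = {"att1": "No", "att2": "Yes", "att3": "Yes"}

weights = flags_to_weights(flags)
kept = select_by_weights(dataset, weights)
print(list(kept))
```

    Numeric weights also leave room for more than a yes/no decision later, for example by keeping attributes above some other threshold.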

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts