Preprocessing: Select attributes using the knowledge of another table
Hello everybody,
I have only recently started working with RapidMiner and am quite new to the field of machine learning. For a project at my university I'm mainly concerned with the preprocessing of data.
At the moment I am wondering whether it is possible to link two tables so that one table can take "knowledge" from the other. I don't want to join them; afterwards I would still like to have two separate tables.
A concrete example: I load a large dataset (from now on called dataset 1) that contains numerous attributes and examples. I also load a table that lists all attributes available in dataset 1 (from now on called the process value table). For the analysis of a certain question (from now on called analysis 1), however, I usually need only a small number of those attributes. In the process value table, each row corresponds to one attribute of dataset 1, and there is a column called "Required for analysis 1". If the value in this column is "Yes" for an attribute, that attribute is needed for analysis 1.
Now I would like to build my process so that all attributes marked "Yes" in the column "Required for analysis 1" of the process value table are selected from dataset 1 and returned, so that I can then start the analysis with the selected data.
I tried to illustrate this with some pictures, but I can't upload pictures at the moment. The system says I have to spend more time here first.
Does anyone have an idea how to implement this in RapidMiner? Maybe via some detour, such as transforming the process value table, or something else?
I would be very grateful if someone could help me.
Many thanks in advance
Best regards
Moritz
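The selection described in the question can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration of the logic, not a RapidMiner process: the sample values, the attribute names and the column name "Attribute" (the column holding the attribute names) are assumptions, since the post does not show the actual tables. The two tables stay separate; the process value table is only read to decide which attributes to keep.

```python
# Hypothetical stand-in for "dataset 1": each dict is one example (row).
dataset_1 = [
    {"temperature": 71.2, "pressure": 1.8, "flow": 12.0, "operator_id": "A"},
    {"temperature": 69.5, "pressure": 2.1, "flow": 11.4, "operator_id": "B"},
]

# Hypothetical stand-in for the process value table: one row per attribute
# of dataset 1, plus a flag column for the analysis. The column name
# "Attribute" is an assumption made for this sketch.
process_value_table = [
    {"Attribute": "temperature", "Required for analysis 1": "Yes"},
    {"Attribute": "pressure", "Required for analysis 1": "No"},
    {"Attribute": "flow", "Required for analysis 1": "Yes"},
    {"Attribute": "operator_id", "Required for analysis 1": "No"},
]


def required_attributes(lookup_rows, flag_column):
    """Return the attribute names whose flag column is set to "Yes"."""
    return [row["Attribute"] for row in lookup_rows if row[flag_column] == "Yes"]


def select_attributes(examples, attribute_names):
    """Build new examples holding only the named attributes.

    The input examples are left untouched, so both tables remain as they were.
    """
    return [{name: example[name] for name in attribute_names} for example in examples]


needed = required_attributes(process_value_table, "Required for analysis 1")
analysis_1_data = select_attributes(dataset_1, needed)
print(needed)
print(analysis_1_data)
```

Nothing is joined here: the flag column only produces a list of names, and that list is then used as the attribute filter on dataset 1.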
Best Answer
MoWei Member Posts: 18 Maven
Hey Jan,
many thanks for your answer.
It took me a long time to figure out how to import your XML code into RM and see what is really happening in the process, since I had never done that before. At first I tried to understand the XML code and recreate the process in RM myself, but I made a mistake somewhere. Now I know how to import someone else's XML code into RM, thanks. That should help me with my next problems.
The process you built does almost exactly what I wanted. Thank you very much! One assumption is not quite correct, though: the two tables do not have the same attribute names. In the table that says "which attribute to use", the actual attribute names of the data set are listed one below the other in the rows. So in this table "Use for analysis 1?" is an attribute, and the actual attributes from the data set are the "examples". But I think I can use the "Transpose" operator to fix this, don't you? Or do you have a better idea? Sorry, I hope you understand what I mean; my English is not the best. Normally I would have illustrated this with pictures, but as you could read, that does not work at the moment.
The next step is to understand exactly what you are doing in your process and what each operator does in detail. My plan is to have more columns, e.g. "Use for analysis 1", "Use for analysis 2" and so on. Later I want to use concrete values (e.g. min or max values) from the "which attribute to use" table, which should eventually hold more information than "Yes, I need them" or "No, I don't need them".
All in all, thank you very much. It would be awesome if you could help me on the way to reaching my goal.
Best regards
Moritz
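The plan in the post above (several "Use for analysis N" columns, later also min or max values) could look roughly like this in plain Python. It is a sketch under assumptions, not a RapidMiner process: the column names "Attribute", "Min" and "Max", the sample values, and the idea that the limits are used to drop examples outside the allowed range are all invented for illustration, since the post does not say how the limits should be applied.

```python
# Hypothetical lookup table: one row per attribute, one flag column per
# analysis, plus optional "Min"/"Max" limits (None means "no limit").
lookup = [
    {"Attribute": "temperature", "Use for analysis 1": "Yes",
     "Use for analysis 2": "No", "Min": 60.0, "Max": 80.0},
    {"Attribute": "pressure", "Use for analysis 1": "No",
     "Use for analysis 2": "Yes", "Min": None, "Max": 2.0},
    {"Attribute": "flow", "Use for analysis 1": "Yes",
     "Use for analysis 2": "Yes", "Min": None, "Max": None},
]

# Hypothetical data set: each dict is one example.
examples = [
    {"temperature": 71.2, "pressure": 1.8, "flow": 12.0},
    {"temperature": 95.0, "pressure": 2.1, "flow": 11.4},
    {"temperature": 65.0, "pressure": 2.5, "flow": 10.9},
]


def within_limits(value, lower, upper):
    """Return True if value respects every limit that is set (None = unbounded)."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def prepare(examples, lookup, flag_column):
    """Keep the attributes flagged "Yes" and drop examples that violate a limit."""
    rules = [row for row in lookup if row[flag_column] == "Yes"]
    result = []
    for example in examples:
        in_range = all(
            within_limits(example[rule["Attribute"]], rule["Min"], rule["Max"])
            for rule in rules
        )
        if in_range:
            result.append({rule["Attribute"]: example[rule["Attribute"]] for rule in rules})
    return result


analysis_1 = prepare(examples, lookup, "Use for analysis 1")
analysis_2 = prepare(examples, lookup, "Use for analysis 2")
print(analysis_1)
print(analysis_2)
```

Passing the flag column as a parameter means one lookup table can drive any number of analyses, and adding a new analysis only requires adding a new column.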
Answers
I am a bit confused, but it sounds like you want to do a Join operation first and then filter?
Best,
Martin
Dortmund, Germany
Hi Martin,
thanks for your answer.
No, I don't want to join them. The two tables should remain separate. I only want to select the attributes from the data set (table 1) that are marked with "Yes, I need them" in the process value table (table 2). So I want to use the "knowledge" from the process value table to work with the data set, without merging the two.
Do you know how long I have to be a member here before I can upload pictures?
Thank you
Greetings
Moritz
Thank you for your answer. It took me a long time to figure out how to import your XML code into RM, since I had never done it before. At first I tried to understand the XML code and recreate your process, but I made a mistake somewhere. Now I know how to do it, which should help me with my next problems.
The process you built does almost exactly what I wanted. Thank you very much! Only the assumption that both tables have the same attribute names is not completely correct. In the "which attributes to use" table, the attributes of the data set are listed below each other in the rows, and "Use for analysis 1" is a column. That means in this table the attributes of the data set are the "examples" and "Use for analysis 1" is the attribute. But I think I can fix this with the "Transpose" operator, don't you? Or do you have a better idea? Sorry, I hope you understand what I mean; my English is not the best. I wanted to illustrate it all with pictures, but as you can read, it is not possible to upload pictures at the moment.
All in all, thank you very much. It would be awesome if you could help me with the next problems and with reaching my goals.
Best regards
Moritz
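The transposition idea mentioned in the post above can be illustrated in plain Python. This is a sketch of what swapping rows and columns does to a row-oriented "which attributes to use" table; it makes no claim about the exact output of RapidMiner's Transpose operator. The header name "Attribute" and the sample attribute names are assumptions.

```python
# Hypothetical row-oriented lookup table: a header row, then one row per
# attribute of the data set (the attributes appear as "examples").
lookup_rows = [
    ["Attribute", "Use for analysis 1"],
    ["temperature", "Yes"],
    ["pressure", "No"],
    ["flow", "Yes"],
]

# zip(*rows) swaps rows and columns, i.e. it transposes the table.
transposed = [list(column) for column in zip(*lookup_rows)]

# After transposing, the first row holds the attribute names and the
# second row holds the matching flags.
names_row, flags_row = transposed

# Drop the leading header cells and pair each attribute name with its flag.
wide = dict(zip(names_row[1:], flags_row[1:]))

# The attribute names flagged "Yes" can now be used as a column filter.
selected = [name for name, flag in wide.items() if flag == "Yes"]
print(transposed)
print(selected)
```

After the transposition the attribute names sit in one row, matching the layout in which both tables share the same attribute names.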