Simple preprocessing methods
Hello there,
I'm looking for simple preprocessing methods.
Maybe I'm just blind but I can't find anything that matches my criteria.
1. A simple recoding.
For example, produce an attribute B out of an existing attribute A (containing values from 1 to 5) by the following rules:
A: 1,2 --> B: 1
A: 3,4 --> B: 2
A: 5 --> B: 3
2. A simple condition-knot.
For example produce an attribute B out of an existing attribute A (containing ages of humans) like this:
A: 1-18 / to 18 --> B: 1 or "young"
A: 19-40 --> B: 2 or "midage"
A: from 41 --> B: 3 or "old"
Thank you in advance.
Greets from the baltic sea,
Sebastian L.
Answers
There is a simple operator called UserBasedDiscretization that does exactly what you are looking for. To solve your second problem, you might edit the list as follows:
First line is called young and its upper limit is 18. So the interval will be negative infinity to 18
Second line is called midage and its upper limit is 40. The interval will be >= 18 and < 40.
This would look like this in XML:
To solve the first problem, you could use UserBasedDiscretization together with another operator called NominalNumbers2Numerical. I think you can see where this leads: just enter a number like "1" or "2" as the new value, and then use this operator to change the attribute into a numerical one, if you need it numerical.
If you need to apply changes to only one or a few of the attributes, use AttributeSubsetPreprocessing to select the attributes the inner operators should work on.
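To make the intended result concrete outside of RapidMiner, here is a minimal Python sketch of both mappings. It is only an illustration, not how the operators are implemented. The function names are made up, and the sketch assumes that each upper limit belongs to its own class (18 is still "young", 40 is still "midage"), as asked for in the original question.

```python
from bisect import bisect_left

# Question 1: fixed recoding table, values taken from the original post.
RECODE = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}

# Question 2: one upper limit per class; the last class is open-ended.
# Assumption: a value equal to an upper limit falls into that class.
UPPER_LIMITS = [18, 40]
CLASS_NAMES = ["young", "midage", "old"]


def recode(a):
    """Map a value of attribute A (1..5) to attribute B (1..3)."""
    return RECODE[a]


def age_class(age):
    """Return the class name whose upper limit is the first one >= age."""
    return CLASS_NAMES[bisect_left(UPPER_LIMITS, age)]


print([recode(a) for a in [1, 2, 3, 4, 5]])
# [1, 1, 2, 2, 3]
print([age_class(a) for a in [5, 18, 19, 40, 41, 73]])
# ['young', 'young', 'midage', 'midage', 'old', 'old']
```

Replacing the class names with "1", "2" and "3" gives the numeric variant of the second mapping.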
Hope I could help,
Greetings Sebastian
Unfortunately it's not working as it should.
When I use the Nominal2Numeric operator, the values are changed completely.
A "48", for example, is changed to a "1" (not the recoding I described).
The problem is that without this operator the recoding isn't applied to this value.
A second problem occurred: how can I delete rows with missing values? Incomplete ones, in other words.
You told me to use AttributeSubsetPreprocessing to select the attributes I need to process changes on.
But this operator is only able to select one attribute, isn't it?
Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
I need to define by name the attributes that the following operators are applied to.
Thx for any help.
Hm, that's three questions in a single posting ... so, here we go ... You might have missed that (the other) Sebastian recommended the NominalNumbers2Numerical operator, not the Nominal2Numeric operator! This should work as expected.
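The difference between the two operators can be sketched in a few lines of Python. This is an illustration under an assumption, not a description of RapidMiner's code: it assumes that one operator replaces each nominal value by an internal index (which would explain a "48" turning into a "1"), while the other parses the label text as a number. The sample labels are made up.

```python
# Nominal labels that happen to look like numbers (made-up sample data).
labels = ["12", "48", "12", "7"]

# Index coding: every distinct label gets the next free integer,
# in order of first appearance. The numeric meaning of the text is lost.
index_of = {}
for label in labels:
    index_of.setdefault(label, len(index_of))
index_coded = [index_of[label] for label in labels]

# Parsing: the label text itself is read as a number, so "48" stays 48.
parsed = [float(label) for label in labels]

print(index_coded)  # [0, 1, 0, 2]
print(parsed)       # [12.0, 48.0, 12.0, 7.0]
```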
Rows with missing values can be removed by filtering the example set with the operator called ExampleFilter and setting the condition_class parameter to "no_missing_attributes".
The "attribute_name_regex" parameter does indeed allow regular expressions to define the attributes. Hence, the operators inside AttributeSubsetPreprocessing are applied to all attributes matching the regular expression. If you want to apply the inner operators to, say, two attributes called age and weight, the regular expression that selects these attributes is age|weight .
You may find additional information on regular expressions in the RapidMiner tutorial, which is available in the documentation area of our website:
http://rapid-i.com/content/view/36/83/lang,de/
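As a rough illustration of these two steps outside of RapidMiner, the following Python sketch drops incomplete rows and selects attributes by regular expression. The sample rows are invented, and the sketch assumes that the expression has to match the whole attribute name.

```python
import re

# Made-up example set; None marks a missing value.
rows = [
    {"ID": 1, "age": 34, "weight": 70.5, "regio": "north"},
    {"ID": 2, "age": None, "weight": 82.0, "regio": "south"},
    {"ID": 3, "age": 51, "weight": 64.2, "regio": None},
]

# Keep only rows in which no attribute is missing.
complete = [row for row in rows if all(v is not None for v in row.values())]

# Select the attributes whose full name matches the expression.
pattern = re.compile(r"age|weight")
selected = [name for name in rows[0] if pattern.fullmatch(name)]

print([row["ID"] for row in complete])  # [1]
print(selected)                         # ['age', 'weight']
```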
Hope that helps to solve your problems,
regards,
Tobias
You helped me a lot. It's working very well now.
But there's another question, about the regular expressions you mentioned.
I'm not familiar with regular expressions. You gave an example with a " | " to combine two attributes.
The tutorial is a bit "meager" in this case. Is there a good explanation of all the expressions (wildcards, ...)?
Here my case:
I have 56 attributes (e.g. ID, age, regio, ANT_U30, ANT_U35, ... , P_Expert, P_Vkude,...).
Now I want to select all attributes beginning with "ANT_", because there are twelve of them and I don't want to write them all down separately.
In short, a shortcut for "ANT_U20|ANT_U25|ANT_U30|ANT_U35|...".
Thx in advance.
Sebastian
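The pattern you are looking for is: ANT_.*
where . stands for any single character and * means "repeated zero or more times", so .* matches any sequence of characters. (Note that ANT_* on its own would only mean "ANT" followed by any number of underscores.)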
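A quick way to check such a pattern is to try it on a few attribute names. The Python sketch below does that with the names from the question. It assumes the expression has to match the whole attribute name, and the helper name select is made up. For patterns this simple, Python and Java regular expressions behave the same.

```python
import re

# Attribute names taken from the question above.
names = ["ID", "age", "regio", "ANT_U30", "ANT_U35", "P_Expert", "P_Vkude"]


def select(pattern, names):
    """Return the names that the pattern matches completely."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


# "." is any character, "*" repeats it: everything that starts with ANT_.
print(select(r"ANT_.*", names))  # ['ANT_U30', 'ANT_U35']

# Without the dot, "*" only repeats the underscore, so nothing matches.
print(select(r"ANT_*", names))   # []
```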
To learn more about regular expressions:
basic concepts: http://en.wikipedia.org/wiki/Regular_expression
tutorial for regular expressions in Java: http://www.javaregex.com/tutorial.html (weird design, but the tutorial is nice)
hope this was helpful
greetings
Steffen