Simple preprocessing methods

Contemno · June 2008

Hello there,

I'm looking for simple preprocessing methods.
Maybe I'm just blind but I can't find anything that matches my criteria.

1. A simple recoding.
For example, produce an attribute B out of an existing attribute A (containing values from 1 to 5) by the following rules:
A: 1,2 --> B: 1
A: 3,4 --> B: 2
A: 5 --> B: 3

2. A simple condition operator.
For example produce an attribute B out of an existing attribute A (containing ages of humans) like this:
A: 1-18 / to 18 --> B: 1 or "young"
A: 19-40 --> B: 2 or "midage"
A: from 41 --> B: 3 or "old"

Thank you in advance.
Greetings from the Baltic Sea,
Sebastian L.

land · June 2008

Hi Sebastian,
there is a simple operator called UserBasedDiscretization which does exactly what you are searching for. To solve your second problem you might edit the list as follows:
First line is called young and its upper limit is 18. So the interval will be negative infinity to 18
Second line is called midage and its upper limit is 40. The interval will be >= 18 and < 40.

This would look like this in XML:

          <parameter key="young" value="18.0"/>
          <parameter key="midage" value="40.0"/>
          <parameter key="old" value="2000.0"/>
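As an illustration outside RapidMiner, the idea of defining classes by their upper limits can be sketched in a few lines of Python. This is only a sketch: the function name discretize is made up, and it assumes that each upper limit is inclusive (so 18 is still "young"), which should be checked against the operator's documentation.

```python
from bisect import bisect_left

# Upper limits and labels mirror the parameter list above; the last limit
# is simply a value larger than any expected age.
UPPER_LIMITS = [18.0, 40.0, 2000.0]
LABELS = ["young", "midage", "old"]


def discretize(age):
    """Return the label of the first class whose upper limit is >= age.

    Assumption: upper limits are inclusive.
    """
    index = bisect_left(UPPER_LIMITS, age)
    if index == len(LABELS):
        raise ValueError(f"{age} exceeds the largest upper limit")
    return LABELS[index]


ages = [5, 18, 19, 40, 41, 87]
print([discretize(a) for a in ages])
```

Because the lower limit of each class is the upper limit of the previous one, only one number per class has to be listed.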

To solve the first problem you could use UserBasedDiscretization together with another operator called NominalNumbers2Numerical. I think you can see where this leads:

Just enter a number like "1" or "2" as the new value, and then use this operator to change that attribute into a numerical one if you need it to be numerical. If you need to apply changes to only one or a few of the attributes, use AttributeSubsetPreprocessing to select the attributes the inner operators should work on.
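The two-step idea for the first problem (assign class labels that look like numbers, then read those labels as numbers) might be sketched in Python like this. The names RECODE, recode and nominal_numbers_to_numerical are hypothetical and are not RapidMiner calls; the second step assumes, as the operator name suggests, that a label such as "2" is parsed as the number 2.

```python
# Recoding rules from the first question: A in {1, 2} -> 1, {3, 4} -> 2, {5} -> 3.
# The targets are strings on purpose: after discretization they are nominal labels.
RECODE = {1: "1", 2: "1", 3: "2", 4: "2", 5: "3"}


def recode(values):
    """Map each value of A to its nominal class label (a string)."""
    return [RECODE[v] for v in values]


def nominal_numbers_to_numerical(labels):
    """Parse each label as the number it spells out, e.g. "2" becomes 2."""
    return [int(label) for label in labels]


a = [1, 2, 3, 4, 5, 3]
b_nominal = recode(a)
b = nominal_numbers_to_numerical(b_nominal)
print(b_nominal)
print(b)
```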

Hope I could help,
Greetings Sebastian

Contemno · June 2008

Thx for your answer.
Unfortunately it's not working as it should be.

When I use the operator Nominal2Numeric, the values are changed completely.
A "48" may be changed to a "1", for example (not the recoding mentioned above).

The problem is that without this operator the recoding isn't applied to this value.

A second problem occurred: how can I delete rows with missing values (incomplete ones, in other words)?

You told me to use AttributeSubsetPreprocessing to select the attributes I need to process changes on.
But this operator is only able to select one attribute, isn't it?
Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
I need to define by name the attributes on which the following operators work.

Thx for any help.

TobiasMalbrecht · June 2008

Hi Sebastian,

Hm, that's three questions in a single posting ... so, here we go ...

Contemno wrote:

Thx for your answer.
Unfortunately it's not working as it should be.

When I use the operator Nominal2Numeric, the values are changed completely.
A "48" may be changed to a "1", for example (not the recoding mentioned above).

The problem is that without this operator the recoding isn't applied to this value.

You might have missed that (the other) Sebastian has recommended the NominalNumbers2Numeric operator, not the Nominal2Numeric operator! This should work as expected.

Contemno wrote:

A second problem occurred: how can I delete rows with missing values (incomplete ones, in other words)?

This can be done by filtering the example set with the operator called ExampleFilter and setting the condition_class parameter to "no_missing_attributes".
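For readers who want to see the effect of such a filter spelled out, here is a small Python sketch of "keep only rows without missing attributes". It is not the ExampleFilter implementation; the helper names are invented, the sample rows are made up, and it assumes that a missing value shows up as None or NaN.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(rows):
    """Keep only the rows in which no attribute is missing."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


# Made-up sample data; the attribute names are borrowed from this thread.
rows = [
    {"ID": 1, "age": 34, "regio": "north"},
    {"ID": 2, "age": None, "regio": "south"},
    {"ID": 3, "age": 51, "regio": float("nan")},
    {"ID": 4, "age": 19, "regio": "east"},
]
complete = drop_incomplete(rows)
print([row["ID"] for row in complete])
```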

Contemno wrote:

You told me to use AttributeSubsetPreprocessing to select the attributes I need to process changes on.
But this operator is only able to select one attribute, isn't it?
Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
I need to define by name the attributes on which the following operators work.

The "attribute_name_regex" parameter does indeed accept regular expressions to define the attributes. Hence, the operators inside AttributeSubsetPreprocessing are applied to all attributes matching the regular expression. If you want, for example, to apply the inner operators to two attributes called age and weight, the regular expression which lets you choose these attributes is age|weight . You may find additional information on regular expressions in the RapidMiner tutorial, which is available in the documentation area of our website:

http://rapid-i.com/content/view/36/83/lang,de/
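To get a feeling for how an alternation such as age|weight picks attributes by name, the following Python sketch may help. It assumes the pattern has to match the whole attribute name (as Java's String.matches does); whether the parameter behaves exactly this way is an assumption here, and select_attributes is a made-up helper.

```python
import re


def select_attributes(names, pattern):
    """Return the names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


# Made-up attribute list for illustration.
names = ["ID", "age", "weight", "regio", "age_group"]
print(select_attributes(names, "age|weight"))
```

Under the whole-name assumption, age_group is not selected even though it starts with age.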

Hope that helps to solve your problems,
regards,
Tobias

Contemno · June 2008

Thank you so much Tobias.
You helped me a lot. It's working very well now.

But there's another question. You wrote:

If you want, for example, to apply the inner operators to two attributes called age and weight, the regular expression which lets you choose these attributes is age|weight .

I'm not familiar with regular expressions. You gave an example with an " | " to combine two attributes.
The tutorial is a bit "meager" in this case. Is there any good explanation of all expressions? (wildcards, ...)

Here is my case:
I have 56 attributes (e.g. ID, age, regio, ANT_U30, ANT_U35, ... , P_Expert, P_Vkude,...).
Now I wanna filter all attributes beginning with "ANT_" because there are twelve of them and I don't wanna write them all down separately.
In short a shortcut for "ANT_U20|ANT_U25|ANT_U30|ANT_U35|...".

Thx in advance.
Sebastian

steffen · June 2008

Hello Sebastian

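The pattern you are looking for is: ANT_.*
where . stands for any single character and * repeats the preceding element zero or more times, so .* matches any sequence of characters.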
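In regular expressions (unlike file-name wildcards) * is a quantifier for the preceding element, so a "starts with ANT_" pattern is normally written ANT_.* rather than ANT_* . A quick Python check on some of the attribute names from the question shows the difference, assuming again that the pattern must match the whole name:

```python
import re

# Some of the attribute names mentioned in the question.
names = ["ID", "age", "regio", "ANT_U30", "ANT_U35", "P_Expert", "P_Vkude"]


def matching(pattern):
    """Return the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]


prefix_matches = matching(r"ANT_.*")  # "ANT_" followed by anything
glob_style = matching(r"ANT_*")       # "ANT" followed by zero or more "_"
print(prefix_matches)
print(glob_style)
```

With whole-name matching the second pattern selects none of these attributes, because it only allows underscores after ANT.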

To learn more about regular expressions:
basic concepts: http://en.wikipedia.org/wiki/Regular_expression
tutorial for regular expressions in java : http://www.javaregex.com/tutorial.html (weird design, but the tutorial is nice)

hope this was helpful

greetings

Steffen
