"How to aggregate age in polynominal type"
Hello, all
In my raw data set, the age attribute looks like the following example:
John, 64,
Alice, 33years,
Bob, 22years,
Mike, 50
So some of the values have a redundant 'years' at the end. What I eventually need is the average age, and also to group the examples into age groups (0-9, 10-19, 20-29, etc.).
1) If I set the attribute type to integer when reading the file, every example with the redundant 'years' ends up as a missing value.
2) If I read the attribute as polynominal, use the Replace operator to remove the redundant part from the attribute value, and then apply Nominal to Numerical, I still do not get a column of numeric type.
Is there a workaround for that?
Answers
Hi,
Read it in as polynominal, then use Replace on it with a regex like:
(\d+)\D*
replace by
$1
That way you have only the digits in the column. The \D* makes the suffix optional, so plain values such as 64 pass through unchanged.
Then you transform it to numerical with the Parse Numbers operator.
Afterwards, you can use one of the Discretize operators to get bins and Aggregate to get avg() per Bin.
Cheers,
Martin
Edit: And here is an example for it. Maybe we need to adjust this regex a bit
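For readers who want to check the logic outside RapidMiner, here is a rough Python sketch of the same steps: strip the suffix with a regex, parse to a number, bin into decades, and average per bin. It is not the RapidMiner process itself. The helper names parse_age and age_bin are made up for this illustration, and the rows are the four examples from the question.

```python
import re
from collections import defaultdict
from statistics import mean

# Sample rows from the question: (name, raw age string).
rows = [("John", "64"), ("Alice", "33years"), ("Bob", "22years"), ("Mike", "50")]

def parse_age(raw):
    """Return the leading digits of raw as an int, or None if there are none.

    Hypothetical helper: plays the role of Replace followed by Parse Numbers.
    """
    match = re.match(r"\s*(\d+)\D*$", raw)
    return int(match.group(1)) if match else None

def age_bin(age, width=10):
    """Label the bin an age falls into, e.g. 33 -> '30-39'.

    Hypothetical helper: plays the role of a Discretize operator.
    """
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Parse every raw value; unparseable values become None (a missing value).
ages = {name: parse_age(raw) for name, raw in rows}
valid = [age for age in ages.values() if age is not None]

# Average over all examples.
overall_average = mean(valid)

# Group by bin, then average inside each bin (the Aggregate step).
by_bin = defaultdict(list)
for age in valid:
    by_bin[age_bin(age)].append(age)
average_per_bin = {label: mean(values) for label, values in by_bin.items()}

print(overall_average)
print(average_per_bin)
```

The regex is anchored with $ so that a value with digits in two places (say "3years6") is treated as missing instead of being silently cut to its first number. Whether that is the right behaviour depends on the data.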