[Solved] Calculating the deltas between following examples
Dear all,
I have a data set like this
i=id att1 att2
1 5 1
2 8 4
3 3 3
4 4 7
Now I would like to transform this into a new example set by applying the following rule:
From the value of attribute x in example i+1, subtract the value of the same attribute in example i (e.g. "8-5").
Even better would be a custom formula that calculates the percentage change between two consecutive examples (e.g. "(8-5)/5*100").
I tried the "distance transformation" operator of the series extension for RapidMiner. However, it only provides absolute values, so it remains unclear whether the delta is positive or negative. Moreover, this operator requires an additional transformation from data to series and back.
Another way I could think of is to use the "windowing" operator to generate additional attributes shifted by one example. Then one could apply the "generate attributes" operator for the calculation. However, so far I haven't been able to figure out a working process.
This matters especially because I have to run it with different attributes all the time, so automated handling of the attribute names would be highly appreciated.
Search tags "delta" and "distance" revealed no useful results.
Looking forward to hearing from you
Sachs
Answers
Loop through the examples using macros.
Best H
I think you're on the right track with the series/windowing operators. The ones you're looking for are "Lag" (which finds the previous value) and "Differentiate" (which finds the signed difference between consecutive values). Then all you need to do is generate the % based on these two values. Since both operators require an attribute as an argument, you need to wrap them in a Loop Attributes operator to repeat for multiple attributes in an example set. I'll attach examples for single and multiple attributes using the Iris dataset (nonsense values of course), which you should be able to adapt - as soon as I've worked out how!
Cheers,
Russ
Looks like attachments aren't possible (really?!), so here's the XML: just copy it out, save and import.
Single Attribute:
Multiple Attributes:
HTH,
Russ
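For readers who want to see the arithmetic behind the Lag / Differentiate / Loop Attributes idea described above, here is a minimal sketch in plain Python. It is not the RapidMiner process, only an illustration of the same calculation on the example data from the question. The function name `add_deltas` and the `_delta` / `_pct` suffixes are made up for this sketch.

```python
def add_deltas(rows, attributes):
    """Return copies of rows with a signed delta and a percentage change
    relative to the previous row, for every attribute name given.
    (Hypothetical helper, not a RapidMiner API.)"""
    result = []
    previous = None  # the first example has no predecessor
    for row in rows:
        new_row = dict(row)
        for name in attributes:  # plays the role of looping over attributes
            if previous is None:
                new_row[name + "_delta"] = None
                new_row[name + "_pct"] = None
            else:
                delta = row[name] - previous[name]  # signed, e.g. 8 - 5
                new_row[name + "_delta"] = delta
                # percentage change is undefined if the previous value is 0
                if previous[name] != 0:
                    new_row[name + "_pct"] = delta / previous[name] * 100
                else:
                    new_row[name + "_pct"] = None
        result.append(new_row)
        previous = row
    return result


# Example data from the question
rows = [
    {"id": 1, "att1": 5, "att2": 1},
    {"id": 2, "att1": 8, "att2": 4},
    {"id": 3, "att1": 3, "att2": 3},
    {"id": 4, "att1": 4, "att2": 7},
]

out = add_deltas(rows, ["att1", "att2"])
for r in out:
    print(r)
```

Because the attribute names are passed in as a list, the same call works for any set of numeric attributes without changing the code.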
Hi there!
That really worked out. Thank you!
I would have never expected the function "difference" under an operator called "differentiate".
Isn't that something completely different?
Anyway, glad to have this operator being part of RapidMiner
@haddock: Just to get the idea behind your approach: Do you mean something like in the attached code? While I loop through the examples, I store the last one in a macro to do the calculation before iterating to the next example.
Observation 1: The very first calculated value is wrong because I need to initialize the macro. Of course, this could be filtered / corrected later after the loop.
Observation 2: It is not possible to use the "generate attributes" operator in the loop because that way it would overwrite the new attribute all the time and in the end it would read the same value in all lines.
That's probably not surprising to the more experienced user but I wanted to share what I came across on my learning curve.
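As a rough analogy in plain Python (not RapidMiner, and with variable names invented for this sketch), both observations look like this: the "macro" holding the last value starts out empty, so the first example gets no delta, and each result goes into its own slot instead of overwriting one shared value.

```python
values = [5, 8, 3, 4]  # att1 from the example data

last = None   # plays the role of the macro; empty before the first example
deltas = []   # one slot per example, so earlier results are not overwritten

for value in values:
    if last is None:
        deltas.append(None)          # first example: nothing to subtract yet
    else:
        deltas.append(value - last)  # signed difference to the previous example
    last = value                     # remember this example for the next iteration

print(deltas)  # [None, 3, -5, 1]
```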
PS: Indeed, there is no upload function - at least not to my knowledge as I was looking for it also.
Thank you all
Sachs
Glad it helped
Russ