Access next row data
Hello.
I'm working on my final project at university and I have a few questions.
First, I'd like to know whether there is a way to access data in the next row.
To access data in the previous row I used the Lag Series operator, but I can't find a way to do the same for the next row.
My data looks like this:
Discussion  Userid  Parent  Created  Modified
1           1       0       12       14
1           2       82      15       16
1           1       85      17       20
1           3       85      22       24
2           45      0       26       32
2           48      89      33       34
2           46      90      34       35
For each userid, I want to calculate the difference modified(i+1) - created(i).
A parent value of 0 means the message is the first one in a discussion.
With this I want to calculate how much time passes between a user's message and the response to it.
For the first row I want: 1 1 0 (16-12) = 4
How can I do that? Is there a way to know which row corresponds to the last message of a discussion? How can I identify the row just before a row with parent=0?
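A minimal Python sketch (not RapidMiner) of the "last message" part of the question. It assumes the rows are sorted by discussion and then by creation time, and that every discussion starts with a parent=0 row; under those assumptions the last message of a discussion is the row just before the next parent=0 row, or the final row of the data. The function name `last_in_discussion_flags` is hypothetical.

```python
# Sample rows from the question: (discussion, userid, parent, created, modified).
# Assumption: rows are ordered by discussion, then by creation time.
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]
PARENT = 2  # column position of the parent attribute


def last_in_discussion_flags(rows):
    """Return True for every row that is the last message of its discussion.

    A row is last if it is the final row overall or if the next row starts
    a new discussion (parent == 0).
    """
    return [
        i == len(rows) - 1 or rows[i + 1][PARENT] == 0
        for i in range(len(rows))
    ]


if __name__ == "__main__":
    for row, is_last in zip(ROWS, last_in_discussion_flags(ROWS)):
        print(row, "<- last message" if is_last else "")
```

The same "look at row i+1" check is what a next-row (lead) value gives you; once that is available, flagging the last message is a one-line condition.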
Best Answer
Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert)
You can generate temporary unique IDs with the Generate ID operator upstream and do the joins downstream. Afterwards, use Select Attributes with the invert option toggled on to remove that ID attribute again. I do this all the time.
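A minimal Python sketch of how an ID-based self-join could pull the next row's value into the current row. This is one possible reading of "do the joins downstream", not a reproduction of a RapidMiner process: the copy keyed by `id - 1` is an assumption (in RapidMiner it would presumably be an extra generated attribute on a copy of the data), and `attach_next_modified` is a hypothetical name. Note that this plain join ignores discussion boundaries, so the last row of discussion 1 picks up a value from discussion 2; that case still needs separate handling.

```python
# Sample rows from the question, one dict per row.
COLUMNS = ("discussion", "userid", "parent", "created", "modified")
ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        (1, 1, 0, 12, 14),
        (1, 2, 82, 15, 16),
        (1, 1, 85, 17, 20),
        (1, 3, 85, 22, 24),
        (2, 45, 0, 26, 32),
        (2, 48, 89, 33, 34),
        (2, 46, 90, 34, 35),
    ]
]


def attach_next_modified(rows):
    """Pull the next row's `modified` value into each row via an ID self-join."""
    # 1. Generate temporary sequential IDs (1..n), similar to Generate ID.
    with_id = [dict(row, id=i + 1) for i, row in enumerate(rows)]
    # 2. Build a copy keyed by id - 1, so key k holds the row that follows row k.
    next_modified = {row["id"] - 1: row["modified"] for row in with_id}
    # 3. Left join on id; the final row has no partner and gets None.
    joined = [dict(row, next_modified=next_modified.get(row["id"])) for row in with_id]
    # 4. Drop the temporary ID again (Select Attributes with invert selection).
    for row in joined:
        del row["id"]
    return joined


if __name__ == "__main__":
    for row in attach_next_modified(ROWS):
        print(row)
```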
Answers
If you reverse the sort order of your dataset, you should be able to use Lag again for this.
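The idea in plain Python (not RapidMiner), assuming Lag shifts a column down by one row: lagging the reversed rows and then restoring the original order yields each row's next value. In RapidMiner this would presumably mean sorting descending, applying Lag, and sorting back; the helper names `lag` and `lead_via_reversed_lag` are hypothetical.

```python
def lag(values, fill=None):
    """Shift a column down one row: row i receives the value of row i-1."""
    if not values:
        return []
    return [fill] + values[:-1]


def lead_via_reversed_lag(values, fill=None):
    """Get the value of row i+1 by reversing the rows, lagging, and reversing back."""
    return lag(values[::-1], fill)[::-1]


if __name__ == "__main__":
    modified = [14, 16, 20, 24, 32, 34, 35]  # 'modified' column from the question
    print(lead_via_reversed_lag(modified))  # [16, 20, 24, 32, 34, 35, None]
```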
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I don't understand... can you explain that in more detail?
Hello @bea11005, perhaps this will help.
Basically, you need to sort each discussion first, then apply Lag. See my process.
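The attached process is not included here. As a rough Python stand-in for "sort each discussion, then Lag", the sketch below sorts by discussion and creation time and then lags within each discussion, so the first message of one discussion does not receive a value from the previous one. The column positions and the function name `lag_within_discussion` are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (discussion, userid, parent, created, modified).
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]
DISCUSSION, CREATED = 0, 3  # column positions


def lag_within_discussion(rows, column):
    """Sort by discussion and creation time, then lag `column` inside each discussion.

    Returns (row, previous_value) pairs. The first row of every discussion gets
    None, so no value leaks across a discussion boundary.
    """
    ordered = sorted(rows, key=itemgetter(DISCUSSION, CREATED))
    result = []
    for _, group in groupby(ordered, key=itemgetter(DISCUSSION)):
        group = list(group)
        previous = [None] + [row[column] for row in group[:-1]]
        result.extend(zip(group, previous))
    return result


if __name__ == "__main__":
    for row, previous_modified in lag_within_discussion(ROWS, 4):
        print(row, previous_modified)
```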
Scott
I can't get Loop Values to work... the process ends without producing any output...
Hello @bea11005, I'd recommend posting your process XML here (see "Read Before Posting" on the right when you reply) and attaching your dataset. That way we can replicate what you're doing and help you better.
Scott
@Telcontar120, I can't reverse the order twice on the modified attribute, because if I do, the messages change their order and my process would no longer be correct...
Hi,
If you have unique keys (IDs) in your example set, you can create a copy of it using Multiply, sort that copy the way you want, generate the required attribute, and join it back based on the ID.
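A minimal Python sketch of the copy, sort, generate, join-back pattern, assuming the row position is acceptable as a unique ID. It computes modified(next) - created(current) inside each discussion on a sorted copy and then joins the result back on the ID, so the original row order is untouched. The function `add_time_to_next` and the attribute name `time_to_next` are hypothetical.

```python
from operator import itemgetter

# Sample rows from the question, one dict per row.
COLUMNS = ("discussion", "userid", "parent", "created", "modified")
ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        (1, 1, 0, 12, 14),
        (1, 2, 82, 15, 16),
        (1, 1, 85, 17, 20),
        (1, 3, 85, 22, 24),
        (2, 45, 0, 26, 32),
        (2, 48, 89, 33, 34),
        (2, 46, 90, 34, 35),
    ]
]


def add_time_to_next(rows):
    """Add time_to_next = modified(next) - created(current) within each discussion.

    The last message of a discussion gets 0. The input order is preserved.
    """
    # Generate a unique ID per row (here simply its position).
    keyed = [dict(row, id=i) for i, row in enumerate(rows)]
    # Sorted copy (the Multiply + Sort step); `keyed` keeps the original order.
    copy = sorted(keyed, key=itemgetter("discussion", "created"))
    computed = {}
    for current, nxt in zip(copy, copy[1:] + [None]):
        if nxt is not None and nxt["discussion"] == current["discussion"]:
            computed[current["id"]] = nxt["modified"] - current["created"]
        else:
            computed[current["id"]] = 0  # last message of its discussion
    # Join back on the ID in the original order and drop the helper ID.
    joined = []
    for row in keyed:
        out = {k: v for k, v in row.items() if k != "id"}
        out["time_to_next"] = computed[row["id"]]
        joined.append(out)
    return joined


if __name__ == "__main__":
    for row in add_time_to_next(ROWS):
        print(row)
```

Because the sorting happens only on the copy, the original order does not need to be reversed and restored, which is the concern raised about re-sorting the data.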
Regards,
Balázs
I don't have unique IDs... so I can't.
Another thing I'd like to know is whether it's possible to split my data by the value of the discussion attribute.
I want to calculate the difference between consecutive messages until I reach the last message of a discussion, where the difference will be 0 because there is no next message. I need modified(i+1) - created(i) for every message except the last one in a discussion.
I've tried Loop Values, but I can't get any output from the process... how can I do both things?
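Written as a formula, the requirement above reads as follows, where i indexes the messages of one discussion in creation order (that ordering is an assumption, since the question doesn't state it explicitly):

```latex
d_i =
\begin{cases}
  \mathrm{modified}_{i+1} - \mathrm{created}_i & \text{if message } i \text{ is not the last one in its discussion,}\\
  0 & \text{if message } i \text{ is the last one in its discussion.}
\end{cases}
```

On the sample data this gives 4, 5, 7, 0 for discussion 1 and 8, 2, 0 for discussion 2; for example, d_1 = 16 - 12 = 4, matching the value asked for in the question.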
Oh, that's a good idea... I'll try generating the IDs; it looks like it should work.
Now I'd like to know how to split the data by the value of the discussion attribute...
Hi!
Loop Values is the operator you need. Inside the loop you can access the current value with the %{loop_value} macro by default. See the attached example:
Make sure that "Enable parallel execution" is switched off.
Also, the loop attribute needs to be nominal. You can either create a copy of your original attribute and convert that to nominal (with Numerical to Polynominal or Format Numbers) or just convert the original if you don't need it in the numeric format later.
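A minimal Python analogue (not RapidMiner's API) of splitting by discussion and looping over each value one after another. `split_by_discussion`, `loop_values` and `time_to_next` are hypothetical names; the string key loosely stands in for converting the attribute to nominal, and the `loop_value` parameter plays the role of the %{loop_value} macro. It also assumes each discussion's rows are already in creation order.

```python
from collections import defaultdict

# Sample rows from the question: (discussion, userid, parent, created, modified).
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]
DISCUSSION, CREATED, MODIFIED = 0, 3, 4  # column positions


def split_by_discussion(rows):
    """Group rows by discussion, using the value as a string ("nominal") key.

    Groups keep the order in which each discussion first appears.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[DISCUSSION])].append(row)
    return dict(groups)


def loop_values(rows, body):
    """Call body(loop_value, subset) once per discussion, sequentially."""
    return {value: body(value, subset) for value, subset in split_by_discussion(rows).items()}


def time_to_next(loop_value, subset):
    """modified(i+1) - created(i) inside one discussion; 0 for its last message."""
    diffs = [nxt[MODIFIED] - cur[CREATED] for cur, nxt in zip(subset, subset[1:])]
    return diffs + [0]


if __name__ == "__main__":
    print(loop_values(ROWS, time_to_next))  # {'1': [4, 5, 7, 0], '2': [8, 2, 0]}
```

Running the per-discussion logic inside the loop means the "last message gets 0" rule falls out naturally, since each subset ends exactly where its discussion ends.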
Regards,
Balázs