Data transformation
SagioProject
Hi everybody,
For a university project I need to transform a dataset, but I really don't know how to do it in RapidMiner. Could you help me out?
The dataset contains event logs captured from a website. Attributes are timestamp, ip_address, browser_info and some other less important ones.
I generated a new attribute Date, which is only the date, without time.
Then I generated a new attribute Session_ID by concatenating the Date, ip_address and browser_info attributes.
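For readers who want to see the derived attributes concretely, here is a minimal sketch in Python (not RapidMiner) of the Date and Session_ID steps. The sample records and the `|` separator are assumptions for illustration. The post only says the three attributes are concatenated. A separator simply keeps the concatenated parts from running together.

```python
from datetime import datetime

# Hypothetical event records; field names mirror the attributes in the post.
events = [
    {"timestamp": datetime(2024, 5, 1, 9, 0), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2024, 5, 1, 9, 10), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2024, 5, 2, 9, 0), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
]

SEP = "|"  # assumed separator; the post does not specify one


def add_session_id(event: dict) -> dict:
    """Return a copy of the event with Date and Session_ID attributes added."""
    date = event["timestamp"].date()  # keep only the date part, drop the time
    session_id = SEP.join([date.isoformat(), event["ip_address"], event["browser_info"]])
    return {**event, "Date": date, "Session_ID": session_id}


enriched = [add_session_id(e) for e in events]
for e in enriched:
    print(e["Session_ID"])
```

The first two events share a Session_ID (same day, IP address and browser), while the third falls on a different day and therefore gets its own.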
The examples with the same Session_ID are events that occurred on the same day, by the same ip address and by the same browser.
What I now want to do is to split up these sessions. If there is a gap of 30 minutes or more between 2 successive events, I want them to be split into 2 different groups. I want to do that by generating a new attribute Session_in_day, which can be 1, 2, 3, ... and numbers the "smaller session" that each example belongs to.
In MATLAB I was more or less able to write a program to do this, but I have no clue how to do this in RapidMiner. Anyone?
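The splitting rule can be stated as a small loop. This is a hypothetical Python sketch, not the poster's MATLAB program. It assumes each event already carries a `Session_ID` and a `datetime` timestamp, sorts the events within each Session_ID by time, and increments `Session_in_day` whenever the gap to the previous event is 30 minutes or more. The function name `assign_session_in_day` and the sample data are illustrative.

```python
from datetime import datetime, timedelta
from itertools import groupby

GAP = timedelta(minutes=30)  # a gap of 30 minutes or more starts a new sub-session


def assign_session_in_day(events: list[dict]) -> list[dict]:
    """Add Session_in_day (1, 2, 3, ...) to each event within its Session_ID."""
    # Sort so that events of the same Session_ID are adjacent and in time order;
    # groupby only groups adjacent items, so this sort is required.
    ordered = sorted(events, key=lambda e: (e["Session_ID"], e["timestamp"]))
    result = []
    for _, group in groupby(ordered, key=lambda e: e["Session_ID"]):
        counter = 1
        previous = None
        for event in group:
            if previous is not None and event["timestamp"] - previous >= GAP:
                counter += 1  # pause too long: start the next sub-session
            result.append({**event, "Session_in_day": counter})
            previous = event["timestamp"]
    return result


events = [
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 0)},
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 20)},
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 50)},  # exactly 30 min later
    {"Session_ID": "B", "timestamp": datetime(2024, 5, 1, 9, 5)},
]
for e in assign_session_in_day(events):
    print(e["Session_ID"], e["timestamp"].time(), e["Session_in_day"])
```

Because the comparison is `>=`, a gap of exactly 30 minutes starts a new sub-session, matching "30 minutes or more" in the question.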
Answers
You can create a new attribute that indicates whether there was a pause or not.
To do so, I would recommend using the Lag operator from the Time Series extension. Sort by timestamp, use the Lag operator to get a new column with the previous timestamp, and then use Generate Attributes, or whatever you are comfortable with, to compute the pause.
Cheers,
Martin
Dortmund, Germany
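The steps in the answer above can be mirrored in a plain-Python sketch. It does not call any RapidMiner API; it only illustrates the sort, lag, and flag logic. Two parts are assumptions beyond the answer. First, rows are sorted by Session_ID as well as timestamp, so that the lagged value comes from the same session (sorting by timestamp alone could make the previous row belong to a different session). Second, a running sum of the pause flag per Session_ID turns the indicator into the Session_in_day number asked for in the question. The column names `prev_timestamp` and `pause` are hypothetical.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # threshold taken from the question

rows = [
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 50)},
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 0)},
    {"Session_ID": "B", "timestamp": datetime(2024, 5, 1, 9, 5)},
    {"Session_ID": "A", "timestamp": datetime(2024, 5, 1, 9, 20)},
]

# Step 1: sort by Session_ID, then timestamp (assumption: grouping by session
# keeps each row's predecessor inside the same session).
rows.sort(key=lambda r: (r["Session_ID"], r["timestamp"]))

# Step 2: "lag" column with the previous timestamp; None at the start of a session.
for prev, cur in zip([None] + rows[:-1], rows):
    same_session = prev is not None and prev["Session_ID"] == cur["Session_ID"]
    cur["prev_timestamp"] = prev["timestamp"] if same_session else None

# Step 3: pause flag, 1 if a gap of 30 minutes or more precedes this event.
for r in rows:
    gap_too_long = r["prev_timestamp"] is not None and r["timestamp"] - r["prev_timestamp"] >= GAP
    r["pause"] = int(gap_too_long)

# Step 4 (assumption): running sum of the flag per Session_ID, starting at 1,
# yields Session_in_day = 1, 2, 3, ...
running: dict[str, int] = {}
for r in rows:
    running[r["Session_ID"]] = running.get(r["Session_ID"], 1) + r["pause"]
    r["Session_in_day"] = running[r["Session_ID"]]

for r in rows:
    print(r["Session_ID"], r["timestamp"].time(), r["pause"], r["Session_in_day"])
```

Splitting the work into a lag column, a flag, and a running sum keeps each step simple enough to express with separate operators in a visual workflow, which is the spirit of the answer.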