Finding Peak Times in a timeseries dataset
Hi there,
I am working with a data series that has a date-time stamp in one column. I am looking for a way to identify the peak times over the duration of the collected date-time stamps. Is there a way to handle this in RapidMiner? If further details are needed, please let me know. Thanks.
Answers
How do you define "peak" for this purpose? Finding a single maximum in a series is easily done using a number of different operators. But finding "peaks" might imply some kind of underlying periodic function or a variable definition of what exactly constitutes a peak. That kind of analysis is a bit trickier, so you might want to check out the Series extension from the marketplace and look at some of the operators in there.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks for the quick reply. By "peak" I mean the time of day at which the value is highest: I am trying to determine at what times of day usage is highest. The data has been recorded at 30-minute intervals over a 140-day period. I hope this clarifies things. Is there a particular operator in the time series extension package you would recommend?
It sounds like you have many separate days' worth of data, so if you are looking for patterns, you can simply aggregate by time of day (with 30-minute intervals you should have 48 data points per day) and then calculate the average and variance of each one. This will give you a sense of which times are more likely to be higher than others. You can also get the minimum and maximum for each time of day to see how they compare to the average.
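To illustrate the aggregation described above outside of RapidMiner, here is a minimal sketch in plain Python. It is not a RapidMiner process; the function name `profile_by_time_of_day` and the sample readings are made up for illustration, and it assumes the data is already available as (timestamp, value) pairs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pvariance


def profile_by_time_of_day(readings):
    """Group (timestamp, value) pairs by clock time and summarise each slot."""
    slots = defaultdict(list)
    for stamp, value in readings:
        slots[stamp.time()].append(value)
    return {
        slot: {
            "mean": mean(values),
            "variance": pvariance(values),
            "min": min(values),
            "max": max(values),
        }
        for slot, values in sorted(slots.items())
    }


# Hypothetical data: 3 days of 30-minute readings with a spike at 18:00.
start = datetime(2024, 1, 1)
readings = []
for step in range(3 * 48):
    stamp = start + timedelta(minutes=30 * step)
    value = 10.0 if (stamp.hour == 18 and stamp.minute == 0) else 1.0
    readings.append((stamp, value))

profile = profile_by_time_of_day(readings)
# The slot with the highest average is the typical peak time of day.
busiest = max(profile, key=lambda slot: profile[slot]["mean"])
print(busiest, profile[busiest])
```

With 140 days of data, each of the 48 slots would hold 140 values, so the mean and variance per slot become reasonably informative.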
However, if you are looking to identify the specific time slot on each individual day that corresponds to the maximum value for that day, the process is going to be more complex: you'll have to aggregate by day to calculate each day's maximum, and then identify which particular time slot matches that value.
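The per-day variant can be sketched the same way in plain Python. Again, this is only an illustration under assumptions, not a RapidMiner process: `daily_peak_slots` and the sample readings are hypothetical, and ties within a day are resolved by keeping the earliest slot.

```python
from collections import Counter
from datetime import datetime


def daily_peak_slots(readings):
    """Return {date: (time_of_day, value)} for the largest reading of each day."""
    peaks = {}
    for stamp, value in readings:
        day = stamp.date()
        # Strict ">" keeps the earliest slot when two slots tie for the maximum.
        if day not in peaks or value > peaks[day][1]:
            peaks[day] = (stamp.time(), value)
    return peaks


# Hypothetical readings covering three days.
readings = [
    (datetime(2024, 1, 1, 8, 0), 3.0),
    (datetime(2024, 1, 1, 18, 30), 9.0),
    (datetime(2024, 1, 2, 8, 0), 2.0),
    (datetime(2024, 1, 2, 18, 30), 7.0),
    (datetime(2024, 1, 3, 12, 0), 5.0),
    (datetime(2024, 1, 3, 18, 30), 4.0),
]

peaks = daily_peak_slots(readings)
# Count how often each time slot was the daily peak.
slot_counts = Counter(slot for slot, _ in peaks.values())
print(slot_counts.most_common(1))
```

Counting how often each slot wins gives a different view from the averages: a slot can be the most frequent daily peak without having the highest mean.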
Neither of these processes would require the series extension, by the way. That's more useful if you are trying to do things like calculate moving averages, do smoothing of series data, or any time series forecasting such as ARIMA.
You can also use "Generate Attributes" to create a new attribute that "gets" the hour of the timestamp. Then you can cleanly aggregate on that attribute.
Scott
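For readers who want to see the hour-extraction idea outside of RapidMiner, a rough equivalent in plain Python might look like the sketch below. The function name `mean_by_hour`, the timestamp format `%Y-%m-%d %H:%M`, and the sample rows are all assumptions for illustration. Note that grouping by hour merges the two 30-minute slots within each hour.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_hour(readings, fmt="%Y-%m-%d %H:%M"):
    """Derive the hour from each timestamp string and average values per hour."""
    by_hour = defaultdict(list)
    for stamp_text, value in readings:
        stamp = datetime.strptime(stamp_text, fmt)
        by_hour[stamp.hour].append(value)  # the derived "hour" attribute
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Hypothetical rows: (timestamp string, usage value).
readings = [
    ("2024-01-01 09:00", 2.0),
    ("2024-01-01 09:30", 4.0),
    ("2024-01-01 17:00", 8.0),
    ("2024-01-02 09:00", 3.0),
    ("2024-01-02 17:30", 6.0),
]

hourly = mean_by_hour(readings)
peak_hour = max(hourly, key=hourly.get)
print(peak_hour, hourly[peak_hour])
```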