identify consecutive dates/values

lghansse · April 2018

Hi everyone,

I'm trying to use rapidminer to filter examples in an exampleset with at least 3 consecutive dates (see below). So more specifically, I want to filter those ID's that have dates where there is no more than -+ 1 month between two dates.

id	date
1	20/01/2018
1	21/02/2018
2	19/01/2018
2	19/02/2018
2	21/04/2018
2	21/05/2018
3	22/03/2018
3	22/04/2018
3	23/05/2018
3	22/07/2018
3	23/08/2018
3	22/09/2018
4	20/01/2018
4	21/02/2018
4	21/03/2018
5	19/01/2018
5	20/02/2018
5	19/05/2018
5	19/06/2018
5	20/08/2018
5	20/09/2018
6	22/01/2018
6	21/02/2018
6	20/04/2018
6	20/05/2018
6	20/07/2018
6	21/08/2018
6	20/09/2018

In order to do that I created a script in where I generated a date_id that was made up of the year + month, and lagged on date_id in order to calculate the difference between two dates (which could not be more than 1). Working with days was not really an option since the transaction was/is not always exactly one month apart. So my data looked more or less like this:

id	date	date_id	date_id-1	difference
1	20/01/2018	20180	?	?
1	21/02/2018	20181	20180	1
2	19/01/2018	20180	?	?
2	19/02/2018	20181	20180	1
2	21/04/2018	20183	20181	2
2	21/05/2018	20184	20183	1
3	22/03/2018	20182	?	?
3	22/04/2018	20183	20182	1
3	23/05/2018	20184	20183	1
3	22/07/2018	20186	20184	2
3	23/08/2018	20187	20186	1
3	22/09/2018	20188	20187	1
4	20/01/2018	20180	?	?
4	21/02/2018	20181	20180	1
4	21/03/2018	20182	20181	1
5	19/01/2018	20180	?	?
5	20/02/2018	20181	20180	1
5	19/05/2018	20184	20181	3
5	19/06/2018	20185	20184	1
5	20/08/2018	20187	20185	2
5	20/09/2018	20188	20187	1
6	22/01/2018	20180	?	?
6	21/02/2018	20181	20180	1
6	20/04/2018	20183	20181	2
6	20/05/2018	20184	20183	1
6	20/07/2018	20186	20184	2
6	21/08/2018	20187	20186	1
6	20/09/2018	20188	20187	1

So in my example ID's 4 en 6 should be included. My plan was to use an aggregate function on 'difference' and compare them to the number of examples in each loop on ID, to idenify consecutive dates (because the aggregate would be equal to the number of examples -1, since every first row is empty). However, this method is only good for identifying cases that have only exactly three dates that are consecutive dates, or cases where every date is consecutive. So, I'm a bit at loss of what I could do to make this work (maybe I'm making things needlessly complicated, but I have no knowledge of a function in RM that allows to identify consecutive values or dates).

Lise

Telcontar120 · April 2018

I think you have overcomplicated this. There is a datediff function in RapidMiner that you can use to do something very similar. Just lag the original date (not your polynominal id version) and then use the datediff function, and you can then convert it to whatever unit you want (it's calculated in milliseconds) and filter for whatever period gap you want (e.g., 1 month).

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

identify consecutive dates/values

Answers