Multiple date attributes to adjust/change
Hi Fellas,
I am new to RapidMiner.
The very first task I am trying to do requires identifying the earliest date in a large dataset containing 8 date attributes and a few thousand examples. Due to low data quality, some date values are missing and some are mistyped.
So far, I have created a process that identifies the earliest of the 8 dates and generates a new attribute called ’Earliest date’. Now I have 8 date attributes, but I will use only the 9th as a nominal value for further transformation of the dataset.
Before doing that, I need to filter out the wrong values from the 8 attributes, such as -1 or dates in the future like 2071. Without proper cleansing, the 9th attribute is wrong in a few cases, returning 1899 and similar.
Is there any way to filter out the dates preferably without repeating the same operator 8 times? Outliers for dates or something close to that? I am not familiar with macros yet, but perhaps that is the missing piece…
Thanks!
Answers
Hi,
have you tried Loop Attributes?
Best,
Martin
Dortmund, Germany
Hello @benicsakp - welcome to the community. Yes, as @mschmitz said, Loop Attributes is likely what you need. Please post your XML process (see "Read Before Posting" on the right when you reply) so we can see exactly what your process looks like.
Scott
Hi @benicsakp
I think the solution for this part depends on what exactly you want to do with the wrong dates (skip these examples, replace the dates with a certain value, etc.). For example, if you expect all dates to fall within a certain range (say, between 1/1/2012 and today), you could try the DECLARE MISSING VALUES operator with a corresponding expression filter, which would turn all values outside the predefined range into missing values that you could then treat accordingly (replace them or whatever).
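To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of the same logic: out-of-range or non-date values become missing, and the earliest date is then taken over whatever remains. This is not RapidMiner code and not an operator configuration. The attribute names, the sample row, the fixed upper bound standing in for "today", and the helper names `declare_missing` and `earliest_valid` are all assumptions made up for illustration.

```python
from datetime import date

# Assumed valid range; the lower bound follows the 1/1/2012 example above,
# the upper bound is a fixed stand-in for "today" so the sketch is repeatable.
RANGE_START = date(2012, 1, 1)
RANGE_END = date(2024, 1, 1)


def declare_missing(value, start, end):
    """Return the value if it is a date inside [start, end], otherwise None."""
    if isinstance(value, date) and start <= value <= end:
        return value
    return None


def earliest_valid(row, date_columns, start, end):
    """Earliest in-range date across the given columns, or None if none is valid."""
    cleaned = (declare_missing(row.get(col), start, end) for col in date_columns)
    valid = [d for d in cleaned if d is not None]
    return min(valid) if valid else None


# Hypothetical example row with the kinds of bad values described in the question.
row = {
    "date_1": date(1899, 12, 30),  # too early
    "date_2": date(2015, 3, 4),
    "date_3": -1,                  # not a date at all
    "date_4": date(2071, 1, 1),    # in the future
    "date_5": date(2014, 7, 9),
}

earliest = earliest_valid(row, list(row), RANGE_START, RANGE_END)
print(earliest)
```

Because the bad values are turned into missing values first, they never take part in the minimum, which is what should prevent results like 1899 in the 'Earliest date' attribute.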
Vladimir
http://whatthefraud.wtf
Exactly. I filtered out the really bad examples (all dates missing), and now I would like to change the not-too-bad dates in a subprocess.
For example:
IF dateattribute < 01.01.2017, THEN dateattribute = 01.01.2017
and
IF dateattribute > 01.01.2018 THEN dateattribute = 31.12.2018
This must be very easy to do for several attributes, but I could not find the solution yet.
- I checked Loop Attributes as @mschmitz suggested, but I have no clue how to change the subset of attributes.
- Declare Missing Values is interesting, but I had the same issue as with Loop Attributes: identification is easy, but I cannot make the changes.
- Following @sgenzer's suggestion, I will try to post the XML here.
Hi @benicsakp,
The following example process shows the usage of Loop Attributes and contains a Generate Attributes operator in which your IF-ELSE statement is included.
More details on the available functions within Generate Attributes and how they are used can be found here
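For readers who want to see the logic spelled out in plain code, here is a rough Python sketch of the same pattern: loop over the date attributes and apply one IF-ELSE rule to each. It is not the RapidMiner process itself. The thresholds are copied literally from the rule posted above (before 01.01.2017 becomes 01.01.2017, after 01.01.2018 becomes 31.12.2018), while the attribute names, the sample rows, and the helper names `fix_date` and `fix_columns` are assumptions for illustration only.

```python
from datetime import date

# Thresholds and replacement values taken from the rule posted in the thread.
LOW_LIMIT = date(2017, 1, 1)
LOW_REPLACEMENT = date(2017, 1, 1)
HIGH_LIMIT = date(2018, 1, 1)
HIGH_REPLACEMENT = date(2018, 12, 31)


def fix_date(value):
    """Apply the IF-ELSE rule to a single value; missing values stay missing."""
    if value is None:
        return None
    if value < LOW_LIMIT:
        return LOW_REPLACEMENT
    if value > HIGH_LIMIT:
        return HIGH_REPLACEMENT
    return value


def fix_columns(rows, date_columns):
    """Return new rows in which every listed date column has been adjusted."""
    fixed_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in date_columns:  # plays the role of the loop over attributes
            new_row[col] = fix_date(row.get(col))
        fixed_rows.append(new_row)
    return fixed_rows


# Hypothetical data with two of the eight date attributes.
rows = [
    {"id": 1, "date_1": date(2016, 5, 1), "date_2": date(2017, 6, 15)},
    {"id": 2, "date_1": date(2018, 3, 1), "date_2": None},
]

fixed = fix_columns(rows, ["date_1", "date_2"])
for r in fixed:
    print(r)
```

The rule lives in one function and the loop only supplies the attribute name, so the same rule is written once and reused for all 8 attributes.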
Best regards,
Edin