
"How to impute missing values"

ccapra Member Posts: 6 Contributor II
edited June 2019 in Help
I have a survey dataset.

The survey design allows people to enter information about a single event more than once without repeating some details, such as the host's contact info and the event name & date.

This creates rows where some columns are empty, and the missing value is essentially the same as the value in the previous row of the same column.

Like this:

Sally   Smith   sally.smith@email.com   Jan 1 2012  Special Event  Downtown            One cool thing about the event
Joe     Schloe  joe.sch@email.com       Feb 2 2012  Dumb Event     Riverside           One cool thing about the event
                                                                                       Another cool thing about the event
                                                                                       Joe had a lot to say about this event
Betty   Boop    betty.boop@email.com    Jan 5 2012  Odd Event      Out in the Boonies  One mildly cool thing about the event

********

So as you can see, Joe Schloe entered 3 rows of data and only had to put in the redundant info once. Now I need to fill in the missing cells from the data above (i.e. copy Joe's contact & event data from the second row into the third & fourth rows).

I'm very new to RM. I have only used some simple operators and have never worked with either a subprocess or a 'learner', but I think I need to use the 'impute missing values' operator here. Is that right?

And if so, how do I proceed? (I don't know how it's supposed to look, but when I go into the impute missing values operator, there is nothing 'inside' it. I sort of thought it would have the subprocesses contained within, but it does not. So am I just expecting the wrong thing, or is there something wrong with my program?)

Or should I create a macro? Something like 'if a cell is empty, copy the cell from above'? If so, how would I do that?

Thanks!
   





Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    If you always want to replace a missing value with the value from the nearest non-empty row above the current row, you can follow these steps:
    1. Install the Series Extension from the marketplace (Tools -> Updates and Extensions).
    2. Use the operator Replace Missing Values (Series) with replacement set to "previous value".

    Best regards,
    Marius
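
For anyone who wants to see what a "previous value" replacement does to data like the sample in the question, here is a minimal sketch in plain Python. It is only an illustration of the idea (often called forward fill), not how the Series Extension is implemented. The function name `fill_from_previous` and the choice to treat empty strings and None as missing are assumptions made for this example.

```python
def fill_from_previous(rows, missing=("", None)):
    """Return new rows in which each missing cell takes the value of the
    nearest non-missing cell above it in the same column."""
    filled = []
    last_seen = {}  # column index -> most recent non-missing value
    for row in rows:
        new_row = []
        for col, value in enumerate(row):
            if value in missing:
                # Reuse the value from above, if one has been seen yet;
                # otherwise the cell stays missing.
                value = last_seen.get(col, value)
            else:
                last_seen[col] = value
            new_row.append(value)
        filled.append(new_row)
    return filled


# Sample rows modelled on the survey data in the question.
rows = [
    ["Sally", "Smith", "sally.smith@email.com", "Jan 1 2012",
     "Special Event", "Downtown", "One cool thing about the event"],
    ["Joe", "Schloe", "joe.sch@email.com", "Feb 2 2012",
     "Dumb Event", "Riverside", "One cool thing about the event"],
    ["", "", "", "", "", "", "Another cool thing about the event"],
    ["", "", "", "", "", "", "Joe had a lot to say about this event"],
    ["Betty", "Boop", "betty.boop@email.com", "Jan 5 2012",
     "Odd Event", "Out in the Boonies",
     "One mildly cool thing about the event"],
]

filled = fill_from_previous(rows)
for row in filled:
    print(row)
```

Note that this fills every empty cell from the row above, so it only suits data where an empty cell always means "same as above", as in the survey described here.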