Imputing challenge with a best possible ensemble
Experts,
I have the attached sample data and want to impute it with an ensemble, picking the best imputation model. The data covers 24 months, and I need to impute the missing months using the best available algorithm, such as the average, the nearest available value (k-NN), or linear regression. I tried this, but the data is being imputed column-wise: it takes the average of a column across different IDs and applies that. What we need is a row-wise average rather than a column-wise one.
I would greatly appreciate a sample process with an ensemble that imputes based on row values.
Many Thanks
S
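To make the column-wise versus row-wise distinction concrete outside RapidMiner, here is a minimal pandas sketch. The toy table, the month column names (`m1` to `m3`) and the IDs are hypothetical and are not taken from the attached data; it only illustrates the two kinds of average.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per client ID, one column per month.
df = pd.DataFrame(
    {"m1": [10.0, 100.0], "m2": [np.nan, 110.0], "m3": [14.0, np.nan]},
    index=pd.Index(["A", "B"], name="id"),
)

# Column-wise: each gap gets the mean of that month across all IDs.
col_filled = df.fillna(df.mean())

# Row-wise: each gap gets the mean of that ID's own observed months.
row_means = df.mean(axis=1)
# apply() walks the columns; fillna aligns row_means on the ID index.
row_filled = df.apply(lambda col: col.fillna(row_means))

print(col_filled)
print(row_filled)
```

In the column-wise result, client A inherits client B's level for `m2`, which is the behaviour described above; the row-wise result keeps each client on its own scale.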
Answers
Does this process meet your needs?
Regards,
Lionel
As always, many thanks for your expert help.
I must admit that I'm a little lost with the Impute Missing Values operator in your process...
As in a Hollywood fantasy movie, the data mysteriously disappears when it enters the Impute Missing Values operator!
More seriously, when I set a breakpoint before the Impute Missing Values operator, I get the following example set, as expected:
When I set a breakpoint before the model inside the Impute Missing Values operator, I get an empty example set:
So when the process is executed, RM logically raises an error ("example set is empty").
But I have to add that the data seems to pass through correctly: when I click on the output port of the Impute Missing Values operator, I get an example set with no missing values:
Does anyone have an idea of what's going on?
Regards,
Lionel
NB: the process:
FWIW I would strongly reconsider using Neural Nets to impute missing values on such a small data set. More traditional methods such as interpolation or k-NN would likely give you better results.
Scott
So my actual example data set has 970 rows, of which only 16 are complete records (i.e., have no missing values across the 24-month period). With so few complete cases, what would be the best approach here? Does it still make sense to learn from complete records?
When I enable learning from the complete data set, I get very poor, or rather completely wrong, results: it seems to take the client ID and use it to fill in the missing values.
Further, when I disable the learn-from-complete option and run k-NN regression, I just get the same value replicated across the missing months rather than true imputed values. What approach can I take to avoid the same value being repeated for the missing months?
Lastly, if I try to build an ensemble here, is there any way to evaluate the imputation performance of the various learners?
As always, thank you for your valuable advice and time.
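One common way to compare imputation methods is to hide some values that are actually known, impute them, and measure the error on the hidden cells only. The sketch below shows that idea in pandas rather than as a RapidMiner process. The synthetic trending data, the helper names (`row_mean`, `row_interp`, `masked_mae`) and the 20% masking rate are all assumptions made for illustration. Because it scores observed cells wherever they occur, it does not depend on having many fully complete rows.

```python
import numpy as np
import pandas as pd

def row_mean(df):
    # Fill each gap with the mean of that row's own observed values.
    means = df.mean(axis=1)
    return df.apply(lambda col: col.fillna(means))

def row_interp(df):
    # Linear interpolation along each row; edge gaps take the nearest value.
    return df.interpolate(method="linear", axis=1, limit_direction="both")

def masked_mae(df, impute, frac=0.2, seed=0):
    """Hide a random share of observed cells, impute, score on hidden cells."""
    rng = np.random.default_rng(seed)
    observed = df.notna().to_numpy()
    hide = observed & (rng.random(df.shape) < frac)
    filled = impute(df.mask(hide))
    err = (filled.to_numpy() - df.to_numpy())[hide]
    return float(np.nanmean(np.abs(err)))

# Hypothetical data: 20 IDs x 24 months with an upward trend plus noise.
rng = np.random.default_rng(42)
months = [f"m{i:02d}" for i in range(1, 25)]
values = 100.0 + np.arange(24) * 5.0 + rng.normal(0.0, 1.0, size=(20, 24))
truth = pd.DataFrame(values, columns=months)

candidates = {"row mean": row_mean, "row interpolation": row_interp}
scores = {name: masked_mae(truth, fn) for name, fn in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Using the same `seed` for every candidate means each method is scored on exactly the same hidden cells, so the comparison is like for like. On real data the ranking may differ from this trending toy example.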
One other thing to know is that you are using a pretty small data set. You're not going to get great results no matter what you throw at it. As we say here, "you cannot make a silk purse out of a sow's ear."
Scott
As you said, k-NN doesn't seem to work here. I will try linear regression, or probably use Replace Missing Values (Series) to interpolate these values, since my data set is a time series. I hope that makes sense.
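For reference, the row-wise interpolation idea can be sketched in pandas as follows. This is not the RapidMiner operator itself, only an illustration of linear interpolation along each ID's months; the IDs, column names and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per client ID, months in time order.
df = pd.DataFrame(
    [[10.0, np.nan, np.nan, 16.0],
     [np.nan, 50.0, np.nan, 70.0]],
    index=pd.Index([101, 102], name="id"),
    columns=["m1", "m2", "m3", "m4"],
)

# axis=1 interpolates across the months of each row, never across IDs.
# limit_direction="both" also fills gaps at the start or end of a row
# by carrying the nearest observed value outward.
filled = df.interpolate(method="linear", axis=1, limit_direction="both")
print(filled)
```

Interior gaps get values on the straight line between their neighbours, so consecutive missing months receive different values instead of one repeated number. Gaps at the very start or end of a row are only padded with the nearest observation, which is a cruder estimate.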