Prediction/Overview in Dataset
I want to predict the missing dates for "End Date Real" at the bottom of the dataset.
Also, how do I deal with the missing data in the middle?
I also want to find the relevance of the attributes (cavity and weight) for the lead time of an assignment during the correction steps (C1, C2, ...), as well as the overall reliability of the planned dates (Start Date | End Date Contract | End Date Real).
I hope this isn't too special. What do I have to do? Do you have any hints on how I could start?
Greets Newbie 01
Best Answer
BalazsBarany (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi @Newbie_01,
First you have to decide whether you have a classification or a regression problem.
Classification predicts a category, for example a yes/no decision.
Regression is numeric prediction. Predicting the missing dates looks like a regression problem, but you might want to recode your dates to numbers, e.g. the number of days after 2019-01-01 or some other start date. Use Generate Attributes and its date functions for this.
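Outside of RapidMiner, the same recoding idea can be sketched in a few lines of Python. This is only an illustration, not what Generate Attributes does internally: the function names are made up for this example, and the dates are assumed to be ISO strings such as "2019-03-01".

```python
from datetime import date, timedelta

# Reference date taken from the example above; any fixed start date works.
REFERENCE = date(2019, 1, 1)

def days_since_reference(iso_date, reference=REFERENCE):
    """Recode an ISO date string as the number of days after the reference.

    Missing values (None or an empty string) stay missing (None).
    """
    if not iso_date:
        return None
    return (date.fromisoformat(iso_date) - reference).days

def date_from_days(days, reference=REFERENCE):
    """Turn a (possibly fractional) predicted day count back into a date."""
    return reference + timedelta(days=round(days))
```

A regression model would then be trained on the day counts, and its numeric predictions converted back with `date_from_days`.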
There are modeling algorithms that can cope with missing data, but it can be better to make an informed decision on how to handle and fill in missing values. For example, you might know that C2 is always at least one week after C1, and so on. You would then fill in the missing data with the appropriate value. You could also try the Impute Missing Values operator to do this automatically.
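To make the rule-based idea concrete, here is a minimal Python sketch. It assumes the dates are already recoded as day numbers, that each row is a dict with hypothetical keys "c1" and "c2", and that the one-week rule really holds for your data. It fills a missing C2 with C1 plus the typical gap seen in complete rows, but never less than seven days. This is not how the Impute Missing Values operator works; it is just one hand-written rule.

```python
from statistics import median

MIN_GAP_DAYS = 7  # assumed domain rule: C2 is at least one week after C1

def fill_c2(rows):
    """Return copies of the rows with a missing 'c2' filled in from 'c1'.

    Rows where 'c1' is missing as well are left unchanged.
    """
    # Gaps observed in rows where both correction dates are known.
    gaps = [r["c2"] - r["c1"] for r in rows
            if r["c1"] is not None and r["c2"] is not None]
    # Use the median observed gap, but respect the minimum from the rule.
    typical_gap = max(median(gaps), MIN_GAP_DAYS) if gaps else MIN_GAP_DAYS

    filled = []
    for r in rows:
        r = dict(r)  # copy, so the input rows are not modified
        if r["c2"] is None and r["c1"] is not None:
            r["c2"] = r["c1"] + typical_gap
        filled.append(r)
    return filled
```

Whether the median gap, the minimum gap, or something else is the "appropriate value" depends on what you know about the process.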
Some learning algorithms include the attribute importance in their output. There are also operators called Weight by ... that rank the importance of the attributes, each according to its own algorithm. But you will often get different or even contradictory answers from different algorithms. If you have access to AutoModel, there is also a variable importance ranking there.
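As a rough illustration of what such a ranking does, the Python sketch below weights each attribute by the absolute Pearson correlation with the target. This is just one possible criterion, picked here because it is easy to write down; the function names are made up, and other criteria can rank the same attributes differently, which is exactly the disagreement mentioned above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(attributes, target):
    """Rank attributes by absolute correlation with the target, strongest first.

    attributes: dict mapping attribute name to a list of values.
    target: list of target values (e.g. lead time in days).
    """
    weights = {name: abs(pearson(values, target))
               for name, values in attributes.items()}
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)
```

You would call it as `rank_by_correlation({"cavity": [...], "weight": [...]}, lead_time)`. Keep in mind that correlation only captures linear relationships, so a low weight does not prove that an attribute is irrelevant.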
I hope this helps you to start with your analysis.
Regards,
Balázs