Dataset with binary flag columns
Hi all,
For an assignment I have to design a model that can predict the price category of a house (five categories) based on multiple variables. A description of the variables can be found below.
I have imported the (training) dataset into RapidMiner and assigned the proper role and type to each variable. However, I am running into the following problem and cannot find an answer on the internet or within this community.
The binomial variables beginning with 'flg_missing' indicate for each row whether the corresponding value is missing in the dataset. For example, if 'flg_missing_year' has the value 1 for row X, then the value in the column 'year' is missing for that row. For some columns this can also be determined from the column itself (for example, the column 'year' contains 0, which of course means a missing value). For other columns, however, 0 can also be a legitimate value (e.g. 'n_weeks_old', which indicates how long a house has been for sale). Because the dataset therefore does not contain a proper missing value in every column, I am not getting anywhere with the 'missing value' operators.
Because of this, it seems to me that the only way for RapidMiner to interpret the missing values correctly is by using these binary 'flg_missing' columns. I just have no idea how.
Is there anyone who can help me on my way with this? At first I was thinking of creating extra columns in which the value is included only if its flag says it is not missing, and is left empty otherwise.
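To make the idea concrete, here is a minimal sketch of the logic in plain Python (not RapidMiner). It assumes every flag column is named 'flg_missing_' followed by the name of the column it describes, and that a flag value of 1 means "missing". The function name `apply_missing_flags` and the three example rows are made up for illustration only.

```python
FLAG_PREFIX = "flg_missing_"  # assumed naming convention for the flag columns


def apply_missing_flags(rows, prefix=FLAG_PREFIX):
    """Return new rows where every value flagged as missing is replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows stay untouched
        for name, flag in row.items():
            if not name.startswith(prefix):
                continue
            target = name[len(prefix):]  # e.g. 'flg_missing_year' -> 'year'
            if target in new_row and flag == 1:
                new_row[target] = None  # a real missing value instead of 0
        cleaned.append(new_row)
    return cleaned


# Made-up example rows, not taken from the real dataset
houses = [
    {"year": 1995, "flg_missing_year": 0, "n_weeks_old": 0, "flg_missing_n_weeks_old": 0},
    {"year": 0, "flg_missing_year": 1, "n_weeks_old": 12, "flg_missing_n_weeks_old": 0},
    {"year": 2003, "flg_missing_year": 0, "n_weeks_old": 0, "flg_missing_n_weeks_old": 1},
]

cleaned = apply_missing_flags(houses)
for row in cleaned:
    print(row["year"], row["n_weeks_old"])
```

In this sketch a 0 in 'n_weeks_old' is kept when its flag is 0 and only becomes a missing value when the flag is 1, which is the behaviour I am trying to reproduce inside RapidMiner.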
Simply removing the rows with missing values does not seem to be the solution. When I do this, I am left with about 50 rows from a dataset that currently has almost 5000.
To be clear, it is absolutely not my intention to have the model built entirely by others without making any effort myself. I have already done a lot of research myself, but simply can't figure it out.
Thanks in advance for the responses!