"How to declare filtered values/non relevant values in RM?"
CausalityvsCorr
Member Posts: 17 Contributor II
Hello,
How can I tell RapidMiner that a certain type of missingness should not be taken into account in calculations?
This missingness is neither missing at random (MAR) nor missing completely at random (MCAR). Instead, the parameter value is missing because, logically, it cannot exist (e.g. because of the value of some other parameter). At least in my case, neither “Filter Examples” nor “Filter Parameters” helps: so many values are missing in this way that filtering removes all of the data.
I also tried the Declare Missing Value options (using the code NR, “not relevant”, instead of an empty cell), but the output of this operator still had to go through the normal “Replace Missing Values” processing, which leads to biased results.
Such a feature could be called “declare filtered values” or “declare non-relevant values”.
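To make the desired behaviour concrete, here is a minimal sketch in plain Python rather than RapidMiner. The attribute names, the toy values and the use of `None` as the NR marker are all made up for illustration. It contrasts filtering out every row that contains an NR cell with computing each attribute's statistic over only the rows where that attribute is relevant.

```python
from statistics import mean

# Hypothetical marker for "not relevant": the value cannot logically exist.
NR = None

# Hypothetical toy data in which every row has at least one NR cell.
rows = [
    {"income": 40.0, "car_age": NR, "pet_weight": 4.0},
    {"income": 55.0, "car_age": 3.0, "pet_weight": NR},
    {"income": 65.0, "car_age": 7.0, "pet_weight": NR},
    {"income": 30.0, "car_age": NR, "pet_weight": 12.0},
]


def listwise_complete(rows):
    """Keep only rows without any NR cell (what row filtering does)."""
    return [r for r in rows if all(v is not NR for v in r.values())]


def available_case_mean(rows, attribute):
    """Mean of one attribute over the rows where it is relevant.

    NR cells are skipped, not replaced, so no imputed value enters
    the calculation. Returns None if the attribute is never relevant.
    """
    values = [r[attribute] for r in rows if r[attribute] is not NR]
    return mean(values) if values else None


# Row filtering leaves nothing, because every row has some NR cell.
complete_rows = listwise_complete(rows)

# Per-attribute calculation still uses every relevant value.
means = {name: available_case_mean(rows, name) for name in rows[0]}

print(len(complete_rows))
print(means)
```

In this sketch the NR cells are neither dropped row-wise nor replaced; they are simply ignored attribute by attribute. That is the behaviour I am looking for in RapidMiner.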
regards P/K
Answers
Although I cannot answer your question, your statement on bias raises an issue that needs further attention in the RM community. Outliers can obviously also generate bias, which should be considered when deciding to keep them. “Replace Missing Values”, or any other editing RM provides to "clean" your data, is, I must admit, a new approach that classical scientists have not yet accepted. We therefore need to define a policy, after consulting peers from statistical societies, on whether this can be used in scientific research and, if it is used, how it should be reported and justified in a paper.
Cheers
Sven