Missing Values, Inconsistent Values
Question
Hello,
I would like to solve the Kaggle 2013 Expedia competition, but unfortunately RapidMiner can't handle the dataset.
Any tips on how to work this out?
Answer
Nice to hear from you. I think the problem with this data set is that it uses the word “NULL” to indicate missing values. I was able to load the data successfully with the following two changes:
- Deselect “Skip Comments”. This might not actually be necessary for this data set, but the option can cause problems if there are “#” symbols in the data.
- When you go through the input wizard, change the type of the column “prop_review_score” to “Polynominal”
Reason: RapidMiner guesses the type of each column from a sample of the data loaded at the beginning of the wizard. In that sample this column is completely numerical, but further down in the file it contains the value NULL. This causes the problem, since NULL is not a number RapidMiner can parse. You can import the data first (might take 10 or so minutes) and fix it later with the operator “Declare Missing Value”.
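If you want to see the same idea outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not what RapidMiner does internally. The tiny inline CSV, the `prop_id` column, the helper names and the set of missing-value tokens are all made up for illustration; only the column name “prop_review_score” and the “NULL” marker come from the data set discussed above.

```python
import csv
import io

# Hypothetical miniature of the file: the column is numeric in the first
# rows and only later contains the literal string "NULL".
SAMPLE_CSV = """prop_id,prop_review_score
1,4.5
2,3.0
3,NULL
4,5.0
"""

# Assumption: these strings should be treated as "missing".
MISSING_TOKENS = {"NULL", ""}


def parse_score(raw):
    """Return a float, or None when the cell holds a missing-value token."""
    text = raw.strip()
    if text in MISSING_TOKENS:
        return None
    return float(text)


def load_scores(csv_text):
    """Read the review-score column, mapping missing tokens to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [parse_score(row["prop_review_score"]) for row in reader]


scores = load_scores(SAMPLE_CSV)
print(scores)
```

Calling `float("NULL")` directly would raise a `ValueError`, which is roughly the situation the import wizard runs into once it has decided the column is numerical. Declaring the token as missing before converting avoids that.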
In case of any problems like that, also feel free to ask our user community.
And of course this is exactly what our customer success team is more than happy to help you with ;-)
Have fun,
Ingo