NullPointerException
Hi,
I am a newbie to RapidMiner. I am trying to use Expectation Maximization to cluster some data. I have around 500,000 rows of data in a .csv file. I am using the process Read CSV -> Normalize -> Replace Missing Values -> Clustering.
However, I always get a NullPointerException at the clustering step.
Am I doing something wrong here?
Thanks in advance
Darme
Answers
If there is no such dialog, please post your process setup and give us a detailed description of your data (number and types of attributes, and any particularities).
Best regards,
Marius
Thank you for your prompt reply. The following is the error message I get:
The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem.
The log contains the following:
subprocess 'Main Process'
+- Read CSV[1] (Read CSV)
+- Normalize[1] (Normalize)
+- Replace Missing Values[1] (Replace Missing Values)
==> +- Clustering[1] (Expectation Maximization Clustering)
Apr 23, 2013 4:49:13 PM SEVERE: java.lang.NullPointerException
The data has 11 attributes, which are of types text, number and date. In the Normalize operator I have set the value type to numeric.
In the clustering operator I have set randomly assigned examples.
In Replace Missing Values I have set the attribute filter type to all and the default to average.
Do you need any more information? Please let me know.
Thanks again
Darme
It seems that you also have missing values in your nominal and/or date attributes. You should remove or replace all missing values before applying Expectation Maximization Clustering.
Best regards,
Marius
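One way to test this suggestion outside RapidMiner is to count the missing cells per column directly in the CSV file. The following is a minimal sketch in Python, not part of the RapidMiner process. The set of tokens treated as missing and the three-column sample are assumptions for illustration, and the real file may need a different delimiter.

```python
import csv
import io

# Tokens treated as "missing"; this set is an assumption, adjust it to the data.
MISSING_TOKENS = {"", "?", "NA", "NULL"}

def count_missing(csv_file, delimiter=","):
    """Return {column name: number of missing cells} for a CSV with a header row."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None for cells absent from short rows.
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    return counts

# Hypothetical sample standing in for the real 11-attribute file.
sample = io.StringIO(
    "customer,amount,joined\n"
    "A,10.5,2013-04-01\n"
    ",7.0,\n"
    "C,?,2013-04-03\n"
)
print(count_missing(sample))  # {'customer': 1, 'amount': 1, 'joined': 1}
```

For the real data, pass an opened file instead of the sample, for example `open(path, newline="")`. Any non-zero count for a nominal or date column would support the explanation above.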
I added two Replace Missing Values steps to the process. One has the attribute filter type "value_type" set to text, with the default set to value and the replenishment value set to "extra".
The other has the value type "date" and a replenishment value of 23/4/2013.
I still get the same error. Am I still on the wrong path? Please help.
Thank you very much
Darme
Additionally, try to set a breakpoint before the clustering operator and inspect the metadata for missing values.
Best regards,
Marius
Once again, thank you for your advice.
I have attached the code of the process I am using, and I believe all the required information is there.
Since I have a very large set of data, if a breakpoint is set for clustering then I think I would need to step through each row of data one by one.
Is there a way to stop when a value is missing, similar to setting conditions on breakpoints?
Thanks and Regards
Darrshan
Code:
Anyway, my suspicion is that in the second Replace Missing Values operator you should select value_type nominal, polynominal or binominal instead of text (text is a special data type used only in the Text Processing extension).
Experiment with that setting, *and* check the result with a breakpoint.
Best regards,
Marius
As you advised, I changed the settings of the Replace Missing Values operator and also changed the Read CSV operator's data types accordingly.
I am still getting the same result.
Also, I created breakpoints before clustering, and in the metadata view the "Missing value" column shows only "?". I also set breakpoints at each step and looked at the metadata, and the result was the same.
Furthermore, I created the given schema on an MS SQL Server evaluation edition and ran a query to retrieve null values for the given data set. The result was that there are no null values.
Do you think something else has gone wrong? Is any more information needed?
Thanks again
Darme
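The null check described above can also be run directly against the CSV file, which addresses the earlier question about stopping at the first missing value rather than stepping through every row. This is a minimal Python sketch, independent of RapidMiner and SQL Server. The tokens treated as missing are an assumption, and `first_missing` is a hypothetical helper name.

```python
import csv
import io

# Assumed markers for a missing cell; extend as needed for the real data.
MISSING_TOKENS = {"", "?"}

def first_missing(csv_file, delimiter=","):
    """Return (line_number, column_name) of the first missing cell, or None."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            if value is None or value.strip() in MISSING_TOKENS:
                # line_num counts the lines read so far, header included.
                return reader.line_num, name
    return None

# Hypothetical two-column sample; the second data row has no date.
sample = io.StringIO("id,when\n1,2013-04-23\n2,\n")
print(first_missing(sample))  # (3, 'when')
```

A result of `None` on the real file would agree with the SQL query and suggest that the missing values, if any, appear only after the file has been parsed into typed attributes.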
I updated to 5.3.008 and still get the same error. Could it be some setting or configuration issue?
Could you send me your XML file so that I can check it here?
Many thanks again
Darme
I tried out RM version 5.3.8 with modifications to the process, but the result is still the same.
I have attached the XML code.
It seems something is fundamentally wrong, either in the way I am doing this or in the data.
Could you please share your XML so I can try it out with my data?
Thanks a lot
Darme
For the NullPointerException we have already created an internal ticket.
I used the above process with the output set to "exe" and got rid of the NullPointerException.
However, I have some issues with the result.
1. In the Replace Missing Values operator for date, I provided zero as the value, and all of the date values have been replaced by "Jan 1, 1970".
2. In the Replace Missing Values operator for real, I set the default value to average, and in most of the columns the actual values have been replaced by the average.
3. In the Replace Missing Values operator for binominal, I set the default value to "BFI", and all of the actual values have been replaced with it.
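The "Jan 1, 1970" in point 1 is what a value of zero produces when it is read as an offset from the Unix epoch, which is how Java's `java.util.Date` counts time (in milliseconds). Whether the operator interprets the replenishment value this way is an assumption; the thread does not confirm it. A small Python sketch of the arithmetic, with `from_epoch_millis` as a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)

print(from_epoch_millis(0).date().isoformat())           # 1970-01-01
print(from_epoch_millis(86_400_000).date().isoformat())  # 1970-01-02
```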
Is it possible for me to do the clustering with the actual values? Is there any reason why the tool replaces actual values with the replacement values?
In another experiment, I kept all of the above the same but altered the Replace Missing Values operator for date by setting a default value of 1/1/2009. Then I got the NullPointerException again.
Could you explain this behaviour?
Once again, thank you for your understanding and continued help in this regard. I hope for solutions to my questions.
Regards
Darme
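For reference, replacing missing values with the average is normally expected to change only the missing cells, as in the minimal Python sketch below (`impute_mean` is a hypothetical helper, not a RapidMiner function). If nearly every value in a column ends up replaced, one possible explanation is that most cells were read as missing in the first place, for example because the declared attribute type or date format did not match the file. That is an assumption and is not confirmed anywhere in this thread.

```python
def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to average; return the column unchanged.
        return list(values)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

# Only the missing cell changes; the actual values are kept.
print(impute_mean([2.0, None, 4.0]))  # [2.0, 3.0, 4.0]
```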
I managed to get results by trying out various options in the tool. Mainly, I used attribute_type for all attributes rather than their data types, and set one as the prediction. I guess that if we keep attributes in some data types there could be a NullPointerException, possibly because of data type mismatches. Please correct me if I am wrong here.
Once again, thank you very much for all your help in this regard.
P.S. Shall I mark this issue as solved?
Regards
Darrshan