parse numbers output not numerical
I am reading a .csv file that has some numbers formatted as currency, e.g. $1,000 or $500. RapidMiner reads these as polynominal, so I am using the Replace operator to remove the $ and , characters. Both removals work fine, but oddly, for sums of $999 and below, which never had a comma in them, I receive an error message: "No Number: according to the specified format, 500 cannot be parsed as a number". There are no spaces or other nuisances. Any ideas what could cause this? Thanks...
Best Answers
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
this sounds a bit odd. Could you provide an example process? And did you try Trim to remove leading and trailing whitespace?
~Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
dhampton Member Posts: 14 Contributor II
Certainly is easy when you know how! Many thanks Brian, I really like this one-size-fits-all replacement operator and will include it in our training.
For the benefit of others: Brian's killer replacement operator has this in the 'replace what' parameter: [-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]
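For readers who want to try the pattern outside RapidMiner, here is a minimal Python sketch of the same replace-then-parse idea. The character class is copied from the post; the function name `clean_amount` and the sample values are made up for illustration, and Python's `re` module stands in for RapidMiner's own regex engine, so treat this as an approximation rather than the exact operator behaviour.

```python
import re

# Character class quoted in the post above; every match is deleted.
NON_NUMERIC = re.compile(r'''[-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]''')

def clean_amount(text: str) -> float:
    """Strip everything the pattern matches, then parse what is left."""
    return float(NON_NUMERIC.sub("", text))

# Illustrative inputs, including one with stray whitespace around it.
samples = ["$1,000", "$500", " $500 ", "$1,000,000", "USD 12.50"]
cleaned = [clean_amount(s) for s in samples]
print(cleaned)
```

Note that the class also removes the minus sign and all letters, so negative amounts or values in scientific notation would need a narrower pattern.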
David
Answers
Have you tried the Parse Numbers operator and set the separator parameter to a comma?
You can just be extra cautious and replace all characters that won't parse with the replace operator. It works for me on your dataset.
Hi JEdward
Really helpful, thank you... but I do not know how to use the XML code you have provided. Could you please tell me where to go to learn how to do that?
Hi Martin
Many thanks for the speedy response.
My original csv file does not have any spaces in it. But, your Trim operator suggestion worked! So, many thanks. In case you are interested the file is attached but I'm counting this one as solved.
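One plausible explanation for why Trim helped is that an invisible whitespace character was still attached to the value when it reached the parser. The Python sketch below only illustrates that idea: `strict_parse` is a hypothetical stand-in for a strict number format, not RapidMiner's Parse Numbers, and the trailing space in `raw` is an assumption about the data.

```python
import re

# Hypothetical strict format: digits with an optional fractional part.
STRICT_NUMBER = re.compile(r"\d+(\.\d+)?")

def strict_parse(text: str) -> float:
    """Reject anything that is not purely a plain decimal number."""
    if STRICT_NUMBER.fullmatch(text) is None:
        raise ValueError(f"No number: {text!r} cannot be parsed")
    return float(text)

raw = "500 "       # assumed: a trailing space that is easy to miss on screen
print(repr(raw))   # repr() makes the hidden character visible

try:
    strict_parse(raw)
    failed = False
except ValueError:
    failed = True

# Trimming first lets the same value through the strict parser.
trimmed = strict_parse(raw.strip())
```

Printing values with `repr()` (or the equivalent in your tool) is a quick way to check for hidden characters before blaming the number format.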
Thanks Thomas, yes I am using the Parse Numbers operator... that's what is giving me the error message.
I think you were referring to the decimal separator character? Trouble is that if I change that to a comma, then 1,000,000 becomes 1.000.000, which doesn't read as a number.
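The difference between a grouping separator and a decimal separator can be sketched in a few lines of Python. The helper `parse_with_grouping` and its parameter names are hypothetical; the point is only that the comma here has to be dropped as a grouping character, not swapped for the decimal mark.

```python
def parse_with_grouping(text: str, grouping: str = ",", decimal: str = ".") -> float:
    """Drop the grouping character, then normalise the decimal mark to '.'."""
    return float(text.replace(grouping, "").replace(decimal, "."))

# Comma treated as the grouping separator: parses cleanly.
value = parse_with_grouping("1,000,000")

# Comma treated as the decimal mark instead: the result has several
# decimal points and is no longer a valid number.
as_decimal = "1,000,000".replace(",", ".")
try:
    float(as_decimal)
    parses = True
except ValueError:
    parses = False
```

The same helper handles the European convention by swapping the arguments, e.g. `parse_with_grouping("1.000.000,50", grouping=".", decimal=",")`.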
If you enable the XML view in Studio, then you can copy the XML provided and replace the default XML, and then hit the green check mark at the top of the window. That will render the process in the main process view and you will be able to see the operators and their configuration. Sharing the raw XML is thus an easy way of sharing a RapidMiner process and you will see it commonly done this way on the community forum posts.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts