lower case without removing numbers
Hi all,
I am trying to join two tables together. Because they are both open fields, there are some differences in names regarding punctuation, use of capitals etc.
I have now transformed the variables to match on like this to overcome this problem: replaceAll(lower(trim(VARIABLE)), "[. ,()-:?]","").
However, this code also deletes all numbers in the variable name and I don't want that.
Any thoughts?
Best Answer
-
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
If these are nominal values, then I would definately use the Replace operator and do the regex first to remove the puncuation AND then do the lowering/trimming.
Just doing a regex with \W selects all the puncuation and then replace with either a "_" or nothing will get rid of the puncutation. If that's not the desired effect then I would consider splitting off the numbers, doing the lower/trim, and then rejoin the numbers.
0
Answers
It sounds like you'll need the Rename By Replacing operator and write some REGEX to skip over the numbers in your attribute names.
yes agreed. I would also just try changing the order of that expression. Looks like you're trying to use RegEx after "lower" and "trim". Try doing the replaceAll RegEx first.
Scott
Thanks. I have tried that, but unfortunately it doesn't work. It removes all characters and spaces, but when converting to lower case it still removes the numbers.
Any thoughts on what to add to make sure it doesn't do that ?
This works! Thanks!