How to split strings contained in a text column of csv file into words
Ayushi_Aggarwal
Member Posts: 6 Learner I
Right now I am reading a CSV file that has the columns review (text), n1, n2, n3, and overall (text).
I am using Select Attributes to keep only the review column, which gives me an output in RapidMiner of the form:
Row Review
1 Poor service
2 There were torn seats
What I want to do is split the contents of the Review column into individual words, like: Poor, service, There, etc.
I am using Process Documents from Data > Tokenize, but somehow I am not getting the required output.
Please help.
Best Answers
David_A Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member Posts: 297 RM Research
Hi,
if you don't necessarily have to use the Text extension, you could also simply use the "Split" operator (not to be confused with "Split Data") with a regular expression. Something simple like \s+|\W+ should do the trick: it splits on whitespace or on non-word characters (anything other than letters, digits, and underscores).
Best,
David
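
To see what that pattern does outside RapidMiner, here is a minimal Python sketch. It uses Python's `re` module, not the Split operator itself (which relies on Java regular expressions), so treat it as an illustration of the regex only. The function name `split_words` and the sample list are made up for this example; the two review strings come from the question.

```python
import re

# Same pattern as suggested above: split on runs of whitespace
# or runs of non-word characters.
SPLIT_PATTERN = re.compile(r"\s+|\W+")


def split_words(text):
    """Split a review string into words, dropping empty pieces.

    Empty pieces can appear when the text starts or ends with
    punctuation or whitespace, so they are filtered out here.
    """
    return [piece for piece in SPLIT_PATTERN.split(text) if piece]


# Sample rows taken from the question.
reviews = ["Poor service", "There were torn seats"]

for row, review in enumerate(reviews, start=1):
    print(row, split_words(review))
```

Filtering out empty strings is a choice made for this sketch; whether the Split operator produces empty or missing values in the same situations is something to check on your own data.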
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
Can you be more clear about why Tokenize is not giving you what you expect? What are you getting? If you share your process and a data sample, it will be easier to troubleshoot. In general, Tokenize should do exactly what you are asking for: take a text column and split it up into individual words.