Identify similar strings of only one attribute
Hello,
I would like to measure the degree of similarity between strings that all belong to a single attribute of type text. The strings list tests performed in the hospital, in the form: exam_a;exam_b;exam_c. I would also like to detect when the same elements occur in a different order: exam_c;exam_b;exam_a.
Please help me.
Thanks
Best Answer
MartinLiebig (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,533; RM Data Scientist)
Hi,
Have a look at the Fuzzy Matching and Generate Levenshtein Distance operators in the Operator Toolbox extension. I think what you want to do is replace the ; with a space and then run a fuzzy match using TOKEN_SET_RATIO or a similar measure.
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
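For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the "replace ; with a space, then compare ignoring order" approach. It is not the Operator Toolbox implementation and not the exact TOKEN_SET_RATIO measure: it sorts the tokens and compares the results with the standard library's difflib. The function names are made up for illustration, and it assumes exam names contain no spaces.

```python
from difflib import SequenceMatcher


def normalize(exams: str) -> str:
    """Replace ';' with spaces and sort the tokens so order no longer matters."""
    tokens = exams.replace(";", " ").split()
    return " ".join(sorted(tokens))


def order_free_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] between two exam strings, ignoring token order."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Same exams in a different order are treated as identical.
print(order_free_ratio("exam_a;exam_b;exam_c", "exam_c;exam_b;exam_a"))  # 1.0
# A partial overlap gives a value strictly between 0 and 1.
print(order_free_ratio("exam_a;exam_b;exam_c", "exam_a;exam_b"))
```

Because the comparison is character based, two different exam names that are spelled similarly still contribute to the score, which may or may not be what you want for exam codes.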
Answers
That helps, but it is not quite what I want to do. Some strings contain a single exam and others up to 20 (depending on the number of exams), and some values are missing. I considered the Jaccard index, working on the values separated by ;, but what I get is a word-by-word comparison: after splitting, the shorter strings are padded with missing values so that they line up with the longest one. I would rather think in terms of sets and compare the words of one string with the words of a second string. What do you think? How could I do it?
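The set-based comparison described here can be sketched in plain Python as follows. This is only an illustration of the Jaccard index on sets of exams, not a RapidMiner process; the function names are hypothetical, and returning 0.0 when both values are missing is an assumption you may want to change.

```python
from typing import Optional


def exam_set(value: Optional[str]) -> frozenset:
    """Split 'exam_a;exam_b' into a set; a missing value becomes the empty set."""
    if value is None:
        return frozenset()
    return frozenset(tok.strip() for tok in value.split(";") if tok.strip())


def jaccard(a: Optional[str], b: Optional[str]) -> float:
    """Size of the intersection divided by size of the union.

    Assumption: two empty sets (both values missing) score 0.0.
    """
    sa, sb = exam_set(a), exam_set(b)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


print(jaccard("exam_a;exam_b;exam_c", "exam_c;exam_b;exam_a"))  # 1.0
print(jaccard("exam_a", "exam_a;exam_b;exam_c;exam_d"))          # 0.25
print(jaccard(None, "exam_a"))                                   # 0.0
```

Since sets have no order and no fixed length, strings with 1 exam and strings with 20 exams can be compared directly, with no padding and no sorting beforehand.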
I think I found the solution with Jaccard. However, before applying it, I would like to sort the data. To do this I am transposing the data and then sorting the columns. I have a problem with the transpose: when I apply the operator, only the ID column is shown and I cannot find any of the other columns I need. Why?
So if I have 1158 attributes, do I have to run 1158 sorts? My idea was to use a Loop.
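If the only goal of the sort is to make the order of the exams irrelevant, one alternative is to sort the exams inside each original string, one row at a time, instead of sorting each attribute after a transpose. The Python sketch below shows that idea under the assumption that every row is a ;-separated string or a missing value; the function name `canonical` is made up for illustration.

```python
from typing import Optional


def canonical(value: Optional[str]) -> Optional[str]:
    """Return the exams of one row in sorted order; missing values stay missing."""
    if value is None:
        return None
    tokens = [tok.strip() for tok in value.split(";") if tok.strip()]
    return ";".join(sorted(tokens))


rows = ["exam_c;exam_b;exam_a", "exam_a;exam_b;exam_c", None, "exam_b"]
print([canonical(r) for r in rows])
# ['exam_a;exam_b;exam_c', 'exam_a;exam_b;exam_c', None, 'exam_b']
```

Each row is handled independently, so the work grows with the number of rows rather than requiring one sort per transposed attribute.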