Words/String Matching Producing true or false
I have a data set, for example:
Internal Experience Functional Area |
Marketing & Sales |
Marketing & Sales |
Controlling/Accounting |
Marketing & Sales|Marketing & Sales |
General Management |
Marketing & Sales |
Logistics|Logistics|Logistics|Logistics |
Logistics |
Marketing & Sales |
I want to match it against my requirements xlsx file, which contains this column:
Match words |
sales |
The matching is on strings and is not case sensitive, meaning it should work whether the letters are lowercase or uppercase. After matching, it should give me a result of true or false (or 1 or 0). The result should look like this:
Internal Experience Functional Area | Matching result |
Marketing & Sales | TRUE |
Marketing & Sales | TRUE |
Controlling/Accounting | FALSE |
Marketing & Sales|Marketing & Sales | TRUE |
General Management | FALSE |
Marketing & Sales | TRUE |
Logistics|Logistics|Logistics|Logistics | FALSE |
Logistics | FALSE |
Marketing & Sales | TRUE |
I don't know how this can be done. Please help.
Answers
Hi @asn4293
Let's assume that 'Area' is a short name for the attribute containing strings.
Use the 'Generate Attributes' operator to create a new attribute named 'MatchingResult', with the following parameters:
attribute name: MatchingResult
function expressions: contains(lower([Area]), 'sales')
This generates a 'true' value if the lowercased 'Area' contains the substring 'sales', and 'false' otherwise.
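For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of the same case-insensitive substring test. It is not RapidMiner code; the function name `contains_keyword` and the sample list are assumptions made for illustration.

```python
def contains_keyword(area: str, keyword: str = "sales") -> bool:
    """Case-insensitive substring test, analogous to contains(lower([Area]), 'sales')."""
    return keyword.lower() in area.lower()


# A few sample values taken from the question.
areas = [
    "Marketing & Sales",
    "Controlling/Accounting",
    "Marketing & Sales|Marketing & Sales",
    "Logistics",
]

# One boolean per row, in the same order as `areas`.
results = [contains_keyword(a) for a in areas]
```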
Vladimir
http://whatthefraud.wtf
@kypexin
Thank you for your feedback, but this is only reasonable when there is a single search term and we can write the expression each time. I have approximately 1000 terms to match against a large data set, so this approach would not be suitable.
I want to specify a column that contains the words to be matched.
Hi @asn4293
So the task becomes much more general: you have to fuzzy match two columns of text attributes, which technically makes it a many-to-many matching. This sounds like a slightly tricky task to accomplish with RapidMiner; at least I cannot come up with an easy solution off the top of my head. However, my suggestion is:
If you could share the actual files you need to match, we could probably play around with them to get to a solution faster with RM.
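To make the many-to-many requirement concrete, here is a rough Python sketch (not a RapidMiner process) that flags each row as true when it contains any keyword from a second column. The name `match_any` and the sample keyword list are hypothetical; in practice the keywords would be read from the xlsx column.

```python
def match_any(values, keywords):
    """Return one boolean per value: True if any keyword occurs in it, ignoring case."""
    # Normalise the keywords once and skip blank cells.
    lowered = [k.strip().lower() for k in keywords if k.strip()]
    return [any(k in v.lower() for k in lowered) for v in values]


values = [
    "Marketing & Sales",
    "Controlling/Accounting",
    "General Management",
    "Logistics|Logistics|Logistics|Logistics",
]
keywords = ["sales", "LOGISTICS", ""]  # assumed content of the 'Match words' column

flags = match_any(values, keywords)
```

This compares every row with every keyword, so the cost grows with rows times keywords; for around 1000 keywords that is usually still manageable.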
Vladimir
http://whatthefraud.wtf
Hi @asn4293,
I may have found a solution playing around with Process Documents from data (from the Text Processing Extension):
Note that I generated a couple of test example sets with R, but that's only for my convenience (R is not at all necessary). The idea is to tokenize the string, then keep only the tokens matching the keywords, and then check whether the resulting string is empty.
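The tokenize, filter and check-for-empty idea can be sketched in plain Python as follows. This is only an illustration of the logic, not the attached RapidMiner process; the helper names and the tokenization rule (runs of letters and digits) are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_tokens(text, keywords):
    """Keep only the tokens that appear in the keyword set."""
    return [t for t in tokenize(text) if t in keywords]


def has_match(text, keywords):
    """True if at least one token survives the keyword filter."""
    return len(matching_tokens(text, keywords)) > 0


keywords = {"sales"}  # assumed: lowercased words from the 'Match words' column
row_flags = {
    area: has_match(area, keywords)
    for area in ["Marketing & Sales", "Controlling/Accounting", "Logistics|Logistics"]
}
```

Unlike a substring test, this matches whole tokens only, so a value such as "Salesforce" would not match the keyword "sales".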
I leave it up to you to refactor this "quick and dirty" solution XD
Kind regards,
Sebastian
@SGolbert pretty neat!
Vladimir
http://whatthefraud.wtf
I don't know how to put the R code in. Can you please help me fix it, @SGolbert? One file contains the data; the second file is the one it gets data from.
Data file
This file is the drop down which is data to look into