Approach to standardize merchant names: tagging
Experts,
I'm in the process of standardizing our transaction types and bucketing them into the correct categories.
For example, we have companies like the ones below. The biggest challenge is tagging them and putting them in the appropriate bucket, because the same company shows up under many different spellings in the transaction entries. What machine learning model can we use to tackle this monstrous tagging work? Are there any sample models built for use cases like this? Any reference is greatly appreciated.
| Category Type | Matched | Actual Entry |
|---|---|---|
| HR | ADP | Adp |
| Travel | Airbnb | Airbnb |
| Travel | Alaska Air | AlaskaAirlinesInc |
| HR | Allied Delta | Allied Delta |
| G&A | Amazon | Amazon |
| Server | AWS | Amazon Web Services |
| Credit Crd | American Express | American Express |
| Travel | American Air | AmericanAirlines |
| Credit crd | American Express | Amex Epayment |
| Insurance | Anthem | Anthem Bc |
Answers
Is there an example of how to scrape a Google web page to achieve this? Attached is what I want to extract.
First, do name matching: group the different spellings of a company under one name. For example, AWS, Amazon Web Services, Amazon Web Services Inc, Amazon Web Services Llc, etc. should all map to the same company.
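The grouping step can be sketched with plain string similarity. This is a minimal sketch, assuming Python's standard-library `difflib.SequenceMatcher` as the similarity measure; the suffix list, the `normalize`/`group_names` helpers and the 0.85 threshold are hypothetical choices for illustration, not a tuned method.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-entity suffixes to drop before comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name: str) -> str:
    """Split CamelCase, lowercase, keep alphanumeric tokens, drop legal suffixes."""
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # "AlaskaAirlinesInc" -> "Alaska Airlines Inc"
    tokens = re.findall(r"[a-z0-9&]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def group_names(names, threshold=0.85):
    """Greedy grouping: each name joins the first group whose representative is similar enough."""
    groups = []  # list of (normalized representative, [original spellings])
    for raw in names:
        norm = normalize(raw)
        for rep, members in groups:
            if SequenceMatcher(None, norm, rep).ratio() >= threshold:
                members.append(raw)
                break
        else:  # no existing group was close enough, so start a new one
            groups.append((norm, [raw]))
    return {rep: members for rep, members in groups}


print(group_names(["Amazon Web Services", "Amazon Web Services Inc",
                   "Amazon Web Services Llc", "AWS"]))
```

String similarity alone will not connect acronyms such as AWS or Amex to their full names; those still need an explicit mapping.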
Second, use Google Search or the Wikipedia API (which isn't as consistent as Google), pass in the company names, and scrape the data. In the example below, FEDEX should come back as a courier delivery services company.
https://en.wikipedia.org/w/api.php?action=opensearch&search=FEDEX&limit=1&format=json
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=FEDEX
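The two Wikipedia API calls above can be chained from a script. This is a minimal sketch using only the Python standard library: it rebuilds the same `opensearch` and `prop=extracts` queries for any company name, takes the first title returned by opensearch, and reads that page's intro text. The helper names and the User-Agent string are hypothetical placeholders, and Google scraping is not covered here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent: Wikimedia asks API clients to identify themselves; use your own contact info.
HEADERS = {"User-Agent": "merchant-tagging-sketch/0.1 (contact: you@example.com)"}


def opensearch_url(company: str) -> str:
    """Same opensearch query as the first URL above, for any company name."""
    return API + "?" + urlencode({"action": "opensearch", "search": company,
                                  "limit": 1, "format": "json"})


def extract_url(title: str) -> str:
    """Same intro-extract query as the second URL above (empty value = flag is set)."""
    return API + "?" + urlencode({"format": "json", "action": "query", "prop": "extracts",
                                  "exintro": "", "explaintext": "", "redirects": 1,
                                  "titles": title})


def first_extract(payload: dict) -> str:
    """Pull the plain-text intro out of a prop=extracts response ('' if the page is missing)."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("extract", "")
    return ""


def fetch_json(url: str):
    """GET a URL and decode the JSON body (needs network access)."""
    with urlopen(Request(url, headers=HEADERS), timeout=10) as resp:
        return json.load(resp)


def describe(company: str) -> str:
    """Best-matching article title via opensearch, then that article's intro text."""
    titles = fetch_json(opensearch_url(company))[1]  # opensearch returns [query, titles, descriptions, urls]
    return first_extract(fetch_json(extract_url(titles[0]))) if titles else ""


if __name__ == "__main__":
    print(describe("FEDEX")[:200])
```

The returned intro is free text, so it still has to be mapped to one of your categories, for example with a keyword list per category.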
So I think I've got the theory part, but how to do this in RM (RapidMiner) is where I have a BIG GAP. Any sample process to get me started is greatly appreciated.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I'd follow the advice above and work in two steps. First, build a kind of 'translation list' where I'd use regex to convert the most common known variations to a common label, so (AWS|Amazon.*web.*services) becomes AWS, for example. It's a dirty job, but someone has to do it.
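A translation list like this can be as simple as an ordered list of (regex, label) pairs where the first match wins. The sketch below is a hypothetical example built from the entries in the question's table; the patterns are illustrative, not a complete list.

```python
import re

# Hypothetical translation list: (pattern, canonical label). Order matters: first match wins,
# so the AWS rule must come before the generic "amazon" rule.
TRANSLATIONS = [
    (r"\bAWS\b|amazon.*web.*services", "AWS"),
    (r"\bamex\b|american\s*express", "American Express"),
    (r"alaska\s*air", "Alaska Air"),
    (r"american\s*airlines", "American Air"),
    (r"amazon", "Amazon"),
]
COMPILED = [(re.compile(p, re.IGNORECASE), label) for p, label in TRANSLATIONS]


def translate(raw: str) -> str:
    """Return the first matching canonical label, or the raw entry unchanged if nothing matches."""
    for pattern, label in COMPILED:
        if pattern.search(raw):
            return label
    return raw


for entry in ["Amazon Web Services", "Amex Epayment", "AlaskaAirlinesInc", "Anthem Bc"]:
    print(entry, "->", translate(entry))
```

Entries that fall through unchanged (like "Anthem Bc" here) are the ones to review and add new rules for.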
Next, I'd do something like the attached example, where you use a simple list of all the entities you want to find (I've built something similar to look for brands etc. in reviews) and the process 'tags' them in the text. This can be converted relatively easily into more formal tagging: for instance, you could create your own entity recognition model in spaCy and integrate it using Python.
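The list-based tagging idea can be sketched in a few lines of Python. This is a minimal dictionary (gazetteer) tagger, not the attached RapidMiner process: the entity names and tags are hypothetical, taken from the question's table. spaCy offers a more formal version of the same idea through its `EntityRuler` component, and inside RapidMiner a script like this could be run from the Python Scripting extension's Execute Python operator.

```python
import re

# Hypothetical entity list: surface form -> tag. In practice this would be loaded from a file.
ENTITIES = {
    "Amazon Web Services": "Server",
    "American Express": "Credit Crd",
    "Alaska Air": "Travel",
    "Airbnb": "Travel",
}


def build_pattern(entities):
    """One alternation regex; longer names first so the longest entity wins on overlaps."""
    names = sorted(entities, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE)


def tag(text, entities=ENTITIES):
    """Return (start, end, matched_text, tag) for every listed entity found in the text."""
    pattern = build_pattern(entities)
    lookup = {name.lower(): label for name, label in entities.items()}
    return [(m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
            for m in pattern.finditer(text)]


print(tag("Charge from Amazon Web Services and Airbnb"))
```

Because matching is case-insensitive and longest-first, the same list works on raw transaction descriptions as well as on review text.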