Approach to standardize merchant names: tagging
Experts,
I'm in the process of standardizing our transaction types and bucketing them into the correct categories.
For example, we have companies like the ones below. The biggest challenge is tagging them and putting them in the appropriate bucket, because there are a lot of variations in the transaction names. What machine learning model can we use to tackle this monstrous tagging work? Is there a sample model built to address such use cases? Any reference to one is greatly appreciated.
Category | Matched Name | Actual Entry |
HR | ADP | Adp |
Travel | Airbnb | Airbnb |
Travel | Alaska Air | AlaskaAirlinesInc |
HR | Allied Delta | Allied Delta |
G&A | Amazon | Amazon |
Server | AWS | Amazon Web Services |
Credit Crd | American Express | American Express |
Travel | American Air | AmericanAirlines |
Credit crd | American Express | Amex Epayment |
Insurance | Anthem | Anthem Bc |
Answers
Is there an example of how to scrape a Google web page to achieve this? Attached is what I want to extract.
First, match names and group the different spellings of a company under one name, e.g. AWS, Amazon Web Services, Amazon Web Services Inc, Amazon Web Services Llc, etc. should all map to the same company.
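A minimal Python sketch of this first step, assuming the raw entries are plain strings: normalize each name (split CamelCase, lowercase, drop punctuation and a small, hypothetical list of legal-form suffixes such as Inc and Llc), group entries that share the normalized key, then fold a one-word acronym like AWS into the multi-word name whose initials it matches. The helper names `normalize_name`, `initials`, and `group_names` are illustrative only and are not part of RapidMiner.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical suffix list; extend it to match your own data.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(raw: str) -> str:
    """Split CamelCase, lowercase, drop punctuation and legal-form suffixes."""
    # "AlaskaAirlinesInc" -> "Alaska Airlines Inc"
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", raw)
    # Replace anything that is not a letter, digit or space with a space
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", spaced).lower()
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def initials(key: str) -> str:
    """First letter of each word: "amazon web services" -> "aws"."""
    return "".join(word[0] for word in key.split())


def group_names(entries: List[str]) -> Dict[str, List[str]]:
    """Group raw entries under a shared normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        groups[normalize_name(entry)].append(entry)
    # Initials of multi-word keys -> key (later keys win on collisions)
    acronym_to_key = {initials(k): k for k in groups if len(k.split()) > 1}
    # Fold one-word keys such as "aws" into the matching multi-word group
    for key in list(groups):
        if len(key.split()) == 1 and key in acronym_to_key:
            groups[acronym_to_key[key]].extend(groups.pop(key))
    return dict(groups)
```

For example, `group_names(["AWS", "Amazon Web Services", "Amazon Web Services Inc", "Amazon Web Services Llc"])` returns a single group, `{'amazon web services': ['Amazon Web Services', 'Amazon Web Services Inc', 'Amazon Web Services Llc', 'AWS']}`. Acronym folding only covers cases like AWS; abbreviations such as Amex still need an explicit alias list.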
Second, pass the company names to Google Search or the Wikipedia API (which isn't as consistent as Google) and scrape the results. In the example below, FedEx should come back as a courier delivery services company.
https://en.wikipedia.org/w/api.php?action=opensearch&search=FEDEX&limit=1&format=json
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=FEDEX
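The second step could look roughly like this in Python, using only the standard library. It builds the same `action=query&prop=extracts` request as the second URL above, reads the intro extract from the JSON response, and assigns a category with a hypothetical keyword table (`CATEGORY_KEYWORDS`). The keywords, category names, and User-Agent string are placeholders to adapt; only `fetch_extract` touches the network.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"

# Hypothetical keyword table; categories and keywords are placeholders.
CATEGORY_KEYWORDS = {
    "Courier": ["courier", "delivery", "shipping"],
    "Travel": ["airline", "hotel", "lodging"],
}


def extract_url(title: str) -> str:
    """Build the prop=extracts query used in the second URL above."""
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": "",      # boolean flag: its presence alone turns it on
        "explaintext": "",  # plain text instead of HTML
        "redirects": 1,     # follow redirects to the canonical article title
        "titles": title,
    }
    return API + "?" + urlencode(params)


def parse_extract(payload: dict) -> str:
    """Return the intro text of the first page in the response, or ""."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        return page.get("extract", "")  # missing pages carry no "extract"
    return ""


def guess_category(text: str) -> Optional[str]:
    """Return the first category whose keyword appears in the text, else None."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return None


def fetch_extract(title: str) -> str:
    """Network call; Wikimedia asks clients to send a descriptive User-Agent."""
    request = Request(extract_url(title),
                      headers={"User-Agent": "merchant-tagging-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        return parse_extract(json.load(response))
```

Usage would be along the lines of `guess_category(fetch_extract("FEDEX"))`; the result depends on the live article text, so the keyword table needs tuning against real extracts.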
So I think I have the theory, but how to do this in RapidMiner (RM) is where I have a big gap. Any sample process to get me started is greatly appreciated.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I'd follow the advice above and work in two steps. First, build a kind of 'translation list' where I'd use regex to convert the most common known variations to a common label, so (AWS|Amazon.*web.*services) becomes AWS or similar. It's a dirty job, but someone has to do it.
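A minimal Python sketch of such a translation list, assuming first-match-wins order: each case-insensitive regex maps a raw entry to a canonical name, and the categories are copied from the table in the question. The pattern list and the `translate` / `categorize` names are hypothetical and only cover a few rows; note that the AWS rule has to come before the plain Amazon rule.

```python
import re
from typing import Optional, Tuple

# Hypothetical translation list, checked top to bottom; first match wins.
TRANSLATIONS = [
    (r"\b(aws|amazon.*web.*services)\b", "AWS"),  # must precede the Amazon rule
    (r"\bamazon\b", "Amazon"),
    (r"\b(american\s*express|amex)\b", "American Express"),
    (r"\bamerican\s*air", "American Air"),
    (r"\balaska\s*air", "Alaska Air"),
]
COMPILED = [(re.compile(p, re.IGNORECASE), label) for p, label in TRANSLATIONS]

# Canonical name -> category, copied from the table in the question.
CATEGORY = {
    "AWS": "Server",
    "Amazon": "G&A",
    "American Express": "Credit Crd",
    "American Air": "Travel",
    "Alaska Air": "Travel",
}


def translate(raw: str) -> Optional[str]:
    """Return the canonical label of the first matching pattern, else None."""
    for pattern, label in COMPILED:
        if pattern.search(raw):
            return label
    return None


def categorize(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (canonical name, category); unmatched entries give (None, None)."""
    label = translate(raw)
    if label is None:
        return None, None
    return label, CATEGORY.get(label)
```

Entries that come back as `(None, None)` are the ones still needing a new rule, which keeps the "dirty job" incremental.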
Next, I'd do something like the attached example, where you use a simple list of all the entities you want to find (I've built something similar to look for brands etc. in reviews), and the process 'tags' them in the text. This can be converted relatively easily into more formal tagging, for instance by creating your own entity recognition model in spaCy and integrating it using Python.
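A rough standard-library version of list-based tagging, assuming a plain Python list of entity names (hypothetical here): the names are escaped, sorted longest first so that "Amazon Web Services" is preferred over "Amazon", and joined into one case-insensitive regex whose matches are reported with their positions. For the more formal route, spaCy's `EntityRuler` component accepts phrase patterns in a similar spirit; this sketch does not use spaCy.

```python
import re
from typing import List, Tuple


def build_tagger(entities: List[str]) -> re.Pattern:
    """Compile one case-insensitive regex from a plain list of entity names."""
    # Longest names first so "Amazon Web Services" is preferred over "Amazon"
    ordered = sorted(entities, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # (?<!\w) / (?!\w) act like word boundaries, but unlike \b they also
    # behave sensibly for names that start or end with punctuation
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def tag(text: str, entities: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, canonical name) for every entity found in the text."""
    if not entities:
        return []  # an empty alternation would match the empty string everywhere
    canonical = {name.lower(): name for name in entities}
    pattern = build_tagger(entities)
    return [(m.start(), m.end(), canonical[m.group(0).lower()])
            for m in pattern.finditer(text)]
```

The returned spans are essentially the training data an entity recognition model would need later, so this list-based pass can double as a labeling step.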