Text Classification/Labeling using Description

rikin_j_parekh Member Posts: 1 Learner I
edited March 2020 in Help

Hi All,

I am new to RapidMiner and would like to label rows based on a 'Long Description' column in a CSV file. I will mainly be working with 2 columns, 'Long Description' and 'Label'. The 'Label' is assigned based on the 'Long Description' value. I have 1000 rows, of which 80% already have a 'Label' value and can serve as a training set. I wish to populate the remaining 20% of 'Label' values using the 'Long Description' value.

All Label Values - 

Cancellation
Price Increase
Normal Payment
Payoff
Price Decrease
Installer Installation Issue
Past Due Payment
Change Order
Incentive Payment
Assumption
Completion Certificate
Interest
Referral

Example -

Long Description - Please review change order in installation phase - loan amount increasing from USD 21;851.00 to USD 24;501.00
Label - Price Increase

Long Description - Cancellation request with SPV Assignment
Label - Cancellation

How should I proceed with this in RapidMiner, and what steps should I follow?

Thanks

Best Answer

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Search the forums for existing threads on text mining; you will find a lot of helpful information there.  This is a classic classification problem.  You'll use your "Long Description" column as the text, process and tokenize it, and then use the resulting word vectors to predict the label.
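To make the tokenize / word vector / predict idea concrete outside of RapidMiner, here is a minimal sketch in plain Python (standard library only). It is not a RapidMiner process and not necessarily what RapidMiner's operators do internally; it swaps in a small multinomial Naive Bayes classifier over bag-of-words counts as one possible learner. The names `tokenize` and `NaiveBayesTextClassifier` are made up for this illustration, and two of the four training rows are invented examples, marked as such in the comments.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)      # rows seen per label
        self.word_counts = defaultdict(Counter)  # per-label word counts
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training_rows = [
    # The first two rows are the examples from the question.
    ("Please review change order in installation phase - loan amount "
     "increasing from USD 21;851.00 to USD 24;501.00", "Price Increase"),
    ("Cancellation request with SPV Assignment", "Cancellation"),
    # The next two rows are invented purely for illustration.
    ("Customer requested cancellation of the loan", "Cancellation"),
    ("Loan amount increasing after system size change", "Price Increase"),
]

texts = [description for description, _ in training_rows]
labels = [label for _, label in training_rows]
model = NaiveBayesTextClassifier().fit(texts, labels)

# Label a row whose 'Label' column is still empty.
print(model.predict("Cancellation request from customer"))
```

With the real data you would train on the 800 labeled rows and call `predict` on each of the 200 unlabeled descriptions. Four training rows are far too few to say anything about accuracy; hold out part of the labeled rows to measure that.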

    However, you may find that you need to consolidate labels.  You have a lot of distinct values, and classification problems become harder as the number of distinct label values to predict grows.  So you may have better success grouping some of the existing labels together into larger categories.  That's something you will need to experiment with manually; there's no easy way to automate it in RapidMiner.
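As a rough illustration of what consolidating labels could look like, here is a small Python sketch. The grouping in `LABEL_GROUPS` is purely hypothetical and only shows the mechanics; which labels actually belong together depends on your data and on what the business needs to distinguish. The names `LABEL_GROUPS` and `consolidate` are invented for this example.

```python
from collections import Counter

# Hypothetical grouping for illustration only; the right groups must be
# chosen by inspecting the real data.
LABEL_GROUPS = {
    "Price Increase": "Price Change",
    "Price Decrease": "Price Change",
    "Normal Payment": "Payment",
    "Past Due Payment": "Payment",
    "Incentive Payment": "Payment",
}


def consolidate(label):
    """Map a fine-grained label to its group; unmapped labels stay unchanged."""
    return LABEL_GROUPS.get(label, label)


# A few label values taken from the list in the question.
original_labels = [
    "Price Increase", "Price Decrease", "Normal Payment",
    "Past Due Payment", "Cancellation",
]
grouped_labels = [consolidate(label) for label in original_labels]

# Counting rows per group shows how many classes the model must now separate.
print(Counter(grouped_labels))
```

Counting rows per original label first is a reasonable way to spot candidates for merging, since labels with very few examples are usually the hardest to predict.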

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts