To classify customer queries using RapidMiner
Akshay_Gupta
Member Posts: 3 Contributor I
I have a huge Excel file with general attributes such as BILLID, CLAIM ID, ITERATION ID, QUERY TEXT, etc. I want to classify these queries to identify the major problem areas. I tried a Decision Tree, but it did not yield any insights. Most of what I found while researching suggests filtering keywords, tokenizing, and stemming to identify keyword frequency, but I couldn't apply these to an Excel attribute. Besides, I am not sure that would help me find clusters/buckets of complete query statements rather than just keywords. Any help in finding the right direction would be much appreciated. Thank you.
Answers
You need to set a label to classify your dataset and if you have some sort of text to process, you'd need the Text Processing extension too.
Attached is a sample process that I used to classify some HR data that also included open-ended survey questions.
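To make the "set a label, then process the text" idea concrete outside of RapidMiner, here is a minimal sketch in plain Python. It is not the attached process: it swaps in a small hand-written Naive Bayes classifier, and the training rows and class names are made up for illustration (the class names echo the kind of buckets a payable-query data set might use). The point is only that each query needs a label before a classifier can learn anything from the tokenized text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        # Laplace smoothing: add 1 to every word count.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up labeled rows; real data would come from the QUERY TEXT attribute
# plus a manually assigned label column.
labeled = [
    ("TDS not deducted on invoice", "Tax related"),
    ("GST number missing, tax amount wrong", "Tax related"),
    ("Vendor name field is empty", "Incorrect/Empty details"),
    ("Incorrect bank details entered", "Incorrect/Empty details"),
    ("Supporting documents not attached", "Supportings not attached"),
    ("Please attach approval mail as supporting", "Supportings not attached"),
]

model = train(labeled)
print(predict(model, "tax amount not deducted"))
print(predict(model, "documents not attached"))
```

With only six training rows this is a toy, but the same shape (label a few hundred queries by hand, train, then score the rest) is what a labeled text-classification process does.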
I tried to replicate your process, but:
1. It doesn't show any attributes in the Remap Binominals operator.
2. The decision tree generated an individual branch for each example, with no relevant insights. I think I am not using it properly.
3. Word frequency looks irregular. For example, at times it shows counts for "approv", "provide", or "support" instead of "supportings".
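On point 3, one plausible explanation (assuming a stemming step is part of the process) is that stemming reduces related word forms to a shared root, so the word list shows roots such as "approv" or "support" rather than the original words. The sketch below illustrates the effect with a deliberately naive suffix stripper. The suffix list is hypothetical and far simpler than a real stemmer such as Porter, so its output will not match RapidMiner's exactly.

```python
from collections import Counter

# Hypothetical suffix list, checked in order (longer suffixes first).
# Real stemmers apply many more rules than this.
SUFFIXES = ("ings", "ing", "als", "al", "ed", "es", "e", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["supportings", "supporting", "support",
         "approval", "approved", "approve"]

# Different surface forms collapse into one counted root.
counts = Counter(naive_stem(w) for w in words)
print(counts["support"], counts["approv"])  # 3 3
```

If the roots are harder to read than the original words, the counts are still meaningful: each root aggregates all of its variants.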
Thank you for such a prompt reply. This is my first process in RapidMiner. My expectation is to classify textual queries (thousands of them) into a handful of classes, for example "Tax related", "Incorrect/Empty details", and "Supportings not attached". The queries are remarks put up by an officer to the initiator, i.e. QTI (Query to Initiator). I am looking for a way to automate these query-handling processes in order to improve efficiency and productivity. Hence I am starting with the first step of identifying the major problem/query areas: classification.
I am sorry if I sound immature; I spent 3 weeks scouring the internet before posting here. This field is quite new in my environment, so I couldn't get much help from those around me. It would help a great deal if you could share resources or example processes that I can learn from by comparison.
Do you have some sample data to share too? Would make troubleshooting this easier.
Thanks!
Hi, thank you for your time and attention. Please find the data file attached. I hope it helps with the context of my problem. These are queries related to payables. I want to automate the query-handling processes, so I am trying to classify these queries into a handful of focus areas and think about solutions from there.
So your data set has some data quality issues. I just filtered them out by selecting only the ER- records. Then I made some tweaks to the process and got results.
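For readers who want to try the same cleanup outside RapidMiner, the filtering step can be sketched in Python as follows. This is an assumption-laden illustration: the post does not say which attribute carries the "ER-" prefix, so the `BILLID` column used here is a guess, and the sample rows are invented.

```python
import csv
import io

# Invented sample data; the real file and its ID format are not shown here.
SAMPLE = """BILLID,QUERY TEXT
ER-1001,Supporting documents not attached
,garbled row
XX-17,Tax amount incorrect
ER-1002,Vendor details empty
"""


def keep_er_records(rows, id_column="BILLID", prefix="ER-"):
    """Keep only rows whose ID starts with the prefix.

    id_column is an assumption; change it to whichever attribute
    actually holds the ER- identifiers.
    """
    return [
        row for row in rows
        if (row.get(id_column) or "").strip().startswith(prefix)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean = keep_er_records(rows)
print(len(rows), len(clean))  # 4 2
```

Rows with a missing or malformed ID are dropped before any text processing, which keeps stray fragments out of the word list.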
See below for the XML.