Auto Categorization of documents
Can you guide me on auto-categorization of documents?
In our database we have many long ticket descriptions (email conversations, resolution notes, etc.). I need to train a classifier so that any new incoming ticket is automatically assigned to the right category.
Steps taken so far:
1) Tried unsupervised learning to form clusters of words.
2) Used a Naive Bayes classifier, but for this I had to label the training data set manually.
Can you suggest a way to auto-label the text so that it can be used as training data?
Eagerly looking forward to your help.
Answers
Search the community forums for some sample processes; that'll get you started.
Thanks Master.
Is it possible to get the clusters (with their keywords) and then classify new text into the respective cluster?
Yes.
Give this a try.
My use case goes like this:
1) Long description of a problem statement (uncategorized).
2) Form categories out of it, based on keywords/phrases/POS tagging.
3) Assign the above-mentioned category to the new incoming text.
I have the clustering model in place, so I know the keywords and which cluster each of them falls into.
Now how can I turn this unsupervised result into a supervised learning model that classifies an incoming text into one of those clusters or categories?
You can add the cluster as a label (there's an option for that), and then use that label to build a predictive model, if you want to try to replicate document classification into those same clusters in the future.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Can you please elaborate on the steps to take after clustering is done (Clustered Set and Cluster Model)? How can we use these for supervised learning?
Once you have the cluster assigned as a label, you would then have a dataset that you could use with any of the standard machine learning approaches to classification. There are a number of helpful RapidMiner video tutorials on building such models available in the resources on this site: https://rapidminer.com/getting-started-central/
There are also guided processes available directly from within RapidMiner (just click on the "Learn" button on the splash screen after startup).
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
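For readers who want to see the "cluster, then use the cluster as a label" idea outside of RapidMiner, here is a minimal sketch in plain Python. It is not a RapidMiner process and not necessarily how the operators work internally. The toy tickets, the stopword list, the farthest-first seeding and the function names (`cluster`, `train_nb`, `predict_nb`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stopword list, just enough for the toy data.
STOPWORDS = {"the", "is", "on", "to", "a", "and", "with", "after", "while", "of", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, k, iterations=10):
    """Small k-means-style clustering on cosine similarity (deterministic seeding)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to the seeds chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = sum(members, Counter())  # summed counts; scale does not affect cosine
    return labels

def train_nb(vectors, labels):
    """Collect the counts a multinomial Naive Bayes model needs."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for v, lab in zip(vectors, labels):
        term_counts[lab].update(v)
    vocab = {t for v in vectors for t in v}
    return class_docs, term_counts, vocab

def predict_nb(model, text):
    """Return the most likely cluster id for a new text (Laplace smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [t for t in tokenize(text) if t in vocab]

    def log_score(lab):
        total_terms = sum(term_counts[lab].values())
        score = math.log(class_docs[lab] / total_docs)
        for t in tokens:
            score += math.log((term_counts[lab][t] + 1) / (total_terms + len(vocab)))
        return score

    return max(class_docs, key=log_score)

# Toy, made-up ticket descriptions (unlabelled).
tickets = [
    "cpu usage is high on the server",
    "high cpu usage after the update",
    "network error while connecting to vpn",
    "vpn connection dropped with network error",
    "cpu load spikes and high usage",
    "cannot connect to network vpn error",
]
vectors = [Counter(tokenize(t)) for t in tickets]

labels = cluster(vectors, k=2)        # step 1: unsupervised, cluster id per ticket
model = train_nb(vectors, labels)     # step 2: treat the cluster id as the label
new_ticket_label = predict_nb(model, "vpn error when connecting from home")
print(labels, new_ticket_label)
```

The point of the sketch is the hand-off: the cluster ids produced in step 1 are used exactly as a manually assigned label would be in step 2. On real ticket data you would still want to inspect the clusters before trusting them as categories.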
Hello Brian,
So the use case, put simply, is a combination of supervised and unsupervised learning, with topic modelling (LDA).
1) Documents (rows) typically contain a detailed description of a problem pertaining to any field (e.g. CPU Usage, Memory Issue, Network Error). This data is untagged (unlabelled).
2) We need to find keywords (n-grams, POS) for each category and make a rule book that says which kinds of words/phrases fall into which category. In short, we are clustering by fetching the relevant words/phrases for each category (unsupervised learning).
3) Based on the step above, we want to tag a new incoming document by analysing its content, i.e. its keywords/phrases (supervised learning).
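The three steps above can be sketched roughly as follows. This is only an illustration in plain Python, not a RapidMiner process. It assumes the documents have already been grouped (for example by a clustering step) and that someone has given each group a name. The sample texts, the stopword list and the helper names `terms`, `build_rule_book` and `tag` are made up for the example, and POS tagging is left out.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, just enough for the toy data.
STOPWORDS = {"the", "is", "on", "to", "a", "and", "with", "after", "while", "of", "in"}

def terms(text):
    """Unigrams plus adjacent-word bigrams, with stopwords removed."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return words + [" ".join(pair) for pair in zip(words, words[1:])]

def build_rule_book(groups, top_n=5):
    """For each group, keep its most frequent terms that occur in no other group."""
    counts = {name: Counter(t for doc in docs for t in terms(doc))
              for name, docs in groups.items()}
    rule_book = {}
    for name, own in counts.items():
        other = set().union(*(c for n, c in counts.items() if n != name))
        exclusive = Counter({t: c for t, c in own.items() if t not in other})
        rule_book[name] = [t for t, _ in exclusive.most_common(top_n)]
    return rule_book

def tag(text, rule_book):
    """Return the group with the most keyword matches, or None if nothing matches."""
    found = set(terms(text))
    hits = {name: sum(1 for kw in kws if kw in found) for name, kws in rule_book.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

# Toy groups; the names reuse the example categories from the post.
groups = {
    "CPU Usage": [
        "cpu usage is high on the server",
        "high cpu usage after the update",
    ],
    "Network Error": [
        "network error while connecting to vpn",
        "vpn connection dropped with network error",
    ],
}

rule_book = build_rule_book(groups)
print(rule_book)
print(tag("vpn keeps dropping, network error on login", rule_book))
print(tag("printer out of paper", rule_book))
```

A keyword rule book like this is easy to inspect and edit by hand, but it only matches exact words and phrases. Training a classifier on the cluster labels tends to generalise better once there is enough data.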
Any updates?
Hi @sangeet,
@Thomas_Ott and I have already provided some direction about how we'd approach the problem. Based on your described use case, we recommended the following:
So I'm not sure what else you are expecting at this point. Did you have a more specific question, or a problem that you ran into when you tried to complete the steps above? Please remember that this is a free user community forum. If you are interested in a more detailed consulting project, you can feel free to PM me.
Best,
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts