Labelling training cases in a polynominal text classification task
Each case in my dataset contains multiple sentences as shown below.
"Criterion 4Writing General – language and grammar and referencing.UnacceptableSentence structure and grammar inadequate for clarity and/or incomplete referencing of sourced material.AcceptableSentence structure and grammar adequate, but errors cause distraction and/or errors in referencing.GoodSentence structure and grammar adequate, with minor errors that do not distract reader from the main message.Very GoodSentence structures and grammar are good with correct referencing of all sourced material.ExcellentEmploys words with fluency for ease of reading. Writing and references are essentially error free."
I would like to classify the cases according to their main focus. My labels are ["Information Literacy", "Written Communication", "Digital Literacy", ...], 8 in total.
When developing the training set, some cases clearly relate to one area, such as Information Literacy. In those instances my training data looks like this:
ID, Text, Label
01, "string", "Information Literacy"
However, some cases relate to multiple labels.
My question is how should these cases be documented in the training set?
Hope that makes sense.
Best Answer
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hello,
Let's use a simpler case here to make an example.
Label | Text
weather | This will be a cold winter
food | a few sandwiches for me
weather | It's raining today
food | Give me some coffee
sports | Michael Jordan is the greatest basketball player ever
The result for this one should be:
weather, food | Today it was cold, I made coffee and sandwiches.
Right?
What I did to solve this was to train three different models (8, in your case): one that can recognize weather from not-weather, another that can recognize food from not-food, and a third that can recognize sports from not-sports.
You can make use of Multiply, Macros, and a few other things to train multiple models and then apply these models iteratively.
It's not the most elegant solution and maybe @Telcontar120 has another one. I'll try to find an example to share with you, ok?
All the best,
Rodrigo.
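For readers who want to see the one-model-per-label idea outside RapidMiner, here is a minimal sketch in plain Python. It is not the RapidMiner process described above. The tiny Naive Bayes learner, the function names (train_binary, log_score, predict_labels) and the choice to store each row's labels as a set are all assumptions made for illustration. The point is only that each training row can list every label that applies, and each binary model derives its own "label vs. not-label" target from that list.

```python
import math
import re
from collections import Counter

# Toy rows from the example above. Labels are stored as a set, so a row
# that belongs to several categories simply lists all of them.
TRAIN = [
    ({"weather"}, "This will be a cold winter"),
    ({"food"}, "a few sandwiches for me"),
    ({"weather"}, "It's raining today"),
    ({"food"}, "Give me some coffee"),
    ({"sports"}, "Michael Jordan is the greatest basketball player ever"),
]
LABELS = ["weather", "food", "sports"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_binary(rows, label):
    """Fit one 'label vs. not-label' Naive Bayes model (hypothetical helper).

    Assumes the label has at least one positive and one negative row.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for labels, text in rows:
        target = label in labels  # binary target derived from the label set
        doc_counts[target] += 1
        word_counts[target].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_score(model, tokens, target):
    """Log prior plus Laplace-smoothed log likelihood for one class."""
    word_counts, doc_counts, vocab = model
    total_docs = doc_counts[True] + doc_counts[False]
    total_words = sum(word_counts[target].values())
    score = math.log(doc_counts[target] / total_docs)
    for tok in tokens:
        if tok in vocab:  # words never seen in training are skipped
            count = word_counts[target][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
    return score


def predict_labels(models, text):
    """Apply every binary model and keep the labels that say 'yes'."""
    tokens = tokenize(text)
    return {
        label
        for label, model in models.items()
        if log_score(model, tokens, True) > log_score(model, tokens, False)
    }


# One model per label (three here, eight in the original question).
models = {label: train_binary(TRAIN, label) for label in LABELS}
predicted = predict_labels(models, "Today it was cold, I made coffee and sandwiches.")
print(sorted(predicted))
```

On this toy data the script prints ['food', 'weather']. With real data you would want many more rows per label and a proper learner, but the training-set layout stays the same: one row per case, with all applicable labels recorded on that row.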
Answers