I need help improving my confusion matrix
Hi, I'm new to RapidMiner, and I have started a project to classify news into 2 types: 1 = gender violence and 2 = generic news.
These are the transformations I used:
https://gyazo.com/aae6e06bc6f1f7623881876b6ed06d23
These are my text processing transformations:
https://gyazo.com/ed3d19158038d8c52f02db98ae13ba67
My cross validation:
https://gyazo.com/af702c35350dcd735ae61bf2f2322586
My result:
https://gyazo.com/a7752521e886c68be0490d3150b78635
My question is how I can improve my result. Also, for example, if I give it a news item about animal violence, RapidMiner fails and classifies it as gender violence. Is it possible to create a rule so that, if it finds for example the word Animal, Dog, etc., it classifies the item as generic news?
Thanks for the help, and sorry for my English.
Answers
hello @luistops - welcome to the Community. We'd be happy to help you with this. Can you please post your process XML in this thread using the </> button? Thanks.
Scott
hello @luistops - ok thanks for that. So for starters to improve your model I would encourage you to optimize your parameters (see https://www.youtube.com/watch?v=nXjjAA2mDMY). As for reducing/combining classes, I'd need to see your dataset to understand better what you're trying to do.
Scott
OK, thank you so much for the help. I have included my dataset. My project just tries to distinguish and classify news into 2 categories:
1. No gender violence
2. Gender violence
hello @luistops - ok I think I get your idea. I did not have your Spanish stopword dictionary so I just greyed that out. As for animal, dog, etc., I put some replace token lines in your Process Documents operator; perhaps (?) that is what you're looking for. Note that the "test set" I created is rather small so don't get too excited by your 100% accuracy result.
Scott
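For readers who want to see the keyword-rule idea from the question outside RapidMiner, here is a minimal Python sketch. It is not the replace-token approach from the attached process. It is a hypothetical post-processing rule that overrides the model's label whenever a keyword appears. The keyword list, the label strings, and the function name `apply_keyword_rule` are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list; the thread only mentions "Animal" and "Dog".
GENERIC_KEYWORDS = {"animal", "dog"}

# Hypothetical label strings for the two classes in the thread.
GENDER_VIOLENCE = "gender violence"
GENERIC = "generic news"


def apply_keyword_rule(text, model_label, keywords=GENERIC_KEYWORDS):
    """Return GENERIC if any keyword occurs as a whole word, else keep the model's label."""
    # Lowercase and split into word tokens so "Dog" matches "dog".
    tokens = set(re.findall(r"\w+", text.lower()))
    if tokens & keywords:
        return GENERIC
    return model_label


# Usage: the model said "gender violence", but the text mentions a dog.
final_label = apply_keyword_rule("A dog was beaten in the street", GENDER_VIOLENCE)
```

Two caveats with a rule like this: it matches exact tokens only, so "dogs" will not match "dog" unless the text is stemmed first, and it will also override a genuine gender violence article that happens to mention an animal.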
Just looking through your initial confusion matrix you had some decent results off the bat. What k value were you using?
In conjunction with what @sgenzer said, optimization is a must. I would also look at using the Deep Learning algorithm as well as a linear SVM.
Thank you for the help. I used k=2. Now I have one doubt: I get 2 results, but I don't understand where they come from. I have this result, which is great: https://gyazo.com/00d0fc3aa5ff57cc7eb20b8d6bc45bca
And I have this one: https://gyazo.com/79891aae0d4958d6d9b790d7bc934557
What is the difference between these results? I thought they use the same algorithm to calculate the result.
Thanks again.
So there's another tab in the Results view that you need to look at: ParameterSet (Optimize Parameters (Grid)). This will tell you the value of k (assuming you used my process) that was optimal.
Scott
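To make the idea of a grid search over k concrete, here is a small, self-contained Python sketch: try each candidate k, score it with cross-validation, and keep the best one. This is only an illustration of the general technique, not RapidMiner's implementation. The toy data, the candidate values, and the function names are made up for the example.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_accuracy(data, k, folds):
    """Accuracy pooled over all folds; example i is held out in fold i % folds."""
    correct = 0
    for fold in range(folds):
        train = [ex for i, ex in enumerate(data) if i % folds != fold]
        test = [ex for i, ex in enumerate(data) if i % folds == fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def grid_search_k(data, candidates, folds=5):
    """Score every candidate k and return the best one together with all scores."""
    scores = {k: cv_accuracy(data, k, folds) for k in candidates}
    best_k = max(scores, key=scores.get)  # on ties, the first candidate wins
    return best_k, scores


# Made-up 2D toy data: class 0 near the origin, class 1 near (10, 10).
toy_data = [
    ((0, 0), 0), ((10, 10), 1), ((0, 1), 0), ((10, 11), 1), ((1, 0), 0),
    ((11, 10), 1), ((1, 1), 0), ((11, 11), 1), ((0, 2), 0), ((10, 12), 1),
]

best_k, scores = grid_search_k(toy_data, candidates=[1, 3, 5])
```

Here `best_k` plays the role of the optimal parameter value, and `scores` holds the cross-validated accuracy for each candidate that was tried.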