evaluating text
Hi,
I'm trying to test how well a simple machine does at predicting a property of a text (specifically sarcasm).
I have my data in a massive table, where one column is the source, one is the label that should be predicted, and the last column is the text the algorithm(s) should analyze.
The problem is that without some tool to extract meaning or sentiment, the results are (not surprisingly) abysmal.
Both the promotional texts on the RapidMiner main page and the professor who suggested I use RapidMiner imply that such tools are already part of RapidMiner; however, I have not yet found anything in the documentation/manual.
What are these tools called/how are they used?
Best Answers
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi @MarkusW,
RapidMiner has a Marketplace that you can find in the "Extensions" menu. There you will find the Text Processing and Web Mining extensions.
There's a full Text Mining course in the Academy:
https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer
Regards,
Balázs
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
Yes, sarcasm detection is a big challenge and simple models don't cut it.
Have you seen "Automatic Classification of Documents" in the Academy course?
It explains the Process Documents operator. The only addition you would need here is "Generate n-Grams (Terms)". This will create new attributes from term combinations like "not very good" and "i really liked it". Of course, all combinations of consecutive words will be created, so this gives you a massive number of new attributes. This may or may not help you with the sarcasm.
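To make the idea of term n-grams concrete outside of RapidMiner, here is a minimal Python sketch. It is not the operator's implementation: the function name `term_ngrams`, the whitespace tokenization, and the underscore used to join terms are all assumptions made for illustration.

```python
def term_ngrams(text, max_n=3):
    """Return every run of 1..max_n consecutive tokens, joined with '_'.

    Assumption: tokens are lowercased and split on whitespace; a real
    text-processing pipeline would tokenize more carefully.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of length n across the token list.
        for i in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[i:i + n]))
    return grams


print(term_ngrams("not very good"))
# ['not', 'very', 'good', 'not_very', 'very_good', 'not_very_good']
```

Even this three-word text yields six attributes, which shows how quickly the attribute count grows on a real corpus.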
Naive Bayes and SVM are the modeling algorithms well suited for this situation. Other algorithms will take ages and don't perform well on this kind of data, with the possible exception of Deep Learning, but you'll need massive resources to execute that.
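For readers who want to see why Naive Bayes copes well with many term attributes, the following is a rough, self-contained sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is an illustration only, not what RapidMiner's Naive Bayes operator does internally; the function names and the four toy documents are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-probability for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # class prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented purely for illustration.
docs = [
    ["oh", "great", "another", "monday"],
    ["great", "job", "team"],
    ["yeah", "right", "great", "idea"],
    ["thanks", "for", "the", "help"],
]
labels = ["sarcastic", "sincere", "sarcastic", "sincere"]

model = train_nb(docs, labels)
print(predict_nb(model, ["yeah", "right"]))   # sarcastic
print(predict_nb(model, ["thanks", "team"]))  # sincere
```

Training is a single counting pass over the data, which is why the method stays fast even with a very large number of term attributes.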
Regards,
Balázs
Answers
MeaningCloud Text Analytics
Extract Content is in the Web Mining extension.
The Operator Toolbox has two sentiment-related operators; these work under some conditions (language etc.). You can take a look at them.
If they are not good enough for your content, you'll need to build a sentiment model yourself using the methods in the Academy course. Sentiment will be the label here; if you don't have the labels yet, you'll need to score a couple of hundred typical texts yourself and use the manually assigned sentiment as the label. Then you would predict the sentiment in the first step, change the result to a normal attribute, and then use your label together with this new attribute.
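The two-step idea above (predict sentiment first, then use the prediction as a normal attribute next to your own label) can be sketched in Python as follows. Everything here is hypothetical: `toy_sentiment` is a trivial word-list stand-in for a trained sentiment model, and the word lists and example rows are invented for illustration.

```python
# Hypothetical word lists standing in for a trained sentiment model.
POSITIVE = {"great", "love", "wonderful"}
NEGATIVE = {"broken", "late", "awful"}


def toy_sentiment(text):
    """Placeholder for step 1: return a sentiment class for a text."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def add_predicted_sentiment(rows, sentiment_model):
    """Step 2: store the prediction as an ordinary attribute on each row."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        new_row["sentiment"] = sentiment_model(row["text"])
        enriched.append(new_row)
    return enriched


rows = [
    {"text": "great the train is late again", "label": "sarcastic"},
    {"text": "i love this wonderful weather", "label": "sincere"},
]

for row in add_predicted_sentiment(rows, toy_sentiment):
    print(row["label"], row["sentiment"])
# sarcastic neutral
# sincere positive
```

The enriched rows keep the original label, so a second model can be trained on the text attributes plus the new `sentiment` attribute.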
"Analyzing the content" is a very human-like activity. Text mining methods work by looking at terms or combinations of terms. You have full control over the process in RapidMiner, or you use an external service that does similar things in the background.
Regards,
Balázs
If you want to detect sarcasm as the label in your data but you don't have labeled data, then you can't use classical data mining here.
You might be able to find a company that offers sarcasm detection as a service and use that. Or, if you really need this for a company, you can get some assistants to label a couple of hundred documents/texts so you can bootstrap a model.
RapidMiner will help you when you have a labeled data set. The text mining operators are described in the Academy text mining course. You can use terms (n-grams) in the process.
Regards,
Balázs