"Simple Text Classification - Help"

User2170 · February 2011

Hello,

I am trying to classify documents (.txt), i.e. sort them into groups.

What I've done so far:

Process Documents from Files (2 categories / classes) -> Tokenize -> Filter Stopwords -> Learner -> Apply Model. The document to classify comes from Read Document -> Process Documents (Tokenize, Filter), as you can see below:

There are 6 documents for each class (Process Documents from Files) and a single document to classify.

Is this the right way to classify text / documents in RapidMiner? I am asking because the results are confusing. Just to make sure: I want RapidMiner to tell me "Your single .txt file belongs to class/category A or B".

Thanks in advance!

B_ · February 2011

Search BI Processes for the post "Example - Classify Text Language" and remove the N-Gram operator. You will then have a working text classifier. I use it for several text classification applications.

land · February 2011

Hi,
You will have to make sure that the same word list is used when applying the model! Otherwise the attributes won't be the same and the TF-IDF values will differ. So forward the word list from the Process Documents operator in the training part to the input port of Process Documents in the application part.

We have a webinar that will introduce you to text classification tasks in more detail.

Greetings,
Sebastian
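
For readers who want to see the word-list point outside of RapidMiner, here is a minimal Python sketch. It is not the RapidMiner process itself: the stopword list, the sample texts, the function names and the 1-nearest-neighbour "learner" are all assumptions made up for illustration. What it shows is that the word list and IDF weights are built from the training documents only and then reused unchanged for the document to classify, so both sides end up with the same attributes.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def fit_word_list(train_texts):
    """Build the word list and IDF weights from the training documents only."""
    docs = [tokenize(t) for t in train_texts]
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    idf = {
        w: math.log((1 + n) / (1 + sum(w in doc for doc in docs))) + 1.0
        for w in vocab
    }
    return vocab, idf

def vectorize(text, vocab, idf):
    """Map a text onto the fixed word list; words not in the list are ignored."""
    counts = Counter(tokenize(text))
    vec = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity of two already length-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))

def classify(text, train_texts, train_labels, vocab, idf):
    """1-nearest-neighbour by cosine similarity; stands in for the Learner."""
    query = vectorize(text, vocab, idf)
    scores = [cosine(query, vectorize(t, vocab, idf)) for t in train_texts]
    return train_labels[scores.index(max(scores))]

# Hypothetical training data: two documents per class.
train_texts = [
    "The match ended with a late goal",
    "The striker scored a goal in the final match",
    "The bank raised interest rates",
    "Interest rates and the stock market fell",
]
train_labels = ["A", "A", "B", "B"]

# Training part: the word list is created here ...
vocab, idf = fit_word_list(train_texts)

# ... application part: the same word list is passed in, not rebuilt.
new_doc = "A surprising goal decided the match"
label = classify(new_doc, train_texts, train_labels, vocab, idf)
print(label)  # prints "A"
```

If the word list were rebuilt from the new document alone, its vector would have different attributes (for example "surprising" and "decided") and could not be compared with the training vectors, which is the mismatch described above.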
