"Simple Text Classification - Help"

User2170 Member Posts: 2 Contributor I
edited May 2019 in Help
Hello,

I am trying to classify documents (.txt), i.e. sort them into groups.

What I've done so far:

Process Documents from Files (2 categories / classes) -> Tokenize -> Filter Stopwords ==> Learner ==> Apply Model. The document to classify comes from Read Document -> Process Documents (Tokenize, Filter), as you can see below:

image

There are 6 documents for each class (Process Documents from Files) and a single document to classify.

Is this the right way to classify text / documents in RapidMiner? I am asking because the results are confusing. Just to make sure: I want RapidMiner to tell me "Your single .txt file belongs to class/category A or B".

Thanks in advance!

Answers

  • B_B_ Member Posts: 70 Maven
    Search BI Processes for the post "Example - Classify Text Language" and remove the n-gram operator. You will then have a working text classifier. I use it for several text classification applications.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    You have to make sure that the same word list is used when the model is applied! Otherwise the attributes won't match and the TF-IDF values will differ. So forward the word list from the Process Documents operator in the training part to the input port of Process Documents in the application part.

    We have a webinar that introduces text classification tasks in more detail.

    Greetings,
      Sebastian
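
To illustrate the point about reusing the word list outside of RapidMiner, here is a minimal Python sketch using only the standard library. It is not how RapidMiner implements its operators. The stopword set, the four training texts, the nearest-neighbour "learner", and all function names are assumptions made up for this example. The idea it shows: the vocabulary and IDF weights are built once from the training documents and then reused unchanged for the document to classify, so both sides end up with the same attributes.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def fit_word_list(train_texts):
    """Build the word list and IDF weights from the training documents only."""
    docs = [tokenize(t) for t in train_texts]
    vocab = sorted({tok for doc in docs for tok in doc})
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(len(docs) / doc_freq[tok]) + 1.0 for tok in vocab}
    return vocab, idf

def vectorize(text, vocab, idf):
    """TF-IDF vector over the *given* word list; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def classify(text, train_vecs, labels, vocab, idf):
    """Return the label of the most similar training document (cosine)."""
    query = vectorize(text, vocab, idf)
    sims = [sum(q * v for q, v in zip(query, vec)) for vec in train_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return labels[best]

# Hypothetical training data: two documents per class instead of six.
train = [
    ("The striker scored a goal in the final match", "A"),
    ("The team won the football league", "A"),
    ("The bank raised interest rates", "B"),
    ("Stock markets and interest rates fell", "B"),
]
texts = [text for text, _ in train]
labels = [label for _, label in train]

# Training part: the word list is created here ...
vocab, idf = fit_word_list(texts)
train_vecs = [vectorize(t, vocab, idf) for t in texts]

# ... application part: and reused here for the single new document.
new_doc = "Interest rates and the stock market reaction"
predicted = classify(new_doc, train_vecs, labels, vocab, idf)
print(predicted)
```

If the new document were vectorized with its own freshly built word list instead, its vector would have different columns than the training vectors and the comparison would be meaningless, which is one plausible cause of confusing results.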