classification or clustering
Hi,
I am currently working with a dataset that consists of text, and I have some questions about how to handle it.
- Because of the size of the dataset, I want to use Filter Examples to keep only one type of title and then sample to reduce the number of items. How exactly can this be done?
- I want to apply a suitable classification to solve the business problem. I use the operators Retrieve, Nominal to Text, Process Documents, and Tokenize. Can somebody tell me what I am doing wrong here?
Answers
The Filter Examples operator offers conditions for nominal attributes such as "contains", "starts with" or "matches". These should help you filter on the title.
Sampling is done with one of the Sample operators.
Academy video: https://academy.rapidminer.com/learn/video/sampling-weighting-intro
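If it helps to see the logic outside of RapidMiner, here is a minimal plain-Python sketch of the same two steps: filter on the title, then draw a random sample. This is not RapidMiner code. The records, the field names "title" and "text", and the helper names filter_rows and sample_rows are all made up for illustration, and "matches" is assumed here to mean a full regular-expression match.

```python
import random
import re

# Hypothetical records; the "title" and "text" field names are assumptions.
rows = [
    {"title": "Data Engineer", "text": "build pipelines"},
    {"title": "Senior Data Scientist", "text": "train models"},
    {"title": "Data Scientist", "text": "analyse text"},
    {"title": "Accountant", "text": "close the books"},
    {"title": "Data Scientist Intern", "text": "label documents"},
]


def filter_rows(rows, attribute, condition, value):
    """Keep rows whose attribute satisfies a nominal condition."""
    tests = {
        "contains": lambda s: value in s,
        "starts with": lambda s: s.startswith(value),
        # Assumption: "matches" is treated as a full regex match.
        "matches": lambda s: re.fullmatch(value, s) is not None,
    }
    test = tests[condition]
    return [r for r in rows if test(r[attribute])]


def sample_rows(rows, n, seed=42):
    """Draw up to n rows without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


# Filter first so the sample is drawn only from the titles of interest.
filtered = filter_rows(rows, "title", "contains", "Data Scientist")
sampled = sample_rows(filtered, 2)
```

Filtering before sampling keeps the sample size predictable for the one title you care about; sampling first could leave you with very few matching rows.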
I don't think you're doing anything wrong in the steps you describe for your document classification. You need a target (label) attribute for the classification, and you then train a learner such as Naive Bayes or a Support Vector Machine inside a Cross Validation on that data.
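To make the idea of "label plus learner inside a cross validation" concrete, here is a small standard-library Python sketch. It is not how RapidMiner implements its operators: it is a hand-written multinomial Naive Bayes with add-one smoothing and a simple k-fold split. The documents, the "pos"/"neg" labels, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(docs, labels):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(doc) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for tok in tokens:
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k=3):
    """k-fold cross validation: each document is tested exactly once."""
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [l for i, l in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


# Made-up labelled documents; in practice the label comes from your data.
docs = [
    "great product works well",
    "terrible product broke quickly",
    "works great very happy",
    "broke after a day terrible",
    "happy with this great purchase",
    "terrible quality very unhappy",
]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

accuracy = cross_validate(docs, labels, k=3)
full_model = train_nb(docs, labels)
```

The important point is that the model is always trained on one part of the data and evaluated on documents it has not seen, which is what the cross validation gives you.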
Text Mining is a large topic. Please check out this course in the Academy:
https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer
Regards,
Balázs