Convert document files to a transaction dataset
I am new to text mining and RapidMiner. I want to prepare a dataset so I can build a model with my own algorithm. The dataset should contain one row per text document, and each row should list the words contained in that document, separated by commas. The words must also go through the usual preprocessing steps: tokenization, stopword removal, stemming, and n-gram generation.
Please help me
Thank you
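Outside RapidMiner, the requested transformation (one comma-separated row of preprocessed terms per document) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner process: `STOPWORDS` is a tiny hypothetical sample, `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and n-grams are formed by joining adjacent tokens with `_`.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "for", "with"}

def tokenize(text):
    # Lowercase, split on any run of non-letters, and drop empty strings.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def crude_stem(word):
    # Hypothetical stand-in for a real stemmer (e.g. Porter): strips one
    # common suffix if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def ngrams(tokens, n):
    # Join each window of n consecutive tokens into a single term.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def to_transaction_row(text, max_n=2):
    # Tokenize -> remove stopwords -> stem, then append 2..max_n-grams.
    tokens = [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]
    terms = list(tokens)
    for n in range(2, max_n + 1):
        terms.extend(ngrams(tokens, n))
    return ",".join(terms)

docs = [
    "The cats are sleeping on the mat",
    "Text mining with RapidMiner is fun",
]
for row in map(to_transaction_row, docs):
    print(row)
```

The first document becomes `cat,sleep,mat,cat_sleep,sleep_mat`. Because n-grams are built after stopword removal, a bigram such as `cat_sleep` can span words that were filtered out; build them before filtering if you need strictly adjacent words from the original text.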
Answers
Here is an example process with two hard-coded documents (use "Process Documents from Files" instead to read from a set of files). Inside the "Process Documents" operator you will see a "Tokenize" operator and a "Filter stopwords" operator. The resulting example set can be used to train models like any other numerical data set; in text mining, an SVM is commonly used for classification, for example.
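To illustrate what "numerical data set" means here, the sketch below turns per-document term lists into a document-term matrix: one row per document and one numeric column per distinct term. It is a hedged plain-Python approximation, not RapidMiner's implementation; it uses raw occurrence counts, which may differ from the weighting the "Process Documents" operator is configured to produce, and `document_term_matrix` is a hypothetical helper name.

```python
from collections import Counter

def document_term_matrix(token_lists):
    # Vocabulary: every distinct term across all documents, sorted so the
    # column order is stable between runs.
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # One numeric attribute per term; Counter yields 0 for absent terms.
        rows.append([counts[term] for term in vocab])
    return vocab, rows

docs = [["cat", "sleep", "mat", "cat"], ["text", "min", "cat"]]
vocab, rows = document_term_matrix(docs)
print(vocab)
for row in rows:
    print(row)
```

Each row of `rows` is a fixed-length numeric vector, which is the shape that learners such as an SVM expect; the comma-separated transaction rows from the question are the sparse, unweighted view of the same information.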