The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
[SOLVED] Select attributes only shows metadata and no variables?
kasper2304
Member Posts: 28 Contributor II
Hi out there.
I am working on a text mining project where i need to create a subset of variables for further dimensionality reduction before using training my model. Having watched the videos online i have come to the conclusion that the "select attributes" node is the one i have to use.
Here is what i have done so far.
I have created two folders on my hard drive. One folder containing positive cases and another folder containing negative cases giving me a total of 300 cases. Somehow RapidMiner manages to get two extra cases which i believe is the "folders" themselves which i will have to remove, but first things first.
I used "Process documents from files" and loaded the two directories with class name "1" and "0". Within the "process documents from files" node i have "transform cases", "tokenize", "filter stop words", "extract token number", "extract length", "aggregate token length", "stem snowball" and "filter tokens".
The settings of "process documents from files" node are:
use file extension as type = TRUE
create wor dvector = TRUE
add meta information = TRUE
prune method = PERCENTUAL
This gives me around 150 variables where i need to kick some of them out before doing dimensionality reduction. As an example "names" does not make much sense to do any analysis with in my case.
THE PROBLEM:
The problem arises when i use the "select attribute" node. It should in my world be straight forward to attach the node to my "process documents from files" node and then simply select/de-select the variables i want to continue with. BUT the only variables that is displayed when i try to use subset option is four metadata attributes... In my world all the 150 variables should be displayed... So is this a bug or do i have some settings wrong somewhere?
Best
Kasper
I am working on a text mining project where i need to create a subset of variables for further dimensionality reduction before using training my model. Having watched the videos online i have come to the conclusion that the "select attributes" node is the one i have to use.
Here is what i have done so far.
I have created two folders on my hard drive. One folder containing positive cases and another folder containing negative cases giving me a total of 300 cases. Somehow RapidMiner manages to get two extra cases which i believe is the "folders" themselves which i will have to remove, but first things first.
I used "Process documents from files" and loaded the two directories with class name "1" and "0". Within the "process documents from files" node i have "transform cases", "tokenize", "filter stop words", "extract token number", "extract length", "aggregate token length", "stem snowball" and "filter tokens".
The settings of "process documents from files" node are:
use file extension as type = TRUE
create wor dvector = TRUE
add meta information = TRUE
prune method = PERCENTUAL
This gives me around 150 variables where i need to kick some of them out before doing dimensionality reduction. As an example "names" does not make much sense to do any analysis with in my case.
THE PROBLEM:
The problem arises when i use the "select attribute" node. It should in my world be straight forward to attach the node to my "process documents from files" node and then simply select/de-select the variables i want to continue with. BUT the only variables that is displayed when i try to use subset option is four metadata attributes... In my world all the 150 variables should be displayed... So is this a bug or do i have some settings wrong somewhere?
Best
Kasper
0
Answers
The attribute names are determined from the data at run time so the meta data can't get hold of them. A work around is to store the example set in the repository using "Store" and fetch it again using "Retrieve".
regards
Andrew
I was actually just about to try that work around.
Best
Kasper