Processing input and output for Test Data - SVM
HeikoeWin786
Member Posts: 64 Contributor I
Dear all,
I have noticed that when I apply the same data processing steps to the test data (an unseen, unlabeled dataset), the output drops columns: it eliminates the regular attributes and returns only the label. This differs from the output for the training dataset, which returns both the label and the regular attributes.
It also shows an error.
It would be truly great if someone can advise me or educate me in this modelling process.
Thanks much in advance.
Best Answer
MartinLiebig
Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi @HeikoeWin786,
on 1) SVM can handle only numerical attributes, yes. This is why you need to do the TF-IDF to convert your texts to a numerical vector. However, it can use binominal labels, so this should work.
on 2) seems good to me then.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
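To illustrate the point about TF-IDF, here is a minimal sketch in plain Python (not RapidMiner's implementation) of how review texts become numerical vectors while the two-class label is left untouched. The function names, the sample reviews, and the simple TF-IDF variant (term frequency times log(N/df), no normalization) are assumptions made for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    # Very simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    # Inverse document frequency per term: log(N / document frequency).
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(doc, vocab, idf):
    # One numeric value per vocabulary term: relative term frequency * idf.
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocab]

# Hypothetical sample data: text attribute plus a two-class (binominal) label.
reviews = ["great product works great", "terrible product broke fast"]
labels = ["positive", "negative"]  # the label stays nominal

idf = fit_idf(reviews)
vocab = sorted(idf)
X = [tfidf_vector(r, vocab, idf) for r in reviews]

for row, label in zip(X, labels):
    print([round(v, 3) for v in row], label)
```

Each review ends up as a row of numbers, one column per word, which is the kind of input an SVM can work with; only the text is converted, not the label.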
Answers
Thanks a lot for your response.
It is not failing; it is a warning. But why is this warning shown even though it is not missing?
However, what confuses me is that the exa output of the Process Documents from Data (2) operator is eliminating the columns.
E.g., at the exa output of the Process Documents from Data (1) operator, the words are tokenized and transformed into a number of regular attributes. However, for (2), the exa output shows 0 regular attributes and only one label attribute.
I am not able to tell whether I am doing it correctly or not.
My understanding is that the pre-processing for the training dataset and the unseen/unlabeled dataset should be the same, and that we take the word list output of the pre-processing on the training dataset as the input for the unseen/unlabeled dataset.
Looking forward to your kind explanation.
thanks.
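The understanding described in this post, building the word list on the training data and reusing it for the unseen data, can be sketched in plain Python as follows. This is only an illustration of the idea, not how RapidMiner implements it; `build_word_list`, `to_vector`, and the sample texts are hypothetical names and data.

```python
from collections import Counter

def tokenize(text):
    # Very simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_word_list(train_docs):
    # The word list is derived from the training documents only.
    words = set()
    for doc in train_docs:
        words.update(tokenize(doc))
    return sorted(words)

def to_vector(doc, word_list):
    # One column per word in the word list; unknown words are ignored.
    counts = Counter(tokenize(doc))
    return [counts[w] for w in word_list]

train_docs = ["good service", "bad service"]
unseen_docs = ["good food", "awful"]

word_list = build_word_list(train_docs)
train_X = [to_vector(d, word_list) for d in train_docs]
unseen_X = [to_vector(d, word_list) for d in unseen_docs]

print(word_list)  # ['bad', 'good', 'service']
print(unseen_X)   # [[0, 1, 0], [0, 0, 0]]
```

Because both datasets are vectorized against the same word list, the unseen data has exactly the same columns as the training data, even when a document contains none of the known words.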
Thanks for your prompt reply. Well noted on the warnings.
However, I have 2 follow-up questions:
1) I am trying to run SVM on a dataset where the customer review is polynominal and the sentiment score is binominal. I read the tutorials and figured out that SVM can only handle numerical attributes, so nominal values need to be converted to numerical. However, do both the customer reviews and the sentiment score have to be converted to numerical? At which step do we need to convert, after processing the data? I am a bit confused about how sentiment analysis works with SVM in RapidMiner. The RM tutorial under the sample templates uses text and binominal values and does not even convert to numerical.
2) I used tokenize, transform cases, filter stopwords, stem (Porter) and filter tokens by length inside the Process Documents from Data operator. It is the same for both pre-processing (1) and (2).
Thanks much for your explanation in advance!
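For readers following along outside RapidMiner, the chain of steps listed in 2) can be sketched as a single Python function that is applied identically to training and unseen text. This is an assumption-laden illustration: the stopword list is a tiny hypothetical sample, `crude_stem` is a naive stand-in and not the Porter algorithm, and the length limits are arbitrary.

```python
import re

# Tiny illustrative stopword list (hypothetical, far from complete).
STOPWORDS = {"the", "is", "a", "and", "it", "was"}

def crude_stem(token):
    # Stand-in for a real stemmer: strips a few common suffixes only.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text, min_len=3, max_len=25):
    tokens = re.findall(r"[A-Za-z]+", text)               # tokenize on non-letters
    tokens = [t.lower() for t in tokens]                  # transform cases
    tokens = [t for t in tokens if t not in STOPWORDS]    # filter stopwords
    tokens = [crude_stem(t) for t in tokens]              # stem (naive stand-in)
    return [t for t in tokens if min_len <= len(t) <= max_len]  # filter by length

print(preprocess("The delivery was delayed and it is annoying!"))
# ['delivery', 'delay', 'annoy']
```

Keeping the steps in one function makes it easy to guarantee that both datasets go through exactly the same pre-processing.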
Thanks much for your kind explanation. That helps!
One last question, if I may.
I set Nominal to Text to "all" instead of selecting a single attribute, e.g. the customer review text.
In this case, will the label attribute also change to text?
Is it necessary to exclude the label attribute from Nominal to Text?
I use the Set Role operator before the Nominal to Text operator.
Thanks,