Processing input and output for Test Data - SVM

HeikoeWin786 · July 2020

Dear all,

I have noticed that when I apply the same data processing steps to the test (unseen/unlabeled) dataset, the output drops columns: it eliminates the regular attributes and returns only the label. This differs from the output for the training dataset, which returns both the label and the regular attributes.
It is also showing an error.
It would be great if someone could advise me on this modelling process.
Thanks very much in advance.

Image: https://us.v-cdn.net/6030995/uploads/editor/cl/s0wynopha3ax.png

MartinLiebig · July 2020

Hi @HeikoeWin786 ,

On 1): yes, SVM can handle only numerical attributes. This is why you need tf/idf to convert your texts into numerical vectors. However, it can use binominal labels, so this should work.

On 2): that seems good to me, then.
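
To illustrate the tf/idf conversion mentioned in 1) outside of RapidMiner, here is a minimal Python sketch. It uses the textbook weighting tf * log(N / df) with whitespace tokenization; this is an assumption for illustration and not necessarily the exact formula or normalization RapidMiner applies. The function names and the sample reviews are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a stand-in for real tokenization.
    return text.lower().split()

def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors

# Hypothetical sample reviews, purely for illustration.
reviews = ["great product", "bad product", "great service"]
vocabulary, vectors = tf_idf_vectors(reviews)
```

Each review becomes a row of numbers, one column per word, which is the kind of input an SVM can work with. The label stays a separate two-class column and is not part of these vectors.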

Best,

Martin

MartinLiebig · July 2020

Hi,

This is a warning, not an error. Are you sure it's failing when you run it?

Cheers,

Martin

HeikoeWin786 · July 2020

Hello @mschmitz

Thanks a lot for your response.
It is not failing; it is a warning. But why is this warning shown even though nothing is missing?
However, what confuses me is that the exa output of the Process Documents from Data (2) operator is dropping the columns.
E.g., for the exa output of the Process Documents from Data (1) operator, the words are tokenized and transformed into a number of regular attributes. However, for (2), the exa output shows 0 regular attributes and only one label attribute.
I cannot tell whether I am doing it correctly or not.
My understanding is that the pre-processing for the training dataset and the unseen/unlabeled dataset should be the same, and that we take the word list output of the training pre-processing as the input for the unseen/unlabeled dataset.
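
The idea of reusing the training word list for the unseen data can be sketched in plain Python as follows. This is a simplified illustration using raw word counts and whitespace tokenization, not RapidMiner's implementation; the function names and sample texts are hypothetical.

```python
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a stand-in for real tokenization.
    return text.lower().split()

def build_word_list(training_docs):
    # The word list is built from the training documents only.
    return sorted({tok for doc in training_docs for tok in tokenize(doc)})

def vectorize(doc, word_list):
    # One column per word in the word list; words not in the list are ignored.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in word_list]

# Hypothetical sample texts, purely for illustration.
training_docs = ["good phone", "bad battery"]
unseen_docs = ["good battery life"]

word_list = build_word_list(training_docs)
train_vectors = [vectorize(d, word_list) for d in training_docs]
unseen_vectors = [vectorize(d, word_list) for d in unseen_docs]
```

Because both datasets are vectorized against the same word list, the unseen data ends up with exactly the same columns as the training data, which is what a trained model expects. A word that appears only in the unseen text ("life" here) gets no column.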

Looking forward to your kind explanation.

Thanks.

MartinLiebig · July 2020

Hi @HeikoeWin786 ,

Generally, what you are doing looks correct to me, but of course I would need to check the details to see what's going on. Of particular interest is what happens "inside" Process Documents from Data (2).

The thing with warnings is that we need to generate them from the meta data we have. We pass the header of your table through the process without having the real data, since we have not executed the process yet. So some things are not known to us. One example: which columns are created in Process Documents? You can only know that once you run it.

That's the reason why you sometimes get "unfitting" warnings. But this is also why these are warnings, not errors.

Cheers,

Martin

HeikoeWin786 · July 2020

Hello @mschmitz
Thanks for your prompt reply. Well noted regarding the warnings.
However, I have 2 follow-up questions:
1) I am trying to run SVM on a dataset where the customer review is polynominal and the sentiment score is binominal. I read the tutorials and figured out that SVM can only handle numerical attributes, so nominal ones need to be converted to numerical. However, do both the customer reviews and the sentiment score need to be converted to numerical? At which step do we need to convert? After processing the data? I am a bit confused about how sentiment analysis works with SVM in RapidMiner. The RM tutorial under the sample templates uses text and a binominal label and does not even convert to numerical.
2) I used tokenize, transform cases, filter stopwords, stem (Porter) and filter tokens by length inside the Process Documents operator. It is the same for both pre-processing (1) and (2).

Thanks much for your explanation in advance!

HeikoeWin786 · July 2020

@mschmitz
Thanks very much for your kind explanation. That helps!
One last question, if I may.
I set the attribute filter of "nominal to text" to "all" instead of selecting a single attribute, e.g. the customer review text.
In this case, will the label attribute also be changed to text?
Is it necessary to exclude label attributes from nominal to text?
I use the set role operator before the nominal to text operator.

Thanks,

MartinLiebig · July 2020

Hi,

If you check "include special attributes", then the label is also converted to text. If you don't check it, it's not.
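
The behaviour described above can be modelled with a small Python sketch. This is only a toy model of the rule, not RapidMiner's API: the dictionary layout, the function name `nominal_to_text` and the flag name `include_special_attributes` are hypothetical and chosen to mirror the option in the dialog.

```python
def nominal_to_text(attributes, include_special_attributes=False):
    """Return a copy of the attribute list with nominal attributes set to text.

    Attributes with a role other than "regular" (e.g. the label) are treated
    as special and are only converted when the flag is set.
    """
    converted = []
    for attr in attributes:
        is_special = attr["role"] != "regular"
        if attr["type"] == "nominal" and (include_special_attributes or not is_special):
            converted.append({**attr, "type": "text"})
        else:
            converted.append(dict(attr))
    return converted

# Hypothetical attribute list: one regular attribute and one label.
attributes = [
    {"name": "review", "role": "regular", "type": "nominal"},
    {"name": "sentiment", "role": "label", "type": "nominal"},
]

default_result = nominal_to_text(attributes)
with_special = nominal_to_text(attributes, include_special_attributes=True)
```

With the flag left off, only the review column changes type and the label keeps its nominal type; with the flag on, both are converted.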

Best,

Martin
