Processing input and output for Test Data - SVM

HeikoeWin786 Member Posts: 64 Contributor I
edited July 2020 in Help
Dear all,

I have noticed that when I apply the same data processing steps to the test (unseen/unlabelled) dataset, the processing output drops columns: it eliminates the regular attributes and returns only the label. This differs from the processing output for the training dataset, which returns both the label and the regular attributes.
It is also showing an error.
It would be truly great if someone could advise me or educate me on this modelling process.
Thanks very much in advance.


Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    this is a warning, not an error. Are you sure it's failing when you run it?

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • HeikoeWin786 Member Posts: 64 Contributor I
    Hello @mschmitz

    Thanks a lot for your response.
    It is not failing; it is a warning. But why is this warning showing even though it is not missing?
    However, what confuses me is this: the exa output of the Process Documents from Data (2) operator is dropping the columns.
    E.g. for the exa output of the Process Documents from Data (1) operator, the words are tokenized and transformed into a number of regular attributes. However, for (2), the exa output shows 0 regular attributes and only one label attribute.
    I cannot tell whether I am doing this correctly or not.
    My understanding is that the pre-processing for the training dataset and the unseen/unlabelled dataset should be the same, and that we take the word output of the pre-processing of the training dataset as the input for the unseen/unlabelled dataset.

    Looking forward to your kind explanation.

    thanks.
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Generally, what you are doing looks correct to me, but of course I would need to check the details to see what's going on. Of particular interest is what happens "inside" Process Documents from Data (2).

    The thing with warnings is that we need to generate them from the meta data we have. We pass the header of your table through the process without having the real data, since the process has not been executed yet. So some things are not known to us. One example: which columns are created in Process Documents? You can only know that when you run it.
    That's why you sometimes get "unfitting" warnings. But it is also why these are warnings, not errors.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • HeikoeWin786 Member Posts: 64 Contributor I
    Hello @mschmitz
    Thanks for your prompt reply. Well noted regarding the warnings.
    However, I have 2 follow-up questions:
    1) I am trying to run an SVM on a dataset where the customer review is polynominal and the sentiment score is binominal. I read the tutorials and figured out that SVM can only handle numerical attributes, so nominal ones need to be converted to numerical. However, do both the customer reviews and the sentiment score need to be converted to numerical? At which step do we need to convert? After processing the data? I am a bit confused about how sentiment analysis works with SVM in RapidMiner. The RM tutorial under the sample templates uses text and binominal attributes and does not even convert to numerical.
    2) I used tokenize, transform cases, filter stopwords, stem (Porter) and filter tokens by length inside the Process Documents operator. It is the same for both preprocessing (1) and (2).

    Thanks much for your explanation in advance!
  • HeikoeWin786 Member Posts: 64 Contributor I
    @mschmitz
    Thanks much for your kind explanation. That helps!
    One last question, if I may.
    I set "Nominal to Text" to "all", rather than selecting a single attribute, e.g. the customer review text.
    In this case, will the label attribute also be changed to text?
    Is it necessary to exclude the label attribute from Nominal to Text?
    I use the Set Role operator before the Nominal to Text operator.

    Thanks,
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    if you check "include special attributes", then the label is also converted to text. If you don't check it, it's not.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
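
The idea discussed in this thread, that the unseen data must go through the same text pre-processing and reuse the word list learned from the training data, can be sketched outside RapidMiner in plain Python. This is only an illustration under stated assumptions, not how RapidMiner implements Process Documents from Data: the function names, the tiny stopword list, and the sample reviews are all hypothetical, and stemming is left out for brevity.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "is", "was", "and", "it"}


def preprocess(text, min_len=3):
    """Tokenize on non-letters, lower-case, drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]


def build_word_list(documents):
    """Collect the sorted vocabulary seen in the training documents."""
    vocab = set()
    for doc in documents:
        vocab.update(preprocess(doc))
    return sorted(vocab)


def vectorize(text, word_list):
    """Term-occurrence vector with exactly one column per word in word_list."""
    counts = Counter(preprocess(text))
    return [counts[w] for w in word_list]


# Hypothetical sample data.
train_reviews = ["The product was great", "Terrible support and slow delivery"]
unseen_reviews = ["Great delivery", "Awful packaging"]

# The word list is built from the training data only ...
word_list = build_word_list(train_reviews)
train_vectors = [vectorize(r, word_list) for r in train_reviews]

# ... and reused for the unseen data, so both tables share the same columns.
unseen_vectors = [vectorize(r, word_list) for r in unseen_reviews]

print(word_list)
print(unseen_vectors)
```

Because the unseen reviews are vectorized against the training word list, they always produce the same numeric columns the model was trained on. Words that never occurred in training (such as "awful" here) are simply ignored, and the label is not part of the word vector at all.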