Missing Attribute when applying model although Training exampleSet and real DataSet are the same

Tennessee · November 2019

Hi,

I am new to RapidMiner and I am trying to classify YouTube comments on innovation products as either a customer requirement or not.

Both ExampleSets should be the same, since I used the wordlist from the training data and applied it, via the Process Documents from Data operator, to the data I want to classify. The following picture shows a comparison of both data sets.

I used RapidMiner Auto Model to create an SVM classification process and stored the model with that process. I then used the following process:

[Process XML, version 9.3.001, not preserved. The only surviving element is a macro with the value "Lets try the relly simple way. I like smart watches".]

However, I always get this error:

I used this tutorial on YouTube (video ID VbNhvYQZ2v0) and the RapidMiner Academy Text Mining and Machine Learning course to construct my processes.

This process shows the preprocessing for my training data:

[Process XML, version 9.3.001, not preserved.]

And this process shows the preprocessing for the data I want to classify:

[Process XML, version 9.3.001, not preserved.]

I also attached 100 rows of my sample data. I apologize if this problem has already been solved (I couldn't find anything useful for my situation) or if I made some simple mistake.

I would also be really grateful if someone knows how to correct English spelling. I already tried the Python script using TextBlob that was posted in the RapidMiner community, but it changes words that are already correct, for example "Big" to "Fig".

Thanks in advance,

Tennessee
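On the spelling question above, one way to keep a corrector from rewriting words that are already correct is to check each token against a trusted word list first and only look for a replacement when the token is unknown. The following is a minimal sketch of that idea using Python's standard `difflib`, not the TextBlob script from the community. The function name `correct_word`, the tiny `known_words` list, and the `cutoff` value are all assumptions made for illustration.

```python
import difflib

def correct_word(word, known_words, cutoff=0.8):
    """Return `word` unchanged if it is known; otherwise the closest known word.

    `known_words` is a hypothetical trusted vocabulary (lowercase strings).
    """
    lower = word.lower()
    if lower in known_words:
        # Already a valid word: leave it alone, including its original casing.
        return word
    # Sort the candidates so the result is deterministic when scores tie.
    matches = difflib.get_close_matches(lower, sorted(known_words), n=1, cutoff=cutoff)
    # If nothing is similar enough, keep the original token rather than guess.
    return matches[0] if matches else word

# Illustrative vocabulary only; a real one would be far larger.
known_words = {"big", "fig", "really", "simple", "smart", "watches"}

kept = correct_word("Big", known_words)
fixed = correct_word("relly", known_words)
unknown = correct_word("xyzzy", known_words)
```

With this guard, "Big" is returned untouched because its lowercase form is in the vocabulary, while a token such as "relly" is mapped to its closest known neighbour. The quality of the result depends entirely on how complete the trusted word list is.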

Tennessee · November 2019

Okay, so I copied the SVM model operator from the process that Auto Model created into a Cross Validation operator and created another model. I can feed this model the data I need to classify without getting the error message. Now it works smoothly. But still, 3 days wasted.

Hence, if you have problems with a model created by Auto Model, I recommend copying the model operator into a Cross Validation.

sgenzer · November 2019

Hi @Tennessee, OK, I was able to look at this. Your Process Documents preprocessing is NOT the same in training and testing, which of course it needs to be.

Image: https://us.v-cdn.net/6030995/uploads/editor/5b/54xccxk5gqkk.png

Scott

Telcontar120 · November 2019

Also be very careful with wordlists. You really need to store the wordlist from your original model construction process and then make sure you use that same wordlist when applying the model in the future, otherwise differences in the text you are processing can lead to incompatible results.
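The idea of reusing a stored wordlist can be illustrated outside RapidMiner. The sketch below is plain Python, not RapidMiner code: it builds a vocabulary from the training documents once and then vectorizes new text against that same vocabulary, so the resulting columns always match what the model was trained on. The helper names (`tokenize`, `build_wordlist`, `vectorize`) and the sample sentences are assumptions for illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_wordlist(training_docs):
    """Collect the sorted vocabulary seen in the training documents."""
    vocab = set()
    for doc in training_docs:
        vocab.update(tokenize(doc))
    return sorted(vocab)

def vectorize(doc, wordlist):
    """Count each wordlist term in `doc`; words not in the wordlist are ignored."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in wordlist]

# Hypothetical training comments; the wordlist is built once and stored.
training_docs = ["I like smart watches", "The battery should last longer"]
wordlist = build_wordlist(training_docs)

# New text is mapped onto the stored wordlist, never onto a fresh one.
new_doc = "Smart watches need a bigger battery"
row = vectorize(new_doc, wordlist)
```

Words that appear only in the new text ("need", "bigger") are dropped instead of becoming new columns, and training words that are absent from the new text get a count of zero. That is what keeps the attribute set identical between training and application.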

Tennessee · November 2019

The reason the preprocessing steps are not the same is the wordlist I saved while creating the training data for the model. I used this wordlist as input for the Process Documents from Data operator, which allows me to leave out certain preprocessing steps, as per the RapidMiner text mining tutorial on YouTube. Also, if I hadn't used the wordlist and had instead used the same preprocessing steps for both training and real data, I would have gotten different attributes in my prepped tables. Are you sure this is the problem?

Tennessee · November 2019

If you compare my last two processes, you can see that I store a wordlist in the first process and use it in the second one. Also, the first picture shows that I have the same number of attributes in both ExampleSets. This would not be possible if I had used the Generate n-Grams operator in both preprocessing steps without a wordlist, would it?

Thanks in advance,

Tennessee
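An equal number of attributes does not by itself guarantee that the attribute names match, and a missing-attribute error is about names. A quick check is to compare the two sets of column names directly. The sketch below is plain Python with made-up attribute names; `compare_attributes` is a hypothetical helper, and the lists stand in for the column names exported from the two ExampleSets.

```python
def compare_attributes(training_attrs, scoring_attrs):
    """Report attribute names present in one set but not in the other."""
    training, scoring = set(training_attrs), set(scoring_attrs)
    return {
        "missing_in_scoring": sorted(training - scoring),
        "extra_in_scoring": sorted(scoring - training),
    }

# Hypothetical example: both lists have four attributes, but two names differ
# because one side was stemmed before n-grams were generated and the other was not.
training_attrs = ["battery", "smart", "smart_watch", "watch"]
scoring_attrs = ["battery", "smart", "smart_watches", "watches"]

report = compare_attributes(training_attrs, scoring_attrs)
```

If both lists in the report are empty, the names line up and the cause lies elsewhere, for example in attribute roles or types.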

Tennessee · November 2019

I have also tried feeding the model its own training data, and I still get the same error message.

I used RapidMiner Auto Model to create this model, and it can't even accept its own training data. Something seems very wrong. I'll try creating the model manually.
