How to map a predicted result from Auto Model to original data?

budyonosaputro · August 2019

Hi everyone,

I'm very new to RapidMiner and have run into a difficulty. I have two columns of data: the first is text that I crawled from Twitter, and the second is the category assigned to each text. Part of the classification was done manually and the rest needs to be predicted by RapidMiner. So I ran Auto Model on my text data, chose the "Predict" task on the first screen, and clicked Next until the results came out.

I've exported the prediction results to an Excel sheet, but I'm confused by them. The predicted sheet does have a prediction for my category, but I don't know how to map the results with my original data. I don't know which rows belong to the positive or negative category.

Your help is really appreciated.

Thanks,

Budyono from Indonesia

varunm1 · August 2019

Hello @budyonosaputro

After looking into the Auto Model process for text, I found a way to keep the text column in the results so that you can fill in the empty values.

1. After Auto Model has run, select "Open Process".

2. In the opened process there is a block called "Handle Texts", as shown below.

3. Double-click "Handle Texts" and you will find the "Text Vectorization" block. Click it, and in the Parameters panel select the "Keep Originals" option, as shown in the image below.

4. Then run the process. You will see multiple result tabs; select the "Explain PredictionsIOObject" tab. It now shows the texts alongside the predictions, which you can use to fill your empty column.

5. You can also write the results to Excel to make the filling easier. In the same process, connect a "Write Excel" operator to the "exa" port of the "Explain Predictions" operator. Set the file name parameter of Write Excel and run the process to get the predictions and texts in an Excel file.

Image: https://us.v-cdn.net/6030995/uploads/editor/9c/qvgma3879ok2.png
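Once the exported file contains both the original text and the prediction, filling the empty categories is a lookup by text. Below is a minimal sketch in Python of that idea, not something RapidMiner generates. The column names `text`, `category`, and `prediction(category)` are assumptions for illustration; use whatever names appear in your own export. It also assumes that the text in the export is identical to the text in the original sheet.

```python
def fill_missing_labels(original_rows, predicted_rows,
                        text_key="text", label_key="category",
                        prediction_key="prediction(category)"):
    """Return a copy of original_rows with empty labels filled from predictions.

    All column names are hypothetical defaults; adjust them to your export.
    """
    # Build a lookup from the original text to its predicted category.
    lookup = {row[text_key].strip(): row[prediction_key] for row in predicted_rows}

    filled = []
    for row in original_rows:
        new_row = dict(row)  # do not modify the caller's data
        if not (new_row.get(label_key) or "").strip():
            # Only rows without a manual label are filled; rows whose text
            # has no prediction keep an empty label.
            new_row[label_key] = lookup.get(new_row[text_key].strip(), "")
        filled.append(new_row)
    return filled


# Tiny made-up example: one manually labeled row and two unlabeled rows.
original = [
    {"text": "great service", "category": "positive"},
    {"text": "very slow delivery", "category": ""},
    {"text": "love it", "category": ""},
]
predicted = [
    {"text": "very slow delivery", "prediction(category)": "negative"},
    {"text": "love it", "prediction(category)": "positive"},
]
result = fill_missing_labels(original, predicted)
for row in result:
    print(row["text"], "->", row["category"])
```

Manually assigned categories are left untouched, so only the empty cells receive a predicted value. If the same text appears more than once, every copy gets the same prediction.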

Hope this helps.

varunm1 · August 2019

Hello @budyonosaputro

Auto Model separates the rows that have labels from those without. One reason for this, as I understand it, is that performance metrics like accuracy cannot be calculated on unlabelled data. After the separation, the labeled dataset is split again in a 60:40 ratio (train:test), and performance is calculated on the test data (divided into 7 folds and tested).
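The separation described above can be pictured with a short Python sketch. This only illustrates the idea (labeled vs. unlabeled rows, then a 60:40 split of the labeled part); it is not Auto Model's actual implementation, and the column name `category`, the function name, and the fixed seed are assumptions.

```python
import random


def split_for_modeling(rows, label_key="category", train_fraction=0.6, seed=42):
    """Separate unlabeled rows, then split the labeled rows into train and test."""
    labeled = [r for r in rows if (r.get(label_key) or "").strip()]
    unlabeled = [r for r in rows if not (r.get(label_key) or "").strip()]

    # Shuffle a copy so the split does not depend on the input order.
    shuffled = labeled[:]
    random.Random(seed).shuffle(shuffled)

    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:], unlabeled


# Made-up data: 10 labeled tweets and 5 tweets without a category.
rows = [{"text": f"tweet {i}", "category": "positive" if i % 2 else "negative"}
        for i in range(10)]
rows += [{"text": f"new tweet {i}", "category": ""} for i in range(5)]

train, test, unlabeled = split_for_modeling(rows)
print(len(train), len(test), len(unlabeled))  # 6 4 5
```

Only `train` and `test` can be used to measure performance, because only they have a known category to compare against.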

What happens to unlabelled data?
The unlabeled data is not removed; it is scored by the "Apply TV on scoring" operator or the "Explain Predictions" operator, depending on the type of data given to Auto Model. You can see this in the predictions tab as shown in the figure. There you will find predictions for the 40% test dataset as well as for the unlabelled data. You can also see which data the predictions belong to based on the features in this column.
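To read such a predictions table, it can help to split it into rows that already had a category (the test rows, where the prediction can be checked) and rows that did not (the ones you actually want to fill). A hedged Python sketch follows; the column names `category` and `prediction(category)` and the function name are assumptions, not taken from the actual export.

```python
def summarize_predictions(rows, label_key="category",
                          prediction_key="prediction(category)"):
    """Return (accuracy on rows with a known label, rows without a label)."""
    with_label = [r for r in rows if (r.get(label_key) or "").strip()]
    without_label = [r for r in rows if not (r.get(label_key) or "").strip()]

    # Accuracy is only defined where a true label exists to compare against.
    correct = sum(1 for r in with_label if r[label_key] == r[prediction_key])
    accuracy = correct / len(with_label) if with_label else None
    return accuracy, without_label


# Made-up predictions table: four test rows and two unlabeled rows.
predictions = [
    {"text": "a", "category": "positive", "prediction(category)": "positive"},
    {"text": "b", "category": "negative", "prediction(category)": "negative"},
    {"text": "c", "category": "positive", "prediction(category)": "negative"},
    {"text": "d", "category": "negative", "prediction(category)": "negative"},
    {"text": "e", "category": "", "prediction(category)": "positive"},
    {"text": "f", "category": "", "prediction(category)": "negative"},
]
accuracy, to_fill = summarize_predictions(predictions)
print(accuracy)                       # 0.75
print([r["text"] for r in to_fill])   # ['e', 'f']
```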

This statement from your post is a bit confusing:

but I don't know how to map the results with my original data

Do you mean that you don't know how to map the predictions for the unlabeled data to the original label?
If so, I don't think that is possible without knowing the original labels. You can map predictions to the data based on attributes (features), but computational modeling cannot recover an original label (you can only assume one based on performance).

Please let me know if you need more information. If this is not what you are looking for, please provide an example with the process XML or an Excel file. If you would like to post images of the problem, @Tghadially can help you with that.

budyonosaputro · August 2019

Hi @varunm1,

Thanks for your explanation.
What I mean is that I want to fill the unlabeled data/category with the predictions from RapidMiner. The result from RapidMiner is quite confusing to me and I don't know how to read it.

FYI, my original data has only 2 columns: the first is free text from Twitter and the second is the positive/negative category of that text. Some of the data are already categorized as positive or negative, and the rest need to be filled in with the predictions from RapidMiner.

I hope you can understand my question. I'd like to post a picture here so you can see what I need, but it always says "You have to be around for a little while longer before you can post links.". I just sent a message to @Tghadially about this issue and I hope I can upload my pictures here soon.

Thanks,
Budyono from Indonesia

varunm1 · August 2019

Thanks for your response. I understand your requirement now. Once you get access to post images, we can take a look at the predictions and clear up your confusion. It's Sunday night here in the US, so you will most likely get access tomorrow morning once they are in the office.

budyonosaputro · August 2019

Thanks @varunm1 for the info. I didn't notice that it is currently Sunday night there, because in Indonesia it is already Monday morning, haha. OK then, I'll get back to you once I can post images here.

Thanks,
Budyono from Indonesia

budyonosaputro · August 2019

Hi @varunm1

I'm back, and I can post images now, hehe. Sorry for the late reply, by the way.
Below is the picture I promised you.

Image: https://us.v-cdn.net/6030995/uploads/editor/s5/6e28xinxqi5y.png

The picture above is an example of my original data. I need to fill the yellow column with the predicted results from RapidMiner. But the result in RapidMiner looks like the image below, and I don't know how to fill my original data with it. Can you help me?

Image: https://us.v-cdn.net/6030995/uploads/editor/xx/x2vpgvyzscfa.png

Thanks,
Budyono from Indonesia

budyonosaputro · August 2019

Dear @varunm1

Thank you so much for your ultimate solution. It really helps me a lot. Aaahhh, I'm so happy right now, hahaha.

Thanks and Regards,
Budyono from Indonesia

varunm1 · August 2019

Hello @IngoRM

While working on this question, I thought of something I'd like your input on. Can you explain why we are unable to access a process in Auto Model unless we run it first? Is it possible to access the process from the page below when I click on a decision tree? I am asking because, if I want to change something in the model, I currently need to run it first and then access the process.
