How to map a predicted result from Auto Model to original data?
budyonosaputro
Member Posts: 5 Contributor II
Hi everyone,
I'm very new to RapidMiner and have run into a difficulty. I have two columns of data: the first is text that I crawled from Twitter, and the second is the category assigned to each text. Part of the classification was done manually, and the rest needs to be predicted by RapidMiner. So I ran Auto Model on my text data, chose the "Predict" task on the first screen, and clicked Next until the results came out.
I've exported the prediction results to an Excel sheet, but I'm confused by them. The exported sheet does contain a prediction for my category, but I don't know how to map the results back to my original data. I don't know which rows belong to the positive or the negative category.
Your help is really appreciated.
Thanks,
Budyono from Indonesia
Best Answer
varunm1 Member Posts: 1,207 Unicorn
Hello @budyonosaputro
After some analysis of the Auto Model process for text, I found a way to keep the text column in the results so that you can fill in the empty values.
1. Once you have run Auto Model, select "Open Process".
2. In the opened process there is a block called "Handle Texts".
3. Double-click "Handle Texts" and you will find a "Text Vectorization" block. Click on it, and in the Parameters panel enable the "Keep Originals" option.
4. With that option enabled, run the process. You will see several result tabs; select the "Explain PredictionsIOObject" tab. It shows the texts alongside the predictions, so you can use it to fill your empty category cells.
5. You can also write the results to Excel so that filling is easier. To do that, in the same process connect a "Write Excel" operator to the "exa" port of the "Explain Predictions" operator. Set the file name parameter of "Write Excel" and run the process to get the predictions and texts in an Excel file.
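Once the exported file contains both the text and the prediction, the filling in step 5 can be scripted instead of done by hand. The following is only a minimal sketch in plain Python, not part of the Auto Model process. It assumes the rows have already been read from the two sheets into lists of dictionaries, and the column names `text`, `category`, and `prediction(category)` are hypothetical; adjust them to match your own files.

```python
def fill_missing_labels(original_rows, predicted_rows,
                        text_key="text", label_key="category",
                        prediction_key="prediction(category)"):
    """Return a copy of original_rows with empty labels filled from predictions.

    All column names are assumptions made for this sketch.
    """
    # Index the predictions by text. Whitespace is stripped so that stray
    # spaces introduced by a spreadsheet round trip do not break the match.
    lookup = {row[text_key].strip(): row[prediction_key]
              for row in predicted_rows}

    filled = []
    for row in original_rows:
        new_row = dict(row)  # leave the caller's data untouched
        if not new_row.get(label_key):  # empty string or None means unlabeled
            # Stays None when no prediction exists for this text.
            new_row[label_key] = lookup.get(new_row[text_key].strip())
        filled.append(new_row)
    return filled


# Tiny made-up example: one manually labeled row, one row to be filled.
original = [
    {"text": "great service", "category": "positive"},
    {"text": "never again", "category": ""},
]
predicted = [
    {"text": "never again", "prediction(category)": "negative"},
]

result = fill_missing_labels(original, predicted)
for row in result:
    print(row["text"], "->", row["category"])
```

Matching on the text itself only works if each tweet is unique. If your data contains duplicate texts, adding a row ID column before running Auto Model would be a safer key.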
Hope this helps.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
Auto Model separates the rows that have labels from the rows that do not. As I understand it, one reason for this separation is that performance metrics such as accuracy cannot be calculated on unlabeled data. Once the data is separated, the labeled set is split again in a 60:40 ratio (train:test), and performance is calculated on the test portion (divided into 7 folds and tested).
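To make the separation concrete, here is a rough sketch of the same idea in plain Python. It is not Auto Model's actual implementation: the random shuffle, the seed, and the column name `category` are assumptions chosen for illustration, and the 7-fold evaluation of the test portion is left out.

```python
import random


def split_for_modeling(rows, label_key="category", train_fraction=0.6, seed=42):
    """Split rows into (train, test, unlabeled).

    Rows with an empty label are set aside for scoring; the labeled rows
    are shuffled and divided 60:40 by default.
    """
    labeled = [r for r in rows if r.get(label_key)]
    unlabeled = [r for r in rows if not r.get(label_key)]

    shuffled = labeled[:]  # copy so the input order is preserved
    random.Random(seed).shuffle(shuffled)

    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:], unlabeled


# Ten labeled made-up tweets plus one unlabeled tweet.
rows = [
    {"text": f"tweet {i}", "category": "positive" if i % 2 else "negative"}
    for i in range(10)
]
rows.append({"text": "unlabeled tweet", "category": ""})

train, test, unlabeled = split_for_modeling(rows)
print(len(train), len(test), len(unlabeled))  # 6 4 1
```

Only the train and test portions can contribute to accuracy, because only they have a known label to compare against. The unlabeled rows can still receive a prediction from the trained model.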
What happens to unlabeled data?
The unlabeled data is not removed. It is scored by the "Apply TV on scoring" operator or by the "Explain Predictions" operator, depending on the type of data passed to Auto Model. You can see this in the Predictions tab, which contains predictions for the 40% test set as well as for the unlabeled data. You can also tell which row each prediction belongs to from the feature values shown in that row.
This statement from your post is a bit confusing.
If that is the case, then I don't think it is possible without knowing the original labels. You can map predictions to the data based on the attributes (features), but computational modeling cannot recover an original label (at best you can assume one based on performance).
Please let me know if you need more information. If this is not what you are looking for, please provide an example as process XML or an Excel file. If you would like to post images of the problem, @Tghadially can help you with that.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thanks for your explanation.
What I mean is that I want to fill the unlabeled categories with the predictions from RapidMiner. The result from RapidMiner is quite confusing to me, and I don't know how to read it.
FYI, my original data has only two columns: the first is free text from Twitter, and the second is the positive/negative category of that text. Some of the rows are already categorized as positive or negative, and the rest need to be filled in with the predictions from RapidMiner.
I hope you can understand my question. I'd like to post a picture here so you can see what I need, but it always says "You have to be around for a little while longer before you can post links.". I just sent a message to @Tghadially about this issue, and I hope I can upload my pictures here soon.
Thanks,
Budyono from Indonesia
I'm back, and I can post images now, hehe. Sorry for the late reply, by the way.
Below is the picture I promised you.
The picture above is an example of my original data. I need to fill the yellow column with the predicted results from RapidMiner. But the result in RapidMiner looks like the image below, and I don't know how to fill my original data with it. Can you help me?
Thanks,
Budyono from Indonesia
Thank you so much for your ultimate solution. It really helps me a lot. aaahhh I'm so happy right now. hahaha
Thanks and Regards,
Budyono from Indonesia
While working on this question, I thought of something I would like your input on. Can you tell me why we cannot access a process in Auto Model until we have run it? Is it possible to open the process from the page below when I click on a decision tree? I ask because, if I want to change something in the model, I currently have to run it first and then open the process.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing