How to predict response rate or responses in RapidMiner
User111113
Member Posts: 24 Maven
Hi All,
I'm fairly new to RapidMiner and looking for a way to predict response rate based on historical data from the past 2 years.
I have customer ID and categories, and of course quantity mailed and responses.
For example:
id  category  state  year  month  QtyMailed  Responses Received  Response Rate
1   a         OH     2018  Oct    5000       200                 4%
1   b         CA     2018  Nov    10000      130
1   c         PA     2018  Dec    35000      512
2
2
and so on. I would like to predict responses or the response rate, let's say for the upcoming month.
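(Aside: a minimal pandas sketch of how such a target could be derived before modeling; the file name and column names below are assumptions based on the example table, not the actual data.)

```python
# Derive the missing Response Rate values from quantity mailed and responses
# received, so that either column can serve as the prediction target (label).
import pandas as pd

df = pd.read_csv("mailings.csv")  # hypothetical file with the columns above
df["Response Rate"] = df["Responses Received"] / df["QtyMailed"]
print(df.head())
```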
Best Answers
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
@User111113,
Of course!
- Put a Select Attributes operator after your data retrieval.
- In the parameters of this operator, choose attribute filter type = subset.
- Select your 2 or 3 relevant attributes.
Regards,
Lionel
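A minimal Python/pandas analogue of that Select Attributes subset, assuming the column names from the example table:

```python
# Keep only the 2-3 attributes the model should learn from, plus the label --
# the same effect as Select Attributes with attribute filter type = subset.
import pandas as pd

df = pd.read_csv("mailings.csv")  # hypothetical file
subset = df[["category", "state", "QtyMailed", "Responses Received"]]
print(subset.head())
```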
varunm1 Member Posts: 1,207 Unicorn
Hello @User111113,

> how can I reduce the error rate, have better performance?

Are you optimizing the predictive models? You need to adopt concepts such as feature selection, hyperparameter optimization ("Optimize Parameters (Grid)"), trying different models, and generating new features from existing ones; there is no single solution to improve model performance. You can try the above-mentioned concepts in your modeling to check whether you get better performance.

> do I need to validate my models? if yes, then how can we do it after we deployed models using auto-models?

Yes, you need to validate your models. There are different validation methods, such as cross-validation, split validation, and multi-hold-out validation (used in Auto Model). Auto Model uses multi-hold-out validation while training and testing your model. After deploying you can score new data, but I am not clear on this part of the question. Once we deploy a model, it just predicts the labels. If you later have the true labels, you can always retrieve your trained model, apply it to the new data, and use a Performance operator to check the performance on the new data.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
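A minimal Python/scikit-learn sketch of that advice, i.e. a hyperparameter grid search evaluated with cross-validation, offered only as an analogue of the RapidMiner "Optimize Parameters (Grid)" and Cross Validation operators; the file and column names are assumptions based on the example data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("mailings.csv")          # hypothetical file
X = df[["category", "state", "month", "QtyMailed"]]
y = df["Responses Received"]

# Encode nominal attributes to numerical, then fit the model.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      ["category", "state", "month"])],
    remainder="passthrough",
)
pipe = Pipeline([("pre", pre), ("model", RandomForestRegressor(random_state=0))])

# Grid search with 5-fold cross-validation, analogous to wrapping a model
# in Optimize Parameters (Grid) + Cross Validation.
grid = GridSearchCV(
    pipe,
    param_grid={"model__n_estimators": [100, 300],
                "model__max_depth": [None, 5, 10]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)  # best settings and their MAE
```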
Answers
Thank you for your response.
I tried a few things and looked at some examples. It gives me a lot of errors and asks me to auto-fix, and I don't even understand how or why it is doing that. Only once did it run, and it took year as the prediction target, where it should be either responses or response rate. I am stuck and not sure how to move forward.
So that we can understand what's going on, could you share:
- your process (via File --> Export Process)
- your data
Regards,
Lionel
The error means that the attributes in your training set and the attributes in your test set are not strictly the same.
This error is caused by the Nominal to Numerical operator in the training part of your Cross Validation operator, which creates attribute(s) in the training set but not in your test set.
The solution is to move the Nominal to Numerical operator outside the Cross Validation operator.
The working process is in the attached file.
Regards,
Lionel
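A minimal pandas sketch of the mismatch being described: encoding nominal values separately on the training and test sets can yield different attribute sets, while encoding once on the full data (the analogue of moving Nominal to Numerical outside Cross Validation) keeps them aligned. The state values are made up for illustration:

```python
import pandas as pd

train = pd.DataFrame({"state": ["OH", "CA", "PA"]})
test  = pd.DataFrame({"state": ["OH", "NY"]})  # NY never seen in training

# Dummy-encoding each set separately produces different columns:
print(pd.get_dummies(train).columns.tolist())  # ['state_CA', 'state_OH', 'state_PA']
print(pd.get_dummies(test).columns.tolist())   # ['state_NY', 'state_OH'] -> mismatch!

# Encoding once on the combined data keeps the attribute sets identical:
both = pd.get_dummies(pd.concat([train, test]))
train_enc, test_enc = both.iloc[:len(train)], both.iloc[len(train):]
print(train_enc.columns.equals(test_enc.columns))  # True
```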
Thank you for your response. I used a decision tree and it looks like it's working fine. I would like to know one more thing: which parameters are these models basing their predictions on? In my case I want the model to make predictions based on category and state, or maybe category, state, and total mailed.
Can I set it up myself so it looks only at those 2 or 3 columns to predict the response?
I have a few more questions, I guess....
When I am trying Auto Model, it sometimes shows the "Back" and "Next" buttons and sometimes it doesn't. In the screenshot below I cannot go back or forward... and sometimes they do show up. Do you know how to resolve this?
Strange!
Try to select the attribute you want to predict (the label).
Regards,
Lionel
@Telcontar120
Thank you for your help.
I have more parameters that I want to add to my data to predict responses, but I wanted to check for a better way. I have indexes like 0, 1, 2, 3; let's say responses with index 0 are higher. Now my data will look like the table below.
id  category  index  state  year  month  QtyMailed  Responses Received
1   a         0      OH     2018  Oct    3000       150
1   a         1      OH     2018  Oct    1000       40
1   a         2      OH     2018  Oct    1000       10
1   b                CA     2018  Nov    10000      130
1   c                PA     2018  Dec    35000      512
2
2
My question is this: I know the important factors that change the responses are index, state, and month of the year, but how much do they affect the result? Can we find that out, maybe percentage-wise? And is it also possible to feed data by county or zip code and then see whether that makes any difference, because people may have responded from only 3 zip codes and not from the other 2....
I have a lot on my mind; I hope I am not confusing anyone.
When I tried Auto Model, it said to deselect the QtyMailed column. If I do that, I know it's not going to work: I saw the predicted responses and they were not up to the mark at all, even though technically everything else was the same... so I never deselect that column.
I have difficulty understanding what your question is...
Can you explain more explicitly what you get and what you want to obtain?
In the meantime, you can indeed run your dataset through Auto Model. If you have doubts about one or more columns (attributes), first select them and enable Automatic Feature Selection before running Auto Model. If, in the end, these attributes are not relevant, they will be removed from the final feature set.
Concerning the "weights": for several models you have access to the weights of each regular attribute by clicking on Weights for a given model.
Hope this helps,
Regards,
Lionel
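As a rough Python analogue of that Weights view, permutation importance assigns each attribute a score for how much it influences the predictions, which speaks to the earlier "how much are they affecting, percentage-wise" question; the file and column names are assumptions:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.inspection import permutation_importance

df = pd.read_csv("mailings.csv")          # hypothetical file
X = df[["category", "index", "state", "month", "QtyMailed"]]
y = df["Responses Received"]

pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      ["category", "state", "month"])],
    remainder="passthrough",
)
pipe = Pipeline([("pre", pre), ("model", RandomForestRegressor(random_state=0))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pipe.fit(X_tr, y_tr)

# Higher score = shuffling that column degrades the predictions more.
result = permutation_importance(pipe, X_te, y_te, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```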
I did more research, modified my data set, and generated new models. My questions are:
- How can I reduce the error rate and get better performance?
- Do I need to validate my models? If yes, then how can we do it after we have deployed models using Auto Model?
- What do you think about grouping the models?