Problem: too many parameters to put as columns into an example set

ahaensel · July 2013

My task: I have customers, each with a unique ID and a set of binominal (0/1) parameters, and I would like to predict the value of a target variable (so far only one, but possibly several).
In my test case I used the following input dataset (see the metadata): each customer is represented by one row and the parameters are the columns, simply the usual layout.
Metadata:
Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  Para1        binominal
regular  Para2        binominal
regular  Para3        binominal
regular  Para4        binominal

Dataset:
Customer_Id  Target  Para1  Para2  Para3  Para4
1            M       1      0      1      0
2            V       1      0      0      1
3            M       0      1      1      1

=> With Naïve Bayes I get great prediction results in this test case, where the number of dimensions is small.
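For reference, a minimal sketch (plain Python, not RapidMiner) of what a Naïve Bayes model computes on this wide, one-row-per-customer layout. It assumes a Bernoulli model with Laplace smoothing (alpha = 1); RapidMiner's own Naive Bayes operator may estimate probabilities differently, and the function names are hypothetical.

```python
import math
from collections import Counter

# Wide layout: one row per customer, one 0/1 column per parameter.
ROWS = [
    # (Customer_Id, Target, [Para1, Para2, Para3, Para4])
    (1, "M", [1, 0, 1, 0]),
    (2, "V", [1, 0, 0, 1]),
    (3, "M", [0, 1, 1, 1]),
]

def train_bernoulli_nb(rows, alpha=1.0):
    """Hypothetical helper: class priors and smoothed P(param = 1 | class)."""
    n_params = len(rows[0][2])
    class_counts = Counter(target for _, target, _ in rows)
    priors = {c: cnt / len(rows) for c, cnt in class_counts.items()}
    theta = {}
    for c, cnt in class_counts.items():
        ones = [0] * n_params
        for _, target, values in rows:
            if target == c:
                for j, v in enumerate(values):
                    ones[j] += v
        # Assumption: Laplace smoothing, (count + alpha) / (n_c + 2 * alpha).
        theta[c] = [(k + alpha) / (cnt + 2 * alpha) for k in ones]
    return priors, theta

def predict(priors, theta, values):
    """Return the class with the highest log-posterior for one full row."""
    scores = {}
    for c in priors:
        s = math.log(priors[c])
        for p, v in zip(theta[c], values):
            # Every column contributes, active (1) or not (0).
            s += math.log(p if v else 1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)

priors, theta = train_bernoulli_nb(ROWS)
for cid, target, values in ROWS:
    print(cid, target, predict(priors, theta, values))
```

Note that each prediction uses the whole row, including the zeros, which is exactly what becomes expensive once there are hundreds of thousands of columns.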

Problem with the actual dataset:
I have several hundred thousand parameters, and the number is growing fast. Each customer has only a very small number of active parameters, so the table would be extremely large and sparse. My idea was therefore to use the following input format instead:
Metadata:
Role     Name         Type
id       Customer_Id  integer
label    Target       binominal
regular  ActivePara   polynominal

Data:
Customer_Id  Target  ActivePara
1            M       Para1
1            M       Para3
2            V       Para1
2            V       Para4
3            M       Para2
3            M       Para3
3            M       Para4
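In this long layout every row is a separate example, so a learner scores each (customer, parameter) pair on its own and sees only one parameter at a time. A minimal sketch, in plain Python rather than RapidMiner, of collapsing the rows back into one example per customer while keeping the data sparse (only active parameters are stored); `group_by_customer` is a hypothetical helper.

```python
from collections import defaultdict

# Long layout: one row per (Customer_Id, Target, ActivePara).
LONG_ROWS = [
    (1, "M", "Para1"),
    (1, "M", "Para3"),
    (2, "V", "Para1"),
    (2, "V", "Para4"),
    (3, "M", "Para2"),
    (3, "M", "Para3"),
    (3, "M", "Para4"),
]

def group_by_customer(rows):
    """Collapse long rows into {Customer_Id: (Target, set of active parameters)}."""
    targets = {}
    active = defaultdict(set)
    for cid, target, para in rows:
        # All rows of one customer must carry the same label.
        if targets.setdefault(cid, target) != target:
            raise ValueError(f"conflicting Target values for customer {cid}")
        active[cid].add(para)
    return {cid: (targets[cid], active[cid]) for cid in targets}

examples = group_by_customer(LONG_ROWS)
for cid, (target, paras) in sorted(examples.items()):
    print(cid, target, sorted(paras))
```

The result has one entry per customer, so any model trained and applied on it can only produce one prediction per Customer_Id.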

BUT now I do not get consistent predictions per customer. What I get is something like this:
Customer_Id  Target  ActivePara  Prediction of Target
1            M       Para1       V
1            M       Para3       M
2            V       Para1       V
2            V       Para4       V
3            M       Para2       M
3            M       Para3       M
3            M       Para4       V

But I need the prediction of Target to be consistent for each Customer_Id.
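One possible way to get exactly one prediction per Customer_Id without materializing the wide table is to score each customer's whole set of active parameters at once. Under a Bernoulli Naïve Bayes model the score of class c is log P(c) + sum over all parameters of log(1 - theta) + sum over active parameters of [log(theta) - log(1 - theta)], so the first two terms can be precomputed once per class and scoring touches only the active parameters. A minimal sketch, assuming Laplace smoothing and a parameter vocabulary known at training time; the names are hypothetical and this is not RapidMiner's implementation.

```python
import math
from collections import Counter, defaultdict

# One example per customer: (Target, set of active parameters).
EXAMPLES = {
    1: ("M", {"Para1", "Para3"}),
    2: ("V", {"Para1", "Para4"}),
    3: ("M", {"Para2", "Para3", "Para4"}),
}
VOCAB = ["Para1", "Para2", "Para3", "Para4"]  # all known parameters

def train_sparse_nb(examples, vocab, alpha=1.0):
    """Hypothetical helper: Bernoulli Naive Bayes from sparse sets of active parameters."""
    class_counts = Counter(target for target, _ in examples.values())
    active_counts = defaultdict(Counter)  # class -> parameter -> customers with it active
    for target, paras in examples.values():
        active_counts[target].update(paras)
    n = len(examples)
    model = {}
    for c, cnt in class_counts.items():
        # Assumption: Laplace-smoothed P(parameter active | class).
        theta = {p: (active_counts[c][p] + alpha) / (cnt + 2 * alpha) for p in vocab}
        # Log-score if no parameter were active: log prior + sum of log(1 - theta).
        base = math.log(cnt / n) + sum(math.log(1.0 - t) for t in theta.values())
        # Correction added only for parameters that are actually active.
        delta = {p: math.log(t) - math.log(1.0 - t) for p, t in theta.items()}
        model[c] = (base, delta)
    return model

def predict_customer(model, active_paras):
    """One prediction per customer, using the whole set of active parameters."""
    scores = {}
    for c, (base, delta) in model.items():
        # Parameters missing from the training vocabulary are ignored.
        scores[c] = base + sum(delta.get(p, 0.0) for p in active_paras)
    return max(scores, key=scores.get)

model = train_sparse_nb(EXAMPLES, VOCAB)
predictions = {cid: predict_customer(model, paras) for cid, (_, paras) in EXAMPLES.items()}
print(predictions)
```

With the three example customers this yields one label per Customer_Id, {1: 'M', 2: 'V', 3: 'M'}, which is algebraically the same score a dense Bernoulli model would compute on the wide table. Training cost grows with classes times vocabulary, not customers times vocabulary.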

How do I need to set up the input data / the model to get this result?

Thanks a lot in advance for any hints and help!!!

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Problem with too many parameter to put as columns into example set