Problem: too many parameters to put as columns into an example set
My task: I have customers, each with a unique ID and a number of binominal parameters, and I would like to predict the value of certain target variables (so far only one, but possibly several).
In my test case I used the following input dataset (see the meta data): each customer is one row and the parameters are the columns, the usual layout.
meta data:
Role Name Type
id Customer_Id integer
label Target binominal
regular Para1 binominal
regular Para2 binominal
regular Para3 binominal
regular Para4 binominal
dataset:
Customer_Id Target Para1 Para2 Para3 Para4
1 M 1 0 1 0
2 V 1 0 0 1
3 M 0 1 1 1
=> With Naïve Bayes I get very good predictions in this test case, which has only a few dimensions.
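For reference, the wide setup can be sketched as a from-scratch Bernoulli Naive Bayes in Python. This is a stand-in, not RapidMiner's Naive Bayes operator, whose Laplace correction and handling of binominal attributes may differ; the smoothing constant `alpha` and the function names are illustrative assumptions. Because all parameters of a customer sit in one example, the model returns exactly one prediction per Customer_Id.

```python
import math

# Wide test set from the post: (Customer_Id, Target, [Para1, Para2, Para3, Para4])
ROWS = [
    (1, "M", [1, 0, 1, 0]),
    (2, "V", [1, 0, 0, 1]),
    (3, "M", [0, 1, 1, 1]),
]


def train_bernoulli_nb(rows, alpha=1.0):
    """Estimate class priors and P(param = 1 | class) with Laplace smoothing."""
    n_params = len(rows[0][2])
    counts = {}  # number of customers per class
    ones = {}    # per class: how often each parameter is 1
    for _, label, x in rows:
        counts[label] = counts.get(label, 0) + 1
        ones.setdefault(label, [0] * n_params)
        for j, v in enumerate(x):
            ones[label][j] += v
    total = len(rows)
    priors = {c: n / total for c, n in counts.items()}
    probs = {
        c: [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_params)]
        for c in counts
    }
    return priors, probs


def predict(priors, probs, x):
    """Return the class with the highest log posterior for one customer row."""

    def log_score(c):
        s = math.log(priors[c])
        for p, v in zip(probs[c], x):
            # Bernoulli model: absent parameters contribute log(1 - p)
            s += math.log(p if v else 1.0 - p)
        return s

    return max(priors, key=log_score)


priors, probs = train_bernoulli_nb(ROWS)
predictions = {cid: predict(priors, probs, x) for cid, _, x in ROWS}
print(predictions)  # {1: 'M', 2: 'V', 3: 'M'}
```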
Problem with the actual dataset:
I have several hundred thousand parameters, and the number is growing quickly. Each customer has only a very small number of active parameters, so the table would be extremely large and sparse. My idea was therefore to use the following format as input, with one row per active parameter of a customer:
meta data:
Role Name Type
id Customer_Id integer
label Target binominal
regular ActivePara polynominal
data:
Customer_Id Target ActivePara
1 M Para1
1 M Para3
2 V Para1
2 V Para4
3 M Para2
3 M Para3
3 M Para4
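In this long format each customer is spread over several examples, so any learner that scores row by row produces one prediction per row rather than per customer. A minimal Python sketch, assuming the rows arrive as (Customer_Id, Target, ActivePara) tuples, shows that grouping by Customer_Id recovers one example per customer while storing only the active parameters; `group_by_customer` and `to_dense` are hypothetical helper names.

```python
from collections import defaultdict

# Long format from the post: one row per active parameter of a customer.
LONG_ROWS = [
    (1, "M", "Para1"),
    (1, "M", "Para3"),
    (2, "V", "Para1"),
    (2, "V", "Para4"),
    (3, "M", "Para2"),
    (3, "M", "Para3"),
    (3, "M", "Para4"),
]


def group_by_customer(rows):
    """Collapse the long rows into one (label, set of active params) per customer."""
    labels = {}
    active = defaultdict(set)
    for cid, label, param in rows:
        # Every row of a customer should carry the same Target value.
        if labels.setdefault(cid, label) != label:
            raise ValueError(f"customer {cid} has conflicting labels")
        active[cid].add(param)
    return {cid: (labels[cid], active[cid]) for cid in labels}


def to_dense(active_params, vocabulary):
    """Expand a set of active parameters into a 0/1 vector (the wide layout)."""
    return [1 if p in active_params else 0 for p in vocabulary]


customers = group_by_customer(LONG_ROWS)
vocab = sorted({param for _, _, param in LONG_ROWS})
print(sorted(customers[3][1]))           # ['Para2', 'Para3', 'Para4']
print(to_dense(customers[3][1], vocab))  # [0, 1, 1, 1]
```

The sets only ever hold a customer's active parameters, so memory grows with the number of active entries, not with customers × parameters.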
But now I do not get consistent predictions per customer; what I get is something like this:
Customer_Id Target ActivePara Prediction of Target
1 M Para1 V
1 M Para3 M
2 V Para1 V
2 V Para4 V
3 M Para2 M
3 M Para3 M
3 M Para4 V
But I need the predicted Target to be consistent for each Customer_Id.
How should I set up the input data or the model to achieve this?
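One possible way to get a single, consistent prediction per customer without materializing the wide table is to keep a Bernoulli Naive Bayes model but score it sparsely: precompute, per class, the log prior plus the sum of log(1 - p) over all parameters, then add the correction log p - log(1 - p) only for the customer's active parameters. This gives the same score as the wide Bernoulli model, so it is a sketch of the technique in Python, not a RapidMiner process. Assumptions: the vocabulary is the set of parameters seen in training, parameters unseen in training are ignored at prediction time, each customer has a single label, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Long format from the post: (Customer_Id, Target, ActivePara).
LONG_ROWS = [
    (1, "M", "Para1"), (1, "M", "Para3"),
    (2, "V", "Para1"), (2, "V", "Para4"),
    (3, "M", "Para2"), (3, "M", "Para3"), (3, "M", "Para4"),
]


def group_by_customer(rows):
    """Return {cid: label} and {cid: set of active params} (one label per customer assumed)."""
    labels, active = {}, defaultdict(set)
    for cid, label, param in rows:
        labels[cid] = label
        active[cid].add(param)
    return labels, active


def train_sparse_nb(labels, active, vocabulary, alpha=1.0):
    """Bernoulli Naive Bayes from per-customer sets of active parameters.

    base[c]     = log P(c) + sum over ALL params j of log(1 - p_jc)
    delta[c][j] = log p_jc - log(1 - p_jc), added only for active params
    """
    n_c = defaultdict(int)
    ones = defaultdict(lambda: defaultdict(int))
    for cid, c in labels.items():
        n_c[c] += 1
        for j in active[cid]:
            ones[c][j] += 1
    total = len(labels)
    base, delta = {}, {}
    for c, n in n_c.items():
        p = {j: (ones[c][j] + alpha) / (n + 2 * alpha) for j in vocabulary}
        base[c] = math.log(n / total) + sum(math.log(1.0 - pj) for pj in p.values())
        delta[c] = {j: math.log(pj) - math.log(1.0 - pj) for j, pj in p.items()}
    return base, delta


def score_customer(base, delta, active_params):
    """Log score per class; cost depends only on the customer's active params.

    Parameters not in the training vocabulary are skipped (an assumption).
    """
    return {
        c: base[c] + sum(delta[c][j] for j in active_params if j in delta[c])
        for c in base
    }


def predict_customer(base, delta, active_params):
    """Pick the class with the highest log score: one label per customer."""
    scores = score_customer(base, delta, active_params)
    return max(scores, key=scores.get)


labels, active = group_by_customer(LONG_ROWS)
vocab = sorted({param for _, _, param in LONG_ROWS})
base, delta = train_sparse_nb(labels, active, vocab)
predictions = {cid: predict_customer(base, delta, active[cid]) for cid in labels}
print(predictions)  # {1: 'M', 2: 'V', 3: 'M'}
```

Prediction cost scales with a customer's handful of active parameters rather than with the hundreds of thousands of possible ones; training still keeps one value per parameter and class.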
Thanks a lot in advance for any hints and help!