"Decision Tree on a huge sparse dataset"
aryan_hosseinza
Hi,
I have a very sparse dataset with a huge number of attributes (~12K features and 700K records) that I cannot fit in memory (the attribute values are binominal, i.e. True/False).
Because it is sparse, I keep the dataset in (ID, Feature) format. For example, I would have the following records:
(ID, Feature)
(110, d_0022)
(110, d_2393)
(110, i_2293)
(822, d_933)
(822, p_2003)
....
So the record with ID 110 has three attributes with a true value (d_0022, d_2393, i_2293), and all the rest are false (the attributes are all the distinct values of the attribute "Feature").
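To make the (ID, Feature) representation concrete, here is a minimal plain-Python sketch (not a RapidMiner feature). It assumes the pairs can be streamed in and grouped per record. Only the True cells are stored, and any feature missing from a record's set is implicitly False. The helper names `build_sparse` and `value` are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Sample (ID, Feature) pairs taken from the post.
PAIRS = [
    (110, "d_0022"),
    (110, "d_2393"),
    (110, "i_2293"),
    (822, "d_933"),
    (822, "p_2003"),
]

def build_sparse(pairs):
    """Group (ID, Feature) pairs into {record_id: set of True features}.

    Hypothetical helper. Only True entries are stored, so memory grows with
    the number of pairs rather than with records x features.
    """
    rows = defaultdict(set)
    vocabulary = set()
    for record_id, feature in pairs:
        rows[record_id].add(feature)
        vocabulary.add(feature)
    return dict(rows), sorted(vocabulary)

def value(rows, record_id, feature):
    """Return the implicit boolean value of one cell of the dense table."""
    return feature in rows.get(record_id, set())

rows, vocabulary = build_sparse(PAIRS)
print(sorted(rows[110]))          # ['d_0022', 'd_2393', 'i_2293']
print(value(rows, 110, "d_933"))  # False
print(len(vocabulary))            # 5
```

The dense 700K x 12K table never exists here. A cell lookup is just a set-membership test against the stored pairs.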
Is it possible to train a decision tree without building the whole dataset first?
Thanks
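The sketch below shows one way a tree learner can work directly on the sparse pairs. It is a from-scratch illustration of choosing only the root split by information gain, and it makes no claim about how RapidMiner's Decision Tree operator works internally. It assumes a class label per record, which the post does not describe, so the `labels` mapping and its values are hypothetical. Records listed in `labels` but absent from `rows` are treated as all-False. The key point is that the "feature is True" statistics come only from the stored pairs, while the "feature is False" statistics are obtained by subtracting those counts from the overall label totals.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class labels."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def best_root_split(rows, labels):
    """Return (feature, gain) for the best root split, or (None, 0.0).

    rows:   {record_id: set of True features}  (sparse, True cells only)
    labels: {record_id: class label}           (hypothetical target column)
    Every record in rows must also appear in labels.
    """
    totals = Counter(labels.values())
    n = sum(totals.values())
    base = entropy(totals)
    # One pass over the stored pairs: label counts where each feature is True.
    true_counts = defaultdict(Counter)
    for record_id, features in rows.items():
        for feature in features:
            true_counts[feature][labels[record_id]] += 1
    best_feature, best_gain = None, 0.0
    for feature, t in true_counts.items():
        f = totals - t  # label counts where the feature is False
        n_true = sum(t.values())
        n_false = n - n_true
        weighted = (n_true * entropy(t) + n_false * entropy(f)) / n
        gain = base - weighted
        if gain > best_gain:
            best_feature, best_gain = feature, gain
    return best_feature, best_gain

# Hypothetical example: record 907 has no pairs, so all its features are False.
rows = {
    110: {"d_0022", "d_2393", "i_2293"},
    822: {"d_933", "p_2003"},
    901: {"d_0022"},
}
labels = {110: "yes", 822: "no", 901: "yes", 907: "no"}

feature, gain = best_root_split(rows, labels)
print(feature, round(gain, 3))  # d_0022 1.0
```

A full tree would repeat this on the two record subsets produced by the chosen split. The cost of each pass is proportional to the number of stored pairs, not to records x features.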
Answers
Best regards,
Marius