The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Building a predictive decision tree while excluding historical attributes
All,
I am trying to use RapidMiner to build a predictive decision tree. Currently, I have a process that imports historical shipment data for a number of products along with some additional attributes. These other attributes make the product more or less attractive to customers (age, color, size...).
Before my import I categorize the historical, numerical shipment data into 5 buckets called ShipCats - from "very low shipments" (<1000) to "very high shipments" (>10000) so I can use a decision tree in RapidMiner. In addition to the ShipCats attribute, each experiment that I import has a FYDate attribute, which is a date time field along with the shipment results showing the shipments of a product in that year (example: 2010->1549|2011->1722|2012->1999...). The resulting decision tree from RapidMiner, I'm sure, is correct but includes that FYDate attribute.
I am looking to predict ShipCats for new products from a user entry of most of the other attributes that were used to create the decision tree but not the one they can't affect, FYDate. The FYDate, of course, would be the current year.
Do I need to model the historical information first and somehow feed that input into a decision tree operator that only includes variables that can reasonably be chosen?
Thanks very much for this software and your help!
Pat
I am trying to use RapidMiner to build a predictive decision tree. Currently, I have a process that imports historical shipment data for a number of products along with some additional attributes. These other attributes make the product more or less attractive to customers (age, color, size...).
Before my import I categorize the historical, numerical shipment data into 5 buckets called ShipCats - from "very low shipments" (<1000) to "very high shipments" (>10000) so I can use a decision tree in RapidMiner. In addition to the ShipCats attribute, each experiment that I import has a FYDate attribute, which is a date time field along with the shipment results showing the shipments of a product in that year (example: 2010->1549|2011->1722|2012->1999...). The resulting decision tree from RapidMiner, I'm sure, is correct but includes that FYDate attribute.
I am looking to predict ShipCats for new products from a user entry of most of the other attributes that were used to create the decision tree but not the one they can't affect, FYDate. The FYDate, of course, would be the current year.
Do I need to model the historical information first and somehow feed that input into a decision tree operator that only includes variables that can reasonably be chosen?
Thanks very much for this software and your help!
Pat
Tagged:
0
Answers
If it contains the full shipment date then maybe use Date to Numerical operator to convert it into quarter by year, that way you can see if seasonality affects your tree.
Otherwise, yes remove it.
My question is how do I approach building a decision tree that incorporates that history without including an FYDates in the resulting decision tree?
Thank-you for your quick reply!
Pat
Best,
Martin
Dortmund, Germany
Thank-you, that took FYDate out of the decision tree result. Does marking FYDate with a special role adversely affect it's use in the decision tree processing? In other words, is the historical sales data still be considered?