Building a predictive decision tree while excluding historical attributes

pyearick · March 2016

All,

I am trying to use RapidMiner to build a predictive decision tree. Currently, I have a process that imports historical shipment data for a number of products along with some additional attributes. These other attributes make the product more or less attractive to customers (age, color, size...).

Before my import I categorize the historical, numerical shipment data into 5 buckets called ShipCats - from "very low shipments" (<1000) to "very high shipments" (>10000) so I can use a decision tree in RapidMiner. In addition to the ShipCats attribute, each experiment that I import has a FYDate attribute, which is a date time field along with the shipment results showing the shipments of a product in that year (example: 2010->1549|2011->1722|2012->1999...). The resulting decision tree from RapidMiner, I'm sure, is correct but includes that FYDate attribute.
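As a rough illustration of the bucketing step described above, here is a minimal Python sketch. Only the two outer thresholds (<1000 and >10000) and the two outer labels come from the post; the inner cut points (2500, 5000), the middle labels, and the function name `ship_cat` are assumptions made up for the example.

```python
def ship_cat(shipments: int) -> str:
    """Map a yearly shipment count to one of five ShipCats buckets.

    Only "<1000" and ">10000" are given in the post; the inner
    boundaries (2500 and 5000) are assumed for illustration.
    """
    if shipments < 1000:
        return "very low shipments"
    if shipments < 2500:        # assumed boundary
        return "low shipments"
    if shipments < 5000:        # assumed boundary
        return "medium shipments"
    if shipments <= 10000:
        return "high shipments"
    return "very high shipments"


# Yearly values taken from the example in the post (2010, 2011, 2012).
history = {2010: 1549, 2011: 1722, 2012: 1999}
categorized = {year: ship_cat(count) for year, count in history.items()}
```

With these assumed boundaries, all three example years fall into the same bucket, which shows how coarse buckets can hide a year-over-year trend.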

I want to predict ShipCats for new products from user-entered values for most of the other attributes that were used to create the decision tree, but not the one the user can't affect, FYDate. FYDate would, of course, be the current year.

Do I need to model the historical information first and somehow feed that input into a decision tree operator that only includes variables that can reasonably be chosen?

Thanks very much for this software and your help!

Pat

JEdward · March 2016

If FYDate contains only the year, I wouldn't include it in your modelling, as it wouldn't be very useful for your future predictions.
If it contains the full shipment date, you could use the Date to Numerical operator to convert it into the quarter of the year, so you can see whether seasonality affects your tree.

Otherwise, yes remove it.
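The date-to-quarter idea can be sketched outside RapidMiner as follows. This is not the Date to Numerical operator itself, just a plain Python stand-in; the helper names `quarter_of` and `year_quarter` and the `year * 10 + quarter` encoding are assumptions chosen for the example.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


def year_quarter(d: date) -> int:
    """Encode a date as a single number, e.g. 2010 Q3 becomes 20103.

    The encoding is an assumption for illustration; any scheme that
    keeps year and quarter distinguishable would do.
    """
    return d.year * 10 + quarter_of(d)


# Hypothetical shipment dates.
ship_dates = [date(2010, 2, 15), date(2010, 8, 1), date(2011, 11, 30)]
quarters = [quarter_of(d) for d in ship_dates]
encoded = [year_quarter(d) for d in ship_dates]
```

Using only `quarter_of` as a feature captures seasonality without tying the model to a specific year, which matters when predictions are always made for the current year.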

pyearick · March 2016

JEdward wrote:

If FYDate contains only the year, I wouldn't include it in your modelling, as it wouldn't be very useful for your future predictions.
If it contains the full shipment date, you could use the Date to Numerical operator to convert it into the quarter of the year, so you can see whether seasonality affects your tree.

Otherwise, yes remove it.

The shipment information is currently by year, though it could be by quarter or by month, and I feel it is important to include in the model. What we are observing is the increase or decrease of shipments over time. Without that history, we can't tell whether a product with particular attributes is more successful than one with other attributes. My goal is to build a decision tree that helps determine which products to build next, based on our past results for products that incorporated certain attributes and not others.

My question is: how do I build a decision tree that incorporates that history without FYDate appearing in the resulting tree?

Thank you for your quick reply!

Pat

MartinLiebig · March 2016

I would recommend setting the role of FYDate to a special role. Simply use the Set Role operator for this and type anything into the role box.

Best,
Martin
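To make the effect of a special role concrete: in RapidMiner, learners use only attributes with the regular role, so an attribute with a special role stays in the example set but is not offered to the tree as a split candidate. The following plain Python sketch mimics that separation. It is not RapidMiner code; the `Color` and `Size` attributes, the sample rows, the role name `"ignored"`, and the helper `learner_inputs` are all assumptions for illustration.

```python
# Hypothetical example set; only FYDate and ShipCats are names from the thread.
examples = [
    {"FYDate": 2010, "Color": "red", "Size": "L", "ShipCats": "low shipments"},
    {"FYDate": 2011, "Color": "blue", "Size": "M", "ShipCats": "high shipments"},
]

# Attributes listed here have a non-regular role. "label" marks the target;
# any other role name (here "ignored") just keeps the attribute away from the learner.
roles = {"ShipCats": "label", "FYDate": "ignored"}


def learner_inputs(rows, roles):
    """Split rows into what a learner would see: regular attributes and the label."""
    features = [name for name in rows[0] if name not in roles]
    label = next(name for name, role in roles.items() if role == "label")
    X = [[row[name] for name in features] for row in rows]
    y = [row[label] for row in rows]
    return features, X, y


features, X, y = learner_inputs(examples, roles)
```

Note that FYDate is still present in `examples` after the split; it is simply never passed to the model, so a tree trained on `X` cannot split on it or otherwise use it.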

pyearick · March 2016

Martin,

Thank you, that took FYDate out of the decision tree result. Does marking FYDate with a special role adversely affect its use in the decision tree processing? In other words, is the historical sales data still considered?

Martin Schmitz wrote:

I would recommend setting the role of FYDate to a special role. Simply use the Set Role operator for this and type anything into the role box.

Best,
Martin
