RAPIDMINER 9.7 BETA ANNOUNCEMENT
The beta program for the RapidMiner 9.7 release is now available. Lots of amazing new improvements including true version control!
Building a predictive decision tree while excluding historical attributes
I am trying to use RapidMiner to build a predictive decision tree. Currently, I have a process that imports historical shipment data for a number of products along with some additional attributes. These other attributes make the product more or less attractive to customers (age, color, size...).
Before my import I categorize the historical, numerical shipment data into 5 buckets called ShipCats - from "very low shipments" (<1000) to "very high shipments" (>10000) so I can use a decision tree in RapidMiner. In addition to the ShipCats attribute, each experiment that I import has a FYDate attribute, which is a date time field along with the shipment results showing the shipments of a product in that year (example: 2010->1549|2011->1722|2012->1999...). The resulting decision tree from RapidMiner, I'm sure, is correct but includes that FYDate attribute.
I am looking to predict ShipCats for new products from a user entry of most of the other attributes that were used to create the decision tree but not the one they can't affect, FYDate. The FYDate, of course, would be the current year.
Do I need to model the historical information first and somehow feed that input into a decision tree operator that only includes variables that can reasonably be chosen?
Thanks very much for this software and your help!