CSCE 633 Project 1 - Decision Tree due: Tues, Feb 21, 2012 Implement the decision tree algorithm (recursive partitioning, ID3) in any language you like. You will have to handle both discrete and continuous attributes, but you may assume the target feature is discrete (i.e. classification problems, though they might have >2 classes). You must also implement some form of pruning. To test your algorithm, you can run it on datasets from the UCI Machine Learning Repository. The data files come in a simple input format: one example on each line, with a list of features separated by commas, with the target class typically last. You will have to write a simple parsing routine to read-in the data. Each dataset includes a file called "*.names" that indicates the names, type, and range of values for each feature. You should at least present results on the following databases for comparison: voting, mushroom, heart disease, and iris. You may include any other datasets that interest you. There are some other good ones, like census, LED, secondary structure, promoters, chess, wine, etc. You will also have to write a simple output routine the prints a given tree out in a hierarchical way (using indentation), and you will probably want to label each node with proportion of training examples in each class observed at each node. Here is an example: water-project-cost-sharing=yes [10 republican, 5 democrat] adoption-of-the-budget-resolution=yes [4 republican, 1 democrat] class=republican [4 republican, 0 democrat] adoption-of-the-budget-resolution=no [4 republican, 1 democrat] class=democrat [0 republican, 1 democrat] water-project-cost-sharing=no [2 republican, 5 democrat] ... Evaluate your algorithm using 10-fold cross-validation. Report approriate statististics for estimating true accuracy and confidence intervals. Also, perform a formal comparison between two alternative versions of your algorithm, e.g. 2 different methods for handling continutous attributes, or 2 different splitting criteria, or 2 different pruning methods. Evaluate the statistical significance of the difference in accuracy. What to turn in: a written report, plus a print-out of your code. Be sure to include some example trees (small ones), and some examples of pruning. Focus on a brief description of your implementation (the methodological details, like what splitting criterion you use and how you handle continuous attribute, not coding details like data structures). Put most of your effort into an "expermental" or "results" section, in which you present statistics like accuracy and tree size for different dataset. The experiments should also investigate the relative performance of two types of pruning strategies. Include an interpretation and/or explanation of your results and a summary of your observations.