CSCE 633 Machine Learning (Spring 2010)
due: Tues, Feb 23

Project 1
---------

Goal: Implement and test a decision tree algorithm (ID3). You should also implement reduced-error pruning.

You may use any programming language you like. However, the code must be your own, not shared with another student or copied from a source on the web.

You will have to implement a simple parser to read in datasets in the format provided by the UCI Repository (typically comma-separated values, one example per row). You will probably want to implement a simple routine for analyzing attribute values and summarizing their distribution.

At your discretion, you may experiment with alternative splitting criteria, stopping criteria, or pruning methods. You might want to compare binary with multi-way splits. You might want to see if there are cases where using GainRatio has an advantage. You might want to test different methods for handling missing values. The choice of what to do is up to you - just make it interesting.

What to turn in: A written project report (in the format of a conference paper) that describes significant details of your implementation, results of any comparative tests you do, and performance (accuracy) on several databases.

Choose several (at least 5) datasets from the UCI ML Repository, including at least one with all numeric variables (like Iris) and one with all categorical variables (like Congressional Voting).

At minimum, you should test your DT algorithm, both with and without pruning, against a simple majority classifier. You should use proper statistics to report performance, such as cross validation, confidence intervals on accuracy, and paired t-tests for comparisons, to support any conclusions you draw.
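
For reference, the two splitting criteria named in the Goal section (the information gain used by ID3, and GainRatio) can be sketched roughly as follows. This is an illustration only, not a required design: the function names, the list-of-tuples data layout, and the use of an attribute index are all assumptions made for the example, and it handles categorical attributes only. Submitted code must still be your own.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def partition(rows, labels, attr):
    """Group class labels by the value each row takes at attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after a multi-way split."""
    total = len(labels)
    groups = partition(rows, labels, attr)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        # Attribute takes a single value, so it cannot split the data.
        return 0.0
    return information_gain(rows, labels, attr) / split_info

# Hypothetical toy data: attribute 0 determines the class, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
best_attr = max(range(2), key=lambda a: information_gain(rows, labels, a))
```

GainRatio differs from plain information gain mainly when an attribute has many distinct values, since the split-information denominator penalizes such attributes.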
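
The statistics requested in the report (cross validation, confidence intervals on accuracy, paired t-tests) might be computed along these lines. This is a minimal sketch under stated assumptions: the helper names are made up for the example, the confidence interval uses the normal approximation to the binomial, and the t statistic must still be compared against a critical value from a t table with k - 1 degrees of freedom.

```python
import math
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy_confidence_interval(correct, n, z=1.96):
    """Approximate confidence interval on accuracy (z=1.96 gives about 95%)."""
    acc = correct / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc - half_width, acc + half_width

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-fold accuracies of two classifiers.

    Both lists must come from the same folds, in the same order.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (k - 1 denominator)
    if sd == 0:
        raise ValueError("all per-fold differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(k))

# Hypothetical per-fold accuracies for a pruned and an unpruned tree.
pruned = [0.90, 0.80, 0.85]
unpruned = [0.80, 0.75, 0.80]
t = paired_t_statistic(pruned, unpruned)
low, high = accuracy_confidence_interval(correct=80, n=100)
```

For each test fold, train on the remaining indices, record the accuracy of every classifier being compared on that fold, and pass the resulting per-fold lists to `paired_t_statistic`.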