Programming Assignment 8

Summary

In this assignment you will implement k-nearest neighbor classifiers.

NOTE: This is an optional assignment. It will be added to your programming assignment average only if it improves that average. If you do not make a submission, or if the points awarded do not increase your average, this assignment will be ignored.


Command-line Arguments

You must implement a program that classifies test data using a k-nearest neighbor classifier, given some training data and some additional options. In particular, your program can be invoked as follows:
knnclassify <training_file> <test_file> <k>
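
As an illustration of the invocation above, here is a minimal Python sketch of reading the three command-line arguments. The function name parse_args and the error messages are hypothetical, and the assignment does not state how invalid input should be handled, so the checks shown here are only an assumption.

```python
import sys


def parse_args(argv):
    """Return (training_file, test_file, k) from a knnclassify-style argument list.

    argv is expected to look like sys.argv: the program name followed by
    the three arguments described above.
    """
    if len(argv) != 4:
        raise SystemExit("usage: knnclassify <training_file> <test_file> <k>")
    training_file, test_file = argv[1], argv[2]
    try:
        k = int(argv[3])
    except ValueError:
        raise SystemExit("k must be an integer")
    if k < 1:
        # Assumption: k = 0 or a negative k is treated as an error.
        raise SystemExit("k must be at least 1")
    return training_file, test_file, k
```

A main program would typically call parse_args(sys.argv) once at startup.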

Both the training file and the test file are text files containing data in tabular format. Each value is a number, and values are separated by white space. The i-th row and j-th column contain the value for the j-th feature of the i-th object. The only exception is the LAST column, which stores the class label for each object. Make sure you do not use data from the last column (i.e., the class labels) as attributes (features) in your classifier.
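
The file format described above could be read along the following lines. This is only a sketch: the names parse_rows and load_dataset are hypothetical, and keeping the class label as the text token found in the file (rather than converting it to a number) is an assumption made so that labels are compared exactly as written.

```python
def parse_rows(lines):
    """Split whitespace-separated rows into feature vectors and class labels.

    Every column except the last is converted to a float feature.
    The last column is kept separately as the class label and is never
    included among the features.
    """
    features, labels = [], []
    for line in lines:
        values = line.split()
        if not values:
            continue  # skip blank lines
        features.append([float(v) for v in values[:-1]])
        labels.append(values[-1])
    return features, labels


def load_dataset(path):
    """Read a training or test file from disk and parse it with parse_rows."""
    with open(path) as f:
        return parse_rows(f)
```

For example, parse_rows(["1 2.5 0", "3 4 1"]) yields the feature rows [1.0, 2.5] and [3.0, 4.0] with labels "0" and "1".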

Example files that can be passed as command-line arguments are in the datasets directory. That directory contains three datasets, copied from the UCI repository of machine learning datasets:

For each dataset, a training file and a test file are provided. The name of each file indicates what dataset the file belongs to, and whether the file contains training or test data.

Note that, for the purposes of your assignment, it does not matter at all where the data came from. One of the attractive properties of k-nearest neighbor classifiers (and many other machine learning methods) is that they can be applied in the exact same way to many different types of data, and produce useful results.


Training

For this phase you should classify the test data using a k-nearest neighbor classifier. The value of k is specified by the third command-line argument.

In your k-nearest neighbor classifier, you should use the following guidelines:

There is no need to output anything for the training phase.
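
For orientation, a k-nearest neighbor classifier in its most basic form can be sketched as below. This is not the required implementation: the use of Euclidean distance on the raw feature values, the simple majority vote, and the function names euclidean and knn_predict are all assumptions made for illustration. Ties are not given any special treatment here. Wherever the guidelines for this assignment differ from the sketch, the guidelines take precedence.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k):
    """Predict a label for query by majority vote among its k nearest training objects.

    Assumptions (not taken from the assignment text): distances are
    Euclidean, features are used as-is, and when several labels tie for
    the most votes, whichever Counter.most_common lists first is returned.
    """
    # Indices of the training objects, ordered from nearest to farthest.
    order = sorted(
        range(len(train_features)),
        key=lambda i: euclidean(train_features[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Note that there is no separate model-fitting step in this sketch: the training data is simply stored and consulted when a test object is classified.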

Classification

For each test object you should print a line containing the following info:

To produce this output in a uniform manner, use these printing statements:

After you have printed the results for all test objects, you should print the overall classification accuracy, which is defined as the average of the classification accuracies you printed out for each test object. To print the classification accuracy in a uniform manner, use these printing statements:
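
The averaging step in the definition of overall classification accuracy can be sketched as follows. This is not a printing statement and does not define the output format. The function name overall_accuracy is hypothetical, and the sketch takes the per-object accuracies as given, without assuming how each one is computed.

```python
def overall_accuracy(per_object_accuracies):
    """Average the accuracy values recorded for the individual test objects."""
    if not per_object_accuracies:
        raise ValueError("no test objects were classified")
    return sum(per_object_accuracies) / len(per_object_accuracies)
```

In practice, the program would append one accuracy value to a list each time it prints the line for a test object, then pass that list to this function at the end.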


Grading


How to submit

Submissions should be made using Blackboard. Implementations in Python, C, C++, and Java will be accepted. If you would like to use another language, please first check with the instructor via e-mail. Points will be taken off for failure to comply with this requirement.

Submit a ZIPPED directory called programming-assignment8.zip (no other forms of compression are accepted; contact the instructor or TA if you do not know how to produce .zip files). The directory should contain:

Insufficient or unclear instructions will be penalized by up to 20 points. Code that does not run on omega machines gets AT MOST half credit, unless you obtained prior written permission.