
K-Nearest Neighbor

The k-nearest neighbors algorithm, or k-NN, is a method used for both regression and classification. It finds the k closest training examples in the feature space, and its output depends on whether k-NN is used for regression or for classification. In our case, it outputs a binary class membership, assigning each point to the class most common among its k nearest neighbors by majority vote. k-NN is considered a lazy learning method since it does not build a model until classification begins.
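The sketch below illustrates this idea with scikit-learn's KNeighborsClassifier; the toy 2-D data is purely illustrative and is not the dataset used in this project.

```python
# Minimal k-NN classification sketch using scikit-learn.
# The toy data below is illustrative only, not this project's dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy binary-classification data: a 2-D feature space with labels 0/1.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Lazy learning: fit() essentially just stores the training data; the real
# work (finding the k nearest neighbors) happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Class membership is decided by a majority vote of the 3 nearest neighbors.
print(knn.predict([[0.15, 0.2]]))        # -> [0]
print(knn.predict_proba([[0.95, 0.9]]))  # vote proportions, e.g. [[0. 1.]]
```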


In our implementation, we vary k, the weighting scheme, and the algorithm used to compute the nearest neighbors. To measure percent accuracy and log loss, we divide the training data into two subsets, a training set and a validation set; we tune the algorithm on the validation set and then evaluate the final model on the test set. All k-NN results are shown below. From the table, the best result comes from k = 3200, weight = uniform, and algorithm = auto, even though we also tried the distance weighting and the ball_tree and kd_tree algorithms. In addition, when we set k = 5, the log loss on the validation set is the lowest, but the log loss on the test set is much higher. Thus, we have to take both the validation log loss and the test log loss into consideration.
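A hedged sketch of this tuning loop follows, assuming scikit-learn (which the parameter names uniform/distance and auto/ball_tree/kd_tree suggest); the synthetic data, split ratios, and k grid are placeholders, not the project's actual code.

```python
# Hypothetical sketch of the tuning procedure described above, assuming
# scikit-learn; the data, split sizes, and k grid are placeholders.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, log_loss

# Placeholder data standing in for the project's training and test sets.
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Split the training data further into training and validation subsets.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

best = None
for k in [5, 50, 400, 3200]:                      # illustrative k grid
    for weights in ["uniform", "distance"]:
        for algorithm in ["auto", "ball_tree", "kd_tree"]:
            knn = KNeighborsClassifier(n_neighbors=k, weights=weights,
                                       algorithm=algorithm)
            knn.fit(X_tr, y_tr)
            val_loss = log_loss(y_val, knn.predict_proba(X_val))
            if best is None or val_loss < best[0]:
                best = (val_loss, k, weights, algorithm, knn)

# Re-check the selected model on the held-out test set: a low validation
# log loss alone (e.g. at small k) can hide a much higher test log loss.
val_loss, k, weights, algorithm, knn = best
print(f"best: k={k}, weights={weights}, algorithm={algorithm}")
print("test accuracy:", accuracy_score(y_test, knn.predict(X_test)))
print("test log loss:", log_loss(y_test, knn.predict_proba(X_test)))
```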

Table 6: Log loss and accuracy of the k-nearest neighbors models
