EECS 349 Machine Learning Project on
San Francisco Crime Classification
Bo Guan, Panitan Wongse-ammat, Xinyuan Zhao
Email: {BoGuan2015, Top, xinyuanzhao2016}@u.northwestern.edu
Northwestern University
Naive Bayes
Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. We found Naive Bayes particularly useful because its output is a probability over the classes, which is what the logloss evaluation requires. Given a class variable y and a dependent feature vector x_1 through x_n, Bayes' theorem states the following relationship:

\[
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
\]
Using the naive conditional independence assumption that

\[
P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)
\]

for all i, this relationship is simplified to

\[
P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
\]
Since P(x_1, \dots, x_n) is constant given the input, we can use the following classification rule:

\[
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
so that

\[
\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y)
\]
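The classification rule above can be made concrete with a small hand-rolled sketch. The priors and per-feature likelihoods below are hypothetical numbers chosen only for illustration; a real model would estimate them from the training data.

```python
import math

# Hypothetical per-class parameters (illustrative only): priors P(y) and
# per-feature likelihoods P(x_i = observed value | y) for two features.
priors = {"LARCENY": 0.6, "ASSAULT": 0.4}
likelihoods = {
    "LARCENY": [0.30, 0.10],
    "ASSAULT": [0.05, 0.20],
}

def map_predict(priors, likelihoods):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed in log space
    to avoid underflow when many small probabilities are multiplied."""
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior) + sum(math.log(p) for p in likelihoods[y])
        if score > best_score:
            best_class, best_score = y, score
    return best_class

# LARCENY wins here: 0.6 * 0.30 * 0.10 = 0.018 vs 0.4 * 0.05 * 0.20 = 0.004
print(map_predict(priors, likelihoods))  # LARCENY
```

Working in log space is the standard trick: with dozens of features the raw product of likelihoods quickly underflows to zero in floating point.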
The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(x_i \mid y); we evaluated GaussianNB, MultinomialNB, and BernoulliNB. The accuracy results of these Naive Bayes variants from Weka are shown in Table 3, and the training and testing results from Python are shown in Table 4.
[Table 3: 10-fold cross-validation accuracy tested by Weka]
[Table 4: Accuracy and logloss results tested by Python]
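In scikit-learn the three variants share the same interface, so a comparison of the kind behind Tables 3 and 4 can be sketched as below. The data here is synthetic (random integer features with an arbitrary labeling rule), not the actual crime dataset, so the printed scores are only illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
# Synthetic stand-in for the crime features: non-negative integer counts,
# so MultinomialNB's assumptions are not violated.
X = rng.integers(0, 10, size=(500, 6)).astype(float)
y = (X[:, 0] + X[:, 1] > 9).astype(int)  # arbitrary rule, just to make classes learnable

results = {}
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation accuracy
    results[type(clf).__name__] = scores.mean()
    print(type(clf).__name__, round(scores.mean(), 3))
```

Note that BernoulliNB binarizes each feature (nonzero vs. zero by default), so on count-valued data like this it sees a much coarser signal than the other two models.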
From the Weka and Python results, BernoulliNB performs better than MultinomialNB. Our final choice of Naive Bayes was GaussianNB, in which the likelihood of the features is assumed to be Gaussian:
\[
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left( -\frac{(x_i - \mu_y)^2}{2\sigma_y^2} \right)
\]

The parameters \mu_y and \sigma_y are estimated from the training data for each class.
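As a sanity check on the Gaussian likelihood above, the density can be written out directly; at x_i = \mu_y the exponential term is 1 and the density reduces to 1/\sqrt{2\pi\sigma_y^2}. This is a minimal sketch of the formula itself, not of scikit-learn's internals.

```python
import math

def gaussian_likelihood(x, mu, sigma2):
    """P(x_i | y) under the GaussianNB assumption:
    (1 / sqrt(2*pi*sigma_y^2)) * exp(-(x_i - mu_y)^2 / (2*sigma_y^2))."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# At the class mean the density is 1/sqrt(2*pi*sigma^2); for mu=0, sigma^2=1
# this is about 0.3989, and it falls off as x moves away from the mean.
print(gaussian_likelihood(0.0, 0.0, 1.0))
print(gaussian_likelihood(1.0, 0.0, 1.0))
```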