CAP5638 Term Project Report - Florida State University



Source: ww2.cs.fsu.edu/~dennis/projects/cap5638/report.pdf

Britton Dennis (bbd09)

CAP5638 Term Project Report

I created a Java application that runs k-nearest neighbor, linear discriminant, and decision tree classifiers on sample datasets. The k-nearest neighbor classifier uses a kn value that you can choose, the linear discriminant classifier uses one-against-one voting with algorithm 4 (the fixed-increment single-sample perceptron), and the decision tree uses entropy impurity to choose which feature to branch on.
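For reference, the impurity and split criterion the decision tree uses (these match the entropyImpurity and entropyDelta routines in the source below) are, with P(ω_j) the fraction of samples at node N belonging to class j:

    i(N) = -Σ_j P(ω_j) log2 P(ω_j)
    Δi(N) = i(N) - P_L * i(N_L) - (1 - P_L) * i(N_R)

where N_L and N_R are the left and right children of a candidate split and P_L is the fraction of samples sent left. The split that maximizes Δi(N) is taken.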

The datasets have between 2 and 5 (inclusive) classes and use a training set of 100 samples (evenly distributed) and 20 testing samples (evenly distributed). They use only a small number of samples because this is just an example application for comparing the different classifiers, and I didn't want the runs to take a long time. The datasets also have only two features, and the samples are restricted to the range (<-100, 100>, <-100, 100>). This was done so that I could plot how the datasets were distributed graphically. As for the distribution of the samples within the datasets, I simply tried to provide a range of different options to show where one classifier will do better than another.
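As a rough sketch of how a dataset like this could be generated (this is a hypothetical stand-in for illustration, not the actual DatasetFactory code included below), each class can be given its own center inside the plotting range and samples drawn around it:

```java
import java.util.ArrayList;
import java.util.Random;

public class DatasetSketch {
    // One labeled 2-D point in the [-100, 100] x [-100, 100] range.
    static class Point {
        final double x, y;
        final int classNum;
        Point(double x, double y, int classNum) {
            this.x = x;
            this.y = y;
            this.classNum = classNum;
        }
    }

    // Draw `total` samples evenly distributed amongst `numClasses` classes,
    // each class clustered around its own center with Gaussian spread.
    static ArrayList<Point> generate(int numClasses, int total, long seed) {
        Random rng = new Random(seed);
        ArrayList<Point> samples = new ArrayList<>();
        int perClass = total / numClasses;
        for (int c = 0; c < numClasses; c++) {
            // Spread the class centers evenly across the x axis.
            double cx = -100.0 + 200.0 * (c + 0.5) / numClasses;
            double cy = 0.0;
            for (int i = 0; i < perClass; i++) {
                // Clamp so every sample stays inside the plotting range.
                double x = Math.max(-100, Math.min(100, cx + rng.nextGaussian() * 15));
                double y = Math.max(-100, Math.min(100, cy + rng.nextGaussian() * 15));
                samples.add(new Point(x, y, c));
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        ArrayList<Point> train = generate(4, 100, 42);
        System.out.println(train.size()); // prints 100
    }
}
```

Widening the Gaussian spread would produce something like the 'blurred' datasets, and drawing both coordinates uniformly would produce the random ones.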

The plot shows the dataset distribution. Training samples are drawn in a more translucent/whiter color and testing samples in a more opaque/darker color. The classes are colored as follows: class 1 is blue, class 2 is red, class 3 is green, class 4 is yellow, and class 5 is purple. I had intended to show decision boundaries for the different classifiers too, but I ran out of time. I also wanted to add Parzen windows and a boosting algorithm, but because I spent too much time on the decision trees I was unable to. Speaking of which, the decision tree classifier isn't always generating the correct results and hangs/crashes on the 'blurred' datasets. I was never able to find the reason why, but I went ahead and included it anyway.

Due to Java security restrictions, the Java applet I originally made had to be converted to a Java application. It can be run by simply double-clicking it in Windows or using `java -jar application.jar` on Linux. If it doesn't work, you might have to install a Java runtime, but I think it should work without it. Regardless, if you have questions, let me know and I'll try to help. As an emergency measure, a sample run (showing only a subset of the features) has been provided.

I don't really have any sources to cite, as the datasets came from my head and the classifiers were taken from the textbook and previous programming assignments from this class (rewritten, as they were in a different programming language).


Source Code

//ClassifierApplication.java
package PatternRecognition;

import java.awt.BorderLayout;
import java.awt.GridLayout;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Hashtable;
import javax.swing.DefaultComboBoxModel;
import javax.swing.JButton;
import javax.swing.JComboBox;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JOptionPane;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.event.ListDataEvent;
import javax.swing.event.ListDataListener;

public class ClassifierApplication extends JFrame implements ActionListener, ListDataListener {

    private JPanel topPane;
    private JPanel northPanel;
    private JPanel eastPanel;
    private JPanel westPanel;
    private JPanel container;
    private JLabel datasetLabel;
    private DefaultComboBoxModel datasetData;
    private JComboBox datasets;
    private JLabel classifierLabel;
    private DefaultComboBoxModel classifierData;
    private JComboBox classifiers;
    private DefaultComboBoxModel parameterKValue;
    private JComboBox parameterKBox;
    private JLabel parameterKLabel;
    private JButton training;
    private JButton testing;
    private JTextArea status;
    private JScrollPane statusScroll;

    //todo: add in time / accuracy values here
    private JPanel plotHolder;
    private Plot plot;
    private Hashtable<String, Dataset> datasetHash;
    private Dataset dataset;
    private Classifier classifier;

    public static void main(String[] args) {
        ClassifierApplication app = new ClassifierApplication();
        app.init();
        app.setDefaultCloseOperation(DISPOSE_ON_CLOSE);
        app.setTitle("Classification Example Application");
        app.setVisible(true);
    }

    public void init() {
        this.setSize(800, 600);


        datasetLabel = new JLabel("Datasets: ");
        datasetHash = new Hashtable<String, Dataset>();
        datasetHash.put("2 Classes", DatasetFactory.TwoClass());
        datasetHash.put("4 Classes", DatasetFactory.FourClass());
        datasetHash.put("5 Classes", DatasetFactory.FiveClass());
        datasetHash.put("2 Classes Blurred", DatasetFactory.TwoClassBlur());
        datasetHash.put("4 Classes Blurred", DatasetFactory.FourClassBlur());
        datasetHash.put("2 Classes Far Apart", DatasetFactory.TwoClassFar());
        datasetHash.put("2 Classes Wrong", DatasetFactory.TwoClassWrong());
        datasetHash.put("2 Classes with Random Bias", DatasetFactory.TwoClassRandomBias());
        datasetHash.put("2 Classes Completely Random", DatasetFactory.TwoClassTotalRandom());
        datasetHash.put("4 Classes with Random Bias", DatasetFactory.FourClassRandomBias());
        datasetHash.put("5 Classes Completely Random", DatasetFactory.FiveClassTotalRandom());
        datasetData = new DefaultComboBoxModel(); //new DefaultListModel();
        Enumeration<String> k = datasetHash.keys();
        ArrayList<String> list = new ArrayList();
        while (k.hasMoreElements()) {
            list.add(k.nextElement());
        }
        Collections.sort(list);
        for (int i = 0; i < list.size(); i += 1) {
            datasetData.addElement(list.get(i));
        }
        datasetData.addListDataListener(this);
        datasets = new JComboBox(datasetData);
        classifierLabel = new JLabel("Classifiers: ");
        classifierData = new DefaultComboBoxModel();
        classifierData.addElement("K Nearest Neighbor");
        classifierData.addElement("Linear Discriminant Function");
        classifierData.addElement("Decision Tree");
        classifierData.addListDataListener(this);
        classifiers = new JComboBox(classifierData);
        parameterKLabel = new JLabel("Value of Kn:");
        parameterKValue = new DefaultComboBoxModel();
        for (int i = 1; i <= 40; i += 1) {
            parameterKValue.addElement(i);
        }
        parameterKValue.addListDataListener(this);
        parameterKBox = new JComboBox(parameterKValue);
        training = new JButton("Run Training");
        training.setActionCommand("training");
        training.addActionListener(this);
        testing = new JButton("Run Classification");
        testing.setActionCommand("testing");
        testing.addActionListener(this);
        status = new JTextArea();
        status.setEditable(false);
        status.setText("Status Area:\n");
        status.setText(status.getText() + "=============\n");
        status.setText(status.getText() + "*Select a dataset, classifier, and a parameter (if required)\n");
        status.setText(status.getText() + "*The current dataset is plotted at the bottom\n");
        status.setText(status.getText() + "*Each class has a separate color, lighter/transparent are training points and darker/opaque are testing points\n");
        status.setText(status.getText() + "*There are about 50 training points evenly distributed amongst the classes\n");
        status.setText(status.getText() + "*There are about 20 testing points evenly distributed amongst the classes\n");
        status.setText(status.getText() + "*Use training button to train a classifier and classify button to see how well it does on the testing set\n");
        status.setText(status.getText() + "*Not all classifiers require training, but those that do will notify you here when you try to run classification\n");
        status.setText(status.getText() + "*Also, all trainings print the success rates, but not all trainings actually do anything with them, thus some classifiers will print out 0% during training\n");
        status.setText(status.getText() + "*Note, if you change datasets, classifiers, or parameters, you will have to retrain the classifier\n");
        status.setText(status.getText() + "=============\n");
        statusScroll = new JScrollPane(status);


        plot = new Plot(datasetHash.get(datasetData.getSelectedItem()).prepareToPlot());
        plotHolder = new JPanel(new BorderLayout());
        plotHolder.add(plot, BorderLayout.CENTER);
        northPanel = new JPanel();
        westPanel = new JPanel();
        eastPanel = new JPanel();
        container = new JPanel(new BorderLayout());
        topPane = new JPanel(new GridLayout(2, 1));
        northPanel.add(datasetLabel);
        northPanel.add(datasets);
        northPanel.add(classifierLabel);
        northPanel.add(classifiers);
        northPanel.add(parameterKLabel);
        northPanel.add(parameterKBox);
        container.add(northPanel, BorderLayout.NORTH);
        westPanel.add(training);
        container.add(westPanel, BorderLayout.WEST);

        eastPanel.add(testing);
        container.add(eastPanel, BorderLayout.EAST);
        container.add(statusScroll, BorderLayout.CENTER);
        topPane.add(container);
        topPane.add(plotHolder);
        add(topPane);
        setClassifier();
    }

    //@Override
    //public void paint(Graphics g) {
    //    g.drawString("Hello applet!", 50, 25);
    //}
    // TODO overwrite start(), stop() and destroy() methods

    @Override
    public void actionPerformed(ActionEvent e) {
        String command = e.getActionCommand();
        if (command.equals("training")) {
            status.setText(status.getText() + "Training...\n");
            int ret = classifier.train();
            status.setText(status.getText() + "Done!\n");
            status.setText(status.getText() + "Time taken: " + String.valueOf(classifier.trainingTimeDiff()) + "ms\n");
            //fixed: this previously compared against the misspelled "K Nearest Neighboor",
            //so the training statistics never printed for KNN
            if ("K Nearest Neighbor".equals((String)classifierData.getSelectedItem())) {
                status.setText(status.getText() + "Percent Correctly Classified: " + String.valueOf(classifier.trainingPercentCorrect()) + "%\n");
                status.setText(status.getText() + "Percent Incorrectly Classified: " + String.valueOf(classifier.trainingPercentIncorrect()) + "%\n");
                status.setText(status.getText() + "Percent Ambiguous: " + String.valueOf(classifier.trainingPercentAmbiguous()) + "%\n");
            }
            status.setText(status.getText() + "\n");
        } else if (command.equals("testing")) {
            status.setText(status.getText() + "Classifying...\n");
            int ret = classifier.classify();
            if (ret == -1) {
                status.setText(status.getText() + "Can't classify yet, you need to train first\n");
                status.setText(status.getText() + "\n");
            } else {
                status.setText(status.getText() + "Done!\n");
                status.setText(status.getText() + "Time taken: " + String.valueOf(classifier.classicationTimeDiff()) + "ms\n");
                status.setText(status.getText() + "Percent Correctly Classified: " + String.valueOf(classifier.classificationPercentCorrect()) + "%\n");
                status.setText(status.getText() + "Percent Incorrectly Classified: " + String.valueOf(classifier.classificationPercentIncorrect()) + "%\n");
                status.setText(status.getText() + "Percent Ambiguous: " + String.valueOf(classifier.classificationPercentAmbiguous()) + "%\n");
                status.setText(status.getText() + "\n");
            }


        }
    }

@Override public void intervalAdded(ListDataEvent e) {}

@Override public void intervalRemoved(ListDataEvent e) {}

    @Override
    public void contentsChanged(ListDataEvent e) {
        setClassifier();
    }

    private void setClassifier() {
        dataset = datasetHash.get(datasetData.getSelectedItem());
        plot.changeDataset(dataset.prepareToPlot());
        String algorithm = (String)classifierData.getSelectedItem();
        if ("K Nearest Neighbor".equals(algorithm)) {
            classifier = new KNN(dataset, (Integer)parameterKValue.getSelectedItem());
            parameterKBox.setVisible(true);
            parameterKLabel.setVisible(true);
        } else if ("Linear Discriminant Function".equals(algorithm)) {
            classifier = new LDF(dataset);
            parameterKBox.setVisible(false);
            parameterKLabel.setVisible(false);
        } else if ("Decision Tree".equals(algorithm)) {
            classifier = new DT(dataset);
            parameterKBox.setVisible(false);
            parameterKLabel.setVisible(false);
        }
    }
}

//Classifier.java
package PatternRecognition;

import static java.lang.System.exit;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import javax.swing.tree.DefaultMutableTreeNode;

public abstract class Classifier {
    //Instead of returned, these values are kept
    //so they can be used after the classification is done
    //this prevents having to store a large ambiguous array
    //and allows the classifier to return a success / error
    protected Dataset dataset;
    protected Date trainingTime1;
    protected Date trainingTime2;
    protected Date classificationTime1;
    protected Date classificationTime2;
    protected int trainingSamples;
    protected int classificationSamples;
    protected float trainingNumCorrect;
    protected float trainingNumIncorrect;
    protected float trainingNumAmbiguous;
    protected float classificationNumCorrect;
    protected float classificationNumIncorrect;
    protected float classificationNumAmbiguous;
    protected boolean canClassify;

    protected Classifier(Dataset d) {
        dataset = d;
        trainingTime1 = new Date();


        trainingTime2 = new Date();
        classificationTime1 = new Date();
        classificationTime2 = new Date();
        trainingSamples = 0;
        classificationSamples = 0;
        trainingNumCorrect = 0;
        trainingNumIncorrect = 0;
        trainingNumAmbiguous = 0;
        classificationNumCorrect = 0;
        classificationNumIncorrect = 0;
        classificationNumAmbiguous = 0;
        canClassify = false;
    }

    abstract int train();

    abstract int classify();

    public BigInteger trainingTimeDiff() {
        BigInteger t1 = new BigInteger(String.valueOf(trainingTime1.getTime()));
        BigInteger t2 = new BigInteger(String.valueOf(trainingTime2.getTime()));
        return t2.subtract(t1);
    }

    public BigInteger classicationTimeDiff() {
        BigInteger t1 = new BigInteger(String.valueOf(classificationTime1.getTime()));
        BigInteger t2 = new BigInteger(String.valueOf(classificationTime2.getTime()));
        return t2.subtract(t1);
    }

    public float trainingPercentCorrect() {
        if (!canClassify) { return -1; }
        return ((float)trainingNumCorrect / (float)trainingSamples) * 100.0f;
    }

    public float trainingPercentIncorrect() {
        if (!canClassify) { return -1; }
        return ((float)trainingNumIncorrect / (float)trainingSamples) * 100.0f;
    }

    public float trainingPercentAmbiguous() {
        if (!canClassify) { return -1; }
        return ((float)trainingNumAmbiguous / (float)trainingSamples) * 100.0f;
    }

    public float classificationPercentCorrect() {
        if (!canClassify) { return -1; }
        return ((float)classificationNumCorrect / (float)classificationSamples) * 100.0f;
    }

    public float classificationPercentIncorrect() {
        if (!canClassify) { return -1; }
        return ((float)classificationNumIncorrect / (float)classificationSamples) * 100.0f;
    }

    public float classificationPercentAmbiguous() {
        if (!canClassify) { return -1; }
        return ((float)classificationNumAmbiguous / (float)classificationSamples) * 100.0f;
    }

    public <T extends Comparable> T maxValue(ArrayList<T> a) {
        return a.get(maxIndex(a));


    }

    public <T extends Comparable> int maxIndex(ArrayList<T> a) {
        int ret = 0;
        for (int i = 1; i < a.size(); i += 1) {
            if (a.get(ret).compareTo(a.get(i)) < 0) { ret = i; }
        }
        return ret;
    }

    public <T extends Comparable> T minValue(ArrayList<T> a) {
        return a.get(minIndex(a));
    }

    public <T extends Comparable> int minIndex(ArrayList<T> a) {
        int ret = 0;
        for (int i = 1; i < a.size(); i += 1) {
            if (a.get(ret).compareTo(a.get(i)) > 0) { ret = i; }
        }
        return ret;
    }
}

class KNN extends Classifier {
    private int kn;

    public KNN(Dataset d, int k) {
        super(d);
        canClassify = true;
        kn = k;
    }

    private double distance(Sample a, Sample b) {
        double diff, sum = 0.0;
        for (int i = 0; i < a.length(); i += 1) {
            diff = a.get(i) - b.get(i);
            sum += Math.pow(diff, 2.0);
        }
        return Math.sqrt(sum);
    }

    @Override
    int train() {
        trainingNumCorrect = 0;
        trainingNumIncorrect = 0;
        trainingNumAmbiguous = 0;
        trainingSamples = dataset.trainingSamples();
        classificationSamples = dataset.testingSamples();
        trainingTime1 = new Date();
        for (int i = 0; i < dataset.trainingSamples(); i += 1) {
            Sample testsample = dataset.trainingSample(i);
            Dataset datacopy = new Dataset(dataset);
            datacopy.trainingRemoveSample(i);
            ArrayList<Double> distances = new ArrayList<Double>();
            ArrayList<Integer> classes = new ArrayList<Integer>();
            ArrayList<Integer> classCtner = new ArrayList<Integer>();
            for (int j = 0; j <= datacopy.maxClass(); j += 1) { classCtner.add(0); }
            for (int j = 0; j < datacopy.trainingSamples(); j += 1) {
                distances.add(distance(testsample, datacopy.trainingSample(j)));
                classes.add(datacopy.trainingSample(j).classNum());
            }
            for (int j = 0; j < kn; j += 1) {
                int index = minIndex(distances);
                int c = classes.get(index);
                //fixed: remove the neighbor once counted, otherwise the same
                //nearest sample is counted kn times
                distances.remove(index);
                classes.remove(index);
                classCtner.set(c, classCtner.get(c) + 1);
            }
            if (testsample.classNum() == maxIndex(classCtner)) {
                trainingNumCorrect += 1;
            } else {
                trainingNumIncorrect += 1;
            }
        }
        trainingTime2 = new Date();
        canClassify = true;
        return 0;
    }

    @Override
    int classify() {
        if (!canClassify) { return -1; }
        classificationNumCorrect = 0;
        classificationNumIncorrect = 0;
        classificationNumAmbiguous = 0;
        classificationTime1 = new Date();
        for (int i = 0; i < dataset.testingSamples(); i += 1) {
            ArrayList<Double> distances = new ArrayList<Double>();
            ArrayList<Integer> classes = new ArrayList<Integer>();
            ArrayList<Integer> classCtner = new ArrayList<Integer>();
            for (int j = 0; j <= dataset.maxClass(); j += 1) { classCtner.add(0); }
            for (int j = 0; j < dataset.trainingSamples(); j += 1) {
                distances.add(distance(dataset.testingSample(i), dataset.trainingSample(j)));
                classes.add(dataset.trainingSample(j).classNum());
            }
            for (int j = 0; j < kn; j += 1) {
                int index = minIndex(distances);
                int c = classes.get(index);
                //fixed: remove the neighbor once counted so each of the kn
                //nearest samples votes exactly once
                distances.remove(index);
                classes.remove(index);
                classCtner.set(c, classCtner.get(c) + 1);
            }
            if (dataset.testingSample(i).classNum() == maxIndex(classCtner)) {
                classificationNumCorrect += 1;
            } else {
                classificationNumIncorrect += 1;
            }
        }
        classificationTime2 = new Date();
        return 0;
    }
}

class LDF extends Classifier {
    ArrayList<ArrayList> weightVectors;

    public LDF(Dataset d) {
        super(d);
    }

    Sample fixedIncrementSingleSamplePerceptron(Dataset data) {
        int d = data.testingDimensions();
        int n = data.testingSamples();
        int numClassified = 0;
        int k = 0;
        Sample a = new Sample(-1, d, 0.0);
        BigInteger bigN = new BigInteger(String.valueOf(n));
        BigInteger glbCnt = new BigInteger("0");
        BigInteger glbMod = new BigInteger("1000");
        BigInteger glbMax = new BigInteger("100000");
        glbMax = glbMax.multiply(bigN);


        while (numClassified < n && glbCnt.compareTo(glbMax) == -1) {
            Sample yk = data.trainingSample(k);
            double gx = a.multiply(yk);
            if (gx > 0.0) {
                numClassified += 1;
            } else {
                a = a.add(yk);
                numClassified = 0;
            }
            k = (k + 1) % n;
            glbCnt = glbCnt.add(BigInteger.ONE);
            //fixed: BigIntegers must be compared with equals(), not ==
            if (glbCnt.mod(glbMod).equals(BigInteger.ZERO)) {
                System.out.println(String.valueOf(glbCnt) + " " + String.valueOf(n - numClassified));
            }
        }
        return a;
    }

    ArrayList<ArrayList> oneVsOneSplit() {
        ArrayList<ArrayList> ret = new ArrayList<ArrayList>();
        int max = dataset.maxClass();
        for (int i = 1; i <= max; i += 1) {
            for (int j = 0; j < i; j += 1) {
                ArrayList slot = new ArrayList();
                Dataset d = new Dataset(i, j, dataset);
                Sample a = fixedIncrementSingleSamplePerceptron(d);
                slot.add(i);
                slot.add(j);
                slot.add(a);
                ret.add(slot);
            }
        }
        return ret;
    }

    @Override
    int train() {
        trainingNumCorrect = 0;
        trainingNumIncorrect = 0;
        trainingNumAmbiguous = 0;
        trainingSamples = dataset.trainingSamples();
        classificationSamples = dataset.testingSamples();
        trainingTime1 = new Date();
        weightVectors = oneVsOneSplit();
        trainingTime2 = new Date();
        canClassify = true;
        return 0;
    }

    @Override
    int classify() {
        if (!canClassify) { return -1; }
        classificationNumCorrect = 0;
        classificationNumIncorrect = 0;
        classificationNumAmbiguous = 0;
        classificationTime1 = new Date();
        int max = dataset.maxClass();
        for (int i = 0; i < dataset.testingSamples(); i += 1) {
            ArrayList<Integer> ranks = new ArrayList();
            for (int j = 0; j <= max; j += 1) { ranks.add(0); }
            Sample s = new Sample(dataset.testingSample(i));
            s.prepend();
            for (int j = 0; j < weightVectors.size(); j += 1) {
                ArrayList<Object> av = weightVectors.get(j);
                int c1 = (Integer)av.get(0);
                int c2 = (Integer)av.get(1);
                Sample a = (Sample)av.get(2);


                double gx = a.multiply(s);
                if (gx >= 0.0) {
                    ranks.set(c1, ranks.get(c1) + 1);
                } else {
                    ranks.set(c2, ranks.get(c2) + 1);
                }
            }
            int bestIndex = 0;
            boolean ambiguous = false;
            for (int j = 1; j < ranks.size(); j += 1) {
                if (ranks.get(bestIndex) < ranks.get(j)) {
                    ambiguous = false;
                    bestIndex = j;
                } else if (ranks.get(bestIndex) == ranks.get(j) && bestIndex != j) {
                    ambiguous = true;
                }
            }
            if (ambiguous) {
                classificationNumAmbiguous += 1;
            } else if (bestIndex == dataset.testingSample(i).classNum()) {
                classificationNumCorrect += 1;
            } else {
                classificationNumIncorrect += 1;
            }
        }
        classificationTime2 = new Date();
        return 0;
    }
}

class DT extends Classifier {
    private class Tree {
        private class FeatureArrays {
            private class FeatureClassPair {
                private double featureVal;
                private int classNum;
                private int sampleNum;

                public FeatureClassPair(double f, int c, int s) {
                    featureVal = f;
                    classNum = c;
                    sampleNum = s;
                }

                public double getFeatureVal() { return featureVal; }
                public int getClassNum() { return classNum; }
                public int getSampleNum() { return sampleNum; }

                public Comparator<FeatureClassPair> comparator() {
                    return new Comparator<FeatureClassPair>() {
                        @Override
                        public int compare(FeatureClassPair s1, FeatureClassPair s2) {
                            if (s1.featureVal == s2.featureVal) {
                                return 0;
                            } else if (s1.featureVal < s2.featureVal) {


                                return -1;
                            } else {
                                return 1;
                            }
                        }
                    };
                }
            }

            private class FeatureClassArray {
                private ArrayList<FeatureClassPair> array;

                public FeatureClassArray() {
                    array = new ArrayList();
                }

                public int size() { return array.size(); }

                public FeatureClassPair get(int index) { return array.get(index); }

                public void add(double f, int c, int s) {
                    array.add(new FeatureClassPair(f, c, s));
                }

                public void sort() {
                    Collections.sort(array, array.get(0).comparator());
                }

                public ArrayList bestSplit() {
                    if (array.isEmpty()) { return null; }
                    ArrayList ret = new ArrayList();
                    ArrayList al = makeSplits();
                    ArrayList<Double> entropies = new ArrayList();
                    if (al.isEmpty()) { return null; }
                    for (int i = 0; i < al.size(); i += 1) {
                        ArrayList splitval = (ArrayList)al.get(i);
                        //fixed: skip an empty split entirely, otherwise two entropy
                        //entries are added for one split and the indices misalign
                        if (splitval.isEmpty()) { entropies.add(-10.0); continue; }
                        FeatureClassArray left = (FeatureClassArray)splitval.get(0);
                        FeatureClassArray right = (FeatureClassArray)splitval.get(1);
                        entropies.add(entropyDelta(this, left, right));
                    }
                    int index = maxIndex(entropies);
                    ArrayList splitVal = (ArrayList)al.get(index);
                    FeatureClassArray left = (FeatureClassArray)splitVal.get(0);
                    FeatureClassArray right = (FeatureClassArray)splitVal.get(1);
                    ret.add(entropies.get(index));
                    ret.add(left);
                    ret.add(right);
                    return ret;
                }

                private ArrayList makeSplits() {
                    ArrayList ret = new ArrayList();
                    for (int i = 1; i < array.size() - 1; i += 1) {
                        ret.add(split(i));
                    }
                    return ret;
                }

                public ArrayList split(int index) {
                    ArrayList ret = new ArrayList();
                    FeatureClassArray left = new FeatureClassArray();


                    FeatureClassArray right = new FeatureClassArray();
                    for (int i = 0; i < array.size(); i += 1) {
                        if (i < index) {
                            left.add(array.get(i).featureVal, array.get(i).classNum, array.get(i).sampleNum);
                        } else {
                            right.add(array.get(i).featureVal, array.get(i).classNum, array.get(i).sampleNum);
                        }
                    }
                    ret.add(left);
                    ret.add(right);
                    return ret;
                }

                private int maxClass() {
                    int max = 0;
                    for (int i = 0; i < array.size(); i += 1) {
                        if (max < array.get(i).classNum) { max = array.get(i).classNum; }
                    }
                    return max;
                }

                private int samplesOfClass(int classNum) {
                    int sum = 0;
                    for (int i = 0; i < array.size(); i += 1) {
                        if (classNum == array.get(i).classNum) { sum += 1; }
                    }
                    return sum;
                }

                private double entropyImpurity() {
                    double sum = 0.0;
                    int size = array.size();
                    double log, prob;
                    for (int i = 0; i <= maxClass(); i += 1) {
                        prob = (double)samplesOfClass(i) / (double)size;
                        if (prob == 0.0) {
                            log = 0.0;
                        } else {
                            log = prob * (Math.log(prob) / Math.log(2.0));
                        }
                        sum += log;
                    }
                    sum *= -1;
                    return sum;
                }

                private double entropyDelta(FeatureClassArray start, FeatureClassArray leftend, FeatureClassArray rightend) {
                    int total = leftend.size() + rightend.size();
                    double leftprior = (double)leftend.size() / (double)total;
                    double leftEntropy = leftend.entropyImpurity();
                    double rightprior = (double)rightend.size() / (double)total;
                    double rightEntropy = rightend.entropyImpurity();
                    return start.entropyImpurity() - (leftprior * leftEntropy) - (rightprior * rightEntropy);
                }

                public double max() {
                    int bestIndex = 0;
                    for (int i = 1; i < size(); i += 1) {
                        if (array.get(bestIndex).featureVal < array.get(i).featureVal) {
                            bestIndex = i; //fixed: was bestIndex = 1
                        }
                    }
                    return array.get(bestIndex).featureVal;
                }

                public double min() {
                    int bestIndex = 0;


                    for (int i = 1; i < size(); i += 1) {
                        if (array.get(bestIndex).featureVal > array.get(i).featureVal) {
                            bestIndex = i; //fixed: was bestIndex = 1
                        }
                    }
                    return array.get(bestIndex).featureVal;
                }
            }

            private ArrayList<FeatureClassArray> array;

            public FeatureArrays(Dataset d) {
                array = new ArrayList();
                convertDataset(d);
            }

            public FeatureArrays(ArrayList a) {
                array = a;
            }

            private void convertDataset(Dataset d) {
                for (int i = 0; i < d.trainingSamples(); i += 1) {
                    for (int j = 0; j < d.trainingSample(i).size(); j += 1) {
                        if (i == 0) { array.add(new FeatureClassArray()); }
                        double s = d.trainingSample(i).get(j);
                        int c = d.trainingSample(i).classNum();
                        array.get(j).add(s, c, i);
                    }
                }
                for (int i = 0; i < array.size(); i += 1) {
                    array.get(i).sort();
                }
            }

            public ArrayList bestChoice() {
                ArrayList ret = new ArrayList();
                int bestIndex = -1;
                int done = isDone();
                if (done != -1) {
                    ret.add(array.get(done).get(0).classNum);
                    return ret;
                }
                for (int i = 0; i < array.size(); i += 1) {
                    ret.add(array.get(i).bestSplit());
                }
                for (int i = 0; i < ret.size(); i += 1) {
                    if (ret.get(i) == null || ((ArrayList)ret.get(i)).isEmpty()) { continue; }
                    if (bestIndex == -1) { bestIndex = i; continue; }
                    ArrayList bestList = (ArrayList)ret.get(bestIndex);
                    double bestEntropy = (Double)bestList.get(0);
                    ArrayList list = (ArrayList)ret.get(i);
                    double entropy = (Double)list.get(0);
                    if (bestEntropy < entropy) { bestIndex = i; }
                }
                if (bestIndex == -1) {
                    ret = new ArrayList();
                    ret.add(0);
                    return ret;
                }
                ArrayList choice = (ArrayList)ret.get(bestIndex);
                FeatureClassArray left = (FeatureClassArray)choice.get(1);


                FeatureClassArray right = (FeatureClassArray)choice.get(2);
                int splitFeature = bestIndex;
                double splitValue = generateSplitValue(left, right);

                ArrayList<FeatureClassArray> leftFeatures = new ArrayList();
                ArrayList<FeatureClassArray> rightFeatures = new ArrayList();
                for (int j = 0; j < array.size(); j += 1) {
                    FeatureClassArray temp = array.get(j);
                    if (j == bestIndex) {
                        leftFeatures.add(left);
                        rightFeatures.add(right);
                        continue;
                    }
                    FeatureClassArray tempLeft = new FeatureClassArray();
                    FeatureClassArray tempRight = new FeatureClassArray();
                    for (int k = 0; k < temp.size(); k += 1) {
                        for (int i = 0; i < left.size(); i += 1) {
                            if (left.get(i).sampleNum == temp.get(k).sampleNum) {
                                //fixed: classNum and sampleNum were previously passed
                                //in the wrong order to add(double f, int c, int s)
                                tempLeft.add(temp.get(k).featureVal, temp.get(k).classNum, temp.get(k).sampleNum);
                            }
                        }
                        for (int i = 0; i < right.size(); i += 1) {
                            if (right.get(i).sampleNum == temp.get(k).sampleNum) {
                                tempRight.add(temp.get(k).featureVal, temp.get(k).classNum, temp.get(k).sampleNum);
                            }
                        }
                    }
                    leftFeatures.add(tempLeft);
                    rightFeatures.add(tempRight);
                }
                ArrayList ret2 = new ArrayList();
                ret2.add(splitFeature);
                ret2.add(splitValue);
                ret2.add(new FeatureArrays(leftFeatures));
                ret2.add(new FeatureArrays(rightFeatures));
                return ret2;
            }

            private int isDone() {
                for (int i = 0; i < array.size(); i += 1) {
                    if (array.get(i).entropyImpurity() == 0.0) { return i; }
                }
                return -1;
            }

            private double generateSplitValue(FeatureClassArray left, FeatureClassArray right) {
                //fixed: the split threshold is the midpoint between the two boundary
                //values, i.e. their average, not half their difference
                return (left.max() + right.min()) / 2.0;
            }
        }

        private Tree left;
        private Tree right;
        private int splitFeature;
        private double splitValue;
        private int classToChoose;

        public Tree(Dataset d) {
            FeatureArrays fa = new FeatureArrays(d);
            setupTree(fa);
        }

        private Tree(FeatureArrays fa) {
            setupTree(fa);
        }

        public void setupTree(FeatureArrays fa) {
            left = null;


    right = null;
    splitFeature = -1;
    splitValue = -1;
    classToChoose = -1;
    ArrayList best = fa.bestChoice();
    if (best.size() == 1) {
        classToChoose = (Integer)best.get(0);
    } else {
        splitFeature = (Integer)best.get(0);
        splitValue = (Double)best.get(1);
        left = new Tree((FeatureArrays)best.get(2));
        right = new Tree((FeatureArrays)best.get(3));
    }
}
}

private Tree root;

public DT(Dataset d) {
    super(d);
}

@Override
int train() {
    trainingNumCorrect = 0;
    trainingNumIncorrect = 0;
    trainingNumAmbiguous = 0;
    trainingSamples = dataset.trainingSamples();
    classificationSamples = dataset.testingSamples();
    trainingTime1 = new Date();
    root = new Tree(new Dataset(dataset));
    trainingTime2 = new Date();
    canClassify = true;
    return 0;
}

private int classifyPoint(Tree t, Sample s) {
    if (t.splitFeature == -1) {
        return t.classToChoose; // leaf node
    }
    if (s.get(t.splitFeature) < t.splitValue) {
        return classifyPoint(t.left, s);
    }
    return classifyPoint(t.right, s);
}

@Override
int classify() {
    if (!canClassify) {
        return -1;
    }
    classificationNumCorrect = 0;
    classificationNumIncorrect = 0;
    classificationNumAmbiguous = 0;
    classificationTime1 = new Date();
    int chosenClass;
    for (int i = 0; i < dataset.testingSamples(); i += 1) {
        chosenClass = classifyPoint(root, dataset.testingSample(i));
        if (chosenClass == dataset.testingSample(i).classNum()) {
            classificationNumCorrect += 1;
        } else {
            classificationNumIncorrect += 1;
        }
    }
    classificationTime2 = new Date();
    return 0;


    }
}
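The tree branches on entropy impurity, which is computed inside FeatureClassArray (earlier in the listing). As a reference point, here is a minimal standalone sketch of that impurity measure; the class `EntropySketch` and its `entropy` helper are illustrative names and are not part of the project code:

```java
// EntropySketch.java - illustrative only; sketches the entropy impurity
// i(N) = -sum_j P(w_j) * log2 P(w_j) over a node's class-count histogram.
public class EntropySketch {
    public static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double e = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log 0 is taken as 0
            double p = (double) c / total;
            e -= p * (Math.log(p) / Math.log(2));
        }
        return e;
    }

    public static void main(String[] args) {
        System.out.println(entropy(new int[]{5, 5}));  // maximally impure two-class node
        System.out.println(entropy(new int[]{10, 0})); // pure node, impurity zero
    }
}
```

A node whose samples all share one class has impurity 0, which is exactly the stopping test isDone() applies before asking for another split.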

// Plot.java
package PatternRecognition;

import java.awt.Color;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Line2D;
import java.util.ArrayList;
import javax.swing.JPanel;

public class Plot extends JPanel {
    final Color lightblue = new Color(0, 0, 200, 100);
    final Color blue = new Color(0, 0, 200, 200);
    final Color lightred = new Color(200, 0, 0, 100);
    final Color red = new Color(200, 0, 0, 200);
    final Color lightgreen = new Color(0, 200, 0, 100);
    final Color green = new Color(0, 200, 0, 200);
    final Color lightyellow = new Color(200, 200, 0, 100);
    final Color yellow = new Color(200, 200, 0, 200);
    final Color lightpurple = new Color(200, 0, 200, 100);
    final Color purple = new Color(200, 0, 200, 200);
    // Training classes 0..4 index the translucent colors at the front;
    // testing samples are stored with negative codes, which wrap around
    // to the opaque colors at the end of the array.
    Color[] colors = { lightblue, lightred, lightgreen, lightyellow, lightpurple,
                       purple, yellow, green, red, blue };
    ArrayList<Double> x;
    ArrayList<Double> y;
    ArrayList<Integer> c;

    public Plot(ArrayList list) {
        c = (ArrayList)list.get(0);
        x = (ArrayList)list.get(1);
        y = (ArrayList)list.get(2);
        setSize(100, 100);
        repaint();
    }

    public void changeDataset(ArrayList list) {
        c = (ArrayList)list.get(0);
        x = (ArrayList)list.get(1);
        y = (ArrayList)list.get(2);
        repaint();
    }

    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        Graphics2D g2 = (Graphics2D)g;
        g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON);
        int w = getWidth();
        int h = getHeight();
        int midx = w / 2;
        int midy = h / 2;
        g2.draw(new Line2D.Double(midx, 0, midx, h));
        g2.draw(new Line2D.Double(0, midy, w, midy));
        for (int i = 0; i < c.size(); i++) {
            int ci = c.get(i);
            if (ci >= 0) {
                g2.setPaint(colors[ci]);
            } else {
                ci = colors.length + ci;
                g2.setPaint(colors[ci]);
            }
            g2.fill(new Ellipse2D.Double(x.get(i) - 2 + midx, midy - y.get(i) - 2, 7, 7));
        }
    }


    private double getMax() {
        double max = -Integer.MAX_VALUE;
        for (int i = 0; i < x.size(); i++) {
            if (x.get(i) > max) {
                max = x.get(i);
            }
            if (y.get(i) > max) {
                max = y.get(i);
            }
        }
        return max;
    }

    private double getMin() {
        double min = Integer.MAX_VALUE;
        for (int i = 0; i < x.size(); i++) {
            // fixed: both comparisons used ">", so the minimum was never updated
            if (x.get(i) < min) {
                min = x.get(i);
            }
            if (y.get(i) < min) {
                min = y.get(i);
            }
        }
        return min;
    }
}
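paintComponent draws each point relative to the panel centre, with y growing upward rather than in Java2D's default downward direction. A standalone sketch of that data-to-screen mapping (the class `PlotMapping` and its helpers are illustrative names, not part of the project):

```java
// PlotMapping.java - illustrative sketch of Plot's coordinate transform:
// the origin sits at the panel centre, x grows rightward, y grows upward.
public class PlotMapping {
    public static double toScreenX(double x, int width) {
        return x + width / 2; // shift so data x = 0 lands on the centre line
    }

    public static double toScreenY(double y, int height) {
        return height / 2 - y; // flip so positive data y is drawn upward
    }

    public static void main(String[] args) {
        System.out.println(toScreenX(0, 400));   // data origin -> horizontal centre
        System.out.println(toScreenY(100, 400)); // positive y -> upper half of the panel
    }
}
```

Because the datasets are confined to (-100, 100) in each feature, every sample stays within a 200x200 region around the panel centre.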

// Dataset.java
package PatternRecognition;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;

public class Dataset {
    final String SEPARATOR = ", ";
    public ArrayList<Sample> trainingSet;
    public ArrayList<Sample> testingSet;

    public Dataset() {
        trainingSet = new ArrayList<Sample>();
        testingSet = new ArrayList<Sample>();
    }

    public Dataset(Dataset d) {
        this.trainingSet = new ArrayList<Sample>();
        this.testingSet = new ArrayList<Sample>();
        for (int i = 0; i < d.trainingSet.size(); i += 1) {
            this.trainingSet.add(new Sample(d.trainingSet.get(i)));
        }
        for (int i = 0; i < d.testingSet.size(); i += 1) {
            this.testingSet.add(new Sample(d.testingSet.get(i)));
        }
    }

    // One-against-one view: keeps only class1 and class2, prepends a
    // component to each sample, and negates class2's samples.
    public Dataset(int class1, int class2, Dataset d) {
        this.trainingSet = new ArrayList<Sample>();
        for (int i = 0; i < d.trainingSet.size(); i += 1) {
            if (d.trainingSet.get(i).classNum() == class1) {
                this.trainingSet.add(new Sample(d.trainingSet.get(i)));
                this.trainingSet.get(this.trainingSet.size() - 1).prepend();
            } else if (d.trainingSet.get(i).classNum() == class2) {
                this.trainingSet.add(new Sample(d.trainingSet.get(i)));
                this.trainingSet.get(this.trainingSet.size() - 1).prepend();
                this.trainingSet.get(this.trainingSet.size() - 1).invert();
            }


        }
        this.testingSet = new ArrayList<Sample>();
        for (int i = 0; i < d.testingSet.size(); i += 1) {
            if (d.testingSet.get(i).classNum() == class1) {
                this.testingSet.add(new Sample(d.testingSet.get(i)));
                this.testingSet.get(this.testingSet.size() - 1).prepend();
            } else if (d.testingSet.get(i).classNum() == class2) {
                this.testingSet.add(new Sample(d.testingSet.get(i)));
                this.testingSet.get(this.testingSet.size() - 1).prepend();
                this.testingSet.get(this.testingSet.size() - 1).invert();
            }
        }
    }

    private Dataset(ArrayList<Sample> train, ArrayList<Sample> test) {
        trainingSet = train;
        testingSet = test;
    }

    // Only works for 2D samples; testing samples get negative class codes
    // so the plot can render them in the opaque colors.
    public ArrayList prepareToPlot() {
        ArrayList ret = new ArrayList();
        ArrayList x = new ArrayList();
        ArrayList y = new ArrayList();
        ArrayList c = new ArrayList();
        for (int i = 0; i < trainingSamples(); i += 1) {
            c.add(trainingSample(i).classNum());
            x.add(trainingSample(i).get(0));
            y.add(trainingSample(i).get(1));
        }
        for (int i = 0; i < testingSamples(); i += 1) {
            c.add(0 - testingSample(i).classNum() - 1);
            x.add(testingSample(i).get(0));
            y.add(testingSample(i).get(1));
        }
        ret.add(c);
        ret.add(x);
        ret.add(y);
        return ret;
    }

    public int minClass() {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < trainingSet.size(); ++i) {
            if (min > trainingSet.get(i).classNum()) {
                min = trainingSet.get(i).classNum();
            }
        }
        for (int i = 0; i < testingSet.size(); ++i) {
            if (min > testingSet.get(i).classNum()) {
                min = testingSet.get(i).classNum();
            }
        }
        return min;
    }

    public int maxClass() {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < trainingSet.size(); ++i) {
            if (max < trainingSet.get(i).classNum()) {
                max = trainingSet.get(i).classNum();
            }
        }
        for (int i = 0; i < testingSet.size(); ++i) {
            if (max < testingSet.get(i).classNum()) {
                max = testingSet.get(i).classNum();
            }
        }
        return max;
    }


    public void resort(int d) {
        if (d == 0) {
            Collections.sort(trainingSet, Sample.classComparator());
        } else {
            Collections.sort(trainingSet, Sample.dnComparator(d));
        }
    }

    public Dataset trainingRemoveSamples(ArrayList<Integer> sampleIndexes) {
        ArrayList<Sample> test = (ArrayList<Sample>)testingSet.clone();
        ArrayList<Sample> train = (ArrayList<Sample>)trainingSet.clone();
        // remove from the highest index down so earlier removals do not
        // shift the indexes still waiting to be removed
        Collections.sort(sampleIndexes);
        Collections.reverse(sampleIndexes);
        for (int i = 0; i < sampleIndexes.size(); i += 1) {
            train.remove(sampleIndexes.get(i).intValue());
        }
        return new Dataset(train, test);
    }

    public void trainingRemoveSample(int sampleIndex) { trainingSet.remove(sampleIndex); }

    public int trainingSamples() { return trainingSet.size(); }

    public Sample trainingSample(int index) { return trainingSet.get(index); }

    // assumes each of the samples contains an equal number of items
    public int trainingDimensions() { return trainingSet.get(0).length(); }

    public double trainingPoint(int x, int d) { return trainingSet.get(x).get(d); }

    public int testingSamples() { return testingSet.size(); }

    public Sample testingSample(int index) { return testingSet.get(index); }

    public int testingDimensions() { return testingSet.get(0).length(); }

    public double testingPoint(int x, int d) { return testingSet.get(x).get(d); }
}
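The one-against-one constructor above implements the usual "augment and normalize" preprocessing for a two-class linear discriminant: every sample gains an extra leading component (Sample.prepend()), and the second class's samples are negated (Sample.invert()) so a single weight vector a can classify correctly whenever a·y > 0 for every training vector y. A standalone sketch of that step; `AugmentSketch` and `augment` are illustrative names, and note it uses the conventional bias value 1.0 where the project's prepend() inserts 0.0:

```java
import java.util.Arrays;

// AugmentSketch.java - illustrative sketch of augment-and-normalize for
// a two-class linear discriminant (not the project's exact code).
public class AugmentSketch {
    public static double[] augment(double[] v, boolean negate) {
        double[] y = new double[v.length + 1];
        y[0] = 1.0; // bias component (the project inserts 0.0 here instead)
        for (int i = 0; i < v.length; i++) {
            y[i + 1] = v[i];
        }
        if (negate) { // "normalize" samples of the second class
            for (int i = 0; i < y.length; i++) {
                y[i] = -y[i];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(augment(new double[]{3, 4}, false)));
        System.out.println(Arrays.toString(augment(new double[]{3, 4}, true)));
    }
}
```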

// Sample.java
package PatternRecognition;

import static java.lang.System.exit;
import java.util.ArrayList;
import java.util.Comparator;

public class Sample {
    private int classNumber;
    private ArrayList<Double> dimensions;

    public Sample(Sample s) {


        this.classNumber = s.classNumber;
        this.dimensions = new ArrayList<Double>();
        for (int i = 0; i < s.dimensions.size(); i += 1) {
            this.dimensions.add(s.dimensions.get(i));
        }
    }

    public Sample(int c) {
        dimensions = new ArrayList<Double>();
        classNumber = c;
    }

    public Sample(int c, int d, double v) {
        dimensions = new ArrayList<Double>();
        classNumber = c;
        for (int i = 0; i < d; i += 1) {
            dimensions.add(v);
        }
    }

    // cheat definition for this project, which only uses 2D samples
    public Sample(int c, double s1, double s2) {
        dimensions = new ArrayList<Double>();
        classNumber = c;
        dimensions.add(s1);
        dimensions.add(s2);
    }

    public int classNum() { return classNumber; }

    public void add(double x) { dimensions.add(x); }

    public void prepend() {
        dimensions.add(0.0);
        for (int i = dimensions.size() - 1; i > 0; i -= 1) {
            dimensions.set(i, dimensions.get(i - 1));
        }
        // fixed: without this, index 0 kept its old value (duplicating the
        // first component) and the new component was never actually prepended
        dimensions.set(0, 0.0);
    }

    public void invert() {
        for (int i = 0; i < dimensions.size(); i += 1) {
            dimensions.set(i, dimensions.get(i) * -1);
        }
    }

    public int size() { return dimensions.size(); }

    public int length() { return size(); }

    public double get(int index) {
        if (index < 0 || index >= length()) {
            return 0;
        }
        return dimensions.get(index);
    }

    public void set(int index, double value) {
        if (index >= 0 && index < length()) {
            dimensions.set(index, value);
        }
    }

    public double max() {
        // fixed: Double.MIN_VALUE is the smallest POSITIVE double, which
        // broke max() for samples whose components are all negative
        double max = -Double.MAX_VALUE;
        for (int i = 0; i < length(); ++i) {
            if (max < dimensions.get(i)) {


                max = dimensions.get(i);
            }
        }
        return max;
    }

    public double min() {
        double min = Double.MAX_VALUE;
        for (int i = 0; i < length(); ++i) {
            // fixed: was "min < dimensions.get(i)", which tracked the maximum
            if (min > dimensions.get(i)) {
                min = dimensions.get(i);
            }
        }
        return min;
    }

    public double mean() {
        double ret = 0.0;
        for (int i = 0; i < length(); ++i) {
            ret += dimensions.get(i);
        }
        ret /= length();
        return ret;
    }

    public double range() { return (max() - min()); }

    public Sample add(Sample s) {
        Sample ret = new Sample(this.classNumber, this.size(), 0.0);
        for (int i = 0; i < size(); i += 1) {
            ret.set(i, this.get(i) + s.get(i));
        }
        return ret;
    }

    // inner (dot) product of two samples
    public double multiply(Sample s) {
        if (size() != s.size()) {
            exit(-1);
        }
        double ret = 0.0;
        for (int i = 0; i < s.size(); i += 1) {
            ret += this.get(i) * s.get(i);
        }
        return ret;
    }

    static Comparator<Sample> classComparator() {
        return new Comparator<Sample>() {
            @Override
            public int compare(Sample s1, Sample s2) {
                if (s1.classNum() == s2.classNum()) {
                    return 0;
                } else if (s1.classNum() < s2.classNum()) {
                    return -1;
                } else {
                    return 1;
                }
            }
        };
    }

    static Comparator<Sample> dnComparator(final int d) {
        return new Comparator<Sample>() {
            @Override
            public int compare(Sample s1, Sample s2) {
                if (s1.get(d) == s2.get(d)) {
                    return 0;
                } else if (s1.get(d) < s2.get(d)) {


                    return -1;
                } else {
                    return 1;
                }
            }
        };
    }
}
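Sample.multiply is the inner product the linear discriminant relies on: after the one-against-one normalization, a candidate weight vector classifies a training vector correctly exactly when their product is positive. A standalone sketch (the class `DotSketch` is an illustrative name, not part of the project):

```java
// DotSketch.java - illustrative sketch of Sample.multiply as a plain
// dot product over double arrays.
public class DotSketch {
    public static double dot(double[] a, double[] b) {
        // Sample.multiply calls exit(-1) on a length mismatch; an
        // exception is the more conventional choice.
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[]{1, 2}, new double[]{3, 4})); // 1*3 + 2*4 = 11.0
    }
}
```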

// DatasetFactory.java
package PatternRecognition;

import java.util.Random;

public class DatasetFactory {
    // All datasets have about 100 elements split evenly among classes in the training set
    // and about 20 elements split evenly among classes in the testing set.
    static Dataset TwoClass() {
        Dataset d = new Dataset();
        for (int x = 1; x <= 7; x += 1) {
            for (int y = 1; y <= 7; y += 1) {
                d.trainingSet.add(new Sample(0, x * 10.0, y * 10.0));
                d.trainingSet.add(new Sample(1, x * -10.0, y * -10.0));
            }
        }
        for (int x = 3; x <= 8; x += 1) {
            for (int y = 4; y <= 6; y += 2) {
                d.testingSet.add(new Sample(0, x * 7.0, y * 7.0));
                d.testingSet.add(new Sample(1, x * -7.0, y * -7.0));
            }
        }
        return d;
    }

    static Dataset FourClass() {
        Dataset d = new Dataset();
        for (int x = 1; x <= 5; x += 1) {
            for (int y = 1; y <= 5; y += 1) {
                d.trainingSet.add(new Sample(0, x * 15.0, y * 15.0));
                d.trainingSet.add(new Sample(1, x * -15.0, y * 15.0));
                d.trainingSet.add(new Sample(2, x * 15.0, y * -15.0));
                d.trainingSet.add(new Sample(3, x * -15.0, y * -15.0));
            }
        }
        for (int x = 3; x <= 4; x += 1) {
            for (int y = 3; y <= 4; y += 1) {
                d.testingSet.add(new Sample(0, x * 11.0, y * 11.0));
                d.testingSet.add(new Sample(1, x * -11.0, y * 11.0));
                d.testingSet.add(new Sample(2, x * 11.0, y * -11.0));
                d.testingSet.add(new Sample(3, x * -11.0, y * -11.0));
            }
        }
        return d;
    }

    static Dataset FiveClass() {
        Dataset d = new Dataset();
        for (int x = -2; x <= 1; x += 1) {
            for (int y = -2; y <= 2; y += 1) {
                d.trainingSet.add(new Sample(0, x * 9.0, y * 9.0));
            }
        }
        for (int x = 3; x <= 6; x += 1) {
            for (int y = 3; y <= 7; y += 1) {
                d.trainingSet.add(new Sample(1, x * 11.0, y * 13.0));
                d.trainingSet.add(new Sample(2, x * -13.0, y * 11.0));
                d.trainingSet.add(new Sample(3, x * 11.0, y * -11.0));


                d.trainingSet.add(new Sample(4, x * -13.0, y * -13.0));
            }
        }

        for (int x = -2; x <= 1; x += 1) {
            for (int y = 0; y <= 0; y += 1) {
                d.testingSet.add(new Sample(0, x * 7.0, y * 7.0));
            }
        }
        for (int x = 4; x <= 6; x += 1) {
            for (int y = 4; y <= 5; y += 1) {
                d.testingSet.add(new Sample(1, x * 9.0, y * 11.0));
                d.testingSet.add(new Sample(2, x * -11.0, y * 9.0));
                d.testingSet.add(new Sample(3, x * 9.0, y * -9.0));
                d.testingSet.add(new Sample(4, x * -11.0, y * -11.0));
            }
        }
        return d;
    }

    static Dataset TwoClassBlur() {
        Dataset d = new Dataset();
        for (int x = -1; x <= 5; x += 1) {
            for (int y = -3; y <= 3; y += 1) {
                d.trainingSet.add(new Sample(0, x * 15.0, y * 15.0));
                d.trainingSet.add(new Sample(1, x * -13.0, y * 13.0));
            }
        }
        for (int x = 1; x <= 2; x += 1) {
            for (int y = -2; y <= 2; y += 1) {
                d.testingSet.add(new Sample(0, x * 11.0, y * 11.0));
                d.testingSet.add(new Sample(1, x * -7.0, y * 7.0));
            }
        }
        return d;
    }

    static Dataset FourClassBlur() {
        Dataset d = new Dataset();
        for (int x = -1; x <= 3; x += 1) {
            for (int y = -1; y <= 3; y += 1) {
                d.trainingSet.add(new Sample(0, x * 15.0, y * 15.0));
                d.trainingSet.add(new Sample(1, x * -15.0, y * 15.0));
                d.trainingSet.add(new Sample(2, x * 15.0, y * -15.0));
                d.trainingSet.add(new Sample(3, x * -15.0, y * -15.0));
            }
        }
        for (int x = 0; x <= 2; x += 1) {
            for (int y = 1; y <= 2; y += 1) {
                d.testingSet.add(new Sample(0, x * 11.0, y * 11.0));
                d.testingSet.add(new Sample(1, x * -7.0, y * 7.0));
                d.testingSet.add(new Sample(2, x * 13.0, y * -13.0));
                d.testingSet.add(new Sample(3, x * -9.0, y * -9.0));
            }
        }
        return d;
    }

    static Dataset TwoClassFar() {
        Dataset d = new Dataset();
        for (int x = 1; x <= 7; x += 1) {
            for (int y = 1; y <= 7; y += 1) {
                d.trainingSet.add(new Sample(0, x * 5.0, y * 5.0));
                d.trainingSet.add(new Sample(1, x * -5.0, y * -5.0));
            }
        }
        for (int x = 6; x <= 8; x += 1) {
            for (int y = 6; y <= 8; y += 1) {
                d.testingSet.add(new Sample(0, x * 10.0, y * 10.0));
                d.testingSet.add(new Sample(1, x * -10.0, y * -10.0));


            }
        }
        return d;
    }

    static Dataset TwoClassWrong() {
        Dataset d = new Dataset();
        for (int x = 1; x <= 7; x += 1) {
            for (int y = 1; y <= 7; y += 1) {
                d.trainingSet.add(new Sample(0, x * 10.0, y * 10.0));
                d.trainingSet.add(new Sample(1, x * -10.0, y * -10.0));
            }
        }
        for (int x = 1; x <= 5; x += 1) {
            for (int y = 4; y <= 5; y += 1) {
                d.testingSet.add(new Sample(1, x * 7.0, y * 7.0));
                d.testingSet.add(new Sample(0, x * -7.0, y * -7.0));
            }
        }
        return d;
    }

    static Dataset TwoClassRandomBias() {
        Dataset d = new Dataset();
        Random r = new Random();
        int xsign, ysign;
        double x, y;
        for (int c = 0; c <= 1; c += 1) {
            if (c == 0) {
                xsign = 1;
                ysign = 1;
            } else {
                xsign = -1;
                ysign = -1;
            }
            for (int i = 0; i < 50; i += 1) {
                x = r.nextDouble();
                y = r.nextDouble();
                d.trainingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
            for (int i = 0; i < 10; i += 1) {
                x = r.nextDouble();
                y = r.nextDouble();
                d.testingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
        }
        return d;
    }

    static Dataset FourClassRandomBias() {
        Dataset d = new Dataset();
        Random r = new Random();
        int xsign, ysign;
        double x, y;
        for (int c = 0; c < 4; c += 1) {
            if (c == 0 || c == 2) {
                xsign = 1;
            } else {
                xsign = -1;
            }
            if (c == 0 || c == 1) {
                ysign = 1;
            } else {
                ysign = -1;
            }
            for (int i = 0; i < 25; i += 1) {


                x = r.nextDouble();
                y = r.nextDouble();
                d.trainingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
            for (int i = 0; i < 5; i += 1) {
                x = r.nextDouble();
                y = r.nextDouble();
                d.testingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
        }
        return d;
    }

    static Dataset TwoClassTotalRandom() {
        Dataset d = new Dataset();
        Random r = new Random();
        int xsign, ysign;
        double x, y;
        for (int c = 0; c <= 1; c += 1) {
            for (int i = 0; i < 50; i += 1) {
                xsign = r.nextInt(2);
                ysign = r.nextInt(2);
                if (xsign == 0) {
                    xsign = -1;
                }
                if (ysign == 0) {
                    ysign = -1;
                }
                x = r.nextDouble();
                y = r.nextDouble();
                d.trainingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
            for (int i = 0; i < 10; i += 1) {
                xsign = r.nextInt(2);
                ysign = r.nextInt(2);
                if (xsign == 0) {
                    xsign = -1;
                }
                if (ysign == 0) {
                    ysign = -1;
                }
                x = r.nextDouble();
                y = r.nextDouble();
                d.testingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
        }
        return d;
    }

    static Dataset FiveClassTotalRandom() {
        Dataset d = new Dataset();
        Random r = new Random();
        int xsign, ysign;
        double x, y;
        for (int c = 0; c < 5; c += 1) {
            for (int i = 0; i < 20; i += 1) {
                xsign = r.nextInt(2);
                ysign = r.nextInt(2);
                if (xsign == 0) {
                    xsign = -1;
                }
                if (ysign == 0) {
                    ysign = -1;
                }
                x = r.nextDouble();
                y = r.nextDouble();
                d.trainingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }


            for (int i = 0; i < 4; i += 1) {
                xsign = r.nextInt(2);
                ysign = r.nextInt(2);
                if (xsign == 0) {
                    xsign = -1;
                }
                if (ysign == 0) {
                    ysign = -1;
                }
                x = r.nextDouble();
                y = r.nextDouble();
                d.testingSet.add(new Sample(c, x * 100 * xsign, y * 100 * ysign));
            }
        }
        return d;
    }
}
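The *TotalRandom factories build each point by drawing a random sign and a uniform magnitude, which keeps every sample inside the (-100, 100) plotting range. A standalone sketch of that signed draw (`RandomPoint` and `draw` are illustrative names, not part of the project):

```java
import java.util.Random;

// RandomPoint.java - illustrative sketch of the signed uniform draw used
// by the *TotalRandom dataset factories: a sign in {-1, +1} times a
// uniform value in [0, 100).
public class RandomPoint {
    public static double draw(Random r) {
        int sign = r.nextInt(2) == 0 ? -1 : 1; // nextInt(2) yields 0 or 1
        return r.nextDouble() * 100 * sign;    // nextDouble() is in [0, 1)
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed for a repeatable demo
        System.out.println(draw(r)); // always strictly between -100 and 100
    }
}
```

Because nextDouble() never reaches 1.0, every coordinate stays strictly inside the plot's (-100, 100) range.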


Sample Run
