Neural Networks and Machine Learning


Self-study Seminar Report

Aditya Agarwal (2K13/SE/007)

8/12/2014


    Certificate

DEPARTMENT OF SOFTWARE ENGINEERING

This is to certify that this seminar report entitled "Neural Networks and Machine Learning", submitted by Aditya Agarwal (2K13/SE/007) in partial fulfillment of the requirements for the award of the Bachelor of Technology degree in Software Engineering (SE) at Delhi Technological University, is an authentic work carried out by the student under my supervision and guidance.

    To the best of my knowledge, the matter embodied in the report has not been submitted to

    any other university or institute for the award of any degree or diploma.

    Ms. Kusum Lata

    (Assistant Professor)

    Dept. of Computer Engineering

    Delhi Technological University

    Place: DTU, Bawana Road, Delhi-110042

    Date: 08/12/2014


    Acknowledgement

The successful completion of any task would be incomplete without acknowledging the people who made it possible and whose constant guidance and encouragement secured my success.

First of all, I am grateful to the Almighty for enabling me to complete this self-study assignment. I owe a debt to our faculty, Ms. Kusum Lata (Assistant Professor, COE Department), for instilling in me the idea of a creative self-study project, for helping me in undertaking this project, and for being there whenever I needed her assistance.

I also place on record my sense of gratitude to one and all who, directly or indirectly, have lent their helping hand in this venture.

Last, but never the least, I thank my parents for being with me, in every sense.


    Abstract

The goal of the field of Machine Learning is to build computer systems that learn from experience and that are capable of adapting to their environments. Learning techniques and methods developed by researchers in this field have been successfully applied to a variety of learning tasks in a broad range of areas, including, for example, text classification, gene discovery, financial forecasting, credit card fraud detection, collaborative filtering, and the design of adaptive web agents.

Neural Networks are an innovation in the field of machine learning and Artificial Intelligence that was originally motivated by the goal of having machines that can mimic the brain. A neural network is a representation of the brain's learning approach: the brain operates as a massively parallel processor with dense interconnections. A neural network can likewise be described as a "parallel distributed processing" scheme.

Neural networks came to be very widely used throughout the 1980s and 1990s, and for various reasons their popularity diminished in the late 1990s. More recently, neural networks have had a major resurgence, partly because computers have only recently become fast enough to run large-scale neural networks, and for a few other technical reasons which we'll talk about later. Modern neural networks are the state-of-the-art technique for many applications such as speech recognition and text detection. Digit recognition is an application of neural networks which has been dealt with in this project.


Table of Contents

1. Chapter 1: Introduction
   - Machine Learning
   - Supervised Learning
   - Unsupervised Learning
   - Neural Networks
2. Chapter 2: Literature Survey
3. Chapter 3: Discussion
   - Model Representation
   - Architecture
   - Algorithms
   - Hand-written digit recognition
   - Other applications
4. Conclusion
5. References


    Chapter-1

    Introduction

    Machine Learning

    Machine Learning is the field of study that gives computers the ability to learn without

    being explicitly programmed.

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Various examples and applications exist:

- Database mining: large datasets arising from the growth of automation and the web, e.g., web click data, medical records, biology, engineering.
- Applications that can't be programmed by hand, e.g., autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
- Self-customizing programs, e.g., Amazon and Netflix product recommendations.
- Understanding human learning (the brain, real AI).

Two types of learning:

- Supervised learning
- Unsupervised learning


    Supervised learning

The term supervised learning refers to the fact that we give the algorithm a data set in which the "right answers" are given. Such a data set is commonly called a training data set.

Two types of supervised learning problems:

- Regression problem, where the goal is to predict a continuous-valued output. Say you want to predict housing prices by collecting and plotting data of price vs. features of a house. What the learning algorithm might do is fit a straight line through the data and use it to predict the price of a new house (a small sketch of this follows the list).
- Classification problem (logistic regression), where the goal is to predict a discrete-valued output. Say you want to look at medical records and try to predict whether a breast tumor is malignant or benign. The past medical records help to produce a discrete output: malignant or benign.
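As a minimal sketch of the regression case (in Python with NumPy; the house sizes, prices, and variable names below are made up for illustration, not data from this report), a straight line can be fit by least squares and used to predict the price of a new house:

```python
# Fit price = theta0 + theta1 * size by least squares, then predict.
# All numbers here are illustrative placeholders.
import numpy as np

sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])    # house size (1000 sq ft)
prices = np.array([200, 280, 330, 410, 470])   # price (1000s of dollars)

X = np.column_stack([np.ones_like(sizes), sizes])  # add the bias column
theta, *_ = np.linalg.lstsq(X, prices, rcond=None) # least-squares fit

new_size = 2.2
predicted = theta[0] + theta[1] * new_size
print(f"predicted price for a {new_size} (1000 sq ft) house: {predicted:.1f}")
```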


    Unsupervised learning

    In machine learning, the problem of unsupervised learning is that of trying to find hidden

    structure in unlabeled data. Since the examples given to the learner are unlabeled, there is

    no error or reward signal to evaluate a potential solution. This distinguishes unsupervised

    learning from supervised learning and reinforcement learning.

Approaches to unsupervised learning include:

- Clustering (e.g., k-means, mixture models, hierarchical clustering)
- Hidden Markov models
- Blind signal separation using feature extraction techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition)

For example, clustering is used in Google News (if you have not seen it before, you can go to news.google.com to take a look). Every day, Google News looks at tens of thousands or hundreds of thousands of new stories on the web and groups them into cohesive news stories.

Similarly, with DNA microarray data, the idea is to take a group of different individuals and, for each of them, measure how much they do or do not express a certain gene.
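As a minimal sketch of the clustering idea (the two-dimensional points and the choice of k = 2 are illustrative assumptions, not real news or gene data), the following Python code runs plain k-means:

```python
# Plain k-means on two made-up groups of 2-D points.
import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),   # synthetic group A
                  rng.normal(3, 0.5, (50, 2))])  # synthetic group B

k = 2
centroids = data[rng.choice(len(data), k, replace=False)]
for _ in range(10):
    # Assign each point to its nearest centroid.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned points
    # (assumes no cluster becomes empty, which holds for this toy data).
    centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])

print("cluster centroids:\n", centroids)
```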


    Neural Networks

    In computer science, artificial neural networks (ANNs) are forms of computer architecture

    inspired by biological neural networks (the central nervous systems of animals, in

    particular the brain) and are used to estimate or approximate functions that can depend on a

    large number of inputs and are generally unknown. Artificial neural networks are generally

    presented as systems of interconnected "neurons" which can compute values from inputs,

    and are capable of machine learning as well as pattern recognition thanks to their adaptive

    nature.

Examinations of the human central nervous system inspired the concept of neural networks. In an Artificial Neural Network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network.

There is no single formal definition of what an artificial neural network is. However, a class of statistical models may commonly be called "neural" if it possesses the following characteristics:

1. It consists of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and
2. It is capable of approximating non-linear functions of its inputs.

    Non-Linear Hypothesis


For many machine learning problems, the number of features n will be pretty large. Consider, for example, a computer vision problem where each training image has 50 × 50 = 2500 pixels and each pixel intensity is a feature.

If we were to try to learn a nonlinear hypothesis by including all the quadratic features, that is, all the terms of the form x_i times x_j, then with 2500 pixels we would end up with roughly 2500²/2 ≈ 3 million features. That is just too large to be reasonable; the computation would be very expensive to find and to represent all of these three million features per training example.

So simple logistic regression, together with adding in quadratic or cubic features, is just not a good way to learn complex nonlinear hypotheses when n is large, because you simply end up with too many features.

The problem, stated plainly, is that it is difficult to design an algorithm to do what the brain does when the number of features is large. The solution is hence to model the brain itself.


    Chapter-2

Literature Survey

Warren McCulloch and Walter Pitts (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches: one focused on biological processes in the brain, and the other on the application of neural networks to artificial intelligence.

Frank Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, whose mathematical computation could not be processed until the backpropagation algorithm was created by Paul Werbos (1975).

In the 1990s, neural networks were overtaken in popularity in machine learning by support vector machines and other, much simpler methods such as linear classifiers. Renewed interest in neural nets was sparked in the 2000s by the advent of deep learning.

Between 2009 and 2012, the recurrent neural networks and deep feed-forward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA won eight international competitions in pattern recognition and machine learning.

Such neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance on benchmarks such as traffic sign recognition (IJCNN 2012) or the MNIST handwritten digits problem of Yann LeCun and colleagues at NYU.

This work is in direct correspondence with the recent multi-layered neural network architecture and its algorithms and applications in handwritten digit recognition.


    Chapter-3

    Discussion

    Model Representation

Neuro-rewiring experiments, in which a sensory input is re-routed to a different area of the brain which then learns to process it, suggest that if the same piece of physical brain tissue can process sight or sound or touch, then maybe there is one learning algorithm that can process sight or sound or touch.

And instead of needing to implement a thousand different programs or a thousand different algorithms to do the thousand wonderful things that the brain does, maybe what we need to do is figure out some approximation to whatever the brain's learning algorithm is, implement that, and let it learn by itself how to process these different types of data.


This is the logistic model of a neuron, with x1, x2 and x3 being the three features, x0 being the bias unit equal to 1, and hθ(x) being the sigmoid (logistic) activation function

    hθ(x) = 1 / (1 + e^(−θᵀx)),

which uses the feature vector x and the parameter vector θ. Here θ0, θ1, θ2, θ3 are the parameters, or the weights assigned to x0, x1, x2, x3 respectively.
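A minimal Python sketch of this single neuron, assuming made-up weight and feature values:

```python
# One sigmoid neuron: h_theta(x) = 1 / (1 + exp(-theta^T x)), with x0 = 1.
# The weight and feature values are illustrative placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.5, 0.5, 0.5])   # theta0..theta3 (illustrative)
x = np.array([1.0, 2.0, 0.0, 1.0])        # x0 = 1 (bias unit), then x1, x2, x3

h = sigmoid(theta @ x)                    # the neuron's activation
print(f"h_theta(x) = {h:.4f}")
```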

Several such neurons strung together form a neural network.

The first layer, also called the input layer, is where we input our features x1, x2, x3. The final layer, also called the output layer, has the neuron that outputs the final value computed by the hypothesis. Layer two, in between, is called the hidden layer; its neurons represent the features learnt by the neural network from the input features and the learnt parameters.


    Architecture

If a network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ(j), the matrix of weights mapping layer j to layer j+1, will be of dimension s_{j+1} × (s_j + 1). For example, with s_j = 3 and s_{j+1} = 4, Θ(j) is a 4 × 4 matrix; the +1 accounts for the bias unit.

There are various architectures of neural networks possible:

Feed-forward neural networks

- These are the commonest type of neural network in practical applications.
- The first layer is the input and the last layer is the output. If there is more than one hidden layer, we call them "deep" neural networks.
- They compute a series of transformations that change the similarities between cases.
- The activities of the neurons in each layer are a non-linear function of the activities in the layer below.


Recurrent networks

- These have directed cycles in their connection graph: you can sometimes get back to where you started by following the arrows.
- They can have complicated dynamics, and this can make them very difficult to train. There is a lot of interest at present in finding efficient ways of training recurrent nets.
- They are more biologically realistic.

Symmetrically connected networks

- These are like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
- John Hopfield (and others) realized that symmetric networks are much easier to analyze than recurrent networks.
- They are also more restricted in what they can do, because they obey an energy function. For example, they cannot model cycles.
- Symmetrically connected nets without hidden units are called Hopfield nets.


    Algorithms

    Forward propagation algorithm

The process of computing hθ(x) is called forward propagation: we start off with the activations of the input units, forward-propagate them to the hidden layer and compute the activations of the hidden layer, and then compute the activations of the output layer. A vectorized implementation of this procedure is given below, following these steps:

1. Calculation of activations.
2. Vectorization of the input features and the activations.
3. The forward propagation step of calculating the hidden layers from the input layer, and the output layer from the last hidden layer, using the sigmoid activation function.
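Below is a minimal Python sketch of these steps for a small three-layer network; the layer sizes and the randomly generated weight matrices are illustrative assumptions, not trained parameters:

```python
# Vectorized forward propagation for a 3-layer network
# (3 inputs -> 5 hidden units -> 1 output), all weights placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(0, 0.1, (5, 4))   # maps 3 inputs (+ bias) to 5 hidden units
Theta2 = rng.normal(0, 0.1, (1, 6))   # maps 5 hidden units (+ bias) to 1 output

x = np.array([0.2, 0.5, 0.9])         # input features x1..x3

a1 = np.concatenate(([1.0], x))       # add bias unit x0 = 1
z2 = Theta1 @ a1
a2 = np.concatenate(([1.0], sigmoid(z2)))  # hidden activations plus bias
z3 = Theta2 @ a2
h = sigmoid(z3)                       # output activation h_theta(x)
print("h_theta(x) =", h)
```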


    Back propagation algorithm

The main objective is to find parameters θ that minimize the cost function J(θ), using either gradient descent or one of the advanced optimization algorithms. The following are the steps (a sketch of the final update follows the list):

1. First convert the discrepancy between each output and its target value into an error derivative.
2. Then compute the error derivatives in each hidden layer from the error derivatives in the layer above.
3. Then use the error derivatives w.r.t. activities to get the error derivatives w.r.t. the incoming weights.
4. Finally use gradient descent, or any other technique, to minimize the error cost function.
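As a minimal sketch of step 4, assuming backpropagation has already produced the gradient of J(θ) with respect to each weight matrix (the matrices and gradient values below are made-up placeholders), a plain gradient descent update looks like this:

```python
# One gradient descent step over a list of weight matrices.
import numpy as np

def gradient_descent_step(thetas, grads, learning_rate=0.1):
    """Move each weight matrix against its gradient of the cost J."""
    return [theta - learning_rate * grad for theta, grad in zip(thetas, grads)]

# Toy example: two weight matrices and made-up gradients dJ/dTheta.
thetas = [np.ones((5, 4)), np.ones((1, 6))]
grads = [0.01 * np.ones((5, 4)), 0.02 * np.ones((1, 6))]
thetas = gradient_descent_step(thetas, grads)
print(thetas[1])
```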


    Handwritten Digit Recognition

We can use multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses, as it is only a linear classifier. Thus one can implement a neural network to recognize handwritten digits using the MNIST database of handwritten digits. The neural network is able to represent complex models that form non-linear hypotheses. The first goal is to implement the feed-forward propagation algorithm, using already-given weights, for prediction. The next goal is to write the back-propagation algorithm for learning the neural network parameters.

    Model representation

Our neural network is shown in Figure 2. It has 3 layers: an input layer, a hidden layer and an output layer. Our inputs are pixel values of digit images. Since the images are of size 20x20, this gives us 400 input layer units (excluding the extra bias unit, which always outputs +1). There are 5000 training examples in ex3data1.mat. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector, and each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.

The second part of the training set is a 5000-dimensional vector y that contains labels for the training set. We have mapped the digit zero to the value ten, while the digits "1" to "9" are labeled as 1 to 9 in their natural order.


    Feed-forward Propagation and Prediction

Feed-forward propagation for the neural network is implemented in predict.m, which returns the neural network's prediction. The feed-forward computation computes hθ(x(i)) for every example i and returns the associated predictions. The predict function is called using the loaded set of parameters Theta1 and Theta2. The accuracy is about 97.5%.
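The following Python sketch mirrors what predict.m does, assuming the exercise's 25 hidden units (so Theta1 is 25 x 401 and Theta2 is 10 x 26); random placeholder weights and pixel data are used here so the sketch runs stand-alone, so its output is only shape-correct, not accurate:

```python
# Vectorized prediction over all 5000 examples at once: forward-propagate
# and pick the most probable label per row. Labels run 1..10 (10 = digit 0).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta1, Theta2, X):
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])                      # add bias column
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)]) # hidden layer
    h = sigmoid(a2 @ Theta2.T)                                # m x 10 outputs
    return h.argmax(axis=1) + 1                               # labels 1..10

rng = np.random.default_rng(0)
Theta1 = rng.normal(0, 0.12, (25, 401))   # placeholder for trained weights
Theta2 = rng.normal(0, 0.12, (10, 26))
X = rng.random((5000, 400))               # placeholder for pixel data
print(predict(Theta1, Theta2, X)[:10])
```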

    Cost Function

The cost function for the neural network (without regularization) is

    J(θ) = (1/m) Σ_{i=1..m} Σ_{k=1..K} [ −y_k(i) log((hθ(x(i)))_k) − (1 − y_k(i)) log(1 − (hθ(x(i)))_k) ]

where hθ(x(i)) is computed by forward propagation and K = 10 is the total number of possible labels.

The regularized cost function adds a penalty on all the non-bias weights:

    J_reg(θ) = J(θ) + (λ/2m) [ Σ_{j,k} (Θ(1)_{j,k})² + Σ_{j,k} (Θ(2)_{j,k})² ]

With λ = 1 and the given pre-trained parameters, the cost is about 0.383770.
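A minimal Python sketch of this cost computation, assuming one-hot encoded labels Y and the same network shapes as above (all inputs here are placeholders):

```python
# Regularized neural network cost; bias-column weights are excluded
# from the penalty, matching the formula above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)])
    H = sigmoid(a2 @ Theta2.T)                # m x K predicted probabilities
    # Cross-entropy cost summed over all examples and labels.
    J = -(Y * np.log(H) + (1 - Y) * np.log(1 - H)).sum() / m
    # Regularization term, skipping the first (bias) column of each matrix.
    reg = np.square(Theta1[:, 1:]).sum() + np.square(Theta2[:, 1:]).sum()
    return J + lam * reg / (2 * m)

rng = np.random.default_rng(0)
X = rng.random((5, 400))
Y = np.eye(10)[rng.integers(0, 10, 5)]        # one-hot placeholder labels
Theta1 = rng.normal(0, 0.12, (25, 401))
Theta2 = rng.normal(0, 0.12, (10, 26))
print(nn_cost(Theta1, Theta2, X, Y, lam=1.0))
```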

    Back propagation

The backpropagation algorithm is implemented to compute the gradients of the cost with respect to the parameters of the (unregularized) neural network. After the gradient computation for the unregularized case has been verified, the gradient for the regularized neural network is implemented.

When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to select values for each Θ(l) uniformly in the range −0.12 to 0.12. This range keeps the parameters small and makes learning more efficient.
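A minimal sketch of this initialization strategy (the layer sizes follow the digit-recognition network above):

```python
# Uniform random initialization in [-epsilon, epsilon] with epsilon = 0.12,
# which breaks the symmetry between hidden units.
import numpy as np

def rand_initialize_weights(fan_out, fan_in, epsilon=0.12):
    """Return a fan_out x (fan_in + 1) weight matrix in [-epsilon, epsilon]."""
    return np.random.uniform(-epsilon, epsilon, (fan_out, fan_in + 1))

Theta1 = rand_initialize_weights(25, 400)   # hidden layer weights
Theta2 = rand_initialize_weights(10, 25)    # output layer weights
print(Theta1.shape, Theta2.shape)
```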

Given a training example (x(t), y(t)), we first run a forward pass to compute all the activations throughout the network, including the output value of the hypothesis hθ(x). Then, for each node j in layer l, we compute an error term δ_j(l) that measures how much that node was responsible for any errors in our output.

For an output node, we can directly measure the difference between the network's activation and the true target value, and use that to define δ_j(3) (since layer 3 is the output layer). For the hidden units, δ_j(l) is computed based on a weighted average of the error terms of the nodes in layer l + 1.
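Putting the pieces together, the following sketch runs the forward pass, computes δ(3) and δ(2), and accumulates the gradients for the three-layer network; the data and weights are random placeholders, and the formulas assume the sigmoid activation and cross-entropy cost given earlier:

```python
# Vectorized backpropagation for the 400 -> 25 -> 10 network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.12, 0.12, (25, 401))
Theta2 = rng.uniform(-0.12, 0.12, (10, 26))
X = rng.random((5000, 400))                  # placeholder pixel data
Y = np.eye(10)[rng.integers(0, 10, 5000)]    # one-hot placeholder labels
m = X.shape[0]

# Forward pass.
a1 = np.hstack([np.ones((m, 1)), X])
z2 = a1 @ Theta1.T
a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
a3 = sigmoid(a2 @ Theta2.T)

# Backward pass: output error, then hidden error (bias column dropped).
delta3 = a3 - Y
delta2 = (delta3 @ Theta2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))

Theta1_grad = delta2.T @ a1 / m              # dJ/dTheta1 (unregularized)
Theta2_grad = delta3.T @ a2 / m              # dJ/dTheta2 (unregularized)
print(Theta1_grad.shape, Theta2_grad.shape)
```

Note that the simple output error δ(3) = a(3) − y holds because the sigmoid output layer is paired with the cross-entropy cost above; with a different cost, the output error term would change.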


    Other applications

    1. Integration of fuzzy logic into neural networks

Fuzzy logic is a type of logic that recognizes more than simple true and false values, hence better simulating the real world. For example, the statement "today is sunny" might be 100% true if there are no clouds, 80% true if there are a few clouds, 50% true if it's hazy, and 0% true if it rains all day. Fuzzy logic thus takes into account concepts like "usually", "somewhat", and "sometimes".

Fuzzy logic and neural networks have been integrated for uses as diverse as automotive engineering, applicant screening for jobs, the control of a crane, and the monitoring of glaucoma.

    2. Pulsed neural networks

Most practical applications of artificial neural networks are based on a computational model involving the propagation of continuous variables from one processing unit to the next. In recent years, data from neurobiological experiments have made it increasingly clear that biological neural networks, which communicate through pulses, use the timing of the pulses to transmit information and perform computation. This realization has stimulated significant research on pulsed neural networks, including theoretical analyses and model development, neurobiological modeling, and hardware implementation.

3. NNs might, in the future, allow:

- robots that can see, feel, and predict the world around them
- improved stock prediction
- common usage of self-driving cars
- composition of music
- handwritten documents to be automatically transformed into formatted word processing documents
- trends found in the human genome to aid in the understanding of the data compiled by the Human Genome Project
- self-diagnosis of medical problems using neural networks
- and much more!


    Conclusion

    Perhaps the greatest advantage of Neural Networks is their ability to be used as an arbitrary

    function approximation mechanism that 'learns' from observed data. However, using them

    is not so straightforward, and a relatively good understanding of the underlying theory is

    essential.

- Choice of model: this will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
- Learning algorithm: there are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
- Robustness: if the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.

With the correct implementation, ANNs can be used naturally in online learning and large-data-set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.


    References

1. class.coursera.org/ml-007/lecture
2. cs.stanford.edu/people/eroberts/courses/soco/projects/neuralnetworks/Future/index.html
3. ima.ac.uk/slides/nzk-02-06-2009.pdf
4. L. Neumann and J. Matas, "A method for text localization and recognition in real-world images," in Computer Vision – ACCV 2010, ser. Lecture Notes in Computer Science, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Springer Berlin / Heidelberg, 2011, vol. 6494, pp. 770–783.
5. papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf
6. Steven Bell, "Text Detection and Recognition in Natural Images," CS 231A (Computer Vision), Stanford University, 2011.