Neural Networks and Machine Learning


Self-study Seminar Report

Aditya Agarwal (2K13/SE/007)

8/12/2014


    Certificate

DEPARTMENT OF SOFTWARE ENGINEERING

This is to certify that this seminar report entitled "Neural Networks and Machine Learning", submitted by Aditya Agarwal (2K13/SE/007) in partial fulfillment of the requirements for the award of the Bachelor of Technology degree in Software Engineering (SE) at Delhi Technological University, is an authentic work carried out by the student under my supervision and guidance.

    To the best of my knowledge, the matter embodied in the report has not been submitted to

    any other university or institute for the award of any degree or diploma.

    Ms. Kusum Lata

    (Assistant Professor)

    Dept. of Computer Engineering

    Delhi Technological University

    Place: DTU, Bawana Road, Delhi-110042

    Date: 08/12/2014


    Acknowledgement

The successful completion of any task would be incomplete without acknowledging the people who made it possible and whose constant guidance and encouragement secured my success.

First of all, I am grateful to the Almighty for enabling me to complete this self-study assignment. I owe a debt to our faculty, Ms. Kusum Lata (Assistant Professor, COE Department), for instilling in me the idea of a creative self-study project, for helping me in undertaking this project, and for being there whenever I needed her assistance.

I also place on record my sense of gratitude to one and all who, directly or indirectly, have lent their helping hand in this venture.

Last, but never the least, I thank my parents for being with me, in every sense.


    Abstract

The goal of the field of Machine Learning is to build computer systems that learn from experience and that are capable of adapting to their environments. Learning techniques and methods developed by researchers in this field have been successfully applied to a variety of learning tasks in a broad range of areas, including, for example, text classification, gene discovery, financial forecasting, credit card fraud detection, collaborative filtering, and the design of adaptive web agents.

Neural Networks are an innovation in the field of machine learning and Artificial Intelligence that was originally motivated by the goal of having machines that can mimic the brain. A neural network is a representation of the brain's learning approach: the brain operates as a massively parallel processor with dense interconnections. A neural network can likewise be described as a "parallel distributed processing" scheme.

Neural networks came to be very widely used throughout the 1980s and 1990s, and for various reasons their popularity diminished in the late 1990s. More recently, neural networks have had a major resurgence, partly because computers have only recently become fast enough to run large-scale neural networks, and for a few other technical reasons which we'll talk about later. Modern neural networks are the state-of-the-art technique for many applications such as speech recognition and text detection. Digit recognition is an application of neural networks which has been dealt with in this project.


Table of Contents

1. Chapter 1: Introduction
   - Machine Learning
   - Supervised Learning
   - Unsupervised Learning
   - Neural Networks
2. Chapter 2: Literature Survey
3. Chapter 3: Discussion
   - Model Representation
   - Architecture
   - Algorithms
   - Hand-written digit recognition
   - Other applications
4. Conclusion
5. References


    Chapter-1

    Introduction

    Machine Learning

    Machine Learning is the field of study that gives computers the ability to learn without

    being explicitly programmed.

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Various examples and applications exist:

- Database mining: large datasets arising from the growth of automation and the web, e.g., web click data, medical records, biology, engineering.
- Applications that can't be programmed by hand, e.g., autonomous helicopters, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
- Self-customizing programs, e.g., Amazon and Netflix product recommendations.
- Understanding human learning (the brain, real AI).

Two types of learning:

- Supervised learning
- Unsupervised learning


    Supervised learning

The term supervised learning refers to the fact that we give the algorithm a data set in which the "right answers" are given. Such a data set is commonly called a training data set.

Two types of supervised learning problems:

- Regression problem, where the goal is to predict a continuous-valued output. Say you want to predict housing prices by collecting and plotting data of price vs. features of a house. What the learning algorithm might do is fit a straight line through the data and use it to predict the price of a new house (a small sketch of this follows the list).
- Classification problem (logistic regression), where the goal is to predict a discrete-valued output. Say you want to look at medical records and try to predict whether a breast tumor is malignant or benign. The past medical records help to produce a discrete output: malignant or benign.
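As a minimal sketch of the regression case (in Python with NumPy; the house sizes, prices, and variable names below are made up for illustration, not data from this report), a straight line can be fit by least squares and used to predict the price of a new house:

```python
# Fit price = theta0 + theta1 * size by least squares, then predict.
# All numbers here are illustrative placeholders.
import numpy as np

sizes = np.array([1.0, 1.5, 2.0, 2.5, 3.0])    # house size (1000 sq ft)
prices = np.array([200, 280, 330, 410, 470])   # price (1000s of dollars)

X = np.column_stack([np.ones_like(sizes), sizes])  # add the bias column
theta, *_ = np.linalg.lstsq(X, prices, rcond=None) # least-squares fit

new_size = 2.2
predicted = theta[0] + theta[1] * new_size
print(f"predicted price for a {new_size} (1000 sq ft) house: {predicted:.1f}")
```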


    Unsupervised learning

    In machine learning, the problem of unsupervised learning is that of trying to find hidden

    structure in unlabeled data. Since the examples given to the learner are unlabeled, there is

    no error or reward signal to evaluate a potential solution. This distinguishes unsupervised

    learning from supervised learning and reinforcement learning.

Approaches to unsupervised learning include:

- Clustering (e.g., k-means, mixture models, hierarchical clustering)
- Hidden Markov models
- Blind signal separation using feature extraction techniques for dimensionality reduction (e.g., principal component analysis, independent component analysis, non-negative matrix factorization, singular value decomposition)

For example, clustering is used in Google News (if you have not seen it before, you can go to news.google.com to take a look). Every day, Google News looks at tens of thousands or hundreds of thousands of new stories on the web and groups them into cohesive news stories.

Similarly, with DNA microarray data, the idea is to take a group of different individuals and, for each of them, measure how much they do or do not express a certain gene.
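As a minimal sketch of the clustering idea (the two-dimensional points and the choice of k = 2 are illustrative assumptions, not real news or gene data), the following Python code runs plain k-means:

```python
# Plain k-means on two made-up groups of 2-D points.
import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),   # synthetic group A
                  rng.normal(3, 0.5, (50, 2))])  # synthetic group B

k = 2
centroids = data[rng.choice(len(data), k, replace=False)]
for _ in range(10):
    # Assign each point to its nearest centroid.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Move each centroid to the mean of its assigned points
    # (assumes no cluster becomes empty, which holds for this toy data).
    centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])

print("cluster centroids:\n", centroids)
```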


    Neural Networks

    In computer science, artificial neural networks (ANNs) are forms of computer architecture

    inspired by biological neural networks (the central nervous systems of animals, in

    particular the brain) and are used to estimate or approximate functions that can depend on a

    large number of inputs and are generally unknown. Artificial neural networks are generally

    presented as systems of interconnected "neurons" which can compute values from inputs,

    and are capable of machine learning as well as pattern recognition thanks to their adaptive

    nature.

Examinations of the human central nervous system inspired the concept of neural networks. In an Artificial Neural Network, simple artificial nodes, known as "neurons", "neurodes", "processing elements" or "units", are connected together to form a network which mimics a biological neural network.

There is no single formal definition of what an artificial neural network is. However, a class of statistical models may commonly be called "neural" if it possesses the following characteristics:

1. It consists of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm, and
2. It is capable of approximating non-linear functions of its inputs.

    Non-Linear Hypothesis


For many machine learning problems, the number of features n will be pretty large. Consider, for example, a computer vision problem where each training image has 50 × 50 = 2500 pixels and each pixel intensity is a feature.

If we were to try to learn a nonlinear hypothesis by including all the quadratic features, that is, all the terms of the form x_i times x_j, then with 2500 pixels we would end up with roughly 2500²/2 ≈ 3 million features. That is just too large to be reasonable; the computation would be very expensive to find and to represent all of these three million features per training example.

So simple logistic regression, together with adding in quadratic or cubic features, is just not a good way to learn complex nonlinear hypotheses when n is large, because you simply end up with too many features.

The problem, stated plainly, is that it is difficult to design an algorithm to do what the brain does when the number of features is large. The solution is hence to model the brain itself.


    Chapter-2

Literature Survey

Warren McCulloch and Walter Pitts (1943) created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches: one focused on biological processes in the brain, and the other on the application of neural networks to artificial intelligence.

Frank Rosenblatt (1958) created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, whose mathematical computation could not be processed until the backpropagation algorithm was created by Paul Werbos (1975).

In the 1990s, neural networks were overtaken in popularity in machine learning by support vector machines and other, much simpler methods such as linear classifiers. Renewed interest in neural nets was sparked in the 2000s by the advent of deep learning.

Between 2009 and 2012, the recurrent neural networks and deep feed-forward neural networks developed in the research group of Jürgen Schmidhuber at the Swiss AI Lab IDSIA won eight international competitions in pattern recognition and machine learning.

Such neural networks also were the first artificial pattern recognizers to achieve human-competitive or even superhuman performance on benchmarks such as traffic sign recognition (IJCNN 2012) or the MNIST handwritten digits problem of Yann LeCun and colleagues at NYU.

This work is in direct correspondence with the recent multi-layered neural network architecture and its algorithms and applications in handwritten digit recognition.


    Chapter-3

    Discussion

    Model Representation

Neuro-rewiring experiments, in which a sensory input is re-routed to a different area of the brain which then learns to process it, suggest that if the same piece of physical brain tissue can process sight or sound or touch, then maybe there is one learning algorithm that can process sight or sound or touch.

And instead of needing to implement a thousand different programs or a thousand different algorithms to do the thousand wonderful things that the brain does, maybe what we need to do is figure out some approximation to whatever the brain's learning algorithm is, implement that, and let it learn by itself how to process these different types of data.


This is the logistic model of a neuron, with x1, x2 and x3 being the three features, x0 being the bias unit equal to 1, and hθ(x) being the sigmoid (logistic) activation function

    hθ(x) = 1 / (1 + e^(−θᵀx)),

which uses the feature vector x and the parameter vector θ. Here θ0, θ1, θ2, θ3 are the parameters, or the weights assigned to x0, x1, x2, x3 respectively.
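A minimal Python sketch of this single neuron, assuming made-up weight and feature values:

```python
# One sigmoid neuron: h_theta(x) = 1 / (1 + exp(-theta^T x)), with x0 = 1.
# The weight and feature values are illustrative placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.5, 0.5, 0.5])   # theta0..theta3 (illustrative)
x = np.array([1.0, 2.0, 0.0, 1.0])        # x0 = 1 (bias unit), then x1, x2, x3

h = sigmoid(theta @ x)                    # the neuron's activation
print(f"h_theta(x) = {h:.4f}")
```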

Several such neurons strung together form a neural network.

The first layer, also called the input layer, is where we input our features x1, x2, x3. The final layer, also called the output layer, has the neuron that outputs the final value computed by the hypothesis. Layer two, in between, is called the hidden layer; its neurons represent the features learnt by the neural network from the input features and the learnt parameters.


    Architecture

If a network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ(j), the matrix of weights mapping layer j to layer j+1, will be of dimension s_{j+1} × (s_j + 1). For example, with s_j = 3 and s_{j+1} = 4, Θ(j) is a 4 × 4 matrix; the +1 accounts for the bias unit.

There are various architectures of neural networks possible:

Feed-forward neural networks

- These are the commonest type of neural network in practical applications.
- The first layer is the input and the last layer is the output. If there is more than one hidden layer, we call them "deep" neural networks.
- They compute a series of transformations that change the similarities between cases.
- The activities of the neurons in each layer are a non-linear function of the activities in the layer below.


Recurrent networks

- These have directed cycles in their connection graph: you can sometimes get back to where you started by following the arrows.
- They can have complicated dynamics, and this can make them very difficult to train. There is a lot of interest at present in finding efficient ways of training recurrent nets.
- They are more biologically realistic.

Symmetrically connected networks

- These are like recurrent networks, but the connections between units are symmetrical (they have the same weight in both directions).
- John Hopfield (and others) realized that symmetric networks are much easier to analyze than recurrent networks.
- They are also more restricted in what they can do, because they obey an energy function. For example, they cannot model cycles.
- Symmetrically connected nets without hidden units are called Hopfield nets.


    Algorithms

    Forward propagation algorithm

The process of computing hθ(x) is called forward propagation: we start off with the activations of the input units, forward-propagate them to the hidden layer and compute the activations of the hidden layer, and then compute the activations of the output layer. A vectorized implementation of this procedure is given below, following these steps:

1. Calculation of activations.
2. Vectorization of the input features and the activations.
3. The forward propagation step of calculating the hidden layers from the input layer, and the output layer from the last hidden layer, using the sigmoid activation function.
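Below is a minimal Python sketch of these steps for a small three-layer network; the layer sizes and the randomly generated weight matrices are illustrative assumptions, not trained parameters:

```python
# Vectorized forward propagation for a 3-layer network
# (3 inputs -> 5 hidden units -> 1 output), all weights placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(0, 0.1, (5, 4))   # maps 3 inputs (+ bias) to 5 hidden units
Theta2 = rng.normal(0, 0.1, (1, 6))   # maps 5 hidden units (+ bias) to 1 output

x = np.array([0.2, 0.5, 0.9])         # input features x1..x3

a1 = np.concatenate(([1.0], x))       # add bias unit x0 = 1
z2 = Theta1 @ a1
a2 = np.concatenate(([1.0], sigmoid(z2)))  # hidden activations plus bias
z3 = Theta2 @ a2
h = sigmoid(z3)                       # output activation h_theta(x)
print("h_theta(x) =", h)
```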


    Back propagation algorithm

The main objective is to find parameters θ that minimize the cost function J(θ), using either gradient descent or one of the advanced optimization algorithms. The following are the steps (a sketch of the final update follows the list):

1. First convert the discrepancy between each output and its target value into an error derivative.
2. Then compute the error derivatives in each hidden layer from the error derivatives in the layer above.
3. Then use the error derivatives w.r.t. activities to get the error derivatives w.r.t. the incoming weights.
4. Finally use gradient descent, or any other technique, to minimize the error cost function.
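As a minimal sketch of step 4, assuming backpropagation has already produced the gradient of J(θ) with respect to each weight matrix (the matrices and gradient values below are made-up placeholders), a plain gradient descent update looks like this:

```python
# One gradient descent step over a list of weight matrices.
import numpy as np

def gradient_descent_step(thetas, grads, learning_rate=0.1):
    """Move each weight matrix against its gradient of the cost J."""
    return [theta - learning_rate * grad for theta, grad in zip(thetas, grads)]

# Toy example: two weight matrices and made-up gradients dJ/dTheta.
thetas = [np.ones((5, 4)), np.ones((1, 6))]
grads = [0.01 * np.ones((5, 4)), 0.02 * np.ones((1, 6))]
thetas = gradient_descent_step(thetas, grads)
print(thetas[1])
```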


    Handwritten Digit Recognition

We can use multi-class logistic regression to recognize handwritten digits. However, logistic regression cannot form more complex hypotheses, as it is only a linear classifier. Thus one can implement a neural network to recognize handwritten digits using the MNIST database of handwritten digits. The neural network is able to represent complex models that form non-linear hypotheses. The first goal is to implement the feed-forward propagation algorithm, using already-given weights, for prediction. The next goal is to write the back-propagation algorithm for learning the neural network parameters.

    Model representation

Our neural network is shown in Figure 2. It has 3 layers: an input layer, a hidden layer and an output layer. Our inputs are pixel values of digit images. Since the images are of size 20x20, this gives us 400 input layer units (excluding the extra bias unit, which always outputs +1). There are 5000 training examples in ex3data1.mat. Each pixel is represented by a floating point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is unrolled into a 400-dimensional vector, and each of these training examples becomes a single row in our data matrix X. This gives us a 5000 by 400 matrix X where every row is a training example for a handwritten digit image.

The second part of the training set is a 5000-dimensional vector y that contains labels for the training set. We have mapped the digit zero to the value ten, while the digits "1" to "9" are labeled as 1 to 9 in their natural order.


    Feed-forward Propagation and Prediction

Feed-forward propagation for the neural network is implemented in predict.m, which returns the neural network's prediction. The feed-forward computation computes hθ(x(i)) for every example i and returns the associated predictions. The predict function is called using the loaded set of parameters Theta1 and Theta2. The accuracy is about 97.5%.
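The following Python sketch mirrors what predict.m does, assuming the exercise's 25 hidden units (so Theta1 is 25 x 401 and Theta2 is 10 x 26); random placeholder weights and pixel data are used here so the sketch runs stand-alone, so its output is only shape-correct, not accurate:

```python
# Vectorized prediction over all 5000 examples at once: forward-propagate
# and pick the most probable label per row. Labels run 1..10 (10 = digit 0).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta1, Theta2, X):
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])                      # add bias column
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)]) # hidden layer
    h = sigmoid(a2 @ Theta2.T)                                # m x 10 outputs
    return h.argmax(axis=1) + 1                               # labels 1..10

rng = np.random.default_rng(0)
Theta1 = rng.normal(0, 0.12, (25, 401))   # placeholder for trained weights
Theta2 = rng.normal(0, 0.12, (10, 26))
X = rng.random((5000, 400))               # placeholder for pixel data
print(predict(Theta1, Theta2, X)[:10])
```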

    Cost Function

The cost function for the neural network (without regularization) is

    J(θ) = (1/m) Σ_{i=1..m} Σ_{k=1..K} [ −y_k(i) log((hθ(x(i)))_k) − (1 − y_k(i)) log(1 − (hθ(x(i)))_k) ]

where hθ(x(i)) is computed by forward propagation and K = 10 is the total number of possible labels.

The regularized cost function adds a penalty on all the non-bias weights:

    J_reg(θ) = J(θ) + (λ/2m) [ Σ_{j,k} (Θ(1)_{j,k})² + Σ_{j,k} (Θ(2)_{j,k})² ]

With λ = 1 and the given pre-trained parameters, the cost is about 0.383770.
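A minimal Python sketch of this cost computation, assuming one-hot encoded labels Y and the same network shapes as above (all inputs here are placeholders):

```python
# Regularized neural network cost; bias-column weights are excluded
# from the penalty, matching the formula above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Theta1, Theta2, X, Y, lam):
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])
    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ Theta1.T)])
    H = sigmoid(a2 @ Theta2.T)                # m x K predicted probabilities
    # Cross-entropy cost summed over all examples and labels.
    J = -(Y * np.log(H) + (1 - Y) * np.log(1 - H)).sum() / m
    # Regularization term, skipping the first (bias) column of each matrix.
    reg = np.square(Theta1[:, 1:]).sum() + np.square(Theta2[:, 1:]).sum()
    return J + lam * reg / (2 * m)

rng = np.random.default_rng(0)
X = rng.random((5, 400))
Y = np.eye(10)[rng.integers(0, 10, 5)]        # one-hot placeholder labels
Theta1 = rng.normal(0, 0.12, (25, 401))
Theta2 = rng.normal(0, 0.12, (10, 26))
print(nn_cost(Theta1, Theta2, X, Y, lam=1.0))
```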

    Back propagation

The backpropagation algorithm is implemented to compute the gradients of the cost with respect to the parameters of the (unregularized) neural network. After the gradient computation for the unregularized case has been verified, the gradient for the regularized neural network is implemented.

When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to select values for each Θ(l) uniformly in the range −0.12 to 0.12. This range keeps the parameters small and makes learning more efficient.
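A minimal sketch of this initialization strategy (the layer sizes follow the digit-recognition network above):

```python
# Uniform random initialization in [-epsilon, epsilon] with epsilon = 0.12,
# which breaks the symmetry between hidden units.
import numpy as np

def rand_initialize_weights(fan_out, fan_in, epsilon=0.12):
    """Return a fan_out x (fan_in + 1) weight matrix in [-epsilon, epsilon]."""
    return np.random.uniform(-epsilon, epsilon, (fan_out, fan_in + 1))

Theta1 = rand_initialize_weights(25, 400)   # hidden layer weights
Theta2 = rand_initialize_weights(10, 25)    # output layer weights
print(Theta1.shape, Theta2.shape)
```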

Given a training example (x(t), y(t)), we first run a forward pass to compute all the activations throughout the network, including the output value of the hypothesis hθ(x). Then, for each node j in layer l, we compute an error term δ_j(l) that measures how much that node was responsible for any errors in our output.

For an output node, we can directly measure the difference between the network's activation and the true target value, and use that to define δ_j(3) (since layer 3 is the output layer). For the hidden units, δ_j(l) is computed based on a weighted average of the error terms of the nodes in layer l + 1.
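Putting the pieces together, the following sketch runs the forward pass, computes δ(3) and δ(2), and accumulates the gradients for the three-layer network; the data and weights are random placeholders, and the formulas assume the sigmoid activation and cross-entropy cost given earlier:

```python
# Vectorized backpropagation for the 400 -> 25 -> 10 network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.uniform(-0.12, 0.12, (25, 401))
Theta2 = rng.uniform(-0.12, 0.12, (10, 26))
X = rng.random((5000, 400))                  # placeholder pixel data
Y = np.eye(10)[rng.integers(0, 10, 5000)]    # one-hot placeholder labels
m = X.shape[0]

# Forward pass.
a1 = np.hstack([np.ones((m, 1)), X])
z2 = a1 @ Theta1.T
a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
a3 = sigmoid(a2 @ Theta2.T)

# Backward pass: output error, then hidden error (bias column dropped).
delta3 = a3 - Y
delta2 = (delta3 @ Theta2[:, 1:]) * sigmoid(z2) * (1 - sigmoid(z2))

Theta1_grad = delta2.T @ a1 / m              # dJ/dTheta1 (unregularized)
Theta2_grad = delta3.T @ a2 / m              # dJ/dTheta2 (unregularized)
print(Theta1_grad.shape, Theta2_grad.shape)
```

Note that the simple output error δ(3) = a(3) − y holds because the sigmoid output layer is paired with the cross-entropy cost above; with a different cost, the output error term would change.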


    Other applications

    1. Integration of fuzzy logic into neural networks

Fuzzy logic is a type of logic that recognizes more than simple true and false values, hence better simulating the real world. For example, the statement "today is sunny" might be 100% true if there are no clouds, 80% true if there are a few clouds, 50% true if it's hazy, and 0% true if it rains all day. Fuzzy logic thus takes into account concepts like "usually", "somewhat", and "sometimes".

Fuzzy logic and neural networks have been integrated for uses as diverse as automotive engineering, applicant screening for jobs, the control of a crane, and the monitoring of glaucoma.

    2. Pulsed neural networks

Most practical applications of artificial neural networks are based on a computational model involving the propagation of continuous variables from one processing unit to the next. In recent years, data from neurobiological experiments have made it increasingly clear that biological neural networks, which communicate through pulses, use the timing of the pulses to transmit information and perform computation. This realization has stimulated significant research on pulsed neural networks, including theoretical analyses and model development, neurobiological modeling, and hardware implementation.

3. NNs might, in the future, allow:

- robots that can see, feel, and predict the world around them
- improved stock prediction
- common usage of self-driving cars
- composition of music
- handwritten documents to be automatically transformed into formatted word processing documents
- trends found in the human genome to aid in the understanding of the data compiled by the Human Genome Project
- self-diagnosis of medical problems using neural networks
- and much more!


    Conclusion

    Perhaps the greatest advantage of Neural Networks is their ability to be used as an arbitrary

    function approximation mechanism that 'learns' from observed data. However, using them

    is not so straightforward, and a relatively good understanding of the underlying theory is

    essential.

- Choice of model: this will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
- Learning algorithm: there are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
- Robustness: if the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust.

With the correct implementation, ANNs can be used naturally in online learning and large-data-set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.


    References

1. class.coursera.org/ml-007/lecture
2. cs.stanford.edu/people/eroberts/courses/soco/projects/neuralnetworks/Future/index.html
3. ima.ac.uk/slides/nzk-02-06-2009.pdf
4. L. Neumann and J. Matas, "A method for text localization and recognition in real-world images," in Computer Vision – ACCV 2010, ser. Lecture Notes in Computer Science, R. Kimmel, R. Klette, and A. Sugimoto, Eds. Springer Berlin / Heidelberg, 2011, vol. 6494, pp. 770–783.
5. papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-back-propagation-network.pdf
6. Steven Bell, "Text Detection and Recognition in Natural Images," CS 231A (Computer Vision), Stanford University, 2011.