Convolution Neural Network (Computer Vision)
Source: nlp.jbnu.ac.kr/aijbnu2020/slides/aijbnu2020_cnn.pdf


  • AI Advanced Learning Program: Convolution Neural Network

    Computer vision

    Vision and Learning laboratory, 강준규

    2020.11.02 ~ 2020.11.05

  • 2

    Introduction

    ◼ Source: cs231n.stanford.edu

  • 3

    Contents

    ◼ What is computer vision?

    ◼ Understanding images (e.g. the RGB channels)

    ◼ Convolution neural network example

    ◼ Filter

    ◼ Why Convolution neural network?

    ◼ Convolution neural network

    ◼ Convolution layer

    ◼ Stride, padding, pooling

    ◼ Receptive field

    ◼ ImageNet

    ◼ VGG, Resnet, Googlenet

    ◼ QnA & Training

  • 4

    What is computer vision?

    Computer vision refers to the "field of view" of a computer:

    the field that analyzes the images, video, and other inputs given to a computer

    and works out what is in them and how to find it.

    Source: https://medium.com/@miccowang/computer-vision-the-closet-thing-to-ai-on-our-personal-device-d2ff63994856

    Source: https://www.annalect.com/7-ways-computer-vision-helps-marketers-see-better-performance/

  • 5

    What is computer vision?

    For a computer to see an object, the object must be stored in a form the computer can read, and there are various ways to store it.

    A typical image is digitized using RGB channels, with each channel holding values between 0 and 255.

    Source: https://qits.tistory.com/entry/RGB%EC%83%89%EC%83%81%ED%91%9C

    (Figure: the R, G, and B channels of an image.)
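    As a small sketch of this representation (pixel values are made up for illustration), the NumPy array below holds a tiny 2x2 RGB image with one 0-255 value per channel:

    import numpy as np

    # A tiny 2x2 RGB image: shape (height, width, 3), one 0-255 value per channel.
    image = np.array(
        [[[255,   0,   0], [  0, 255,   0]],   # red pixel,  green pixel
         [[  0,   0, 255], [255, 255, 255]]],  # blue pixel, white pixel
        dtype=np.uint8,
    )

    print(image.shape)    # (2, 2, 3)
    print(image[0, 0])    # [255 0 0] -> the R, G, B values of the top-left pixel
    print(image[..., 0])  # the R channel as a 2x2 array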

  • 6

    Convolution neural network example

  • 7

    Convolution neural network example

  • 8

    Convolution neural network example

  • 9

    Convolution neural network example

  • 10

    Filter?

  • 11

    Filter?

    (Figure: a 3x3 filter applied to two 3x3 image patches; each cell shows the elementwise product of the filter value and the pixel value.)

    Filter:            Patch:             Elementwise products:
     0 -1  0           18  8  9             0  -8   0
    -1  8 -1           19 12  7           -19  96  -7
     0 -1  0           13 14 30             0 -14   0

    Filter:            Patch:             Elementwise products:
     0 -1  0            8  9  6             0  -9   0
    -1  8 -1           12  7 17           -12  56 -17
     0 -1  0           14 30 12             0 -30   0

  • 12

    Filter?

    (Figure: the elementwise products from each window, for example

      0  -8   0            0  -9   0
    -19  96  -7          -12  56 -17
      0 -14   0            0 -30   0

    are then summed, so each position of the filter contributes one value to the filtered output.)

  • 13

    Filter?

    (Figure: several example filters applied to a photo.)

    Using a filter designed by a person, we can add special effects to a picture.

    Then could a machine find filters automatically and build specialized filters on its own?

    Ex) a filter that finds eyes, a filter that finds edges, special filters in dimensions humans do not understand
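    As a rough sketch of applying such a hand-designed filter, the snippet below slides the 3x3 edge-style kernel from the previous slides over a small grayscale patch with NumPy (the image values are illustrative, not from a real photo):

    import numpy as np

    def filter2d(image, kernel):
        """Slide `kernel` over `image` (stride 1, no padding) and sum the
        elementwise products at each position, as drawn on the slides."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # The hand-designed edge-enhancing kernel from the slides.
    kernel = np.array([[ 0, -1,  0],
                       [-1,  8, -1],
                       [ 0, -1,  0]])

    # A small illustrative grayscale patch.
    image = np.array([[18,  8,  9,  6],
                      [19, 12,  7, 17],
                      [13, 14, 30, 12],
                      [11, 10,  9,  8]])

    print(filter2d(image, kernel))   # a 2x2 filtered output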

  • 14

    Convolution

    (Figure: a 3x3x3 image is convolved with a 2x2x3 filter to produce a 2x2x1 activation map.)

    Image 3x3x3 (channels R, G, B):
    R: 4 3 5 / 2 9 3 / 2 5 2
    G: 4 3 5 / 7 9 3 / 1 1 8
    B: x1 x2 x3 / x4 x5 x6 / x7 x8 x9

    Filter 2x2x3: 1 2 / 2 1 in each channel, written as weights w1 w2 / w3 w4

    Activation map 2x2x1: y1

    y1 = w1*x1 + w2*x2 + w4*x4 + w5*x5
       + w10*x10 + w11*x11 + w13*x13 + w14*x14
       + w19*x19 + w20*x20 + w22*x22 + w23*x23
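    A minimal NumPy sketch of the sum above, assuming x1 ... x27 are the three 3x3 channels stacked in order and the 2x2x3 filter holds the weights w: y1 is the sum of the elementwise products over the top-left 2x2 window of every channel (the x values below are placeholders):

    import numpy as np

    # Image of shape (3 channels, 3, 3): x1..x9 = channel 1, x10..x18 = channel 2, x19..x27 = channel 3.
    x = np.arange(1, 28).reshape(3, 3, 3)    # placeholder values 1..27
    # Filter of shape (3 channels, 2, 2), the 1 2 / 2 1 pattern from the slide.
    w = np.array([[[1, 2], [2, 1]]] * 3)

    # y1: elementwise products over the top-left 2x2 window of each channel, summed.
    y1 = np.sum(x[:, 0:2, 0:2] * w)
    print(y1)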

  • 15

    Why Convolution neural network?

  • 16

    Why Convolution neural network?

  • 17

    Why Convolution neural network?

    How many weights does the model have to find?

    16 (input size) x 4 (hidden nodes) + 4 (hidden nodes) x 2 (outputs)

    = 72

    Now suppose the input size is 224 x 224,

    there are 3 hidden layers

    with 1024, 2048, and 4096 nodes respectively,

    and the output has 1000 classes.

    How many weights are needed?

    50176 x 1024 + 1024 x 2048 + 2048 x 4096 + 4096 x 1000 = 65,961,984 weights

  • 18

    Why Convolution neural network?

    Input size 224x224

    3 convolution layers with 3x3 kernels

    and 128, 256, and 512 channels respectively

    Last fully connected layer: 1024 nodes

    Output: 1000 classes

    9 x 128 + 9 x 256 + 9 x 512 + 1024 x 1000 = 1,032,064 weights
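    A quick arithmetic check of both counts, following the slide's simplified counting (biases and the per-input-channel weights of each 3x3 kernel are ignored, as on the slide):

    # Fully connected network: 224*224 inputs -> 1024 -> 2048 -> 4096 -> 1000 classes
    fc_weights = 224 * 224 * 1024 + 1024 * 2048 + 2048 * 4096 + 4096 * 1000
    print(fc_weights)    # 65961984

    # Convolutional network as counted on the slide:
    # three conv layers with 3x3 kernels and 128, 256, 512 channels, then FC 1024 -> 1000.
    conv_weights = 9 * 128 + 9 * 256 + 9 * 512 + 1024 * 1000
    print(conv_weights)  # 1032064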

  • 19

    Convolution neural network

  • 20

    Fully connected layer

    (Figure: the 3x3x3 image below is flattened into a 27-element vector x1 ... x27, and each output node is connected to all 27 inputs through weights w1 ... w27.)

    Image 3x3x3 (channels R, G, B):
    R: 4 3 5 / 2 9 3 / 2 5 2
    G: 4 3 5 / 7 9 3 / 1 1 8
    B: 3 3 2 / 2 9 1 / 2 5 1
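    A small NumPy sketch of this slide's idea, with made-up weights: the 3x3x3 image is flattened into a 27-element vector and every output node is connected to all 27 inputs.

    import numpy as np

    rng = np.random.default_rng(0)

    image = rng.integers(0, 10, size=(3, 3, 3))   # a 3x3x3 image (placeholder values)
    x = image.reshape(-1)                         # flatten to a 27-element vector x1..x27

    n_out = 4                                     # number of output nodes (illustrative)
    W = rng.normal(size=(n_out, 27))              # every output node sees all 27 inputs

    y = W @ x                                     # fully connected layer (no bias)
    print(x.shape, W.shape, y.shape)              # (27,) (4, 27) (4,)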

  • 21

    Convolution layer

    Image 3x3x3, filter 2x2x3

    (Figure: the 3x3x3 image (R: 4 3 5 / 2 9 3 / 2 5 2, G: 4 3 5 / 7 9 3 / 1 1 8, third channel written as x1 ... x9) is convolved with the 2x2x3 filter 1 2 / 2 1 (weights w1 w2 / w3 w4 per channel), producing the 2x2x1 activation map y1 y2 / y3 y4.)

    y1 = w1*x1 + w2*x2 + w4*x4 + w5*x5
       + w10*x10 + w11*x11 + w13*x13 + w14*x14
       + w19*x19 + w20*x20 + w22*x22 + w23*x23

  • 22

    Convolution layer

    (Figure: the 2x2x3 filter 1 2 / 2 1 slides over the 3x3x3 image
    (R: 4 3 5 / 2 9 3 / 2 5 2, G: 4 3 5 / 7 9 3 / 1 1 8, B: 3 3 2 / 2 9 1 / 2 5 1),
    filling the 2x2x1 activation map one position at a time: first 42, then 56,
    until the full map 42 56 / 17 95 is produced.)

  • 23

    Convolution layer

    (Figure: a different 2x2x3 filter is slid over the same 3x3x3 image in the same way, producing its own 2x2x1 activation map, for example 8 21 / -2 -1, while yet another filter produces 17 -5 / 1 35.)

  • 24

    Convolution layer

    (Figure: applying several 2x2x3 filters to the same 3x3x3 image gives several 2x2x1 activation maps, 42 56 / 17 95, 8 21 / -2 -1, and 17 -5 / 1 35, which together form the output of the convolution layer.)

  • 25

    Convolution layer

    A single number: each position of the filter contributes one number to the activation map.
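    A minimal NumPy sketch of what the last few slides walk through: sliding one 2x2x3 filter over the 3x3x3 image gives a single number at each position, and those numbers form that filter's 2x2x1 activation map (the image and filter follow the values drawn on the slides; the printed outputs are only illustrative):

    import numpy as np

    # 3x3x3 image from the slides: channels R, G, B.
    image = np.array([
        [[4, 3, 5], [2, 9, 3], [2, 5, 2]],   # R
        [[4, 3, 5], [7, 9, 3], [1, 1, 8]],   # G
        [[3, 3, 2], [2, 9, 1], [2, 5, 1]],   # B
    ])

    # One 2x2x3 filter (the 1 2 / 2 1 pattern in every channel, as drawn).
    filt = np.array([[[1, 2], [2, 1]]] * 3)

    # Slide the filter over the image: each position yields one number.
    act = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            act[i, j] = np.sum(image[:, i:i + 2, j:j + 2] * filt)

    print(act)   # the 2x2x1 activation map for this one filter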

  • 26

    Convolution layer

  • 27

    Convolution layer

  • 28

    Convolution layer

  • 29

    Convolution layer

  • 30

    Convolution layer stride

    7x7 image / 3x3 filter / stride 1

    7x7 image / 3x3 filter / stride 2

  • 31

    Convolution layer stride

    7x7 image / 3x3 filter / stride 1 → output activation map 5x5

    7x7 image / 3x3 filter / stride 2 → output activation map 3x3

    7x7 image / 3x3 filter / stride 3 → the filter does not fit the image evenly

  • 32

    Convolution layer padding

    (Figure: a one-pixel border of zeros is added around the 7x7 image, giving a 9x9 padded input.)

    7x7 image / 3x3 filter / stride 1 / padding 1 → output activation map 7x7

    7x7 image / 3x3 filter / stride 2 / padding 1 → output activation map 4x4

  • 33

    Convolution layer pooling
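    A small NumPy sketch of the most common pooling variant, 2x2 max pooling with stride 2 (input values are illustrative):

    import numpy as np

    def max_pool2x2(x):
        """2x2 max pooling, stride 2: keep the largest value in each 2x2 block."""
        h, w = x.shape
        return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 0],
                  [7, 2, 9, 8],
                  [3, 4, 6, 5]])

    print(max_pool2x2(x))
    # [[6 4]
    #  [7 9]]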

  • 34

    Filter size formula
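    The slide's figure is not preserved here, but the standard relation it presumably covered is output size = (N - F + 2P) / S + 1 for input size N, filter size F, padding P, and stride S. A small helper (my own naming) that reproduces the earlier stride and padding examples:

    def conv_output_size(n, f, stride=1, padding=0):
        """Output size of a convolution: (N - F + 2P) / S + 1.
        Returns None when the filter does not fit the input evenly."""
        span = n - f + 2 * padding
        if span % stride != 0:
            return None
        return span // stride + 1

    print(conv_output_size(7, 3, stride=1))             # 5
    print(conv_output_size(7, 3, stride=2))             # 3
    print(conv_output_size(7, 3, stride=3))             # None -> does not fit
    print(conv_output_size(7, 3, stride=1, padding=1))  # 7
    print(conv_output_size(7, 3, stride=2, padding=1))  # 4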

  • 35

    Classification

    Cat 0.7

    Dog 0.08

    Car 0.01

    Bird 0.01

    Lion 0.2

    Model

  • 36

    Detection

    (Figure: the model outputs a numbered bounding box for each object it detects in the image.)

    Model

  • 37

    Segmentation

    (Figure: the model assigns a class label to every pixel, producing a per-pixel map such as:)

    1 0 1 1 0 1
    1 0 1 1 0 1
    1 1 1 1 1 1
    2 1 1 1 1 1
    1 1 1 1 0 1
    2 1 2 2 0 1
    2 2 2 2 0 2
    0 0 2 2 0 2

    Model

  • 38

    Convolution neural network

    (Figure: stacked filters respond to increasingly complex patterns: a circle? a horizontal line? an eye? whiskers? a paw? Combining these filter responses, the network concludes "cat".)

  • 39

    Convolution neural network

    convolution layer 1 → simple features
    convolution layer 2 → higher-level features
    convolution layer 3 → very high-level features

  • 40

    Convolution neural network

  • 41

    Receptive field

    3x3 filter conv → Fully connected layer → Cat: 78% / Lion: 20% / Dog: 2%

    In this case the model judges from what a single 3x3 window can see:

    rather than combining information from a wider area, it predicts using only information from a 3x3 region.

    Ex) "it has ears and it has paws": the model cannot use this, because the ears and the paws lie more than 3x3 apart.

    3x3 filter conv → 3x3 filter conv → Fully connected layer → Cat: 78% / Lion: 20% / Dog: 2%

    In this case the model predicts using information from up to a 5x5 region.

  • 42

    Receptive field

    5x5 filter conv

    3x3 filter conv 3x3 filter conv

    What is the difference between the two?

    Advantages and disadvantages?
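    One common way to frame the answer (the usual textbook comparison, not something spelled out on the slide): both reach a 5x5 receptive field, but two stacked 3x3 convolutions use fewer weights per channel pair and apply a nonlinearity twice, at the cost of an extra layer.

    # Weights per input/output channel pair, ignoring biases:
    one_5x5 = 5 * 5          # 25 weights, one layer, 5x5 receptive field
    two_3x3 = 2 * (3 * 3)    # 18 weights, two layers, also a 5x5 receptive field
    print(one_5x5, two_3x3)  # 25 18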

  • 43

    Dilation convolution
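    The slide's figure is not reproduced here; as a rough sketch of the idea, PyTorch's Conv2d exposes a dilation argument that spreads the 3x3 kernel taps apart, enlarging the receptive field without adding weights (all sizes below are illustrative):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)                         # one 3-channel 32x32 image

    normal  = nn.Conv2d(3, 8, kernel_size=3)              # taps adjacent: covers a 3x3 area
    dilated = nn.Conv2d(3, 8, kernel_size=3, dilation=2)  # taps 2 apart: covers a 5x5 area

    print(normal(x).shape)    # torch.Size([1, 8, 30, 30])
    print(dilated(x).shape)   # torch.Size([1, 8, 28, 28])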

  • 44

    Convolution neural network

  • 45

    LeNet-5

    Parameter size ≒ 60,000

    LeNet is an early CNN model from 1998. It was developed by Yann LeCun, who is now a professor at New York University and a technical director at the Facebook research lab. It was originally built to recognize handwritten postal codes and the handwriting on checks.

  • 46

    ImageNet

    ImageNet is the well-known dataset used by ILSVRC for computer vision competitions. The competition ran from 2012 to 2017, and the dataset remains the best-known one, still used in papers of all kinds today.

    It contains 1000 classes and 1,281,167 images in total, amounting to more than 200 GB.

    ILSVRC: ImageNet Large Scale Visual Recognition Challenge

  • 47

    AlexNet

    Parameter size ≒ 62,000,000

    AlexNet is the CNN architecture that won ILSVRC 2012. It follows the structure of LeNet-5 described earlier, but, as hardware had improved, it was designed so the computation could be split across two GPUs in parallel. ImageNet Top-5 error: 15.3%

  • 48

    VGG

    Ex) VGG16 (D type)

    Parameter size ≒ 138,000,000

    VGG is the CNN architecture that took 2nd place at ILSVRC 2014. Although it is simpler than GoogleNet, which won that year, the performance gap is small and it is easy to adapt. ImageNet Top-5 error: 7.5%

  • 49

    GoogleNet

    GoogleNet is the CNN architecture that won ILSVRC 2014. Its distinguishing feature is its use of the Inception module. ImageNet Top-5 error: 6.7%

    Inception module

  • 50

    Resnet

    (Figure: the structure of an ordinary network vs. the structure of a residual network.)

    Resnet is the CNN architecture that won ILSVRC 2015. Unlike earlier models, it uses the residual idea, which made it possible to design much deeper networks. ImageNet Top-5 error: 3.57%
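    For experimenting with the architectures above, torchvision ships reference implementations; a minimal sketch (the model builders and sizes are torchvision's, not anything defined in these slides):

    import torch
    import torchvision.models as models

    vgg16     = models.vgg16()      # VGG-style architecture (ILSVRC 2014 runner-up)
    googlenet = models.googlenet()  # GoogleNet / Inception modules
    resnet50  = models.resnet50()   # ResNet with residual connections

    resnet50.eval()
    x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB image
    print(resnet50(x).shape)          # torch.Size([1, 1000]) -> 1000 ImageNet classes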

  • 51

    Q & A

    ◼ Thank You