Deep-Learning

ResNet

1. Preface: Does a deeper model always perform better? Not really!
2. ResNet Block: Similar to VGG and GoogLeNet, the ResNet block makes deep neural networks easy to train: $$f(x)= x + g(x)$$
3. Q&A
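A minimal PyTorch sketch of a residual block consistent with $$f(x)=x+g(x)$$; the two 3x3 convolutions, the BatchNorm placement, and the optional 1x1 convolution for matching shapes are assumptions rather than details given in the excerpt.

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Residual(nn.Module):
        """Residual block: output = ReLU(x + g(x)), with g = conv-bn-relu-conv-bn."""
        def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
            self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            # optional 1x1 conv so the skip connection matches g(x) in shape
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride) if use_1x1conv else None
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.bn2 = nn.BatchNorm2d(out_channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            if self.conv3 is not None:
                x = self.conv3(x)
            return F.relu(y + x)  # f(x) = x + g(x)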

Continue reading

BatchNorm

1. Preface: As the neural network grows deeper, a few problems arise: the loss is computed at the end, so the top layers train faster while the bottom layers train slowly; the data enters at the bottom, so whenever the bottom layers update, all the layers above them have to change; the top layers are therefore retrained many times, which slows convergence.
2. BatchNorm: Fix each mini-batch's mean and variance: $$\mu_B=\frac{1}{|B|}\sum_{i\in B}x_i$$ and $$\sigma_{B}^{2}=\frac{1}{|B|}\sum_{i\in B}(x_i-\mu_B)^2+\epsilon$$ then apply additional adjustments (trainable parameters):
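A small sketch of the normalization step for a mini-batch of 2-D feature maps, assuming the trainable parameters mentioned above are a per-channel scale gamma and shift beta; a real layer (e.g. nn.BatchNorm2d) also keeps running statistics for inference.

    import torch

    def batch_norm_2d(x, gamma, beta, eps=1e-5):
        # x: (N, C, H, W); per-channel mean and variance over the mini-batch
        mu = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mu) / torch.sqrt(var + eps)   # normalize with sigma_B^2 + eps
        return gamma * x_hat + beta                # trainable scale and shift

    x = torch.randn(8, 3, 4, 4)
    gamma = torch.ones(1, 3, 1, 1)   # nn.Parameter in a real layer
    beta = torch.zeros(1, 3, 1, 1)
    y = batch_norm_2d(x, gamma, beta)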

Continue reading

GoogLeNet

1. Inception Block (v1): Extract information along four parallel paths, then concatenate the results along the output-channel dimension. It has fewer parameters and lower time complexity, and the channels are assigned to the paths according to their significance:
1x1 Conv (64)
1x1 Conv (96); 3x3 Conv, pad 1 (128)
1x1 Conv (16); 5x5 Conv, pad 2 (32)
3x3 MaxPool, pad 1; 1x1 Conv (32)
2. Architecture: You can see that there are many hyper-parameters (the number of channels per path).
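A sketch of a four-path Inception-style block matching the path layout listed above; the ReLU after each convolution and the argument names are assumptions.

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Inception(nn.Module):
        # c1..c4 are per-path output channels, e.g. c1=64, c2=(96, 128), c3=(16, 32), c4=32
        def __init__(self, in_channels, c1, c2, c3, c4):
            super().__init__()
            self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
            self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
            self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
            self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
            self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
            self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

        def forward(self, x):
            p1 = F.relu(self.p1_1(x))
            p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
            p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
            p4 = F.relu(self.p4_2(self.p4_1(x)))
            return torch.cat((p1, p2, p3, p4), dim=1)  # concatenate on the channel axis

    # the first Inception stage of GoogLeNet takes a 192-channel input
    blk = Inception(192, 64, (96, 128), (16, 32), 32)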

Continue reading

NiN

1. NiN block: Since dense layers have so many parameters, we replace them with 1x1 convolutions; the spatial shape does not change under a 1x1 convolution.
Conv2d (the same as AlexNet, ReLU)
1x1 Conv, stride 1, no pad (out_channel = in_channel, ReLU)
1x1 Conv, stride 1, no pad (out_channel = in_channel, ReLU)
2. Architecture: NiN's architecture is based on AlexNet, but without dense layers.
NiN block
3x3 MaxPool, stride 2 (half size)
…
Dropout(0.
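A minimal sketch of the NiN block described above: one ordinary convolution followed by two 1x1 convolutions that keep the channel count. The function name, signature, and the example channel numbers are assumptions.

    from torch import nn

    def nin_block(in_channels, out_channels, kernel_size, stride, padding):
        # ordinary conv, then two 1x1 convs with out_channel = in_channel
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        )

    # e.g. a first block mirroring AlexNet's 11x11, stride-4 convolution
    block = nin_block(3, 96, kernel_size=11, stride=4, padding=0)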

Continue reading

VGG

1. VGG block:
3x3 Conv, pad 1 (n layers, m channels, usually doubled per block, ReLU)
2x2 MaxPool, stride 2 (halves the size per block)
It turns out that a deeper stack of 3x3 convolutions works better than a 5x5 convolution.
2. Architecture:
multiple VGG blocks
Dense (4096) (Flatten, Linear, ReLU, Dropout)
Dense (4096) (Linear, ReLU, Dropout)
Dense (1000)
3. Code:

    import torch
    from torch import nn

    def vgg_block(num_conv, in_channels, out_channels) -> nn.Sequential:
        layers: List[nn.Module] = []
        for _ in range(num_conv):
            layers.
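The code excerpt above is cut off mid-statement; a plausible completion, assuming the standard pattern of num_conv 3x3 convolutions (each followed by ReLU) and a final 2x2 max pool, with the missing `from typing import List` added:

    from typing import List

    import torch
    from torch import nn

    def vgg_block(num_conv: int, in_channels: int, out_channels: int) -> nn.Sequential:
        layers: List[nn.Module] = []
        for _ in range(num_conv):
            layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            in_channels = out_channels
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve the spatial size
        return nn.Sequential(*layers)

    # e.g. a block of two 3x3 convs going from 64 to 128 channels
    block = vgg_block(2, 64, 128)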

Continue reading

AlexNet

1. AlexNet: Compared with LeNet, it makes a few changes:
add noise to avoid overfitting (data augmentation)
bigger and deeper
use dropout (regularization)
AvgPooling -> MaxPooling
Sigmoid -> ReLU
2. Code:

    from torch import nn

    net = nn.Sequential(
        nn.Conv2d(3,96,kernel_size=11,stride=4,padding=2),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),
        nn.Conv2d(96,128*2,kernel_size=5,padding=2),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),
        nn.Conv2d(128*2,192*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.Conv2d(192*2,192*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.Conv2d(192*2,128*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),  # 6*6*256
        nn.Flatten(),
        nn.Linear(6*6*256,2048*2),nn.ReLU(),nn.Dropout(p=0.5),
        nn.Linear(2048*2,2048*2),nn.ReLU(),nn.Dropout(p=0.5),
        nn.Linear(2048*2,1000),
    )

3. Q&A: The features a CNN extracts are geared only toward the final classification task and are mostly hard for humans to understand, so the model's interpretability is poor. The last two 4096-unit fully connected layers are necessary. A good name.

Continue reading

CNN

1. Concept: Applying 'translation invariance' and 'locality' to the MLP, we then get a CNN which can significantly reduce the parameters. $$h_{i,j} = \sum_{a,b}v_{i,j,a,b}x_{i+a,j+b}$$ $$\Rightarrow h_{i,j} = \sum_{a=-\Delta}^{\Delta}\sum_{b=-\Delta}^{\Delta}v_{a,b}x_{i+a,j+b}$$
2. Conv2d
2.1 Definition: Input $$X: (N, C_{in}, H, W)$$ Kernel $$W: (h,w)$$ Bias $$b$$ Output $$Y: (N, C_{out}, H', W')$$ $$Y=X\star W+b$$
2.2 Cross Correlation: $$y_{i,j} = \sum_{a=1}^{h}\sum_{b=1}^{w}w_{a,b}x_{i+a,j+b}$$
2.3 Conv2d: $$y_{i,j} = \sum_{a=1}^{h}\sum_{b=1}^{w}w_{-a,-b}x_{i+a,j+b}$$
As for implementation, we use 'Cross Correlation' but call it 'Conv2d'.
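A small sketch of the 2-D cross-correlation above for a single channel, written as explicit loops; the function name corr2d is an assumption (PyTorch's nn.Conv2d computes the same thing, vectorized, with bias and multi-channel handling).

    import torch

    def corr2d(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        # y[i, j] = sum_{a,b} k[a, b] * x[i + a, j + b]
        h, w = k.shape
        y = torch.zeros(x.shape[0] - h + 1, x.shape[1] - w + 1)
        for i in range(y.shape[0]):
            for j in range(y.shape[1]):
                y[i, j] = (x[i:i + h, j:j + w] * k).sum()
        return y

    x = torch.arange(9.0).reshape(3, 3)
    k = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
    print(corr2d(x, k))  # 2x2 output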

Continue reading


LI WEI

If you can renew yourself for one day, do so day after day, and let there be daily renewal.

Not yet

Tokyo