Deep-Learning

ResNet

1. Preface: Does a deeper model always perform better? Not really!
2. ResNet Block: Similar to VGG and GoogLeNet, the ResNet block makes deep neural networks easy to train: $$f(x)= x + g(x)$$
3. Q&A
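A minimal PyTorch sketch of a residual block consistent with $$f(x)=x+g(x)$$; the two 3x3 convolutions, the BatchNorm placement, and the optional 1x1 convolution for matching shapes are assumptions rather than details given in the excerpt.

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Residual(nn.Module):
        """Residual block: output = ReLU(x + g(x)), with g = conv-bn-relu-conv-bn."""
        def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
            self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            # optional 1x1 conv so the skip connection matches g(x) in shape
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride) if use_1x1conv else None
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.bn2 = nn.BatchNorm2d(out_channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            if self.conv3 is not None:
                x = self.conv3(x)
            return F.relu(y + x)  # f(x) = x + g(x)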

Continue reading

BatchNorm

1. Preface: As the neural network grows deeper, a few problems arise: the loss is computed at the end, so the top layers train faster while the bottom layers train slowly; the data enters at the bottom, so whenever the bottom layers update, all the layers above them have to change; the top layers are therefore retrained many times, which slows convergence.
2. BatchNorm: Fix each mini-batch's mean and variance: $$\mu_B=\frac{1}{|B|}\sum_{i\in B}x_i$$ and $$\sigma_{B}^{2}=\frac{1}{|B|}\sum_{i\in B}(x_i-\mu_B)^2+\epsilon$$ then apply additional adjustments (trainable parameters):
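A small sketch of the normalization step for a mini-batch of 2-D feature maps, assuming the trainable parameters mentioned above are a per-channel scale gamma and shift beta; a real layer (e.g. nn.BatchNorm2d) also keeps running statistics for inference.

    import torch

    def batch_norm_2d(x, gamma, beta, eps=1e-5):
        # x: (N, C, H, W); per-channel mean and variance over the mini-batch
        mu = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mu) / torch.sqrt(var + eps)   # normalize with sigma_B^2 + eps
        return gamma * x_hat + beta                # trainable scale and shift

    x = torch.randn(8, 3, 4, 4)
    gamma = torch.ones(1, 3, 1, 1)   # nn.Parameter in a real layer
    beta = torch.zeros(1, 3, 1, 1)
    y = batch_norm_2d(x, gamma, beta)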

Continue reading

GoogLeNet

1. Inception Block (v1): Extract information along four parallel paths, then concatenate the results along the output-channel dimension. It has fewer parameters and lower time complexity, and the channels are assigned to the paths according to their significance:
1x1 Conv (64)
1x1 Conv (96); 3x3 Conv, pad 1 (128)
1x1 Conv (16); 5x5 Conv, pad 2 (32)
3x3 MaxPool, pad 1; 1x1 Conv (32)
2. Architecture: You can see that there are many hyper-parameters (the number of channels per path).
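A sketch of a four-path Inception-style block matching the path layout listed above; the ReLU after each convolution and the argument names are assumptions.

    import torch
    from torch import nn
    from torch.nn import functional as F

    class Inception(nn.Module):
        # c1..c4 are per-path output channels, e.g. c1=64, c2=(96, 128), c3=(16, 32), c4=32
        def __init__(self, in_channels, c1, c2, c3, c4):
            super().__init__()
            self.p1_1 = nn.Conv2d(in_channels, c1, kernel_size=1)
            self.p2_1 = nn.Conv2d(in_channels, c2[0], kernel_size=1)
            self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
            self.p3_1 = nn.Conv2d(in_channels, c3[0], kernel_size=1)
            self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
            self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
            self.p4_2 = nn.Conv2d(in_channels, c4, kernel_size=1)

        def forward(self, x):
            p1 = F.relu(self.p1_1(x))
            p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
            p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
            p4 = F.relu(self.p4_2(self.p4_1(x)))
            return torch.cat((p1, p2, p3, p4), dim=1)  # concatenate on the channel axis

    # the first Inception stage of GoogLeNet takes a 192-channel input
    blk = Inception(192, 64, (96, 128), (16, 32), 32)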

Continue reading

NiN

1. NiN block: Since dense layers have so many parameters, we replace them with 1x1 convolutions; the spatial shape does not change under a 1x1 convolution.
Conv2d (the same as AlexNet, ReLU)
1x1 Conv, stride 1, no pad (out_channel = in_channel, ReLU)
1x1 Conv, stride 1, no pad (out_channel = in_channel, ReLU)
2. Architecture: NiN's architecture is based on AlexNet, but without dense layers.
NiN block
3x3 MaxPool, stride 2 (half size)
…
Dropout(0.
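A minimal sketch of the NiN block described above: one ordinary convolution followed by two 1x1 convolutions that keep the channel count. The function name, signature, and the example channel numbers are assumptions.

    from torch import nn

    def nin_block(in_channels, out_channels, kernel_size, stride, padding):
        # ordinary conv, then two 1x1 convs with out_channel = in_channel
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        )

    # e.g. a first block mirroring AlexNet's 11x11, stride-4 convolution
    block = nin_block(3, 96, kernel_size=11, stride=4, padding=0)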

Continue reading

VGG

1. VGG block:
3x3 Conv, pad 1 (n layers, m channels, usually doubled per block, ReLU)
2x2 MaxPool, stride 2 (halves the size per block)
It turns out that a deeper stack of 3x3 convolutions works better than a 5x5 convolution.
2. Architecture:
multiple VGG blocks
Dense (4096) (Flatten, Linear, ReLU, Dropout)
Dense (4096) (Linear, ReLU, Dropout)
Dense (1000)
3. Code:

    import torch
    from torch import nn

    def vgg_block(num_conv, in_channels, out_channels) -> nn.Sequential:
        layers: List[nn.Module] = []
        for _ in range(num_conv):
            layers.
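The code excerpt above is cut off mid-statement; a plausible completion, assuming the standard pattern of num_conv 3x3 convolutions (each followed by ReLU) and a final 2x2 max pool, with the missing `from typing import List` added:

    from typing import List

    import torch
    from torch import nn

    def vgg_block(num_conv: int, in_channels: int, out_channels: int) -> nn.Sequential:
        layers: List[nn.Module] = []
        for _ in range(num_conv):
            layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
            layers.append(nn.ReLU())
            in_channels = out_channels
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve the spatial size
        return nn.Sequential(*layers)

    # e.g. a block of two 3x3 convs going from 64 to 128 channels
    block = vgg_block(2, 64, 128)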

Continue reading

AlexNet

1. AlexNet: Compared with LeNet, it makes a few changes:
add noise to avoid overfitting (data augmentation)
bigger and deeper
use dropout (regularization)
AvgPooling -> MaxPooling
Sigmoid -> ReLU
2. Code:

    from torch import nn

    net = nn.Sequential(
        nn.Conv2d(3,96,kernel_size=11,stride=4,padding=2),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),
        nn.Conv2d(96,128*2,kernel_size=5,padding=2),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),
        nn.Conv2d(128*2,192*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.Conv2d(192*2,192*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.Conv2d(192*2,128*2,kernel_size=3,padding=1),nn.ReLU(),
        nn.MaxPool2d(kernel_size=3,stride=2),  # 6*6*256
        nn.Flatten(),
        nn.Linear(6*6*256,2048*2),nn.ReLU(),nn.Dropout(p=0.5),
        nn.Linear(2048*2,2048*2),nn.ReLU(),nn.Dropout(p=0.5),
        nn.Linear(2048*2,1000),
    )

3. Q&A: The features a CNN extracts are geared only toward the final classification task and are mostly hard for humans to understand, so the model's interpretability is poor. The last two 4096-unit fully connected layers are necessary. A good name.

Continue reading

CNN

1. Concept: Applying 'translation invariance' and 'locality' to the MLP, we then get a CNN which can significantly reduce the parameters. $$h_{i,j} = \sum_{a,b}v_{i,j,a,b}x_{i+a,j+b}$$ $$\Rightarrow h_{i,j} = \sum_{a=-\Delta}^{\Delta}\sum_{b=-\Delta}^{\Delta}v_{a,b}x_{i+a,j+b}$$
2. Conv2d
2.1 Definition: Input $$X: (N, C_{in}, H, W)$$ Kernel $$W: (h,w)$$ Bias $$b$$ Output $$Y: (N, C_{out}, H', W')$$ $$Y=X\star W+b$$
2.2 Cross Correlation: $$y_{i,j} = \sum_{a=1}^{h}\sum_{b=1}^{w}w_{a,b}x_{i+a,j+b}$$
2.3 Conv2d: $$y_{i,j} = \sum_{a=1}^{h}\sum_{b=1}^{w}w_{-a,-b}x_{i+a,j+b}$$
As for implementation, we use 'Cross Correlation' but call it 'Conv2d'.
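A small sketch of the 2-D cross-correlation above for a single channel, written as explicit loops; the function name corr2d is an assumption (PyTorch's nn.Conv2d computes the same thing, vectorized, with bias and multi-channel handling).

    import torch

    def corr2d(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        # y[i, j] = sum_{a,b} k[a, b] * x[i + a, j + b]
        h, w = k.shape
        y = torch.zeros(x.shape[0] - h + 1, x.shape[1] - w + 1)
        for i in range(y.shape[0]):
            for j in range(y.shape[1]):
                y[i, j] = (x[i:i + h, j:j + w] * k).sum()
        return y

    x = torch.arange(9.0).reshape(3, 3)
    k = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
    print(corr2d(x, k))  # 2x2 output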

Continue reading


LI WEI

If you can renew yourself for one day, do so day after day, and let there be daily renewal.

Not yet

Tokyo