1. Periodic Continuous-Time Signals Given two periodic CT signals
$$\forall t, x_1(t+T_1) = x_1(t)$$
and
$$\forall t, x_2(t+T_2) = x_2(t)$$
if their sum is periodic with some period $$T$$, then
$$\forall t, x_1(t+T)+x_2(t+T)=x_1(t)+x_2(t)$$
this is satisfied if and only if
$$\exists\, p,q \in \mathbb{N}^*,\; T=pT_1=qT_2$$
in other words, the sum is periodic if and only if the ratio of the two periods is rational,
$$\frac{T_1}{T_2}=\frac{q}{p}\in \mathbb{Q}$$
and choosing p and q coprime gives the smallest such T.
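As a quick worked example (my own illustration, not from the original notes): take $$x_1(t)=\cos(2\pi t)$$ and $$x_2(t)=\cos(3\pi t)$$, so $$T_1=1$$ and $$T_2=\tfrac{2}{3}$$. Then
$$\frac{T_1}{T_2}=\frac{3}{2}\in\mathbb{Q},\qquad T=2T_1=3T_2=2$$
so the sum is periodic with period 2 (here $$p=2$$ and $$q=3$$ are coprime).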
2. Sifting Property Note that $$\forall \lambda \neq t_1, \delta(\lambda-t_1)=0$$
it follows that
$$\int_{-\infty}^{+\infty}f(\lambda)\delta(\lambda-t_1)d\lambda=\int_{t_1-\epsilon}^{t_1+\epsilon}f(\lambda)\delta(\lambda-t_1)d\lambda=f(t_1)$$
where $$f(t)$$ is continuous at $$t=t_1$$ and $$\epsilon>0$$ is arbitrary
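For instance (an illustrative example, not taken from the notes):
$$\int_{-\infty}^{+\infty}e^{-\lambda^2}\delta(\lambda-2)\,d\lambda=e^{-4}$$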
1. Review of Entropy Entropy represents the average amount of self-information. It is computed as follows:
$$H(X) = E\left[\log\frac{1}{p_i}\right] = -\sum_{i=1}^{n}p_i\log p_i$$
For example, suppose we have the probability distribution $$X = [0.7, 0.3]$$. Then we have
$$H(X) = -0.7\log_2 0.7 - 0.3\log_2 0.3 \approx 0.88\ \text{bit}$$
What’s more, we can easily see that $$0\le H(X) \le \log n$$ always holds.
The left inequality is immediate: since $$0\le p_i\le 1$$, each term $$-p_i\log p_i\ge 0$$; equality holds if and only if some $$p_i=1$$ (and the rest are 0). The right inequality follows from Jensen's inequality applied to the concave function $$\log$$, with equality for the uniform distribution $$p_i=1/n$$.
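A minimal Python sketch to check these numbers (the helper name `entropy` is mine, not from the notes):

```python
import math

def entropy(p):
    """Shannon entropy in bits: H(X) = -sum_i p_i * log2(p_i)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.7, 0.3]))   # ~0.8813, i.e. about 0.88 bit
print(entropy([0.5, 0.5]))   # 1.0 = log2(2), the upper bound for n = 2
print(entropy([1.0]))        # 0.0, the lower bound
```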
When I was in college, I bought a pair of scissors at the supermarket. It’s durable and handy. Every time I use it to cut things, an inexplicable “sense of fulfillment” floods into my heart. Although it is merely an ordinary pair of scissors, the more I use it, the more I seem to “rely” on it. Why do I say that? Because many times I spend a lot of time looking for it, even though there are several other pairs of scissors at home.
1. Preface Does a deeper model always perform better? Not really!
2. ResNet Block Similar to VGG and GoogLeNet, ResNet is built from repeated blocks; its residual connection makes deep neural networks easier to train.
$$f(x)= x + g(x)$$
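A minimal PyTorch sketch of such a block, assuming equal input and output channels and no 1x1 shortcut (the class name `Residual` and the layer choices are illustrative, not the exact block from these notes):

```python
import torch
from torch import nn
import torch.nn.functional as F

class Residual(nn.Module):
    """A basic residual block: output = ReLU(x + g(x))."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))   # first half of g(x)
        y = self.bn2(self.conv2(y))           # second half of g(x)
        return F.relu(x + y)                  # f(x) = x + g(x)

x = torch.rand(1, 8, 16, 16)
print(Residual(8)(x).shape)   # torch.Size([1, 8, 16, 16])
```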
3. Q&A
1. Preface As neural networks grow deeper, a few problems arise:
- The loss is computed at the end of the network, so the top layers train faster.
- The bottom layers, which are close to the data, train slowly.
- Whenever the bottom layers update, all the layers above them have to adapt.
- The top layers therefore end up retraining many times, which leads to slower convergence.

2. BatchNorm Fix each mini-batch's mean and variance:
$$\mu_B=\frac{1}{|B|}\sum_{i\in B}x_i$$
and
$$\sigma_{B}^{2}=\frac{1}{|B|}\sum_{i\in B}(x_i-\mu_B)^2+\epsilon$$
additional adjustments (trainable parameters $$\gamma$$ and $$\beta$$):
$$x_{i+1}=\gamma\,\frac{x_i-\mu_B}{\sigma_B}+\beta$$
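A sketch of the training-time computation for an input of shape (batch, features), directly following the formulas above; the function name is mine, and running statistics for inference are omitted:

```python
import torch

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-time batch norm for input of shape (batch, features)."""
    mu = x.mean(dim=0)                          # mu_B
    var = x.var(dim=0, unbiased=False) + eps    # sigma_B^2 (eps for stability)
    x_hat = (x - mu) / torch.sqrt(var)          # normalize each feature
    return gamma * x_hat + beta                 # trainable scale and shift

x = torch.randn(32, 4)
gamma, beta = torch.ones(4), torch.zeros(4)
y = batch_norm_train(x, gamma, beta)
print(y.mean(dim=0))                            # ~0 per feature
print(y.var(dim=0, unbiased=False))             # ~1 per feature
```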
1. Inception Block (v1) Extract information along 4 different paths, then concatenate the results along the output-channel dimension. Compared with a single large convolution, it has fewer parameters and lower time complexity. The channels are assigned to each path according to its significance.
- Path 1: 1x1 Conv (64)
- Path 2: 1x1 Conv (96); 3x3 Conv, pad 1 (128)
- Path 3: 1x1 Conv (16); 5x5 Conv, pad 2 (32)
- Path 4: 3x3 MaxPool, pad 1; 1x1 Conv (32)

2. Architecture You can see that there are many hyper-parameters (the number of channels in each path); a code sketch of the block above is given below.
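A sketch of the Inception block with the channel counts listed above (biases, BatchNorm, and other details are simplified; the class name and the example input size are illustrative):

```python
import torch
from torch import nn
import torch.nn.functional as F

class Inception(nn.Module):
    """Inception v1 block with the channel counts 64, 96->128, 16->32, pool->32."""
    def __init__(self, in_channels):
        super().__init__()
        # Path 1: 1x1 conv
        self.p1_1 = nn.Conv2d(in_channels, 64, kernel_size=1)
        # Path 2: 1x1 conv then 3x3 conv
        self.p2_1 = nn.Conv2d(in_channels, 96, kernel_size=1)
        self.p2_2 = nn.Conv2d(96, 128, kernel_size=3, padding=1)
        # Path 3: 1x1 conv then 5x5 conv
        self.p3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.p3_2 = nn.Conv2d(16, 32, kernel_size=5, padding=2)
        # Path 4: 3x3 max pool then 1x1 conv
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_channels, 32, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        return torch.cat((p1, p2, p3, p4), dim=1)  # concatenate along channels

x = torch.rand(1, 192, 28, 28)
print(Inception(192)(x).shape)   # torch.Size([1, 256, 28, 28]) = 64+128+32+32
```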
1. NiN block Since dense (fully connected) layers have so many parameters, NiN replaces them with 1x1 convolutions. The spatial shape does not change when a 1x1 convolution is used.
- Conv2d (same settings as in AlexNet, ReLU)
- 1x1 Conv, stride 1, no pad (out_channels = in_channels, ReLU)
- 1x1 Conv, stride 1, no pad (out_channels = in_channels, ReLU)

A code sketch of this block is given at the end of this note.

2. Architecture NiN's architecture is based on AlexNet, but it does not use dense layers.
- NiN block
- 3x3 MaxPool, stride 2 (halves the spatial size)
- …
- Dropout(0.
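A sketch of the NiN block described in section 1, assuming AlexNet-style settings for the first convolution as an example (11x11 kernel, stride 4; these parameters are my assumption, not stated in the notes):

```python
import torch
from torch import nn

def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    """One ordinary convolution followed by two 1x1 convolutions
    that act as per-pixel 'dense' layers."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=1), nn.ReLU())

x = torch.rand(1, 1, 224, 224)
blk = nin_block(1, 96, kernel_size=11, stride=4, padding=0)
print(blk(x).shape)   # torch.Size([1, 96, 54, 54]); the 1x1 convs keep the spatial size
```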