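Continuing from the previous tutorial, I'll show you how to implement the formulas covered earlier in Python, step by step. I'm using IPython Notebook for this, and notebooks like this are a big part of why I fell in love with Python. The code and the explanation of the formulas sit side by side, so you can compare them, and the plots from your experiments show up right in the notebook. It's an absolute lifesaver XD. Next to each formula I've put the matching code. If anything is still unclear, feel free to leave a comment ^.^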
In [1]:
# Widen the notebook's content area to the full browser width
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
In [2]:
import numpy as np
Set the number of neurons in each layer and initialize the parameters
In [3]:
sizes=[2,3,1]
num_layers = len(sizes)
biases = [np.random.randn(y, 1) for y in sizes[1:]]  # the input layer has no biases
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]  # shapes (3, 2) and (1, 3)
In [4]:
np.random.randn(2, 3) #Return a sample (or samples) from the “standard normal” distribution.
Out[4]:
- The first array holds the hidden layer's biases, and the second holds the output neuron's bias.
In [5]:
biases
Out[5]:
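- The first array holds the weights between the input layer and the hidden layer.
- The second array holds the weights between the hidden layer and the output layer.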
In [6]:
weights
Out[6]:
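Here is a small sketch that makes the shapes concrete. It wraps the same initialization in a helper and prints the resulting shapes. The helper name `init_params`, its `seed` parameter and the `demo_` variable names are my own. I chose them so the cell does not overwrite the notebook's `biases` and `weights`. Each weight matrix has one row per neuron in the next layer and one column per neuron in the previous layer. That is the orientation `np.dot(w, activation)` needs later.

```python
import numpy as np

def init_params(sizes, seed=None):
    """Hypothetical helper mirroring the initialization cell above.

    biases[i]  has shape (sizes[i+1], 1): one bias per neuron, none for the input layer
    weights[i] has shape (sizes[i+1], sizes[i]): rows = next layer, columns = previous layer
    """
    rng = np.random.RandomState(seed)
    biases = [rng.randn(y, 1) for y in sizes[1:]]
    weights = [rng.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]
    return biases, weights

demo_biases, demo_weights = init_params([2, 3, 1], seed=0)
print([b.shape for b in demo_biases])   # [(3, 1), (1, 1)]
print([w.shape for w in demo_weights])  # [(3, 2), (1, 3)]
```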
Prepare arrays to store the computed partial derivatives
In [7]:
nabla_b = [np.zeros(b.shape) for b in biases]
nabla_b
Out[7]:
In [8]:
nabla_w = [np.zeros(w.shape) for w in weights]
nabla_w
Out[8]:
Define the functions
In [9]:
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))
def sigmoid_prime(z):
    """Derivative of the sigmoid function: sigmoid(z) * (1 - sigmoid(z))."""
    return sigmoid(z)*(1-sigmoid(z))
def cost_derivative(output_activations, y):
    r"""Return the vector of partial derivatives \partial C_x / \partial a for the output activations."""
    return (output_activations-y)
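`sigmoid_prime` relies on the identity σ'(z) = σ(z)(1 − σ(z)). A quick way to check it is to compare it with a central finite difference. This sanity check is my own addition, not part of the original notebook. The grid of z values and the step size `eps` are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z_grid = np.linspace(-5, 5, 11)   # arbitrary test points
eps = 1e-5                        # arbitrary small step

# Central difference: (f(z + eps) - f(z - eps)) / (2 * eps) approximates f'(z)
numeric = (sigmoid(z_grid + eps) - sigmoid(z_grid - eps)) / (2 * eps)
max_err = np.max(np.abs(numeric - sigmoid_prime(z_grid)))

print(sigmoid_prime(0.0))  # 0.25, the largest value the derivative can take
print(max_err)             # should be tiny, far below 1e-8
```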
Create training data x, y
In [10]:
np.random.seed(1)
x = 10 * np.random.randn(sizes[0], 1)
y = np.array([1])
print(x)
print(y)
Feedforward network
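Feedforward in matrix form: the weight matrix W (hidden-layer size × input-layer size) times the input matrix X (input-layer size × N, one column per sample) gives a (hidden-layer size × N) matrix of z values. Row r of W holds the weights into hidden neuron r, and column c holds the weights from input neuron c. For a single sample (N = 1):
$$
\begin{bmatrix} w_{0,0} & w_{0,1} \\ w_{1,0} & w_{1,1} \\ w_{2,0} & w_{2,1} \end{bmatrix}
\begin{bmatrix} x_{0} \\ x_{1} \end{bmatrix}
+ \begin{bmatrix} b_{0} \\ b_{1} \\ b_{2} \end{bmatrix}
= \begin{bmatrix} w_{0,0}x_{0} + w_{0,1}x_{1} + b_{0} \\ w_{1,0}x_{0} + w_{1,1}x_{1} + b_{1} \\ w_{2,0}x_{0} + w_{2,1}x_{1} + b_{2} \end{bmatrix}
$$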
In [11]:
activation = x
activations = [x] # list to store all the activations, layer by layer
zs = [] # list to store all the z vectors, layer by layer
for b, w in zip(biases, weights):
    z = np.dot(w, activation)+b
    zs.append(z)
    activation = sigmoid(z)
    activations.append(activation)
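The matrix picture above allows N input samples stacked as columns of X, while the notebook feeds a single sample (N = 1). The sketch below uses made-up random values and my own variable names. It checks that computing all columns at once gives the same z values as feeding the samples one by one. NumPy broadcasting adds the (3, 1) bias vector to every column.

```python
import numpy as np

rng = np.random.RandomState(0)
W_h = rng.randn(3, 2)       # hidden-layer weights: 3 neurons, 2 inputs each
b_h = rng.randn(3, 1)       # one bias per hidden neuron
X_batch = rng.randn(2, 4)   # N = 4 samples, one per column

# All samples at once: (3, 2) @ (2, 4) -> (3, 4); b_h is broadcast across the columns
Z_batch = np.dot(W_h, X_batch) + b_h

# One sample at a time, as in the notebook (X_batch[:, [n]] keeps the (2, 1) column shape)
Z_cols = np.hstack([np.dot(W_h, X_batch[:, [n]]) + b_h for n in range(X_batch.shape[1])])

print(Z_batch.shape)                 # (3, 4)
print(np.allclose(Z_batch, Z_cols))  # True
```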
- The z values of each layer's neurons. zs holds two arrays: zs[0] is the hidden layer's z values, and zs[1] is the output layer's.
In [12]:
for layer_z in zs:  # the arrays have different shapes, so print them one at a time
    print(layer_z)
- The activation values a of each layer's neurons: the input x, then the hidden layer, then the output.
In [13]:
for layer_a in activations:  # the arrays have different shapes, so print them one at a time
    print(layer_a)
Backpropagation
- Compute the sensitivity (delta) of the last layer $$\delta_{j}^{L} = \frac{\partial E}{\partial a_{j}^{L}}\ f'\left( z_{j}^{L} \right)$$
In [14]:
delta = cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
delta
Out[14]:
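Here is the formula with actual numbers, in an example you can check by hand. It assumes the quadratic error E = ½(a − y)², whose derivative ∂E/∂a = a − y is what `cost_derivative` returns. It picks z = 0 and a target of 1 only to keep the arithmetic simple. With those values, a = σ(0) = 0.5, ∂E/∂a = −0.5 and f'(0) = 0.25, so δ = −0.125.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

z_L = np.array([[0.0]])      # output neuron's weighted input, chosen for easy arithmetic
a_L = sigmoid(z_L)           # sigmoid(0) = 0.5
y_target = np.array([[1.0]]) # desired output

dE_da = a_L - y_target                  # -0.5, assuming E = 0.5 * (a - y)^2
delta_L = dE_da * sigmoid_prime(z_L)    # -0.5 * 0.25 = -0.125
print(delta_L)                          # [[-0.125]]
```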
- The derivative of the total error with respect to the output layer's bias is simply the last layer's sensitivity $$\frac{\partial E}{\partial b_{j}^{L}} = \delta_{j}^{L}$$
In [15]:
nabla_b[-1] = delta
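Why is ∂E/∂b just δ? Because z = w·a + b, so ∂z/∂b = 1, and the chain rule leaves ∂E/∂b = ∂E/∂a · f'(z) = δ. The sketch below checks this numerically for a single sigmoid output neuron with random stand-in values. Like the previous example, it assumes the quadratic error E = ½(a − y)². The helper `error_of_bias` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

rng = np.random.RandomState(1)
a_prev = rng.rand(3, 1)        # stand-in hidden-layer activations
w_out = rng.randn(1, 3)        # output-layer weights
b_out = rng.randn(1, 1)        # output-layer bias
y_target = np.array([[1.0]])

def error_of_bias(bias):
    """Hypothetical helper: assumed quadratic error E = 0.5 * (a - y)^2 as a function of the bias."""
    a = sigmoid(np.dot(w_out, a_prev) + bias)
    return 0.5 * np.sum((a - y_target) ** 2)

# Analytic gradient: delta = (a - y) * f'(z), and dE/db = delta
z_out = np.dot(w_out, a_prev) + b_out
delta_out = (sigmoid(z_out) - y_target) * sigmoid_prime(z_out)

# Numerical gradient by central difference
eps = 1e-6
numeric_db = (error_of_bias(b_out + eps) - error_of_bias(b_out - eps)) / (2 * eps)

print(delta_out[0, 0], numeric_db)  # the two values should agree closely
```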
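- Use the formula to compute the derivative of the total error with respect to the last layer's weights $$\frac{\partial E}{\partial w_{jk}^{l}} = a_{k}^{l-1}\delta_{j}^{l}$$
In matrix form, this multiplies the delta column vector (j × 1) by the transposed activations of the previous layer (1 × k): $$(\nabla_{w})_{j,k} = \delta_{j,1}\,(a^{T})_{1,k}$$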
In [16]:
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
nabla_w[-1]
Out[16]:
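The call `np.dot(delta, activations[-2].transpose())` is an outer product. A (j × 1) column times a (1 × k) row gives a (j × k) matrix whose (j, k) entry is a_k δ_j. This small sketch uses made-up numbers, with j = 2 so the two dimensions are easier to tell apart. It checks the result against an explicit double loop and against `np.outer`.

```python
import numpy as np

delta_demo = np.array([[0.2], [-0.1]])          # delta^l, shape (j, 1) with j = 2
a_demo = np.array([[0.5], [1.0], [-2.0]])       # a^{l-1}, shape (k, 1) with k = 3

grad_w = np.dot(delta_demo, a_demo.transpose()) # (2, 1) @ (1, 3) -> (2, 3)

# Element by element: dE/dw_jk = a_k^{l-1} * delta_j^l
explicit = np.zeros((2, 3))
for j in range(2):
    for k in range(3):
        explicit[j, k] = a_demo[k, 0] * delta_demo[j, 0]

print(grad_w.shape)                                       # (2, 3)
print(np.allclose(grad_w, explicit))                      # True
print(np.allclose(grad_w, np.outer(delta_demo, a_demo)))  # True
```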
- Compute the derivative of the activation function for the second-to-last layer $$f'\left( z_{k}^{L-1} \right)$$
In [17]:
z = zs[-2]
f_prime = sigmoid_prime(z)
print(f_prime)
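- Plug into the formula $$\delta_{k}^{l-1} = f'\left( z_{k}^{l-1} \right)\sum_{j}\delta_{j}^{l}w_{jk}^{l}$$
In matrix form, the transposed weights (k × j) times delta (j × 1), multiplied elementwise by f' (k × 1): $$\delta^{l-1}_{k,1} = \left(\sum_{j}(W^{T})_{k,j}\,\delta_{j,1}\right) f'_{k,1}$$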
In [18]:
delta_l_1 = np.dot(weights[-1].transpose(), delta) * f_prime
delta_l_1
Out[18]:
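The line `np.dot(weights[-1].transpose(), delta) * f_prime` folds the sum over j into a matrix product. Multiplying by the transposed weights sends each δ_j back along the same connections w_jk that carried the signal forward. The `*` is NumPy's elementwise product with f'. The sketch below uses random stand-in values and my own names, and compares the matrix form with the explicit sum from the formula.

```python
import numpy as np

rng = np.random.RandomState(2)
W_demo = rng.randn(2, 3)        # w^l_jk: 2 neurons in layer l, 3 in layer l-1
delta_demo = rng.randn(2, 1)    # delta^l, shape (j, 1)
fp_demo = rng.rand(3, 1)        # stand-in for f'(z^{l-1}), shape (k, 1)

# Matrix form used in the notebook: (W^T delta) times f', elementwise
delta_prev = np.dot(W_demo.transpose(), delta_demo) * fp_demo   # (3, 2) @ (2, 1) -> (3, 1)

# The same thing written as the explicit sum over j from the formula
explicit_prev = np.zeros((3, 1))
for k in range(3):
    s = sum(delta_demo[j, 0] * W_demo[j, k] for j in range(2))
    explicit_prev[k, 0] = fp_demo[k, 0] * s

print(delta_prev.shape)                        # (3, 1)
print(np.allclose(delta_prev, explicit_prev))  # True
```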
Store the computed delta in nabla_b
In [19]:
nabla_b[-2] = delta_l_1
print(nabla_b[-2])
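- Use the formula to compute the derivative of the total error with respect to the second-to-last layer's weights $$\frac{\partial E}{\partial w_{ki}^{l-1}} = a_{i}^{l-2}\delta_{k}^{l-1}$$
In matrix form: $$(\nabla_{w})_{k,i} = \delta_{k,1}\,(a^{T})_{1,i}$$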
In [20]:
nabla_w[-2] = np.dot(delta_l_1, activations[-2-1].transpose())
print(nabla_w[-2])
Put everything together in one function
In [21]:
def backprop(x, y):
    # Uses the global biases, weights and num_layers defined above
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # feedforward
    activation = x
    activations = [x] # list to store all the activations, layer by layer
    zs = [] # list to store all the z vectors, layer by layer
    for b, w in zip(biases, weights):
        z = np.dot(w, activation)+b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # backward pass
    delta = cost_derivative(activations[-1], y) * \
        sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # Note: l is indexed differently here than in the formulas:
    # l = 1 means the last layer, l = 2 the second-to-last layer, and so on
    for l in range(2, num_layers):
        z = zs[-l]
        f_prime = sigmoid_prime(z)
        delta = np.dot(weights[-l+1].transpose(), delta) * f_prime
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)
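A common way to confirm that a backprop implementation is correct is gradient checking. You nudge each bias and weight by a tiny ±eps, measure how the total error changes, and compare that slope with what backprop returns. This sketch is my own addition, not part of the original notebook. It assumes the quadratic error E = ½ Σ(a − y)², which is the error whose derivative is the `a - y` returned by `cost_derivative`. It also passes the parameters in explicitly instead of reading globals. The names `backprop_explicit`, `total_error`, `numerical_grads` and the `demo_` variables are hypothetical. I chose them so that running the cell does not overwrite the notebook's `backprop`, `biases` or `weights`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backprop_explicit(x, y, biases, weights):
    """Same steps as backprop above, but with the parameters passed in explicitly."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):          # feedforward
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])   # last layer
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    for l in range(2, len(weights) + 1):       # remaining layers, backwards
        delta = np.dot(weights[-l + 1].transpose(), delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
    return nabla_b, nabla_w

def total_error(x, y, biases, weights):
    """Assumed quadratic error E = 0.5 * sum((a - y)^2) of the network output."""
    a = x
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)

def numerical_grads(x, y, biases, weights, eps=1e-6):
    """Central-difference estimate of dE/dp for every bias and weight entry."""
    result = []
    for params in (biases, weights):
        grads = []
        for p in params:
            g = np.zeros(p.shape)
            for idx in np.ndindex(p.shape):
                old = p[idx]
                p[idx] = old + eps
                e_plus = total_error(x, y, biases, weights)
                p[idx] = old - eps
                e_minus = total_error(x, y, biases, weights)
                p[idx] = old                    # restore the original value
                g[idx] = (e_plus - e_minus) / (2 * eps)
            grads.append(g)
        result.append(grads)
    return result[0], result[1]

rng = np.random.RandomState(0)
demo_sizes = [2, 3, 1]
demo_biases = [rng.randn(n, 1) for n in demo_sizes[1:]]
demo_weights = [rng.randn(n, m) for m, n in zip(demo_sizes[:-1], demo_sizes[1:])]
demo_x = rng.randn(2, 1)
demo_y = np.array([[1.0]])

bp_b, bp_w = backprop_explicit(demo_x, demo_y, demo_biases, demo_weights)
num_b, num_w = numerical_grads(demo_x, demo_y, demo_biases, demo_weights)
max_diff = max(np.max(np.abs(g_bp - g_num))
               for g_bp, g_num in zip(bp_b + bp_w, num_b + num_w))
print(max_diff)   # should be tiny, far below 1e-6
```

If `max_diff` comes out large, a shape or index mix-up in the backward pass is the usual suspect.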
Pass the input in as x and the target output as y, and the backprop function computes the partial derivatives
In [22]:
nabla_b, nabla_w=backprop(x,y)
In [23]:
nabla_w
Out[23]:
In [24]:
nabla_b
Out[24]:
Summary of the key formulas
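Formula 1: the derivative of the total error with respect to the weights of layer l (applied first to the last layer) $$\frac{\partial E}{\partial w_{jk}^{l}} = a_{k}^{l-1}\delta_{j}^{l}$$
In matrix form: $$(\nabla_{w})_{j,k} = \delta_{j,1}\,(a^{T})_{1,k}$$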
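Formula 2: compute the current layer's delta from the delta of the layer after it, which was computed in the previous backward step $$\delta_{k}^{l-1} = f'\left( z_{k}^{l-1} \right)\sum_{j}\delta_{j}^{l}w_{jk}^{l}$$
In matrix form: $$\delta^{l-1}_{k,1} = \left(\sum_{j}(W^{T})_{k,j}\,\delta_{j,1}\right) f'_{k,1}$$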