
Neural Network Tutorial (Part 5): Building a Multi-Layer Network


Author: chen_h. WeChat & QQ: 862251340. WeChat public account: coderpai. Jianshu: https://www.jianshu.com/p/cb6...

This tutorial is a translation of Peter Roelants' neural network tutorial, published with the author's permission; here is the original.

This tutorial series introduces neural networks from scratch in five parts. You can find the complete series at the links below.

(1) Linear regression
Logistic classification function
(2) Logistic regression (classification)
(3) Hidden-layer design
Softmax classification function
(4) Vectorization
(5) Building a multi-layer network

Generalization to multiple layers

This part of the tutorial covers two topics:

Generalizing the network to multiple layers
Minibatch analysis for stochastic gradient descent

In this tutorial we generalize the feed-forward neural network to an arbitrary number of hidden layers. All of the concepts are described systematically in terms of matrix multiplications and non-linear transfer functions. To illustrate this generalization, we build a small network with two hidden layers that recognizes handwritten digits. The network is trained with stochastic gradient descent.

We start by importing the packages used in this tutorial.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, metrics
# train_test_split now lives in sklearn.model_selection (formerly sklearn.cross_validation)
from sklearn.model_selection import train_test_split
from matplotlib.colors import colorConverter, ListedColormap
import itertools
import collections

The handwritten digits dataset

In this tutorial we use the handwritten digits dataset that ships with scikit-learn. It contains 1797 images of 8×8 pixels each. For processing, the pixels of each image are flattened into a 64-dimensional vector. The figure below shows an example of each digit. Note that this dataset is not the same as MNIST: MNIST is a large dataset, whereas this one is quite small.
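
A quick sanity check of these dimensions (a minimal standalone sketch, using the same load_digits call as the code below):

import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
print(digits.images.shape)  # (1797, 8, 8): 1797 images of 8x8 pixels
print(digits.data.shape)    # (1797, 64): each image flattened into a 64-dimensional vector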

We first preprocess the dataset by splitting it into the following parts:

A training set, used to train the model (inputs: X_train, targets: T_train).
A validation set, used to evaluate the model's performance and to stop training once the model starts to overfit the training data (inputs: X_validation, targets: T_validation).
A test set, used for the final evaluation of the model (inputs: X_test, targets: T_test).

# load the data from scikit-learn.
digits = datasets.load_digits()

# Load the targets.
# Note that the targets are stored as digits, these need to be
#  converted to one-hot-encoding for the output softmax layer.
T = np.zeros((digits.target.shape[0], 10))
T[np.arange(len(T)), digits.target] += 1

# Divide the data into a train and test set.
X_train, X_test, T_train, T_test = train_test_split(digits.data, T, test_size=0.4)
# Divide the test set into a validation set and final test set.
X_validation, X_test, T_validation, T_test = train_test_split(X_test, T_test, test_size=0.5)

# Plot an example of each image.
fig = plt.figure(figsize=(10, 1), dpi=100)
for i in range(10):
    ax = fig.add_subplot(1, 10, i+1)
    ax.matshow(digits.images[i], cmap='binary')
    ax.axis('off')
plt.show()

Figure: an example image of each handwritten digit

Generalization of the network layers

In part 4 we designed a neural network in which each layer performs a linear transformation, implemented as a matrix multiplication, followed by a non-linear transfer function.
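
In matrix form, a hidden layer therefore computes (using the names of the code below, with $\sigma$ the logistic function applied element-wise):

$H = \sigma(X W + b)$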

The non-linear transfer function is applied to each neuron individually (element-wise), which keeps both the reasoning and the computation simple.
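
As a small illustration of this element-wise application (a standalone sketch; the logistic function is the same one defined in the code further below):

import numpy as np

def logistic(z):
    return 1. / (1. + np.exp(-z))

# Linear-layer output for 2 samples and 3 neurons; the logistic function
#  is applied to every element independently.
Z = np.array([[0.0, 2.0, -1.0],
              [1.0, -2.0, 3.0]])
print(logistic(Z))  # same shape as Z, each entry squashed into (0, 1)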

We implement three kinds of layers as Python classes:

A linear transformation layer: LinearLayer
A logistic function layer: LogisticLayer
A softmax output layer: SoftmaxOutputLayer

During the forward pass, each layer computes its output with the get_output method; this output serves as the input of the next layer. During the backward pass, the gradient at the input of each layer is computed with the get_input_grad method. For the last layer, this gradient is computed from the targets; for every other layer, it is computed from the gradient passed back by the layer above it. If a layer has trainable parameters, they are exposed by the get_params_iter method, and the corresponding gradients are returned, in the same order, by the get_params_grad method.

Note that in the softmax layer both the gradient and the cost are divided by the number of input samples. This makes the gradient and the cost independent of the number of samples, so changing the minibatch size does not require changing any other parameters.
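
A quick way to see this independence (a self-contained sketch that mirrors the get_cost method of the SoftmaxOutputLayer defined below):

import numpy as np

def cost(Y, T):
    # Cross-entropy cost, averaged over the samples (as in SoftmaxOutputLayer.get_cost)
    return -np.multiply(T, np.log(Y)).sum() / Y.shape[0]

Y = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
T = np.array([[1., 0., 0.],
              [0., 1., 0.]])
# Duplicating the batch doubles both the summed loss and the sample count,
#  so the averaged cost stays the same.
print(cost(Y, T), cost(np.tile(Y, (2, 1)), np.tile(T, (2, 1))))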

# Define the non-linear functions used
def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_deriv(y):  # Derivative of logistic function
    return np.multiply(y, (1 - y))

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)


# Define the layers used in this model
class Layer(object):
    """Base class for the different layers.
    Defines base methods and documentation of methods."""

    def get_params_iter(self):
        """Return an iterator over the parameters (if any).
        The iterator has the same order as get_params_grad.
        The elements returned by the iterator are editable in-place."""
        return []

    def get_params_grad(self, X, output_grad):
        """Return a list of gradients over the parameters.
        The list has the same order as the get_params_iter iterator.
        X is the input.
        output_grad is the gradient at the output of this layer."""
        return []

    def get_output(self, X):
        """Perform the forward step linear transformation.
        X is the input."""
        pass

    def get_input_grad(self, Y, output_grad=None, T=None):
        """Return the gradient at the inputs of this layer.
        Y is the pre-computed output of this layer (not needed in this case).
        output_grad is the gradient at the output of this layer
         (gradient at input of next layer).
        Output layer uses targets T to compute the gradient based on the
         output error instead of output_grad"""
        pass


class LinearLayer(Layer):
    """The linear layer performs a linear transformation to its input."""

    def __init__(self, n_in, n_out):
        """Initialize hidden layer parameters.
        n_in is the number of input variables.
        n_out is the number of output variables."""
        self.W = np.random.randn(n_in, n_out) * 0.1
        self.b = np.zeros(n_out)

    def get_params_iter(self):
        """Return an iterator over the parameters."""
        return itertools.chain(np.nditer(self.W, op_flags=['readwrite']),
                               np.nditer(self.b, op_flags=['readwrite']))

    def get_output(self, X):
        """Perform the forward step linear transformation."""
        return X.dot(self.W) + self.b

    def get_params_grad(self, X, output_grad):
        """Return a list of gradients over the parameters."""
        JW = X.T.dot(output_grad)
        Jb = np.sum(output_grad, axis=0)
        return [g for g in itertools.chain(np.nditer(JW), np.nditer(Jb))]

    def get_input_grad(self, Y, output_grad):
        """Return the gradient at the inputs of this layer."""
        return output_grad.dot(self.W.T)


class LogisticLayer(Layer):
    """The logistic layer applies the logistic function to its inputs."""

    def get_output(self, X):
        """Perform the forward step transformation."""
        return logistic(X)

    def get_input_grad(self, Y, output_grad):
        """Return the gradient at the inputs of this layer."""
        return np.multiply(logistic_deriv(Y), output_grad)


class SoftmaxOutputLayer(Layer):
    """The softmax output layer computes the classification probabilities at the output."""

    def get_output(self, X):
        """Perform the forward step transformation."""
        return softmax(X)

    def get_input_grad(self, Y, T):
        """Return the gradient at the inputs of this layer."""
        return (Y - T) / Y.shape[0]

    def get_cost(self, Y, T):
        """Return the cost at the output of this output layer."""
        return - np.multiply(T, np.log(Y)).sum() / Y.shape[0]

Sample model

In the next part we put the layers defined above together into a model: linear transformations between the layers, and non-linear activations at the neurons.

The sample model used in this tutorial is a neural network with two hidden layers, logistic functions as activation functions, and a softmax output layer for classification. The first hidden layer reduces the 64-dimensional input to 20 dimensions. The second hidden layer maps these 20 dimensions to another 20-dimensional representation. The final output layer produces the 10-dimensional classification output. The figure below illustrates this architecture:

Figure: the sample model architecture

The network is represented as a sequential model: the input of each layer is the output of the layer before it, and its own output becomes the input of the layer after it. The first layer sits at index 0 of the sequence, and the last layer at the final index.

# Define a sample model to be trained on the data
hidden_neurons_1 = 20  # Number of neurons in the first hidden-layer
hidden_neurons_2 = 20  # Number of neurons in the second hidden-layer
# Create the model
layers = []  # Define a list of layers
# Add first hidden layer
layers.append(LinearLayer(X_train.shape[1], hidden_neurons_1))
layers.append(LogisticLayer())
# Add second hidden layer
layers.append(LinearLayer(hidden_neurons_1, hidden_neurons_2))
layers.append(LogisticLayer())
# Add output layer
layers.append(LinearLayer(hidden_neurons_2, T_train.shape[1]))
layers.append(SoftmaxOutputLayer())

The backpropagation algorithm

The details of the forward and backward passes of backpropagation were explained in part 4; if anything is unclear, we recommend reviewing that part first. Here we simply implement backpropagation for the multi-layer network.

Forward step

In the code below, the forward_step function implements the forward pass. Each layer computes its output with its get_output method, and these activations are collected in the activations list.

# Define the forward propagation step as a method.
def forward_step(input_samples, layers):
    """
    Compute and return the forward activation of each layer in layers.
    Input:
        input_samples: A matrix of input samples (each row is an input vector)
        layers: A list of Layers
    Output:
        A list of activations where the activation at each index i+1 corresponds to
        the activation of layer i in layers. activations[0] contains the input samples.
    """
    activations = [input_samples]  # List of layer activations
    # Compute the forward activations for each layer starting from the first
    X = input_samples
    for layer in layers:
        Y = layer.get_output(X)  # Get the output of the current layer
        activations.append(Y)    # Store the output for future processing
        X = activations[-1]      # Set the current input as the activations of the previous layer
    return activations  # Return the activations of each layer

Backward step

The backward_step function implements the backward pass. The computation starts at the last layer, where the initial gradient is obtained with get_input_grad from the targets. The function then walks backwards over the layers, computes the gradient of the cost with respect to each layer's parameters with get_params_grad, and collects these gradients in a list.

# Define the backward propagation step as a method
def backward_step(activations, targets, layers):
    """
    Perform the backpropagation step over all the layers and return the parameter gradients.
    Input:
        activations: A list of forward step activations where the activation at
            each index i+1 corresponds to the activation of layer i in layers.
            activations[0] contains the input samples.
        targets: The output targets of the output layer.
        layers: A list of Layers corresponding that generated the outputs in activations.
    Output:
        A list of parameter gradients where the gradients at each index corresponds to
        the parameters gradients of the layer at the same index in layers.
    """
    param_grads = collections.deque()  # List of parameter gradients for each layer
    output_grad = None  # The error gradient at the output of the current layer
    # Propagate the error backwards through all the layers.
    #  Use reversed to iterate backwards over the list of layers.
    for layer in reversed(layers):
        Y = activations.pop()  # Get the activations of the last layer on the stack
        # Compute the error at the output layer.
        # The output layer error is calculated differently than the hidden layer error.
        if output_grad is None:
            input_grad = layer.get_input_grad(Y, targets)
        else:  # output_grad is not None (layer is not output layer)
            input_grad = layer.get_input_grad(Y, output_grad)
        # Get the input of this layer (activations of the previous layer)
        X = activations[-1]
        # Compute the layer parameter gradients used to update the parameters
        grads = layer.get_params_grad(X, output_grad)
        param_grads.appendleft(grads)
        # Compute gradient at output of previous layer (input of current layer):
        output_grad = input_grad
    return list(param_grads)  # Return the parameter gradients

Gradient checking

As in part 4, we check whether the gradients are correct by comparing the gradients computed by backpropagation with numerically computed gradients.
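
The numerical gradient of a parameter $\theta$ is the standard central-difference approximation, which the code below computes as (plus_cost - min_cost) / (2 * eps):

$\frac{\partial \xi}{\partial \theta} \approx \frac{\xi(\theta + \epsilon) - \xi(\theta - \epsilon)}{2\epsilon}$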

In this code, get_params_iter returns an iterator over all parameters of a layer, and get_params_grad returns the backpropagation gradient of each of those parameters in the same order.

# Perform gradient checking
nb_samples_gradientcheck = 10  # Test the gradients on a subset of the data
X_temp = X_train[0:nb_samples_gradientcheck, :]
T_temp = T_train[0:nb_samples_gradientcheck, :]
# Get the parameter gradients with backpropagation
activations = forward_step(X_temp, layers)
param_grads = backward_step(activations, T_temp, layers)

# Set the small change to compute the numerical gradient
eps = 0.0001
# Compute the numerical gradients of the parameters in all layers.
for idx in range(len(layers)):
    layer = layers[idx]
    layer_backprop_grads = param_grads[idx]
    # Compute the numerical gradient for each parameter in the layer
    for p_idx, param in enumerate(layer.get_params_iter()):
        grad_backprop = layer_backprop_grads[p_idx]
        # + eps
        param += eps
        plus_cost = layers[-1].get_cost(forward_step(X_temp, layers)[-1], T_temp)
        # - eps
        param -= 2 * eps
        min_cost = layers[-1].get_cost(forward_step(X_temp, layers)[-1], T_temp)
        # reset param value
        param += eps
        # calculate numerical gradient
        grad_num = (plus_cost - min_cost) / (2 * eps)
        # Raise error if the numerical gradient is not close to the backprop gradient
        if not np.isclose(grad_num, grad_backprop):
            raise ValueError('Numerical gradient of {:.6f} is not close to the backpropagation gradient of {:.6f}!'.format(float(grad_num), float(grad_backprop)))
print('No gradient errors found')

No gradient errors found

Stochastic gradient descent in backpropagation

In this tutorial we optimize the cost function with a variant of gradient descent called stochastic gradient descent. Instead of computing each update on the full training set, stochastic gradient descent updates the parameters in the direction of the negative gradient computed on only a subset of the training data. This has a couple of benefits: first, on a large training set it saves time and memory, because far fewer matrix operations are needed per update; second, it increases the variety among the samples used for successive updates.

The cost function has to be independent of the number of input samples, because each step of the stochastic gradient algorithm uses a different subset of the data. This is why we take the mean of the per-sample loss rather than its sum.

Minibatches

A subset of training samples is usually called a minibatch. In the code below we set the minibatch size to 25 and pack the input data and the target data together into tuples that are fed to the network.

# Create the minibatches
batch_size = 25  # Approximately 25 samples per batch
nb_of_batches = X_train.shape[0] // batch_size  # Number of batches
# Create batches (X,Y) from the training set
XT_batches = list(zip(
    np.array_split(X_train, nb_of_batches, axis=0),   # X samples
    np.array_split(T_train, nb_of_batches, axis=0)))  # Y targets

Stochastic gradient descent updates

The update_params function in the code below implements the parameter updates. At every iteration the parameters are updated with plain gradient descent:

$\theta_{k+1} = \theta_{k} - \mu \cdot \frac{\partial \xi}{\partial \theta_{k}}$

where $\mu$ is the learning rate.

The updates are run for multiple iterations over the full training set, each iteration processing the training data one minibatch at a time; nb_of_iterations records how many iterations were actually executed. After each full pass, the model is evaluated on the validation set. If the validation cost does not decrease for three consecutive iterations, we assume the model is starting to overfit and stop training. Training also stops after the configured maximum of 300 iterations. All cost values are stored for later analysis.

# Define a method to update the parameters
def update_params(layers, param_grads, learning_rate):
    """
    Function to update the parameters of the given layers with the given gradients
    by gradient descent with the given learning rate.
    """
    for layer, layer_backprop_grads in zip(layers, param_grads):
        for param, grad in zip(layer.get_params_iter(), layer_backprop_grads):
            # The parameter returned by the iterator points to the memory space of
            #  the original layer and can thus be modified inplace.
            param -= learning_rate * grad  # Update each parameter

# Perform backpropagation
# initialize some lists to store the cost for future analysis
minibatch_costs = []
training_costs = []
validation_costs = []

max_nb_of_iterations = 300  # Train for a maximum of 300 iterations
learning_rate = 0.1  # Gradient descent learning rate

# Train for the maximum number of iterations
for iteration in range(max_nb_of_iterations):
    for X, T in XT_batches:  # For each minibatch sub-iteration
        activations = forward_step(X, layers)  # Get the activations
        minibatch_cost = layers[-1].get_cost(activations[-1], T)  # Get cost
        minibatch_costs.append(minibatch_cost)
        param_grads = backward_step(activations, T, layers)  # Get the gradients
        update_params(layers, param_grads, learning_rate)  # Update the parameters
    # Get full training cost for future analysis (plots)
    activations = forward_step(X_train, layers)
    train_cost = layers[-1].get_cost(activations[-1], T_train)
    training_costs.append(train_cost)
    # Get full validation cost
    activations = forward_step(X_validation, layers)
    validation_cost = layers[-1].get_cost(activations[-1], T_validation)
    validation_costs.append(validation_cost)
    if len(validation_costs) > 3:
        # Stop training if the cost on the validation set doesn't decrease
        #  for 3 iterations
        if validation_costs[-1] >= validation_costs[-2] >= validation_costs[-3]:
            break

nb_of_iterations = iteration + 1  # The number of iterations that have been executed

minibatch_x_inds = np.linspace(0, nb_of_iterations, num=nb_of_iterations*nb_of_batches)
iteration_x_inds = np.linspace(1, nb_of_iterations, num=nb_of_iterations)
# Plot the cost over the iterations
plt.plot(minibatch_x_inds, minibatch_costs, 'k-', linewidth=0.5, label='cost minibatches')
plt.plot(iteration_x_inds, training_costs, 'r-', linewidth=2, label='cost full training set')
plt.plot(iteration_x_inds, validation_costs, 'b-', linewidth=3, label='cost validation set')
# Add labels to the plot
plt.xlabel('iteration')
plt.ylabel('$\\xi$', fontsize=15)
plt.title('Decrease of cost over backprop iteration')
plt.legend()
x1, x2, y1, y2 = plt.axis()
plt.axis((0, nb_of_iterations, 0, 2.5))
plt.grid()
plt.show()

Figure: decrease of the cost over the backpropagation iterations

Performance of the model on the test set

Finally, we run the model on the held-out test set. For this model, the final accuracy on the test set is 96%.

The results can be analyzed in more depth with a confusion table (confusion matrix). This table shows, for each true digit, how many times it was classified as each of the possible digits. The plot below is created with scikit-learn's confusion_matrix method.

For example, the digit 8 was misclassified five times: twice as 2, twice as 5, and once as 9.
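
As a small, self-contained illustration of how to read such a table (toy labels, not the actual test results): rows correspond to true labels and columns to predicted labels.

from sklearn import metrics

y_true = [8, 8, 8, 5, 5]
y_pred = [8, 2, 5, 5, 5]
# Row i, column j counts the samples with true label i that were predicted as label j.
print(metrics.confusion_matrix(y_true, y_pred, labels=[2, 5, 8]))
# [[0 0 0]
#  [0 2 0]
#  [1 1 1]]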

# Get results of test data
y_true = np.argmax(T_test, axis=1)  # Get the target outputs
activations = forward_step(X_test, layers)  # Get activation of test samples
y_pred = np.argmax(activations[-1], axis=1)  # Get the predictions made by the network
test_accuracy = metrics.accuracy_score(y_true, y_pred)  # Test set accuracy
print('The accuracy on the test set is {:.2f}'.format(test_accuracy))

The accuracy on the test set is 0.96

# Show confusion table
conf_matrix = metrics.confusion_matrix(y_true, y_pred, labels=None)  # Get confusion matrix
# Plot the confusion table
class_names = ['${:d}$'.format(x) for x in range(0, 10)]  # Digit class names
fig = plt.figure()
ax = fig.add_subplot(111)
# Show class labels on each axis
ax.xaxis.tick_top()
major_ticks = range(0, 10)
minor_ticks = [x + 0.5 for x in range(0, 10)]
ax.xaxis.set_ticks(major_ticks, minor=False)
ax.yaxis.set_ticks(major_ticks, minor=False)
ax.xaxis.set_ticks(minor_ticks, minor=True)
ax.yaxis.set_ticks(minor_ticks, minor=True)
ax.xaxis.set_ticklabels(class_names, minor=False, fontsize=15)
ax.yaxis.set_ticklabels(class_names, minor=False, fontsize=15)
# Set plot labels
ax.yaxis.set_label_position("right")
ax.set_xlabel('Predicted label')
ax.set_ylabel('True label')
fig.suptitle('Confusion table', y=1.03, fontsize=15)
# Show a grid to separate digits
ax.grid(True, which='minor')
# Color each grid cell according to the number classes predicted
ax.imshow(conf_matrix, interpolation='nearest', cmap='binary')
# Show the number of samples in each cell
for x in range(conf_matrix.shape[0]):
    for y in range(conf_matrix.shape[1]):
        color = 'w' if x == y else 'k'
        ax.text(x, y, conf_matrix[y, x], ha="center", va="center", color=color)
plt.show()

Figure: confusion table

For the complete code, click here.

