This article presents a visual analysis of a convolutional neural network.
Introduction
In some of the earlier tutorials on convolutional neural networks, we showed the convolutional filter weights, e.g. in Tutorials #02 and #06. But from the filter weights alone it is impossible to determine what the convolutional filters can recognize in the input image.
In this tutorial we present a basic method for visually analysing the inner workings of a neural network. The idea is to generate an image that maximizes an individual feature inside the network. The image is initialized with a little random noise and is then gradually changed using the gradient of the given feature with respect to the input image.
This method of visual analysis is also known as feature maximization or activation maximization.
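To make the update rule concrete (the step size and pixel clipping below match those used in the optimize_image code later in this tutorial), each iteration performs one step of gradient ascent on the generated image $x$ for the chosen feature activation $a_f(x)$:

$$
x \leftarrow \mathrm{clip}\big(x + \eta\,\nabla_x a_f(x),\ 0,\ 255\big),
\qquad
\eta = \frac{1}{\mathrm{std}\big(\nabla_x a_f(x)\big) + 10^{-8}}
$$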
This tutorial builds on the earlier tutorials. You should be roughly familiar with neural networks (see Tutorials #01 and #02), and it is also helpful to be familiar with the Inception model (Tutorial #07).
Flowchart
We will use the Inception model from Tutorial #07. We want to find an image that maximizes a given feature inside the neural network. The input image is initialized with a little noise and is then updated using the gradient of the given feature. After performing a number of optimization iterations we get an image that this particular feature "likes to see".
Because the Inception model is constructed from many combined basic mathematical operations, TensorFlow can quickly find the gradient of the loss function using the chain rule of differentiation.
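As a small aside (this is not part of the tutorial's own code), the following self-contained sketch shows the TensorFlow 1.x mechanism the tutorial relies on: tf.gradients() builds a symbolic gradient of a scalar with respect to an input tensor, which can then be evaluated and used for gradient ascent on the input itself. The toy feature below is just a stand-in for a real network activation.

import numpy as np
import tensorflow as tf

# A toy "feature": a scalar computed from a small input image.
x = tf.placeholder(tf.float32, shape=[4, 4], name='toy_input')
feature = tf.reduce_mean(tf.square(x))

# TensorFlow applies the chain rule to build d(feature)/d(x) symbolically.
grad = tf.gradients(feature, x)[0]

with tf.Session() as session:
    image = np.random.uniform(size=(4, 4)).astype(np.float32)
    g = session.run(grad, feed_dict={x: image})

    # One step of gradient ascent on the input image itself.
    image += 0.1 * g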
from IPython.display import Image, display
Image('images/13_visual_analysis_flowchart.png')
Imports
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

# Functions and classes for loading and using the Inception model.
import inception
This was developed using Python 3.5.2 (Anaconda) with the following TensorFlow version:
tf.__version__
'1.1.0'
The Inception Model
Download the Inception model from the internet
This is the default folder for storing the data files. The folder is created automatically if it does not exist.
# inception.data_dir = 'inception/'
Download the Inception model if it is not already present in the folder. It is about 85 MB.
inception.maybe_download()
Downloading Inception v3 Model ...
Download progress: 100.0%
Download finished. Extracting files.
Done.
Names of the convolutional layers
This function returns a list of names of the convolutional layers in the Inception model.
def get_conv_layer_names():
    # Load the Inception model.
    model = inception.Inception()

    # Create a list of names for the operations in the graph
    # for the Inception model where the operator-type is 'Conv2D'.
    names = [op.name for op in model.graph.get_operations() if op.type=='Conv2D']

    # Close the TensorFlow session inside the model-object.
    model.close()

    return names
conv_names = get_conv_layer_names()
There are a total of 94 convolutional layers in this Inception model.
len(conv_names)
94
Show the names of the first 5 convolutional layers.
conv_names[:5]
['conv/Conv2D',
 'conv_1/Conv2D',
 'conv_2/Conv2D',
 'conv_3/Conv2D',
 'conv_4/Conv2D']
Show the names of the last 5 convolutional layers.
conv_names[-5:]
['mixed_10/tower_1/conv/Conv2D',
 'mixed_10/tower_1/conv_1/Conv2D',
 'mixed_10/tower_1/mixed/conv/Conv2D',
 'mixed_10/tower_1/mixed/conv_1/Conv2D',
 'mixed_10/tower_2/conv/Conv2D']
Helper-function for finding the input image
This function finds the input image that maximizes a given feature in the network. It essentially performs optimization with gradient ascent. The image is initialized with small random values and is then iteratively updated using the gradient of the given feature with respect to the input image.
def optimize_image(conv_id=None, feature=0,
                   num_iterations=30, show_progress=True):
    """
    Find an image that maximizes the feature
    given by the conv_id and feature number.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last fully-connected layer
             before the softmax output.
    feature: Index into the layer for the feature to maximize.
    num_iterations: Number of optimization iterations to perform.
    show_progress: Boolean whether to show the progress.
    """

    # Load the Inception model. This is done for each call of
    # this function because we will add a lot to the graph
    # which will cause the graph to grow and eventually the
    # computer will run out of memory.
    model = inception.Inception()

    # Reference to the tensor that takes the raw input image.
    resized_image = model.resized_image

    # Reference to the tensor for the predicted classes.
    # This is the output of the final layer's softmax classifier.
    y_pred = model.y_pred

    # Create the loss-function that must be maximized.
    if conv_id is None:
        # If we want to maximize a feature on the last layer,
        # then we use the fully-connected layer prior to the
        # softmax-classifier. The feature no. is the class-number
        # and must be an integer between 1 and 1000.
        # The loss-function is just the value of that feature.
        loss = model.y_logits[0, feature]
    else:
        # If instead we want to maximize a feature of a
        # convolutional layer inside the neural network.

        # Get the name of the convolutional operator.
        conv_name = conv_names[conv_id]

        # Get a reference to the tensor that is output by the
        # operator. Note that ":0" is added to the name for this.
        tensor = model.graph.get_tensor_by_name(conv_name + ":0")

        # Set the Inception model's graph as the default
        # so we can add an operator to it.
        with model.graph.as_default():
            # The loss-function is the average of all the
            # tensor-values for the given feature. This
            # ensures that we generate the whole input image.
            # You can try and modify this so it only uses
            # a part of the tensor.
            loss = tf.reduce_mean(tensor[:, :, :, feature])

    # Get the gradient for the loss-function with regard to
    # the resized input image. This creates a mathematical
    # function for calculating the gradient.
    gradient = tf.gradients(loss, resized_image)

    # Create a TensorFlow session so we can run the graph.
    session = tf.Session(graph=model.graph)

    # Generate a random image of the same size as the raw input.
    # Each pixel is a small random value between 128 and 129,
    # which is about the middle of the colour-range.
    image_shape = resized_image.get_shape()
    image = np.random.uniform(size=image_shape) + 128.0

    # Perform a number of optimization iterations to find
    # the image that maximizes the loss-function.
    for i in range(num_iterations):
        # Create a feed-dict. This feeds the image to the
        # tensor in the graph that holds the resized image, because
        # this is the final stage for inputting raw image data.
        feed_dict = {model.tensor_name_resized_image: image}

        # Calculate the predicted class-scores,
        # as well as the gradient and the loss-value.
        pred, grad, loss_value = session.run([y_pred, gradient, loss],
                                             feed_dict=feed_dict)

        # Squeeze the dimensionality for the gradient-array.
        grad = np.array(grad).squeeze()

        # The gradient now tells us how much we need to change the
        # input image in order to maximize the given feature.

        # Calculate the step-size for updating the image.
        # This step-size was found to give fast convergence.
        # The addition of 1e-8 is to protect from div-by-zero.
        step_size = 1.0 / (grad.std() + 1e-8)

        # Update the image by adding the scaled gradient
        # This is called gradient ascent.
        image += step_size * grad

        # Ensure all pixel-values in the image are between 0 and 255.
        image = np.clip(image, 0.0, 255.0)

        if show_progress:
            print("Iteration:", i)

            # Convert the predicted class-scores to a one-dim array.
            pred = np.squeeze(pred)

            # The predicted class for the Inception model.
            pred_cls = np.argmax(pred)

            # Name of the predicted class.
            cls_name = model.name_lookup.cls_to_name(pred_cls,
                                                     only_first_name=True)

            # The score (probability) for the predicted class.
            cls_score = pred[pred_cls]

            # Print the predicted score etc.
            msg = "Predicted class-name: {0} (#{1}), score: {2:>7.2%}"
            print(msg.format(cls_name, pred_cls, cls_score))

            # Print statistics for the gradient.
            msg = "Gradient min: {0:>9.6f}, max: {1:>9.6f}, stepsize: {2:>9.2f}"
            print(msg.format(grad.min(), grad.max(), step_size))

            # Print the loss-value.
            print("Loss:", loss_value)

            # Newline.
            print()

    # Close the TensorFlow session inside the model-object.
    model.close()

    return image.squeeze()
Helper-functions for plotting images and noise
This function normalizes an image so its pixel values are between 0.0 and 1.0.
def normalize_image(x):
    # Get the min and max values for all pixels in the input.
    x_min = x.min()
    x_max = x.max()

    # Normalize so all values are between 0.0 and 1.0
    x_norm = (x - x_min) / (x_max - x_min)

    return x_norm
This function plots a single image.
def plot_image(image):
    # Normalize the image so pixels are between 0.0 and 1.0
    img_norm = normalize_image(image)

    # Plot the image.
    plt.imshow(img_norm, interpolation='nearest')
    plt.show()
This function plots 6 images in a grid.
def plot_images(images, show_size=100):
    """
    The show_size is the number of pixels to show for each image.
    The max value is 299.
    """

    # Create figure with sub-plots.
    fig, axes = plt.subplots(2, 3)

    # Adjust vertical spacing.
    fig.subplots_adjust(hspace=0.1, wspace=0.1)

    # Use interpolation to smooth pixels?
    smooth = True

    # Interpolation type.
    if smooth:
        interpolation = 'spline16'
    else:
        interpolation = 'nearest'

    # For each entry in the grid.
    for i, ax in enumerate(axes.flat):
        # Get the i'th image and only use the desired pixels.
        img = images[i, 0:show_size, 0:show_size, :]

        # Normalize the image so its pixels are between 0.0 and 1.0
        img_norm = normalize_image(img)

        # Plot the image.
        ax.imshow(img_norm, interpolation=interpolation)

        # Remove ticks.
        ax.set_xticks([])
        ax.set_yticks([])

    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()
Helper-function for optimizing and plotting images
This function optimizes multiple images and plots them.
def optimize_images(conv_id=None, num_iterations=30, show_size=100):
    """
    Find 6 images that maximize the 6 first features in the layer
    given by the conv_id.

    Parameters:
    conv_id: Integer identifying the convolutional layer to
             maximize. It is an index into conv_names.
             If None then use the last layer before the softmax output.
    num_iterations: Number of optimization iterations to perform.
    show_size: Number of pixels to show for each image. Max 299.
    """

    # Which layer are we using?
    if conv_id is None:
        print("Final fully-connected layer before softmax.")
    else:
        print("Layer:", conv_names[conv_id])

    # Initialize the array of images.
    images = []

    # For each feature do the following. Note that the
    # last fully-connected layer only supports numbers
    # between 1 and 1000, while the convolutional layers
    # support numbers between 0 and some other number.
    # So we just use the numbers between 1 and 7.
    for feature in range(1, 7):
        print("Optimizing image for feature no.", feature)

        # Find the image that maximizes the given feature
        # for the network layer identified by conv_id (or None).
        image = optimize_image(conv_id=conv_id, feature=feature,
                               show_progress=False,
                               num_iterations=num_iterations)

        # Squeeze the dim of the array.
        image = image.squeeze()

        # Append to the list of images.
        images.append(image)

    # Convert to numpy-array so we can index all dimensions easily.
    images = np.array(images)

    # Plot the images.
    plot_images(images=images, show_size=show_size)
Results
Optimize an image for an early convolutional layer
As an example, find the input image that maximizes feature no. 2 of the convolutional layer conv_names[conv_id], where conv_id=5.
image = optimize_image(conv_id=5, feature=2,
                       num_iterations=30, show_progress=True)
Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.81%
Gradient min: -0.000083, max: 0.000100, stepsize: 76290.32
Loss: 4.83793

Iteration: 1
Predicted class-name: kite (#397), score: 15.12%
Gradient min: -0.000142, max: 0.000126, stepsize: 71463.42
Loss: 5.59611

Iteration: 2
Predicted class-name: wall clock (#524), score: 6.85%
Gradient min: -0.000119, max: 0.000121, stepsize: 80427.39
Loss: 6.91725

...

Iteration: 28
Predicted class-name: bib (#941), score: 19.26%
Gradient min: -0.000043, max: 0.000043, stepsize: 214742.82
Loss: 17.7469

Iteration: 29
Predicted class-name: bib (#941), score: 18.87%
Gradient min: -0.000047, max: 0.000059, stepsize: 218511.00
Loss: 17.9321
plot_image(image)
Optimize multiple images for convolutional layers
In the following, we optimize multiple images for convolutional layers inside the Inception model and plot them. The images show what the convolutional layers "want to see". Notice how the patterns become increasingly complex for deeper layers.
optimize_images(conv_id=0, num_iterations=10)

Layer: conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
optimize_images(conv_id=3, num_iterations=30)

Layer: conv_3/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=4, num_iterations=30)

Layer: conv_4/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=5, num_iterations=30)

Layer: mixed/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=6, num_iterations=30)

Layer: mixed/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=7, num_iterations=30)

Layer: mixed/tower/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=8, num_iterations=30)

Layer: mixed/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=9, num_iterations=30)

Layer: mixed/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=10, num_iterations=30)

Layer: mixed/tower_1/conv_2/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=20, num_iterations=30)

Layer: mixed_2/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=30, num_iterations=30)

Layer: mixed_4/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=40, num_iterations=30)

Layer: mixed_5/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=50, num_iterations=30)

Layer: mixed_6/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=60, num_iterations=30)

Layer: mixed_7/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=70, num_iterations=30)

Layer: mixed_8/tower/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=80, num_iterations=30)

Layer: mixed_9/tower_1/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=90, num_iterations=30)

Layer: mixed_10/tower_1/conv_1/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
optimize_images(conv_id=93, num_iterations=30)

Layer: mixed_10/tower_2/conv/Conv2D
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
Final fully-connected layer before softmax
Now we optimize and plot images for the final layer of the Inception model. This is the fully-connected layer right before the softmax classifier. The features of this layer correspond to the output classes.
We might have hoped to see recognizable patterns in these images, e.g. monkeys or birds corresponding to the output classes, but the images only show complex, abstract patterns.
optimize_images(conv_id=None, num_iterations=30)

Final fully-connected layer before softmax.
Optimizing image for feature no. 1
Optimizing image for feature no. 2
Optimizing image for feature no. 3
Optimizing image for feature no. 4
Optimizing image for feature no. 5
Optimizing image for feature no. 6
The Inception model classifies the resulting image as a "kit fox" with about 100% certainty, but to the human eye the image is just some abstract patterns.
If you want to test another feature number, note that the number must be between 1 and 1000, because it must correspond to a valid class number in the final output layer.
image = optimize_image(conv_id=None, feature=1,
                       num_iterations=100, show_progress=True)
Iteration: 0
Predicted class-name: dishwasher (#667), score: 4.98%
Gradient min: -0.006252, max: 0.004451, stepsize: 3734.48
Loss: -0.837608

Iteration: 1
Predicted class-name: ballpoint (#907), score: 8.52%
Gradient min: -0.007303, max: 0.006427, stepsize: 2152.89
Loss: -0.416723

...

Iteration: 98
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.007732, max: 0.010692, stepsize: 1286.44
Loss: 67.5603

Iteration: 99
Predicted class-name: kit fox (#1), score: 100.00%
Gradient min: -0.005850, max: 0.006159, stepsize: 1863.65
Loss: 75.6356
plot_image(image=image)
Close TensorFlow session
The TensorFlow session was already closed inside the functions above that use the Inception model. This is done to save memory, so that the computer does not crash when many gradient functions are added to the computational graph.
Conclusion
This tutorial showed how to optimize input images so as to maximize features inside a neural network. Because a given feature (or neuron) inside the network responds most strongly to a particular image, this lets us visually analyse what the feature "likes to see".
For the lower layers of the network, the images contain simple patterns, such as different kinds of wavy lines. The patterns become increasingly complex for deeper layers. We might have expected the patterns of the deep layers to be recognizable, e.g. monkeys, foxes, cars and so on, but in fact the image patterns of the deep layers are more complex and abstract.
Why is that? Recall from Tutorial #11 that the Inception model is easily fooled by a little adversarial noise into classifying any input image as another target class. So it is not hard to imagine that the Inception model recognizes abstract image patterns that are unclear to the human eye. There may be an infinite number of images that maximize a feature inside the neural network, and humans can only recognize a small subset of them. This may be why the optimization process only finds abstract image patterns.
Other methods
The research literature contains many suggestions for guiding the optimization process so as to find image patterns that are more recognizable to humans.
One paper proposes a combination of heuristics for guiding the optimization of the image patterns. The paper shows example images for classes such as flamingo, pelican and black swan, all of which are more or less recognizable to the human eye. An implementation of the method is available online (the exact line numbers may change over time). The method requires a combination of heuristics and fine-tuned parameters to generate these images, and the choice of parameters is not entirely clear in the paper. Despite some effort, I was unable to reproduce their results. Perhaps I misunderstood the paper, or perhaps the heuristics were fine-tuned to their network architecture (a variant of AlexNet), whereas this tutorial uses the more advanced Inception model.
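Purely as an illustration of the general idea behind such heuristics (this is not the exact method from the paper), one simple regularizer is to slightly blur the generated image every few iterations so the optimization favours smoother, lower-frequency patterns. A minimal sketch, assuming scipy is installed and that this replaces the update step inside the loop of optimize_image above, where image has shape (1, height, width, channels):

from scipy.ndimage import gaussian_filter

# Hypothetical modification inside the loop of optimize_image():
# the usual gradient-ascent update and clipping ...
image += step_size * grad
image = np.clip(image, 0.0, 255.0)

# ... then blur the image a little every 4th iteration to suppress
# high-frequency noise (sigma is only applied to the spatial axes).
if i % 4 == 0:
    image = gaussian_filter(image, sigma=(0, 0.5, 0.5, 0))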
Another paper proposes a method for generating images that are more recognizable to humans. However, that method arguably cheats, because it goes through all the images in the training set (e.g. ImageNet) and finds the images that maximally activate the given feature in the neural network. Similar images are then clustered and averaged, and the result is used as the initial image for the optimization procedure. It is therefore not surprising that the method gives better results when it starts from an image constructed from real photographs.
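A hedged sketch of just the initialization idea (the image search itself is omitted; the photos array below is only a random stand-in for real photographs that strongly activate the chosen feature):

# Stand-in for a stack of real photos (e.g. from ImageNet) that strongly
# activate the chosen feature; in the described method these would be found
# by running the network over the training set and clustering the results.
photos = np.random.uniform(0.0, 255.0, size=(16, 299, 299, 3))

# Average the photos and keep the batch dimension expected by the Inception
# model's resized-image tensor, then use this instead of random noise as the
# starting point for the gradient-ascent loop in optimize_image().
init_image = photos.mean(axis=0, keepdims=True)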
Exercises
These are a few suggested exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.
You may want to back up this Notebook before making any changes to it.
- Try running the optimization several times for features in the lower layers of the network. Are the resulting images always the same?
- Try using fewer or more optimization iterations. How does this affect the quality of the images?
- Try changing the loss-function for the convolutional features. This can be done in different ways. How does it affect the image patterns? Why?
- Do you think the optimizer also amplifies features other than the one we want to maximize? How would you measure this? Are you sure the optimizer only maximizes one feature at a time?
- Try maximizing several features at the same time (see the sketch after this list for one possible setup).
- Try training a smaller network on the MNIST dataset and then visualize the features and layers. Is it easier to see patterns in the images?
- Try implementing the methods from the papers mentioned above.
- Try your own ideas for improving the optimized images.
- Explain to a friend how the program works.
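For the exercise about maximizing several features at once, here is one hedged way the loss could be changed inside optimize_image() (a sketch only; the feature indices are arbitrary and this is not the only possible formulation):

# Hypothetical replacement for the single-feature loss inside optimize_image():
# sum the mean activations of several (arbitrarily chosen) features so the
# optimizer pushes all of them up at the same time.
features = [2, 5, 7]
with model.graph.as_default():
    loss = tf.add_n([tf.reduce_mean(tensor[:, :, :, f]) for f in features])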