TensorFlow Study Notes (Part 1)

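My PhD research has focused mainly on operations research and optimization, and I hope to move into artificial intelligence after graduating. I have therefore decided to build up my AI skills over the next six months, concentrating on deep learning, unsupervised techniques such as GANs, and reinforcement learning.
TensorFlow, the deep learning framework open-sourced by Google, has attracted a great deal of attention and now stands alongside Facebook's PyTorch as one of the leading frameworks, so it is worth studying in depth. These notes draw mainly on the book 《Tensorflow：实战Google深度学习框架》, which is well suited to beginners. This series is a condensed summary of the book's most important content together with some of my own understanding.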
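Installing and configuring TensorFlow
To set up TensorFlow on a personal PC, I recommend PyCharm together with an Anaconda Python environment; TensorFlow can then be added easily through PyCharm's package manager. On a server, follow one of the configuration tutorials available online. Installation is not the focus of this article.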
Core ideas of TensorFlow

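Tensors and graphs
The most important concept in TensorFlow is the tensor, which can be thought of as a multi-dimensional array; a scalar is a 0-dimensional tensor. The first step in any TensorFlow program is to define a graph. Each node in the graph is an operation, and the edges between nodes carry the tensors that one operation produces and another consumes. Once the graph is defined, the complete computation flow for all tensors is defined as well.
Note that a tensor defined this way is essentially a reference to the result of a computation, not a concrete value. Printing it directly shows its structural information (name, shape, and dtype) rather than its contents. Consider the following code: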

import tensorflow as tf

a = tf.constant([1.0, 2.0], name='a')
print(a)

The output is not the expected [1.0, 2.0], but

>>Tensor("a:0", shape=(2,), dtype=float32)
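The idea that defining a tensor only records a computation can be illustrated with a toy model in plain Python. This is a minimal sketch and not how TensorFlow is implemented; the names Node, constant, add, and evaluate are hypothetical and exist only for this illustration. Building the graph performs no arithmetic, and a separate evaluation step produces the numbers.

```python
# Toy model of deferred evaluation; NOT TensorFlow's implementation.
# Node, constant, add and evaluate are hypothetical names used for illustration.

class Node:
    """A graph node: records an operation and its inputs, but holds no result."""

    def __init__(self, op, inputs=(), value=None, name=""):
        self.op = op            # "const" or "add"
        self.inputs = inputs    # upstream nodes this node depends on
        self.value = value      # only set for constants
        self.name = name

    def __repr__(self):
        # Like printing a tensor: shows structure, not the computed value.
        return "Node(name=%r, op=%r)" % (self.name, self.op)


def constant(value, name):
    return Node("const", value=list(value), name=name)


def add(x, y, name):
    # Building the node performs no arithmetic.
    return Node("add", inputs=(x, y), name=name)


def evaluate(node):
    """Walk the graph and compute a concrete value."""
    if node.op == "const":
        return node.value
    if node.op == "add":
        left, right = (evaluate(n) for n in node.inputs)
        return [p + q for p, q in zip(left, right)]
    raise ValueError("unknown op: %s" % node.op)


a = constant([1.0, 2.0], name="a")
b = constant([3.0, 4.0], name="b")
result = add(a, b, name="result")

print(result)            # Node(name='result', op='add')
print(evaluate(result))  # [4.0, 6.0]
```

Printing result shows only its description, in the same way that printing a TensorFlow tensor shows its name, shape, and dtype.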

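Sessions
This brings us to the third important concept in TensorFlow, the session. Only when a graph is run in a session are its tensors evaluated to concrete values. The following code is a minimal demo that uses TensorFlow to compute an addition. (The code in these notes uses the TensorFlow 1.x graph-and-session API.)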

import tensorflow as tf

a = tf.constant([1.0, 2.0], name='a', dtype=tf.float32)
b = tf.constant([3.0, 4.0], name='b', dtype=tf.float32)
result = a + b

with tf.Session() as sess:
    print(sess.run(result))

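In the code above, the with ... as statement is a context manager that guarantees the session is closed once the block finishes. If tensor is a node defined in the graph, sess.run(tensor) returns its concrete value. Note also that tf.Session() runs the default graph; to run a specific graph g1, pass it explicitly as tf.Session(graph=g1). A graph can be defined and run in a specific session as follows: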

import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    # Created with tf.get_variable so that it can be looked up by name below
    u = tf.get_variable('u', shape=[200, 100],
                        initializer=tf.random_normal_initializer(stddev=0.1))

with tf.Session(graph=g1) as sess:
    tf.global_variables_initializer().run()
    with tf.variable_scope("", reuse=True):
        print(sess.run(tf.get_variable('u')))  # print the value of tensor u in graph g1

To set up a default session, use tf.InteractiveSession(). For example, the following code evaluates a tensor directly:

import tensorflow as tf

result = tf.constant(1, name='result')
sess = tf.InteractiveSession()
print(result.eval())
sess.close()
---------------------
>>1

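Variables
A variable is an important kind of tensor in TensorFlow. In deep learning training, variables store the parameters to be optimized, which are updated iteratively. There are two common ways to declare a variable: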

w1 = tf.Variable(initial_value=tf.random_normal(shape=[1, 1], stddev=0.1), name='weights')
w2 = tf.get_variable(name='weights', shape=[1, 1],
                     initializer=tf.random_normal_initializer(stddev=0.1))

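The two approaches are equivalent for most purposes. tf.Variable takes an initial value, whereas tf.get_variable takes an initializer. The variables are initialized as specified in the code when the initialization op is run in a session, before they are first used. There are, however, some subtle differences between tf.Variable and tf.get_variable:

When a variable name is already in use, tf.Variable resolves the conflict automatically by appending a suffix, whereas tf.get_variable raises an error and does not allow duplicate names.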

import tensorflow as tf

w_1 = tf.Variable(1, name="w_1")
w_2 = tf.Variable(2, name="w_1")
print(w_1.name)
print(w_2.name)
# Output
>>w_1:0
>>w_1_1:0
-----------------------------------
import tensorflow as tf

w_1 = tf.get_variable(name="w_1", initializer=1)
w_2 = tf.get_variable(name="w_1", initializer=2)
# Error message
# ValueError: Variable w_1 already exists, disallowed. Did
# you mean to set reuse=True in VarScope?
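The suffixing behaviour of tf.Variable can be pictured with a small plain-Python sketch. It assumes only the pattern visible in the output above (first use keeps the name, later uses get _1, _2, and so on); make_namer and unique_name are hypothetical helpers, not TensorFlow APIs, and the sketch ignores corner cases such as a suffixed name having been requested directly.

```python
# Toy sketch of name uniquification; the suffix scheme mirrors the output
# shown above, but unique_name is a hypothetical helper, not a TensorFlow API.

def make_namer():
    used = {}  # base name -> how many times it has been requested

    def unique_name(base):
        count = used.get(base, 0)
        used[base] = count + 1
        # First use keeps the name; later uses get _1, _2, ...
        return base if count == 0 else "%s_%d" % (base, count)

    return unique_name


unique_name = make_namer()
print(unique_name("w_1"))  # w_1
print(unique_name("w_1"))  # w_1_1
print(unique_name("w_1"))  # w_1_2
```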

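Use tf.get_variable() whenever variables need to be shared. To make variables easier to manage, TensorFlow provides a variable manager, tf.variable_scope, which works much like a namespace in C++: calls to tf.get_variable() in different scopes may use the same name. To share a variable within the same scope, however, tf.get_variable() requires reuse to be declared, otherwise it raises an error; reuse means the call returns the existing variable. For example: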

import tensorflow as tf

with tf.variable_scope("scope1"):  # scope name is scope1
    w1 = tf.get_variable("w1", shape=[])
    w2 = tf.Variable(0.0, name="w2")
with tf.variable_scope("scope1", reuse=True):
    w1_p = tf.get_variable("w1", shape=[])
    w2_p = tf.Variable(1.0, name="w2")
with tf.variable_scope("scope2"):
    w1_pp = tf.get_variable("w1", shape=[])

print(w1 is w1_p, w2 is w2_p, w1 is w1_pp)  # True False False
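The sharing rule can be modelled as a dictionary keyed by "scope/name". The sketch below is a toy in plain Python, not TensorFlow's implementation; VariableStore and its get_variable method are hypothetical names, and a one-element list stands in for a variable. It reproduces the three cases above: reuse returns the same object, a different scope gives a different object, and a repeated name without reuse is an error.

```python
# Toy model of name-based variable sharing; NOT TensorFlow's implementation.
# VariableStore and its methods are hypothetical names for illustration.

class VariableStore:
    def __init__(self):
        self._vars = {}  # full name ("scope/name") -> variable object

    def get_variable(self, scope, name, reuse=False, initial=0.0):
        full_name = "%s/%s" % (scope, name)
        if reuse:
            # reuse=True: the variable must already exist; return that same object.
            if full_name not in self._vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self._vars[full_name]
        # reuse=False: creating a second variable with the same name is an error.
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = [initial]  # a one-element list stands in for a variable
        return self._vars[full_name]


store = VariableStore()
w1 = store.get_variable("scope1", "w1")
w1_p = store.get_variable("scope1", "w1", reuse=True)
w1_pp = store.get_variable("scope2", "w1")

print(w1 is w1_p, w1 is w1_pp)  # True False

try:
    store.get_variable("scope1", "w1")  # same scope, same name, no reuse
except ValueError as err:
    print(err)  # Variable scope1/w1 already exists
```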

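Finally, a brief note on variable initialization. If two variables w1 and w2 have been declared, the following ways of initializing them are equivalent: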

sess.run(w1.initializer)
sess.run(w2.initializer)
-----------------------------
sess.run(tf.initialize_all_variables())
-----------------------------
# requires a default session, e.g. inside "with tf.Session() as sess:"
tf.initialize_all_variables().run()

Note that in newer TensorFlow 1.x releases, tf.initialize_all_variables has been replaced by tf.global_variables_initializer. The old function can still be used for compatibility, but it emits a deprecation warning.

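Assignment
TensorFlow provides the tf.assign operation for assigning to variable nodes. It can assign the value of one variable to another, provided the two have the same dtype; if their shapes differ, validate_shape=False must also be specified. For example: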

## Try assigning a different dtype
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1), name='w1')
w2 = tf.Variable(tf.random_normal([3, 4], dtype=tf.float64, stddev=1), name='w2')
w1.assign(w2)
>>TypeError: Input 'value' of 'Assign' Op has type float64 that does not match type float32 of argument 'ref'.

## Assign a different shape
w1 = tf.Variable(tf.zeros([2, 2]), name='w1')
w2 = tf.Variable(tf.zeros([2, 3]), name='w2')
tf.assign(w1, w2)
>> ValueError: Dimension 1 in both shapes must be equal, but are 2 and 3. Shapes are [2,2] and [2,3]. for 'Assign' (op: 'Assign') with input shapes: [2,2], [2,3].

# This statement succeeds
tf.assign(w1, w2, validate_shape=False)
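The two checks can be summarized in a few lines of plain Python. This is a toy sketch of the rule only, under the assumption that a dtype mismatch is always rejected while a shape mismatch is rejected only when shape validation is on; Var and assign are hypothetical names, not TensorFlow APIs, and the error messages are invented for the illustration.

```python
# Toy sketch of the dtype and shape checks described above.
# Var and assign are hypothetical; they are not TensorFlow APIs.

class Var:
    def __init__(self, rows, dtype):
        self.rows = rows    # nested list holding the values
        self.dtype = dtype  # dtype label, e.g. "float32"

    @property
    def shape(self):
        return (len(self.rows), len(self.rows[0]))


def assign(ref, value, validate_shape=True):
    if ref.dtype != value.dtype:
        # dtype mismatches are always rejected
        raise TypeError("dtype %s does not match %s" % (value.dtype, ref.dtype))
    if validate_shape and ref.shape != value.shape:
        raise ValueError("shapes %s and %s differ" % (ref.shape, value.shape))
    ref.rows = [list(row) for row in value.rows]  # copy the values into ref
    return ref


w1 = Var([[0.0, 0.0], [0.0, 0.0]], "float32")
w2 = Var([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "float32")

try:
    assign(w1, w2)
except ValueError as err:
    print(err)  # shapes (2, 2) and (2, 3) differ

assign(w1, w2, validate_shape=False)
print(w1.shape)  # (2, 3)
```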

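Placeholders
Generally speaking, input data can be put into a computation graph as constants with tf.constant, but every tf.constant adds a node, so heavy use of it makes the graph grow very quickly. TensorFlow therefore provides placeholders, which receive data repeatedly inside a loop. For example, suppose forward propagation computes the outputs for several samples (one batch) per iteration and runs for 1000 iterations. If each batch were declared as a constant rather than fed through a placeholder, the graph would inevitably balloon. The approach instead is to declare one fixed placeholder node for the batch and feed the actual sample data into it at each iteration. A placeholder must have a fixed data type, but its dimensions may vary: with shape=(None, 2), for instance, the first dimension can change. When calling sess.run(), every placeholder that the requested result depends on must be given concrete data, for example: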

import tensorflow as tf

x = tf.placeholder(shape=[1, 2], dtype=tf.float32, name='x-input')
y = tf.constant([0.5, 0.5])
result = x + y

with tf.Session() as sess:
    # the fed value must match the placeholder's shape (1, 2)
    print(sess.run(result, feed_dict={x: [[0.5, 0.5]]}))
>> [[1. 1.]]
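Why a placeholder keeps the graph small can be seen in a toy model written in plain Python. This is a sketch under simplifying assumptions, not TensorFlow's implementation; Placeholder, AddConst, and run are hypothetical names. The graph is built once with two nodes, and each call to run supplies different data through a feed dictionary.

```python
# Toy model of a placeholder; NOT TensorFlow's implementation.
# Placeholder, AddConst and run are hypothetical names for illustration.

class Placeholder:
    def __init__(self, name):
        self.name = name


class AddConst:
    """A node that adds a fixed list of numbers to its input, element-wise."""

    def __init__(self, source, const):
        self.source = source
        self.const = const


def run(node, feed_dict):
    if isinstance(node, Placeholder):
        # A placeholder has no value of its own; it must be fed.
        if node not in feed_dict:
            raise ValueError("placeholder %s must be fed" % node.name)
        return feed_dict[node]
    upstream = run(node.source, feed_dict)
    return [u + c for u, c in zip(upstream, node.const)]


x = Placeholder("x-input")
result = AddConst(x, [0.5, 0.5])  # the graph is built once: two nodes in total

batches = [[0.5, 0.5], [1.0, 2.0], [3.0, 4.0]]
outputs = [run(result, {x: batch}) for batch in batches]  # same graph, new data each time
print(outputs)  # [[1.0, 1.0], [1.5, 2.5], [3.5, 4.5]]
```

Had each batch been a constant node instead, the graph would have gained one node per batch.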

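Writing a minimal neural network with TensorFlow
With the TensorFlow ideas from the previous section in hand, we can now write the code for a very simple neural network.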

import tensorflow as tf
# NumPy is a scientific computing package; it is used here to generate a synthetic dataset
from numpy.random import RandomState

# Size of each training batch
batch_size = 8

# Parameters of the neural network
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

# Using None for one dimension of shape makes it easy to use different batch
# sizes. During training the data is split into small batches, while at test
# time all the data can be used at once. This is convenient for small datasets,
# but with a large dataset, putting everything into one batch may exhaust memory.
x = tf.placeholder(tf.float32, shape=(None, 2), name='x-input')
y_ = tf.placeholder(tf.float32, shape=(None, 1), name='y-input')

# Forward propagation
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

# Loss function and backpropagation
y = tf.sigmoid(y)
cross_entropy = -tf.reduce_mean(
    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))
    + (1 - y_) * tf.log(tf.clip_by_value(1 - y, 1e-10, 1.0)))
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)

# Generate a synthetic dataset from random numbers
rdm = RandomState(1)
dataset_size = 128
X = rdm.rand(dataset_size, 2)

# Rule for labelling samples: every sample with x1 + x2 < 1 is a positive
# sample (e.g. a part that passes inspection) and all others are negative
# (e.g. a part that fails). Unlike the TensorFlow Playground, 0 denotes a
# negative sample and 1 a positive one here, which is the convention used by
# most neural networks for classification.
Y = [[int(x1 + x2 < 1)] for (x1, x2) in X]

# Create a session to run the TensorFlow program
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    # Initialize the variables
    sess.run(init_op)

    # Print the network parameters before training
    print(sess.run(w1))
    print(sess.run(w2))

    # Number of training steps
    STEPS = 5000
    for i in range(STEPS):
        # Select batch_size samples for this step
        start = (i * batch_size) % dataset_size
        end = min(start + batch_size, dataset_size)

        # Train the network on the selected samples and update the parameters
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
        if i % 1000 == 0:
            # Periodically compute and print the cross entropy on all the data
            total_cross_entropy = sess.run(cross_entropy, feed_dict={x: X, y_: Y})
            print("After %d training step(s), cross entropy on all data is %g" % (i, total_cross_entropy))

    # Print the network parameters after training
    print(sess.run(w1))
    print(sess.run(w2))
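The loss used in the network is the binary cross entropy: the mean over samples of -[t*log(p) + (1-t)*log(1-p)], where t is the label and p is the sigmoid output. The following plain-Python sketch computes the same quantity for small lists so the formula and the role of clipping can be checked by hand. The function names clip and binary_cross_entropy and the sample numbers are made up for the illustration.

```python
import math


def clip(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def binary_cross_entropy(labels, predictions, eps=1e-10):
    """Mean of -[t*log(p) + (1-t)*log(1-p)], with the log arguments clipped to [eps, 1]."""
    total = 0.0
    for t, p in zip(labels, predictions):
        total += t * math.log(clip(p, eps, 1.0))
        total += (1 - t) * math.log(clip(1 - p, eps, 1.0))
    return -total / len(labels)


labels = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # predictions close to the labels
bad = [0.1, 0.9, 0.2]   # predictions far from the labels

print(binary_cross_entropy(labels, good))  # small loss
print(binary_cross_entropy(labels, bad))   # much larger loss
print(binary_cross_entropy([1], [0.0]))    # finite, because log(0) is avoided by clipping
```

Without the clipping step, a prediction of exactly 0 for a positive label would require log(0), which is undefined; this is the purpose of tf.clip_by_value in the network code.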

The general steps for writing a neural network can be summarized as follows: