TensorFlow Pitfalls Log: Hands-On Lessons from Tensors to Neural Networks

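I hit quite a few pitfalls while working with TensorFlow, so here are my notes on the core concepts and the code: tensor operations, Session management, and building a neural network.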

What a tensor actually is

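TensorFlow expresses computation as a dataflow graph: nodes are mathematical operations and edges are tensors (multidimensional arrays). That is where the name comes from: tensors flow between nodes.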

Rank   Corresponds to              Example
0      a single number (scalar)    1
1      array (vector)              [1, 2, 3]
2      matrix                      [[1,2,3],[4,5,6]]
3+     higher-dimensional array    image data, etc.
import tensorflow.compat.v1 as tf
import numpy as np

# Create tensors of different ranks
scalar = tf.constant(1)
vector = tf.constant([1, 2, 3])
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])
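
To make the rank table concrete, here is a small pure-Python sketch that reads the rank and shape off the same nested lists passed to tf.constant above. Note that shape_of is a hypothetical helper written for this note, not a TensorFlow function, and it assumes the nesting is regular (every sublist at a given depth has the same length).

```python
def shape_of(value):
    """Return the shape of a regularly nested list as a tuple.

    Hypothetical helper for illustration only; assumes every sublist
    at a given depth has the same length.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:  # empty list: nothing deeper to inspect
            break
        value = value[0]
    return tuple(shape)


for example in (1, [1, 2, 3], [[1, 2, 3], [4, 5, 6]]):
    shape = shape_of(example)
    # The rank is simply the number of dimensions in the shape.
    print(f"rank={len(shape)} shape={shape}")
```

The scalar has an empty shape (rank 0), the array has one dimension, and the matrix has two, which matches the first three rows of the table.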

Linear regression example

A simple linear regression in TensorFlow, fitting y = 0.1x + 0.3:

import tensorflow.compat.v1 as tf
import numpy as np

tf.disable_eager_execution()

# Generate some training data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Model parameters
Weights = tf.Variable(tf.random.uniform([1], -1.0, 1.0))
biases = tf.Variable(tf.zeros([1]))

# Prediction model
y = Weights * x_data + biases

# Loss function: mean squared error
loss = tf.reduce_mean(tf.square(y - y_data))

# Optimizer: gradient descent with learning rate 0.5
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize the variables and train
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for step in range(201):
        sess.run(train)
        if step % 20 == 0:
            print(f"Step {step}: Weights={sess.run(Weights)}, biases={sess.run(biases)}")
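
For intuition about what optimizer.minimize(loss) does on each step, here is a hand-rolled sketch of the same fit in plain Python. It assumes plain full-batch gradient descent on the mean squared error with the same learning rate of 0.5. The names w, b, grad_w, and grad_b are mine, and this is not how TensorFlow is implemented internally, just the arithmetic it ends up performing for this model.

```python
import random

random.seed(0)

# Same toy data as above: y = 0.1x + 0.3 with x drawn from [0, 1)
x_data = [random.random() for _ in range(100)]
y_data = [0.1 * x + 0.3 for x in x_data]

w = random.uniform(-1.0, 1.0)  # plays the role of Weights
b = 0.0                        # plays the role of biases
learning_rate = 0.5
n = len(x_data)

for step in range(201):
    errors = [w * x + b - y for x, y in zip(x_data, y_data)]
    # Gradients of mean((w*x + b - y)^2) with respect to w and b
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
    grad_b = 2.0 * sum(errors) / n
    # One gradient-descent update
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.4f}, b={b:.4f}")
```

After 201 updates w and b should sit very close to 0.1 and 0.3, the same values the TensorFlow run converges to.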

Two ways to write a Session

In graph mode, running the computation graph requires a Session, and there are two ways to write it:

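import tensorflow.compat.v1 as tf

# tf.Session only exists in the v1 API, so use compat.v1 and graph mode
tf.disable_eager_execution()

matrix1 = tf.constant([[3, 3]])
matrix2 = tf.constant([[2], [2]])
product = tf.matmul(matrix1, matrix2)

# Style 1: close the session manually
sess = tf.Session()
result = sess.run(product)
print(result)
sess.close()

# Style 2: context manager (recommended)
with tf.Session() as sess:
    result2 = sess.run(product)
    print(result2)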
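
One reason to prefer the context-manager style is that the session is closed as soon as the with block ends, even if something inside it raises. The sketch below, which assumes TensorFlow is installed with the compat.v1 API available, shows that a session cannot be reused after the block. The variable names inside and closed are mine.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

product = tf.matmul(tf.constant([[3, 3]]), tf.constant([[2], [2]]))

with tf.Session() as sess:
    inside = sess.run(product)  # works: the session is open here

# Leaving the with block closed the session, so running it again fails.
try:
    sess.run(product)
    closed = False
except RuntimeError:
    closed = True

print(inside, closed)
```

With the manual style you get the same guarantee only if sess.close() sits in a finally clause.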

Variables

A Variable stores model parameters, which the optimizer updates automatically during training:

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

state = tf.Variable(0, name='counter')
one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)  # op that writes new_value back into state

# Variables must be initialized before they are used
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))  # prints 1, 2, 3 on successive iterations

Placeholders

A Placeholder is used to feed data in from outside the graph:

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.multiply(input1, input2)

with tf.Session() as sess:
    # feed_dict supplies the placeholder values at run time
    result = sess.run(output, feed_dict={input1: [7.], input2: [2.]})
    print(result)  # [14.]
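
A placeholder can also declare a partial shape. The sketch below assumes TensorFlow with compat.v1 is installed and uses a made-up two-column input. Leaving the first dimension as None lets the same graph accept batches of any size, while a mismatch in the fixed dimension is rejected when the value is fed. The names row_sums, small, large, and mismatch_rejected are mine.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# None means "any batch size"; the second dimension is fixed at 2.
xs = tf.placeholder(tf.float32, shape=[None, 2], name="xs")
row_sums = tf.reduce_sum(xs, axis=1)

with tf.Session() as sess:
    small = sess.run(row_sums, feed_dict={xs: np.ones((3, 2), dtype=np.float32)})
    large = sess.run(row_sums, feed_dict={xs: np.ones((5, 2), dtype=np.float32)})
    try:
        # Four columns do not match the declared shape (?, 2).
        sess.run(row_sums, feed_dict={xs: np.ones((3, 4), dtype=np.float32)})
        mismatch_rejected = False
    except ValueError:
        mismatch_rejected = True

print(small.shape, large.shape, mismatch_rejected)
```

Declaring the shape up front costs one argument and turns a silent shape bug into an immediate error at feed time.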

Building a neural network

Wrap the layer logic in a small add_layer function and build a simple network:

import os
# Must be set before TensorFlow is imported (see the libiomp5md.dll pitfall)
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import tensorflow.compat.v1 as tf
import numpy as np
import matplotlib.pyplot as plt

tf.disable_eager_execution()

def add_layer(inputs, in_size, out_size, activation_function=None):
    """Fully connected layer: activation(inputs @ Weights + biases)."""
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases

    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs

# Generate noisy training data
x_data = np.linspace(-1, 1, 300, dtype=np.float32)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape).astype(np.float32)
y_data = np.square(x_data) - 0.5 + noise

# Placeholders
xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# Network: 1 input -> hidden layer of 10 units -> 1 output
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
prediction = add_layer(l1, 10, 1, activation_function=None)

# Loss and optimizer (axis replaces the deprecated reduction_indices argument)
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction), axis=[1]))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Training
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Visualization
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(x_data, y_data)
plt.ion()
plt.show()

line = None  # handle to the fitted curve currently drawn
for i in range(1000):
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        # Remove the previous curve through its handle; ax.lines.remove()
        # no longer works on recent matplotlib releases
        if line is not None:
            line.remove()
        prediction_value = sess.run(prediction, feed_dict={xs: x_data})
        (line,) = ax.plot(x_data, prediction_value, 'r-', lw=5)
        plt.pause(0.1)

sess.close()
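
Most of the trouble with a hand-written layer function comes from shape mismatches, so it helps to trace the shapes once outside TensorFlow. The following is a NumPy-only sketch of the forward pass that add_layer describes. Note that dense_forward and relu are stand-ins I wrote for illustration; there is no training here, only the matmul-plus-bias step and the activation.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def dense_forward(inputs, in_size, out_size, activation=None):
    """NumPy stand-in for add_layer's forward pass (illustration only)."""
    weights = rng.normal(size=(in_size, out_size))
    biases = np.zeros((1, out_size)) + 0.1
    wx_plus_b = inputs @ weights + biases  # biases broadcast over the batch
    return wx_plus_b if activation is None else activation(wx_plus_b)


x = np.linspace(-1, 1, 300)[:, np.newaxis]         # shape (300, 1)
hidden = dense_forward(x, 1, 10, activation=relu)  # shape (300, 10)
prediction = dense_forward(hidden, 10, 1)          # shape (300, 1)
print(x.shape, hidden.shape, prediction.shape)
```

The batch dimension (300) passes through unchanged and only the last dimension follows in_size and out_size, which is why the placeholders above are declared as [None, 1].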

Pitfalls I ran into

libiomp5md.dll error

You may run into this when using PyTorch or TensorFlow:

Initializing libiomp5md.dll, but found libiomp5md.dll already initialized

Temporary workaround (add this to your code):

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
# Note: this line must come before import torch (or import tensorflow)

Permanent fix
Keep the libiomp5md.dll under site-packages\torch\lib and delete the files with the same name on other paths.
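
Before deleting anything, it helps to see where the copies actually are. Here is a small standard-library sketch that lists every libiomp5md.dll under the active Python environment. Note that find_dll_copies is a hypothetical helper of mine, and it only reports paths; deciding which copy to keep is still up to you.

```python
import os
import sys


def find_dll_copies(root, name="libiomp5md.dll"):
    """Return the full paths of every file called `name` under `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Windows file names are case-insensitive, so compare lowercased
            if filename.lower() == name.lower():
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    # sys.prefix is the root of the active Python environment
    for path in find_dll_copies(sys.prefix):
        print(path)
```

If more than one path is printed, the extra copies are the candidates for the error above.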

Version compatibility

TensorFlow   Python     CUDA
2.x          3.6-3.9    11.2+
1.15         3.6-3.7    10.0
1.14         3.5-3.7    10.0
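
The table can be turned into a quick pre-install check. The sketch below copies the Python ranges from the table as-is, so treat them as this post's rough guide rather than an authoritative matrix. SUPPORTED_PYTHON and python_ok are names I made up for illustration.

```python
import sys

# Python ranges copied from the table above (inclusive minor versions).
SUPPORTED_PYTHON = {
    "2.x": ((3, 6), (3, 9)),
    "1.15": ((3, 6), (3, 7)),
    "1.14": ((3, 5), (3, 7)),
}


def python_ok(tf_series, version=None):
    """Check whether a (major, minor) Python version is inside the table's range."""
    if version is None:
        version = sys.version_info[:2]  # default to the running interpreter
    low, high = SUPPORTED_PYTHON[tf_series]
    return low <= tuple(version) <= high


print(python_ok("1.15", (3, 8)))  # False: 3.8 is above the 3.6-3.7 range
print(python_ok("2.x", (3, 8)))   # True
```

Calling python_ok("2.x") with no version checks the interpreter you are running right now.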

Environment setup

# Create an environment with Anaconda
conda create -n tensorflow_env python=3.8
conda activate tensorflow_env

# Install TensorFlow (using the Tsinghua PyPI mirror)
pip install tensorflow -i https://pypi.tuna.tsinghua.edu.cn/simple

# Install visualization and data tools
pip install tensorboard matplotlib numpy pandas

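A few optimization tips

  1. GPU acceleration needs the matching CUDA driver installed
  2. Preprocessing data with the tf.data API is more efficient
  3. Save checkpoints regularly so an interrupted run is not wasted
  4. Adjust the batch size to your GPU memory; do not set it too large

That covers the core: tensors, Session, Variable, Placeholder, plus building a neural network. All of the code has actually been run, and the problematic parts are flagged.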
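
Tips 2 and 3 can be sketched together on the linear-regression toy problem. This assumes TensorFlow with the compat.v1 API is installed. The batch size of 20, the checkpoint interval of 50 steps, and the temporary checkpoint directory are arbitrary choices of mine for illustration, not recommendations.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Toy data: y = 0.1x + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Tip 2: a tf.data pipeline that shuffles, batches, and repeats the data
dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data))
dataset = dataset.shuffle(buffer_size=100).batch(20).repeat()
x_batch, y_batch = tf.data.make_one_shot_iterator(dataset).get_next()

w = tf.Variable(tf.random.uniform([1], -1.0, 1.0), name="w")
b = tf.Variable(tf.zeros([1]), name="b")
loss = tf.reduce_mean(tf.square(w * x_batch + b - y_batch))
train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

# Tip 3: a Saver that keeps the three most recent checkpoints
saver = tf.train.Saver(max_to_keep=3)
ckpt_dir = tempfile.mkdtemp()  # assumption: use a real directory in practice
ckpt_prefix = os.path.join(ckpt_dir, "model.ckpt")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1, 201):
        sess.run(train)  # each run pulls the next batch from the pipeline
        if step % 50 == 0:
            saver.save(sess, ckpt_prefix, global_step=step)
    trained_w = sess.run(w)

# Resume later: restore the newest checkpoint into a fresh session
latest = tf.train.latest_checkpoint(ckpt_dir)
with tf.Session() as sess:
    saver.restore(sess, latest)
    restored_w = sess.run(w)

print(latest, trained_w, restored_w)
```

Restoring replaces variable initialization, so the second session never runs global_variables_initializer.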
