Doubling TensorFlow Performance on a MacBook M1

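I have a MacBook M1 laptop, and most applications aren't compatible with it. VMware Fusion doesn't support Linux virtual machines. Parallels reportedly supports ARM versions of Windows and Linux, but it apparently isn't very usable either. About the only thing the machine is still good for is machine learning. TensorFlow 2.5 supports M1 natively and performs considerably better than 2.4, but it requires macOS 12, which is still in beta. This article records how I set up a TensorFlow environment on the M1, along with some simple tests. Judging from the results, the performance improvement is quite noticeable.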
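Upgrade to macOS 12
The TensorFlow build that Apple previously developed specifically for M1 is no longer needed, because TensorFlow 2.5 supports M1 natively. The first step is therefore to upgrade to macOS 12. You can follow the tutorial below.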
https://zhuanlan.zhihu.com/p/378946858
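Set up the Conda environment
Anaconda does not yet support the M1 processor, and its bundled Python is 3.8, which cannot run natively on ARM. We therefore use the open-source Miniforge instead, which comes with Python 3.9.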
The following is excerpted from the Miniforge GitHub page.
Miniforge3
Latest installers with Python 3.9 (*) in the base environment:

OS	Architecture	Download
Linux	x86_64 (amd64)	Miniforge3-Linux-x86_64
Linux	aarch64 (arm64) (**)	Miniforge3-Linux-aarch64
Linux	ppc64le (POWER8/9)	Miniforge3-Linux-ppc64le
OS X	x86_64	Miniforge3-MacOSX-x86_64
OS X	arm64 (Apple Silicon) (***)	Miniforge3-MacOSX-arm64
Windows	x86_64	Miniforge3-Windows-x86_64

(*) The Python version is specific only to the base environment. Conda can create new environments with different Python versions and implementations.
(**) While the Raspberry PI includes a 64 bit processor, the RasbianOS is built on a 32 bit kernel and is not a supported configuration for these installers. We recommend using a 64 bit linux distribution such as Ubuntu for Raspberry PI.
(***) Apple silicon builds are experimental and haven't had testing like the other platforms.
Although conda's support for M1 is still experimental, Python 3.9 runs natively on the M1 processor, and we only use conda to manage Python packages.
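During installation, possibly because I had installed Anaconda earlier, conda kept getting killed by zsh. I tried many fixes, including installing the full Xcode, and none of them worked; in the end, installing to a different path solved the problem. In principle you don't need Xcode at all; just install Miniforge directly.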
https://github.com/conda-forge/miniforge/issues/190
Installation is simple: download the installer and run it.

bash Miniforge3-MacOSX-arm64.sh

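Answer yes or accept the defaults at every prompt. When it finishes, restart the terminal and check that conda and python both run. In my case the result was Python 3.9.6.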

(base) ~ % python
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:35:11)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
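The whole point of using Miniforge here is to get a native ARM interpreter. It can therefore be worth confirming that the python you just started is an arm64 build, not an x86_64 one running under Rosetta 2. Below is a minimal check that uses only the standard library. The function name and the warning text are my own.

```python
import platform
import sys


def interpreter_arch():
    """Return (Python version, machine architecture) of the running interpreter."""
    return platform.python_version(), platform.machine()


version, arch = interpreter_arch()
print(f"Python {version} on {arch}")

# On an M1 Mac, a native interpreter reports 'arm64', while an
# x86_64 interpreter translated by Rosetta 2 reports 'x86_64'.
if sys.platform == "darwin" and arch != "arm64":
    print("Warning: this Python is not a native Apple Silicon build")
```

On a native Miniforge install, this should print arm64 and no warning.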

Switch to mirrors hosted in China: open (or create) ~/.condarc and add the following:

channels:
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
  - https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
show_channel_urls: true
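If you would rather generate this file with a script than edit it by hand, here is a small sketch that produces the same YAML as above. The helper names render_condarc and write_condarc are hypothetical. Backing up an existing file to .bak before overwriting it is my own precaution, not part of the original setup.

```python
from pathlib import Path

# Channel list copied from the ~/.condarc example above
CHANNELS = [
    "https://mirrors.ustc.edu.cn/anaconda/pkgs/main/",
    "https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/",
    "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/",
    "defaults",
]


def render_condarc(channels, show_channel_urls=True):
    """Build the YAML text of a .condarc listing the channels in priority order."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append(f"show_channel_urls: {'true' if show_channel_urls else 'false'}")
    return "\n".join(lines) + "\n"


def write_condarc(path, channels):
    """Write the config to `path`, keeping a .bak copy of any existing file."""
    path = Path(path)
    if path.exists():
        path.with_name(path.name + ".bak").write_text(path.read_text())
    path.write_text(render_condarc(channels))


if __name__ == "__main__":
    print(render_condarc(CHANNELS), end="")
    # To apply it for real: write_condarc(Path.home() / ".condarc", CHANNELS)
```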

Install a package to check whether the mirror is being used. As the output shows, packages now come from the domestic mirror:

(base) niuxinli@niuxinlideMacBook-Pro ~ % conda install pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
  environment location: /Users/niuxinli/miniforge3
  added / updated specs:
    - pandas
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    bottleneck-1.3.2           |  py39heec5a64_1          96 KB  https://mirrors.ustc.edu.cn/anaconda/pkgs/main
    ca-certificates-2021.7.5   |      hca03da5_1         113 KB

Install PyCharm
PyCharm supports the M1 processor; just download PyCharm Community Edition.


Create a conda environment for PyCharm (I named it pycharm).


Install TensorFlow
Install the dependencies:

conda activate pycharm
conda install -c apple tensorflow-deps

Install TensorFlow with pip.
The default pip index is too slow, so I temporarily used the Alibaba Cloud mirror:

python -m pip install tensorflow-macos -i https://mirrors.aliyun.com/pypi/simple/

Install the Metal plugin:

python -m pip install tensorflow-metal -i https://mirrors.aliyun.com/pypi/simple/
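To check from Python whether TensorFlow actually sees the M1 GPU after tensorflow-metal is installed, look at tf.config.list_physical_devices("GPU"): it should return a non-empty list. Here is a minimal sketch. visible_gpus is a hypothetical helper that returns None when TensorFlow cannot be imported at all, for example because a dependency such as numpy is broken.

```python
def visible_gpus():
    """Names of GPU devices TensorFlow can see, or None if TensorFlow fails to import."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow could not be imported in this environment")
    elif gpus:
        print("GPU devices:", gpus)
    else:
        print("No GPU visible to TensorFlow; check the tensorflow-metal installation")
```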

Install some other dependencies:

brew install libjpeg
pip install tensorflow-datasets -i https://mirrors.aliyun.com/pypi/simple/
conda install -y pandas matplotlib scikit-learn jupyterlab
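After installing this many packages, a quick import smoke test can reveal a broken native extension right away, instead of halfway through a script. This is a sketch. The module list is my assumption about what to check, and it uses import names, which can differ from package names (for example, scikit-learn is imported as sklearn).

```python
import importlib


def smoke_test(modules):
    """Try to import each module; map name -> None on success or an error string on failure."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or anything raised while loading a native extension
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    packages = ["numpy", "pandas", "matplotlib", "sklearn",
                "tensorflow", "tensorflow_datasets"]
    for name, error in smoke_test(packages).items():
        print(f"{name:20s} {'OK' if error is None else error}")
```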

After installation, import numpy failed with the following error:

Original error was: dlopen(/Users/niuxinli/miniforge3/envs/pycharm/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib

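After some searching, I tried installing opencv on a whim to see whether it would help. It did fix the import error. However, the installation reported a conflict saying that TensorFlow 2.5 is incompatible with numpy 1.21.2, which I ignored for now.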

pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/

The error reported during installation:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-macos 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.

Judging from the run below, this conflict did not stop TensorFlow from working properly.
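pip's message refers to numpy~=1.19.2. This is a PEP 440 "compatible release" specifier that means >=1.19.2 and ==1.19.*, which is why numpy 1.21.2 gets flagged. The sketch below reproduces that check for the installed numpy using importlib.metadata. The comparison is a simplification of my own that looks only at the numeric release segments; the packaging library implements the full rules.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it isn't installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Leading numeric release part of a version string, e.g. '1.21.2' -> (1, 21, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def satisfies_compatible(version, spec):
    """Simplified '~=' check: ~=1.19.2 means >=1.19.2 and ==1.19.*."""
    v, s = release_tuple(version), release_tuple(spec)
    return v >= s and v[:len(s) - 1] == s[:-1]


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    # '1.19.2' is the requirement quoted in the tensorflow-macos 2.5.0 error above
    if numpy_version is not None:
        ok = satisfies_compatible(numpy_version, "1.19.2")
        print(f"numpy {numpy_version} satisfies ~=1.19.2: {ok}")
```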
Test TensorFlow
To benchmark TensorFlow on the M1, I found comparison results and code published by another blogger. The link is below:
https://zhuanlan.zhihu.com/p/350955566
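He set things up on macOS 11, so in theory his setup should perform worse than the installation method described above. I made a few small compatibility adjustments to the code and left everything else unchanged.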

import tensorflow as tf
import tensorflow_datasets as tfds
import time
from datetime import timedelta
from tensorflow.python.framework.ops import disable_eager_execution

disable_eager_execution()

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


def normalize_img(image, label):
    # Convert uint8 pixels in [0, 255] to float32 in [0, 1]
    return tf.cast(image, tf.float32) / 255., label


batch_size = 128

# Training pipeline: normalize, cache, shuffle, batch, prefetch
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Test pipeline: normalize, batch, cache, prefetch
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

start = time.time()

model.fit(
    ds_train,
    epochs=10,
    # validation_steps=1,
    # steps_per_epoch=469,
    # validation_data=ds_test
    # If this line is added as in the original script, the script fails to run; no fix found yet
)

delta = (time.time() - start)
elapsed = str(timedelta(seconds=delta))
print('Elapsed Time: {}'.format(elapsed))
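The commented-out steps_per_epoch=469 in the script is the number of batches in one pass over the MNIST training split (60,000 images) at batch_size = 128. The last, partial batch counts as a step, so the result is rounded up. The same arithmetic for the 10,000-image test split is included for comparison. The constants below are the standard MNIST split sizes, written out by hand instead of being read from ds_info.

```python
import math

TRAIN_EXAMPLES = 60_000  # size of the MNIST 'train' split
TEST_EXAMPLES = 10_000   # size of the MNIST 'test' split
BATCH_SIZE = 128         # batch_size in the script above

# tf.data's batch() keeps the final partial batch by default, so round up
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)  # 468.75 -> 469
test_steps = math.ceil(TEST_EXAMPLES / BATCH_SIZE)        # 78.125 -> 79

print(steps_per_epoch, test_steps)  # 469 79
```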

While the script runs, GPU utilization stays close to 100%.