Doubling TensorFlow performance on a MacBook M1
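I have a MacBook M1 laptop on hand. Most applications are not compatible with it, and VMware Fusion does not support Linux virtual machines. Parallels reportedly supports the ARM builds of Windows and Linux, but it does not seem to work well either. The one thing the machine is still good for is machine learning: TensorFlow 2.5 now supports the M1 natively, with a sizable performance gain over 2.4, but it requires macOS 12, which is still in beta. This post records how I set up a TensorFlow environment on the M1 and ran some simple tests. Judging from the results, the performance improvement is fairly clear.
Upgrading to macOS 12
The TensorFlow build that Apple developed specifically for the M1 is no longer needed, because TensorFlow 2.5 supports the M1 natively. The first step is therefore to upgrade to macOS 12; the tutorial below shows how.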
https://zhuanlan.zhihu.com/p/378946858
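Configuring the Conda environment
Anaconda does not support the M1 processor yet, and the Python it bundles is 3.8, which does not run natively on ARM. The open-source Miniforge is used instead; it ships with Python 3.9.
The following is excerpted from the Miniforge GitHub home page.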
Miniforge3
Latest installers with Python 3.9 (*) in the base environment:
OS | Architecture | Download |
---|---|---|
Linux | x86_64 (amd64) | Miniforge3-Linux-x86_64 |
Linux | aarch64 (arm64) (**) | Miniforge3-Linux-aarch64 |
Linux | ppc64le (POWER8/9) | Miniforge3-Linux-ppc64le |
OS X | x86_64 | Miniforge3-MacOSX-x86_64 |
OS X | arm64 (Apple Silicon) (***) | Miniforge3-MacOSX-arm64 |
Windows | x86_64 | Miniforge3-Windows-x86_64 |
(*) The Python version is specific only to the base environment. Conda can create new environments with different Python versions and implementations.
(**) While the Raspberry PI includes a 64 bit processor, the RasbianOS is built on a 32 bit kernel and is not a supported configuration for these installers. We recommend using a 64 bit linux distribution such as Ubuntu for Raspberry PI.
(***) Apple silicon builds are experimental and haven't had testing like the other platforms.
Although conda's support for the M1 is still experimental, Python 3.9 itself supports the M1 natively, and conda is only used here to manage Python packages.
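During installation, possibly because Anaconda had been installed earlier, I ran into conda being killed by zsh. I tried many things, including installing the full Xcode, and none of them helped; switching to a different install path finally fixed it. In theory Xcode is not required, and installing Miniforge alone should be enough.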
https://github.com/conda-forge/miniforge/issues/190
Installation is simple: download the installer and run it directly.
./Miniforge3-MacOSX-arm64.sh
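Answer yes or accept the default at every prompt. After the installation finishes, restart the terminal and check that conda and python run. On my machine the result is Python 3.9.6.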
(base)~ % python
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:35:11)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
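Whether the interpreter really runs natively on the M1, rather than as an x86_64 binary under Rosetta 2, can be checked from Python itself. The following is a minimal sketch using only the standard library; the helper names `describe_interpreter` and `is_native_apple_silicon` are made up for this illustration and are not part of conda or Miniforge.

```python
import platform


def describe_interpreter():
    """Collect the facts that matter for a native Apple Silicon setup."""
    return {
        "python": platform.python_version(),
        "system": platform.system(),    # "Darwin" on macOS
        "machine": platform.machine(),  # "arm64" natively, "x86_64" under Rosetta 2
    }


def is_native_apple_silicon(info):
    """True when the interpreter is an arm64 build running on macOS."""
    return info["system"] == "Darwin" and info["machine"] == "arm64"


info = describe_interpreter()
print(info)
print("native arm64 build:", is_native_apple_silicon(info))
```

If the last line prints False on an M1 machine, the shell is probably still picking up an older x86_64 Python, for example one left over from a previous Anaconda install.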
Switch to mirrors hosted in China: open or create ~/.condarc and add the following:
channels:
- https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
- https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
- defaults
show_channel_urls: true
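To double-check which channels a .condarc like the one above declares, the file can be read back with a few lines of Python. This is only a sketch: `parse_channels` is a hypothetical helper that understands just the simple layout shown here (a top-level `channels:` key followed by `- ` items), not YAML in general.

```python
from pathlib import Path


def parse_channels(condarc_text):
    """Return the entries listed under the top-level `channels:` key."""
    channels = []
    in_channels = False
    for raw in condarc_text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace() and not line.startswith("-"):
            # A new top-level key starts here; remember whether it is `channels`.
            in_channels = line.split(":", 1)[0].strip() == "channels"
            continue
        if in_channels and line.lstrip().startswith("-"):
            channels.append(line.lstrip()[1:].strip())
    return channels


condarc = Path.home() / ".condarc"
if condarc.exists():
    for channel in parse_channels(condarc.read_text()):
        print(channel)
else:
    print("no ~/.condarc found")
```

Channels are listed in priority order, so the mirror URLs should appear before `defaults`.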
Install a package to check whether the mirrors are being used. As the output shows, they are.
(base) niuxinli@niuxinlideMacBook-Pro ~ % conda install pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/niuxinli/miniforge3

  added / updated specs:
    - pandas

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bottleneck-1.3.2           |   py39heec5a64_1          96 KB  https://mirrors.ustc.edu.cn/anaconda/pkgs/main
    ca-certificates-2021.7.5   |       hca03da5_1         113 KB
Installing PyCharm
PyCharm supports the M1 processor; the Community edition is all you need to download.
Create a conda environment for PyCharm (it is named pycharm in the commands below).
Installing TensorFlow
Install the dependencies:
conda activate pycharm
conda install -c apple tensorflow-deps
Install TensorFlow with pip. The default pip index is too slow, so the Aliyun mirror is used temporarily:
python -m pip install tensorflow-macos -i https://mirrors.aliyun.com/pypi/simple/
Install the Metal plugin:
python -m pip install tensorflow-metal -i https://mirrors.aliyun.com/pypi/simple/
Install some other dependencies:
brew install libjpeg
pip install tensorflow-datasets -i https://mirrors.aliyun.com/pypi/simple/
conda install -y pandas matplotlib scikit-learn jupyterlab
After installation, import numpy fails with:
Original error was: dlopen(/Users/niuxinli/miniforge3/envs/pycharm/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib
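After some searching, I tried installing opencv on the off chance that it would help, and it did make the import error go away. pip reported a different error, though, saying that tensorflow 2.5 is incompatible with numpy 1.21.2. I set that aside for now.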
pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/
This is the error reported during installation:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-macos 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.
Judging from the run below, this error does not stop TensorFlow from working.
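The `numpy~=1.19.2` in the pip message is a "compatible release" requirement: it accepts any version that is at least 1.19.2 and still in the 1.19 series. A small sketch of that rule shows why numpy 1.21.2 is flagged. The function names are invented for this illustration, and the code only handles plain dotted numeric versions, not the full version syntax pip accepts.

```python
def parse_version(text):
    """Turn '1.19.2' into (1, 19, 2); plain numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_compatible_release(installed, required):
    """True if `installed` satisfies `~=required` (compatible release)."""
    inst = parse_version(installed)
    req = parse_version(required)
    prefix = req[:-1]  # ~=1.19.2 pins the 1.19 series
    return inst >= req and inst[:len(prefix)] == prefix


print(satisfies_compatible_release("1.21.2", "1.19.2"))  # False: outside the 1.19 series
print(satisfies_compatible_release("1.19.5", "1.19.2"))  # True: a later 1.19 release
```

In other words, the warning is about the declared version range being exceeded, which is a different question from whether the two packages happen to work together in practice.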
Testing TensorFlow
To compare TensorFlow performance on the M1, I found a blogger's comparison results and code online:
https://zhuanlan.zhihu.com/p/350955566
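That blogger installed under macOS 11, which in theory performs worse than the setup described above. I made a few small compatibility adjustments to the code and left everything else unchanged.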
import tensorflow as tf
import tensorflow_datasets as tfds
import time
from datetime import timedelta
from tensorflow.python.framework.ops import disable_eager_execution

# Disable eager execution, so the model runs in TF1-style graph mode.
disable_eager_execution()

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


def normalize_img(image, label):
    """Convert uint8 pixels to float32 values in [0, 1]."""
    return tf.cast(image, tf.float32) / 255., label


batch_size = 128

# Training pipeline: normalize, cache, shuffle, batch, prefetch.
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Test pipeline (built but not used by model.fit below).
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

# Time the whole training run.
start = time.time()

model.fit(
    ds_train,
    epochs=10,
    # validation_steps=1,
    # steps_per_epoch=469,
    # validation_data=ds_test  # with this line added as in the original script, the script fails to run; no fix found yet
)

delta = (time.time() - start)
elapsed = str(timedelta(seconds=delta))
print('Elapsed Time: {}'.format(elapsed))
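The script above times the whole training run by subtracting two `time.time()` readings. If several phases need timing (for example data loading and training separately), the same measurement can be wrapped in a small context manager. This is only a sketch: `stopwatch` and `timings` are names invented here, and it uses `time.perf_counter()`, a monotonic clock meant for measuring intervals, in place of the script's `time.time()`.

```python
import time
from contextlib import contextmanager
from datetime import timedelta


@contextmanager
def stopwatch(label, results):
    """Store the elapsed wall-clock time of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = timedelta(seconds=time.perf_counter() - start)


timings = {}
with stopwatch("training", timings):
    # Stand-in workload; in the benchmark this would be the model.fit(...) call.
    sum(i * i for i in range(100_000))

print("Elapsed Time: {}".format(timings["training"]))
```

Because the result is stored even if the block raises, a failed run still leaves a timing behind for inspection.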
While the script runs, GPU utilization is close to 100%.
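Besides watching GPU utilization, it is possible to ask TensorFlow directly which devices it can see, using `tf.config.list_physical_devices()`. The sketch below assumes that a working tensorflow-metal install shows up as a device whose name contains "GPU"; `summarize` is a hypothetical helper, and the import is guarded so the snippet also runs where TensorFlow is missing.

```python
def summarize(device_names):
    """Count the devices whose name marks them as a GPU."""
    gpus = [name for name in device_names if "GPU" in name]
    return {"gpu_count": len(gpus), "has_gpu": bool(gpus)}


try:
    import tensorflow as tf
except ImportError:
    tf = None

if tf is not None:
    # Device names look like "/physical_device:CPU:0" or "/physical_device:GPU:0".
    names = [device.name for device in tf.config.list_physical_devices()]
    print("TensorFlow", tf.__version__, summarize(names))
else:
    print("tensorflow is not installed in this environment")
```

If `has_gpu` comes back False, training falls back to the CPU, and the timings would not be comparable with the ones reported here.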
The run time holds almost steady at 1 min 32 s, a little under half of the blogger's 3 min 20 s, and close to the Colab GPU result.
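The two timings can be turned into a speedup figure with a couple of lines of arithmetic. The numbers are the ones quoted above; the variable names are just for illustration.

```python
from datetime import timedelta

baseline = timedelta(minutes=3, seconds=20)  # the blogger's result on macOS 11
measured = timedelta(minutes=1, seconds=32)  # the result reported in this post

speedup = baseline / measured        # dividing two timedeltas gives a float
time_saved = 1 - measured / baseline

print(f"speedup: {speedup:.2f}x")       # speedup: 2.17x
print(f"time saved: {time_saved:.0%}")  # time saved: 54%
```

So the run takes about 54% less time, which corresponds to roughly 2.2 times the throughput.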
In short, with macOS 12 and TensorFlow 2.5 installed on the M1, performance is close to double what it was before.