Software Tools | Installing Scrapy on macOS: Avoid the Pitfalls (tags: scrapy, python, web crawler)

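Table of Contents

- Installing Scrapy
  - 1. Using Anaconda or Miniconda
  - 2. Installing in a virtual environment (recommended)
  - 3. Platform-specific installation notes
    - 3.1 Windows
    - 3.2 macOS
  - 4. Testing your first Scrapy project
- [Not recommended] Pitfall: installing Scrapy fails with MemoryError: Cannot allocate write+execute memory for ffi.callback().
  - [Does not work] Method 1: Remove the pyopenssl library and install openssl.
  - [Does not work] Method 2: Upgrade the requests library.
  - [Use with caution] Method 3: Upgrade Python to 3.10.4, then reinstall Scrapy.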

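Installing Scrapy
This section follows the official Scrapy installation guide: https://docs.scrapy.org/en/latest/intro/install.html#
Python version required by Scrapy: Python 3.6+
Scrapy is written in pure Python and depends on a few key Python packages (among others):

lxml, an efficient XML and HTML parser
parsel, an HTML/XML data extraction library written on top of lxml
w3lib, a multi-purpose helper for dealing with URLs and web page encodings
twisted, an asynchronous networking framework
cryptography and pyOpenSSL, to deal with various network-level security needs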

The minimum versions that Scrapy is tested against are:

twisted 14.0
lxml 3.4
pyOpenSSL 0.14
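
To see how an existing environment compares with these minimums, something like the following can be run inside it. This is only a sketch: the helper names are my own, the distribution names are assumed to match the PyPI spellings, and the version comparison is deliberately simple (it only looks at the leading numeric parts of a version string).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted above; distribution names are assumed PyPI spellings.
MINIMUMS = {"Twisted": "14.0", "lxml": "3.4", "pyOpenSSL": "0.14"}


def version_tuple(text):
    """Turn '3.4.1' into (3, 4, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{name}: {installed} (minimum {minimum}) -> {status}")
```

For anything beyond a quick sanity check, a proper version parser is a better choice than this hand-rolled comparison.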

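1. Using Anaconda or Miniconda
Installing the package from conda-forge avoids most installation problems. Run: conda install -c conda-forge scrapy
2. Installing in a virtual environment (recommended)
Installing Scrapy inside a virtual environment is recommended on all platforms. A virtual environment keeps Scrapy's dependencies from conflicting with the Python packages already installed on the system.
For information on how to create one, see "Virtual Environments and Packages" in the Python documentation.
Python packages can be installed either globally (system-wide) or in user space. Installing Scrapy system-wide is not recommended.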
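
Before installing anything, it can be worth confirming that the interpreter you are about to use really is an isolated one. The sketch below is an assumption-laden check rather than an official test: it treats a differing sys.prefix and sys.base_prefix as a venv/virtualenv, and a CONDA_DEFAULT_ENV value other than "base" as a dedicated conda environment. The function name is hypothetical.

```python
import os
import sys


def in_virtual_environment():
    """Return True for a venv/virtualenv or a non-base conda environment."""
    # venv and virtualenv make sys.prefix differ from sys.base_prefix.
    if sys.prefix != sys.base_prefix:
        return True
    # conda exports CONDA_DEFAULT_ENV; "base" is conda's root environment.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return conda_env is not None and conda_env != "base"


print("interpreter prefix:", sys.prefix)
print("isolated environment:", in_virtual_environment())
```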
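3. Platform-specific installation notes
3.1 Windows
If Anaconda or Miniconda is already installed, install Scrapy with: conda install -c conda-forge scrapy
If neither is installed, Scrapy can be installed on Windows directly with pip.
First, "Microsoft Visual C++" is needed to build some of Scrapy's dependencies:

1. Download and run Microsoft C++ Build Tools to install the Visual Studio Installer.
2. Run the Visual Studio Installer.
3. Under the Workloads section, select C++ build tools.
4. Check the installation details and make sure the following packages are selected as optional components:
- MSVC (e.g. MSVC v142 - VS 2019 C++ x64/x86 build tools (v14.23))
- Windows SDK (e.g. Windows 10 SDK (10.0.18362.0))
5. Install the Visual Studio Build Tools.

After that, you can run pip install Scrapy.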
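3.2 macOS
Building Scrapy's dependencies requires a C compiler and development headers. On macOS these are typically provided by Apple's Xcode development tools. To install the Xcode command-line tools, open a terminal window and run:

xcode-select --install

Use conda to create a virtual environment named py310, list the environments, and activate it:

conda create -n py310 python=3.10
conda info -e
conda activate py310

Then install Scrapy:

pip install Scrapy

Note: be sure to use pip install here. When I used conda install, I ran into the pitfall described later in this article. The fixes tried in that pitfall section are the ones commonly suggested on CSDN; I do not recommend following them and list them only as a cautionary tale.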

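---------------------------------- Divider ----------------------------------------
Installing by the official method caused essentially no problems, so here is the whole procedure once more.
Local environment: macOS 12.0+

1. Install the Xcode command-line tools: open a terminal and run xcode-select --install.
2. In Miniconda or Anaconda, create a new virtual environment, such as py310 above.
3. Inside that environment, install Scrapy with pip: pip install Scrapy. Note: it has to be pip, pip, pip.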
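
If you are unsure which tool put a package into the environment, the package metadata sometimes records it. The sketch below reads the INSTALLER file from the distribution's metadata; pip writes "pip" there, and I am assuming (not guaranteeing) that a conda-installed package records something else or nothing at all. The function name installer_of is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, distribution


def installer_of(package):
    """Return the recorded installer name (e.g. 'pip'), or None if unknown."""
    try:
        dist = distribution(package)
    except PackageNotFoundError:
        return None
    recorded = dist.read_text("INSTALLER")  # None when the file is absent
    return recorded.strip() if recorded else None


print("Scrapy installed by:", installer_of("Scrapy"))
```

A result of None only means nothing was recorded (or the package is missing), so treat the output as a hint rather than proof.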

4. Testing your first Scrapy project
The spider (saved as quotes_spider.py):

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Extract the author and text of every quote on the page.
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        # Follow the "next" link, if there is one, and parse that page too.
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

In a terminal, run:

scrapy runspider quotes_spider.py -o quotes.jl

Output:

If the run succeeds, a quotes.jl file is generated, with one JSON object per line holding the author and text of each quote.

[Not recommended] Pitfall: installing Scrapy fails with MemoryError: Cannot allocate write+execute memory for ffi.callback().
Local environment: macOS 12.0+, Python 3.8, Scrapy 2.6.1
The following simple Scrapy example was run from the command line with scrapy runspider quotes_spider.py -o quotes.jl

import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Extract the author and text of every quote on the page.
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        # Follow the "next" link, if there is one, and parse that page too.
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

If the run succeeds, a quotes.jl file is generated with contents like the following:

{"author": "Jane Austen", "text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d"}
{"author": "Steve Martin", "text": "\u201cA day without sunshine is like, you know, night.\u201d"}
{"author": "Garrison Keillor", "text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d"}
...
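
The .jl extension stands for JSON Lines: each line is an independent JSON object, and the \u201c and \u201d sequences are just escaped curly quotation marks. As a small illustration (the helper name is my own, and the sample holds two lines copied from the output above), the file contents can be decoded like this:

```python
import json


def parse_json_lines(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Raw strings keep the \u escapes exactly as they appear in quotes.jl.
sample = "\n".join([
    r'{"author": "Jane Austen", "text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d"}',
    r'{"author": "Steve Martin", "text": "\u201cA day without sunshine is like, you know, night.\u201d"}',
])

for item in parse_json_lines(sample):
    print(item["author"], "-", item["text"])
```

For the real file, passing the result of open("quotes.jl", encoding="utf-8").read() to the same function should work the same way.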

However, the run ended with this error:

MemoryError: Cannot allocate write+execute memory for ffi.callback(). You might be running on a system that prevents this. For more information, see https://cffi.readthedocs.io/en/latest/using.html#callbacks

2022-03-28 15:57:37 [scrapy.core.engine] INFO: Closing spider (finished)
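[Does not work] Method 1: Remove the pyopenssl library and install openssl.
The error comes together with SSL verification failures. Based on https://github.com/pyca/pyopenssl/issues/873, my initial conclusion was that pyopenssl is involved: openssl has write and execute permission, while pyopenssl does not. However, Scrapy depends on pyopenssl.
To check whether pyopenssl exists in the current environment, I ran pip show pyopenssl, which showed that it does. Removing it with conda uninstall pyopenssl and installing openssl did not work.
[Does not work] Method 2: Upgrade the requests library.
Following a Stack Overflow answer about Scrapy on M1 Macs, I tried upgrading requests with pip3 install --upgrade requests: first update conda with conda update conda, then upgrade requests. My requests was already at 2.27.1, the latest version at the time. This did not work either.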
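
Because the advice I found was specific to M1 Macs, it helps to record what the interpreter itself reports before trying further fixes. The following is a minimal sketch using only the standard library; the function name is hypothetical, and note that ssl.OPENSSL_VERSION describes the library Python's own ssl module was linked against, which may differ from the one the cryptography package tries to load.

```python
import platform
import ssl
import sys


def environment_report():
    """Collect basic facts about the interpreter and platform as strings."""
    return {
        "python": platform.python_version(),
        "machine": platform.machine(),   # e.g. 'arm64' on Apple Silicon
        "macos": platform.mac_ver()[0],  # empty string when not on macOS
        "openssl": ssl.OPENSSL_VERSION,  # library behind Python's ssl module
        "prefix": sys.prefix,            # root of the active environment
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```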
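[Use with caution] Method 3: Upgrade Python to 3.10.4, then reinstall Scrapy.
I created a virtual environment py310 with Python 3.10 and reinstalled Scrapy. Running scrapy runspider quotes_spider.py -o quotes.jl then produced a new problem: Library not loaded: @rpath/libssl.1.1.dylib.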

ImportError: dlopen(/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so, 0x0002): Library not loaded: @rpath/libssl.1.1.dylib

Referenced from: /Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so

Reason: tried: '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/../../../../../libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/lib/python3.10/site-packages/cryptography/hazmat/bindings/../../../../../libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/bin/../lib/libssl.1.1.dylib' (no such file), '/Users/dan/miniforge3/envs/py310/bin/../lib/libssl.1.1.dylib' (no such file), '/usr/local/lib/libssl.1.1.dylib' (no such file), '/usr/lib/libssl.1.1.dylib' (no such file)
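
The Reason line is hard to read because the loader lists several candidate paths more than once. As a reading aid, a few lines of Python can reduce it to the unique locations that were searched. This is only a sketch: the helper name searched_paths is my own, and the message below is a shortened copy of the one above.

```python
import re


def searched_paths(dyld_message):
    """Return the unique paths reported as '(no such file)', in order."""
    found = re.findall(r"'([^']+)' \(no such file\)", dyld_message)
    return list(dict.fromkeys(found))  # de-duplicate, keep first-seen order


message = (
    "Reason: tried: "
    "'/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), "
    "'/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file), "
    "'/usr/local/lib/libssl.1.1.dylib' (no such file), "
    "'/usr/lib/libssl.1.1.dylib' (no such file)"
)

for path in searched_paths(message):
    print(path)
```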

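According to the post "Library not loaded: libcrypto.1.0.0.dylib issue in mac":
Causes of the error:

The openssl installed by Homebrew by default is version 1.0, while the latest Scrapy requires version 1.1.
The dynamic library path is wrong.

Fix:
Step 1. Install openssl with brew.
Check whether openssl is installed with brew info openssl. In my case it was not, so I installed it with brew install openssl.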


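If the check shows that openssl 1.0 is installed, update to openssl 1.1 with brew reinstall openssl@1.1. Then check the openssl version again with brew info openssl, and note where your dynamic library file libssl.1.1.dylib is located: /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib.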

Step 2. Copy the dynamic library file libssl.1.1.dylib into a directory on @rpath.
First, go to the directory that holds the library: cd /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib


Then copy libssl.1.1.dylib and libcrypto.1.1.dylib into one of the @rpath locations, that is, one of the paths listed in the original error message, such as '/Users/dan/miniforge3/envs/py310/lib/libssl.1.1.dylib' (no such file) or '/usr/lib/libssl.1.1.dylib' (no such file).
Here the library files are added to /Users/dan/miniforge3/envs/py310/lib/:

sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /Users/dan/miniforge3/envs/py310/lib/
sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /Users/dan/miniforge3/envs/py310/lib/
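
After copying, a quick check from inside the activated environment can confirm that both files are where the loader looked for them. This is a minimal sketch that assumes the environment's library directory is sys.prefix/lib, which matches the py310 paths in the error message; the function name is hypothetical.

```python
import sys
from pathlib import Path

# The two libraries copied in the step above.
REQUIRED = ("libssl.1.1.dylib", "libcrypto.1.1.dylib")


def missing_libraries(lib_dir, names=REQUIRED):
    """Return the names from `names` that do not exist inside lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in names if not (lib_dir / name).exists()]


env_lib = Path(sys.prefix) / "lib"
print("checking", env_lib)
print("missing:", missing_libraries(env_lib))
```

An empty list means both files are present; it does not prove that they are compatible with the installed cryptography build.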

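Note that copying into /usr/lib/ instead (the cp commands for this appear further down) may fail with an Operation not permitted error.
This happens because the machine has SIP (System Integrity Protection) enabled. Its Rootless mechanism prevents these files from being modified even with root privileges, and it acts as the last line of defense against malicious programs.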
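
Whether SIP is currently enabled can be checked with the csrutil status command. The sketch below simply wraps that command so the result can be inspected from Python; the wrapper name is my own, and it returns None on systems where csrutil is not available.

```python
import subprocess


def sip_status():
    """Return the text printed by `csrutil status`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["csrutil", "status"], capture_output=True, text=True, check=False
        )
    except OSError:  # csrutil is missing, e.g. when not running on macOS
        return None
    return result.stdout.strip() or None


print("SIP:", sip_status())
```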
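Workaround 1: If you still get a permission-denied error even with sudo, try copying the files into /usr/lib manually.

1. Open Finder, press Command+Shift+G, enter /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib in the dialog to open that directory, then find libssl.1.1.dylib and copy it with Command+C.
2. Press Command+Shift+G again, enter /usr/lib to open that directory, and paste libssl.1.1.dylib with Command+V.
3. In the same way, copy libcrypto.1.1.dylib into /usr/lib.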

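Workaround 2: If the files really must be modified, the only option is to turn the protection off. [Not recommended]
1) Restart while holding Command+R to enter the recovery partition, then launch Terminal.
2) In Terminal, enter csrutil disable
3) Restart again; files under /usr/lib can now be modified.
4) To restore the protection, boot into recovery mode again and enter csrutil enable.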

sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /usr/lib/
sudo cp /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /usr/lib/

Or

ln -s /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libssl.1.1.dylib /usr/lib/
ln -s /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib/libcrypto.1.1.dylib /usr/lib/

Summary of all the steps:

brew reinstall openssl@1.1  # install version 1.1
cd /opt/homebrew/Cellar  # adjust to where your libraries are located
# skip the next two steps if openssl was not installed before
mv openssl openssl@1.0  # rename the previous version
mv openssl@1.1 openssl  # use 1.1
# go to the directory that holds the openssl libraries
cd /opt/homebrew/Cellar/openssl@1.1/1.1.1k/lib
sudo cp libssl.1.1.dylib libcrypto.1.1.dylib /usr/lib/
# the next three steps do not seem to be required
sudo rm libssl.dylib libcrypto.dylib
sudo ln -s libssl.1.1.dylib libssl.dylib
sudo ln -s libcrypto.1.1.dylib libcrypto.dylib

You are welcome to follow my WeChat official account, HsuDan, where I share more of my study notes, pitfall write-ups, interview experience, and the latest AI news.