scrapy | Sending payload parameters in a Scrapy POST request

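scrapy.FormRequest() encodes its formdata as application/x-www-form-urlencoded and does not support sending it as a payload (a JSON request body). If an API expects a payload and you pass the parameters straight to scrapy.FormRequest(), the request fails with a 401 or 400 error. Here is an example: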

# payload parameters
payload_query = {
    "query": {"allOf": [{"allOf": [{"anyOf": [{"is": {"name": "xxxName", "value": "xxxvalue"}}]}]}]},
    "page": '1',
    "pageSize": '100',
    "sorting": {"column": "xxxxxx", "order": "desc"},
}
url = 'https://www.xxxx.com/'
# request headers
headers = {
    'authority': 'api.xxx.xxxx.com',
    'method': 'POST',
    'path': '/xxxx/xxx/xxx',
    'scheme': 'https',
    'accept': '*/*',
    # comment out the fields that are not needed
    # 'accept-encoding': 'gzip, deflate, br',
    # 'accept-language': 'zh-CN,zh; q=0.9',
    # 'Connection': 'keep - alive',
    # 'content-length': '199',  # under normal circumstances this must stay commented out
    'content-type': 'application/json',
    'origin': 'https://xxx.xxxx.com',
    'referer': 'https://xxxxxxxx.com',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36',
}

If the payload parameters are passed directly as formdata, the request fails with errors such as 401:
yield scrapy.FormRequest(url=url, formdata=payload_query, headers=headers)

The result is as follows (screenshot: image.png).

The correct approach:
yield scrapy.Request(url=url, body=json.dumps(payload_query), method='POST', headers=headers)
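
To illustrate why the two requests behave differently, here is a minimal sketch that uses only the standard library. It assumes that `urllib.parse.urlencode` is a fair stand-in for form encoding in general; Scrapy's FormRequest uses its own encoder, so the exact bytes it produces may differ. The point is only that form encoding cannot carry the nested structure, while `json.dumps` can.

```python
import json
from urllib.parse import urlencode, parse_qs

# Same shape as the payload in the article.
payload_query = {
    "query": {"allOf": [{"allOf": [{"anyOf": [{"is": {"name": "xxxName", "value": "xxxvalue"}}]}]}]},
    "page": "1",
    "pageSize": "100",
    "sorting": {"column": "xxxxxx", "order": "desc"},
}

# Form encoding (stand-in for what a form request does): every value is
# flattened to a string, so the nested dicts lose their structure.
form_body = urlencode(payload_query)

# JSON encoding: the body that a 'content-type: application/json' API expects.
json_body = json.dumps(payload_query)

# Decode both bodies the way a server would.
form_decoded = parse_qs(form_body)
json_decoded = json.loads(json_body)

print(type(form_decoded["query"][0]).__name__)  # str
print(type(json_decoded["query"]).__name__)     # dict
```

A server that parses the body as JSON gets the original structure back only in the second case, which is consistent with the 400/401 responses seen with formdata.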

Result of the run (screenshot: image.png).

One thing to note: when building headers, be sure to comment out the fields that are not needed.
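
As an alternative to commenting lines out by hand, the same filtering can be sketched in code. `clean_headers` and `DROP_HEADERS` are hypothetical names introduced here, not part of Scrapy, and the set of dropped fields simply mirrors the ones commented out in the example above. A hard-coded content-length is the risky one, since it may no longer match the body actually sent.

```python
# Hypothetical helper: the dropped fields mirror the ones commented out
# in the article's headers dict.
DROP_HEADERS = {"content-length", "accept-encoding", "accept-language", "connection"}


def clean_headers(raw_headers):
    """Return a copy of raw_headers without the fields listed in DROP_HEADERS."""
    return {
        name: value
        for name, value in raw_headers.items()
        if name.lower() not in DROP_HEADERS  # header names are case-insensitive
    }


# Headers as copied from the browser's developer tools (placeholder values).
browser_headers = {
    "accept": "*/*",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "zh-CN,zh; q=0.9",
    "Connection": "keep-alive",
    "content-length": "199",
    "content-type": "application/json",
}

headers = clean_headers(browser_headers)
print(sorted(headers))  # ['accept', 'content-type']
```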
