Python Web Scraping | In HD! A Step-by-Step Guide to Scraping LOL Hero Skin Image Sets with Python

Author: 锋小刀
Search WeChat for 【Python与Excel之交】 and follow my official account for more content

Target URL:

https://lol.qq.com/data/info-heros.shtml

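This page lists the portrait and name of every LOL hero (champion). The goal of this scrape is to download the skin images of every hero on that page.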

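Page analysis:

Click any hero's portrait to open that hero's detail page, which holds the hero's information and skin images. To get a hero's skins, we therefore have to go from the URL above to the detail page.

Right-clicking and viewing the page source shows that the content we need is not in the HTML, so the page must be loading its data dynamically.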

Open the browser's developer tools and capture the network traffic. There we find the URL that serves the hero's skin image data.

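Comparing several of these URLs shows that only the number at the end changes. The numbers do not follow a regular sequence; each one is an id unique to that hero, so we need to obtain the ids from the site: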

https://game.gtimg.cn/images/lol/act/img/js/hero/1.js
https://game.gtimg.cn/images/lol/act/img/js/hero/2.js
https://game.gtimg.cn/images/lol/act/img/js/hero/11.js
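Since only the trailing number differs, the detail URL can be treated as a template with a single slot for the id. A minimal sketch of that idea; the name hero_detail_url is made up for illustration, while the URL pattern is the one observed above:

```python
# URL pattern observed in the captured requests; {hero_id} is the only part that varies.
HERO_DETAIL_TEMPLATE = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{hero_id}.js'


def hero_detail_url(hero_id):
    """Build the detail-data URL for one hero id (hypothetical helper)."""
    return HERO_DETAIL_TEMPLATE.format(hero_id=hero_id)


# The ids are not consecutive, so they have to come from the site, not from range().
for hero_id in (1, 2, 11):
    print(hero_detail_url(hero_id))
```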

Back on the main page, the content is again loaded dynamically, so we capture the network traffic once more.

This yields the following URL:

https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js

The heroId field in its response is the id value we need.

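Crawling approach:

Capture the traffic on the main page to find the real data URL, and read the id of each hero's detail page from it;
Use each id to build the URL of that hero's detail data;
From each of those URLs, extract the hero's name, the skin names, and the skin image URLs.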
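The three steps above can be sketched as a single function. To keep the sketch independent of any network library, the JSON fetching is passed in as a callable. The names collect_skins and fetch_json are invented here, and the demo data is made up; the keys hero, heroId, skins, heroTitle, name and mainImg are the ones the article's code reads.

```python
HERO_LIST_URL = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
HERO_DETAIL_URL = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js'


def collect_skins(fetch_json):
    """Return (hero title, skin name, image URL) tuples for every skin.

    fetch_json is any callable that takes a URL and returns the parsed JSON.
    """
    records = []
    # Step 1: read every heroId from the hero list
    hero_ids = [hero['heroId'] for hero in fetch_json(HERO_LIST_URL)['hero']]
    for hero_id in hero_ids:
        # Step 2: build the detail URL from the id and fetch it
        detail = fetch_json(HERO_DETAIL_URL.format(hero_id))
        # Step 3: pull the names and image URLs out of the skin entries
        for skin in detail['skins']:
            records.append((skin['heroTitle'], skin['name'], skin['mainImg']))
    return records


# Made-up stand-in for the real responses, so the sketch runs offline.
FAKE_PAGES = {
    HERO_LIST_URL: {'hero': [{'heroId': '1'}, {'heroId': '11'}]},
    HERO_DETAIL_URL.format('1'): {'skins': [
        {'heroTitle': 'Hero A', 'name': 'Skin A1', 'mainImg': 'https://example.com/a1.jpg'},
    ]},
    HERO_DETAIL_URL.format('11'): {'skins': [
        {'heroTitle': 'Hero B', 'name': 'Skin B1', 'mainImg': 'https://example.com/b1.jpg'},
        {'heroTitle': 'Hero B', 'name': 'Skin B2', 'mainImg': 'https://example.com/b2.jpg'},
    ]},
}

for record in collect_skins(FAKE_PAGES.__getitem__):
    print(record)
```

Against the real site, fetch_json could be something like lambda url: get_response(url).json(), using the article's get_response helper.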

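Hands-on code:

First, the function that gets the id values for the detail-page URLs. The response is JSON, so we parse it with .json() and hand the ids back one at a time with yield: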

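def name_data():
    # get_response is a request helper that the article does not show
    url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js'
    res = get_response(url).json()
    hero = res['hero']  # list with one entry per hero
    for i in hero:
        heroId = i['heroId']
        yield heroId  # hand back one id at a time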
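name_data calls get_response, and the image function further on calls save, but the article shows neither helper. A minimal sketch of what they might look like, assuming the third-party requests library (the .json() call suggests a requests-style response object). The User-Agent value and the 10-second timeout are arbitrary choices, not taken from the article:

```python
import requests

# A browser-like User-Agent; some sites reject the default one
# (an assumption, not verified for this site).
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_response(url):
    """Fetch url and return the requests.Response (assumed shape of the missing helper)."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response


def save(img_url, file_name):
    """Download img_url and write its bytes to file_name (assumed shape of the missing helper)."""
    data = get_response(img_url).content
    with open(file_name, 'wb') as f:
        f.write(data)
```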

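Next, the image-extraction function. The response is again parsed with .json(). The os module creates one folder per hero, named after the hero, and each image is saved under the skin's name plus .jpg: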

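import os


def main_Img(html_url):
    res_ = get_response(html_url).json()
    skins = res_['skins']
    for e in skins:
        mainImg = e['mainImg']  # URL of the skin image
        name = e['name']  # skin name, used as the file name
        heroTitle = e['heroTitle']  # hero name, used as the folder name
        print(heroTitle)
        # makedirs also creates the parent ./image/ folder, which os.mkdir
        # would not; exist_ok=True makes the call safe to repeat
        os.makedirs(f'./image/{heroTitle}/', exist_ok=True)
        file_name = f'image/{heroTitle}/' + name + '.jpg'
        save(mainImg, file_name)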
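One thing to watch for when naming files after skins: a name containing a path separator, or another character the file system rejects, makes open() fail or puts the file in the wrong folder. Whether the real data contains such names is an assumption here, but a slash in a skin name is easy to imagine. A minimal sketch of sanitizing names before building the path; safe_name and skin_path are hypothetical helpers, not part of the article's code:

```python
import os
import re

# Characters that Windows forbids in file names; '/' is also the POSIX path separator.
_UNSAFE = re.compile(r'[\\/:*?"<>|]')


def safe_name(text):
    """Replace characters that are unsafe in file names with underscores."""
    return _UNSAFE.sub('_', text).strip()


def skin_path(hero_title, skin_name, root='image'):
    """Create the hero's folder if needed and return the path for one skin image."""
    folder = os.path.join(root, safe_name(hero_title))
    os.makedirs(folder, exist_ok=True)  # creates root too; no error if it already exists
    return os.path.join(folder, safe_name(skin_name) + '.jpg')
```

Inside main_Img, file_name could then be computed as skin_path(heroTitle, name). It may also be worth skipping entries whose mainImg is empty, in case some skin entries carry no image URL.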

Creating the concurrent tasks:

import concurrent.futures

# Thread pool with 10 workers; each task downloads all skins of one hero
executor = concurrent.futures.ThreadPoolExecutor(max_workers=10)
name_ = name_data()
for o in name_:
    url_data = f'https://game.gtimg.cn/images/lol/act/img/js/hero/{o}.js'
    executor.submit(main_Img, url_data)
# Wait for all submitted tasks to finish
executor.shutdown()
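One caveat with executor.submit: an exception raised inside a worker is stored on the returned Future and stays silent unless the result is requested. A minimal sketch of keeping the futures and reporting failures, using a stand-in task instead of main_Img so that it runs offline; run_all and fake_task are names invented here:

```python
import concurrent.futures


def run_all(task, items, max_workers=10):
    """Run task(item) for every item in a thread pool; return (done, failed)."""
    done, failed = [], []
    # Leaving the with-block calls shutdown(), which waits for all tasks
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(task, item): item for item in items}
        for future in concurrent.futures.as_completed(futures):
            item = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                done.append(item)
            except Exception as exc:
                failed.append((item, exc))
    return done, failed


def fake_task(hero_id):
    """Stand-in for main_Img: fails for one id to show the error path."""
    if hero_id == 2:
        raise ValueError('simulated download error')


done, failed = run_all(fake_task, [1, 2, 11])
print(sorted(done), [(item, str(exc)) for item, exc in failed])
```

With the article's functions, the call would look like run_all(main_Img, urls), where urls is the list of detail URLs built from name_data().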

Results: