Scraping the Maoyan Movie Rankings - Python Program - Demo

[Scraping the Maoyan Movie Rankings - Python Program - Demo] Code demo
Gitee source code
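
The comments in the script in this post describe copying the request headers out of the browser's developer tools and converting the copied text into a dict with headersStrToDict.py. That file is not reproduced in this post, so the following is only a minimal sketch of what such a conversion might look like. The function name `headers_str_to_dict` and the sample text are hypothetical, and the sketch assumes the copied text holds one "Name: value" pair per line.

```python
def headers_str_to_dict(raw: str) -> dict:
    """Convert header text copied from the browser's dev tools into a dict.

    Hypothetical helper: assumes one "Name: value" pair per line.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(':'):
            continue
        # Split on the first colon only, so values that contain colons
        # (URLs, for example) stay intact.
        name, sep, value = line.partition(':')
        if not sep:
            continue  # not a "Name: value" line
        headers[name.strip()] = value.strip()
    return headers


# Made-up sample of copied header text.
copied = """
Host: maoyan.com
Connection: keep-alive
Referer: https://maoyan.com/board/4?offset=20
"""
print(headers_str_to_dict(copied))
```

The resulting dict can be passed to `requests.get(url, headers=...)` in the same way the script passes `filmHeaders`.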

# -*- coding: utf-8 -*-
# Version: Python 3.9.7
# Author: TRIX
# Date: 2021-10-04 10:58:56
# Use: scrape Maoyan TOP movie info with re (rank, cover, title, starring cast,
#      release date, score) and save it to txt, json and jpg
import json
import re

import requests

# How to get your own headers: F12 > Network > F5 > Name > pick any entry >
# Headers > Request Headers > copy everything, adjust some fields for the target
# page, then use headersStrToDict.py (available on the author's Gitee:
# https://gitee.com/trix_repository/python_primary_programs/tree/master/grab_from_webs/headersStrToDict)
# to convert the copied string into dict-literal form.
# Code demo: https://www.bilibili.com/video/BV17f4y177tK
filmHeaders = {
    "Accept": "text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,image/apng,*/*; q=0.8,application/signed-exchange; v=b3; q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh; q=0.9,en; q=0.8,en-GB; q=0.7,en-US; q=0.6",
    "Connection": "keep-alive",
    "Cookie": "__mta=121600695.1633276133288.1633486391780.1633486486213.25; uuid_n_v=v1; uuid=68300D50246111ECBC330D5DE33F2BD5DC4E5699C4B34D37ABD9B18E36A7FE41; _lxsdk_cuid=17c46d7ff1dc8-0e4be0ab9874b1-513c164a-bfe6e-17c46d7ff1dc8; _lxsdk=68300D50246111ECBC330D5DE33F2BD5DC4E5699C4B34D37ABD9B18E36A7FE41; __mta=121600695.1633276133288.1633317988496.1633318498554.22; _csrf=34b8929cfc79025ebc9bb219233450e5cf2fe4210bc8d3df569351110be80adb; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1633276133,1633308246,1633486385; Hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1633486486; _lxsdk_s=17c53602e76-352-32a-94e%7C%7C5",
    "Host": "maoyan.com",
    # "Referer": "https://maoyan.com/board/4?offset=20",
    "sec-ch-ua": "\"Chromium\"; v=\"94\", \"Microsoft Edge\"; v=\"94\", \"; Not A Brand\"; v=\"99\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Edg/94.0.992.38"
}

coverHeaders = {
    "Referer": "https://maoyan.com/",
    "sec-ch-ua": "\"Chromium\"; v=\"94\", \"Microsoft Edge\"; v=\"94\", \"; Not A Brand\"; v=\"99\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36 Edg/94.0.992.38"
}


def getHtml(url, getContent=False, head=filmHeaders):
    # Fetch a page's HTML; with getContent=True, return the raw bytes instead.
    response = requests.get(url, headers=head)
    print(f'Trying to fetch {url}, status code: {response.status_code}')
    if response.status_code == 200:
        if getContent:
            return response.content
        return response.text
    else:
        print(f'Failed to fetch data from {url}')
        return None


def getPageFilmsInfo(page):
    # Get the info for the films on one page; returns a list of film-info dicts.
    # baseUrl is not defined in the surviving part of the script.
    filmHtml = getHtml(baseUrl + f'/board/4?offset={page}')
    # Matches each film's info. re.S (= re.DOTALL) lets . match every character,
    # including newlines.
    # The HTML tag literals of this pattern are missing from this copy of the
    # script; only the class-name anchors and the capture groups remain.
    filmPattern = re.compile(
        r'.*?board-index.*?>(.*?).*?(.*?).*?star.*?>(.*?)'
        r'.*?releasetime.*?>(.*?)'
        r'.*?integer.*?>(.*?).*?fraction.*?>(.*?).*?',
        re.S)
    # Info for multiple films: rank, cover, title, starring cast, release date, score.
    filmsInfo = filmPattern.findall(filmHtml)
    # The script breaks off here, in the middle of the cover-image pattern:
    # coverPattern = re.compile(r'
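
Since the film pattern above survives only in part, here is a minimal sketch of how a pattern of this kind can pull the listed fields out of one chart entry with `re.S` and `findall`. It is not the author's pattern. The anchors `board-index`, `star`, `releasetime`, `integer` and `fraction` come from the script. The sample markup, the `data-src` attribute, the `class="name"` anchor, the tag names and the dict keys are assumptions made for illustration, and the real page may differ.

```python
import re

# Hypothetical markup for one chart entry; the real page may differ.
sample_html = """
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/1" title="Example Film" class="image-link">
    <img data-src="https://example.com/cover.jpg" alt="Example Film" class="board-img" />
  </a>
  <p class="name"><a href="/films/1" title="Example Film">Example Film</a></p>
  <p class="star">
      Starring: Actor A, Actor B
  </p>
  <p class="releasetime">Release date: 1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">5</i></p>
</dd>
"""

# One capture group per field. re.S lets "." cross the newlines between tags.
film_pattern = re.compile(
    r'<dd>.*?board-index.*?>(.*?)</i>'      # rank
    r'.*?data-src="(.*?)"'                  # cover URL (assumed attribute)
    r'.*?class="name".*?<a.*?>(.*?)</a>'    # title (assumed anchor)
    r'.*?class="star".*?>(.*?)</p>'         # starring cast
    r'.*?class="releasetime".*?>(.*?)</p>'  # release date
    r'.*?class="integer".*?>(.*?)</i>'      # integer part of the score
    r'.*?class="fraction".*?>(.*?)</i>'     # fractional part of the score
    r'.*?</dd>',
    re.S,
)

films = []
for rank, cover, name, star, release, integer, fraction in film_pattern.findall(sample_html):
    films.append({
        'rank': int(rank),
        'cover': cover,
        'name': name.strip(),
        'star': star.strip(),
        'releasetime': release.strip(),
        'score': integer + fraction,  # the two halves are joined as strings
    })
print(films)
```

Without `re.S` the same pattern finds nothing in this sample, because `.` would stop at the first line break inside the entry.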