6. Simple extraction of Xiaohongshu app data, saved to txt (part 2)


A simple scrape of the page information:

Points to note:

Both auth-sign and auth are only valid for a limited time. Also, the original URL uses https; here the request is sent over http instead.
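The auth value used in the script's headers has the three dot-separated segments of a JSON Web Token, and its middle segment decodes to a small JSON object with an expire field. The sketch below decodes that segment so the expiry can be checked before sending a request. It assumes expire is a Unix timestamp in seconds, which the original post does not confirm; decode_jwt_payload and is_expired are illustrative names, and the signature is not verified.

```python
import base64
import json
import time

# The auth header value from the script in this post (long since expired).
SAMPLE_AUTH = (
    "eyJoYXNoIjoibWQ0IiwiYWxnIjoiSFMyNTYiLCJ0eXAiOiJKV1QifQ"
    ".eyJzaWQiOiI0M2RkNGY2YS01NTk1LTRjNGEtYTkyMi05ODEzNjdiMTlmMTEiLCJleHBpcmUiOjE1NDExMzAyNjJ9"
    ".9AC8VBcXiBG48vHa-LLgVEWOnloTdQvNWzYAyvqGnMA"
)


def decode_jwt_payload(token):
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token, now=None):
    """Assumption: 'expire' is a Unix timestamp in seconds."""
    now = time.time() if now is None else now
    return now >= decode_jwt_payload(token)["expire"]


if __name__ == "__main__":
    print(decode_jwt_payload(SAMPLE_AUTH))
    print("expired:", is_expired(SAMPLE_AUTH))
```

If is_expired returns True, a fresh auth and auth-sign pair has to be captured again before the request will succeed.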

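To deal with these parameters, use mitmdump to capture the actual request parameters and extract them, instead of intercepting and analysing the HTTP requests and responses by hand. Write the handling logic for the request and the response, and let Python do the follow-up processing.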
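A minimal sketch of such a mitmdump script is shown here. mitmdump loads a Python file passed with -s and calls its module-level request(flow) function for every intercepted request. The file name capture_auth.py, the output file captured_headers.json and the helper pick_headers are assumptions made for illustration; the API path is taken from the URL in the script in this post.

```python
import json

# Path fragment of the homefeed request used in this post.
API_PATH = "/sapi/wx_mp_api/sns/v1/homefeed"
# Header names whose values expire and must be refreshed.
WANTED = ("authorization", "auth", "auth-sign")
# Hypothetical output file that the scraper could read later.
OUTPUT_FILE = "captured_headers.json"


def pick_headers(headers):
    """Return only the wanted headers that are present and non-empty."""
    return {name: headers.get(name) for name in WANTED if headers.get(name)}


def request(flow):
    """mitmproxy hook: called once per intercepted HTTP request."""
    if API_PATH not in flow.request.pretty_url:
        return
    captured = pick_headers(flow.request.headers)
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump(captured, f, ensure_ascii=False, indent=2)
```

Run it as mitmdump -s capture_auth.py with the phone's proxy pointing at the machine running mitmdump; the scraper can then load the saved values instead of hard-coding them.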

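Later on, Appium can simulate human swipes to refresh the page and trigger new requests; the responses are then captured and processed.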
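The swipe step could look roughly like the sketch below. It assumes an already connected Appium driver object that exposes get_window_size() and swipe(start_x, start_y, end_x, end_y, duration), as the Appium Python client does; creating the session (device capabilities, server address) is left out because the post gives no details. The function name swipe_up and its default values are illustrative.

```python
import time


def swipe_up(driver, times=1, pause=2.0, duration_ms=800):
    """Swipe from the lower part of the screen to the upper part.

    Each swipe scrolls the feed, which makes the app request more data.
    """
    size = driver.get_window_size()
    x = size["width"] // 2
    start_y = size["height"] * 4 // 5  # 80% down the screen
    end_y = size["height"] // 5        # 20% down the screen
    for _ in range(times):
        driver.swipe(x, start_y, x, end_y, duration_ms)
        # Give the app time to load before the next swipe.
        time.sleep(pause)
```

With mitmdump running as the device's proxy, each swipe produces a new homefeed request whose response can be handled by the capture script.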


import requests


def main():
    # Headers copied from a captured request. auth and auth-sign expire,
    # so they must be replaced with fresh values before running.
    headers = {
        "charset": "utf-8",
        "Accept-Encoding": "gzip",
        "referer": "https://servicewechat.com/wxffc08ac7df482a27/117/page-frame.html",
        "authorization": "5bda7657a4ce660001f7eed8",
        "auth": "eyJoYXNoIjoibWQ0IiwiYWxnIjoiSFMyNTYiLCJ0eXAiOiJKV1QifQ.eyJzaWQiOiI0M2RkNGY2YS01NTk1LTRjNGEtYTkyMi05ODEzNjdiMTlmMTEiLCJleHBpcmUiOjE1NDExMzAyNjJ9.9AC8VBcXiBG48vHa-LLgVEWOnloTdQvNWzYAyvqGnMA",
        "content-type": "application/json",
        "auth-sign": "c475525b214bb5d9ae431ac029cb9b50",
        "User-Agent": "Mozilla/5.0 (Linux; android 7.1.2; MI 5X Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/64.0.3282.137 Mobile Safari/537.36 MicroMessenger/6.7.3.1360(0x26070336) NetType/WIFI Language/zh_CN Process/appbrand2",
        "Host": "www.xiaohongshu.com",
        "Connection": "Keep-Alive",
    }
    # url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=&sid=session.1540996623416187718"
    url = "http://www.xiaohongshu.com/sapi/wx_mp_api/sns/v1/homefeed?oid=homefeed.cosmetics_v2&cursor_score=1541067389.9550&sid=session.1540996623416187718"
    datas = requests.get(url=url, headers=headers).json()
    data = datas['data']
    # print(data)
    for i in data:
        print(i)
        # print(i['title'])
        # print(i['share_link'])
        # Title
        title = '标题: ' + i['mini_program_info']['share_title']
        print(title)
        # Link
        link_url = '链接: ' + i['share_link']
        print(link_url)
        # Cover image
        b_picture = '封面图片: ' + i['mini_program_info']['thumb']
        print(b_picture)
        # Type (renamed from `type` to avoid shadowing the built-in)
        note_type = '类型: ' + i['type']
        print(note_type)
        # Level
        level = '级别: ' + str(i['level'])
        print(level)
        # User avatar
        h_picture = '用户头像: ' + i['user']['images']
        print(h_picture)
        # Username
        username = '用户名: ' + i['user']['nickname']
        print(username)
        # User id
        user_id = 'userid: ' + i['user']['userid']
        print(user_id)
        # Number of likes
        zan = '喜欢点心: ' + str(i['likes'])
        print(zan)
        # Open the file in append mode, so each record is written at the end of the file.
        with open('text', 'a', encoding='utf-8') as f:
            f.write('\n'.join([title, link_url, b_picture, note_type, level, h_picture, username, user_id, zan]))
            f.write('\n' + '=' * 100 + '\n')


if __name__ == "__main__":
    main()
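The script indexes every field directly, so a single feed item that lacks one of the keys would stop the run with a KeyError. Whether the API ever omits fields is not stated in the post; the sketch below simply assumes it might, and factors the field extraction into a helper that falls back to an empty string. format_record and SAMPLE_ITEM are illustrative names, and the sample values are taken from the saved output shown in this post.

```python
def format_record(item):
    """Build the nine labelled lines for one feed item as a single string."""
    info = item.get("mini_program_info", {})
    user = item.get("user", {})
    lines = [
        "标题: " + str(info.get("share_title", "")),    # title
        "链接: " + str(item.get("share_link", "")),      # link
        "封面图片: " + str(info.get("thumb", "")),        # cover image
        "类型: " + str(item.get("type", "")),            # type
        "级别: " + str(item.get("level", "")),           # level
        "用户头像: " + str(user.get("images", "")),       # user avatar
        "用户名: " + str(user.get("nickname", "")),       # username
        "userid: " + str(user.get("userid", "")),
        "喜欢点心: " + str(item.get("likes", "")),        # likes
    ]
    return "\n".join(lines)


# A hand-built item shaped like the API response used in this post.
SAMPLE_ITEM = {
    "share_link": "https://www.xiaohongshu.com/discovery/item/5bc0b2bf910cf646cc1087aa",
    "type": "normal",
    "level": 4,
    "likes": 233,
    "mini_program_info": {
        "share_title": "sample title",
        "thumb": "http://ci.xiaohongshu.com/161f03cb-0cf6-355f-b178-712a928a7720?imageView2/2/w/540/format/jpg",
    },
    "user": {
        "images": "https://img.xiaohongshu.com/avatar/5bb1047b0fd0590001997f83.jpg@80w_80h_90q_1e_1c_1x.jpg",
        "nickname": "zanleo",
        "userid": "582c5f8982ec393b5ec866ba",
    },
}

if __name__ == "__main__":
    print(format_record(SAMPLE_ITEM))
```

Inside the loop, the nine separate assignments could then be replaced by one call to format_record(i) followed by the same append-mode write.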

 
Saved to a local file

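Field information (the labels in the saved file are 标题 = title, 链接 = link, 封面图片 = cover image, 类型 = type, 级别 = level, 用户头像 = user avatar, 用户名 = username, 喜欢点心 = likes):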

标题: 王者荣耀——貂蝉~仲夏夜之梦 游戏角色貂蝉皮肤印象妆容 主色
链接: https://www.xiaohongshu.com/discovery/item/5bc0b2bf910cf646cc1087aa
封面图片: http://ci.xiaohongshu.com/161f03cb-0cf6-355f-b178-712a928a7720?imageView2/2/w/540/format/jpg
类型: normal
级别: 4
用户头像: https://img.xiaohongshu.com/avatar/5bb1047b0fd0590001997f83.jpg@80w_80h_90q_1e_1c_1x.jpg
用户名: zanleo
userid: 582c5f8982ec393b5ec866ba
喜欢点心: 233
====================================================================================================
标题:
