python.requests in practice: scraping 58.com office listings
1. The result first
[Screenshot: image.png]
2. Approach
To get past the site's anti-scraping checks, send a browser-like User-Agent header with every request.
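A minimal sketch of that idea, assuming the requests library is installed. It only prepares a request without sending it, so the headers can be inspected offline. The helper name build_request is hypothetical; the User-Agent string is the one used in the source code of this article.

```python
import requests

# Browser-like headers; the User-Agent value is copied from this article's code.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    ),
}


def build_request(url, headers=HEADERS):
    """Prepare (but do not send) a GET request so its headers can be checked."""
    return requests.Request("GET", url, headers=headers).prepare()


prepared = build_request("http://gy.58.com/zhaozu/")

# Without custom headers, requests identifies itself as a script.
print(requests.utils.default_user_agent())
# With custom headers, the request carries the browser-like value instead.
print(prepared.headers["User-Agent"])
```

Comparing the two printed values shows why the default is easy for a site to filter: it names the library, while the custom header looks like an ordinary browser.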
3. Source code
import requests
from bs4 import BeautifulSoup


class Guiyang(object):
    def __init__(self):
        # ClickID values to iterate over, one request per value
        self.page = range(1, 10)
        self.url = 'http://gy.58.com/zhaozu/'
        # Browser-like headers so the request is less likely to be rejected
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
            'Host': 'gy.58.com',
            'Upgrade-Insecure-Requests': '1',
        }

    def get_data(self):
        for page in self.page:
            # Query-string values of a GET request are passed with params=
            params = {'PGTID': '0d00000d-0000-0ee8-d8e7-f5dce12e009e', 'ClickID': page}
            r = requests.get(url=self.url, headers=self.headers, params=params).text
            listing = BeautifulSoup(r, 'lxml').find('ul', class_='house-list-wrap')
            if listing is None:
                # The list is missing: the layout changed or the request was blocked
                continue
            for items in listing.find_all('li'):
                try:
                    link_url = items.find('a')['href']  # link to each listing's detail page
                    #get_link = requests.get(item_link_url,headers=headers).text
                    name = items.find('span', attrs={'class': 'title_des'}).get_text()
                    location = items.find('p', class_='baseinfo').get_text().replace('\n', '')
                    #pricea = items.find('p',class_='sum').get_text().replace('\n','')+str('> per square meter')+'\n\n'
                    pricetoday = items.find('p', class_='unit').get_text().replace(' ', '').replace('\n', '').replace('\r', '')
                    print('{},{},{}'.format(pricetoday, location, name))
                except (AttributeError, TypeError, KeyError):
                    # Skip entries that lack one of the expected tags
                    pass


c = Guiyang()
c.get_data()
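The parsing step can be tried offline before touching the live site. The sketch below is only an illustration: SAMPLE_HTML is hypothetical markup that mirrors the class names the scraper looks for (house-list-wrap, title_des, baseinfo, unit), parse_listings is a made-up helper name, and the standard-library html.parser is swapped in for lxml so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mirrors the class names used by the scraper.
SAMPLE_HTML = """
<ul class="house-list-wrap">
  <li>
    <a href="http://example.com/listing/1"><span class="title_des">Office A</span></a>
    <p class="baseinfo">District X</p>
    <p class="unit"> 2.5 yuan </p>
  </li>
  <li><a href="http://example.com/listing/2">incomplete entry</a></li>
</ul>
"""


def parse_listings(html):
    """Return one dict per <li> that contains every expected field."""
    wrap = BeautifulSoup(html, "html.parser").find("ul", class_="house-list-wrap")
    if wrap is None:
        # No listing container: the page layout differs or the request was blocked.
        return []
    rows = []
    for item in wrap.find_all("li"):
        link = item.find("a")
        name = item.find("span", attrs={"class": "title_des"})
        location = item.find("p", class_="baseinfo")
        price = item.find("p", class_="unit")
        if any(tag is None for tag in (link, name, location, price)):
            continue  # skip entries missing a field instead of raising
        rows.append({
            "link": link["href"],
            "name": name.get_text(),
            "location": location.get_text().replace("\n", ""),
            # Same whitespace cleanup the scraper applies to the unit price.
            "price": price.get_text().replace(" ", "").replace("\n", "").replace("\r", ""),
        })
    return rows


for row in parse_listings(SAMPLE_HTML):
    print("{},{},{}".format(row["price"], row["location"], row["name"]))
```

Checking each tag for None before reading it makes it visible which entries were skipped and why, which a bare try/except tends to hide.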