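In this post we scrape data that a page loads dynamically via Ajax and run a simple analysis on it. The approach is to locate the JSON response that carries the data, save the data to a local file, and then analyze it.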
Contents
- I. Requirements Analysis
- II. Steps
- 1. Finding the data
- 2. Scraping the data
- 3. Analyzing the data
- Summary
I. Requirements Analysis
Target site: the NBA China official website, https://china.nba.com/statistics/
Fields to scrape:
[Figure: the fields to scrape]
Libraries used: requests, json, csv, pandas, numpy, matplotlib
requests and json fetch and parse the data
csv saves it to a local file
pandas and numpy handle the analysis
matplotlib handles the visualization
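II. Steps
1. Finding the data
First we need to find the JSON response that holds the data. Press F12 to open the browser's developer tools.
[Figure: the developer tools panel]
Open the Network tab, select the XHR filter, and refresh the page. The JSON request then appears in the request list.
[Figure: the JSON request in the XHR list]
From here we can read both the request headers and the JSON response.
[Figure: request headers and JSON response]
This is the data we will scrape shortly.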
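Before writing the extraction code, it can help to print the shape of the decoded JSON so the key paths are known in advance. The following is only a minimal sketch: `describe` is a hypothetical helper, and the sample dict merely imitates the nesting used later in this post (`payload`, then `players`), not the real response.

```python
def describe(node, depth=0, max_depth=3):
    """Return indented lines summarizing the keys and types in decoded JSON."""
    lines = []
    pad = '  ' * depth
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth + 1 < max_depth:
                lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only the first element is inspected; list items usually share a shape
        lines.append(f"{pad}[{len(node)} items], first item:")
        if depth + 1 < max_depth:
            lines.extend(describe(node[0], depth + 1, max_depth))
    return lines


# Made-up sample that imitates the nesting of the real response
sample = {
    'payload': {
        'players': [
            {'playerProfile': {'code': 'example_player'},
             'statAverage': {'pointsPg': 1.0}}
        ]
    }
}
structure = describe(sample, max_depth=4)
print('\n'.join(structure))
```

Running the same helper on the real decoded response should show where the player list sits, which is all the extraction function needs.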
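2. Scraping the data
Here I chose to scrape the data for this season's top 50 players. Looking through the JSON response, we can see:
[Figure: the player entries in the JSON response]
This is where the current season's data is stored.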
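If the response turns out to hold more than 50 entries, one way to keep only the top 50 is to sort by a per-game stat and slice. This is a sketch under assumptions: `top_players` is a hypothetical helper, and it assumes each entry carries a `statAverage` dict with a `pointsPg` key, as the extraction code in this post does.

```python
def top_players(players, n=50, stat='pointsPg'):
    """Return the n entries with the highest value of the given per-game stat."""
    ranked = sorted(players, key=lambda p: p['statAverage'][stat], reverse=True)
    return ranked[:n]


# Made-up sample entries, only to show the call
sample_players = [
    {'playerProfile': {'code': 'a'}, 'statAverage': {'pointsPg': 10.5}},
    {'playerProfile': {'code': 'b'}, 'statAverage': {'pointsPg': 30.1}},
    {'playerProfile': {'code': 'c'}, 'statAverage': {'pointsPg': 22.0}},
]
best = top_players(sample_players, n=2)
best_codes = [p['playerProfile']['code'] for p in best]
print(best_codes)
```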
Import the libraries:
import requests
import json
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
First, define a function that fetches the JSON:
def getJson(url):
    # Send a browser-like User-Agent so the request looks like a normal page visit
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.53'
    }
    response = requests.get(url, headers=headers)
    # Decode the response body into Python dicts and lists
    json_data = json.loads(response.text)
    return json_data
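The function above assumes the request always succeeds. A slightly more defensive variant might add a timeout and fail loudly on HTTP errors. This is a sketch, not the author's code: `fetch_json` and its `session` parameter are hypothetical names, and the shortened User-Agent string is a placeholder.

```python
def fetch_json(url, session=None, timeout=10):
    """Fetch a URL and decode its JSON body, raising on HTTP errors."""
    if session is None:
        # Imported lazily so a stub session can be passed in for testing
        import requests
        session = requests
    headers = {'user-agent': 'Mozilla/5.0'}  # placeholder User-Agent
    response = session.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises an exception on 4xx/5xx responses
    return response.json()       # decodes the body as JSON
```

Called as `fetch_json(url)`, it uses `requests.get` directly. Passing a `requests.Session()` as `session` reuses one connection across several calls.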
Next, define a function that extracts the data:
def getData(json_data):
    playerList = []
    for item in json_data['payload']['players']:
        player_dataDict = {}
        # Player name (the profile code)
        name = item['playerProfile']['code']
        # Games played
        games = item['statAverage']['games']
        # Games started
        gamesStarted = item['statAverage']['gamesStarted']
        # Minutes per game
        mins = item['statAverage']['minsPg']
        # Three-point percentage
        tpm = item['statAverage']['tppct']
        # Free-throw percentage
        ftm = item['statAverage']['ftpct']
        # Offensive rebounds per game
        offRebs = item['statAverage']['offRebsPg']
        # Defensive rebounds per game
        defRebs = item['statAverage']['defRebsPg']
        # Total rebounds per game
        rebs = item['statAverage']['rebsPg']
        # Assists per game
        assists = item['statAverage']['assistsPg']
        # Steals per game
        steals = item['statAverage']['stealsPg']
        # Blocks per game
        blocks = item['statAverage']['blocksPg']
        # Turnovers per game
        turnovers = item['statAverage']['turnoversPg']
        # Fouls per game
        fouls = item['statAverage']['foulsPg']
        # Points per game
        points = item['statAverage']['pointsPg']
        player_dataDict['球员'] = name
        player_dataDict['场次'] = games
        player_dataDict['先发'] = gamesStarted
        player_dataDict['出场时间'] = mins
        player_dataDict['三分命中率'] = tpm
        player_dataDict['罚球命中率'] = ftm
        # Note: these two columns hold offensive and defensive rebounds per game,
        # even though the column names read "offensive/defensive efficiency"
        player_dataDict['进攻效率'] = offRebs
        player_dataDict['防守效率'] = defRebs
        player_dataDict['篮板'] = rebs
        player_dataDict['助攻'] = assists
        player_dataDict['抢断'] = steals
        player_dataDict['盖帽'] = blocks
        player_dataDict['失误'] = turnovers
        player_dataDict['犯规'] = fouls
        player_dataDict['得分'] = points
        print(player_dataDict)
        playerList.append(player_dataDict)
    return playerList
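Since every field follows the same pattern (a section, then a key), the extraction can also be driven by a lookup table. The sketch below is an alternative under assumptions, not the author's method: `FIELD_MAP` and `extract_player` are hypothetical names, only three of the fields are listed, and a missing key yields `None` instead of raising `KeyError`.

```python
# Column name -> (section, key) inside each player entry; names follow the post
FIELD_MAP = {
    '球员': ('playerProfile', 'code'),
    '场次': ('statAverage', 'games'),
    '得分': ('statAverage', 'pointsPg'),
}


def extract_player(item, field_map=FIELD_MAP):
    """Build one flat row; missing sections or keys become None."""
    return {
        column: item.get(section, {}).get(key)
        for column, (section, key) in field_map.items()
    }


# Made-up entry that deliberately lacks 'pointsPg'
sample_item = {
    'playerProfile': {'code': 'example_player'},
    'statAverage': {'games': 12},
}
row = extract_player(sample_item)
print(row)
```

Adding a column then means adding one line to the table rather than three lines to the function body.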
Then save the data:
def writeData(playerList):
    # Write the rows to a CSV file; newline='' avoids blank lines on Windows
    with open('player_data.csv', 'w', encoding='utf-8', newline='') as f:
        write = csv.DictWriter(f, fieldnames=['球员', '场次', '先发', '出场时间', '三分命中率', '罚球命中率',
                                              '进攻效率', '防守效率', '篮板', '助攻',
                                              '抢断', '盖帽', '失误', '犯规', '得分'])
        write.writeheader()
        for each in playerList:
            write.writerow(each)
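As a variation on the saving step, `csv.DictWriter.writerows` writes all rows in one call, and the header can be taken from the first row instead of being typed out. A minimal sketch, assuming every row has the same keys; `write_rows` is a hypothetical helper, and the in-memory buffer stands in for a real file.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write a list of dicts as CSV, taking the header from the first row."""
    if not rows:
        return 0
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# In-memory buffer used here in place of a file on disk
buffer = io.StringIO(newline='')
count = write_rows([{'球员': 'example_player', '得分': 1.0}], buffer)
print(buffer.getvalue())
```

To write to disk, open the file with `open('player_data.csv', 'w', encoding='utf-8-sig', newline='')` and pass it as `fileobj`. The `utf-8-sig` encoding adds a byte order mark, which helps Excel display the Chinese column names correctly.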
Complete code:
import requests
import json
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
url='https://china.nba.com/static/data/league/playerstats_All_All_All_0_All_false_2021_2_All_Team_points_All_perGame.json'
def getJson(url):
    # Send a browser-like User-Agent so the request looks like a normal page visit
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.53'
    }
    response = requests.get(url, headers=headers)
    # Decode the response body into Python dicts and lists
    json_data = json.loads(response.text)
    return json_data
def getData(json_data):
    playerList = []
    for item in json_data['payload']['players']:
        player_dataDict = {}
        # Player name (the profile code)
        name = item['playerProfile']['code']
        # Games played
        games = item['statAverage']['games']
        # Games started
        gamesStarted = item['statAverage']['gamesStarted']
        # Minutes per game
        mins = item['statAverage']['minsPg']
        # Three-point percentage
        tpm = item['statAverage']['tppct']
        # Free-throw percentage
        ftm = item['statAverage']['ftpct']
        # Offensive rebounds per game
        offRebs = item['statAverage']['offRebsPg']
        # Defensive rebounds per game
        defRebs = item['statAverage']['defRebsPg']
        # Total rebounds per game
        rebs = item['statAverage']['rebsPg']
        # Assists per game
        assists = item['statAverage']['assistsPg']
        # Steals per game
        steals = item['statAverage']['stealsPg']
        # Blocks per game
        blocks = item['statAverage']['blocksPg']
        # Turnovers per game
        turnovers = item['statAverage']['turnoversPg']
        # Fouls per game
        fouls = item['statAverage']['foulsPg']
        # Points per game
        points = item['statAverage']['pointsPg']
        player_dataDict['球员'] = name
        player_dataDict['场次'] = games
        player_dataDict['先发'] = gamesStarted
        player_dataDict['出场时间'] = mins
        player_dataDict['三分命中率'] = tpm
        player_dataDict['罚球命中率'] = ftm
        # Note: these two columns hold offensive and defensive rebounds per game,
        # even though the column names read "offensive/defensive efficiency"
        player_dataDict['进攻效率'] = offRebs
        player_dataDict['防守效率'] = defRebs
        player_dataDict['篮板'] = rebs
        player_dataDict['助攻'] = assists
        player_dataDict['抢断'] = steals
        player_dataDict['盖帽'] = blocks
        player_dataDict['失误'] = turnovers
        player_dataDict['犯规'] = fouls
        player_dataDict['得分'] = points
        print(player_dataDict)
        playerList.append(player_dataDict)
    return playerList
def writeData(playerList):
    # Write the rows to a CSV file; newline='' avoids blank lines on Windows
    with open('player_data.csv', 'w', encoding='utf-8', newline='') as f:
        write = csv.DictWriter(f, fieldnames=['球员', '场次', '先发', '出场时间', '三分命中率', '罚球命中率',
                                              '进攻效率', '防守效率', '篮板', '助攻',
                                              '抢断', '盖帽', '失误', '犯规', '得分'])
        write.writeheader()
        for each in playerList:
            write.writerow(each)
if __name__ == "__main__":
    # Fetch the JSON, extract the player rows, and save them to player_data.csv
    json_data = getJson(url)
    playerList = getData(json_data)
    writeData(playerList)
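3. Analyzing the data
With the data saved locally, we can analyze it however we like.
Here we pick a few fields and draw a radar chart for each player, which makes the players easy to compare.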
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('player_data.csv')
# Use a font with Chinese glyphs so the column names render in the chart
plt.rcParams['font.sans-serif'] = ['SimHei']
# Columns to plot: fouls, rebounds, assists, steals, blocks, turnovers
fields = ['犯规', '篮板', '助攻', '抢断', '盖帽', '失误']
for i in range(min(50, len(df))):
    x = df.loc[i]
    name = x['球员']
    data = np.array(x[fields], dtype=float)
    angles = np.linspace(0, 2*np.pi, len(fields), endpoint=False)
    # Repeat the first point at the end so the polygon closes
    data = np.concatenate((data, [data[0]]))
    angles = np.concatenate((angles, [angles[0]]))
    plt.polar(angles, data, 'bo-', linewidth=1)
    # Label each axis once, skipping the repeated closing point
    plt.thetagrids(angles[:-1]*180/np.pi, fields)
    plt.fill(angles, data, facecolor='b', alpha=0.25)
    plt.title(str(name))
    plt.show()
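One caveat when comparing players this way: the plotted stats sit on different scales (rebounds per game are typically several times larger than blocks per game), so the larger stats dominate the shape. A possible refinement, which is not part of the original code, is to divide each column by its maximum before plotting. The sketch below assumes non-negative values; `scale_by_max` is a hypothetical helper and the two rows are made-up numbers.

```python
def scale_by_max(rows, fields):
    """Divide each field by its maximum over all rows, giving values in [0, 1]."""
    maxima = {f: max(r[f] for r in rows) for f in fields}
    return [
        # Guard against a column whose maximum is 0 to avoid division by zero
        {f: (r[f] / maxima[f] if maxima[f] else 0.0) for f in fields}
        for r in rows
    ]


# Made-up rows, only to show the effect
rows = [
    {'篮板': 10.0, '盖帽': 0.5},
    {'篮板': 5.0, '盖帽': 2.0},
]
scaled = scale_by_max(rows, ['篮板', '盖帽'])
print(scaled)
```

With the DataFrame from the analysis code, the input rows could be obtained with `df.to_dict('records')`.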
[Figure: example radar charts for individual players]
I won't post every chart here.
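Summary
Ajax-loaded data is quite easy to scrape once you have found the JSON request behind it, and pandas and numpy are both well worth learning.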
One last thing:
"Lakers for the championship!"