Scraping the Qiangzhi Technology (强智科技) Academic Affairs System with Python (Hunan)

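Some idle chatter up front: the new semester has started and I want to study hard, but finding an empty classroom for self-study is not easy. There are the residential colleges, but they are crowded, and a classroom with nobody in it would be much nicer. Searching floor by floor is a hassle. Even if I don't mind, a certain young lady (or my buddy) certainly does. So, well (shrugs).
Getting to the point, I came up with two approaches:
1. Capture the requests in Chrome to get the timetables of every teaching building on every campus, then write a small program that checks them and summarizes the results. The drawback is that once the academic affairs system updates, the saved data is useless (the timetables have changed, so how could you still use them, lol).
2. Again analyze each request, but then simulate logging into the system (not with pyautogui, but with a requests session that mimics a browser), send the specific requests that return the timetables, and aggregate and print the results. This always gives the latest information (if the system itself has errors, there's nothing I can do about it).
To give her (or my buddy) a better experience, I naturally went with the second approach.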
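I. Logging into the academic affairs system with an account and password (using Hunan University of Technology and Business as the example)
The login page you find through Baidu requires a CAPTCHA, but there is another login entry that doesn't (which is great).
1. Analyzing the login page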
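First press F12, open the Network tab and log in once yourself. The page jumps straight into the system.
Find the LoginToXk request. Its highlighted form field holds the encrypted data, which is what the server checks to authenticate us.
That data is the account and password after the browser has encrypted them, so we go back to the login page to see how the encryption is done.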

On the login page, right-click and choose Inspect.
Clicking the login button first runs a JS function, submitForm1(), which then sends the encrypted data to the LoginToXk request we found above.

Expand this code and look for that function. It is at the very bottom.
Skimming the code, the encryption happens in these two places:
encodeInp(xh);
encodeInp(pwd);
The browser's built-in search shows that this encryption function lives in a different JS file, so we follow the trail.

Go back to the login page's requests, find conwork.js, and see how it performs the encryption.

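Hmm, that's a bit of a hassle. What now...
I started reimplementing the encryption, then thought: the site already ships the encryption JS, so why not just use it directly? I'm using Python! I asked Baidu and found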

import execjs

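It calls a local JS file directly and hands back the return value. Nice! (PyExecJS needs a JavaScript runtime such as Node.js installed.) First we save this JS file locally.
Then we write a function that calls the JS to encrypt a string and returns the result. Note the order: the encrypted account comes first, then three % signs, then the encrypted password.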

import execjs


def get_js(msg):
    # Call the site's JS encryption (encodeInp in conwork.js) from Python and return the result
    with open('conwork.js', encoding='utf-8') as f:
        js = execjs.compile(f.read())
    return js.call('encodeInp', msg)


# Encrypted account + "%%%" + encrypted password (the author also appends "=")
encode = str(get_js(account)) + "%%%" + str(get_js(psw)) + "="
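If you would rather not depend on a JS runtime, it is worth checking whether encodeInp is simply Base64. In some Qiangzhi deployments conwork.js appears to do exactly that. This is an assumption, so compare the sketch's output with what execjs returns for the same input before relying on it. The names `encode_inp` and `build_encoded` are hypothetical.

```python
import base64


def encode_inp(text: str) -> str:
    # ASSUMPTION: encodeInp in conwork.js is standard Base64 over the input bytes.
    # For ASCII input (typical student IDs/passwords) UTF-8 and char codes agree.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_encoded(account: str, password: str, encoder=encode_inp) -> str:
    # Same layout as described above: encrypted account, three '%', encrypted password.
    return encoder(account) + "%%%" + encoder(password)


print(build_encoded("2019", "abc"))
```

The author's line above also appends a trailing "=". Check your captured LoginToXk payload to see whether your deployment needs it.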

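Encryption: done.
Next we simulate the LoginToXk request. For this we need cookies, and the same cookies are used for every request afterwards.
The server assigns them as soon as you open the login page, so we fetch the cookies first.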

# -*- coding: utf-8 -*-
import requests
import execjs

ses = requests.session()


def get_login_cookies():
    header = {
        "Content-Type": "text/html; charset=GBK",
        "Vary": "Accept-Encoding"
    }
    url = "http://jwgl.hnuc.edu.cn/jsxsd/"
    ses.get(url=url, headers=header, timeout=1000)  # ses now holds the cookies
    cookies = ses.cookies.get_dict()  # temporary session cookies
    # Turn the dict's repr into a "name=value;name=value" Cookie header string
    cookies = str(cookies).replace("{", '').replace("'", '').replace(":", '=').replace('}', '').replace(",", "; ")
    cookies = cookies.replace(" ", '')
    return cookies
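The replace chain in get_login_cookies works on the dict's repr, so it breaks if a cookie value ever contains ":" or ",". A join over the items builds the same kind of Cookie header more robustly. Here is a sketch, where `cookie_header` is a hypothetical helper. Since a requests session already stores its cookies and sends them on later requests, the explicit header is mostly a belt-and-braces measure.

```python
def cookie_header(cookies: dict) -> str:
    # Join name=value pairs with "; ", the separator used in an HTTP Cookie header.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


print(cookie_header({"JSESSIONID": "ABC123", "SERVERID": "x1"}))
```

Usage inside get_login_cookies would be `cookie_header(ses.cookies.get_dict())`.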

Then send the login request:

def login(cookies, jsmsg):
    header = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;"
                  "q=0.8,application/signed-exchange;v=b3",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cache-Control": "max-age=0",
        # No hard-coded Content-Length: requests computes it from the actual body
        "Content-Type": "application/x-www-form-urlencoded",  # form-encoded body
        "Cookie": cookies,
        "Host": "jwgl.hnuc.edu.cn",
        "Origin": "http://jwgl.hnuc.edu.cn",
        "Proxy-Connection": "keep-alive",
        "Referer": "http://jwgl.hnuc.edu.cn/jsxsd/",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/76.0.3809.132 Mobile Safari/537.36",
    }
    post_data = {'encoded': jsmsg}  # the encrypted account and password
    url = 'http://jwgl.hnuc.edu.cn/jsxsd/xk/LoginToXk'
    msg = ses.post(url, headers=header, data=post_data, timeout=1000).text  # this redirects into the system
    # print("cookie check: " + str(msg))
    return msg

With that, our cookies pass the system's login check.
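Why the header copied from the browser should not pin "Content-Length": "47": the form body's length depends on the account and password. The sketch below uses urllib.parse.urlencode, the same application/x-www-form-urlencoded encoding requests applies to a dict passed as data=, to show how the body size changes. `form_body` is a hypothetical helper.

```python
from urllib.parse import urlencode


def form_body(encoded: str) -> bytes:
    # '%' must itself be escaped in a form body, so "%%%" becomes "%25%25%25".
    return urlencode({"encoded": encoded}).encode("ascii")


for payload in ("a%%%b", "YWJj%%%YWJj"):
    body = form_body(payload)
    print(body, len(body))
```

A fixed length matches only one payload size. Leaving the header out lets requests send the correct value.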
II. Querying and filtering classroom timetables
Open the classroom timetable query. These two parameters must be selected.

Click Query. The system is pretty bad and responds slowly, lol.
Find the /kbxx_classroom_ifr request and look at its Preview.

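Yep, it's an HTML table.
Since it's a table, we save it straight to CSV and process it afterwards.
Why: first, the system responds too slowly, which is unfriendly to her (or my buddy); second, I only refresh the tables once a day, which keeps the data within an acceptable margin of error.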
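Fetching once and answering queries from the saved CSV raises the question of when the file counts as stale. One way to decide is sketched below. It assumes a one-day freshness window and uses the file's modification time. `needs_refresh` and `max_age` are hypothetical names. The author's class simply calls day_init on a schedule.

```python
import os
import time


def needs_refresh(path: str, max_age: float = 24 * 3600, now=None) -> bool:
    # A file that doesn't exist yet always needs fetching.
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    # Stale once the file is older than max_age seconds.
    return now - os.path.getmtime(path) > max_age


print(needs_refresh("0000100001.csv"))
```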
pandas is all we need for handling the table:

import pandas as pd
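pd.read_html does the heavy lifting in the code that follows. To see roughly what it extracts from the response, here is a dependency-free sketch built on the standard library's html.parser that collects each `<tr>` as a list of cell strings. It is a simplified illustration: it ignores rowspan/colspan and nested tables, and `TableRows`/`parse_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html: str) -> list:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

An empty cell comes back as "", which is what the empty-classroom check later looks for.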

Code to fetch the classroom timetable and save it as CSV:

from io import StringIO

import pandas as pd


def get_all_room(where, what):
    # Fetch the timetable for campus `where`, building `what` and save it as CSV
    week = get_now_week()  # current teaching week (defined in the School class below)
    url = "http://jwgl.hnuc.edu.cn/jsxsd/kbcx/kbxx_classroom_ifr"
    data = {
        "xnxqh": "2019-2020-1",  # academic term, fixed
        "skyx": "",
        "xqid": where,   # campus
        "jzwid": what,   # teaching building
        "classroomID": "",
        "jx0601id": "",
        "jx0601mc": "",
        "zc1": week,     # from week
        "zc2": week,     # to week
        "jc1": "",
        "jc2": "",
    }
    msg = ses.post(url, data=data).text
    # [0] = first table on the page; header=1 uses its second row as column names
    table = pd.read_html(StringIO(msg), header=1)[0]
    table.to_csv(f"{where}{what}.csv", mode='w', encoding='utf_8_sig', header=True, index=False)
    print(f"{where}{what}.csv saved successfully!\n")
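get_all_room asks for a single teaching week (zc1 = zc2 = current week). The author's get_now_week counts 7-day blocks since the first day of the semester, which is 2019-09-02 in the code. A pure version that takes both dates as arguments makes the arithmetic easy to check. `teaching_week` is a hypothetical name.

```python
from datetime import date


def teaching_week(term_start: date, today: date) -> int:
    # Days 0-6 after term_start are week 1, days 7-13 week 2, and so on.
    # Dates before term_start give 0 or a negative number.
    return (today - term_start).days // 7 + 1


print(teaching_week(date(2019, 9, 2), date.today()))
```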

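After that comes the simple empty-classroom check: go through the rooms one by one, record a room if its cell is empty, and skip it otherwise.
I'm short on time, so I won't go into more detail. I wrapped everything in a class you can call directly. Remember to put the encryption JS file (conwork.js) in the working directory, or use an absolute path.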
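That check can be written as a pure function over the CSV rows. This sketch assumes the layout that the class's get_msg_by_csv_pre indexes into. Column 0 holds the room name, and the cell for weekday `day` (1 = Monday) and class block `slot` sits at column 6*(day-1)+slot. That layout is inferred from the author's indexing, not from any system documentation. `free_rooms` is a hypothetical name.

```python
def free_rooms(rows, day: int, slot: int) -> list:
    # rows: data rows of the timetable CSV, header already removed
    col = 6 * (day - 1) + slot  # same column formula as get_msg_by_csv_pre
    free = []
    for row in rows:
        if not row:  # skip blank lines
            continue
        # ASSUMPTION: a missing or blank cell means no class is scheduled
        cell = row[col] if col < len(row) else ""
        if cell.strip() == "":
            free.append(row[0])
    return free


rows = [["A101", "Math", ""], ["A102", "", "Physics"], ["A103", "", ""]]
print(free_rooms(rows, 1, 1))
```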

# -*- coding: utf-8 -*-
# @author: ym
# @time: 2019/9/2
import csv
from datetime import datetime
from io import StringIO

import execjs
import pandas as pd
import requests


class School:
    """
    login_in(account, psw)       # log in to the academic affairs system; must be called first
    day_init()                   # refresh the classroom CSV files; run once a day (e.g. on a schedule)
    get_msg_by_csv(where, what)  # free classrooms for each two-period class block, one block per line (str)
    """

    def __init__(self):
        self.ses = requests.session()

    def get_js(self, msg):
        # Call the site's JS encryption from Python and return the encrypted result
        with open('conwork.js', encoding='utf-8') as f:
            js = execjs.compile(f.read())
        return js.call('encodeInp', msg)

    def get_login_cookies(self):
        header = {
            "Content-Type": "text/html; charset=GBK",
            "Vary": "Accept-Encoding"
        }
        url = "http://jwgl.hnuc.edu.cn/jsxsd/"
        self.ses.get(url=url, headers=header, timeout=1000)
        cookies = self.ses.cookies.get_dict()  # temporary session cookies
        cookies = str(cookies).replace("{", '').replace("'", '').replace(":", '=').replace('}', '').replace(",", "; ")
        cookies = cookies.replace(" ", '')
        return cookies

    def login(self, cookies, jsmsg):
        header = {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;"
                      "q=0.8,application/signed-exchange;v=b3",
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "max-age=0",
            # No hard-coded Content-Length: requests computes it from the actual body
            "Content-Type": "application/x-www-form-urlencoded",  # form-encoded body
            "Cookie": cookies,
            "Host": "jwgl.hnuc.edu.cn",
            "Origin": "http://jwgl.hnuc.edu.cn",
            "Proxy-Connection": "keep-alive",
            "Referer": "http://jwgl.hnuc.edu.cn/jsxsd/",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/76.0.3809.132 Mobile Safari/537.36",
        }
        post_data = {'encoded': jsmsg}  # the encrypted account and password
        url = 'http://jwgl.hnuc.edu.cn/jsxsd/xk/LoginToXk'
        msg = self.ses.post(url, headers=header, data=post_data, timeout=1000).text  # redirects into the system
        # print("cookie check: " + str(msg))
        return msg

    def login_in(self, account, psw):
        # Log in with account and password so the session cookies become valid
        jsmsg = str(self.get_js(account)) + "%%%" + str(self.get_js(psw)) + "="  # encrypted payload
        cookies = self.get_login_cookies()  # initialise the cookies
        self.login(cookies, jsmsg)          # pass the cookies (not the account) to login

    def get_now_week(self):
        start = datetime.strptime("2019-09-02 00:03:00", '%Y-%m-%d %H:%M:%S')  # start of week 1
        return (datetime.now() - start).days // 7 + 1

    def get_all_room(self, where, what):
        # Fetch the timetable for campus `where`, building `what` and save it as CSV
        week = self.get_now_week()
        url = "http://jwgl.hnuc.edu.cn/jsxsd/kbcx/kbxx_classroom_ifr"
        data = {
            "xnxqh": "2019-2020-1",  # academic term, fixed
            "skyx": "",
            "xqid": where,   # campus
            "jzwid": what,   # teaching building
            "classroomID": "",
            "jx0601id": "",
            "jx0601mc": "",
            "zc1": week,     # from week
            "zc2": week,     # to week
            "jc1": "",
            "jc2": "",
        }
        msg = self.ses.post(url, data=data).text
        table = pd.read_html(StringIO(msg), header=1)[0]  # first table on the page
        table.to_csv(f"{where}{what}.csv", mode='w', encoding='utf_8_sig', header=True, index=False)
        print(f"{where}{what}.csv saved successfully!\n")

    def get_msg_by_csv_pre(self, today, classs, where, what):
        # Rooms with no class on weekday `today` (1 = Monday) in class block `classs`
        filename = f"{where}{what}.csv"
        ans = ""
        # utf-8-sig strips the BOM that to_csv(encoding='utf_8_sig') writes
        with open(filename, encoding="utf-8-sig", newline="") as f:
            f_csv = csv.reader(f)
            next(f_csv, None)  # skip the header row
            for row in f_csv:  # one row per classroom
                if row[6 * (today - 1) + classs] in (None, ""):  # empty cell: no class
                    ans += row[0] + " "
        return ans + "\n"  # every room that is free in this block today

    def exchange(self, where, what):
        # Map the user's numeric choice to [campus ID, building ID]
        temp = []
        if where == 1:
            temp.append("00001")  # South campus
            if int(what) in range(1, 4):  # buildings 1-3 (range excludes 4)
                temp.append("0000" + str(what))
                return temp
            return None
        elif where == 2:
            temp.append("00002")  # North campus
            temp.append("332328065C2440CBAC97F4A714E8937F")
            return temp
        return None

    def get_msg_by_csv(self, where, what):
        temp = self.exchange(where, what)
        if temp is None:
            return None
        where, what = temp[0], temp[1]
        today = datetime.today().isoweekday()  # day of the week, 1 = Monday
        week = self.get_now_week()
        ans = f"Week {week} of the semester, weekday {today} "
        if where == "00001":
            ans += "(South campus, "
            if what == '00001':
                ans += "Teaching Building 1)"
            elif what == '00002':
                ans += "Teaching Building 2)"
            else:
                ans += "Teaching Building 3)"
        else:
            ans += "(North campus, teaching building)"
        ans += " free classrooms:\n"
        for i in range(1, 5):  # class blocks: periods 1-2, 3-4, 5-6, 7-8
            ans += f"Periods {i * 2 - 1}-{i * 2}: "
            ans += self.get_msg_by_csv_pre(today, i, where, what)
        return ans

    def day_init(self):
        # Refresh every building's CSV; run once a day
        buildings = [
            ("00001", "00001"),  # South campus, Teaching Building 1
            ("00001", "00002"),  # South campus, Teaching Building 2
            ("00001", "00003"),  # South campus, Teaching Building 3
            ("00002", "332328065C2440CBAC97F4A714E8937F"),  # North campus teaching building
        ]
        for where, what in buildings:
            try:
                self.get_all_room(where, what)
            except Exception:
                pass  # skip a building whose query fails and keep going
        return True


if __name__ == '__main__':
    a = School()
    a.login_in("your_student_id", "your_password")
    a.day_init()  # create the CSV files before the first query
    msg = a.get_msg_by_csv(1, 1)
    print(msg)
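exchange translates the user's numeric choice into the campus (xqid) and building (jzwid) IDs the query form expects. The same mapping can be written as a lookup table, so adding a building becomes a one-line change. The IDs are copied from the class. The table name `BUILDINGS` and the function `lookup` are hypothetical. Unlike exchange, this sketch returns a tuple and requires what=1 for the north campus instead of accepting any value.

```python
BUILDINGS = {
    (1, 1): ("00001", "00001"),  # South campus, Teaching Building 1
    (1, 2): ("00001", "00002"),  # South campus, Teaching Building 2
    (1, 3): ("00001", "00003"),  # South campus, Teaching Building 3
    (2, 1): ("00002", "332328065C2440CBAC97F4A714E8937F"),  # North campus teaching building
}


def lookup(where: int, what) -> tuple:
    # Returns (xqid, jzwid), or None for an unknown campus/building combination.
    return BUILDINGS.get((where, int(what)))


print(lookup(1, 2))
```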

Result: