urllib

URL error handling: the main causes of a URLError are:

No network connection
The connection to the server failed
The specified server cannot be found
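As a minimal sketch of how these failures surface in code, the snippet below wraps urlopen in a try/except and reports the reason attribute of the URLError. The helper name safe_fetch and the 5-second timeout are illustrative assumptions, not something the notes above prescribe.

```python
import urllib.error
import urllib.request


def safe_fetch(url, timeout=5):
    """Return the response body as bytes, or None if the request fails.

    safe_fetch is a hypothetical helper name used only for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as err:
        # err.reason is usually an OSError (DNS failure, refused connection, ...)
        # or a short message string describing what went wrong.
        print(f"Request to {url} failed: {err.reason}")
        return None
```

Calling safe_fetch("http://www.baidu.com/") should return the page bytes when the network is reachable and None when any of the causes listed above applies.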

HTTPError:
It has three attributes:

code: the HTTP status code
reason: the reason for the error
headers: the HTTP response headers
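The three attributes can be read directly from the caught exception. The sketch below is one possible way to do it: summarize_http_error and fetch_status are hypothetical helper names, and the sample error at the bottom is constructed by hand so the attributes can be inspected without a network connection.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def summarize_http_error(err):
    """Collect the three HTTPError attributes into a plain dict."""
    return {
        "code": err.code,
        "reason": err.reason,
        "headers": dict(err.headers.items()),
    }


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print(summarize_http_error(err))
        return err.code
    except urllib.error.URLError as err:
        print(f"Could not reach the server: {err.reason}")
        return None


# Offline demonstration: a hand-built HTTPError with made-up sample values.
sample_headers = Message()
sample_headers["Content-Type"] = "text/html"
sample_error = urllib.error.HTTPError(
    "http://www.baidu.com/missing", 404, "Not Found", sample_headers, io.BytesIO()
)
print(summarize_http_error(sample_error))
```

Catching HTTPError before URLError matters because the more general clause would otherwise swallow HTTP errors too.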

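Here are some common HTTP status codes:

200 - Request succeeded
301 - The resource (web page, etc.) has been permanently moved to another URL
302 - The resource (web page, etc.) has been temporarily moved to another URL
401 - Unauthorized
403 - Forbidden
404 - The requested resource (web page, etc.) does not exist
408 - Request timeout
500 - Internal server error
503 - Service unavailable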
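Python's standard library already knows these codes through http.HTTPStatus, so they do not need to be memorized or hard-coded. The following sketch looks up the standard phrase for a code and groups it by its first digit; describe_status and the CATEGORIES table are names assumed for this example.

```python
from http import HTTPStatus

# First digit of the status code -> broad category.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a string such as '404 Not Found (client error)'.

    Raises ValueError if code is not a status known to HTTPStatus.
    """
    phrase = HTTPStatus(code).phrase
    category = CATEGORIES[code // 100]
    return f"{code} {phrase} ({category})"


for code in (200, 301, 302, 401, 403, 404, 408, 500, 503):
    print(describe_status(code))
```

The same lookup can be applied to the code attribute of an HTTPError when logging failures.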

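What is urllib

urllib is Python's built-in HTTP request library.
It includes the following modules:

urllib.request: the request module
urllib.parse: the URL parsing module
urllib.error: the exception module (URLError, HTTPError)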
urlopen
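To give a feel for how the modules fit together, here is a small sketch that builds a URL with urllib.parse and wraps it in a urllib.request.Request without sending anything. The query parameters wd and pn and the User-Agent value are made-up values for illustration only.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Encode a dict of (hypothetical) query parameters into a query string.
query = urlencode({"wd": "python urllib", "pn": 10})
url = "http://www.baidu.com/s?" + query

# Split the URL back into its components.
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)
print(parse_qs(parts.query))

# Build a request object; nothing is sent until it is passed to urlopen().
request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
print(request.get_method(), request.full_url)
```

Passing the request object to urllib.request.urlopen() would then perform the actual HTTP call.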

A simple custom opener

import urllib.request

# Build an HTTPHandler object, which handles HTTP requests
http_handler = urllib.request.HTTPHandler()

# To handle HTTPS requests, build an HTTPSHandler object instead
# http_handler = urllib.request.HTTPSHandler()

# Call urllib.request.build_opener() to create an opener object that uses this handler
opener = urllib.request.build_opener(http_handler)

# Build the Request object
request = urllib.request.Request("http://www.baidu.com/")

# Call the custom opener's open() method to send the request
response = opener.open(request)

# Read and decode the server's response body
print(response.read().decode())
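A custom opener becomes more useful once it carries its own settings. The sketch below is one possible variation, assuming you want a fixed User-Agent header and optional debug output: make_opener is a hypothetical helper name, and the debuglevel argument makes the handlers print the raw HTTP exchange to stdout.

```python
import urllib.request


def make_opener(user_agent, debug=False):
    """Build an opener that sends a fixed User-Agent and can log HTTP traffic.

    make_opener is a hypothetical helper name used only for this sketch.
    """
    level = 1 if debug else 0
    opener = urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=level),
        urllib.request.HTTPSHandler(debuglevel=level),
    )
    # addheaders is a list of (name, value) pairs sent with every request.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_opener("Mozilla/5.0", debug=True)
# opener.open("http://www.baidu.com/") would now send the request with that header.
```

If every later urlopen() call should use the same settings, the opener can be registered globally with urllib.request.install_opener(opener).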

Proxies:

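How it works: we send the request to the proxy server, which forwards it to the Web server and then passes the Web server's response back to us.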

import random
from urllib import error, request

# Sample proxy addresses; replace them with proxies that actually work.
# The key is the scheme of the URLs to be proxied, so it must match the target URL ("http" here).
proxy_list = [
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
]

# Pick a proxy at random
proxy = random.choice(proxy_list)

# Build a proxy handler object from the chosen proxy
proxy_handler = request.ProxyHandler(proxies=proxy)

# Create an opener object from proxy_handler
opener = request.build_opener(proxy_handler)

url = 'http://www.baidu.com/'

try:
    response = opener.open(url, timeout=5)
    print(response.status)
except error.HTTPError as err:
    print(err.reason)
except error.URLError as err:
    print(err.reason)
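Public proxies fail often, so picking one at random may not be enough. As a rough sketch of an alternative, the code below tries each proxy in turn and stops at the first one that answers. The helper names build_proxy_opener and fetch_via_first_working_proxy are assumptions made for this example.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Create an opener that routes requests through the given proxy mapping."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxy))


def fetch_via_first_working_proxy(url, proxy_list, timeout=5):
    """Try each proxy in order.

    Returns (proxy, status) for the first proxy that answers,
    or None if every proxy fails or the list is empty.
    """
    for proxy in proxy_list:
        opener = build_proxy_opener(proxy)
        try:
            with opener.open(url, timeout=timeout) as response:
                return proxy, response.status
        except OSError as err:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            print(f"Proxy {proxy} failed: {err}")
    return None
```

With the proxy_list from the example above, fetch_via_first_working_proxy("http://www.baidu.com/", proxy_list) would return the first proxy that responded together with the status code it produced.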