[Python] Using BeautifulSoup

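1. Navigating the tree: usage example

    html_doc = """<html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html_doc, 'html.parser')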

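1.1 Child nodes
A Tag may contain strings and other Tags; all of these are children of that Tag. Beautiful Soup provides many attributes for navigating and iterating over a tag's children.
Note: string nodes in Beautiful Soup do not support these attributes, because a string cannot have children.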
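1.2 Navigating by tag name
The simplest way to navigate the tree is to name the tag you want. To get the <head> tag, just write soup.head:

    soup.head
    # <head><title>The Dormouse's story</title></head>

    soup.title
    # <title>The Dormouse's story</title>

You can chain this trick to zoom in on one part of the tree. The following code gets the first <b> tag inside the <body> tag:

    soup.body.b
    # <b>The Dormouse's story</b>

Attribute access by name returns only the first tag with that name:

    soup.a
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

To get all the <a> tags, or anything more than the first tag with a given name, use one of the methods described in Searching the tree, such as find_all():

    soup.find_all('a')
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    #  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
    #  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]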
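The difference between attribute access and find_all() can be checked with a minimal sketch. The three-link snippet below is a hypothetical stand-in for the example document, and the variable names first_link and all_links are chosen for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical three-link snippet, modelled on the example document.
html = (
    '<p class="story">'
    '<a id="link1">Elsie</a>, '
    '<a id="link2">Lacie</a> and '
    '<a id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a               # attribute access: only the first <a>
all_links = soup.find_all("a")    # every <a>, in document order
missing = soup.table              # no <table> in the document

print(first_link["id"])                      # link1
print([a.get_text() for a in all_links])     # ['Elsie', 'Lacie', 'Tillie']
print(missing)                               # None
```

Attribute access returns None when no tag with that name exists, so a chained lookup such as soup.table.tr would raise AttributeError on a document without a table.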

1.3 .contents and .children
A tag's .contents attribute returns its children as a list:

    head_tag = soup.head
    head_tag
    # <head><title>The Dormouse's story</title></head>

    head_tag.contents
    # [<title>The Dormouse's story</title>]

    title_tag = head_tag.contents[0]
    title_tag
    # <title>The Dormouse's story</title>

    title_tag.contents
    # ["The Dormouse's story"]

The BeautifulSoup object itself has children. Here the <html> tag is the child of the BeautifulSoup object:

    len(soup.contents)
    # 1
    soup.contents[0].name
    # 'html'

A string has no .contents attribute, because a string cannot contain anything:

    text = title_tag.contents[0]
    text.contents
    # AttributeError: 'NavigableString' object has no attribute 'contents'

A tag's .children generator lets you loop over its children:

    for child in title_tag.children:
        print(child)
    # The Dormouse's story
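As a rough illustration of how the two attributes differ in use, the sketch below treats .contents as a list (length, indexing) and .children as something to iterate over. The one-line document is a simplified assumption, not the full example document.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the <head> of the example document.
soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)
head_tag = soup.head

children_list = head_tag.contents    # a list: supports len() and indexing
child_names = [child.name for child in head_tag.children]  # iterate lazily

print(len(children_list))       # 1
print(children_list[0].name)    # title
print(child_names)              # ['title']
```

Use .contents when you need an index or a count, and .children when a single pass over the children is enough.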

1.4 .descendants
The .contents and .children attributes cover only a tag's direct children. For example, the <head> tag has a single direct child, the <title> tag. But the <title> tag has a child of its own: the string "The Dormouse's story". That string is therefore also a descendant of the <head> tag. The .descendants attribute iterates recursively over all of a tag's descendants:

    for child in head_tag.descendants:
        print(child)
    # <title>The Dormouse's story</title>
    # The Dormouse's story

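The gap between direct children and all descendants is easy to see by counting both. This is a minimal sketch on a simplified one-line document; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the <head> of the example document.
soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>", "html.parser"
)
head_tag = soup.head

direct_children = head_tag.contents              # only the <title> tag
all_descendants = list(head_tag.descendants)     # <title> tag plus its string

print(len(direct_children))     # 1
print(len(all_descendants))     # 2
print(all_descendants[1])       # The Dormouse's story
```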
1.5 .string
If a tag has exactly one child and that child is a NavigableString, .string returns that child:

    title_tag.string
    # "The Dormouse's story"
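What .string does when a tag has more than one child is worth a quick check. The sketch below uses a hypothetical two-part document: a <head> with a single <title>, and a <p> that mixes text with a <b> tag.

```python
from bs4 import BeautifulSoup

# Hypothetical document: one single-child chain and one mixed-content tag.
soup = BeautifulSoup(
    "<head><title>The Dormouse's story</title></head>"
    "<p>Hello <b>big</b> world</p>",
    "html.parser",
)

title_text = soup.title.string   # one string child: returned directly
head_text = soup.head.string     # one child tag: its .string is returned
p_text = soup.p.string           # three children: ambiguous, so None
p_full = soup.p.get_text()       # all text concatenated instead

print(title_text)    # The Dormouse's story
print(head_text)     # The Dormouse's story
print(p_text)        # None
print(p_full)        # Hello big world
```

When a tag may hold several strings, get_text() or the .strings generator is usually the safer choice.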

1.6 .strings and stripped_strings
If a tag contains more than one string, use the .strings generator to loop over them. The exact whitespace-only strings may differ slightly depending on the parser:

    for string in soup.strings:
        print(repr(string))
    # "The Dormouse's story"
    # '\n\n'
    # "The Dormouse's story"
    # '\n\n'
    # 'Once upon a time there were three little sisters; and their names were\n'
    # 'Elsie'
    # ',\n'
    # 'Lacie'
    # ' and\n'
    # 'Tillie'
    # ';\nand they lived at the bottom of a well.'
    # '\n\n'
    # '...'
    # '\n'
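The .strings output contains a lot of whitespace. The .stripped_strings generator strips leading and trailing whitespace from each string and skips strings that are whitespace only. A minimal sketch on a hypothetical, deliberately messy snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with extra whitespace around the text.
html = "<p>\n  Once upon a time  \n<a>Elsie</a>,\n<a>Lacie</a>\n</p>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)               # every string, whitespace included
clean = list(soup.stripped_strings)    # stripped, blank strings dropped

print(len(raw))    # 5
print(clean)       # ['Once upon a time', 'Elsie', ',', 'Lacie']
```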

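2. Parent nodes
2.1 .parent
Use the .parent attribute to get an element's parent. In the example "Alice" document, the <head> tag is the parent of the <title> tag:

    title_tag = soup.title
    title_tag
    # <title>The Dormouse's story</title>

    title_tag.parent
    # <head><title>The Dormouse's story</title></head>
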
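2.2 .parents
An element's .parents attribute iterates over all of its ancestors. The example below uses .parents to walk from an <a> tag up to the root of the document. Current Beautiful Soup 4 releases stop at the BeautifulSoup object, shown as [document], and never yield None, so no None check is needed:

    link = soup.a
    link
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

    for parent in link.parents:
        print(parent.name)
    # p
    # body
    # html
    # [document]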
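The ancestor chain can also be collected into a list, which makes it easy to check whether an element sits inside a given tag. This is a minimal sketch on a hypothetical one-link document; ancestor_names is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document with a single nested link.
soup = BeautifulSoup(
    "<html><body><p><a>Elsie</a></p></body></html>", "html.parser"
)
link = soup.a

# Walk upward from the link to the BeautifulSoup object itself.
ancestor_names = [parent.name for parent in link.parents]

print(ancestor_names)              # ['p', 'body', 'html', '[document]']
print("body" in ancestor_names)    # True
print(soup.parent)                 # None: the root has no parent
```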

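3. Sibling nodes
3.1 .next_sibling and .previous_sibling
Use the .next_sibling and .previous_sibling attributes to move between elements on the same level of the tree:

    # Assumed setup: a small document in which <b> and <c> are siblings
    sibling_soup = BeautifulSoup("<a><b>text1</b><c>text2</c></a>", 'html.parser')

    sibling_soup.b.next_sibling
    # <c>text2</c>

    sibling_soup.c.previous_sibling
    # <b>text1</b>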
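In real documents the next sibling of a tag is often a string rather than another tag, because the text and line breaks between tags are nodes too. The sketch below shows this on a hypothetical two-link snippet and uses find_next_sibling() to skip ahead to the next <a> tag.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two links separated by a comma and a newline.
soup = BeautifulSoup(
    '<p><a id="link1">Elsie</a>,\n<a id="link2">Lacie</a></p>',
    "html.parser",
)
first = soup.a

text_between = first.next_sibling           # the string between the links
next_link = first.find_next_sibling("a")    # the next <a> tag

print(repr(text_between))    # ',\n'
print(next_link["id"])       # link2
```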

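3.2 .next_siblings and .previous_siblings
Use the .next_siblings and .previous_siblings attributes to iterate over an element's siblings:

    for sibling in soup.a.next_siblings:
        print(repr(sibling))
    # ',\n'
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
    # ' and\n'
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
    # ';\nand they lived at the bottom of a well.'

    for sibling in soup.find(id="link3").previous_siblings:
        print(repr(sibling))
    # ' and\n'
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
    # ',\n'
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
    # 'Once upon a time there were three little sisters; and their names were\n'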
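Because the sibling generators yield strings as well as tags, a common follow-up is to keep only the tags. One possible approach, sketched here on a hypothetical three-link snippet, is to filter with isinstance against the Tag class from bs4.element.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical three-link snippet, modelled on the example document.
html = (
    '<p><a id="link1">Elsie</a>,\n'
    '<a id="link2">Lacie</a> and\n'
    '<a id="link3">Tillie</a>;</p>'
)
soup = BeautifulSoup(html, "html.parser")

# Keep only the siblings that are tags; drop the strings in between.
tag_siblings = [s for s in soup.a.next_siblings if isinstance(s, Tag)]
sibling_ids = [tag["id"] for tag in tag_siblings]

print(sibling_ids)    # ['link2', 'link3']
```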
