BeautifulSoup: A Python Library for Extracting Data from HTML or XML Files
Introduction to BeautifulSoup
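BeautifulSoup is a third-party Python library for extracting data from HTML or XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document.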
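The three operations named above (navigating, searching, and modifying) can be sketched roughly as follows. This is a minimal illustration rather than a canonical recipe: it assumes the built-in 'html.parser' backend, and the sample markup and variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, invented for this illustration.
sample_html = '<div id="box"><p class="note">Hello, <b>world</b>!</p></div>'
soup = BeautifulSoup(sample_html, "html.parser")

# Navigate: walk down the tree by tag name.
bold = soup.div.p.b
bold_text = bold.string

# Search: find the first <p> whose class is "note".
note = soup.find("p", class_="note")
note_classes = note["class"]  # class is multi-valued, so this is a list

# Modify: rename the tag and replace its text in place.
bold.name = "strong"
bold.string = "BeautifulSoup"
result = str(soup)
print(result)
```

Attribute-style access such as soup.div.p.b only returns the first matching child at each step, so find() or find_all() is usually the safer choice when a document may contain several tags with the same name.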
BeautifulSoup Usage Example
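from bs4 import BeautifulSoup

# Sample document that produces the prettified output shown below.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body></html>
"""

# Parse with Python's built-in HTML parser and print an indented view of the tree.
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>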
soup.title
# <title>The Dormouse's story</title>
soup.title.name
# 'title'
soup.title.string
# "The Dormouse's story"
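Beyond reading a single tag, a common next step is collecting every matching tag. The sketch below shows find_all(), find() with an id, and get_text(). It assumes the 'html.parser' backend and uses a shortened copy of the story markup (the variable story_html is introduced here only so the block runs on its own).

```python
from bs4 import BeautifulSoup

# Shortened, illustrative copy of the story markup so this block is self-contained.
story_html = """
<p class="story">Once upon a time there were three little sisters:
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>.</p>
"""
soup = BeautifulSoup(story_html, "html.parser")

# find_all returns every matching tag; get() reads an attribute (None if absent).
links = [a.get("href") for a in soup.find_all("a")]

# find with id= returns the first tag carrying that id.
second = soup.find(id="link2")
second_name = second.get_text()

# class_ (with a trailing underscore) filters by CSS class.
names = [a.get_text() for a in soup.find_all("a", class_="sister")]
print(links, second_name, names)
```

Using a.get("href") instead of a["href"] avoids a KeyError on anchors that have no href attribute, which tends to matter on real-world pages.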
BeautifulSoup GitHub Statistics
MIT license
GitHub: 75 stars (BeautifulSoup 4.3.0)
BeautifulSoup Installation Command
pip install beautifulsoup4
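After installing, one quick way to confirm that the package is importable is a short smoke test like the one below. It is only a sketch: the variable names are illustrative, and it assumes the package was installed into the same Python environment that runs the script.

```python
import bs4
from bs4 import BeautifulSoup

# The installed package exposes its version string as bs4.__version__.
installed_version = bs4.__version__
print("beautifulsoup4 version:", installed_version)

# Parse a trivial document to check that the built-in parser backend works.
smoke = BeautifulSoup("<p>ok</p>", "html.parser")
smoke_text = smoke.p.string
print(smoke_text)
```

Note that the package is installed as beautifulsoup4 but imported as bs4.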
BeautifulSoup Python Version Requirements
BeautifulSoup 4.3.0: Python 3.6+
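Disclaimer: This content was compiled from online sources and is for reference only. Its accuracy is not guaranteed, and it should not be used as the basis for any decision. All figures above are current as of the date this post was written.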