BeautifulSoup: a Python library for extracting data from HTML or XML files

Introduction to BeautifulSoup

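BeautifulSoup is a third-party Python library for extracting data from HTML or XML files. It works with the parser of your choice and provides idiomatic ways to navigate, search, and modify the parsed document.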
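The three operations named above (navigating, searching, modifying) can be sketched roughly as follows. The HTML string and the variable names are made up for illustration, and the built-in 'html.parser' is assumed as the parser; other parsers such as lxml may be used if installed.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet used only for this illustration
html = '<div><p class="intro">Hello, <b>world</b>!</p><a href="http://example.com/">link</a></div>'
soup = BeautifulSoup(html, "html.parser")

# Navigate: walk down the tree by tag name
first_p = soup.div.p
bold_text = soup.p.b.string

# Search: collect the href of every <a> tag in the document
links = [a["href"] for a in soup.find_all("a")]

# Modify: change an attribute and replace the text inside <b>
soup.p["class"] = "greeting"
soup.b.string = "reader"

print(bold_text)
print(links)
print(soup.p)
```

Attribute access such as soup.div.p returns only the first matching tag, so find_all is the usual choice when every match is needed.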

BeautifulSoup usage example

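from bs4 import BeautifulSoup

# Sample document (an assumption: the well-known "three sisters" snippet
# from the Beautiful Soup documentation, chosen to match the output below)
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>; and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Parse with Python's built-in HTML parser and pretty-print the tree
soup = BeautifulSoup(html_doc, 'html.parser')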
print(soup.prettify())
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
soup.title
# <title>The Dormouse's story</title>

soup.title.name
# 'title'

soup.title.string
# "The Dormouse's story"
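Beyond reading a single tag, a common next step is to pull out every link in a document. The following is a minimal sketch under the same assumptions as above: it uses a shortened copy of the sample paragraph so that it runs on its own, and the variable names hrefs, third, and names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample paragraph so the snippet is self-contained
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every matching tag; get() reads an attribute safely
hrefs = [a.get("href") for a in soup.find_all("a")]

# find returns only the first match; here we look a tag up by its id
third = soup.find(id="link3")

# class_ (with trailing underscore) filters by CSS class
names = [a.get_text() for a in soup.find_all("a", class_="sister")]

print(hrefs)
print(third.get_text())
print(names)
```

Using get("href") rather than a["href"] avoids a KeyError on anchors that have no href attribute.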

BeautifulSoup GitHub statistics

MIT license

GitHub: 75 stars (BeautifulSoup 4.3.0)

BeautifulSoup installation command

pip install beautifulsoup4
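Note that the package is installed as beautifulsoup4 but imported as bs4. A quick way to check that the installation worked is to import it and print the version; this is only a sanity check, and the version printed depends on what pip installed.

```python
import bs4

# The distribution is named "beautifulsoup4", but the import name is "bs4"
installed_version = bs4.__version__
print(installed_version)
```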

BeautifulSoup Python version requirements

BeautifulSoup 4.3.0: Python 3.6+

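Disclaimer: this content was compiled from online sources and is for reference only. Its accuracy is not guaranteed, and it should not be used as the basis for any decision. All figures above are current as of the date this post was written.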
