Scrapy: a fast web crawling and scraping framework for Python

Introduction to Scrapy

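Scrapy is a third-party Python library: a fast framework for crawling websites and extracting structured data from the pages it fetches. It is widely used for data mining, monitoring, and automated testing.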

Scrapy usage example

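import scrapy


class QuotesSpider(scrapy.Spider):
    # Name used to identify this spider.
    name = "quotes"
    # The crawl starts from these URLs.
    start_urls = [
        "https://www.example.example/tag/humor/",
    ]

    def parse(self, response):
        # Select every <div> with class "quote" and yield one item per match.
        for quote in response.css("div.quote"):
            yield {
                "author": quote.xpath("span/small/text()").get(),
                "text": quote.css("span.text::text").get(),
            }

        # Follow the "next" pagination link, if the page has one,
        # and parse the next page with this same method.
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)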

Scrapy GitHub statistics

BSD-3-Clause license

GitHub: 49.4k stars

Scrapy installation command

pip install scrapy

Scrapy Python version requirement

Python 3.8+

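Disclaimer: this content was compiled from online sources and is for reference only. Its accuracy is not guaranteed, and it should not be used as the basis for any decision. All figures above are current as of the date this post was written.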
