Metadata-Version: 2.1
Name: PrSpiders
Version: 0.3.6
Summary: Builds on the requests module, adds XPath support to extend the API, and handles request failures and retries
Home-page: https://github.com/peng0928/prequests
Author: penr
Author-email: 1944542244@qq.com
License: MIT
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt

# *[PrSpiders Thread-Pool Spider Framework](https://github.com/peng0928/PrSpiders)*

## *Install PrSpiders*

 - *`pip install PrSpiders`*


## *Get Started!*

**1. Demo**
   

    from PrSpider import PrSpiders


    class Spider(PrSpiders):
        start_urls = 'https://www.runoob.com'

        def parse(self, response):
            # print(response.text)
            print(response, response.code, response.url)
            # <Response Code=200 Len=323273> 200 https://www.runoob.com/


    if __name__ == '__main__':
        Spider()
       
**2. Overriding the entry point: start_requests**

> start_requests is the framework's entry point, and PrSpiders.Requests is the method for sending requests; its parameters are listed below.

    from PrSpider import PrSpiders  
      
      
    class Spider(PrSpiders):  
      
        def start_requests(self, **kwargs):  
            start_urls = 'https://www.runoob.com'  
            PrSpiders.Requests(url=start_urls, callback=self.parse)  
      
        def parse(self, response):  
            # print(response.text)  
            print(response, response.code, response.url)  
      
      
    if __name__ == '__main__':  
        Spider()


**3. Basic PrSpiders configuration**

> ThreadPoolExecutor is used under the hood.

    workers: number of worker threads
    retry: whether to retry failed requests (enabled by default)
    download_delay: delay between request rounds, in seconds
    download_num: number of requests per round of threads; the default is 5 requests per second

> Usage is as follows:

    from PrSpider import PrSpiders


    class Spider(PrSpiders):
        workers = 5
        retry = False
        download_delay = 3
        download_num = 10

        def start_requests(self, **kwargs):
            start_urls = 'https://www.runoob.com'
            PrSpiders.Requests(url=start_urls, callback=self.parse)

        def parse(self, response):
            # print(response.text)
            print(response, response.code, response.url)


    if __name__ == '__main__':
        Spider()

**4. Basic PrSpiders.Requests configuration**

> Basic parameters:
> url: request URL
> callback: callback function
> headers: request headers
> retry_time: number of retries after a failed request
> method: HTTP method (GET by default)
> meta: data passed through to the callback
> encoding: encoding (utf-8 by default)
> retry_interval: interval between retries
> timeout: request timeout (10 s by default)
> **kwargs: arguments inherited from requests (e.g. data, params, proxies)

    PrSpiders.Requests(url=start_urls, headers={}, method='post', encoding='gbk', callback=self.parse,
                       retry_time=10, retry_interval=0.5, meta={'hhh': 'ggg'})
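
For illustration, here is a minimal sketch of passing data to a callback via meta. It assumes the dict is exposed on the response object as response.meta; that attribute name is an assumption, so check the PrSpider source for the exact accessor.

    from PrSpider import PrSpiders


    class MetaSpider(PrSpiders):

        def start_requests(self, **kwargs):
            # Pass a page number through to the callback via meta.
            PrSpiders.Requests(url='https://www.runoob.com', callback=self.parse,
                               meta={'page': 1})

        def parse(self, response):
            # Assumption: the meta dict passed above is available as response.meta.
            page = response.meta.get('page')
            print(page, response.code)


    if __name__ == '__main__':
        MetaSpider()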

  

## *Api*

**GET Status Code**

    response.code

**GET Text**

    response.text

**GET Content**

    response.content

**GET Url**

    response.url

**GET History**

    response.history

**GET Headers**

    response.headers

**GET Text Length**

    response.len

**GET Lxml Xpath**

    response.xpath
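
Putting the attributes above together, a minimal callback might look like the following sketch (the class name and selector are placeholders):

    from PrSpider import PrSpiders


    class ApiDemo(PrSpiders):
        start_urls = 'https://www.runoob.com'

        def parse(self, response):
            print(response.code)     # HTTP status code
            print(response.len)      # length of the response text
            print(response.url)      # final URL after any redirects
            print(response.history)  # redirect history
            print(response.headers)  # response headers
            print(response.xpath('//title'))  # lxml-based xpath selector


    if __name__ == '__main__':
        ApiDemo()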

## *Xpath Api*

 1. text(): converts the xpath result to text
 2. date(): converts the xpath result to a date
 3. get(): extracts the first xpath result
 4. getall(): extracts all xpath results; each result also supports the text() and date() methods

    from PrSpider import PrSpiders


    class Spider(PrSpiders):
        def start_requests(self, **kwargs):
            start_urls = "https://www.runoob.com"
            PrSpiders.Requests(url=start_urls, callback=self.parse)

        def parse(self, response):
            label = response.xpath("//div[@class='navto-nav']")
            label_text = response.xpath("//div[@class='navto-nav']").text()
            label_get = response.xpath("//div[@class='navto-nav']").get()
            label_getall = response.xpath("//div[@class='navto-nav']").getall()
            print(label)
            print(label_text)
            print(label_get)
            print(label_getall)


    if __name__ == "__main__":
        Spider()

       
## *Please contact me if there are any bugs*


> email ->
> 1944542244@qq.com
