pyscrapper.assembly.managers

API

class pyscrapper.assembly.managers.BaseScrapeManager(url_loader: pyscrapper.assembly.urlloaders.UrlLoader)

Bases: pyscrapper.assembly.observers.Observable

A scrape manager that takes a UrlLoader instance. It is responsible for loading all the URLs and scraping them according to the given configuration.

scrape(url, config, **kwargs)

Takes a URL and a configuration, and returns a unique id that identifies the current scrape request. The response to the request is pushed to the callback methods together with that id.

Returns: A unique id, generated when the scrape request is created.

shutdown()

Shuts down the current manager
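
The request-id flow described for scrape can be illustrated with a minimal, self-contained sketch. It does not import pyscrapper: SketchManager, its on_result callback, and the use of uuid4 to generate the id are all assumptions made for illustration, not the library's actual implementation.

```python
import uuid


class SketchManager:
    """Hypothetical stand-in for a scrape manager (not pyscrapper's class)."""

    def __init__(self, on_result):
        # on_result is an assumed callback: on_result(request_id, url, data)
        self._on_result = on_result

    def scrape(self, url, config, **kwargs):
        # Generate the id first so the caller can correlate the result later.
        request_id = str(uuid.uuid4())
        # A real manager would load the URL through its loader; here the
        # result is faked synchronously so the sketch runs offline.
        data = {"url": url, "fields": sorted(config)}
        self._on_result(request_id, url, data)
        return request_id


results = {}
manager = SketchManager(lambda rid, url, data: results.update({rid: data}))
request_id = manager.scrape("http://example.com", {"title": "h1"})
```

In this sketch the callback fires before scrape returns. With an asynchronous loader it would typically fire later, which is why the returned id is needed to match results to requests.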

class pyscrapper.assembly.managers.StandardScrapeManager(url_loader: pyscrapper.assembly.urlloaders.UrlLoader)

Bases: pyscrapper.assembly.managers.BaseScrapeManager, pyscrapper.assembly.observers.Observer

add_observer(observer: pyscrapper.assembly.observers.Observer)

Adds an observer to the list of observers.
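
The observer methods on this class follow the usual observer pattern, which can be sketched as follows. SketchObservable, SketchObserver, and the notify helper are hypothetical names chosen for this example; only the callback names on_url_loaded and on_parse_completed and their signatures come from this reference.

```python
class SketchObserver:
    """Hypothetical observer that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def on_url_loaded(self, url, response, **kwargs):
        self.events.append(("loaded", url))

    def on_parse_completed(self, url, obj, **kwargs):
        self.events.append(("parsed", url))


class SketchObservable:
    """Hypothetical observable that keeps a list of observers."""

    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self, method_name, *args, **kwargs):
        # Assumed helper: call the named callback on every registered observer.
        for observer in self._observers:
            getattr(observer, method_name)(*args, **kwargs)


observable = SketchObservable()
observer = SketchObserver()
observable.add_observer(observer)
observable.notify("on_url_loaded", "http://example.com", "<html></html>")
observable.notify("on_parse_completed", "http://example.com", {"title": "Example"})
```

Each registered observer receives the events in the order in which they are raised.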

on_parse_completed(url, obj, **kwargs)

Called when parsing of the response HTML, according to the given configuration, has completed.

on_url_loaded(url, response, **kwargs)

Called when the HTTP response for a URL is received.

Parameters:
  • url – The URL that is being loaded
  • response – The HTML response to the HTTP request

scrape(url, config, **kwargs)

Takes a URL and a configuration, and returns a unique id that identifies the current scrape request. The response to the request is pushed to the callback methods together with that id.

Returns: A unique id, generated when the scrape request is created.

shutdown(wait=True)

Shuts down the current manager
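
This reference does not describe the wait parameter. If it follows the convention of the standard library's concurrent.futures.Executor.shutdown(wait=True), which is an assumption, then wait=True would block until pending work has finished. The sketch below demonstrates only that standard-library behavior; it does not claim that StandardScrapeManager is built on a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

finished = []


def slow_task(name):
    time.sleep(0.1)  # simulate a slow page load
    finished.append(name)


executor = ThreadPoolExecutor(max_workers=2)
executor.submit(slow_task, "a")
executor.submit(slow_task, "b")

# wait=True blocks here until both submitted tasks have completed.
executor.shutdown(wait=True)
completed = sorted(finished)
```

With wait=False the call would return immediately and the tasks would finish in the background.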