pyscrapper.assembly.managers
API
-
class pyscrapper.assembly.managers.BaseScrapeManager(url_loader: pyscrapper.assembly.urlloaders.UrlLoader)
Bases: pyscrapper.assembly.observers.Observable
A scrape manager that takes a UrlLoader instance. It is responsible for loading all the URLs and scraping them according to the given configuration.
-
scrape(url, config, **kwargs)
Takes a URL and a configuration, and returns a unique id that identifies the current scrape request. The response to the request is pushed to the callback methods together with the unique id of the request that produced it.
Returns: A unique id, generated when the scrape request is created.
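The id-and-callback contract described above can be illustrated with a minimal, self-contained sketch. It does not use pyscrapper itself: SketchManager and SketchObserver are hypothetical stand-ins, the work is done synchronously for simplicity, and passing the id as a request_id keyword argument is an assumption, since this page does not say how the id reaches the callbacks.

```python
import uuid


class SketchObserver:
    """Hypothetical observer that stores results keyed by request id."""

    def __init__(self):
        self.results = {}

    def on_parse_completed(self, url, obj, **kwargs):
        # Assumption: the id arrives as a 'request_id' keyword argument.
        self.results[kwargs["request_id"]] = (url, obj)


class SketchManager:
    """Stand-in manager; not pyscrapper's implementation."""

    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def scrape(self, url, config, **kwargs):
        # Generate the unique id when the request is created.
        request_id = uuid.uuid4().hex
        # A real manager would load and parse the page; this fakes the result.
        parsed = {key: f"<{selector} from {url}>" for key, selector in config.items()}
        # Push the result to every observer, tagged with the request id.
        for observer in self._observers:
            observer.on_parse_completed(url, parsed, request_id=request_id)
        return request_id


manager = SketchManager()
observer = SketchObserver()
manager.add_observer(observer)
request_id = manager.scrape("http://example.com", {"title": "h1"})
```

The caller keeps the returned id and uses it to match results delivered through the callback to the request that produced them.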
-
shutdown()
Shuts down the current manager.
-
-
class pyscrapper.assembly.managers.StandardScrapeManager(url_loader: pyscrapper.assembly.urlloaders.UrlLoader)
Bases: pyscrapper.assembly.managers.BaseScrapeManager, pyscrapper.assembly.observers.Observer
-
add_observer(observer: pyscrapper.assembly.observers.Observer)
Adds an observer to the list of observers.
-
on_parse_completed(url, obj, **kwargs)
Called when parsing of the response HTML, according to the given configuration, is completed.
-
on_url_loaded(url, response, **kwargs)
Called when the URL's HTTP response is received.
Parameters:
- url – The URL that is being loaded
- response – The HTML response of the HTTP request
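Because the manager is both an Observer and an Observable, one plausible reading is that it receives the loaded page in on_url_loaded, parses it, and relays the result to its own observers through on_parse_completed. The sketch below illustrates that relay under stated assumptions: RelayManager, TitleParser and CollectingObserver are hypothetical names, the parsing step only extracts the page title with the standard library's html.parser, and none of this is taken from pyscrapper's source.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class RelayManager:
    """Hypothetical stand-in: observes page loads, notifies its own observers."""

    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def on_url_loaded(self, url, response, **kwargs):
        # Parse the HTML response, then forward the parsed object.
        parser = TitleParser()
        parser.feed(response)
        for observer in self._observers:
            observer.on_parse_completed(url, {"title": parser.title}, **kwargs)


class CollectingObserver:
    """Hypothetical observer that records every parsed result it receives."""

    def __init__(self):
        self.seen = []

    def on_parse_completed(self, url, obj, **kwargs):
        self.seen.append((url, obj))


relay = RelayManager()
collector = CollectingObserver()
relay.add_observer(collector)
relay.on_url_loaded(
    "http://example.com",
    "<html><head><title>Example</title></head></html>",
)
```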
-
scrape(url, config, **kwargs)
Takes a URL and a configuration, and returns a unique id that identifies the current scrape request. The response to the request is pushed to the callback methods together with the unique id of the request that produced it.
Returns: A unique id, generated when the scrape request is created.
-
shutdown(wait=True)
Shuts down the current manager.
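The wait flag is not described on this page. Its signature matches concurrent.futures.Executor.shutdown in the Python standard library, where wait=True blocks until all pending work has finished. Whether StandardScrapeManager is built on an executor is an assumption, so the following is only an analogy using the standard library, with fake_load as a hypothetical stand-in for loading a page.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_load(url):
    """Hypothetical stand-in for loading a page; sleeps briefly."""
    time.sleep(0.05)
    return f"<html>{url}</html>"


executor = ThreadPoolExecutor(max_workers=2)
futures = [
    executor.submit(fake_load, url)
    for url in ("http://example.com/a", "http://example.com/b")
]

# With wait=True the call returns only after every submitted task has finished.
executor.shutdown(wait=True)
all_done = all(future.done() for future in futures)
```

With wait=False the call would return immediately while already-submitted tasks continue to run in the background.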
-