grab
Subpackages
Submodules
Package Contents
Classes
Attributes
- class grab.Grab(transport: None | BaseTransport | type[BaseTransport] = None, **kwargs: Any)
- __slots__ = ['proxylist', 'config', 'transport', 'request_method', 'cookies', 'meta', '_doc']
- document_class: type[grab.document.Document]
- clonable_attributes = ['proxylist']
- process_transport_option(transport: None | BaseTransport | type[BaseTransport], default_transport: type[grab.base_transport.BaseTransport]) grab.base_transport.BaseTransport
- clone(**kwargs: Any) Grab
Create a clone of the Grab instance.
The cloned instance will have the same state: cookies, referrer, and response document data.
- Parameters
**kwargs – overrides settings of the cloned Grab instance
- dump_config() collections.abc.MutableMapping[str, Any]
Return a copy of the current config.
- load_config(config: grab.types.GrabConfig) None
Configure grab instance with external config object.
- setup(**kwargs: Any) None
Set up Grab instance configuration.
- prepare_request() grab.request.Request
Configure everything needed to make the real network request.
This method is called before the real request is performed via the transport extension.
- create_request_from_config(config: collections.abc.MutableMapping[str, Any]) grab.request.Request
- sync_cookie_manager_with_request_cookies(cookies: collections.abc.Mapping[str, Any], request_url: str) None
- log_request(req: grab.request.Request, extra: str = '') None
Send request details to logging system.
- find_redirect_url(doc: grab.document.Document) None | str
- request(url: None | str = None, **kwargs: Any) grab.document.Document
Perform network request.
You can specify Grab settings in **kwargs. Any keyword argument will be passed to self.config.
Returns: a Document object.
- submit(make_request: bool = True, **kwargs: Any) None | Document
Submit current form.
- Parameters
make_request – if False, the Grab instance is configured with the form's POST data but the request is not performed
For details, see the Document.submit() method.
- process_request_result(req: grab.request.Request) grab.document.Document
Process result of real request performed via transport extension.
- reset_temporary_options() None
- change_proxy(random: bool = True) None
Set random proxy from proxylist.
- classmethod common_headers() dict[str, str]
Build the headers that a typical browser sends.
- make_url_absolute(url: str, resolve_base: bool = False) str
Make url absolute using previous request url as base url.
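As a rough illustration of what resolving a URL against the previous request URL means, the sketch below uses the standard library's `urllib.parse.urljoin`. The helper name `absolutize` and the example URLs are hypothetical. This is not Grab's implementation, and it does not model the `resolve_base` option.

```python
from urllib.parse import urljoin


def absolutize(previous_url: str, url: str) -> str:
    """Resolve `url` against `previous_url` (hypothetical helper, not Grab's code)."""
    return urljoin(previous_url, url)


base = "https://example.com/catalog/page1.html"

# A relative path replaces the last path segment of the base URL
print(absolutize(base, "item?id=5"))
# A root-relative path keeps only the scheme and host of the base URL
print(absolutize(base, "/about"))
# An already absolute URL is returned unchanged
print(absolutize(base, "https://other.example/x"))
```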
- clear_cookies() None
Clear all remembered cookies.
- __getstate__() dict[str, Any]
- __setstate__(state: collections.abc.Mapping[str, Any]) None
- class grab.Document(body: None | bytes = None, *, document_type: None | str = 'html', head: None | bytes = None, headers: None | email.message.Message = None, encoding: None | str = None, code: None | int = None, url: None | str = None, cookies: None | CookieJar = None)
Network response.
- property status: None | int
- property json: Any
Return the response body deserialized from JSON into a Python object.
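The sketch below shows the general idea of turning a raw response body into Python objects with the standard `json` module. The `body` value and the UTF-8 encoding are assumptions made for the example; how Grab picks the encoding is not shown here.

```python
import json

# Hypothetical response body; a real one would come from a network response
body = b'{"status": "ok", "items": [1, 2, 3]}'

# Decode the bytes first, then parse the JSON text into Python objects
data = json.loads(body.decode("utf-8"))

print(data["status"])
print(len(data["items"]))
```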
- property pyquery: Any
Return pyquery handler.
- property body: None | bytes
- property tree: lxml.etree._Element
Return DOM tree of the document built with HTML DOM builder.
- property form: lxml.html.FormElement
Return the document's default form.
If no form was selected manually, the form with the largest number of input elements is selected.
The form value is just an lxml.html form element.
Example:
    g.request('some URL')
    # Choose form automatically
    print(g.form)
    # And now choose form manually
    g.choose_form(1)
    print(g.form)
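The automatic choice described above (the form with the most input elements) can be sketched with the standard library alone. The class `FormInputCounter`, the function `pick_default_form`, and the sample page are hypothetical; Grab itself works on lxml elements, so treat this only as an illustration of the selection rule.

```python
from html.parser import HTMLParser


class FormInputCounter(HTMLParser):
    """Count <input> elements inside each <form> (illustrative stand-in for lxml)."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = []  # one entry per form, in document order
        self._inside_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.counts.append(0)
            self._inside_form = True
        elif tag == "input" and self._inside_form:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._inside_form = False


def pick_default_form(html: str) -> int:
    """Return the zero-based index of the form with the most inputs."""
    parser = FormInputCounter()
    parser.feed(html)
    if not parser.counts:
        raise ValueError("no forms in document")
    # On a tie, max() keeps the earliest form
    return max(range(len(parser.counts)), key=parser.counts.__getitem__)


PAGE = """
<form id="search"><input name="q"></form>
<form id="signup"><input name="email"><input name="password"><input type="submit"></form>
"""
print(pick_default_form(PAGE))  # index of the "signup" form
```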
- __slots__ = ['document_type', 'code', 'head', '_bytes_body', 'headers', 'url', 'cookies', 'encoding',...
- __call__(query: str) selection.SelectorList[lxml.etree._Element]
- select(*args: Any, **kwargs: Any) selection.SelectorList[lxml.etree._Element]
- process_encoding(encoding: None | str = None) str
Process explicitly defined encoding or auto-detect it.
If the encoding is explicitly defined, ensure it is a valid encoding that Python can handle. If the encoding is not specified, auto-detect it.
Raises unicodec.InvalidEncodingName if explicitly set encoding is invalid.
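One way to check that an encoding name is usable by Python is `codecs.lookup`, as in the sketch below. The helper `normalize_encoding` is hypothetical: it raises `ValueError` instead of `unicodec.InvalidEncodingName`, and it replaces real auto-detection with a fixed default, so it only illustrates the validation step.

```python
import codecs


def normalize_encoding(encoding=None, default="utf-8"):
    """Validate an explicit encoding name, or fall back to a default.

    Hypothetical helper: real auto-detection would inspect the headers and
    body; this sketch simply returns `default` when no encoding is given.
    """
    if encoding is None:
        return default
    try:
        # codecs.lookup raises LookupError for names Python does not know
        return codecs.lookup(encoding).name
    except LookupError as ex:
        raise ValueError(f"invalid encoding name: {encoding!r}") from ex


print(normalize_encoding("UTF8"))
print(normalize_encoding())
```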
- save(path: str) None
Save response body to file.
- url_details() urllib.parse.SplitResult
Return result of urlsplit function applied to response url.
- query_param(key: str) str
Return value of parameter in query string.
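The two methods above build on URL splitting and query-string lookup. A minimal sketch with `urllib.parse` follows; the URL is made up, and how Grab treats repeated or missing keys is not covered here.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://example.com/search?q=grab&page=2"

details = urlsplit(url)           # SplitResult(scheme, netloc, path, query, fragment)
params = parse_qs(details.query)  # maps each key to a list of values

print(details.netloc)
print(details.path)
print(params["q"][0])
```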
- browse() None
Save response in temporary file and open it in GUI browser.
- __getstate__() collections.abc.Mapping[str, Any]
Reset cached lxml objects which could not be pickled.
- __setstate__(state: collections.abc.Mapping[str, Any]) None
- text_search(anchor: str | bytes) bool
Search for a substring in the response body.
- Parameters
anchor – string to search for
byte – if False, anchor should be a unicode string and the search is performed in response.unicode_body(); otherwise anchor should be a byte string and the search is performed in response.body
Return True if the substring is found, otherwise False.
- text_assert(anchor: str | bytes) None
If anchor is not found then raise DataNotFound exception.
- text_assert_any(anchors: list[str | bytes]) None
If no anchors were found then raise DataNotFound exception.
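The sketch below illustrates the search-and-assert pattern of the three methods above on a plain bytes body. It assumes that a `str` anchor is matched against the decoded text and a `bytes` anchor against the raw body, which is inferred from the parameter description. The `DataNotFound` class is a local stand-in so the example runs without Grab installed.

```python
class DataNotFound(Exception):
    """Local stand-in for Grab's DataNotFound exception."""


def text_search(body: bytes, anchor, encoding: str = "utf-8") -> bool:
    """Search `anchor` in the decoded text if it is str, else in the raw bytes."""
    if isinstance(anchor, str):
        return anchor in body.decode(encoding)
    return anchor in body


def text_assert_any(body: bytes, anchors) -> None:
    """Raise DataNotFound unless at least one anchor occurs in the body."""
    if not any(text_search(body, anchor) for anchor in anchors):
        raise DataNotFound(f"none of the anchors found: {anchors!r}")


body = "<h1>café</h1>".encode("utf-8")
print(text_search(body, "café"))
print(text_search(body, b"<h1>"))
print(text_search(body, "missing"))
```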
- rex_text(regexp: str | bytes | Pattern[str] | Pattern[bytes], flags: int = 0, default: Any = NULL) Any
Return content of first matching group of regexp found in response body.
- rex_search(regexp: str | bytes | Pattern[str] | Pattern[bytes], flags: int = 0, default: Any = NULL) Any
Search for the regular expression in the response body.
Return the found match object or None.
- rex_assert(rex: str | bytes | Pattern[str] | Pattern[bytes]) None
Raise DataNotFound exception if the regular expression is not found.
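The `default: Any = NULL` parameter in the signatures above suggests a sentinel that distinguishes "no default given" from "default is None". The sketch below shows that pattern with the standard `re` module. Raising `DataNotFound` when no default is supplied is an assumption, and both `NULL` and `DataNotFound` are local stand-ins rather than Grab's own objects.

```python
import re

NULL = object()  # sentinel meaning "no default supplied" (stand-in for Grab's NULL)


class DataNotFound(Exception):
    """Local stand-in for Grab's DataNotFound exception."""


def rex_text(text: str, regexp: str, flags: int = 0, default=NULL):
    """Return group 1 of the first match, or `default`, or raise DataNotFound."""
    match = re.search(regexp, text, flags)
    if match is not None:
        return match.group(1)
    if default is NULL:
        # Assumed behaviour: without a default, a miss is an error
        raise DataNotFound(f"regexp not found: {regexp!r}")
    return default


html = "<title>Grab docs</title>"
print(rex_text(html, r"<title>(.*?)</title>"))
print(rex_text(html, r"<h1>(.*?)</h1>", default=None))
```

Because the sentinel is a unique object, passing `default=None` is still treated as an explicit default.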
- get_body_chunk() None | bytes
- unicode_body() None | str
Return response body as unicode string.
- set_body(body: bytes) None
- classmethod wrap_io(inp: bytes | str) StringIO | BytesIO
- classmethod _build_dom(content: bytes | str, mode: str, encoding: str) lxml.etree._Element
- build_html_tree() lxml.etree._Element
- build_xml_tree() lxml.etree._Element
- choose_form(number: None | int = None, xpath: None | str = None, name: None | str = None, **kwargs: Any) None
Set the default form.
- Parameters
number – number of form (starting from zero)
id – value of “id” attribute
name – value of “name” attribute
xpath – XPath query
- Raises
DataNotFound – if form not found
GrabMisuseError – if method is called without parameters
Selected form will be available via form attribute of Grab instance. All form methods will work with default form.
Examples:
    # Select second form
    g.choose_form(1)
    # Select by id
    g.choose_form(id="register")
    # Select by name
    g.choose_form(name="signup")
    # Select by xpath
    g.choose_form(xpath='//form[contains(@action, "/submit")]')
- get_cached_form() None | FormElement
Get form which has been already selected.
Returns None if form has not been selected yet.
It is mainly intended for testing, to avoid triggering pylint warnings about accessing a protected attribute.
- set_input(name: str, value: Any) None
Set the value of form element by its name attribute.
- Parameters
name – name of element
value – value which should be set to element
To check/uncheck the checkbox pass boolean value.
Example:
    g.set_input('sex', 'male')
    # Check the checkbox
    g.set_input('accept', True)
- set_input_by_id(_id: str, value: Any) None
Set the value of form element by its id attribute.
- Parameters
_id – id of element
value – value which should be set to element
- set_input_by_number(number: int, value: Any) None
Set the value of form element by its number in the form.
- Parameters
number – number of element
value – value which should be set to element
- set_input_by_xpath(xpath: str, value: Any) None
Set the value of form element by xpath.
- Parameters
xpath – XPath expression
value – value which should be set to element
- process_extra_post(post_items: list[tuple[str, Any]], extra_post_items: collections.abc.Sequence[tuple[str, Any]]) list[tuple[str, Any]]
- clean_submit_controls(post: collections.abc.MutableMapping[str, Any], submit_name: None | str) None
- get_form_request(submit_name: None | str = None, url: None | str = None, extra_post: None | Mapping[str, Any] | Sequence[tuple[str, Any]] = None, remove_from_post: None | Sequence[str] = None) tuple[str, str, bool, collections.abc.Sequence[tuple[str, Any]]]
Submit default form.
- Parameters
submit_name – name of button which should be “clicked” to submit form
url – explicitly specify form action url
extra_post – (dict or list of pairs) additional form data which will override data automatically extracted from the form.
remove_from_post – list of keys to remove from the submitted data
Following input elements are automatically processed:
input[type="hidden"] - default value
select: value of last option
radio - ???
checkbox - ???
Multipart forms are correctly recognized by grab library.
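The interaction of `extra_post` and `remove_from_post` described above can be sketched as a small merge over (key, value) pairs. The helper `merge_post` and the sample data are hypothetical, and the ordering of the result is an assumption; Grab's own logic lives in methods such as `process_extra_post` and may differ.

```python
def merge_post(form_items, extra_post=None, remove_from_post=None):
    """Combine extracted form data with overrides and removals (hypothetical helper)."""
    # Accept either a dict or a list of (key, value) pairs for the overrides
    if isinstance(extra_post, dict):
        extra_items = list(extra_post.items())
    else:
        extra_items = list(extra_post or [])
    overridden = {key for key, _ in extra_items}
    removed = set(remove_from_post or [])
    # Keep extracted values unless they are overridden or explicitly removed
    merged = [
        (key, value)
        for key, value in form_items
        if key not in overridden and key not in removed
    ]
    # Assumption: overrides are appended after the surviving form values
    merged.extend(extra_items)
    return merged


form_items = [("csrf", "abc"), ("lang", "en"), ("tracking", "1")]
result = merge_post(form_items, extra_post={"lang": "de"}, remove_from_post=["tracking"])
print(result)
```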
- build_fields_to_remove(fields: collections.abc.Mapping[str, Any], form_inputs: collections.abc.Sequence[lxml.html.HtmlElement]) set[str]
- process_form_fields(fields: collections.abc.MutableMapping[str, Any]) None
- form_fields() collections.abc.MutableMapping[str, lxml.html.HtmlElement]
Return fields of default form.
Fill some fields with reasonable values.
- choose_form_by_element(xpath: str) None
- exception grab.GrabError
Bases: Exception
All custom Grab exceptions should be children of this class.
- exception grab.GrabNetworkError(*args: Any, **kwargs: Any)
Bases: OriginalExceptionGrabError
Raised in case of a network error.
- exception grab.GrabTimeoutError(*args: Any, **kwargs: Any)
Bases: GrabNetworkError
Raised when the configured timeout for the request expires.
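Because these exceptions form a chain of subclasses, the order of `except` clauses matters: the most specific class has to come first. The sketch below mirrors the "Bases" relationships listed above with local stand-in classes so that it runs without Grab installed; with the real library the names would be imported from `grab` instead. The `classify` function and its handling policy are hypothetical.

```python
class GrabError(Exception):
    """Stand-in mirroring the base class listed above."""


class OriginalExceptionGrabError(GrabError):
    """Stand-in for the intermediate base class."""


class GrabNetworkError(OriginalExceptionGrabError):
    """Stand-in for the network error class."""


class GrabTimeoutError(GrabNetworkError):
    """Stand-in for the timeout error class."""


def classify(exc: Exception) -> str:
    """Map an exception to a coarse handling strategy (hypothetical policy)."""
    try:
        raise exc
    except GrabTimeoutError:
        return "retry"          # most specific class is checked first
    except GrabNetworkError:
        return "switch proxy"   # any other network failure
    except GrabError:
        return "give up"        # every remaining Grab error


print(classify(GrabTimeoutError()))
print(classify(GrabNetworkError()))
print(classify(GrabError()))
```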
- class grab.Request(method: str, url: str, *, headers: None | dict[str, Any] = None, timeout: None | int | Timeout = None, cookies: None | dict[str, Any] = None, encoding: None | str = None, proxy_type: None | str = None, proxy: None | str = None, proxy_userpwd: None | str = None, fields: Any = None, body: None | bytes = None, multipart: None | bool = None, document_type: None | str = None)
- get_full_url() str
- _process_timeout_param(value: None | float | Timeout) grab.util.timeout.Timeout