Module grab.document
The Document class represents the result of a network request made with a Grab instance.
- class grab.document.Document(grab=None)
- Document (in most cases a network response, i.e. the result of a network request)
- choose_form(number=None, xpath=None, name=None, **kwargs)
Set the default form.
- Parameters
number – number of form (starting from zero)
id – value of “id” attribute
name – value of “name” attribute
xpath – XPath query
- Raises
DataNotFound – if the form is not found
GrabMisuseError – if the method is called without parameters
The selected form is available via the form attribute of the Grab instance. All form methods operate on this default form.
Examples:
# Select the second form
g.choose_form(1)
# Select by id
g.choose_form(id="register")
# Select by name
g.choose_form(name="signup")
# Select by XPath
g.choose_form(xpath='//form[contains(@action, "/submit")]')
- detect_charset()
Detect charset of the response.
Try the following methods:
* meta[name="Http-Equiv"]
* XML declaration
* HTTP Content-Type header
Ignore unknown charsets.
Use utf-8 as fallback charset.
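The detection order described above can be sketched with the standard library alone. This is a minimal illustration, not Grab's implementation: the function name `detect_charset_sketch`, the three regular expressions, and the choice to move on to the next candidate when a charset is unknown are all assumptions made for the example.

```python
import codecs
import re

# Illustrative patterns (assumptions), not the expressions Grab itself uses.
RE_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([-\w]+)', re.I)
RE_XML = re.compile(rb'^<\?xml[^>]+encoding\s*=\s*["\']([-\w]+)', re.I)
RE_HEADER = re.compile(r'charset\s*=\s*["\']?([-\w]+)', re.I)


def detect_charset_sketch(body, content_type=None):
    """Return the first charset Python knows about, or 'utf-8' as a fallback."""
    candidates = []
    # 1. meta tag, 2. XML declaration (both searched in the raw bytes)
    for pattern in (RE_META, RE_XML):
        match = pattern.search(body)
        if match:
            candidates.append(match.group(1).decode("ascii"))
    # 3. HTTP Content-Type header value
    if content_type:
        match = RE_HEADER.search(content_type)
        if match:
            candidates.append(match.group(1))
    for name in candidates:
        try:
            # codecs.lookup raises LookupError for unknown charsets
            codecs.lookup(name)
        except LookupError:
            continue  # ignore the unknown charset
        return name.lower()
    return "utf-8"
```

For instance, `detect_charset_sketch(b"plain text", "text/html; charset=ISO-8859-1")` falls through to the header and returns `"iso-8859-1"`.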
- property form
This attribute points to the default form.
If no form was selected manually, the form with the largest number of input elements is selected.
The form value is just an lxml.html form element.
Example:
g.go('some URL')
# Choose form automatically
print(g.form)
# And now choose form manually
g.choose_form(1)
print(g.form)
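The automatic choice described for the form property (pick the form with the most input elements) can be illustrated without Grab or lxml. The sketch below uses the standard library's `html.parser`; the names `FormInputCounter` and `pick_default_form` are hypothetical, and counting only `<input>` tags is a simplifying assumption about what counts as an input element.

```python
from html.parser import HTMLParser


class FormInputCounter(HTMLParser):
    """Count <input> tags inside each <form>, in document order."""

    def __init__(self):
        super().__init__()
        self.counts = []        # one counter per form
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.counts.append(0)
            self._in_form = True
        elif tag == "input" and self._in_form:
            self.counts[-1] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def pick_default_form(html):
    """Return the zero-based index of the form with the most inputs, or None."""
    parser = FormInputCounter()
    parser.feed(html)
    if not parser.counts:
        return None
    # max() returns the first index on ties
    return max(range(len(parser.counts)), key=parser.counts.__getitem__)
```

The returned index uses the same zero-based numbering as the number argument of choose_form.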
- get_form_request(submit_name=None, url=None, extra_post=None, remove_from_post=None)
Submit default form.
- Parameters
submit_name – name of button which should be “clicked” to submit form
url – explicitly specify form action url
extra_post – (dict or list of pairs) additional form data which will override data automatically extracted from the form.
remove_from_post – list of keys to remove from the submitted data
The following input elements are processed automatically:
input[type="hidden"] - default value
select - value of last option
radio - ???
checkbox - ???
Multipart forms are correctly recognized by the Grab library.
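How extra_post and remove_from_post shape the submitted data can be shown with plain dictionaries. This is a sketch under stated assumptions: `build_post_data` is a hypothetical helper, form data is modelled as a dict (so repeated field names are not represented), and the order of applying the override and the removal is assumed rather than taken from Grab.

```python
def build_post_data(form_fields, extra_post=None, remove_from_post=None):
    """Combine extracted form values with overrides and removals."""
    # Start from the values extracted from the form
    post = dict(form_fields)
    # extra_post may be a dict or a list of pairs; its values override
    post.update(dict(extra_post or {}))
    # Drop the keys that must not be submitted
    for key in remove_from_post or ():
        post.pop(key, None)
    return post


extracted = {"csrf": "abc", "login": "", "remember": "on"}
data = build_post_data(
    extracted,
    extra_post=[("login", "alice")],
    remove_from_post=["remember"],
)
# data == {"csrf": "abc", "login": "alice"}
```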
- property json
Return the response body parsed as JSON.
- parse(charset=None, headers=None)
Parse headers.
This method is called after the Grab instance performs a network request.
- property pyquery
Return the pyquery handler.
- rex_assert(rex, byte=False)
Raise a DataNotFound exception if the rex expression is not found.
- rex_search(regexp, flags=0, byte=False, default=<object object>)
Search the regular expression in response body.
- Parameters
byte – if False, the search is performed in response.unicode_body(); otherwise the rex is searched in response.body.
- Note: if you use the default non-byte mode, do not forget to build your regular expression with the re.U flag.
Return the found match object or None.
- rex_text(regexp, flags=0, byte=False, default=<object object>)
Search regular expression in response body and return content of first matching group.
- Parameters
byte – if False, the search is performed in response.unicode_body(); otherwise the rex is searched in response.body.
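The difference between the byte and non-byte modes of the regular-expression helpers can be reproduced with the standard `re` module. The function `rex_text_sketch` and its `charset` parameter are hypothetical, and returning None when nothing matches is a simplification for the example, not a statement about how the default argument behaves in Grab. In Python 3, str patterns are Unicode-aware by default, so the re.U flag is only needed on Python 2.

```python
import re


def rex_text_sketch(body, regexp, flags=0, byte=False, charset="utf-8"):
    """Return the first group of the first match, or None if nothing matches."""
    # Byte mode searches the raw bytes; otherwise search the decoded text.
    # The pattern type must match: bytes pattern for bytes, str for str.
    haystack = body if byte else body.decode(charset, errors="ignore")
    match = re.search(regexp, haystack, flags)
    return match.group(1) if match else None


raw = "<title>Привет</title>".encode("utf-8")
as_text = rex_text_sketch(raw, r"<title>(\w+)</title>")
as_bytes = rex_text_sketch(raw, rb"<title>(.+?)</title>", byte=True)
```

Here `as_text` is a str and `as_bytes` holds the same title as undecoded UTF-8 bytes.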
- save_hash(location, basedir, ext=None)
Save the response body into a file whose path is built from a hash. This lowers the number of files per directory.
- Parameters
location – URL of the file or something else. It is used to build the SHA1 hash.
basedir – base directory to save the file. Note that the file is not saved directly to this directory but to a sub-directory of basedir.
ext – extension to append to the file name. The dot is inserted automatically between the file name and the extension.
- Returns
path to saved file relative to basedir
Example:
>>> url = 'http://yandex.ru/logo.png'
>>> g.go(url)
>>> g.response.save_hash(url, 'some_dir', ext='png')
'e8/dc/f2918108788296df1facadc975d32b361a6a.png'
# the file was saved to $PWD/some_dir/e8/dc/...
TODO: replace basedir with two options: root and save_to. And returns save_to + path
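The path layout in the example above (two 2-character directory levels followed by the rest of the 40-character SHA1 hex digest) can be sketched as follows. Both function names are hypothetical, and encoding location as UTF-8 before hashing is an assumption; the sketch mirrors the documented behaviour rather than reproducing Grab's code.

```python
import hashlib
import os


def hashed_path_sketch(location, ext=None):
    """Build a relative path such as 'ab/cd/<remaining 36 hex chars>.ext'."""
    digest = hashlib.sha1(location.encode("utf-8")).hexdigest()
    # Two directory levels keep the number of files per directory low
    rel_path = "/".join((digest[:2], digest[2:4], digest[4:]))
    if ext is not None:
        rel_path = rel_path + "." + ext
    return rel_path


def save_hash_sketch(body, location, basedir, ext=None):
    """Write body under basedir and return the path relative to basedir."""
    rel_path = hashed_path_sketch(location, ext)
    full_path = os.path.join(basedir, *rel_path.split("/"))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as out:
        out.write(body)
    return rel_path
```

Because the path depends only on location, saving the same URL twice overwrites the same file.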
- set_input(name, value)
Set the value of form element by its name attribute.
- Parameters
name – name of element
value – value which should be set to element
To check or uncheck a checkbox, pass a boolean value.
Example:
g.set_input('sex', 'male')
# Check the checkbox
g.set_input('accept', True)
- set_input_by_id(_id, value)
Set the value of form element by its id attribute.
- Parameters
_id – id of element
value – value which should be set to element
- set_input_by_number(number, value)
Set the value of form element by its number in the form.
- Parameters
number – number of element
value – value which should be set to element
- set_input_by_xpath(xpath, value)
Set the value of form element by XPath.
- Parameters
xpath – XPath query
value – value which should be set to element
- text_assert_any(anchors, byte=False)
Raise a DataNotFound exception if none of the anchors is found.
- text_search(anchor, byte=False)
Search the substring in response body.
- Parameters
anchor – string to search
byte – if False, anchor should be a unicode string and the search is performed in response.unicode_body(); otherwise anchor should be a byte string and the search is performed in response.body
Return True if the substring is found, otherwise False.
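The substring helpers follow the same byte versus unicode split. A minimal sketch with hypothetical names: `text_search_sketch` and `text_assert_any_sketch` are not Grab functions, the `charset` parameter is an assumption, and `LookupError` stands in for Grab's DataNotFound exception.

```python
def text_search_sketch(body, anchor, byte=False, charset="utf-8"):
    """Return True if anchor occurs in the body, else False."""
    # anchor must be bytes in byte mode and str otherwise;
    # mixing the two types raises TypeError in Python 3.
    haystack = body if byte else body.decode(charset, errors="ignore")
    return anchor in haystack


def text_assert_any_sketch(body, anchors, byte=False):
    """Raise LookupError (stand-in for DataNotFound) if no anchor is found."""
    if not any(text_search_sketch(body, a, byte=byte) for a in anchors):
        raise LookupError("none of the anchors was found: %r" % (anchors,))


page_body = "Добро пожаловать, user".encode("utf-8")
found_text = text_search_sketch(page_body, "пожаловать")
found_bytes = text_search_sketch(page_body, b"user", byte=True)
```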
- property tree
Return DOM tree of the document built with HTML DOM builder.
- unicode_body(ignore_errors=True, fix_special_entities=True)
Return response body as unicode string.
- property xml_tree
Return DOM tree of the document built with XML DOM builder.