Base (Private) Module: objects/_pageobject.py

Purpose:

This module provides the implementation of the PageObject class.

Platform:

Linux/Windows | Python 3.10+

Developer:

J Berendt

Email:

development@s3dev.uk

Comments:

n/a

class PageObject(content: str = '', pageno: int = 0, parser: object = None)

Bases: object

This class provides the implementation for the PageObject.

For each page in a document, an instance of this class is created, populated and appended to the document’s pages list attribute.

Parameters:
  • content (str, optional) – Page content as a single string. Defaults to ‘’.

  • pageno (int, optional) – Page number. Defaults to 0.

  • parser (object, optional) – The underlying document parser object. Defaults to None.
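The per-page workflow described above (create, populate, append) can be sketched with a minimal stand-in class. This is not docp's implementation: `PageObjectSketch` and the `build_pages` helper are hypothetical, 1-based page numbering is an assumption, and the page strings are made-up input.

```python
class PageObjectSketch:
    """Hypothetical stand-in mirroring PageObject's documented constructor."""

    def __init__(self, content: str = '', pageno: int = 0, parser: object = None):
        self._content = content
        self._pageno = pageno
        self._parser = parser
        self._tables = []  # Assumption: empty until tables are parsed.

    @property
    def content(self) -> str:
        return self._content

    @property
    def pageno(self) -> int:
        return self._pageno

    @property
    def parser(self) -> object:
        return self._parser

    @property
    def tables(self) -> list:
        return self._tables

    @property
    def hastext(self) -> bool:
        # Assumption: "populated" is taken to mean a non-empty string.
        return bool(self._content)


def build_pages(texts: list[str]) -> list[PageObjectSketch]:
    """Create, populate and append one object per page (hypothetical helper)."""
    pages = []
    # Assumption: page numbers count from 1 in document order.
    for pageno, text in enumerate(texts, start=1):
        pages.append(PageObjectSketch(content=text, pageno=pageno))
    return pages


pages = build_pages(['First page text.', '', 'Third page text.'])
print([(p.pageno, p.hastext) for p in pages])  # [(1, True), (2, False), (3, True)]
```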

property content: str

Accessor to the page’s textual content.

property hastext: bool

Flag indicating whether the content attribute is populated.
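A typical use of this flag is to find pages that yielded no text, for example image-only pages that may need OCR. In this sketch, `SimpleNamespace` objects stand in for PageObject instances (only `pageno` and `hastext` are used), and `pages_without_text` is a hypothetical helper.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for a parsed document's pages list.
pages = [
    SimpleNamespace(pageno=1, hastext=True),
    SimpleNamespace(pageno=2, hastext=False),  # e.g. an image-only page
    SimpleNamespace(pageno=3, hastext=True),
]


def pages_without_text(pages) -> list[int]:
    """Return the page numbers whose content attribute is not populated."""
    return [page.pageno for page in pages if not page.hastext]


print(pages_without_text(pages))  # [2]
```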

property pageno: int

Accessor to the page number.

Note

This is the page number with regard to the page’s sequence in the overall document. This is not guaranteed to be the page’s number per the document’s page labeling scheme.
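To illustrate the difference, consider a document whose front matter uses roman numerals. The labels below are hypothetical, and 1-based sequence numbering is an assumption; the point is only that a page's position and its printed label can disagree.

```python
# Hypothetical printed labels, in document order (not read from a real file).
printed_labels = ['i', 'ii', 'iii', '1', '2', '3']

# Assumption: sequence numbers start at 1 for the first page.
label_by_pageno = {pageno: label for pageno, label in enumerate(printed_labels, start=1)}

# The fourth page in sequence is printed as page "1".
print(label_by_pageno[4])  # prints: 1
```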

property parser: object

Accessor to the document parser’s internal functionality.

Note

The population of this property is determined by the document-type-specific docp parser. If the underlying parsing library has functionality worth preserving and making available to the user, it is stored in this property. Otherwise, this property remains None.
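Because this property may be None, code that uses it should check before reaching into it. The sketch below assumes that, for PDFs, the parser object exposes `width` and `height` attributes (as a pdfplumber `Page` does); that is an assumption, not documented docp behaviour. `page_size` is a hypothetical helper, and `SimpleNamespace` objects stand in for real pages.

```python
from types import SimpleNamespace


def page_size(page):
    """Return (width, height) from the page's parser object, or None.

    Assumption: a PDF page's parser object exposes ``width`` and ``height``
    (as a pdfplumber Page does); other document types may leave it as None.
    """
    parser = page.parser
    if parser is None:
        return None
    width = getattr(parser, 'width', None)
    height = getattr(parser, 'height', None)
    if width is None or height is None:
        return None
    return (width, height)


print(page_size(SimpleNamespace(parser=None)))  # None
print(page_size(SimpleNamespace(parser=SimpleNamespace(width=612, height=792))))  # (612, 792)
```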

property tables: list

Accessor to the page’s tables, if parsed.
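Since tables are only present if they were parsed, a caller can skip pages whose list is empty. This sketch assumes an empty (or None) value when nothing was parsed and makes no assumption about the type of each table element; `pages_with_tables` is a hypothetical helper and `SimpleNamespace` objects stand in for PageObject instances.

```python
from types import SimpleNamespace


def pages_with_tables(pages) -> dict[int, int]:
    """Map page number to table count, skipping pages with no parsed tables."""
    return {page.pageno: len(page.tables) for page in pages if page.tables}


pages = [
    SimpleNamespace(pageno=1, tables=[]),  # no tables parsed
    SimpleNamespace(pageno=2, tables=['table_a', 'table_b']),  # placeholder items
]
print(pages_with_tables(pages))  # {2: 2}
```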

show() → pdfplumber.display.PageImage

Display the page as an image.

Additionally, the returned object provides access to the underlying pdfplumber visual debugging methods, such as:

  • img.debug_tablefinder()

  • img.draw_*()

  • img.outline_chars()

  • img.outline_words()

  • img.reset()

  • etc.
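A debugging session built on this method might look like the following sketch. It assumes `page` is a PageObject from a PDF parsed by docp (loading the document is not shown here), so that `page.show()` returns a pdfplumber `PageImage`. `debug_tablefinder()`, `reset()`, `outline_words()` and `save()` are pdfplumber `PageImage` methods; `save_debug_images` and the output file names are hypothetical.

```python
def save_debug_images(page, stem: str) -> list[str]:
    """Save a table-finder overlay and a word-outline overlay for one page.

    Assumption: ``page.show()`` returns a pdfplumber PageImage.
    """
    img = page.show()
    written = []

    # Overlay pdfplumber's table detection on the rendered page.
    img.debug_tablefinder()
    path = f'{stem}_tables.png'
    img.save(path)
    written.append(path)

    # Clear the previous drawing, then outline every word on the page.
    img.reset()
    img.outline_words()
    path = f'{stem}_words.png'
    img.save(path)
    written.append(path)

    return written
```

Resetting between overlays keeps each saved image focused on a single kind of annotation.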