API¶

API stability¶

Everything described here is considered “public”: this is what you can rely on. We try to maintain backward compatibility, and usually do, but there is no hard promise.

Anything else should not be used outside of WeasyPrint itself. We reserve the right to change or remove it at any point. Use it at your own risk, or pin your dependency to a specific WeasyPrint version.

Versioning¶

Since version 43, WeasyPrint only provides major releases and does not follow semantic versioning. This choice may look odd, but it is close to what many browsers do, including Firefox and Chrome.

Even if each version does not break the API, each version does break the way documents are rendered, which is what really matters in the end. Providing minor versions would give the illusion that developers can just update WeasyPrint without checking that everything works.

Unfortunately, we have the same problem as browsers: when a new version is released, most of the users’ websites are rendered exactly the same, but a small part is not. The only ways for web developers to find out are to read the changelog and to check that their pages are still rendered correctly.

More about this choice can be found in issue #900.

Command-line API¶

weasyprint.__main__.main(argv=sys.argv)¶

The weasyprint program takes at least two arguments:

weasyprint [options] <input> <output>

The input is a filename or URL to an HTML document, or - to read HTML from stdin. The output is a filename, or - to write to stdout.

Options can be mixed anywhere before, between, or after the input and output.

-e <input_encoding>, --encoding <input_encoding>¶: Force the input character encoding (e.g. -e utf8).

-f <output_format>, --format <output_format>¶: Choose the output file format, either PDF or PNG (e.g. -f png). Required if the output is not a .pdf or .png filename.

-s <filename_or_URL>, --stylesheet <filename_or_URL>¶: Filename or URL of a user cascading stylesheet (see Stylesheet origins) to add to the document (e.g. -s print.css). Multiple stylesheets are allowed.

-m <type>, --media-type <type>¶: Set the media type to use for @media. Defaults to print.

-r <dpi>, --resolution <dpi>¶: For PNG output only. Set the resolution in PNG pixels per CSS inch. Defaults to 96, which means that PNG pixels match CSS pixels.

-u <URL>, --base-url <URL>¶: Set the base for relative URLs in the HTML input. Defaults to the input’s own URL, or the current directory for stdin.

-a <file>, --attachment <file>¶: Add an attachment to the document. The attachment is included in the PDF output. This option can be used multiple times.

-p, --presentational-hints¶: Follow HTML presentational hints.

-o, --optimize-images¶: Try to optimize the size of embedded images.

-v, --verbose¶: Show warnings and information messages.

-d, --debug¶: Show debugging messages.

-q, --quiet¶: Hide logging messages.

--version¶: Show the version number. Other options and arguments are ignored.

-h, --help¶: Show the command-line usage. Other options and arguments are ignored.
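When the command is driven from a script, the options above translate directly into an argument list. The following is a minimal sketch, assuming the weasyprint executable is on the PATH; build_command and run_weasyprint are illustrative helper names, not part of WeasyPrint.

```python
import subprocess


def build_command(source, output, stylesheets=(), media_type=None,
                  base_url=None, presentational_hints=False):
    """Return the argument list for one ``weasyprint`` invocation."""
    command = ["weasyprint"]
    for stylesheet in stylesheets:
        # -s may be repeated, once per user stylesheet.
        command += ["-s", stylesheet]
    if media_type is not None:
        command += ["-m", media_type]
    if base_url is not None:
        command += ["-u", base_url]
    if presentational_hints:
        command.append("-p")
    # Options may appear anywhere; here they precede input and output.
    command += [source, output]
    return command


def run_weasyprint(*args, **kwargs):
    """Run the command; subprocess raises CalledProcessError on failure."""
    subprocess.run(build_command(*args, **kwargs), check=True)
```

For instance, build_command("report.html", "report.pdf", stylesheets=["print.css"]) corresponds to the command line weasyprint -s print.css report.html report.pdf.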

Python API¶

class weasyprint.HTML(input, **kwargs)¶

Represents an HTML document parsed by html5lib.

You can just create an instance with a positional argument: doc = HTML(something). The class will try to guess whether the input is a filename, an absolute URL, or a file object.

Alternatively, use one named argument so that no guessing is involved:

Parameters

filename (str or pathlib.Path) – A filename, relative to the current directory, or absolute.
url (str) – An absolute, fully qualified URL.
file_obj (file object) – Any object with a read method.
string (str) – A string of HTML source.

Specifying multiple inputs is an error: HTML(filename="foo.html", url="localhost://bar.html") will raise a TypeError.

You can also pass optional named arguments:

Parameters

encoding (str) – Force the source character encoding.
base_url (str) – The base used to resolve relative URLs (e.g. in <img src="../foo.png">). If not provided, try to use the input filename, URL, or name attribute of file objects.
url_fetcher (function) – A function or other callable with the same signature as default_url_fetcher() called to fetch external resources such as stylesheets and images. (See URL fetchers.)
media_type (str) – The media type to use for @media. Defaults to 'print'.

Note: In some cases like HTML(string=foo) relative URLs will be invalid if base_url is not provided.
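The base_url caveat for string input can be handled by passing the directory that holds the document's assets. This is a sketch only: directory_base_url and html_from_string are hypothetical helper names, and the trailing slash is added on the assumption that relative references are resolved with ordinary URL joining.

```python
from pathlib import Path


def directory_base_url(directory):
    """Return a file:// URL for a directory, ending with a slash."""
    # The trailing slash matters: without it, URL joining would resolve
    # relative references against the parent directory instead.
    return Path(directory).resolve().as_uri() + "/"


def html_from_string(source, asset_directory):
    """Build an HTML object from a string with an explicit base URL."""
    # Imported here so the helper above works without WeasyPrint installed.
    from weasyprint import HTML

    return HTML(string=source, base_url=directory_base_url(asset_directory))
```

With this, a document such as '<img src="logo.png">' passed to html_from_string(source, "assets") would look for logo.png inside the assets directory.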

render(stylesheets=None, enable_hinting=False, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)¶

Lay out and paginate the document, but do not (yet) export it to PDF or PNG.

This returns a Document object which provides access to individual pages and various meta-data. See write_pdf() to get a PDF directly.

New in version 0.15.

Parameters

stylesheets (list) – An optional list of user stylesheets. List elements are CSS objects, filenames, URLs, or file objects. (See Stylesheet origins.)
enable_hinting (bool) – Whether text, borders and background should be hinted to fall at device pixel boundaries. Should be enabled for pixel-based output (like PNG) but not for vector-based output (like PDF).
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.

Returns

A Document object.

write_pdf(target=None, stylesheets=None, zoom=1, attachments=None, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)¶

Render the document to a PDF file.

This is a shortcut for calling render(), then Document.write_pdf().

Parameters

target (str, pathlib.Path or file object) – A filename where the PDF file is generated, a file object, or None.
stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
attachments (list) – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs or file-like objects.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.

Returns

The PDF as bytes if target is not provided or None, otherwise None (the PDF is written to target).
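The warning about zoom can be checked with a little arithmetic. The sketch below assumes the default PDF unit of 1/72 inch, so that at zoom 1 a CSS inch equals a physical inch; the function names are illustrative only.

```python
PDF_POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def pdf_points(css_mm, zoom=1.0):
    """Length in PDF points of a CSS length given in millimetres."""
    return css_mm / MM_PER_INCH * PDF_POINTS_PER_INCH * zoom


def physical_mm(css_mm, zoom=1.0):
    """Physical length on paper of a CSS length given in millimetres."""
    return css_mm * zoom


# An A4 page is 210 CSS millimetres wide.
a4_width_at_zoom_1 = pdf_points(210)          # about 595.28 points
a4_width_at_zoom_2 = pdf_points(210, zoom=2)  # about 1190.55 points
```

At zoom 2 the page declared as A4 in CSS is physically 420 mm wide, which is why physical units are described as “wrong” for any zoom other than 1.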

write_image_surface(stylesheets=None, resolution=96, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)¶

Render pages vertically on a cairo image surface.

New in version 0.17.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

This is a shortcut for calling render(), then Document.write_image_surface().

Parameters

stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.

Returns

A cairo ImageSurface.

write_png(target=None, stylesheets=None, resolution=96, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)¶

Paint the pages vertically to a single PNG image.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

This is a shortcut for calling render(), then Document.write_png().

Parameters

target (str, pathlib.Path or file object) – A filename where the PNG file is generated, a file object, or None.
stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.

Returns

The image as bytes if target is not provided or None, otherwise None (the image is written to target).

class weasyprint.CSS(input, **kwargs)¶

Represents a CSS stylesheet parsed by tinycss2.

An instance is created in the same way as HTML, with the same arguments.

An additional argument called font_config must be provided to handle @font-face rules. The same fonts.FontConfiguration object must be used for different CSS objects applied to the same document.

CSS objects have no public attributes or methods. They are only meant to be used in the write_pdf(), write_png() and render() methods of HTML objects.

class weasyprint.Attachment(input, **kwargs)¶

Represents a file attachment for a PDF document.

New in version 0.22.

An instance is created in the same way as HTML, except that the HTML specific arguments (encoding and media_type) are not supported. An optional description can be provided with the description argument.

Parameters: description – A description of the attachment to be included in the PDF document. May be None.

weasyprint.default_url_fetcher(url, timeout=10, ssl_context=None)¶

Fetch an external resource such as an image or stylesheet.

Another callable with the same signature can be given as the url_fetcher argument to HTML or CSS. (See URL fetchers.)

Parameters

url (str) – The URL of the resource to fetch.
timeout (int) – The number of seconds before HTTP requests are dropped.
ssl_context (ssl.SSLContext) – An SSL context used for HTTP requests.

Raises

An exception indicating failure, e.g. ValueError on syntactically invalid URL.

Returns

A dict with the following keys:

One of string (a bytestring) or file_obj (a file object).
Optionally: mime_type, a MIME type extracted e.g. from a Content-Type header. If not provided, the type is guessed from the file extension in the URL.
Optionally: encoding, a character encoding extracted e.g. from a charset parameter in a Content-Type header
Optionally: redirected_url, the actual URL of the resource if there were e.g. HTTP redirects.
Optionally: filename, the filename of the resource. Usually derived from the filename parameter in a Content-Disposition header

If a file_obj key is given, it is the caller’s responsibility to call file_obj.close(). The default function used internally to fetch data in WeasyPrint tries to close the file object after retrieving; but if this URL fetcher is used elsewhere, the file object has to be closed manually.
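A custom fetcher only has to accept the same arguments and return a dict of the shape described above. The following sketch serves resources from memory for a made-up memory: URL scheme and delegates everything else to a fallback; make_url_fetcher, the scheme name, and the resources mapping are all assumptions introduced for illustration.

```python
def make_url_fetcher(resources, fallback):
    """Build a fetcher serving ``memory:<name>`` URLs from a dict.

    ``resources`` maps names to (bytes, mime_type) pairs. ``fallback`` is
    called for every other URL; in real use it would typically be
    weasyprint.default_url_fetcher.
    """
    prefix = "memory:"  # hypothetical scheme, not defined by WeasyPrint

    def fetch(url, timeout=10, ssl_context=None):
        if url.startswith(prefix):
            name = url[len(prefix):]
            if name not in resources:
                # Signal failure with an exception, as the default does.
                raise ValueError("unknown in-memory resource: %r" % url)
            data, mime_type = resources[name]
            # "string" carries a bytestring; no file object to close.
            return {"string": data, "mime_type": mime_type}
        return fallback(url, timeout=timeout, ssl_context=ssl_context)

    return fetch
```

The resulting callable can then be given as the url_fetcher argument, for example HTML(string=source, url_fetcher=make_url_fetcher(resources, weasyprint.default_url_fetcher)).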

class weasyprint.document.Document(pages, metadata, url_fetcher, font_config)¶

A rendered document ready to be painted on a cairo surface.

Typically obtained from HTML.render(), but can also be instantiated directly with a list of pages, a set of metadata, a url_fetcher function, and a font_config.

pages¶: A list of Page objects.

metadata¶: A DocumentMetadata object. Contains information that does not belong to a specific page but to the whole document.

url_fetcher¶: A function or other callable with the same signature as default_url_fetcher() called to fetch external resources such as stylesheets and images. (See URL fetchers.)

copy(pages='all')¶

Take a subset of the pages.

New in version 0.15.

Parameters: pages (iterable) – An iterable of Page objects from pages.
Returns: A new Document object.

Examples:

Write two PDF files for odd-numbered and even-numbered pages:

# Python lists count from 0 but pages are numbered from 1.
# [::2] is a slice of even list indexes but odd-numbered pages.
document.copy(document.pages[::2]).write_pdf('odd_pages.pdf')
document.copy(document.pages[1::2]).write_pdf('even_pages.pdf')

Write each page to a numbered PNG file:

# copy() expects an iterable of pages, so wrap the single page in a list.
for i, page in enumerate(document.pages):
    document.copy([page]).write_png('page_%s.png' % i)

Combine multiple documents into one PDF file, using metadata from the first:

all_pages = [p for doc in documents for p in doc.pages]
documents[0].copy(all_pages).write_pdf('combined.pdf')

resolve_links()¶

Resolve internal hyperlinks.

New in version 0.15.

Links to a missing anchor are removed with a warning.

If multiple anchors have the same name, the first one is used.

Returns: A generator yielding lists (one per page) like Page.links, except that target for internal hyperlinks is (page_number, x, y) instead of an anchor name. The page number is a 0-based index into the pages list, and x, y are in CSS pixels from the top-left of the page.
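The resolution rules can be illustrated on plain data. This is not WeasyPrint's implementation, only a sketch of the documented behaviour: each page is represented by a dict with links and anchors entries shaped like Page.links and Page.anchors, and resolve_internal_links is a hypothetical name.

```python
def resolve_internal_links(pages):
    """Yield one list of links per page, with internal targets resolved."""
    # Map each anchor name to its first occurrence in page order.
    first_anchor = {}
    for page_number, page in enumerate(pages):
        for name, (x, y) in page["anchors"].items():
            first_anchor.setdefault(name, (page_number, x, y))

    for page in pages:
        resolved = []
        for link_type, target, rectangle in page["links"]:
            if link_type == "internal":
                if target not in first_anchor:
                    # Missing anchor: the link is dropped. (WeasyPrint
                    # additionally logs a warning.)
                    continue
                target = first_anchor[target]
            resolved.append((link_type, target, rectangle))
        yield resolved
```

An internal link to an anchor defined on both the first and the second page thus resolves to (0, x, y), the occurrence on the first page.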

make_bookmark_tree()¶

Make a tree of all bookmarks in the document.

New in version 0.15.

Returns: A list of bookmark subtrees. A subtree is (label, target, children, state). label is a string, target is (page_number, x, y) like in resolve_links(), children is a list of child subtrees, and state is the bookmark’s state (from the CSS bookmark-state property).
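Because the tree is made of nested tuples, it can be walked with a short recursive function. The sketch below turns a tree of the documented shape into indented outline lines; outline_lines is an illustrative name, and the state element is unpacked but ignored.

```python
def outline_lines(bookmark_tree, depth=0):
    """Return indented text lines, one per bookmark in the tree."""
    lines = []
    for label, target, children, _state in bookmark_tree:
        page_number, _x, _y = target
        # page_number is a 0-based index, so add 1 for display.
        lines.append("%s%s (page %d)" % ("  " * depth, label, page_number + 1))
        lines.extend(outline_lines(children, depth + 1))
    return lines
```

Typical use would be print("\n".join(outline_lines(document.make_bookmark_tree()))) on a Document returned by render().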

add_hyperlinks(links, anchors, context, scale)¶: Include hyperlinks in current PDF page.

New in version 43.

write_pdf(target=None, zoom=1, attachments=None, finisher=None)¶

Paint the pages in a PDF file, with meta-data.

PDF files written directly by cairo do not have meta-data such as bookmarks/outlines and hyperlinks.

Parameters

target (str, pathlib.Path or file object) – A filename where the PDF file is generated, a file object, or None.
zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
attachments (list) – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs or file-like objects.
finisher – A function that accepts a PDFFile instance as its only parameter. It can be passed to perform post-processing on the PDF right before the trailer is written. The function is then responsible for calling the instance’s finish() method.

Returns

The PDF as bytes if target is not provided or None, otherwise None (the PDF is written to target).

write_image_surface(resolution=96)¶

Render pages on a cairo image surface.

New in version 0.17.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

Parameters: resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
Returns: A (surface, png_width, png_height) tuple. surface is a cairo ImageSurface. png_width and png_height are the size of the final image, in PNG pixels.
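The size of the final image follows from the page sizes and the resolution. The sketch below reproduces the layout described above on plain numbers; rounding each page up to whole pixels and centering with integer division are assumptions, and stacked_layout is a hypothetical helper.

```python
import math


def stacked_layout(page_sizes, resolution=96):
    """Compute image size and page offsets for vertically stacked pages.

    ``page_sizes`` is a list of (width, height) pairs in CSS pixels.
    Returns (image_width, image_height, offsets) in PNG pixels, where
    ``offsets`` holds the (x, y) of each page's top-left corner.
    """
    scale = resolution / 96  # PNG pixels per CSS pixel
    sizes = [(math.ceil(width * scale), math.ceil(height * scale))
             for width, height in page_sizes]
    image_width = max(width for width, _ in sizes)
    image_height = sum(height for _, height in sizes)

    offsets = []
    y = 0
    for width, height in sizes:
        # Narrower pages are centered horizontally.
        offsets.append(((image_width - width) // 2, y))
        y += height
    return image_width, image_height, offsets
```

For two pages of 794×1123 and 400×300 CSS pixels at the default resolution, this gives an image of 794×1423 pixels with the second page placed at (197, 1123).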

write_png(target=None, resolution=96)¶

Paint the pages vertically to a single PNG image.

There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.

Parameters

target – A filename, file-like object, or None.
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.

Returns

A (png_bytes, png_width, png_height) tuple. png_bytes is a byte string if target is None, otherwise None (the image is written to target). png_width and png_height are the size of the final image, in PNG pixels.

class weasyprint.document.DocumentMetadata¶

Meta-information belonging to a whole Document.

New in version 0.20.

New attributes may be added in future versions of WeasyPrint.

title¶: The title of the document, as a string or None. Extracted from the <title> element in HTML and written to the /Title info field in PDF.

authors¶: The authors of the document, as a list of strings. (Defaults to the empty list.) Extracted from the <meta name=author> elements in HTML and written to the /Author info field in PDF.

description¶: The description of the document, as a string or None. Extracted from the <meta name=description> element in HTML and written to the /Subject info field in PDF.

keywords¶: Keywords associated with the document, as a list of strings. (Defaults to the empty list.) Extracted from <meta name=keywords> elements in HTML and written to the /Keywords info field in PDF.

generator¶: The name of one of the software packages used to generate the document, as a string or None. Extracted from the <meta name=generator> element in HTML and written to the /Creator info field in PDF.

created¶: The creation date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.created> element in HTML and written to the /CreationDate info field in PDF.

modified¶: The modification date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.modified> element in HTML and written to the /ModDate info field in PDF.

attachments¶: File attachments, as a list of tuples of URL and a description or None. (Defaults to the empty list.) Extracted from the <link rel=attachment> elements in HTML and written to the /EmbeddedFiles dictionary in PDF.

New in version 0.22.

class weasyprint.document.Page¶

Represents a single rendered page.

New in version 0.15.

Should be obtained from Document.pages but not instantiated directly.

width¶: The page width, including margins, in CSS pixels.

height¶: The page height, including margins, in CSS pixels.

bleed¶: The page bleed widths as a dict with 'top', 'right', 'bottom' and 'left' as keys, and values in CSS pixels.

bookmarks¶: The list of (bookmark_level, bookmark_label, target) tuples. bookmark_level and bookmark_label are respectively an int and a string, based on the CSS properties of the same names. target is an (x, y) point in CSS pixels from the top-left of the page.

links¶

The list of (link_type, target, rectangle) tuples. A rectangle is (x, y, width, height), in CSS pixels from the top-left of the page. link_type is one of three strings:

'external': target is an absolute URL
'internal': target is an anchor name (see Page.anchors). The anchor might be defined in another page, in multiple pages (in which case the first occurrence is used), or not at all.
'attachment': target is an absolute URL and points to a resource to attach to the document.

anchors¶: The dict mapping each anchor name to its target, an (x, y) point in CSS pixels from the top-left of the page.

paint(cairo_context, left_x=0, top_y=0, scale=1, clip=False)¶

Paint the page in cairo, on any type of surface.

Parameters

cairo_context (cairocffi.Context) – Any cairo context object.
left_x (float) – X coordinate of the left of the page, in cairo user units.
top_y (float) – Y coordinate of the top of the page, in cairo user units.
scale (float) – Zoom scale in cairo user units per CSS pixel.
clip (bool) – Whether to clip/cut content outside the page. If false or not provided, content can overflow.

class weasyprint.fonts.FontConfiguration¶

A FreeType font configuration.

New in version 0.32.

Keep a list of fonts, including fonts installed on the system, fonts installed for the current user, and fonts referenced by cascading stylesheets.

When created, an instance of this class gathers available fonts. It can then be given to weasyprint.HTML methods or to weasyprint.CSS to find fonts in @font-face rules.
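In practice this means creating one FontConfiguration and passing it both to the CSS object that declares the font and to the rendering call. The following is a sketch under assumptions: the font file path and family name are made up, and write_pdf_with_web_font is an illustrative helper.

```python
STYLESHEET = """
@font-face {
  font-family: ExampleFont;
  src: url(fonts/example.woff);  /* hypothetical font file */
}
h1 { font-family: ExampleFont }
"""


def write_pdf_with_web_font(html_source, target, base_url):
    """Render html_source to target using a font declared in CSS."""
    # Imported lazily so that loading this module does not by itself
    # require WeasyPrint to be installed.
    from weasyprint import CSS, HTML
    from weasyprint.fonts import FontConfiguration

    font_config = FontConfiguration()
    css = CSS(string=STYLESHEET, base_url=base_url, font_config=font_config)
    html = HTML(string=html_source, base_url=base_url)
    # The same font_config is passed again so that the fonts loaded for
    # the stylesheet are available when the document is rendered.
    return html.write_pdf(target, stylesheets=[css], font_config=font_config)
```

The base_url argument is what lets the relative url() in the stylesheet resolve, since the CSS comes from a string.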

class weasyprint.css.counters.CounterStyle¶

Counter styles dictionary.

New in version 0.52.

Keep a list of counter styles defined by @counter-style rules, indexed by their names.

See https://www.w3.org/TR/css-counter-styles-3/.