API
API stability
Everything described here is considered “public”: this is what you can rely on. We try to maintain backward compatibility, and usually do, but there is no hard promise.
Anything else should not be used outside of WeasyPrint itself. We reserve the right to change or remove it at any point. Use it at your own risk, or pin your dependency to a specific WeasyPrint version.
Versioning
Since version 43, WeasyPrint only provides major releases and does not follow semantic versioning. This choice may look odd, but it is close to what many browsers do, including Firefox and Chrome.
Even if not every version breaks the API, each version does change the way documents are rendered, which is what really matters in the end. Providing minor versions would give the illusion that developers can just update WeasyPrint without checking that everything works.
Unfortunately, we have the same problem as browsers: when a new version is released, most users’ websites are rendered exactly the same, but a small part is not. The only ways for web developers to find out are to read the changelog and to check that their pages are rendered correctly.
More about this choice can be found in issue #900.
Command-line API
-
weasyprint.__main__.main(argv=sys.argv)
The weasyprint program takes at least two arguments:
weasyprint [options] <input> <output>
The input is a filename or URL to an HTML document, or - to read HTML from stdin. The output is a filename, or - to write to stdout.
Options can be mixed anywhere before, between, or after the input and output.
-
-e <input_encoding>, --encoding <input_encoding>
Force the input character encoding (e.g. -e utf8).
-
-f <output_format>, --format <output_format>
Choose the output file format among PDF and PNG (e.g. -f png). Required if the output is not a .pdf or .png filename.
-
-s <filename_or_URL>, --stylesheet <filename_or_URL>
Filename or URL of a user cascading stylesheet (see Stylesheet origins) to add to the document (e.g. -s print.css). Multiple stylesheets are allowed.
-
-m <type>, --media-type <type>
Set the media type to use for @media. Defaults to print.
-
-r <dpi>, --resolution <dpi>
For PNG output only. Set the resolution in PNG pixels per CSS inch. Defaults to 96, which means that PNG pixels match CSS pixels.
-
-u <URL>, --base-url <URL>
Set the base for relative URLs in the HTML input. Defaults to the input’s own URL, or the current directory for stdin.
-
-a <file>, --attachment <file>
Add an attachment to the document. The attachment is included in the PDF output. This option can be used multiple times.
-
-p, --presentational-hints
Follow HTML presentational hints.
-
-o, --optimize-images
Try to optimize the size of embedded images.
-
-v, --verbose
Show warnings and information messages.
-
-d, --debug
Show debugging messages.
-
-q, --quiet
Hide logging messages.
-
--version
Show the version number. Other options and arguments are ignored.
-
-h, --help
Show the command-line usage. Other options and arguments are ignored.
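As an illustration of how the options above combine, here is a minimal sketch that assembles an argument list for the weasyprint program. The helper name build_weasyprint_argv and its parameters are hypothetical; only the flags themselves come from the list above. The sketch only builds the list and does not run the program.

```python
def build_weasyprint_argv(input_path, output_path, stylesheets=(),
                          media_type=None, resolution=None, base_url=None,
                          output_format=None):
    """Return an argv list for the weasyprint program (hypothetical helper)."""
    argv = ["weasyprint"]
    for sheet in stylesheets:
        argv += ["-s", sheet]            # -s can be given several times
    if media_type is not None:
        argv += ["-m", media_type]       # media type used for @media
    if resolution is not None:
        argv += ["-r", str(resolution)]  # only meaningful for PNG output
    if base_url is not None:
        argv += ["-u", base_url]         # base for relative URLs
    if output_format is not None:
        argv += ["-f", output_format]    # needed when the output has no .pdf/.png name
    # Options may appear anywhere; here they all come before input and output.
    argv += [input_path, output_path]
    return argv


# Read HTML from stdin and write to stdout: the format cannot be guessed
# from a filename, so it is passed explicitly.
print(build_weasyprint_argv("-", "-", stylesheets=["print.css"],
                            output_format="pdf"))
```

The resulting list could then be handed to something like subprocess.run(argv, check=True), assuming the weasyprint program is on the PATH.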
-
Python API
-
class weasyprint.HTML(input, **kwargs)
Represents an HTML document parsed by html5lib.
You can just create an instance with a positional argument:
doc = HTML(something)
The class will try to guess if the input is a filename, an absolute URL, or a file object.
Alternatively, use one named argument so that no guessing is involved:
- Parameters
filename (str or pathlib.Path) – A filename, relative to the current directory, or absolute.
url (str) – An absolute, fully qualified URL.
file_obj (file object) – Any object with a read method.
string (str) – A string of HTML source.
Specifying multiple inputs is an error: HTML(filename="foo.html", url="localhost://bar.html") will raise a TypeError.
You can also pass optional named arguments:
- Parameters
encoding (str) – Force the source character encoding.
base_url (str) – The base used to resolve relative URLs (e.g. in <img src="../foo.png">). If not provided, try to use the input filename, URL, or name attribute of file objects.
url_fetcher (function) – A function or other callable with the same signature as default_url_fetcher() called to fetch external resources such as stylesheets and images. (See URL fetchers.)
media_type (str) – The media type to use for @media. Defaults to 'print'.
Note: In some cases like HTML(string=foo) relative URLs will be invalid if base_url is not provided.
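The “exactly one named input” rule can be pictured with a small stand-alone sketch. This is not WeasyPrint’s code: pick_single_input is a hypothetical helper that only mimics the documented behaviour (one input is accepted, several raise TypeError), and the error message is an assumption.

```python
def pick_single_input(filename=None, url=None, file_obj=None, string=None):
    """Return (kind, value) for the single input given, else raise TypeError.

    Hypothetical helper mirroring the documented rule, not WeasyPrint itself.
    """
    candidates = {
        "filename": filename,
        "url": url,
        "file_obj": file_obj,
        "string": string,
    }
    given = [(kind, value) for kind, value in candidates.items()
             if value is not None]
    if len(given) != 1:
        raise TypeError("Expected exactly one input, got %d" % len(given))
    return given[0]


print(pick_single_input(string="<p>Hello</p>"))

try:
    pick_single_input(filename="foo.html", url="localhost://bar.html")
except TypeError as error:
    print("rejected:", error)
```

With the real class, the equivalent unambiguous calls would be of the form HTML(filename=...), HTML(url=...), HTML(file_obj=...) or HTML(string=..., base_url=...).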
-
render(stylesheets=None, enable_hinting=False, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)
Lay out and paginate the document, but do not (yet) export it to PDF or PNG.
This returns a Document object which provides access to individual pages and various meta-data. See write_pdf() to get a PDF directly.
New in version 0.15.
- Parameters
stylesheets (list) – An optional list of user stylesheets. List elements are CSS objects, filenames, URLs, or file objects. (See Stylesheet origins.)
enable_hinting (bool) – Whether text, borders and background should be hinted to fall at device pixel boundaries. Should be enabled for pixel-based output (like PNG) but not for vector-based output (like PDF).
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.
- Returns
A Document object.
-
write_pdf(target=None, stylesheets=None, zoom=1, attachments=None, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)
Render the document to a PDF file.
This is a shortcut for calling render(), then Document.write_pdf().
- Parameters
target (str, pathlib.Path or file object) – A filename where the PDF file is generated, a file object, or None.
stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
attachments (list) – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs or file-like objects.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.
- Returns
The PDF as bytes if target is not provided or None, otherwise None (the PDF is written to target).
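The warning about zoom can be made concrete with a little arithmetic. The sketch below assumes that PDF units are points (72 per inch), that CSS has 96 px per inch, and that zoom simply multiplies the result; the function names are hypothetical and this is not WeasyPrint’s implementation.

```python
CSS_PX_PER_INCH = 96       # CSS definition of the px unit
PDF_POINTS_PER_INCH = 72   # assumption: PDF units are points
MM_PER_INCH = 25.4


def mm_to_css_px(length_mm):
    """Convert a physical CSS length in millimetres to CSS pixels."""
    return length_mm * CSS_PX_PER_INCH / MM_PER_INCH


def css_px_to_pdf_points(length_px, zoom=1):
    """Convert CSS pixels to PDF points, scaled by the zoom factor."""
    return length_px * zoom * PDF_POINTS_PER_INCH / CSS_PX_PER_INCH


a4_width_px = mm_to_css_px(210)  # A4 is 210 mm wide
for zoom in (1, 2):
    width_pt = css_px_to_pdf_points(a4_width_px, zoom)
    width_mm = width_pt / PDF_POINTS_PER_INCH * MM_PER_INCH
    print("zoom=%s: %.2f pt wide, i.e. %.0f mm on paper" % (zoom, width_pt, width_mm))
```

Under these assumptions a page declared as A4 is only 210 mm wide on paper when zoom is 1; with zoom=2 it comes out twice as wide, which is what “wrong” physical units means here.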
-
write_image_surface(stylesheets=None, resolution=96, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)
Render pages vertically on a cairo image surface.
New in version 0.17.
There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.
This is a shortcut for calling render(), then Document.write_image_surface().
- Parameters
stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.
- Returns
A cairo ImageSurface.
-
write_png(target=None, stylesheets=None, resolution=96, presentational_hints=False, optimize_images=False, font_config=None, counter_style=None, image_cache=None)
Paint the pages vertically to a single PNG image.
There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.
This is a shortcut for calling render(), then Document.write_png().
- Parameters
target (str, pathlib.Path or file object) – A filename where the PNG file is generated, a file object, or None.
stylesheets (list) – An optional list of user stylesheets. The list’s elements are CSS objects, filenames, URLs, or file-like objects. (See Stylesheet origins.)
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
presentational_hints (bool) – Whether HTML presentational hints are followed.
optimize_images (bool) – Try to optimize the size of embedded images.
font_config (FontConfiguration) – A font configuration handling @font-face rules.
counter_style (CounterStyle) – A dictionary storing @counter-style rules.
image_cache (dict) – A dictionary used to cache images.
- Returns
The image as bytes if target is not provided or None, otherwise None (the image is written to target).
-
class weasyprint.CSS(input, **kwargs)
Represents a CSS stylesheet parsed by tinycss2.
An instance is created in the same way as HTML, with the same arguments.
An additional argument called font_config must be provided to handle @font-face rules. The same fonts.FontConfiguration object must be used for different CSS objects applied to the same document.
CSS objects have no public attributes or methods. They are only meant to be used in the write_pdf(), write_png() and render() methods of HTML objects.
-
class weasyprint.Attachment(input, **kwargs)
Represents a file attachment for a PDF document.
New in version 0.22.
An instance is created in the same way as HTML, except that the HTML specific arguments (encoding and media_type) are not supported. An optional description can be provided with the description argument.
- Parameters
description – A description of the attachment to be included in the PDF document. May be None.
-
weasyprint.default_url_fetcher(url, timeout=10, ssl_context=None)
Fetch an external resource such as an image or stylesheet.
Another callable with the same signature can be given as the url_fetcher argument to HTML or CSS. (See URL fetchers.)
- Parameters
url (str) – The URL of the resource to fetch.
timeout (int) – The number of seconds before HTTP requests are dropped.
ssl_context (ssl.SSLContext) – An SSL context used for HTTP requests.
- Raises
An exception indicating failure, e.g. ValueError on syntactically invalid URL.
- Returns
A dict with the following keys:
One of string (a bytestring) or file_obj (a file object).
Optionally: mime_type, a MIME type extracted e.g. from a Content-Type header. If not provided, the type is guessed from the file extension in the URL.
Optionally: encoding, a character encoding extracted e.g. from a charset parameter in a Content-Type header.
Optionally: redirected_url, the actual URL of the resource if there were e.g. HTTP redirects.
Optionally: filename, the filename of the resource. Usually derived from the filename parameter in a Content-Disposition header.
If a file_obj key is given, it is the caller’s responsibility to call file_obj.close(). The default function used internally to fetch data in WeasyPrint tries to close the file object after retrieving; but if this URL fetcher is used elsewhere, the file object has to be closed manually.
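A custom fetcher only needs to return a dict with the keys described above. The following is a minimal sketch, assuming a made-up memory: URL scheme and an in-memory table of resources; memory_url_fetcher, IN_MEMORY_RESOURCES and the fallback parameter are hypothetical names. Anything not found in the table is delegated to default_url_fetcher(), which is imported lazily so the sketch can be read without WeasyPrint installed.

```python
# Hypothetical in-memory resources, keyed by URL.
IN_MEMORY_RESOURCES = {
    "memory:style.css": (b"body { color: navy }", "text/css"),
}


def memory_url_fetcher(url, fallback=None):
    """Serve known URLs from memory, delegate everything else."""
    if url in IN_MEMORY_RESOURCES:
        data, mime_type = IN_MEMORY_RESOURCES[url]
        # 'string' holds the raw bytes; 'mime_type' avoids guessing the
        # type from the file extension.
        return {"string": data, "mime_type": mime_type}
    if fallback is None:
        # Lazy import: only needed when a URL is not served from memory.
        from weasyprint import default_url_fetcher
        fallback = default_url_fetcher
    return fallback(url)


print(memory_url_fetcher("memory:style.css"))
```

Such a callable would be passed as the url_fetcher argument, for example HTML(string=source, url_fetcher=memory_url_fetcher). Because it returns string rather than file_obj, there is no file object left to close.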
-
class weasyprint.document.Document(pages, metadata, url_fetcher, font_config)
A rendered document ready to be painted on a cairo surface.
Typically obtained from HTML.render(), but can also be instantiated directly with a list of pages, a set of metadata, a url_fetcher function, and a font_config.
-
metadata
A DocumentMetadata object. Contains information that does not belong to a specific page but to the whole document.
-
url_fetcher
A function or other callable with the same signature as default_url_fetcher() called to fetch external resources such as stylesheets and images. (See URL fetchers.)
-
copy(pages='all')
Take a subset of the pages.
New in version 0.15.
- Parameters
- Returns
A new Document object.
Examples:
Write two PDF files for odd-numbered and even-numbered pages:
    # Python lists count from 0 but pages are numbered from 1.
    # [::2] is a slice of even list indexes but odd-numbered pages.
    document.copy(document.pages[::2]).write_pdf('odd_pages.pdf')
    document.copy(document.pages[1::2]).write_pdf('even_pages.pdf')
Write each page to a numbered PNG file:
    for i, page in enumerate(document.pages):
        # Wrap the single page in a list, like the page lists used above.
        document.copy([page]).write_png('page_%s.png' % i)
Combine multiple documents into one PDF file, using metadata from the first:
    all_pages = [p for doc in documents for p in doc.pages]
    documents[0].copy(all_pages).write_pdf('combined.pdf')
-
resolve_links()
Resolve internal hyperlinks.
New in version 0.15.
Links to a missing anchor are removed with a warning.
If multiple anchors have the same name, the first one is used.
- Returns
A generator yielding lists (one per page) like Page.links, except that target for internal hyperlinks is (page_number, x, y) instead of an anchor name. The page number is a 0-based index into the pages list, and x, y are in CSS pixels from the top-left of the page.
-
make_bookmark_tree()
Make a tree of all bookmarks in the document.
New in version 0.15.
- Returns
A list of bookmark subtrees. A subtree is (label, target, children, state). label is a string, target is (page_number, x, y) like in resolve_links(), and children is a list of child subtrees.
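Since each subtree nests further subtrees, a recursive walk is the natural way to consume this structure. The sketch below flattens a hand-written sample tree into an indented outline; flatten_bookmarks and the sample data are hypothetical, and the "open" values used for state are placeholders only, since the meaning of state is not described here.

```python
def flatten_bookmarks(tree, depth=0):
    """Yield (depth, label, page_number) for every bookmark, depth-first."""
    for label, target, children, state in tree:
        page_number, x, y = target  # page_number is a 0-based index
        yield depth, label, page_number
        yield from flatten_bookmarks(children, depth + 1)


# Hand-written sample in the documented (label, target, children, state) shape.
sample_tree = [
    ("Chapter 1", (0, 0, 50), [
        ("Section 1.1", (0, 0, 300), [], "open"),
        ("Section 1.2", (1, 0, 80), [], "open"),
    ], "open"),
    ("Chapter 2", (2, 0, 50), [], "open"),
]

for depth, label, page_number in flatten_bookmarks(sample_tree):
    # Add 1 because pages are numbered from 1 for human readers.
    print("  " * depth + "%s (page %d)" % (label, page_number + 1))
```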
-
add_hyperlinks(links, anchors, context, scale)
Include hyperlinks in current PDF page.
New in version 43.
-
write_pdf(target=None, zoom=1, attachments=None, finisher=None)
Paint the pages in a PDF file, with meta-data.
PDF files written directly by cairo do not have meta-data such as bookmarks/outlines and hyperlinks.
- Parameters
target (str, pathlib.Path or file object) – A filename where the PDF file is generated, a file object, or None.
zoom (float) – The zoom factor in PDF units per CSS units. Warning: All CSS units are affected, including physical units like cm and named sizes like A4. For values other than 1, the physical CSS units will thus be “wrong”.
attachments (list) – A list of additional file attachments for the generated PDF document or None. The list’s elements are Attachment objects, filenames, URLs or file-like objects.
finisher – A finisher function that accepts a PDFFile instance as its only parameter can be passed to perform post-processing on the PDF right before the trailer is written. The function is then responsible for calling the instance’s finish() function.
- Returns
The PDF as bytes if target is not provided or None, otherwise None (the PDF is written to target).
-
write_image_surface(resolution=96)
Render pages on a cairo image surface.
New in version 0.17.
There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.
- Parameters
resolution (float) – The output resolution in PNG pixels per CSS inch. At 96 dpi (the default), PNG pixels match the CSS px unit.
- Returns
A (surface, png_width, png_height) tuple. surface is a cairo ImageSurface. png_width and png_height are the size of the final image, in PNG pixels.
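The relation between page sizes, resolution and the final image size can be sketched as follows. This is an illustration of the description above (widest page sets the width, pages are stacked vertically), not WeasyPrint’s code: png_size is a hypothetical helper, and rounding each page up to whole pixels is an assumption.

```python
import math


def png_size(page_sizes_px, resolution=96):
    """Estimate (png_width, png_height) for pages stacked vertically.

    page_sizes_px is a list of (width, height) pairs in CSS pixels.
    """
    pixels_per_css_px = resolution / 96  # 96 dpi means 1 PNG px per CSS px
    widths = [math.ceil(width * pixels_per_css_px)
              for width, height in page_sizes_px]
    heights = [math.ceil(height * pixels_per_css_px)
               for width, height in page_sizes_px]
    # As wide as the widest page, as tall as all pages together.
    return max(widths), sum(heights)


pages = [(794, 1123), (600, 400)]  # sample page sizes in CSS pixels
print(png_size(pages))                  # default resolution
print(png_size(pages, resolution=192))  # twice as many pixels per CSS inch
```

Doubling the resolution doubles both dimensions, so the pixel count of the image grows fourfold.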
-
write_png(target=None, resolution=96)
Paint the pages vertically to a single PNG image.
There is no decoration around pages other than those specified in CSS with @page rules. The final image is as wide as the widest page. Each page is below the previous one, centered horizontally.
-
class weasyprint.document.DocumentMetadata
Meta-information belonging to a whole Document.
New in version 0.20.
New attributes may be added in future versions of WeasyPrint.
-
title
The title of the document, as a string or None. Extracted from the <title> element in HTML and written to the /Title info field in PDF.
-
authors
The authors of the document, as a list of strings. (Defaults to the empty list.) Extracted from the <meta name=author> elements in HTML and written to the /Author info field in PDF.
-
description
The description of the document, as a string or None. Extracted from the <meta name=description> element in HTML and written to the /Subject info field in PDF.
-
keywords
Keywords associated with the document, as a list of strings. (Defaults to the empty list.) Extracted from <meta name=keywords> elements in HTML and written to the /Keywords info field in PDF.
-
generator
The name of one of the software packages used to generate the document, as a string or None. Extracted from the <meta name=generator> element in HTML and written to the /Creator info field in PDF.
-
created
The creation date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.created> element in HTML and written to the /CreationDate info field in PDF.
-
modified
The modification date of the document, as a string or None. Dates are in one of the six formats specified in W3C’s profile of ISO 8601. Extracted from the <meta name=dcterms.modified> element in HTML and written to the /ModDate info field in PDF.
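For reference, the six formats of W3C’s profile of ISO 8601 range from a bare year to a full timestamp with fractional seconds, and a time zone designator is required whenever a time is present. The sketch below checks the shape of a date string with a regular expression; is_w3c_date and the pattern are hypothetical and are not what WeasyPrint uses. The pattern checks layout only, not value ranges (month 13 would pass).

```python
import re

W3C_DATE_RE = re.compile(
    r"(\d{4})"                    # YYYY
    r"(?:-(\d{2})"                # -MM
    r"(?:-(\d{2})"                # -DD
    r"(?:T(\d{2}):(\d{2})"        # Thh:mm
    r"(?::(\d{2})(?:\.\d+)?)?"    # optional :ss and fraction of a second
    r"(Z|[+-]\d{2}:\d{2})"        # time zone designator, required with a time
    r")?)?)?"
)


def is_w3c_date(value):
    """Return True if value has the shape of one of the six W3C formats."""
    return W3C_DATE_RE.fullmatch(value) is not None


for candidate in ("1997", "1997-07-16", "1997-07-16T19:20+01:00",
                  "1997-07-16T19:20", "16/07/1997"):
    print(candidate, is_w3c_date(candidate))
```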
-
class weasyprint.document.Page
Represents a single rendered page.
New in version 0.15.
Should be obtained from Document.pages but not instantiated directly.
-
width
The page width, including margins, in CSS pixels.
-
height
The page height, including margins, in CSS pixels.
-
bleed
The page bleed widths as a dict with 'top', 'right', 'bottom' and 'left' as keys, and values in CSS pixels.
-
bookmarks
The list of (bookmark_level, bookmark_label, target) tuples. bookmark_level and bookmark_label are respectively an int and a string, based on the CSS properties of the same names. target is an (x, y) point in CSS pixels from the top-left of the page.
-
links
The list of (link_type, target, rectangle) tuples. A rectangle is (x, y, width, height), in CSS pixels from the top-left of the page. link_type is one of three strings:
'external': target is an absolute URL.
'internal': target is an anchor name (see Page.anchors). The anchor might be defined in another page, in multiple pages (in which case the first occurrence is used), or not at all.
'attachment': target is an absolute URL and points to a resource to attach to the document.
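A short sketch of how such link tuples might be consumed: the code below groups a hand-written list by link_type. group_links and the sample data (including the URLs and anchor name) are hypothetical; only the (link_type, target, rectangle) shape comes from the description above.

```python
from collections import defaultdict


def group_links(links):
    """Group (link_type, target, rectangle) tuples by their link_type."""
    grouped = defaultdict(list)
    for link_type, target, rectangle in links:
        grouped[link_type].append((target, rectangle))
    return dict(grouped)


# Hand-written sample; rectangles are (x, y, width, height) in CSS pixels.
sample_links = [
    ("external", "https://example.com/", (10, 20, 100, 12)),
    ("internal", "chapter-2", (10, 40, 80, 12)),
    ("attachment", "https://example.com/data.csv", (10, 60, 60, 12)),
    ("external", "https://example.org/", (10, 80, 90, 12)),
]

for link_type, entries in group_links(sample_links).items():
    print(link_type, [target for target, rectangle in entries])
```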
-
anchors
The dict mapping each anchor name to its target, an (x, y) point in CSS pixels from the top-left of the page.
-
paint(cairo_context, left_x=0, top_y=0, scale=1, clip=False)
Paint the page in cairo, on any type of surface.
- Parameters
cairo_context (cairocffi.Context) – Any cairo context object.
left_x (float) – X coordinate of the left of the page, in cairo user units.
top_y (float) – Y coordinate of the top of the page, in cairo user units.
scale (float) – Zoom scale in cairo user units per CSS pixel.
clip (bool) – Whether to clip/cut content outside the page. If false or not provided, content can overflow.
-
class weasyprint.fonts.FontConfiguration
A FreeType font configuration.
New in version 0.32.
Keep a list of fonts, including fonts installed on the system, fonts installed for the current user, and fonts referenced by cascading stylesheets.
When created, an instance of this class gathers available fonts. It can then be given to weasyprint.HTML methods or to weasyprint.CSS to find fonts in @font-face rules.
-
class weasyprint.css.counters.CounterStyle
Counter styles dictionary.
New in version 0.52.
Keep a list of counter styles defined by @counter-style rules, indexed by their names.