API Reference

tinyhtml5.parse(document, namespace_html_elements=True, **kwargs)

Parse an HTML document into a tree.

Parameters:
  • document (str, bytes, pathlib.Path or file object) – The document to parse, given as an HTML string, a filename, or a file-like object.

  • namespace_html_elements (bool) – Whether to put HTML elements in the XHTML namespace (http://www.w3.org/1999/xhtml).

Extra keyword parameters can be provided to specify possible encodings if the document is given as bytes.

Parameters:
  • override_encoding (str or bytes) – Forced encoding provided by user agent.

  • transport_encoding (str or bytes) – Encoding provided by the transport layer.

  • same_origin_parent_encoding (str or bytes) – Parent document encoding.

  • likely_encoding (str or bytes) – Possible encoding provided by user agent.

  • default_encoding (str or bytes) – Encoding used as fallback.
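
As an illustration of the encoding parameters, the sketch below passes the charset of an HTTP Content-Type header as transport_encoding. The helper names charset_from_content_type and parse_http_body are hypothetical and not part of tinyhtml5; the sketch also assumes that the header charset is the right value for the transport layer in your setup.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return message.get_content_charset()


def parse_http_body(body: bytes, content_type: str):
    """Parse a bytes body, forwarding the header charset when there is one."""
    # Imported here so the helper above can be used without tinyhtml5.
    from tinyhtml5 import parse

    charset = charset_from_content_type(content_type)
    if charset is None:
        # No charset in the header: leave encoding detection to the parser.
        return parse(body)
    return parse(body, transport_encoding=charset)
```

A call such as parse_http_body(b'<p>caf\xe9</p>', 'text/html; charset=ISO-8859-1') would then hand the bytes to parse together with transport_encoding='iso-8859-1'.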

Returns:

xml.etree.ElementTree.Element.

Example:

>>> from tinyhtml5 import parse
>>> parse('<html><body><p>This is a doc</p></body></html>')
<Element '{http://www.w3.org/1999/xhtml}html' at …>
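
Since the result is a standard xml.etree.ElementTree.Element, the usual ElementTree search methods apply; with namespaced elements, searches have to name the XHTML namespace. The sketch below is a stand-in that builds a comparable tree with ElementTree.fromstring, so it runs without tinyhtml5 installed. A tree returned by parse may contain additional elements, so treat this as an illustration of the lookup style only.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Stand-in for the tree returned by parse(): tags carry the XHTML namespace.
root = ET.fromstring(
    f'<html xmlns="{XHTML}"><body><p>This is a doc</p></body></html>'
)

# Option 1: map a prefix of your choice to the namespace.
namespaces = {"h": XHTML}
texts = [p.text for p in root.findall(".//h:p", namespaces)]

# Option 2: spell out the qualified tag name as {namespace}tag.
body = root.find(f"{{{XHTML}}}body")

print(root.tag)
print(texts)
```

The prefix "h" is arbitrary; only the namespace URI it maps to has to match the one in the tag names.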