API Reference

class cssselect2.Matcher

A CSS selectors storage that can match against HTML elements.

add_selector(selector, payload)

Add a selector and its payload to the matcher.

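Parameters

  • selector – A compiler.CompiledSelector.

  • payload – The object to associate with the selector. It can be any Python object, and match() returns it for elements that the selector matches.

match(element)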

Match selectors against the given element.

Parameters

element – An ElementWrapper.

Returns

A list of the payload objects associated with the selectors that match element, ordered from lowest to highest compiler.CompiledSelector specificity. Selectors of equal specificity keep the order in which they were added with add_selector().
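As an illustration of the ordering described above, here is a minimal sketch that registers three selectors of different specificity and reads the payloads back. The payload strings and the payload_of() helper are made up for this example. The helper is an assumption: some releases appear to return each match as a tuple whose last item is the payload, so the sketch accepts both a bare payload and a tuple.

```python
from xml.etree import ElementTree

import cssselect2

matcher = cssselect2.Matcher()

# Added in a deliberately "wrong" order: ID first, then type, then class.
# The payloads are plain strings chosen for this example only.
rules = [("#intro", "id"), ("p", "type"), (".note", "class")]
for selector_text, payload in rules:
    for selector in cssselect2.compile_selector_list(selector_text):
        matcher.add_selector(selector, payload)

root = cssselect2.ElementWrapper.from_xml_root(
    ElementTree.fromstring('<div><p id="intro" class="note"/></div>'))
paragraph = root.query("p")


def payload_of(entry):
    # Hypothetical helper: accept either a bare payload or a tuple
    # whose last item is the payload.
    return entry[-1] if isinstance(entry, tuple) else entry


payloads = [payload_of(entry) for entry in matcher.match(paragraph)]
print(payloads)
```

The type selector has the lowest specificity and the ID selector the highest, so the payloads come back in that order regardless of the order of addition.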

cssselect2.compile_selector_list(input, namespaces=None)

Compile a (comma-separated) list of selectors.

Parameters
  • input – A string, or an iterable of tinycss2 component values such as the tinycss2.ast.QualifiedRule.prelude of a style rule.

  • namespaces – An optional dictionary of all namespace prefix declarations in scope for this selector. Keys are namespace prefixes as strings, or None for the default namespace. Values are namespace URLs as strings. If omitted, assume that no prefix is declared.

Returns

A list of opaque compiler.CompiledSelector objects.
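The following sketch shows one way the namespaces argument might be used: a comma-separated selector list is compiled with an "svg" prefix and then run against a small namespaced document. The prefix name and the sample markup are assumptions made for the example.

```python
from xml.etree import ElementTree

import cssselect2

SVG_NS = "http://www.w3.org/2000/svg"

# One CompiledSelector is produced per comma-separated selector.
selectors = cssselect2.compile_selector_list(
    "svg|rect, svg|circle", namespaces={"svg": SVG_NS})

root = cssselect2.ElementWrapper.from_xml_root(ElementTree.fromstring(
    f'<svg xmlns="{SVG_NS}"><rect/><g><circle/></g><text/></svg>'))

# Compiled selectors can be passed directly to query_all().
matched = [element.local_name for element in root.query_all(*selectors)]
print(len(selectors), matched)
```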

class cssselect2.ElementWrapper(etree_element, parent, index, previous, in_html_document, content_language=None)

Wrapper of xml.etree.ElementTree.Element for Selector matching.

This class should not be instantiated directly. Use from_xml_root() or from_html_root() for the root element of a document, and access other elements (generating their wrappers) through methods such as iter_children() and iter_subtree().

ElementWrapper objects compare equal if their underlying xml.etree.ElementTree.Element objects do.

property ancestors

Tuple of existing ancestors.

Tuple of existing ElementWrapper objects for this element’s ancestors, in reversed tree order, from parent to the root.

classes

The classes of this element, as a set of strings.

etree_children

Children as a list of xml.etree.ElementTree.Element.

Other ElementTree nodes such as comments and processing instructions are not included.

etree_element

The underlying xml.etree.ElementTree.Element.

etree_siblings

The parent’s children as a list of xml.etree.ElementTree.Element objects. For the root (which has no parent), a list containing only that element.

classmethod from_html_root(root, content_language=None)

Same as from_xml_root() with case-insensitive attribute names.

Useful for documents parsed with an HTML parser like html5lib, as should be the case for documents served with the text/html MIME type.

classmethod from_xml_root(root, content_language=None)

Wrap for selector matching the root of an XML or XHTML document.

Parameters

root – An xml.etree.ElementTree.Element for the root element of a document. If the given element is not the root, selector matching will behave as if it were. In other words, selectors will not be scoped to the subtree rooted at that element.

Returns

A new ElementWrapper
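To make the note about non-root elements concrete, the sketch below wraps both the real document root and an inner element. The sample markup is an assumption. The wrapper built from the inner element has no parent, so :root matches it and selectors cannot see the ancestors it had in the original tree.

```python
from xml.etree import ElementTree

import cssselect2

document = ElementTree.fromstring("<html><body><p>Hi</p></body></html>")

# Wrap the real root, then wrap an inner element as if it were the root.
root = cssselect2.ElementWrapper.from_xml_root(document)
body_as_root = cssselect2.ElementWrapper.from_xml_root(document.find("body"))

print(root.matches(":root"))            # the real root
print(body_as_root.matches(":root"))    # treated as a root as well
print(body_as_root.query("html p"))     # html is not visible from this wrapper
```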

id

The ID of this element, as a string.

index

The position within the parent’s children, counting from 0. e.etree_siblings[e.index] is always e.etree_element.
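The invariant stated above can be checked with a short sketch. The three-item list used here is an assumption made for the example.

```python
from xml.etree import ElementTree

import cssselect2

root = cssselect2.ElementWrapper.from_xml_root(
    ElementTree.fromstring("<ul><li/><li/><li/></ul>"))

# index counts from 0 among the parent's element children.
indexes = [child.index for child in root.iter_children()]

# The invariant holds for every wrapper, including the root.
checks = [
    element.etree_siblings[element.index] is element.etree_element
    for element in root.iter_subtree()
]
print(indexes, all(checks))
```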

iter_ancestors()

Iterate over ancestors.

Return an iterator of existing ElementWrapper objects for this element’s ancestors, in reversed tree order (from parent to the root).

The element itself is not included. This is an empty sequence for the root element.

This method is deprecated and will be removed in version 0.7.0. Use ancestors instead.

iter_children()

Iterate over children.

Return an iterator of newly-created ElementWrapper objects for this element’s child elements, in tree order.

iter_next_siblings()

Iterate over next siblings.

Return an iterator of newly-created ElementWrapper objects for this element’s next siblings, in tree order.

iter_previous_siblings()

Iterate over previous siblings.

Return an iterator of existing ElementWrapper objects for this element’s previous siblings, in reversed tree order.

The element itself is not included. This is an empty sequence for a first child or the root element.

This method is deprecated and will be removed in version 0.7.0. Use previous_siblings instead.

iter_siblings()

Iterate over siblings.

Return an iterator of newly-created ElementWrapper objects for this element’s siblings, in tree order.
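The sibling and child iterators described above can be tried on a small tree. This is a minimal sketch. The element names are placeholders chosen for the example.

```python
from xml.etree import ElementTree

import cssselect2

root = cssselect2.ElementWrapper.from_xml_root(
    ElementTree.fromstring("<list><a/><b/><c/></list>"))

# iter_children() yields new wrappers for the child elements, in tree order.
first, middle, last = root.iter_children()

child_names = [child.local_name for child in root.iter_children()]
next_names = [sib.local_name for sib in middle.iter_next_siblings()]
sibling_names = [sib.local_name for sib in middle.iter_siblings()]
print(child_names, next_names, sibling_names)
```

In this sketch iter_siblings() yields every child of the parent, so the element it was called on appears in the result as well.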

iter_subtree()

Iterate over subtree.

Return an iterator of newly-created ElementWrapper objects for the entire subtree rooted at this element, in tree order.

Unlike in other methods, the element itself is included.

This loops over an entire document:

for element in ElementWrapper.from_xml_root(root_etree).iter_subtree():
    ...

lang

The language of this element, as a string.

local_name

The local name of this element, as a string.

matches(*selectors)

Return whether this element matches any of the given selectors.

Parameters

selectors – Each given selector is either a compiler.CompiledSelector, or an argument to compile_selector_list().
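A short sketch of both argument forms accepted by matches(): selector strings and already compiled selectors. The markup and class name are assumptions made for the example.

```python
from xml.etree import ElementTree

import cssselect2

root = cssselect2.ElementWrapper.from_xml_root(ElementTree.fromstring(
    '<div><a class="external"/><a/></div>'))
external, plain = root.iter_children()

# Compile once and reuse the result across several elements.
compiled = cssselect2.compile_selector_list("a.external")

external_is_match = external.matches(*compiled)
plain_is_match = plain.matches(*compiled)

# Strings work too. The element matches if any one selector matches.
plain_matches_any = plain.matches("div", "p, a")
print(external_is_match, plain_is_match, plain_matches_any)
```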

namespace_url

The namespace URL of this element, as a string.
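The id, classes, local_name, and namespace_url attributes can be read directly from a wrapper. The sketch below assumes a small XHTML-namespaced document invented for the example.

```python
from xml.etree import ElementTree

import cssselect2

XHTML_NS = "http://www.w3.org/1999/xhtml"

root = cssselect2.ElementWrapper.from_xml_root(ElementTree.fromstring(
    f'<html xmlns="{XHTML_NS}"><p id="intro" class="note big"/></html>'))
paragraph = root.query("p")

# Collect the descriptive attributes of the wrapped element.
summary = {
    "id": paragraph.id,
    "classes": paragraph.classes,
    "local_name": paragraph.local_name,
    "namespace_url": paragraph.namespace_url,
}
print(summary)
```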

parent

The parent ElementWrapper, or None for the root element.

previous

The previous sibling ElementWrapper, or None if there is none (the root element or a first child).

property previous_siblings

Tuple of previous siblings.

Tuple of existing ElementWrapper objects for this element’s previous siblings, in reversed tree order.
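As a rough sketch, the ancestors and previous_siblings tuples can be inspected alongside parent and previous. The markup is an assumption. The names are sorted here only to give a stable list to print, so the sketch does not show the tuple order.

```python
from xml.etree import ElementTree

import cssselect2

root = cssselect2.ElementWrapper.from_xml_root(ElementTree.fromstring(
    "<html><body><h1/><h2/><p/></body></html>"))
paragraph = root.query("p")

# Direct links to the nearest relatives.
parent_name = paragraph.parent.local_name
previous_name = paragraph.previous.local_name

# Tuples of existing wrappers. Sorted for display only.
ancestor_names = sorted(e.local_name for e in paragraph.ancestors)
previous_names = sorted(e.local_name for e in paragraph.previous_siblings)
print(parent_name, previous_name, ancestor_names, previous_names)
```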

query(*selectors)

Return the first element that matches any of the given selectors.

Parameters

selectors – Each given selector is either a compiler.CompiledSelector, or an argument to compile_selector_list().

Returns

A newly-created ElementWrapper object, or None if there is no match.

query_all(*selectors)

Return elements, in tree order, that match any of the given selectors.

Selectors are scoped to the subtree rooted at this element.

Parameters

selectors – Each given selector is either a compiler.CompiledSelector, or an argument to compile_selector_list().

Returns

An iterator of newly-created ElementWrapper objects.
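The sketch below contrasts query() and query_all() and shows the effect of calling them on an inner element. The markup is an assumption made for the example.

```python
from xml.etree import ElementTree

import cssselect2

root = cssselect2.ElementWrapper.from_xml_root(ElementTree.fromstring(
    '<div><p class="lead"/><section><p/><span/></section></div>'))

# query_all() returns an iterator, so collect it to inspect the results.
all_names = [e.local_name for e in root.query_all("p", "span")]

# query() returns the first match in tree order, or None.
first = root.query("span, p")
missing = root.query("table")

# Calling query_all() on an inner element only visits that subtree.
section = root.query("section")
section_paragraphs = list(section.query_all("p"))
print(all_names, first.classes, missing, len(section_paragraphs))
```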

class cssselect2.SelectorError

A specialized ValueError for invalid selectors.
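A minimal sketch of handling invalid input: both selector strings below are assumed to be invalid, the first because a combinator is not followed by a selector and the second because its namespace prefix is not declared.

```python
import cssselect2

errors = {}
for text in ("div >", "svg|rect"):
    try:
        cssselect2.compile_selector_list(text)
    except cssselect2.SelectorError as error:
        # SelectorError is a ValueError, so "except ValueError" works too.
        errors[text] = error

for text, error in errors.items():
    print(f"{text!r} rejected: {error}")
```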

class cssselect2.compiler.CompiledSelector(parsed_selector)

Abstract representation of a selector.