HTML Sanitizer API

Limited availability

This feature is not Baseline because it does not work in some of the most widely-used browsers.

The HTML Sanitizer API allows developers to take strings of HTML and filter out unwanted elements, attributes, and other HTML entities when they are inserted into the DOM or a shadow DOM.

Concepts and usage

Web applications often need to work with untrusted HTML on the client side, for example, as part of a client-side templating solution, when rendering user-generated content, or when including data in a frame from another site.

Injecting untrusted HTML can make a site vulnerable to various types of attacks. In particular, cross-site scripting (XSS) attacks work by injecting untrusted HTML into the DOM that then executes JavaScript in the context of the current origin, allowing malicious code to run as though it were served from the site's origin. These attacks can be mitigated by removing unsafe HTML elements and attributes before they are injected into the DOM.

The HTML Sanitizer API provides a number of methods for removing unwanted HTML entities from HTML input before it is injected into the DOM. These come in XSS-safe versions that enforce removal of all unsafe elements and attributes, and potentially unsafe versions that give developers full control over the HTML entities that are allowed.

Sanitization methods

The HTML Sanitizer API provides XSS-safe and XSS-unsafe methods for injecting HTML strings into an Element or a ShadowRoot, and for parsing HTML into a Document.

Safe methods: Element.setHTML(), ShadowRoot.setHTML(), and Document.parseHTML().
Unsafe methods: Element.setHTMLUnsafe(), ShadowRoot.setHTMLUnsafe(), and Document.parseHTMLUnsafe().

All the methods take the HTML to be injected and an optional sanitizer configuration as arguments. The configuration defines the HTML entities that will be filtered out of the input before it is injected. The Element methods are context aware, and will additionally drop any elements that the HTML specification does not allow in the target element.

The safe methods always remove XSS-unsafe elements and attributes. If no sanitizer is passed as a parameter they will use the default sanitizer configuration, which allows all elements and attributes except those that are known to be unsafe, such as <script> elements and onclick event handlers. If a custom sanitizer is used, it is implicitly updated to remove any elements and attributes that are not XSS-safe (note that the passed sanitizer is not modified, and might still allow unsafe entities if used with an unsafe method).

The safe methods should be used instead of Element.innerHTML, Element.outerHTML, or ShadowRoot.innerHTML for injecting untrusted HTML content. For example, in most cases you can use Element.setHTML() with the default sanitizer as a drop-in replacement for Element.innerHTML. The same methods can also be used for injecting trusted HTML strings that do not need to contain any XSS-unsafe elements.
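
Because these methods have limited availability, code that adopts them may want to feature-detect first. The following is a minimal sketch, not part of the API: the helper name injectUntrustedHTML and the choice to fall back to plain text through textContent are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the Sanitizer API).
// Uses Element.setHTML() when the browser provides it; otherwise falls back
// to inserting the input as plain text, so no markup is parsed at all.
function injectUntrustedHTML(element, untrustedString) {
  if (typeof element.setHTML === "function") {
    // Default sanitizer: XSS-unsafe elements and attributes are dropped.
    element.setHTML(untrustedString);
    return "sanitized";
  }
  // Assumed fallback: render the input as text only.
  element.textContent = untrustedString;
  return "text";
}
```

In a page this might be called as injectUntrustedHTML(document.getElementById("target"), untrustedString). The return value only reports which path was taken; a site that needs markup in older browsers would have to choose a different fallback.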

The XSS-unsafe methods will use whatever sanitizer configuration is passed as an argument. If no sanitizer is passed, then all HTML elements and attributes allowed by the context will be injected. This is similar to using Element.innerHTML except that the method will parse shadow roots, drop elements that aren't appropriate in the context, and allow some other input that is not allowed when using the property.

The unsafe methods should only be used with untrusted HTML that needs to contain some XSS-unsafe elements or attributes. This is still unsafe, but allows you to reduce the risk by restricting unsafe entities to the minimal set. For example, if you wanted to inject unsafe HTML but for some reason you needed the input to include the onblur handler, you could more safely do so by amending the default sanitizer and using an unsafe method as shown:

const sanitizer = new Sanitizer(); // Default sanitizer
sanitizer.allowAttribute("onblur"); // Allow onblur

someElement.setHTMLUnsafe(untrustedString, { sanitizer });

Sanitizer configuration

A sanitizer configuration defines what HTML entities will be allowed, replaced, or removed when the sanitizer is used, including elements, attributes, data-* attributes, and comments.

There are two very closely related sanitizer configuration interfaces, either of which can be passed to all the sanitization methods.

SanitizerConfig is a dictionary object that defines arrays for the allowed/disallowed elements and attributes and boolean properties that indicate whether comments and data attributes will be allowed or omitted, and so on.

Only a subset of possible configuration options may be specified in a particular configuration in order to reduce redundancy and ambiguity. The allowed subset is summarized in the Allow and remove configurations section below, and described in detail in Valid configuration.

Sanitizer is essentially a wrapper around a SanitizerConfig that provides methods to ergonomically modify the configuration and ensure that it remains valid.

For example, you can use a method to add an allowed element, and it will also remove the element from the replaceWithChildrenElements array (if present). The interface also provides methods to return a copy of the underlying SanitizerConfig and also to update the sanitizer so that it is XSS-safe. It may provide normalizations of the sanitizer configuration used to construct it, making it easier to understand and reuse.

While you can use either interface in any of the sanitizing methods, Sanitizer is likely to be more efficient to share and reuse than SanitizerConfig.
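
As a rough sketch of how the wrapper relates to the dictionary, the helper below builds a Sanitizer from a SanitizerConfig, makes it XSS-safe with removeUnsafe(), and reads back a copy of the resulting configuration with get(). The helper name toSafeConfig and its second parameter (used only so that the constructor can be swapped out) are hypothetical, and the function returns null where the Sanitizer interface is not available.

```javascript
// Hypothetical helper: returns an XSS-safe copy of a configuration, or null
// where the Sanitizer interface is not available.
function toSafeConfig(config, SanitizerCtor = globalThis.Sanitizer) {
  if (typeof SanitizerCtor !== "function") {
    return null; // Sanitizer API not supported in this environment
  }
  const sanitizer = new SanitizerCtor(config);
  sanitizer.removeUnsafe(); // Update the sanitizer so that it is XSS-safe
  return sanitizer.get(); // Copy of the underlying SanitizerConfig
}

// Example call; the shape of the returned object depends on the browser's
// normalization, so no particular output is assumed here.
const safeConfig = toSafeConfig({ elements: ["p", "div"], attributes: ["cite", "onclick"] });
```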

Allow and remove configurations

You can build up a configuration in two ways:

As an allow configuration: specifying the set of elements and/or attributes that you will allow in the output.
As a remove configuration: specifying the set that must not be present in the output.

These sets are specified as arrays in the configuration object fields: elements and attributes, and removeElements and removeAttributes. You may not specify both allow and remove arrays for elements or attributes in the same configuration, but other combinations of fields are allowed. The following table shows the permitted combinations.

Element arrays	Attribute arrays	Valid?
`elements`	-	✔️
`elements`	`attributes`	✔️
`elements`	`removeAttributes`	✔️
`removeElements`	-	✔️
`removeElements`	`attributes`	✔️
`removeElements`	`removeAttributes`	✔️
-	`attributes`	✔️
-	`removeAttributes`	✔️
`elements` + `removeElements`	(anything)	❌
(anything)	`attributes` + `removeAttributes`	❌
-	-	✔️
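
The rule behind the table can be expressed as a small check on a plain configuration object. This is an illustrative sketch only: the function name hasValidArrayCombination is made up, and it covers just the array-combination rule shown in the table, not the full set of rules described in Valid configuration.

```javascript
// Hypothetical helper: checks only the array-combination rule from the table.
// It does not implement the complete "valid configuration" rules.
function hasValidArrayCombination(config) {
  const has = (key) => Array.isArray(config[key]);
  const elementsConflict = has("elements") && has("removeElements");
  const attributesConflict = has("attributes") && has("removeAttributes");
  return !elementsConflict && !attributesConflict;
}

hasValidArrayCombination({ elements: ["p"], removeAttributes: ["onclick"] }); // true
hasValidArrayCombination({ elements: ["p"], removeElements: ["div"] }); // false
hasValidArrayCombination({}); // true
```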

An allow configuration can optionally specify whether per-element attributes should be allowed and/or removed in its elements array. The allowed configuration for these local attributes depends on whether the global attributes or removeAttributes array is defined. The valid configuration section outlines the restrictions.

In general an "allow configuration" is safer for both the elements and attributes, because you list the elements and/or attributes that you want and know are safe, rather than all the items that are dangerous or might potentially be considered dangerous in future. If you specify an empty configuration object then an empty allow configuration is used.

Allow configurations

With "allow configurations" you specify the elements and attributes you wish to allow (or replace with child elements) — all other elements/attributes in the input will be dropped. This makes it easy to understand what elements will be allowed in the DOM when the HTML is parsed. They are useful when you know exactly what HTML entities you want to be able to inject in a particular context.

Allow configurations are created by defining a Sanitizer that wraps a SanitizerConfig that includes the elements and/or attributes arrays (and not the removeElements or removeAttributes arrays).

For example, the following configuration is created by passing a SanitizerConfig that allows <p> and <div> elements, and cite and onclick attributes on any allowed element. It will also replace <b> elements with their child nodes.

const sanitizer = new Sanitizer({
  elements: ["p", "div"],
  replaceWithChildrenElements: ["b"],
  attributes: ["cite", "onclick"],
});

The same configuration can also be created using Sanitizer methods. Note that in the following code the Sanitizer() constructor takes an empty object, which results in a Sanitizer where the underlying configuration includes both elements and attributes arrays — in other words, an "allow configuration".

// Create empty sanitizer
const sanitizer = new Sanitizer({});

// Use Sanitizer methods to update the properties.
sanitizer.allowElement("p");
sanitizer.allowElement("div");
sanitizer.replaceElementWithChildren("b");
sanitizer.allowAttribute("cite");
sanitizer.allowAttribute("onclick");

Remove configurations

In "remove configurations" you specify the HTML elements and attributes that you want to remove: any other elements and attributes are permitted by the sanitizer (but may be blocked if you use a safe sanitizer method, or if the element is not allowed in the context).

Remove configurations are created using a SanitizerConfig that includes the removeElements and/or removeAttributes arrays (and not the elements or attributes arrays).

For example, the following Sanitizer configuration would remove the same elements that were allowed in the previous code:

const sanitizer = new Sanitizer({
  removeElements: ["p", "div"],
  removeAttributes: ["cite", "onclick"],
  replaceWithChildrenElements: ["b"],
});

The configuration can also be created using Sanitizer methods. To make this a "remove configuration" we have to declare the removeElements or removeAttributes array when constructing the object (if only one array is specified the other will be defined as part of normalization).

const sanitizer = new Sanitizer({
  removeElements: [],
});
sanitizer.removeElement("p");
sanitizer.removeElement("div");
sanitizer.replaceElementWithChildren("b");
sanitizer.removeAttribute("cite");
sanitizer.removeAttribute("onclick");

Adding and removing from `Sanitizer` configurations

Sanitizer is recommended when you're using a configuration object that you might want to reuse or modify. Whether the sanitizer has an allow or remove configuration depends on the SanitizerConfig passed when the object is created. For example, if you pass a configuration object that has the elements or attributes array (or an empty object) the sanitizer will have an allow configuration.

In the examples above we created an allow configuration and then called allowElement(), allowAttribute(), and replaceElementWithChildren() to allow additional elements and attributes, and similarly we created a remove configuration and called removeElement() and removeAttribute() to specify additional elements and attributes to remove.

You can also call the allow methods on a remove configuration, and the remove methods on an allow configuration, but they behave differently. When you call the allow methods on an allow sanitizer, the specified elements and attributes are added to the underlying elements and attributes arrays. However, if you call those methods on a remove sanitizer, there are no elements or attributes arrays; instead, the specified element or attribute is removed from the corresponding removeElements or removeAttributes array, if present. This works because allowing an element in an allow sanitizer is the same as "not removing" an element in a remove sanitizer.

You can call all the Sanitizer methods on either an allow or remove sanitizer, and the method will make whatever changes it can that result in a valid configuration. For example, if you allow an element, the method will either add it to elements or remove it from removeElements if present, depending on the type of sanitizer, and also remove the same element from the replaceWithChildrenElements array, if present.

Some operations that are possible for an allow configuration are not possible for a remove configuration. For example, per-element attributes are defined in the elements array, which is not present in a remove sanitizer.

The methods return true or false to indicate whether or not they modified the underlying configuration. So if you call allowElement() on an allow configuration and the specified element is not present, it will be added to the elements array and the method will return true. But if the element is already present then the method would return false. Note that if you call the same method to set a per-element attribute, this will return false if called on a remove sanitizer, because the change cannot be made.
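
The behavior described in this section can be modeled on plain objects. The sketch below is a simplified model for illustration, not the browser's implementation: modelAllowElement and removeName are made-up names, element names are plain strings, and per-element attributes are ignored.

```javascript
// Simplified model of allowElement(), for illustration only. `config` is a
// plain object standing in for the configuration inside a Sanitizer.
function removeName(list, name) {
  const index = list ? list.indexOf(name) : -1;
  if (index === -1) return false;
  list.splice(index, 1);
  return true;
}

function modelAllowElement(config, name) {
  // Allowing an element always takes it out of replaceWithChildrenElements.
  let modified = removeName(config.replaceWithChildrenElements, name);
  if (config.elements) {
    // Allow configuration: add the element if it is not already listed.
    if (!config.elements.includes(name)) {
      config.elements.push(name);
      modified = true;
    }
  } else {
    // Remove configuration: "allow" means "stop removing".
    modified = removeName(config.removeElements, name) || modified;
  }
  return modified;
}

const allowConfig = { elements: ["p"], replaceWithChildrenElements: ["b"] };
modelAllowElement(allowConfig, "b"); // true: "b" moves into elements
modelAllowElement(allowConfig, "p"); // false: already allowed

const removeConfig = { removeElements: ["p", "div"] };
modelAllowElement(removeConfig, "div"); // true: "div" is no longer removed
modelAllowElement(removeConfig, "span"); // false: nothing to change
```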

Sanitization and Trusted Types

The Trusted Types API provides mechanisms to ensure that inputs are passed through a user-specified transformation function before being passed to an API that might execute that input. This transformation function is most commonly used to sanitize the input but it doesn't have to: the purpose of the API is primarily to make it easy for developers to audit sanitization code, not to define how or if sanitization is done.

The safe HTML sanitization methods don't use trusted types. Because they always filter all XSS-unsafe entities before input HTML is injected, there is no need to sanitize the input string, or audit the methods.

However, the unsafe HTML sanitization methods may inject untrusted HTML, depending on the sanitizer, and so will work with trusted types. The methods can take either a string or a TrustedHTML object as input. If a sanitizer is also supplied, the transformation function will be run first, and then the sanitizer.

Note that the behavior of the transformation function in this case will depend on the website policy (which might be to reject all use of the unsafe methods).
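
The sketch below shows one way the two steps might be combined. It assumes a browser that supports Trusted Types and a site policy that permits creating the policy; the helper name setUnsafeWithPolicy, the policy name example-policy, and the trivial trim() transformation are all placeholders made up for illustration.

```javascript
// Hypothetical helper: runs the policy's transformation function first, then
// hands the result to setHTMLUnsafe() together with a sanitizer.
function setUnsafeWithPolicy(element, input, policy, sanitizer) {
  // With a policy, createHTML() returns a TrustedHTML object in the browser;
  // without one, the plain string is passed through unchanged.
  const html = policy ? policy.createHTML(input) : input;
  element.setHTMLUnsafe(html, { sanitizer });
}

// In a supporting browser the policy might be created like this. The guard
// leaves examplePolicy as null where Trusted Types is unavailable.
const examplePolicy =
  typeof trustedTypes !== "undefined"
    ? trustedTypes.createPolicy("example-policy", {
        createHTML: (input) => input.trim(), // Placeholder transformation
      })
    : null;
```

A call might then look like setUnsafeWithPolicy(someElement, untrustedString, examplePolicy, new Sanitizer()).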

Third party sanitization libraries

Prior to the Sanitizer API, developers typically filtered input strings using third-party libraries such as DOMPurify, perhaps called from transformation functions in trusted types.

These should not be necessary when using the safe HTML sanitization methods as the API is integrated with the browser, and is more aware of the parsing context and what code is allowed to execute than external parser libraries can be.

They may be useful with the unsafe HTML methods and trusted types, depending on website trusted type policies.

Interfaces

Sanitizer: A reusable sanitizer configuration object that defines what elements and attributes should be allowed/removed when sanitizing untrusted strings of HTML. This is used in the methods that insert strings of HTML into the DOM or Document.
SanitizerConfig: A dictionary that defines a sanitizer configuration. This can be used in the same places as Sanitizer but is likely to be less efficient to use and reuse.

Extensions to other interfaces

XSS-safe methods

Element.setHTML(): Parse a string of HTML into a subtree of nodes, dropping any elements that are invalid in the context of the element. Then drop any elements and attributes that are not allowed by the sanitizer configuration, and any that are considered XSS-unsafe (even if allowed by the configuration). The subtree is then inserted into the DOM as a subtree of the element.
ShadowRoot.setHTML(): Parse a string of HTML into a subtree of nodes. Then drop any elements and attributes that are not allowed by the sanitizer configuration, and any that are considered XSS-unsafe (even if allowed by the configuration). The subtree is then inserted as a subtree of the ShadowRoot.
Document.parseHTML(): Parse a string of HTML into a subtree of nodes. Then drop any elements and attributes that are not allowed by the sanitizer configuration, and any that are considered XSS-unsafe (even if allowed by the configuration). The subtree is then set as the root of the Document.

XSS-unsafe methods

Element.setHTMLUnsafe(): Parse a string of HTML into a subtree of nodes, dropping any elements that are invalid in the context of the element. Then drop any elements and attributes that are not allowed by the sanitizer: if no sanitizer is specified allow all elements. The subtree is then inserted into the DOM as a subtree of the element.
ShadowRoot.setHTMLUnsafe(): Parse a string of HTML into a subtree of nodes. Then drop any elements and attributes that are not allowed by the sanitizer: if no sanitizer is specified allow all elements. The subtree is then inserted as a subtree of the ShadowRoot.
Document.parseHTMLUnsafe(): Parse a string of HTML into a subtree of nodes. Then drop any elements and attributes that are not allowed by the sanitizer: if no sanitizer is specified allow all elements. The subtree is then set as the root of the Document.

Examples

The following examples show how to use the Sanitizer API with the default sanitizer (at the time of writing, configuration operations are not yet supported).

Using `Element.setHTML()` with the default sanitizer

In most cases calling Element.setHTML() without passing a sanitizer can be used as a drop-in replacement for Element.innerHTML. The code below demonstrates how the method is used to sanitize the HTML input before it is injected into an element with id of target.

const untrustedString = "abc <script>alert(1)<" + "/script> def"; // Untrusted HTML (perhaps from user input)
const someElement = document.getElementById("target"); // Element with id "target"

// someElement.innerHTML = untrustedString;
someElement.setHTML(untrustedString);

console.log(someElement.innerHTML); // "abc  def" (the <script> element is dropped)

The <script> element is not allowed by the default sanitizer, or by the setHTML() method, so the alert() is removed.

Note that using Element.setHTMLUnsafe() with the default sanitizer will sanitize the same HTML entities. The main difference is that if you use this method with Trusted Types it may still be audited:

const sanitizer = new Sanitizer(); // Default sanitizer
someElement.setHTMLUnsafe(untrustedString, { sanitizer });

Using an allow sanitizer configuration

This code shows how you might use Element.setHTMLUnsafe() with an allow sanitizer that allows only <p>, <b>, and <div> elements. All other elements in the input string would be removed.

const sanitizer = new Sanitizer({ elements: ["p", "b", "div"] });
someElement.setHTMLUnsafe(untrustedString, { sanitizer });

Note that in this case you should normally use setHTML(). You should only use Element.setHTMLUnsafe() if you need to allow XSS-unsafe elements or attributes.