HTML Encoder

DevStudio's HTML Entity Encoder and Decoder converts text to and from HTML entity form so you can embed arbitrary characters in HTML markup without breaking the document or opening cross-site scripting (XSS) holes. Paste a string and the encoder replaces the five characters that have special meaning in HTML (ampersand, less-than, greater-than, double quote, and single quote) with their named or numeric entity equivalents. That is the minimum escaping required to embed user-supplied content inside an element body or a quoted attribute value. An optional aggressive mode also entity-encodes every non-ASCII character, which is useful when a downstream system handles UTF-8 unreliably and you want to fall back to a plain-ASCII transport form.

The decoder reverses the transformation. It understands the five required named entities, the larger HTML5 named entity set, and both decimal and hexadecimal numeric character references, so a string scraped from a web page or pulled from a database column round-trips back to its original characters.

Common use cases include preventing reflected and stored XSS by escaping user-generated content before injecting it into a server-rendered template, decoding HTML pulled from a legacy CMS that exported numeric references for every accented character, safely embedding a JSON value inside an HTML attribute, escaping arbitrary strings into the HTML part of an email, and translating between the named-entity style preferred by editors and the numeric-reference style preferred by some XML pipelines.

The tool distinguishes between the rules for element body escaping and attribute value escaping, so you do not accidentally produce strings that are valid in one context but unsafe in the other. For example, an unescaped single quote inside a single-quoted attribute ends the attribute value early, even if the rest of the document is well-formed.

Because every transformation runs locally, you can paste sensitive markup such as customer messages, support tickets, or scraped responses without anything leaving your browser, and you can use the tool offline once the page has loaded.
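
For readers who want to reproduce the encode and decode round trip in a script, here is a minimal sketch using Python's standard `html` module. It is not DevStudio's implementation, only an illustration of the same idea; the sample strings are made up for the example.

```python
import html

original = 'Tom & Jerry say "hi" <b>loudly</b>'

# Encode: html.escape rewrites &, < and >, plus both quote characters by default.
encoded = html.escape(original)
print(encoded)
# Tom &amp; Jerry say &quot;hi&quot; &lt;b&gt;loudly&lt;/b&gt;

# Decode: html.unescape accepts named, decimal, and hexadecimal references.
decoded = html.unescape(encoded)
mixed = html.unescape("caf&eacute; caf&#233; caf&#xE9;")
print(decoded == original)  # True
print(mixed)                # café café café
```

The three spellings of "é" in the last call decode to the same character, which is why text exported by different systems can still be normalized to one form.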

Frequently asked questions

Which characters must be HTML-encoded?

At a minimum, ampersand, less-than, greater-than, double quote, and single quote must be encoded when you embed text inside HTML markup. Ampersand starts every entity reference, less-than and greater-than delimit tags, and the two quote characters are used to delimit attribute values. Failing to encode any of these can break the surrounding markup or, worse, allow user-supplied content to inject script tags or attributes that lead to cross-site scripting.
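
The five-character rule can be sketched as a small translation table. This is an illustrative Python example, not the tool's own code; the names `FIVE` and `escape_five` are hypothetical, and the choice of `&#39;` for the single quote is one of several valid spellings.

```python
# The five characters named above and one possible replacement for each.
FIVE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})

def escape_five(text: str) -> str:
    """Replace each of the five characters in a single pass over the text."""
    return text.translate(FIVE)

print(escape_five("<a href='x'>R&D</a>"))
# &lt;a href=&#39;x&#39;&gt;R&amp;D&lt;/a&gt;
```

Because `str.translate` makes a single pass, the ampersands it emits are never escaped a second time. A chain of `str.replace` calls would need to handle the ampersand first to get the same result.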

What is the difference between named and numeric HTML entities?

Named entities use a memorable abbreviation: &amp; for a literal ampersand, &lt; for less-than. Numeric character references use the character's Unicode code point in decimal or hexadecimal form, such as &#65; or &#x41; for the letter A. Named entities are easier to read, but the set is fixed by the HTML specification; numeric references can express any Unicode character and are useful when you need to escape arbitrary code points.
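
To make the two styles concrete, the following Python sketch builds both forms for a single character. The helper names `numeric_ref` and `named_ref` are hypothetical. The lookup uses `html.entities.codepoint2name`, which covers the HTML 4 entity names only, so it is a narrower table than the HTML5 set mentioned elsewhere on this page.

```python
from html.entities import codepoint2name

def numeric_ref(ch: str, use_hex: bool = False) -> str:
    """Numeric character reference for one character, decimal by default."""
    cp = ord(ch)
    return f"&#x{cp:X};" if use_hex else f"&#{cp};"

def named_ref(ch: str) -> str:
    """Named entity when the HTML 4 table has one, else a decimal reference."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else numeric_ref(ch)

print(named_ref("&"))                   # &amp;
print(named_ref("é"))                   # &eacute;
print(numeric_ref("A"))                 # &#65;
print(numeric_ref("A", use_hex=True))   # &#x41;
print(named_ref("😀"))                  # &#128512;
```

The emoji has no entry in the name table, so the sketch falls back to a numeric reference, which is the practical consequence of the named set being fixed.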

How does HTML encoding prevent XSS?

Cross-site scripting works by tricking the browser into interpreting attacker-controlled text as HTML or JavaScript instead of as inert content. Encoding the five sensitive characters before injecting user input into a template strips them of their structural meaning: a less-than sign becomes &lt;, which the browser displays as text rather than treating it as the start of a tag. Combined with a strict Content-Security-Policy, encoding closes the most common injection vectors. Entity encoding protects HTML text and attribute contexts only; values placed inside a script block or a URL need escaping rules of their own.
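
As an illustration of escaping before injection into a template, here is a minimal Python sketch. The `render_comment` function, the `data-author` attribute, and the payload are invented for this example; a real application would more likely rely on a template engine with auto-escaping.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, escaping every untrusted value first."""
    return (
        f'<article data-author="{html.escape(author, quote=True)}">'
        f"<p>{html.escape(body)}</p>"
        "</article>"
    )

payload = '<script>alert("xss")</script>'
rendered = render_comment("mallory", payload)
print(rendered)
# <article data-author="mallory"><p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p></article>
```

The payload survives as visible text inside the paragraph, but the output contains no literal script tag for the browser to execute.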

Should I encode every non-ASCII character?

Usually no. Modern browsers and HTTP infrastructure handle UTF-8 reliably, so encoding accented letters, emoji, or CJK glyphs into numeric references just makes the markup larger and harder to read. Aggressive encoding is helpful only when a downstream system has unreliable Unicode handling, when you are emitting strict ASCII for legacy compatibility, or when an XML or email pipeline downstream is known to mangle multi-byte sequences.
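
For the cases where ASCII-only output is needed, one way to approximate an aggressive mode in Python is the built-in `xmlcharrefreplace` error handler. This is a sketch under that assumption rather than a description of how the tool works; `encode_aggressive` is a hypothetical name.

```python
import html

def encode_aggressive(text: str) -> str:
    """Escape the five core characters, then turn every non-ASCII
    character into a decimal numeric character reference."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

sample = "Café & crème 😀"
ascii_only = encode_aggressive(sample)
print(ascii_only)
# Caf&#233; &amp; cr&#232;me &#128512;

# The result is plain ASCII and still decodes back to the original text.
print(ascii_only.isascii())                # True
print(html.unescape(ascii_only) == sample)  # True
```

The output is noticeably longer than the input, which is the size and readability cost described above.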

What is the difference between attribute and content encoding?

Inside an element body, the quote characters carry no special meaning, so escaping ampersand, less-than, and greater-than is enough to prevent injection; encoding all five core characters there is harmless but more than strictly required. Inside an attribute value the rules are stricter: you must also escape whichever quote character delimits the attribute, and you should always quote the value, because in an unquoted attribute whitespace ends the value and lets the rest of the string be parsed as new attributes. DevStudio's HTML Encoder offers an attribute-safe mode that applies the stricter rules so you can safely build dynamic attributes.
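
The difference between the two contexts can be sketched in Python with the `quote` flag of `html.escape`. The helper names and sample values below are hypothetical, and the sketch does not claim to mirror the tool's attribute-safe mode exactly.

```python
import html
import json

def escape_body(text: str) -> str:
    """Element-body escaping: &, < and > only."""
    return html.escape(text, quote=False)

def escape_attr(text: str) -> str:
    """Attribute escaping: also covers double and single quotes."""
    return html.escape(text, quote=True)

value = "x' onmouseover='alert(1)"

# Body-level escaping leaves the quote intact, so the value breaks out.
unsafe = f"<input value='{escape_body(value)}'>"
# Attribute-level escaping keeps the whole string inside the attribute.
safe = f"<input value='{escape_attr(value)}'>"
print(unsafe)  # <input value='x' onmouseover='alert(1)'>
print(safe)    # <input value='x&#x27; onmouseover=&#x27;alert(1)'>

# The same helper covers embedding a JSON value in an attribute.
data = {"name": "A & B", "tags": ["<x>"]}
attr = escape_attr(json.dumps(data))
tag = f'<div data-config="{attr}"></div>'
print(json.loads(html.unescape(attr)) == data)  # True
```

In the unsafe line the browser would see a second attribute named `onmouseover`, while the safe line keeps the attacker's text as part of the value.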
