Question 1

Which characters must be HTML-encoded?

Accepted Answer

At a minimum, ampersand, less-than, greater-than, double quote, and single quote must be encoded when you embed text inside HTML markup. Ampersand starts every entity reference, less-than and greater-than delimit tags, and the two quote characters are used to delimit attribute values. Failing to encode any of these can break the surrounding markup or, worse, allow user-supplied content to inject script tags or attributes that lead to cross-site scripting.
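
As a minimal sketch of that rule, assuming a Python backend (the question does not name a language), the standard library's html.escape covers all five characters when quote=True. The helper name encode_html is hypothetical and only used for illustration.

```python
import html


def encode_html(text: str) -> str:
    """Encode &, <, >, " and ' so the text is inert inside HTML markup."""
    # quote=True (the default) also escapes both quote characters.
    return html.escape(text, quote=True)


if __name__ == "__main__":
    # Prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;
    print(encode_html('<a href="x">Tom & Jerry\'s</a>'))
```

Note that html.escape writes the single quote as the numeric reference &#x27; rather than a named entity; browsers decode both forms the same way.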

Question 2

What is the difference between named and numeric HTML entities?

Accepted Answer

Named entities use a memorable abbreviation, such as &amp; for the literal ampersand and &lt; for less-than. Numeric character references use the character's Unicode code point in decimal or hexadecimal form, such as &#65; or &#x41; for the letter A. Named entities are easier to read, but the set is fixed by the HTML specification; numeric references can express almost any Unicode character and are useful when you need to escape arbitrary code points.
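
To make the two forms concrete, here is a small Python sketch (the language is an assumption; the answer itself is language-neutral). It looks up the code point behind a named entity, builds both numeric forms for a character, and shows that all three notations decode to the same text. The helper numeric_refs is a hypothetical name.

```python
import html
from html.entities import name2codepoint


def numeric_refs(ch: str) -> tuple:
    """Return the (decimal, hexadecimal) numeric references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


if __name__ == "__main__":
    # The named entity "amp" maps to code point 38, the ampersand.
    print(name2codepoint["amp"])
    # Decimal and hexadecimal references for the letter A.
    print(numeric_refs("A"))
    # Named, decimal, and hexadecimal forms all decode to plain characters.
    print(html.unescape("&amp; &#65; &#x41;"))
```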

Question 3

How does HTML encoding prevent XSS?

Accepted Answer

Cross-site scripting works by tricking the browser into interpreting attacker-controlled text as HTML or JavaScript instead of as inert content. Encoding the five sensitive characters before inserting user input into a template strips them of their structural meaning: a less-than sign becomes its named entity, which the browser displays as text rather than treating it as the start of a tag. Combined with a strict Content-Security-Policy, encoding closes the most common injection vectors in element content and quoted attribute values.
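
The effect on a typical payload can be sketched as follows. This assumes a Python server that builds markup by string formatting; render_comment is a hypothetical helper, and real applications would more likely rely on a template engine with auto-escaping.

```python
import html


def render_comment(user_text: str) -> str:
    """Place untrusted text inside a <p> element after encoding it."""
    # Encoding happens at output time, right before the text meets markup.
    return f"<p>{html.escape(user_text, quote=True)}</p>"


if __name__ == "__main__":
    payload = "<script>alert(1)</script>"
    # Prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
    print(render_comment(payload))
```

The browser shows the payload as literal text because no tag is ever opened. This sketch covers element content only; values placed inside script blocks or URLs need encoders for those contexts.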

Question 4

Should I encode every non-ASCII character?

Accepted Answer

Usually no. Modern browsers and HTTP infrastructure handle UTF-8 reliably, so encoding accented letters, emoji, or CJK glyphs into numeric references just makes the markup larger and harder to read. Aggressive encoding is helpful only when a downstream system has unreliable Unicode handling, when you are emitting strict ASCII for legacy compatibility, or when an XML or email pipeline is known to mangle multi-byte sequences.
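
For the cases where ASCII-only output really is required, one possible approach in Python (an assumed language, not something the answer prescribes) is the built-in xmlcharrefreplace error handler, which swaps each non-ASCII character for a decimal numeric reference. The function names below are hypothetical.

```python
import html


def to_ascii_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    # "xmlcharrefreplace" is a standard codec error handler.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def encode_html_ascii(text: str) -> str:
    """Escape the five core characters first, then force ASCII output."""
    # Order matters: escaping afterwards would double-encode the new '&'.
    return to_ascii_refs(html.escape(text, quote=True))


if __name__ == "__main__":
    print(to_ascii_refs("caf\u00e9"))           # prints: caf&#233;
    print(encode_html_ascii("a < \u00e9"))      # prints: a &lt; &#233;
```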

Question 5

What is the difference between attribute and content encoding?

Accepted Answer

Inside an element body, encoding the five core characters is more than sufficient to prevent injection; strictly, only ampersand and less-than are significant there. Inside an attribute value the rules are stricter: you must escape whichever quote character delimits the attribute, and you should always quote the value, because in an unquoted attribute whitespace and several other characters end the value. DevStudio's HTML Encoder offers an attribute-safe mode that applies the stricter rules so you can safely build dynamic attributes.
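
The two levels of strictness might look like this in Python. Both helpers are hypothetical illustrations and are not a description of how DevStudio's attribute-safe mode works: render_attr always double-quotes the value and escapes quotes, while encode_attr_strict is a more conservative variant that encodes every character other than ASCII letters and digits, so the result stays safe even in an unquoted attribute.

```python
import html


def render_attr(name: str, value: str) -> str:
    """Render name="value", always double-quoted, with quotes escaped."""
    return f'{name}="{html.escape(value, quote=True)}"'


def encode_attr_strict(value: str) -> str:
    """Encode everything except ASCII letters and digits as hex references."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"&#x{ord(ch):X};"
        for ch in value
    )


if __name__ == "__main__":
    # Prints: title="say &quot;hi&quot;"
    print(render_attr("title", 'say "hi"'))
    # Prints: x&#x22;&#x20;onclick&#x3D;
    print(encode_attr_strict('x" onclick='))
```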

HTML Encoder
