Escaping Strings in JSON: A Developer's Reference
JSON strings are simpler than most string types in most languages — but simpler does not mean easy to get right. A string in JSON is double-quoted, cannot contain raw control characters, and uses backslash escapes for anything that would otherwise be ambiguous or invalid. That is it. But the edge cases around Unicode, line separators, and language-specific round-tripping produce a steady stream of production bugs.
This guide is the piece of paper you can stick to your desk when you want the definitive answer to “how do I put this character in a JSON string?”
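The required escapes
JSON requires escapes for the double quote, the backslash, and every control character from U+0000 to U+001F. Five of those control characters have short named escapes: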
| Character | Escape | Meaning |
|---|---|---|
| " | \" | Double quote (closes the string otherwise) |
| \ | \\ | Backslash itself |
| control chars U+0000 to U+001F | \uXXXX or the named escapes below | Unprintable control characters must be escaped |
| U+0008 | \b | Backspace (optional alternative to \u0008) |
| U+0009 | \t | Tab |
| U+000A | \n | Line feed |
| U+000C | \f | Form feed |
| U+000D | \r | Carriage return |
No other character needs escaping, including the forward slash /. The escape \/ is valid but never required. You mostly see it where JSON is inlined in an HTML <script> block: writing </script> as <\/script> stops the HTML parser from closing the block early, which helps prevent XSS. That is a good reason to use it; otherwise, leave the slash as /.
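As a rough illustration, assuming Python's standard json module as the serializer, the required escapes show up like this. The sample string is made up for the demonstration.

```python
import json

# Contains a double quote, a backslash, a tab, a newline,
# a forward slash, and the control character U+0001.
raw = 'say "hi"\\ \t\n / \x01'

encoded = json.dumps(raw)
print(encoded)  # prints: "say \"hi\"\\ \t\n / \u0001"

# The forward slash is left alone; everything else is escaped.
assert json.loads(encoded) == raw
```

The control character without a named escape falls back to the \uXXXX form.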
The general-purpose escape: \uXXXX
Any character in the Basic Multilingual Plane (U+0000 to U+FFFF) can be escaped as \uXXXX, where XXXX is its code point written as four hexadecimal digits. So:
"\u0041" // "A"
"\u00e9" // "é"
"\uac00" // "가"
"\u00a0" // non-breaking space (NBSP)
For characters without a short escape of their own, this is the only escape form available, and it is how serializers protect characters that downstream tools might otherwise interpret as something else.
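A minimal sketch of decoding such escapes, assuming Python's json.loads as the parser. Raw string literals keep the backslashes intact so the parser, not Python, interprets them.

```python
import json

# Each JSON text below is a string written with a \uXXXX escape.
samples = [r'"\u0041"', r'"\u00e9"', r'"\uac00"', r'"\u00a0"']

decoded = [json.loads(text) for text in samples]
for text, value in zip(samples, decoded):
    # Show the JSON text, the decoded character, and its code point.
    print(text, "->", repr(value), f"U+{ord(value):04X}")
```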
Surrogate pairs: characters above U+FFFF
A single \uXXXX can only encode code points up to U+FFFF. For characters above that — including every emoji, many historical scripts, and mathematical symbols — JSON uses surrogate pairs, borrowed from UTF-16:
"\ud83d\ude00" // 😀 (U+1F600)
"\ud834\udd1e" // 𝄞 (U+1D11E, musical G-clef)
Each pair is two \uXXXX escapes: a high surrogate in the range \uD800-\uDBFF followed by a low surrogate in \uDC00-\uDFFF. An unpaired surrogate (a high without a low, or vice versa) does not encode any character. Strict parsers reject it or substitute a replacement character; lenient ones pass it through, which usually breaks something downstream.
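The pair can be derived from the code point with the standard UTF-16 arithmetic. The helper below is a hypothetical name used only for illustration; real serializers do this for you.

```python
import json

def to_json_escape(ch: str) -> str:
    """Return the \\uXXXX escape(s) for one character (illustrative helper)."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04x}"
    offset = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return f"\\u{high:04x}\\u{low:04x}"

print(to_json_escape("\U0001F600"))  # \ud83d\ude00
print(to_json_escape("\U0001D11E"))  # \ud834\udd1e

# A parser joins the pair back into one character.
assert json.loads('"' + to_json_escape("\U0001F600") + '"') == "\U0001F600"
```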
What literal UTF-8 buys you
You do not have to use \uXXXX for non-ASCII. JSON strings may contain any Unicode character directly, as long as the file is UTF-8 (which it always is in practice). So both of these are valid and represent the same data:
"naïve"
"na\u00efve"
The second is longer and harder to read, but it avoids any chance of downstream tools misinterpreting the bytes. Which you emit depends on your serializer:
- JSON.stringify in Node / browser: emits non-ASCII characters literally by default, including non-BMP characters such as emoji. It uses \u escapes only for control characters without a short escape and, since ES2019, for lone surrogates.
- Python json.dumps: emits \u-escaped output by default. Pass ensure_ascii=False to keep characters as-is.
- Go encoding/json: emits UTF-8 directly by default. HTML-sensitive characters (<, >, &) are escaped as \uXXXX unless you opt out with Encoder.SetEscapeHTML(false).
- Rust serde_json: emits UTF-8 directly by default.
The default-to-escape behaviour of Python trips up a lot of people. If you round-trip a string with a non-ASCII character through json.dumps and compare bytes, you will see "caf\u00e9" where you expected "café". Both are valid. Both represent the same data. Your byte-diff is meaningless.
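The Python behaviour is easy to see side by side. This sketch uses only json.dumps and its ensure_ascii parameter.

```python
import json

value = "café"

escaped = json.dumps(value)                      # default: ensure_ascii=True
literal = json.dumps(value, ensure_ascii=False)  # keep non-ASCII as-is

print(escaped)  # "caf\u00e9"
print(literal)  # "café"

# Different text, same data after parsing.
assert escaped != literal
assert json.loads(escaped) == json.loads(literal) == value
```

Comparing parsed values rather than serialized bytes sidesteps the problem.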
What you cannot put in a JSON string
Directly, without an escape:
- Literal double quote " (closes the string).
- Literal backslash \ (starts an escape sequence).
- Any control character U+0000 through U+001F. This includes the literal tab, newline, and carriage return.
- Unpaired surrogate code points (U+D800 to U+DFFF outside a valid pair).
Indirectly, via escapes, you can put anything. Even the null character:
"hello\u0000world"
is a valid 11-character string: “hello”, a null character, then “world”.
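A short check of these rules, assuming Python's json.loads with its default strict behaviour:

```python
import json

# A raw, unescaped tab inside the string is invalid JSON.
bad = '"a\tb"'
try:
    json.loads(bad)
    raw_tab_rejected = False
except json.JSONDecodeError as exc:
    raw_tab_rejected = True
    print("rejected:", exc.msg)

# The same tab written as an escape sequence parses fine.
good = r'"a\tb"'
assert json.loads(good) == "a\tb"

# The null character is allowed when written as \u0000.
with_null = json.loads(r'"hello\u0000world"')
print(len(with_null))  # 11
```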
Line separators — the sneaky ones
The Unicode characters LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) are valid inside JSON strings without escaping. For years, this was a problem when JSON was embedded inside JavaScript: eval()-ing such a JSON document caused syntax errors because those characters were line terminators in JavaScript source and could not appear raw in a string literal. ECMAScript 2019 fixed this by allowing both characters inside string literals, but you still see serializers emit \u2028 / \u2029 defensively, especially in server-side rendering contexts.
If you are embedding JSON in HTML inside a <script type="application/json"> tag, the content is never parsed as JavaScript, so escaping these two characters is not needed. If you are embedding JSON in a <script> block that will be eval()ed or parsed as JS, still escape those two.
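One way to apply the defensive escaping is a small post-processing step after serialization. The helper name below is hypothetical, and this is a sketch of the idea rather than a complete XSS defence.

```python
import json

def dumps_for_script(obj) -> str:
    """Serialize obj for inlining in a <script> block (illustrative helper)."""
    text = json.dumps(obj, ensure_ascii=False)
    return (
        text.replace("\u2028", "\\u2028")   # LINE SEPARATOR
            .replace("\u2029", "\\u2029")   # PARAGRAPH SEPARATOR
            .replace("</", "<\\/")          # keeps </script> from closing the tag
    )

payload = {"html": "</script>", "text": "a\u2028b\u2029c"}
safe = dumps_for_script(payload)
print(safe)

# The escaped text still parses back to the original data.
assert json.loads(safe) == payload
```

All three replacements produce valid JSON escapes, so any conforming parser recovers the original value.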
Strings in keys
Keys are strings, so the same rules apply. {"say \"hi\"": true} is a valid object whose key contains double quotes. In practice, keys almost always use ASCII identifiers and don’t need any escaping.
Common mistakes
Forgetting to escape Windows paths
{ "path": "C:\Users\ada\file.txt" }
Every backslash needs to be doubled. The fix:
{ "path": "C:\\Users\\ada\\file.txt" }
You do not need to touch the forward slash or the colon.
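Letting a serializer do the doubling avoids the mistake entirely. A minimal sketch with Python's json module, reusing the path from the example:

```python
import json

# Raw string literal: the value in memory has single backslashes.
path = r"C:\Users\ada\file.txt"

encoded = json.dumps({"path": path})
print(encoded)  # {"path": "C:\\Users\\ada\\file.txt"}

# The hand-written version with single backslashes is rejected.
hand_written = r'{ "path": "C:\Users\ada\file.txt" }'
try:
    json.loads(hand_written)
    rejected = False
except json.JSONDecodeError:
    rejected = True

assert json.loads(encoded)["path"] == path
```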
Passing a string through two layers of serialization
If you JSON.stringify a value and then JSON.stringify the whole thing again, you get a JSON-encoded string containing JSON-encoded quotes:
JSON.stringify(JSON.stringify({ hello: "world" }));
// → "{\"hello\":\"world\"}"
That is rarely what you want. If an upstream system expects an object, give it an object. If it expects a string, give it a string. Do not stringify a stringification.
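The same trap exists in other languages. In Python, for instance, a receiver has to parse twice to get the object back (note that json.dumps adds a space after the colon by default):

```python
import json

once = json.dumps({"hello": "world"})
twice = json.dumps(once)

print(once)   # {"hello": "world"}
print(twice)  # "{\"hello\": \"world\"}"

# One parse yields a string, not an object.
first_pass = json.loads(twice)
assert isinstance(first_pass, str)

# Only a second parse recovers the original structure.
assert json.loads(first_pass) == {"hello": "world"}
```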
Trying to include a literal newline
{ "message": "line one
line two" }
That raw newline after “one” is invalid. The right encoding:
{ "message": "line one\nline two" }
Emoji from a source file in a BOM-tagged UTF-8 context
A byte-order mark at the start of a file is invisible in most editors but breaks strict JSON parsers. If your emoji-containing JSON works on your machine but fails in CI, check the encoding on the saved file and make sure it is UTF-8 without BOM.
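A sketch of the failure and one possible fix in Python. The byte string stands in for a hypothetical file saved as UTF-8 with BOM; the utf-8-sig codec strips the mark if it is present.

```python
import json

# Stand-in for file content saved "with BOM": EF BB BF, then the JSON.
data = b'\xef\xbb\xbf{"mood": "\xf0\x9f\x98\x80"}'

# Decoding as plain UTF-8 keeps the BOM as U+FEFF, which the parser rejects.
text = data.decode("utf-8")
try:
    json.loads(text)
    bom_rejected = False
except json.JSONDecodeError as exc:
    bom_rejected = True
    print("rejected:", exc.msg)

# Decoding with utf-8-sig drops the BOM first.
clean = data.decode("utf-8-sig")
assert json.loads(clean) == {"mood": "\U0001F600"}
```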
Language round-tripping table
What three sample strings look like when serialized with each language’s defaults:
| Input (parsed string) | Python default | Python ensure_ascii=False | JavaScript | Go | Rust (serde_json) |
|---|---|---|---|---|---|
| abc | "abc" | "abc" | "abc" | "abc" | "abc" |
| café | "caf\u00e9" | "café" | "café" | "café" | "café" |
| <script> | "<script>" | "<script>" | "<script>" | "\u003cscript\u003e" | "<script>" |
If your output is expected to match byte-for-byte across languages, you need to pick one flavour of output, document it, and configure each language to produce it.
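On the Python side, pinning a flavour might look like the sketch below. The particular choices (literal UTF-8, compact separators, sorted keys) are an assumed convention for illustration, not a standard, and the function name is hypothetical.

```python
import json

def canonical_dumps(obj) -> bytes:
    """Serialize with one fixed, documented set of options (assumed convention)."""
    text = json.dumps(
        obj,
        ensure_ascii=False,      # literal characters instead of \uXXXX
        separators=(",", ":"),   # no optional whitespace
        sort_keys=True,          # stable key order
    )
    return text.encode("utf-8")

out = canonical_dumps({"b": "café", "a": 1})
print(out)  # b'{"a":1,"b":"caf\xc3\xa9"}'
```

Each other language in the pipeline would need equivalent settings for the bytes to match.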
Testing your implementation
A quick litmus test for any JSON serializer:
- Emit a string that contains ", \, /, a newline, a tab, a non-BMP emoji, a non-BMP mathematical symbol, and at least one surrogate-pair edge case.
- Parse the result with a different language’s parser.
- Check the decoded value is bitwise identical to the original.
If it passes, you are in the clear. If it fails, you probably have one of the mistakes above.
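A starting point for the emit side of that test, sketched in Python. It only round-trips within one language; for the cross-language step, the encoded bytes would be handed to another parser. The test string is an assumption chosen to cover the listed cases, with U+10000 and U+10FFFF as the surrogate-pair boundaries.

```python
import json

# ", \, /, newline, tab, an emoji (U+1F600), a mathematical
# symbol (U+1D49C), and the lowest and highest non-BMP code points.
original = '"\\/\n\t\U0001F600\U0001D49C\U00010000\U0010FFFF'

results = []
for ensure_ascii in (True, False):
    encoded = json.dumps(original, ensure_ascii=ensure_ascii).encode("utf-8")
    decoded = json.loads(encoded)
    results.append(decoded == original)

print(results)  # [True, True]
```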
This guide is written for general information. Always validate against your runtime's official parser before relying on any behaviour in production.