How to use the Unicode Encoder
Encode text to Unicode escapes in three steps:
1. Paste your text
Enter any text containing characters you want to encode as Unicode escapes.
2. Choose a format and scope
Select \uHHHH for JavaScript/JSON compatibility, \u{HHHH} for ES6+ with emoji support, U+HHHH for notation display, or &#xHHHH; for HTML. Toggle 'Non-ASCII only' to preserve readable ASCII characters.
3. Copy the encoded output
Click Copy to grab the Unicode-escaped string for use in your code, markup, or documentation.
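The four output formats from step 2 can be sketched in JavaScript. This is a minimal illustration, not the tool's actual implementation; the helper name `encodeText` and the format keys ("js", "es6", "uplus", "html") are hypothetical.

```javascript
// Hypothetical helper: encodes every character of `text` in one of four formats.
// The format keys are illustrative names, not options of any real library.
function encodeText(text, format) {
  const parts = [];
  // for...of iterates by code point, so an emoji is visited once, not as two halves.
  for (const ch of text) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    switch (format) {
      case "js": {
        // \uHHHH describes UTF-16 code units, so characters above U+FFFF
        // produce two escapes (a surrogate pair).
        for (let i = 0; i < ch.length; i++) {
          const unit = ch.charCodeAt(i).toString(16).toUpperCase();
          parts.push("\\u" + unit.padStart(4, "0"));
        }
        break;
      }
      case "es6":
        parts.push("\\u{" + hex + "}");
        break;
      case "uplus":
        parts.push("U+" + hex.padStart(4, "0"));
        break;
      case "html":
        parts.push("&#x" + hex + ";");
        break;
      default:
        throw new Error("Unknown format: " + format);
    }
  }
  // U+HHHH is notation for readers, so separate the entries with spaces.
  return parts.join(format === "uplus" ? " " : "");
}

console.log(encodeText("\u{1F600}", "js"));  // \uD83D\uDE00
console.log(encodeText("\u{1F600}", "es6")); // \u{1F600}
```

Iterating by code point and then falling back to code units only for the "js" format keeps the surrogate-pair handling in one place.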
When to use this tool
Use this tool when you need to represent text as Unicode escape sequences:
- Encoding special characters in JavaScript strings and JSON payloads to ensure safe transmission across systems that may not preserve UTF-8
- Generating \uHHHH escape sequences for embedding non-ASCII characters in JavaScript, TypeScript, or Java source code files
- Producing HTML numeric character references (&#xHHHH;) for characters that aren't safely representable in HTML markup
- Debugging internationalization (i18n) issues by inspecting the exact Unicode code points of characters in a string
- Encoding text for environments like legacy email systems, XML configurations, or APIs that require pure ASCII with Unicode escapes for non-ASCII content
- Creating Unicode art or encoded strings for puzzles, obfuscation demos, and educational exercises on encoding
Frequently asked questions
Q: What is the difference between \uHHHH and \u{HHHH} Unicode escapes?
\uHHHH is the classic 4-digit Unicode escape used in JavaScript, Java, C, and JSON — it can only represent the Basic Multilingual Plane (BMP) code points U+0000 through U+FFFF. Characters above U+FFFF (emoji, historic scripts, supplementary CJK characters) require two \uHHHH escapes called a surrogate pair. \u{HHHH} is the ES6+ JavaScript syntax that accepts 1-6 hex digits and represents any Unicode code point directly — \u{1F600} encodes 😀 as a single escape. Use \u{HHHH} for modern JavaScript that needs to handle emoji; use \uHHHH for maximum compatibility with older environments and JSON.
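The surrogate-pair arithmetic behind the two escape styles can be sketched as follows. The helper name `toSurrogatePair` is an assumption made for illustration; the arithmetic itself is the standard UTF-16 encoding rule.

```javascript
// Splits a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
// `toSurrogatePair` is a hypothetical helper name.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Not a supplementary code point");
  }
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00

// Both escape styles denote the same string value.
console.log("\u{1F600}" === "\uD83D\uDE00"); // true
console.log("\u{1F600}".length);              // 2 (UTF-16 code units)
```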
Q: Why would I encode ASCII characters like 'A' as Unicode escapes?
Encoding ASCII characters as Unicode escapes is useful in several scenarios: (1) JavaScript source code obfuscation, since engines treat \u0041 in a string literal exactly like 'A', so the program behaves the same while the source is harder to read; (2) security testing of naive string filters, since some input validation checks for literal characters but not their escape equivalents; (3) uniform escaping for ASCII-only channels, since some legacy protocols, SMTP servers, and serialization formats guarantee only 7-bit ASCII. Strictly, those channels require escaping only the non-ASCII characters, but encoding everything means you never have to decide which ASCII characters (control codes, for example) are safe to leave as-is. Use 'Non-ASCII only' mode if you only want to encode the genuinely non-ASCII characters.
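A short sketch of scenarios (1) and (2), using an arbitrary example word: the escaped literal evaluates to the same string, while a scan of the raw source text does not find the literal word.

```javascript
// The parser resolves each escape, so the value is the plain word.
const escaped = "\u0048\u0065\u006C\u006C\u006F";
console.log(escaped === "Hello"); // true

// String.raw keeps the backslashes, which models what a filter sees
// when it scans unparsed source text.
const rawSource = String.raw`\u0048\u0065\u006C\u006C\u006F`;
console.log(rawSource.includes("Hello")); // false
```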
Q: What is the difference between U+HHHH notation and \uHHHH escapes?
U+HHHH is a notational convention used in Unicode documentation and specifications to identify code points — for example, U+0041 refers to LATIN CAPITAL LETTER A. It is not a valid escape sequence in any programming language. \uHHHH is an actual string escape sequence interpreted by language runtimes (JavaScript, Java, C, Python, etc.). U+HHHH notation is used when writing about Unicode for humans; \uHHHH escapes are used in source code. This tool outputs both formats — use U+HHHH when writing technical documentation and \uHHHH when writing code.
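The distinction can be seen directly in JavaScript: the escape is resolved by the parser, while the notation is ordinary text that needs explicit parsing. The helper `fromNotation` below is a hypothetical name used only for this sketch.

```javascript
console.log("\u0041".length); // 1 (the parser resolved the escape to "A")
console.log("U+0041".length); // 6 (just the characters U, +, 0, 0, 4, 1)

// Hypothetical helper: turns a single "U+HHHH" label into the character it names.
function fromNotation(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) {
    throw new Error("Not U+HHHH notation: " + notation);
  }
  return String.fromCodePoint(parseInt(match[1], 16));
}

console.log(fromNotation("U+0041")); // A
```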
Q: How does Unicode encoding differ from UTF-8 encoding?
Unicode is the character standard that assigns a unique code point number to every character (e.g. U+1F600 = 😀). UTF-8 is a variable-width encoding that represents those code points as 1–4 bytes for storage and transmission. Unicode escape sequences (\uHHHH) encode code points as text representations. UTF-8 encoding produces the actual binary byte sequences. When you store a file as UTF-8 and type 😀, the byte sequence is F0 9F 98 80 (4 bytes). When you write \u{1F600} in JavaScript source code, the runtime interprets those 9 ASCII characters as the emoji character. The hex encoder tool handles UTF-8 byte-level encoding.
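The byte counts above can be checked with a short sketch. It assumes an environment where `TextEncoder` is available as a global, which is the case in current browsers and Node.js.

```javascript
const emoji = "\u{1F600}";

// TextEncoder always produces UTF-8 bytes.
const bytes = new TextEncoder().encode(emoji);
const hex = Array.from(bytes, (b) =>
  b.toString(16).toUpperCase().padStart(2, "0")
).join(" ");

console.log(hex);          // F0 9F 98 80
console.log(bytes.length); // 4

// The escape as written in source is 9 ASCII characters.
console.log(String.raw`\u{1F600}`.length); // 9
```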
Q: Are Unicode escape sequences safe to use in JSON?
Yes. \uHHHH escapes are part of the JSON specification (RFC 8259) and are valid inside any JSON string. Conforming JSON parsers decode them to the same characters as the literal text, which makes Unicode escapes useful for transmitting JSON over systems that may corrupt non-ASCII bytes. However, JSON only supports the 4-digit \uHHHH form, so characters above U+FFFF (such as emoji) must be written as surrogate pairs (\uD83D\uDE00 for 😀). Modern JSON parsers handle surrogate pairs correctly. Use \u{HHHH} only in JavaScript source strings, not in JSON data, where it is a syntax error.
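A minimal sketch of both directions in JavaScript. `JSON.stringify` leaves non-ASCII characters as-is, so the ASCII-only output here comes from a post-processing step; the helper name `stringifyAscii` is hypothetical.

```javascript
// Parsing: a surrogate pair in JSON text decodes to the single emoji character.
const parsed = JSON.parse('"\\uD83D\\uDE00"');
console.log(parsed === "\u{1F600}"); // true

// The ES6 brace form is not part of the JSON grammar.
let rejected = false;
try {
  JSON.parse('"\\u{1F600}"');
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true

// Hypothetical helper: serializes to JSON, then escapes every non-ASCII code unit.
// Without the `u` flag the regex matches UTF-16 code units, so an emoji
// becomes two \uHHHH escapes, as JSON requires.
function stringifyAscii(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

console.log(stringifyAscii({ face: "\u{1F600}" })); // {"face":"\ud83d\ude00"}
```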
Q: What does 'Non-ASCII only' encoding mode do?
Non-ASCII only mode encodes characters with Unicode code points above 127 (U+007F) while leaving all printable ASCII characters (letters, digits, punctuation) unchanged. For example, 'Café résumé' in non-ASCII mode becomes 'Caf\u00E9 r\u00E9sum\u00E9': the accented é characters are encoded but the regular ASCII letters are preserved. This produces more readable output than full encoding and is usually the right choice when you need ASCII-safe output that stays human-readable. Some JavaScript minifiers and bundlers offer a comparable ASCII-only output option.
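The behaviour described above can be approximated with a single regular expression. This is a sketch under the assumption that the \uHHHH format is selected; `encodeNonAscii` is a hypothetical name, not the tool's own function.

```javascript
// Hypothetical helper: escapes only code units above 0x7F and leaves ASCII untouched.
// Characters above U+FFFF are matched as two code units, giving a surrogate pair.
function encodeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(encodeNonAscii("Café résumé")); // Caf\u00E9 r\u00E9sum\u00E9
```

The example assumes the é characters are precomposed (U+00E9); a decomposed é would instead yield a plain e followed by \u0301.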