Hidden Character Detector
Paste text and the detector highlights every invisible Unicode character it finds — zero-width spaces, joiners, directional marks, and soft hyphens — each labeled with its code point.
Characters this tool detects
The common invisible Unicode characters the detector looks for, with what each one actually does
U+200B, zero-width space (ZWSP): Marks a line-break opportunity. Can be planted inside an identifier to make it match differently from how it reads.
U+200C, zero-width non-joiner (ZWNJ): Prevents character joining in cursive scripts like Arabic and Persian. Invisible, but changes how adjacent glyphs connect.
U+200D, zero-width joiner (ZWJ): The glue in emoji ZWJ sequences. A four-person family emoji is four emoji joined by three U+200D characters, rendered as one glyph.
U+200E, left-to-right mark (LRM): An invisible character with strong left-to-right direction, used to steer how neighboring punctuation and spaces are ordered. Legitimate in mixed-direction text; abused in bidi spoofing.
U+200F, right-to-left mark (RLM): The right-to-left counterpart. Same dual-use as U+200E: needed for Hebrew and Arabic, weaponized in phishing.
U+00AD, soft hyphen (SHY): Invisible until a line breaks at that position, when it renders as a hyphen. Often imported accidentally when copying from PDFs.
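To see these characters for yourself, the short Python sketch below prints the code point, general category, and official Unicode name of U+200B, U+200C, U+200D, U+200E, U+200F, and U+00AD, then counts the joiners in a four-person family emoji. It is an illustration only, not the detector's own code (which runs as JavaScript in the browser); the variable names are made up for this example.

```python
import unicodedata

# The six characters described in the list above.
SIX = ["\u200b", "\u200c", "\u200d", "\u200e", "\u200f", "\u00ad"]

for ch in SIX:
    # All six fall in general category Cf (format), which is why they draw nothing.
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

# A four-person family emoji written as escapes: man, woman, girl, boy,
# with a ZWJ (U+200D) between each pair.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467\u200d\U0001F466"

zwj_count = family.count("\u200d")   # number of joiners in the sequence
code_point_count = len(family)       # Python counts code points, not glyphs

print(zwj_count, code_point_count)   # 3 7
```

One glyph on screen is seven code points in memory, which is the gap an invisible-character detector exists to expose.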
Why invisible characters are a real problem
Invisible Unicode characters exist for legitimate reasons: line-break control, cursive-script joining, bidirectional text, optional hyphenation. But because they render as nothing, they are also a tool for abuse. The Trojan Source paper (CVE-2021-42574, publicly disclosed on 1 November 2021) showed that bidirectional control characters could make source code compile differently from how it reads on screen. The companion disclosure (CVE-2021-42694) covered homoglyph attacks, where a Cyrillic а (U+0430) replaces a Latin a (U+0061) in a function name. Closer to everyday use, zero-width characters get pasted into usernames and identifiers, so that two strings that look identical compare as different. Because invisible characters cannot be seen by eye, a detector that surfaces them by code point is the fastest way to find them.
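The comparison problem is easy to reproduce. In this minimal Python sketch, three strings that would all display as "admin" in most fonts are compared; the example strings and the code_points helper are invented for illustration.

```python
clean = "admin"
with_zwsp = "ad\u200bmin"       # zero-width space hidden in the middle
with_cyrillic = "\u0430dmin"    # Cyrillic а (U+0430) in place of Latin a

def code_points(s):
    """List each character of s as a U+XXXX label."""
    return [f"U+{ord(ch):04X}" for ch in s]

print(clean == with_zwsp)              # False
print(len(clean), len(with_zwsp))      # 5 6
print(clean == with_cyrillic)          # False
print(code_points(with_zwsp)[2])       # U+200B
print(code_points(with_cyrillic)[0])   # U+0430
```

The first mismatch comes from an invisible character and the second from a visible look-alike, which is why the two cases need different checks.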
What a serious sweep should look for
The six characters listed above are the most common, but they are not the full picture. A thorough sweep covers several ranges. The zero-width format characters U+200B through U+200F cover line-break opportunities, joiners, and directional marks. U+2060 (word joiner) and U+FEFF (byte order mark, when it appears mid-stream) are zero-width too. The bidirectional embedding, override, and isolate controls (U+202A through U+202E, plus U+2066 through U+2069) are what Trojan Source exploits. U+00AD (soft hyphen) is invisible until a wrap. The Hangul fillers U+115F and U+3164 and the braille blank U+2800 render as nothing but still occupy a cell, which makes them useful for hiding content in plain sight. Confusable detection is a related but separate problem: a Cyrillic а and a Latin a are both visible, so an invisible-character detector will not flag them. For that, you need a check against Unicode's confusables data (UTS #39), which maps look-alike characters to a normalized skeleton.
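As a rough sketch of such a sweep, the Python below encodes the ranges from the paragraph above and reports each hit by position, code point, and name. The names SWEEP_RANGES, is_swept, and find_invisible are hypothetical, and treating index 0 of the string as the only acceptable place for U+FEFF is an assumption made for this example. It does not attempt confusable detection.

```python
import unicodedata

# Inclusive code point ranges taken from the paragraph above.
SWEEP_RANGES = [
    (0x00AD, 0x00AD),  # soft hyphen
    (0x115F, 0x115F),  # Hangul choseong filler
    (0x200B, 0x200F),  # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x202A, 0x202E),  # bidi embeddings and overrides
    (0x2060, 0x2060),  # word joiner
    (0x2066, 0x2069),  # bidi isolates
    (0x2800, 0x2800),  # braille pattern blank
    (0x3164, 0x3164),  # Hangul filler
    (0xFEFF, 0xFEFF),  # byte order mark
]

def is_swept(ch):
    """True if ch falls inside any of the sweep ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SWEEP_RANGES)

def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each swept character in text.

    Assumption: a U+FEFF at index 0 is a legitimate byte order mark and is
    skipped; anywhere else it is reported.
    """
    hits = []
    for i, ch in enumerate(text):
        if not is_swept(ch):
            continue
        if ord(ch) == 0xFEFF and i == 0:
            continue
        hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

print(find_invisible("pay\u200bpal\u3164"))
# [(3, 'U+200B', 'ZERO WIDTH SPACE'), (7, 'U+3164', 'HANGUL FILLER')]
```

Keeping the ranges in one table makes the coverage easy to audit and extend.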
How to use the detector well
Paste only the text you want checked. Everything runs in your browser; nothing is uploaded.
Use the color coding to distinguish character types at a glance — red for ZWSP, green for ZWJ, orange for RLM, and so on.
After detecting, click Remove to strip every invisible character, then Copy Clean Text to get the sanitized version.
Check any username, slug, or identifier that came from user input before you store or compare it.
If a string comparison is failing and the strings look identical, paste both here — a hidden U+200B is the usual cause.
For source code from third parties, scan for U+202A through U+202E and U+2066 through U+2069 before reading it. These are the bidi controls behind Trojan Source.
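The same two habits, cleaning identifiers and scanning third-party source, can be scripted outside the browser. The Python sketch below is one possible approach, not the tool's implementation; the pattern and function names are hypothetical.

```python
import re

# Hypothetical patterns for this sketch, built from the ranges named on this page.
INVISIBLE = re.compile(
    "[\u00ad\u200b-\u200f\u202a-\u202e\u2060\u2066-\u2069\ufeff]"
)
BIDI_CONTROLS = re.compile("[\u202a-\u202e\u2066-\u2069]")

def strip_invisible(text):
    """Remove every character matched by INVISIBLE."""
    return INVISIBLE.sub("", text)

def find_bidi_controls(source):
    """Return (line, column, 'U+XXXX') for each bidi control; both 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in BIDI_CONTROLS.finditer(line):
            label = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, label))
    return hits

print(strip_invisible("ad\u200bmin") == "admin")        # True
print(find_bidi_controls("x = 1\nif y: \u202e pass"))   # [(2, 7, 'U+202E')]
```

Blanket stripping suits usernames, slugs, and identifiers. For prose it is riskier: removing U+200D splits emoji sequences apart, and removing U+200C changes how Persian and Arabic text joins, so review the findings before cleaning natural-language text.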
How the detector works
The detector runs entirely in your browser as JavaScript. No text leaves your device, which matters when you are pasting sensitive identifiers or source code. It works in Chrome, Firefox, Safari, Edge, and any modern browser on desktop or mobile. It scans for the zero-width characters in the U+200B to U+200F range, the bidirectional overrides in U+202A through U+202E and U+2066 through U+2069, the word joiner U+2060, the soft hyphen U+00AD, and the byte order mark U+FEFF when it appears outside the first position of a file. Note that it flags invisible characters by code point; it does not detect homoglyphs, which are visible characters that merely look like other characters.
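The labeling behavior described above can be approximated in a few lines. This Python sketch is an assumption about the general shape of the logic, not the tool's actual JavaScript: the DETECTED set mirrors the ranges listed in the paragraph, annotate is a hypothetical name, and index 0 of the pasted string stands in for "the first position of a file".

```python
# Assumed to mirror the set described above; the tool's real code is not shown here.
DETECTED = (
    {0x00AD, 0x2060, 0xFEFF}
    | set(range(0x200B, 0x2010))   # U+200B through U+200F
    | set(range(0x202A, 0x202F))   # U+202A through U+202E
    | set(range(0x2066, 0x206A))   # U+2066 through U+2069
)

def annotate(text):
    """Replace each detected character with a visible [U+XXXX] label."""
    out = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        leading_bom = cp == 0xFEFF and i == 0   # a BOM at the start is left alone
        if cp in DETECTED and not leading_bom:
            out.append(f"[U+{cp:04X}]")
        else:
            out.append(ch)
    return "".join(out)

print(annotate("pay\u200bpal"))   # pay[U+200B]pal
print(annotate("\u0430dmin"))     # Cyrillic а is visible, so nothing is labeled
```

The second call shows the limit stated above: a homoglyph passes through untouched because it is an ordinary visible letter.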
