Free Tool

Hidden Character Detector

Paste text and the detector highlights every invisible Unicode character it finds — zero-width spaces, joiners, directional marks, and soft hyphens — each labeled with its code point.


Characters this tool detects

The common invisible Unicode characters the detector looks for, with what each one actually does

Zero Width Space
U+200B

Marks a line-break opportunity without adding visible space. Planted inside an identifier, it makes the string compare differently from how it reads.

Zero Width Non-Joiner
U+200C

Prevents character joining in cursive scripts like Arabic and Persian. Invisible, but changes how adjacent glyphs connect.

Zero Width Joiner
U+200D

The glue in emoji ZWJ sequences: the four-person family emoji is four separate emoji joined by three U+200D characters and rendered as one glyph.

Left-to-Right Mark
U+200E

An invisible character with strong left-to-right direction, which changes how neighboring neutral characters such as punctuation and spaces are ordered. Legitimate in mixed-direction text; abused in bidi spoofing.

Right-to-Left Mark
U+200F

The right-to-left counterpart of U+200E, with strong right-to-left direction. Same dual use: needed for Hebrew and Arabic text, abused in phishing.

Soft Hyphen
U+00AD

Invisible until a line breaks at that position, when it renders as a hyphen. Often imported accidentally when copying from PDFs.
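If you want the same check inside your own code, a minimal TypeScript sketch of a lookup over these six characters might look like the following. The table, the `Finding` shape, and the `findHiddenChars` name are illustrative assumptions, not the tool's source. It iterates by code point and reports the UTF-16 offset of each match.

```typescript
// Illustrative lookup table for the six characters listed above.
const HIDDEN_CHARS: ReadonlyMap<number, string> = new Map<number, string>([
  [0x200b, "Zero Width Space"],
  [0x200c, "Zero Width Non-Joiner"],
  [0x200d, "Zero Width Joiner"],
  [0x200e, "Left-to-Right Mark"],
  [0x200f, "Right-to-Left Mark"],
  [0x00ad, "Soft Hyphen"],
]);

interface Finding {
  index: number; // UTF-16 offset into the input string
  codePoint: string; // formatted as U+XXXX
  name: string;
}

// Format a code point the way this page labels characters, e.g. "U+200B".
function formatCodePoint(cp: number): string {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: report every listed character with its position.
function findHiddenChars(text: string): Finding[] {
  const findings: Finding[] = [];
  let index = 0;
  // for...of walks the string by code point, so surrogate pairs stay whole.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const name = HIDDEN_CHARS.get(cp);
    if (name !== undefined) {
      findings.push({ index, codePoint: formatCodePoint(cp), name });
    }
    index += ch.length; // 1 for BMP characters, 2 for astral ones
  }
  return findings;
}
```

For example, `findHiddenChars("ad\u200Bmin")` returns one finding at index 2 labeled U+200B, Zero Width Space. Running it over the four-person family emoji returns three U+200D findings, a reminder that a hit is not automatically malicious.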

Why invisible characters are a real problem

Invisible Unicode characters exist for legitimate reasons: line-break control, cursive-script joining, bidirectional text, optional hyphenation. But because they render as nothing, they are also a tool for abuse. The Trojan Source paper (CVE-2021-42574, publicly disclosed November 2021) showed that bidirectional control characters could make source code display in a different order from the one the compiler actually processes, so code that reads as correct behaves differently. The companion disclosure (CVE-2021-42694) covered homoglyph attacks, where a Cyrillic а (U+0430) replaces a Latin a (U+0061) in a function name. Closer to everyday use, zero-width characters get pasted into usernames and identifiers so that two strings that look identical compare as different. Because invisible characters cannot be seen by eye, a detector that surfaces them by code point is the fastest way to find them.
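The identifier problem is easy to reproduce. This short TypeScript snippet (the variable names and the `codePoints` helper are illustrative) compares a plain string against one with a hidden U+200B and one with a Cyrillic а. Both comparisons fail, but only the first involves an invisible character, which is why a code-point detector alone does not catch homoglyphs.

```typescript
const plain = "admin";
const withZwsp = "ad\u200Bmin"; // U+200B hidden between "d" and "m"
const withCyrillic = "\u0430dmin"; // Cyrillic а (U+0430) in place of Latin a

// List the code points of a string as U+XXXX labels.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

console.log(plain === withZwsp); // false
console.log(plain.length, withZwsp.length); // 5 6
console.log(plain === withCyrillic); // false, though nothing here is invisible
console.log(codePoints(withZwsp).join(" ")); // U+0061 U+0064 U+200B U+006D U+0069 U+006E
```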

What this is useful for

Cleaning copied text
Text pasted from websites, PDFs, and Word documents often carries zero-width spaces and soft hyphens that break search and matching. Paste it here, strip them, copy clean text back out.
Checking usernames and identifiers
Two usernames that read as "admin" can be different strings if one contains a U+200B. Run any suspicious identifier through the detector before trusting a comparison.
Reviewing source code
Bidirectional control characters (U+202A through U+202E and U+2066 through U+2069) can hide logic in code that looks correct. Paste in any snippet from an untrusted source before you read it.
Catching phishing links
IDN homograph attacks use confusable characters to register look-alike domains. The detector does not flag confusable letters, since they are visible, and it will not resolve the domain, but it will surface any invisible characters hidden in the URL text.
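For usernames, slugs, and pasted text, one reasonable pattern is to reject invisible characters in identifiers and strip them from prose. The TypeScript sketch below assumes the same character set this page lists for the detector; `checkIdentifier` and `stripInvisible` are hypothetical names, and NFC normalization is an extra step added here so canonically equivalent spellings compare equal afterwards.

```typescript
// Assumed set, mirroring the ranges this page says the detector scans.
const INVISIBLE = /[\u00AD\u200B-\u200F\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/u;
const INVISIBLE_ALL = new RegExp(INVISIBLE.source, "gu");

type CheckResult = { ok: true; value: string } | { ok: false; reason: string };

// Hypothetical validation step for identifiers that came from user input.
function checkIdentifier(input: string): CheckResult {
  // Normalize first so canonically equivalent spellings compare equal later.
  const value = input.normalize("NFC");
  const match = INVISIBLE.exec(value);
  if (match) {
    const cp = match[0].codePointAt(0)!;
    const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
    return { ok: false, reason: `invisible character ${label} at index ${match.index}` };
  }
  return { ok: true, value };
}

// Hypothetical cleanup for prose copied from web pages, PDFs, or Word.
function stripInvisible(text: string): string {
  return text.replace(INVISIBLE_ALL, "");
}
```

Rejecting rather than stripping identifiers is a design choice: silently removing characters changes what the user submitted, while an error shows them the problem. Note that `stripInvisible` also removes U+200C and U+200D, so it will split emoji ZWJ sequences and can change how Arabic or Persian text renders; leave those two out of the set if your input legitimately contains them.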

What a serious sweep should look for

The six characters listed above are the most common, but they are not the full picture. A thorough sweep covers several ranges. The zero-width format characters U+200B through U+200F cover line-break opportunities, joiners, and directional marks. U+2060 (word joiner) and U+FEFF (byte order mark, when it appears mid-stream) are zero-width too. The bidirectional embedding, override, and isolate controls, U+202A through U+202E plus U+2066 through U+2069, are what Trojan Source exploits. U+00AD (soft hyphen) is invisible until a line wraps at it. The Hangul fillers U+115F and U+3164 and the braille blank U+2800 render as empty space, yet Unicode classifies them as letters or symbols rather than whitespace, so typical whitespace checks and trimming functions leave them in place. That makes them useful for hiding content in plain sight.

Confusable detection is a related but separate problem: a Cyrillic а and a Latin a are both visible, so a character detector will not flag them. For that, you need a check against Unicode's confusables data (Unicode Technical Standard #39), which maps look-alike characters to a shared prototype so that two strings can be compared by their "skeleton".
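A sketch of the wider sweep described above, in TypeScript. The range list follows this section; the category labels and the `sweep` function are this example's own, and a leading U+FEFF is skipped because a byte order mark is only suspicious mid-stream. It does not attempt confusable detection.

```typescript
// Ranges from the sweep described above; labels are this sketch's own.
const SWEEP: ReadonlyArray<{ label: string; from: number; to: number }> = [
  { label: "zero-width format", from: 0x200b, to: 0x200f },
  { label: "word joiner", from: 0x2060, to: 0x2060 },
  { label: "byte order mark", from: 0xfeff, to: 0xfeff },
  { label: "bidi embedding/override", from: 0x202a, to: 0x202e },
  { label: "bidi isolate", from: 0x2066, to: 0x2069 },
  { label: "soft hyphen", from: 0x00ad, to: 0x00ad },
  { label: "Hangul filler", from: 0x115f, to: 0x115f },
  { label: "Hangul filler", from: 0x3164, to: 0x3164 },
  { label: "braille blank", from: 0x2800, to: 0x2800 },
];

function classify(cp: number): string | undefined {
  return SWEEP.find((r) => cp >= r.from && cp <= r.to)?.label;
}

// Count hits per category, ignoring a BOM in the very first position.
function sweep(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  let first = true;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const isLeadingBom = first && cp === 0xfeff;
    first = false;
    const label = isLeadingBom ? undefined : classify(cp);
    if (label) counts.set(label, (counts.get(label) ?? 0) + 1);
  }
  return counts;
}
```

One quick way to see why the Hangul fillers slip through: `"\u3164".trim().length` is still 1 in JavaScript, because `trim()` only removes whitespace and U+3164 is classified as a letter.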

How to use the detector well

Paste only the text you want checked. Everything runs in your browser; nothing is uploaded.

Use the color coding to distinguish character types at a glance — red for ZWSP, green for ZWJ, orange for RLM, and so on.

After detecting, click Remove to strip every invisible character, then Copy Clean Text to get the sanitized version.

Check any username, slug, or identifier that came from user input before you store or compare it.

If a string comparison is failing and the strings look identical, paste both here; a hidden U+200B is a common cause.

For source code from third parties, scan for U+202A through U+202E and U+2066 through U+2069 before reading it. These bidi embedding, override, and isolate controls are what Trojan Source abuses.
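The bidi check is easy to automate for third-party code. The following TypeScript sketch (the `scanSourceForBidi` name and result shape are hypothetical) reports the 1-based line and column of every character in U+202A through U+202E and U+2066 through U+2069. It assumes an ES2020 runtime for `String.prototype.matchAll`.

```typescript
// Bidi embedding/override (U+202A through U+202E) and isolate (U+2066 through U+2069) controls.
const BIDI_CONTROL = /[\u202A-\u202E\u2066-\u2069]/gu;

interface BidiHit {
  line: number; // 1-based
  column: number; // 1-based, counted in UTF-16 code units
  codePoint: string;
}

// Hypothetical pre-review check: locate every bidi control in a source file.
function scanSourceForBidi(source: string): BidiHit[] {
  const hits: BidiHit[] = [];
  source.split(/\r\n|\r|\n/).forEach((text, i) => {
    // matchAll clones the global regex, so shared lastIndex state is not an issue.
    for (const m of text.matchAll(BIDI_CONTROL)) {
      hits.push({
        line: i + 1,
        column: m.index! + 1,
        codePoint: "U+" + m[0].codePointAt(0)!.toString(16).toUpperCase(),
      });
    }
  });
  return hits;
}
```

You could run something like this over files before review or in CI and treat any hit as something to inspect by hand, since legitimate uses (for example, right-to-left text inside a string literal) do exist.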

How the detector works

The detector runs entirely in your browser as JavaScript. No text leaves your device, which matters when you are pasting sensitive identifiers or source code. It works in Chrome, Firefox, Safari, Edge, and any modern browser on desktop or mobile. It scans for the zero-width characters in the U+200B to U+200F range, the bidirectional controls in U+202A through U+202E and U+2066 through U+2069, the word joiner U+2060, the soft hyphen U+00AD, and the byte order mark U+FEFF when it appears anywhere other than the first position of the text. Note that it flags invisible characters by code point; it does not detect homoglyphs, which are visible characters that merely look like other characters.
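A minimal sketch of how a browser-side scan over exactly this set could work, in TypeScript. It is not the detector's actual source: the `annotate` name and the `[U+XXXX]` tag format are assumptions, and a real page would wrap each match in a styled element rather than return a string. A U+FEFF at position 0 is left alone, following the rule described above.

```typescript
// The set described above, except U+FEFF, which depends on position.
function isScanned(cp: number): boolean {
  return (
    (cp >= 0x200b && cp <= 0x200f) ||
    (cp >= 0x202a && cp <= 0x202e) ||
    (cp >= 0x2066 && cp <= 0x2069) ||
    cp === 0x2060 ||
    cp === 0x00ad
  );
}

// Replace each flagged character with a visible "[U+XXXX]" tag, a plain-text
// stand-in for a highlight. A BOM is tagged only when it is not the first character.
function annotate(text: string): string {
  let out = "";
  let position = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const flagged = isScanned(cp) || (cp === 0xfeff && position > 0);
    out += flagged ? `[U+${cp.toString(16).toUpperCase().padStart(4, "0")}]` : ch;
    position += 1;
  }
  return out;
}
```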

Common questions about detecting hidden characters