Hidden Character Detector

Paste text and the detector highlights every invisible Unicode character it finds — zero-width spaces, joiners, directional marks, and soft hyphens — each labeled with its code point.

Why invisible characters are a real problem

Invisible Unicode characters exist for legitimate reasons: line-break control, cursive-script joining, bidirectional text, and optional hyphenation. But because they render as nothing, they are also a tool for abuse. The Trojan Source paper (CVE-2021-42574, publicly disclosed November 2021) showed that bidirectional control characters could make source code compile differently from how it reads on screen. The companion disclosure (CVE-2021-42694) covered homoglyph attacks, where a Cyrillic а (U+0430) replaces a Latin a (U+0061) in a function name. Closer to everyday use, zero-width characters get pasted into usernames and identifiers so that two strings that look identical compare as different. Because the invisible ones cannot be seen by eye, a detector that surfaces each one by code point is the fastest way to find them.

What this is useful for

Cleaning copied text

Text pasted from websites, PDFs, and Word documents often carries zero-width spaces and soft hyphens that break search and matching. Paste it here, strip the hidden characters, and copy the clean text back out.
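As a rough illustration of why stripping matters, here is a minimal Python sketch that deletes a few characters commonly found in copied text. The character set and the name `strip_copied_text` are assumptions for this example, not the tool's own list.

```python
# Hypothetical helper: deletes a small, assumed set of invisible characters
# that often arrive with copied text. This is not the detector's actual list.
STRIP = {
    0x200B: None,  # ZERO WIDTH SPACE
    0x00AD: None,  # SOFT HYPHEN
    0x2060: None,  # WORD JOINER
    0xFEFF: None,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def strip_copied_text(text: str) -> str:
    """Return text with every code point in STRIP removed."""
    # str.translate deletes any code point that maps to None.
    return text.translate(STRIP)


pasted = "co\u00adoperate with the in\u200bvoice team"
print("cooperate" in pasted)                     # False: the soft hyphen splits the word
print("cooperate" in strip_copied_text(pasted))  # True
```

The sketch deliberately leaves U+200C and U+200D alone, since joiners carry meaning in some scripts and in emoji sequences; whether to strip them is a judgment call.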

Checking usernames and identifiers

Two usernames that read as "admin" can be different strings if one contains a U+200B. Run any suspicious identifier through the detector before trusting a comparison.
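The same check can be sketched in code, assuming that any character in Unicode's format category (Cf) is suspicious in an identifier. That category covers the zero-width and bidirectional ranges discussed on this page, but it is a simplification; `hidden_characters` is a hypothetical helper, not part of the tool.

```python
import unicodedata


def hidden_characters(identifier: str) -> list[tuple[int, str, str]]:
    """List (index, code point, name) for every format character (category Cf).

    Assumption: flagging all of category Cf is a simplification. It catches
    U+200B..U+200F, U+2060, U+FEFF, U+00AD and the bidi controls, but it is
    not the same as the detector's exact list.
    """
    found = []
    for i, ch in enumerate(identifier):
        if unicodedata.category(ch) == "Cf":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


plain = "admin"
spoofed = "ad\u200bmin"
print(plain == spoofed)            # False, although both display as "admin"
print(hidden_characters(spoofed))  # [(2, 'U+200B', 'ZERO WIDTH SPACE')]
```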

Reviewing source code

Bidirectional control characters (the embeddings and overrides U+202A through U+202E and the isolates U+2066 through U+2069) can make code display differently from how the compiler reads it. Paste any snippet from an untrusted source here before you read it.
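If you want the same check in a review script, a minimal sketch could report the line and column of each bidirectional control. The code point ranges come from this page; the function name and the 1-based, per-code-point column counting are assumptions, so columns may not match an editor that counts tabs or wide characters differently.

```python
# Bidi embeddings/overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each bidi control, both 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in BIDI_CONTROLS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


snippet = "ok = True\nif ok: \u2066pass\u2069\n"
print(find_bidi_controls(snippet))  # [(2, 8, 'U+2066'), (2, 13, 'U+2069')]
```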

Catching phishing links

IDN homograph attacks use confusable characters to register look-alike domains. The detector will not resolve the domain, and it will not flag the look-alike letters themselves because they are visible, but it will surface any invisible characters hidden in the URL text.
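A minimal sketch of checking a URL string component by component, using Python's `urllib.parse.urlsplit` and treating Unicode category Cf as invisible (an assumption). Like the detector, it does not resolve the domain or catch visible look-alike letters; `invisible_in_url` is a hypothetical name.

```python
import unicodedata
from urllib.parse import urlsplit


def invisible_in_url(url: str) -> dict[str, list[str]]:
    """Map each URL component to the format-character code points found in it."""
    parts = urlsplit(url)
    report = {}
    for label, value in (("host", parts.netloc), ("path", parts.path), ("query", parts.query)):
        # Assumption: "invisible" means Unicode category Cf for this sketch.
        hits = [f"U+{ord(c):04X}" for c in value if unicodedata.category(c) == "Cf"]
        if hits:
            report[label] = hits
    return report


print(invisible_in_url("https://exam\u200bple.com/log\u00adin"))
# {'host': ['U+200B'], 'path': ['U+00AD']}
```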

What a serious sweep should look for

Zero-width spaces, joiners, directional marks, and soft hyphens are the most common hidden characters, but they are not the full picture. A thorough sweep covers several ranges. The zero-width format characters U+200B through U+200F cover line-break opportunities, joiners, and directional marks. U+2060 (word joiner) and U+FEFF (byte order mark, when it appears mid-stream) are zero-width too. The bidirectional controls, U+202A through U+202E plus U+2066 through U+2069, are what Trojan Source exploits. U+00AD (soft hyphen) stays invisible unless a line wraps at that point. The Hangul fillers U+115F and U+3164 and the braille blank U+2800 render as nothing but occupy a cell, which makes them useful for hiding content in plain sight. Confusable detection is a related but separate problem: a Cyrillic а and a Latin a are both visible, so an invisible-character detector will not flag them. For that, you need a check against Unicode's confusables data, which maps look-alike characters to a normalized skeleton.
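One way to encode that sweep is a table of ranges. The sketch below uses an assumed layout with our own group labels. It also shows why an explicit list is needed: the fillers are classed as letters or symbols rather than format characters, so a check on category Cf alone would skip them.

```python
import unicodedata
from typing import Optional

# (first, last inclusive, group label); the labels are illustrative only.
SWEEP_RANGES = [
    (0x00AD, 0x00AD, "soft hyphen"),
    (0x115F, 0x115F, "blank filler"),
    (0x200B, 0x200F, "zero-width"),
    (0x202A, 0x202E, "bidi control"),
    (0x2060, 0x2060, "zero-width"),
    (0x2066, 0x2069, "bidi control"),
    (0x2800, 0x2800, "blank filler"),
    (0x3164, 0x3164, "blank filler"),
    (0xFEFF, 0xFEFF, "zero-width"),
]


def classify(ch: str) -> Optional[str]:
    """Return the sweep group for ch, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, group in SWEEP_RANGES:
        if first <= cp <= last:
            return group
    return None


# The fillers report category Lo (U+3164) and So (U+2800), not Cf,
# yet the range table still classifies them.
for ch in "\u200b\u00ad\u202e\u3164\u2800":
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), classify(ch))
```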

How to use the detector well

Paste only the text you want checked. Everything runs in your browser; nothing is uploaded.
Use the color coding to distinguish character types at a glance — red for ZWSP, green for ZWJ, orange for RLM, and so on.
After detecting, click Remove to strip every invisible character, then Copy Clean Text to get the sanitized version.
Check any username, slug, or identifier that came from user input before you store or compare it.
If a string comparison is failing and the strings look identical, paste both here — a hidden U+200B is the usual cause.
For source code from third parties, scan for U+202A through U+202E and U+2066 through U+2069 before reading it. These bidi embedding, override, and isolate controls are what Trojan Source exploits.
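For the failing-comparison case, a short hypothetical helper can name the first code point where two look-alike strings diverge. It compares code point by code point and is a sketch, not part of the tool.

```python
import unicodedata
from itertools import zip_longest
from typing import Optional


def _describe(ch: Optional[str]) -> str:
    """Render a code point as 'U+XXXX NAME', or note that a string ended."""
    if ch is None:
        return "end of string"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def explain_mismatch(a: str, b: str) -> str:
    """Describe the first index where a and b differ."""
    # zip_longest pads the shorter string with None so trailing extras show up.
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return f"index {i}: {_describe(x)} vs {_describe(y)}"
    return "identical"


print(explain_mismatch("admin", "ad\u200bmin"))
# index 2: U+006D LATIN SMALL LETTER M vs U+200B ZERO WIDTH SPACE
```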

How the detector works

The detector runs entirely in your browser as JavaScript. No text leaves your device, which matters when you are pasting sensitive identifiers or source code. It works in Chrome, Firefox, Safari, Edge, and any modern browser on desktop or mobile. It scans for the zero-width characters in the U+200B to U+200F range, the bidirectional overrides in U+202A through U+202E and U+2066 through U+2069, the word joiner U+2060, the soft hyphen U+00AD, and the byte order mark U+FEFF when it appears outside the first position of a file. Note that it flags invisible characters by code point; it does not detect homoglyphs, which are visible characters that merely look like other characters.
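The scan rule described above, including the byte order mark exception, can be sketched in Python as follows. This illustrates the stated rule only; it is not the detector's JavaScript source, and `scan` and `DETECTED` are hypothetical names.

```python
# Code points listed in the paragraph above.
DETECTED = (
    set(range(0x200B, 0x2010))    # U+200B..U+200F
    | set(range(0x202A, 0x202F))  # U+202A..U+202E
    | set(range(0x2066, 0x206A))  # U+2066..U+2069
    | {0x2060, 0x00AD, 0xFEFF}
)


def scan(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for each listed character.

    A U+FEFF at index 0 is treated as a byte order mark and skipped;
    anywhere else it is reported.
    """
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0xFEFF and i == 0:
            continue
        if cp in DETECTED:
            hits.append((i, f"U+{cp:04X}"))
    return hits


print(scan("\ufeffab\u200bc\ufeff"))  # [(3, 'U+200B'), (5, 'U+FEFF')]
```

When reading files in Python, opening them with `encoding="utf-8-sig"` already drops a leading byte order mark, so the index-0 case mostly matters for text that arrives some other way, such as a paste.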

Common questions about detecting hidden characters

What counts as a hidden character?

Any Unicode character that renders with no visible glyph. The common ones are zero-width space (U+200B), zero-width joiner (U+200D), zero-width non-joiner (U+200C), the left-to-right and right-to-left marks (U+200E, U+200F), and the soft hyphen (U+00AD). The detector flags each by code point.

Why do hidden characters end up in my text?

Three common sources. Copying from a web page or PDF imports the invisible characters that were in the source. An editor or CMS inserts them for line-break control. Or someone planted them deliberately — for watermarking, for formatting, or to make an identifier compare differently.

Can hidden characters be dangerous?

On their own, no. But in code, usernames, and URLs they can change how text is parsed. The Trojan Source attack (CVE-2021-42574) used bidirectional overrides to make compilers read different logic from what reviewers saw. In usernames, a hidden U+200B makes two identical-looking strings compare as different.

How do I remove hidden characters?

Paste your text, click Analyze, then click Remove Hidden Characters. The tool strips every invisible character it detects and leaves the visible text intact. Copy the result back out with Copy Clean Text.

Is my text sent anywhere?

No. The detector runs entirely in your browser as JavaScript. Nothing is uploaded to a server, which makes it safe to use on source code, credentials, or any text you cannot share.

Does this detect homoglyph attacks?

No. Homoglyphs are visible characters — a Cyrillic а (U+0430) looks identical to a Latin a (U+0061) but both render normally. This tool flags invisible characters. Detecting confusables requires a check against Unicode's confusables data, which maps look-alike characters to a normalized skeleton.
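To see why a separate confusables check is needed, here is a simplified skeleton comparison in Python. It roughly follows the idea in Unicode Technical Standard #39 (normalize, map each character to a prototype, normalize again), but the three-entry mapping is a hand-picked excerpt assumed for illustration; a real check would load the full confusables data.

```python
import unicodedata

# Hand-picked excerpt, assumed for illustration only. The real confusables
# data has thousands of entries.
CONFUSABLES_EXCERPT = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def skeleton(s: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLES_EXCERPT.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


spoof = "\u0430dmin"  # Cyrillic а followed by Latin "dmin"
print(spoof == "admin")                                  # False
print(unicodedata.normalize("NFKC", spoof) == "admin")  # False: NFKC does not fold scripts
print(skeleton(spoof) == skeleton("admin"))             # True
```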