Homoglyph Visualizer

Homoglyphs

A significant subset of Unicode characters is visually similar to Latin letters but carries distinct symbolic and linguistic meanings. For example, the Latin "a" and the Cyrillic "а" appear visually identical, but their underlying Unicode code points, U+0061 and U+0430, are not equivalent. The noise introduced by homoglyphs has ramifications for cybersecurity and natural language processing, as homoglyphs have been found in material ranging from spoofed domain names to inappropriate tweets.
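The difference between the two "a" characters can be checked directly. The following is a minimal sketch using only Python's standard `unicodedata` module; the variable names are illustrative and are not part of the visualizer.

```python
import unicodedata

# Written as escapes so the two look-alike characters cannot be confused
# in the source code itself.
latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

for ch in (latin_a, cyrillic_a):
    # Print the code point in U+XXXX form alongside the official character name.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The glyphs render alike, but the strings compare as different.
print(latin_a == cyrillic_a)
```

The comparison prints `False`, which is why a string containing the Cyrillic character will not match a search or filter written with the Latin one.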

The following visualization allows a direct comparison between the 26 Latin letters and the non-Latin homoglyphs they have in common, as determined by 1) the Unicode Standard, 2) a human annotator, and 3) GPT-4o used as a normalization tool.

Color encodes the data source of each matrix. The blue matrix shows the letter visual similarity scores (range 1-7) derived by Simpson et al. The red matrix shows the common homoglyphs established by the Unicode Standard. The two green matrices show the common homoglyphs identified by a human annotator and by the GPT-4o large language model. Both were derived from a sample (n=700) of real-world tweets containing homoglyph characters, which is why they share a color.

Usage Tips

Use the checkboxes on the black navbar to toggle visibility of the four matrices.
To view the visual similarity score associated with two letters, hover your mouse over the cell in the Letter Visual Similarity matrix.
To view the number of homoglyphs shared between two letters, hover your mouse over the corresponding cell in any of the commonality matrices.
To generate a visual overlay of two letters and return their common homoglyphs, click on a cell. The Letter Overlay will be displayed below the matrices.
To highlight a cell across all matrices, use the Search for a Cell feature provided below the matrices.
To re-sort all matrices according to the average or maximum score order of a single matrix, use the sorting tools provided below the matrices.
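The commonality counts and the sort orders described in the tips above can be sketched as follows. This is an illustration under stated assumptions, not the visualizer's actual implementation: the `homoglyphs` mapping is hypothetical toy data (it is not taken from the Unicode Standard, the human annotations, or the GPT-4o output), and the helper names are invented for this example.

```python
from itertools import product

# Hypothetical toy mapping from a Latin letter to look-alike characters.
# These sets are illustrative only and are NOT the visualizer's data.
homoglyphs = {
    "a": {"\u0430", "\u03b1"},            # Cyrillic a, Greek alpha
    "c": {"\u0441"},                      # Cyrillic es
    "o": {"\u043e", "\u03bf", "\u03b1"},  # Cyrillic o, Greek omicron, Greek alpha
}

letters = sorted(homoglyphs)

def shared_count(x, y):
    """Number of homoglyphs that letters x and y have in common."""
    return len(homoglyphs[x] & homoglyphs[y])

# One cell per letter pair, as in a commonality matrix.
matrix = {(x, y): shared_count(x, y) for x, y in product(letters, repeat=2)}

def row_average(letter):
    """Average of a letter's row (assumption: the diagonal cell is included)."""
    return sum(matrix[(letter, other)] for other in letters) / len(letters)

def row_maximum(letter):
    """Largest value in a letter's row (assumption: the diagonal cell is included)."""
    return max(matrix[(letter, other)] for other in letters)

# Letter orders that could then be applied to every matrix at once.
by_average = sorted(letters, key=row_average, reverse=True)
by_maximum = sorted(letters, key=row_maximum, reverse=True)

print(matrix[("a", "o")])
print(by_average, by_maximum)
```

Computing one letter order from a single matrix and applying it to every matrix keeps the same cell in the same position across all views, which is what makes side-by-side comparison possible.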

A Visual Representation of Homoglyphs and the Latin Letters They Resemble
