I am an M.S. student in the Department of Computer Science at the University of Arizona, working with Dr. Mihai Surdeanu and Dr. Eduardo Blanco.
My interests lie at the intersection of natural language processing and AI security, with a focus on building systems resilient to adversarial attacks. I have led studies investigating Unicode perturbation detection, text normalization, AI-optimized case selection for content moderation, and malicious text-to-image prompt mitigation. My work reflects a deep-rooted enthusiasm for creating safer, more trustworthy AI systems.
For more information, please see my Google Scholar profile.
A significant subset of Unicode characters is visually similar to Latin letters but carries distinct symbolic and linguistic meanings. For example, the Latin "a" and the Cyrillic "а" look identical, but their underlying Unicode code points, U+0061 and U+0430, are not equivalent. The noise introduced by homoglyphs has ramifications for both cybersecurity and natural language processing: homoglyphs have been found in material ranging from spoofed domain names to inappropriate tweets.
This visualization allows for a direct comparison between the 26 Latin letters and the non-Latin homoglyphs they have in common, as determined by 1) the Unicode Standard, 2) a human annotator, and 3) GPT-4o used as a normalization tool.
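The code-point distinction described above can be checked directly with Python's standard `unicodedata` module. This is a minimal sketch for illustration only, not the tooling behind the visualization; the helper name `describe` is hypothetical.

```python
import unicodedata

LATIN_A = "\u0061"     # Latin small letter a
CYRILLIC_A = "\u0430"  # Cyrillic small letter a (a homoglyph of the Latin letter)


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in (LATIN_A, CYRILLIC_A):
        print(describe(ch))

    # The two glyphs render alike but compare unequal as strings.
    print(LATIN_A == CYRILLIC_A)

    # Standard compatibility normalization (NFKC) does not map the
    # Cyrillic letter onto the Latin one, so it cannot undo this homoglyph.
    print(unicodedata.normalize("NFKC", CYRILLIC_A) == LATIN_A)
```

Because NFKC leaves the Cyrillic letter unchanged, mapping such homoglyphs back to Latin requires a separate confusables table or a model-based normalizer.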
Gerrymandering is the process of drawing electoral district boundaries in a way that favors a specific political group. The following system ensures that a valid redistricted map is always possible. A valid map must contain only contiguous districts with relatively equal populations.
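The two validity conditions can be sketched as a small check. This is an illustrative sketch only and not the system described above: it assumes the map is given as an adjacency dictionary over geographic units, and the names `is_contiguous`, `is_valid_map`, and the 5% `tolerance` default are all hypothetical choices.

```python
from collections import deque


def is_contiguous(units, adjacency):
    """Breadth-first search: True if every unit is reachable from the first
    one without leaving the district."""
    units = set(units)
    if not units:
        return False
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in units and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == units


def is_valid_map(assignment, adjacency, population, tolerance=0.05):
    """True if every district is contiguous and each district's population
    is within `tolerance` (an assumed threshold) of the ideal average."""
    districts = {}
    for unit, district in assignment.items():
        districts.setdefault(district, []).append(unit)
    if not districts:
        return False

    totals = {d: sum(population[u] for u in us) for d, us in districts.items()}
    ideal = sum(totals.values()) / len(districts)

    balanced = all(abs(t - ideal) <= tolerance * ideal for t in totals.values())
    contiguous = all(is_contiguous(us, adjacency) for us in districts.values())
    return balanced and contiguous


if __name__ == "__main__":
    # A 2x2 grid of units: 0 1 on the top row, 2 3 on the bottom row.
    adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    population = {0: 100, 1: 100, 2: 100, 3: 100}
    print(is_valid_map({0: "A", 1: "A", 2: "B", 3: "B"}, adjacency, population))
```

In the example grid, splitting the map into its top and bottom rows passes both checks, while pairing the diagonal units 0 and 3 fails contiguity because they share no edge.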