Terminal support for emoji
emoji | terminals | unicode | python
Some terminals don't understand them
If your terminal doesn't treat multi-codepoint emoji correctly (Hyper, for example), it will essentially just iterate through each codepoint and render it as is:
>>> print("\U0001F468\u200D\U0001F467\u200D\U0001F466")
👨👧👦
This is problematic for terminal-based applications because two identical strings can display with different widths depending on the environment.
If a user is running Hyper, three emoji are rendered (👨👧👦).
If they're on iTerm2, a single emoji is rendered (👨‍👧‍👦).
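To see what Hyper is iterating over, Python's unicodedata module can list the individual codepoints inside the family emoji. This is a minimal sketch; the helper name `describe_codepoints` is hypothetical and not part of any library.

```python
import unicodedata


def describe_codepoints(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each codepoint in text (hypothetical helper)."""
    return [f"U+{ord(cp):04X} {unicodedata.name(cp, '<unnamed>')}" for cp in text]


# Man, ZWJ, Girl, ZWJ, Boy: one emoji in iTerm2, three separate glyphs in Hyper
family = "\U0001F468\u200D\U0001F467\u200D\U0001F466"

print(len(family))  # 5 codepoints, even though iTerm2 draws one emoji
for line in describe_codepoints(family):
    print(line)
```

The two U+200D ZERO WIDTH JOINER codepoints are what ask the terminal to merge the three people into one glyph; a terminal that ignores them draws each person separately.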
Terminals that do understand them don't know how wide they are
Even worse, terminal emulators themselves often don't know how much visual space an emoji will take up. Graphical emoji are typically treated as "East Asian Wide" characters, which take up 2 "cell-widths" in the terminal. In other words, they take up the same width as two ASCII characters:
>>> print("👨\n12")  # Emoji is same width as 2 ASCII chars
👨  # In other words it has "cell width" of 2
12  # (results may vary in browsers :))
You might wonder whether that means every emoji takes up two cells.
Unfortunately not. Most terminals incorrectly regard graphical Emoji Presentation Sequences as having a cell width of 1. For example, iTerm2 considers the "rosette" 🏵️ emoji to have width 1, so when it writes the emoji it only advances the cursor 1 column. This means any text that comes after it will overlap with the emoji:
This issue affects every terminal I've tested: Visual Studio Code, iTerm2, Alacritty, and Hyper.
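An Emoji Presentation Sequence is a base codepoint followed by U+FE0F VARIATION SELECTOR-16, which requests the graphical (emoji) rendering. The rosette above is exactly such a pair, and a quick unicodedata check shows that its base codepoint on its own is only "Neutral" width:

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji (graphical) presentation

# The rosette as an Emoji Presentation Sequence: base codepoint + VS16
rosette = "\U0001F3F5" + VS16

base, selector = rosette  # two codepoints, rendered as one emoji
print(unicodedata.name(base), unicodedata.east_asian_width(base))  # ROSETTE N
print(unicodedata.name(selector), unicodedata.category(selector))  # VARIATION SELECTOR-16 Mn
```

Nothing in the base codepoint's own East Asian Width says "wide"; the width information only comes from the sequence as a whole.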
Why does this happen?
A single character can take up more than one cell, and terminals generally use the East Asian Width property from the Unicode database to determine how "wide" a character will be. East Asian Wide characters (such as Chinese, Japanese, and Korean ideographs) take up two cells in a terminal.
This approach comes from the wcwidth utility, and the comment at the top of its C source file provides further insight into the difficulties faced here.
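The East Asian Width rule can be sketched in a few lines of Python. This is a simplified, hypothetical `naive_cell_width` helper, not the actual wcwidth code: "W" and "F" codepoints count as two cells, and treating general categories Mn, Me and Cf as zero width is an assumption standing in for wcwidth's fuller zero-width handling.

```python
import unicodedata


def naive_cell_width(text: str) -> int:
    """Estimate terminal cell width from East Asian Width alone (simplified sketch)."""
    width = 0
    for cp in text:
        if unicodedata.category(cp) in ("Mn", "Me", "Cf"):
            continue  # assumed zero width: combining marks, variation selectors, ZWJ
        if unicodedata.east_asian_width(cp) in ("W", "F"):
            width += 2  # East Asian Wide / Fullwidth
        else:
            width += 1  # Neutral, Narrow, Halfwidth, Ambiguous
    return width


print(naive_cell_width("12"))                # two ASCII digits
print(naive_cell_width("\U0001F4A3"))        # bomb emoji
print(naive_cell_width("\U0001F3F5\uFE0F"))  # rosette + VS16
```

Under this rule the rosette sequence comes out as 1 cell, the same width iTerm2 gives it, because the base is "N" and the variation selector adds nothing.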
The clue to why some emoji are rendered incorrectly can be found via Python's unicodedata module. Let's use it to query the "East Asian Width" property of two codepoints in the Unicode database.
>>> import unicodedata
>>> unicodedata.east_asian_width("\U0001F4A3")  # Bomb emoji 💣
'W'  # 'W' means 'East Asian Wide', terminals think of this as cell_width=2
>>> unicodedata.east_asian_width("\U0001F6E5")  # Motorboat emoji 🛥
'N'  # 'N' means 'Neutral', terminals think of this as cell_width=1
Here's how these emoji render in a terminal:
All terminal emulators I tested consider codepoints with an "East Asian Width" of N to have a cell width of 1. This is incorrect in the case of Emoji Presentation Sequences: Unicode recommends they always be treated as "East Asian Wide" (W).
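Following that recommendation, a width estimator can special-case a base character followed by VS16. This is a sketch only: the function name `presentation_aware_width` is hypothetical, it does not check whether the base is actually listed as a valid emoji variation sequence, and it ignores ZWJ sequences, flags and skin-tone modifiers, which need full grapheme segmentation.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (emoji presentation selector)


def presentation_aware_width(text: str) -> int:
    """Cell width where base + VS16 counts as East Asian Wide (sketch)."""
    width = 0
    for i, cp in enumerate(text):
        if unicodedata.category(cp) in ("Mn", "Me", "Cf"):
            continue  # assumed zero width: combining marks, VS16 itself, ZWJ
        if unicodedata.east_asian_width(cp) in ("W", "F"):
            width += 2  # already East Asian Wide / Fullwidth
        elif text[i + 1 : i + 2] == VS16:
            width += 2  # Emoji Presentation Sequence: treat as "W"
        else:
            width += 1
    return width


print(presentation_aware_width("\U0001F3F5\uFE0F"))  # rosette + VS16
print(presentation_aware_width("\U0001F3F5"))        # bare rosette, no VS16
```

The only change from a plain East Asian Width lookup is the lookahead for VS16, which promotes a "Neutral" base such as the rosette or the motorboat to two cells.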
In summary: a massive headache, avoid if possible.
Copyright © 2022 Darren Burns