A blog by Darren Burns
Hey 👋 I'm Darren.
I'm a software engineer based in Edinburgh, Scotland


Terminal support for emoji

January 21, 2021
emoji | terminals | unicode | python

Some terminals don't understand them

If your terminal (Hyper, for example) doesn't handle multi-codepoint emoji correctly, it will essentially just iterate through the codepoints and render each one as is. The family emoji below is made up of three emoji joined by zero-width joiners (U+200D):

>>> print("\U0001F468\u200D\U0001F467\u200D\U0001F466")
👨👧👦

This is problematic for terminal-based applications because two identical strings can display with different widths depending on the environment.

If a user is running Hyper, three emoji are rendered (👨👧👦).

If they're on iTerm2, a single emoji is rendered (👨‍👧‍👦).
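
One way to see what the terminal is actually working with is to list the codepoints that make up the string. This is a minimal sketch using only the standard unicodedata module; the helper name describe_codepoints is my own, purely for illustration.

```python
import unicodedata

# The family emoji from the example above: man + ZWJ + girl + ZWJ + boy.
family = "\U0001F468\u200D\U0001F467\u200D\U0001F466"


def describe_codepoints(text):
    """Return (hex codepoint, Unicode name) pairs for each codepoint in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for code, name in describe_codepoints(family):
    print(code, name)

# Python counts codepoints, not what ends up on screen.
print(len(family))
```

The string is five codepoints long: MAN, GIRL, and BOY, with a ZERO WIDTH JOINER between each pair. A terminal that ignores the joiners draws three emoji; one that honours them draws a single family glyph.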

Terminals that do understand them don't know how wide they are

Even worse, terminal emulators themselves often don't know how much visual space an emoji will take up. Graphical emoji are typically treated as "East Asian Wide" characters, which take up 2 "cell-widths" in the terminal. In other words, they take up the same width as two ASCII characters:

>>> print("👨\n12") # Emoji is same width as 2 ASCII chars
👨 # In other words it has "cell width" of 2
12 # (results may vary in browsers :))

You might wonder:

"Can't you just detect the width of an emoji by writing it to the terminal and checking how many columns the cursor has moved forward by?"
- You, maybe

Unfortunately not. Most terminals incorrectly regard graphical Emoji Presentation Sequences as having cell width of 1. For example, iTerm2 considers the "rosette" 🏵️ emoji to have width 1, so when it writes the emoji it only progresses the cursor forward 1 column. This means any text that comes after it will overlap with the emoji:

[Image: rosette overlapping with text]

This issue affects every terminal I've tested: Visual Studio Code, iTerm2, Alacritty, and Hyper.
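
For reference, the cursor-based measurement from the question above could be sketched roughly like this. It assumes a Unix-like system, an interactive terminal on stdin and stdout, and a terminal that answers the standard "report cursor position" request (ESC [ 6 n) with ESC [ row ; column R. The function names are hypothetical, and the sketch ignores line wrapping.

```python
import re
import sys

# A cursor position report looks like ESC [ <row> ; <column> R.
CPR_PATTERN = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(response):
    """Parse a cursor position report into a (row, column) tuple."""
    match = CPR_PATTERN.search(response)
    if match is None:
        raise ValueError(f"not a cursor position report: {response!r}")
    return int(match.group(1)), int(match.group(2))


def read_cursor_position():
    """Ask the terminal where the cursor is and read its reply."""
    sys.stdout.write("\x1b[6n")
    sys.stdout.flush()
    response = ""
    while not response.endswith("R"):
        response += sys.stdin.read(1)
    return parse_cursor_report(response)


def measure_reported_width(text):
    """Return how many columns the terminal advances the cursor for text."""
    # Unix-only modules, imported here so the parser above works anywhere.
    import termios
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        # cbreak mode: no echo, and input is available without Enter.
        tty.setcbreak(fd)
        sys.stdout.write("\r")
        _, start = read_cursor_position()
        sys.stdout.write(text)
        _, end = read_cursor_position()
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return end - start
```

Calling measure_reported_width on an emoji only tells you what the terminal believes the width to be. As described above, that belief is often 1 even when the glyph is drawn two cells wide, which is exactly why this trick doesn't help.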

Why does this happen?

A single character can take up more than one cell, and the East Asian Width Unicode database is generally used by terminals to determine how "wide" the character will be. East Asian Wide characters (such as Chinese, Japanese, and Korean ideographs) take up two cell widths in a terminal.

This approach comes from the wcwidth utility, and the comment at the top of the C source file provides further insight into the difficulties faced here.
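
A minimal sketch of this kind of lookup, written with Python's unicodedata module, might look like the following. It is not a port of wcwidth.c: the function names are made up for illustration, control characters are not handled, and treating the Mn, Me, and Cf categories as zero-width is a simplifying assumption.

```python
import unicodedata

# Categories treated as taking up no cells in this sketch:
# non-spacing marks, enclosing marks, and format characters (e.g. ZWJ).
ZERO_WIDTH_CATEGORIES = ("Mn", "Me", "Cf")


def naive_cell_width(ch):
    """Guess the cell width of a single codepoint from Unicode properties."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    # 'W' (Wide) and 'F' (Fullwidth) occupy two cells; everything else one.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def naive_text_width(text):
    """Sum the per-codepoint widths, with no knowledge of emoji sequences."""
    return sum(naive_cell_width(ch) for ch in text)


family = "\U0001F468\u200D\U0001F467\u200D\U0001F466"
print(naive_text_width("abc"))   # three narrow characters
print(naive_text_width(family))  # each emoji counted separately
```

Because the widths are summed one codepoint at a time, the family emoji comes out 6 cells wide (three wide emoji plus two zero-width joiners). That matches a terminal that draws three emoji, but not one that draws a single glyph.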

The clue to why some emoji are rendered incorrectly can be found via the unicodedata module. Let's use it to query the "East Asian Width" property of two codepoints in the Unicode database.

>>> import unicodedata
>>> unicodedata.east_asian_width("\U0001F4A3") # Bomb emoji 💣
'W' # 'W' means 'East Asian Wide', terminals think of this as cell_width=2
>>> unicodedata.east_asian_width("\U0001F6E5") # Motorboat emoji 🛥
'N' # 'N' means 'Neutral', terminals think of this as cell_width=1

Here's how these emoji render in a terminal:

[Image: rendering of the bomb and motorboat emoji in a terminal]

All terminal emulators I tested consider codepoints with an "East Asian Width" of N to have cell width of 1. This is incorrect in the case of Emoji Presentation Sequences: Unicode recommends that they always be treated as "East Asian Wide" (W).
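
To make that recommendation concrete, here is a rough sketch of a width calculation that widens a narrow character when it is followed by the emoji presentation selector U+FE0F. The function name is hypothetical, and the sketch assumes that any character followed by U+FE0F is a valid emoji base. A real implementation would check the base against Unicode's list of emoji variation sequences, which the standard library doesn't expose.

```python
import unicodedata

# VARIATION SELECTOR-16 requests emoji (graphical) presentation.
EMOJI_PRESENTATION_SELECTOR = "\uFE0F"


def cell_width_with_presentation(text):
    """Estimate cell width, widening characters followed by U+FE0F.

    Assumption: whatever precedes U+FE0F is a valid emoji base.
    """
    total = 0
    previous = 0  # width counted for the most recent visible character
    for ch in text:
        if ch == EMOJI_PRESENTATION_SELECTOR:
            # Upgrade a narrow base character to wide; never count twice.
            if previous == 1:
                total += 1
                previous = 2
            continue
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # marks and format characters take no cells
        previous = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        total += previous
    return total


rosette = "\U0001F3F5"
print(cell_width_with_presentation(rosette))             # bare codepoint
print(cell_width_with_presentation(rosette + "\uFE0F"))  # emoji presentation
```

With this rule, the bare rosette codepoint (East Asian Width N) is counted as 1 cell, while the same codepoint followed by U+FE0F is counted as 2. Even so, the result only helps if the terminal drawing the text agrees with it.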

In summary: a massive headache, avoid if possible.


Copyright © 2022 Darren Burns