Terminal support for emoji

January 21, 2021
emoji | terminals | unicode | python

Some terminals don't understand them

If your terminal doesn't treat multi-codepoint emoji correctly (Hyper, for example), it will essentially just iterate through each codepoint and render it as is:

>>> print("\U0001F468\u200D\U0001F467\u200D\U0001F466")
👨👧👦

This is problematic for terminal-based applications because two identical strings can display with different widths depending on the environment.

If a user is running Hyper, three emoji are rendered (👨👧👦).

If they're on iTerm2, a single emoji is rendered (👨‍👧‍👦).
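To see why the two terminals disagree, it helps to list the codepoints in that string. This short sketch uses only Python's standard unicodedata module. The variable name `family` is just for illustration.

```python
import unicodedata

# Family emoji: MAN + ZERO WIDTH JOINER + GIRL + ZERO WIDTH JOINER + BOY
family = "\U0001F468\u200D\U0001F467\u200D\U0001F466"

print(len(family))  # 5 codepoints, not 1 (iTerm2) or 3 (Hyper)
for ch in family:
    # Print each codepoint's hex value and official Unicode name
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

This prints 5, followed by MAN, ZERO WIDTH JOINER, GIRL, ZERO WIDTH JOINER and BOY. iTerm2 joins the sequence into a single glyph. A terminal that renders codepoint by codepoint draws the three people separately, and the joiner itself has no visible glyph.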

Terminals that do understand them don't know how wide they are

Even worse, terminal emulators themselves often don't know how much visual space an emoji will take up. Graphical emoji are typically treated as "East Asian Wide" characters, which occupy 2 cells in the terminal (a "cell width" of 2). In other words, they take up the same width as two ASCII characters:

>>> # An emoji is as wide as 2 ASCII chars, i.e. it has a "cell width" of 2
>>> # (the alignment below may vary when viewed in a browser)
>>> print("👨\n12")
👨
12
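This is why widths matter to terminal applications: Python's `len()` and its string padding methods count codepoints, not cells. A minimal illustration, assuming a terminal that draws 👨 two cells wide:

```python
labels = ["\U0001F468", "12"]  # a 2-cell emoji and two 1-cell digits

# str.ljust() pads to 4 *codepoints*, not 4 terminal cells
rows = [label.ljust(4) + "|" for label in labels]

for label, row in zip(labels, rows):
    print(row, "len =", len(label))
```

Both rows are 5 codepoints long, but the first occupies 6 cells (2 for the emoji, 3 spaces, 1 bar) and the second only 5, so the bars don't line up.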

You might wonder:

"Can't you just detect the width of an emoji by writing it to the terminal and checking how many columns the cursor has moved forward by?"

— You, maybe

Unfortunately not. Most terminals incorrectly treat graphical Emoji Presentation Sequences (a base codepoint followed by U+FE0F VARIATION SELECTOR-16) as having a cell width of 1. For example, iTerm2 considers the "rosette" emoji 🏵️ (U+1F3F5 followed by U+FE0F) to have width 1, so when it writes the emoji it moves the cursor forward only 1 column. Any text that comes after it then overlaps with the emoji:

This issue affects every terminal I've tested: Visual Studio Code, iTerm2, Alacritty, and Hyper.
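For reference, the cursor-probing idea from the quote above might look like the following sketch. It assumes a Unix-like system (it uses the termios and tty modules) and a terminal that answers the ANSI Device Status Report query ESC [ 6 n with ESC [ row ; col R, as VT100-compatible terminals do. The function names are made up for this example. Given the behaviour described above, you'd expect `measured_width` to return 1 for 🏵️ in iTerm2, which is exactly why probing the cursor doesn't give the right answer.

```python
from __future__ import annotations

import os
import re
import sys
import termios
import tty

# Reply to the Device Status Report query ESC[6n: ESC [ row ; col R
_CPR = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(reply: str) -> tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = _CPR.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def cursor_position() -> tuple[int, int]:
    """Ask the terminal where the cursor is (Unix, interactive terminal only)."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        # cbreak mode: read the reply byte by byte without echoing it
        tty.setcbreak(fd)
        sys.stdout.write("\x1b[6n")
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd, 1)
            if not chunk:
                raise EOFError("terminal closed before replying")
            reply += chunk
    finally:
        # Restore the original terminal settings even if reading failed
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return parse_cursor_report(reply.decode("ascii", "replace"))


def measured_width(text: str) -> int:
    """Columns the cursor advances while printing `text` at the line start."""
    sys.stdout.write("\r")
    _, start = cursor_position()
    sys.stdout.write(text)
    _, end = cursor_position()
    sys.stdout.write("\r\x1b[K")  # back to column 1 and clear the line
    sys.stdout.flush()
    return end - start


# Example (run in an interactive terminal):
#   print(measured_width("\U0001F3F5\uFE0F"))
```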

Why does this happen?

A single character can occupy more than one cell, and terminals generally use the East Asian Width property from the Unicode Character Database to determine how "wide" a character will be. East Asian Wide characters (such as Chinese, Japanese, and Korean ideographs) occupy two cells in a terminal.

This approach comes from the C function wcwidth, and the comment at the top of the C source file gives further insight into the difficulties involved.
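To make the idea concrete, here is a rough Python sketch of wcwidth-style logic. It is a simplification, not the C implementation: it assumes combining marks and format characters (general categories Mn, Me and Cf, which include U+200D ZERO WIDTH JOINER and U+FE0F VARIATION SELECTOR-16) take no space, control characters are unprintable (-1), East Asian Wide and Fullwidth codepoints take two cells, and everything else takes one. The function names are hypothetical.

```python
import unicodedata


def wcwidth_sketch(ch: str) -> int:
    """Simplified wcwidth(): cells occupied by a single codepoint."""
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # other control characters are non-printable
    if category in ("Mn", "Me", "Cf"):
        return 0  # combining marks, ZWJ, variation selectors
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide or Fullwidth
    return 1  # Neutral, Narrow, Halfwidth and Ambiguous all get 1 here


def wcswidth_sketch(text: str) -> int:
    """Sum of codepoint widths, or -1 if any codepoint is non-printable."""
    widths = [wcwidth_sketch(ch) for ch in text]
    return -1 if -1 in widths else sum(widths)
```

Fed the strings from this post, the sketch gives 2 for 💣, 1 for 🛥, 1 for 🏵️ (the variation selector adds nothing), and 6 for the family emoji, even though a terminal that joins the family sequence draws a single glyph.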

The clue to why some emoji are rendered incorrectly can be found with Python's unicodedata module. Let's use it to query the "East Asian Width" property of two codepoints in the Unicode database.

>>> import unicodedata
>>> # 'W' means "East Asian Wide": terminals treat it as cell width 2
>>> unicodedata.east_asian_width("\U0001F4A3")  # Bomb emoji 💣
'W'

>>> # 'N' means "Neutral": terminals treat it as cell width 1
>>> unicodedata.east_asian_width("\U0001F6E5")  # Motorboat emoji 🛥
'N'

Here's how these emoji render in a terminal:

All the terminal emulators I tested treat codepoints with an "East Asian Width" of N as having a cell width of 1. This is incorrect for Emoji Presentation Sequences: Unicode recommends that they always be treated as "East Asian Wide" (W), regardless of the width of their base codepoint.
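One way to follow that recommendation in your own code is to widen a 1-cell codepoint when it is followed by U+FE0F. This sketch assumes that any codepoint followed by VS16 forms an emoji presentation sequence; a stricter version would check the pairs listed in Unicode's emoji-variation-sequences.txt. It also ignores ZWJ sequences, and the function names are hypothetical.

```python
import unicodedata

VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def codepoint_width(ch: str) -> int:
    """Width from general category and East Asian Width alone."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks, ZWJ, variation selectors
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def cell_width(text: str) -> int:
    """Cell width, treating <base, U+FE0F> as East Asian Wide."""
    total = 0
    for i, ch in enumerate(text):
        width = codepoint_width(ch)
        # Assumption: any 1-cell codepoint followed by VS16 is an
        # emoji presentation sequence, which Unicode says to treat as W
        if width == 1 and text[i + 1 : i + 2] == VS16:
            width = 2
        total += width
    return total
```

With this rule 🏵️ (U+1F3F5 U+FE0F) measures 2 cells, and so does the motorboat once U+FE0F is appended, while the bare motorboat stays at 1. Bear in mind that the terminals tested still advance the cursor by only 1 column for these sequences, so a width computed "correctly" can disagree with what actually appears on screen.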

In summary: emoji in the terminal are a massive headache, so avoid them if possible.