Text editing hates you too

Alexis Beingessner's article, “Text Rendering Hates You,” published a month ago, hits very close to home for me.



Back in 2017, I was developing an interactive text editor in the browser. Unsatisfied with the existing ContentEditable libraries, I thought: “Hey, I'll just reimplement text selection myself! How hard can it be?” I was young. Naive. I figured I could get it done in two weeks. In fact, trying to solve this problem took several years of my life, including a year of full-time paid work developing a text editor for a new OS.



At work, I was fortunate to learn a lot from mentors with vast experience in this field. I have heard many, many horror stories. One was about an engineer who maintained a Windows application with a custom text field implementation and wanted to switch from an outdated text input API to the new version. Here is the list of text input interfaces in that new version:







That's right, 128 interfaces for text input. I'm almost sure there are also eight (8!) different types of locks for dealing with concurrency problems, although I honestly did not read their documentation, so don't quote me on this. That engineer spent a year and a half (full time!) reworking his editor, but in the end he gave up and stayed on the old API.



Text input is hard.



Alexis occasionally touches on text selection, but her experience is mostly with rendering. As someone from the other side, I can add a few points about input.



Vertical cursor movement



I already covered this in a previous article, but here is a quick recap.











In this example, if you press up, the cursor goes to the beginning of the line, before the word hello. So far, so reasonable. But if you press up and then down, the cursor first jumps in front of hello and then lands after some.



This may not seem very logical. Why does it jump to the right? Well, for vertical movement each cursor remembers its x position in pixels, and that position is updated only when you press left or right, not up or down. The same behavior keeps cursors from drifting to the left when they move vertically through short lines.
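The remembered x position can be sketched in a few lines. This is a minimal illustration, not any particular editor's implementation: it assumes a monospace layout so that columns stand in for pixels, and the names `Cursor`, `desired_x`, `move_horizontal`, and `move_vertical` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    line: int       # index of the line the cursor is on
    column: int     # offset within that line
    desired_x: int  # remembered horizontal position (columns stand in for pixels)


def move_horizontal(cursor, lines, delta):
    """Left/right movement updates the column AND the remembered x."""
    column = max(0, min(len(lines[cursor.line]), cursor.column + delta))
    return Cursor(cursor.line, column, column)


def move_vertical(cursor, lines, delta):
    """Up/down movement clamps to the target line but keeps desired_x."""
    target = cursor.line + delta
    if target < 0:
        # Up on the first line: jump to the start of the line, keep desired_x.
        return Cursor(0, 0, cursor.desired_x)
    if target >= len(lines):
        # Down on the last line: jump to the end of the line, keep desired_x.
        last = len(lines) - 1
        return Cursor(last, len(lines[last]), cursor.desired_x)
    column = min(cursor.desired_x, len(lines[target]))
    return Cursor(target, column, cursor.desired_x)


lines = ["hello world", "hi", "some more text"]
cursor = Cursor(line=0, column=9, desired_x=9)
cursor = move_vertical(cursor, lines, +1)  # clamped to the end of "hi"
cursor = move_vertical(cursor, lines, +1)  # back at column 9 on the long line
```

Because `desired_x` survives the trip through the short line, the cursor returns to column 9 instead of sticking at column 2.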



Affinity



Okay, now we know that a text cursor carries two pieces of state: the byte offset within the line and the x coordinate in pixels mentioned above. Problem solved? Well, no.



Consider two cursor positions on a very long line:











Since loooooooooong is one word, the two cursor positions have exactly the same byte offset in the string. There is no newline between them, because the line is soft-wrapped. Our cursors need an extra bit that tells them which line to sit on. Most systems call this bit affinity. It is also used in mixed bidirectional text, which we will get to soon.
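As a rough sketch of what that extra bit does, the cursor below stores an offset plus an affinity and uses the affinity only to break the tie at a soft-wrap point. The names `Affinity`, `UPSTREAM`, `DOWNSTREAM`, and `visual_line` are assumptions for this example, as is the idea of passing the wrap points in as a plain list of offsets.

```python
from dataclasses import dataclass
from enum import Enum


class Affinity(Enum):
    UPSTREAM = "upstream"      # stay at the end of the earlier visual line
    DOWNSTREAM = "downstream"  # stay at the start of the later visual line


@dataclass(frozen=True)
class Cursor:
    offset: int
    affinity: Affinity


def visual_line(cursor, wrap_offsets):
    """Return the index of the visual line the cursor is drawn on.

    wrap_offsets lists the offsets where the layout soft-wraps, ascending.
    """
    line = 0
    for wrap in wrap_offsets:
        if cursor.offset > wrap:
            line += 1
        elif cursor.offset == wrap and cursor.affinity is Affinity.DOWNSTREAM:
            # Same offset as the wrap point: only the affinity decides.
            line += 1
    return line


wraps = [10]  # a single soft wrap after 10 units of text
end_of_first = Cursor(10, Affinity.UPSTREAM)
start_of_second = Cursor(10, Affinity.DOWNSTREAM)
```

Both cursors hold offset 10, yet `visual_line` puts the first on line 0 and the second on line 1.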



Emoji Modifiers



Suppose I'm sending a message to a friend. To express my feelings, I want to add a funny emoji. I type a thumbs up, the letter a, and an emoji skin tone modifier into the text area. It looks like this:











Oops, I didn't mean to type that letter. I place the cursor after it and press Backspace. What happens? I have seen several outcomes, depending on the editor.
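To see why editors disagree, it helps to look at the string underneath. The sketch below assumes the editor indexes by code point (rather than by byte, as discussed earlier) and deletes exactly one code point on Backspace; `backspace_code_point` is a hypothetical helper, and this is only one of the possible behaviors, not a claim about any specific editor.

```python
THUMBS_UP = "\U0001F44D"         # U+1F44D THUMBS UP SIGN
MEDIUM_SKIN_TONE = "\U0001F3FD"  # U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4

text = THUMBS_UP + "a" + MEDIUM_SKIN_TONE

code_points = len(text)                 # three code points
utf8_bytes = len(text.encode("utf-8"))  # 4 + 1 + 4 bytes


def backspace_code_point(s, cursor):
    """Delete the single code point before `cursor` (a code point index)."""
    if cursor == 0:
        return s, 0
    return s[:cursor - 1] + s[cursor:], cursor - 1


# The cursor sits right after the letter, i.e. at code point index 2.
after, cursor = backspace_code_point(text, 2)
# `after` is now THUMBS_UP directly followed by MEDIUM_SKIN_TONE, which
# renderers treat as one emoji modifier sequence: deleting a letter has
# silently changed the emoji next to it.
```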













All of these options are bad, so you might guess that there is a fourth option. There is! Many editors, such as TextEdit, do not even let you place the cursor after the letter, because the skin tone modifier is treated as a single unit with the preceding character. This makes sense for emoji and even works well in this case, but what if the modifier is the first character on a line?











Now the modifier modifies the newline character. TextEdit will not let you place the cursor at the beginning of the second line! I personally consider this solution “also bad.”



You may also have noticed that the thumbs up has become a thumbs down. I did that myself to reflect my feelings about the whole situation.



By the way, TextEdit in particular gets very buggy with the cursor on the first line. For example, guess what happens if I press 4 here?











Yeah. You might also think that there are spaces between the digits. There are none.



Bidirectional text



Alexis mentions split selections in mixed bidirectional text, as in this example from TextEdit:











This actually makes sense: the Arabic part of the line is laid out right to left, so the selection looks split, but in bytes it is one contiguous range.



Therefore, it’s a little surprising that we can get this selection:











Yes, it is visually contiguous but discontiguous in bytes. Yes, that's bad. Some editors do this if you select text with the arrow keys instead of the mouse. The alternative is to swap the meaning of the left and right keys inside right-to-left text, which is also bad. There are no good options here.
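The gap between logical (byte) order and visual order can be shown with a deliberately simplified model. The sketch below assumes a left-to-right paragraph and simply reverses each run of strong right-to-left characters; it ignores neutrals, digits, and nesting, so it is nowhere near the full Unicode Bidirectional Algorithm. The helper names `is_rtl`, `visual_order`, and `visual_positions` are invented for this example.

```python
import unicodedata


def is_rtl(ch):
    """True for strong right-to-left characters (Hebrew, Arabic letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def visual_order(text):
    """Return logical indices in left-to-right display order.

    Simplification: every maximal run of RTL characters is reversed,
    everything else keeps its logical order.
    """
    order = []
    i = 0
    while i < len(text):
        j = i
        if is_rtl(text[i]):
            while j < len(text) and is_rtl(text[j]):
                j += 1
            order.extend(range(j - 1, i - 1, -1))  # reversed run
        else:
            while j < len(text) and not is_rtl(text[j]):
                j += 1
            order.extend(range(i, j))
        i = j
    return order


def visual_positions(text, start, end):
    """Visual positions covered by the logical selection [start, end)."""
    order = visual_order(text)
    return sorted(pos for pos, logical in enumerate(order)
                  if start <= logical < end)


text = "ab\u0627\u0628"  # "ab" followed by two Arabic letters
split = visual_positions(text, 1, 3)          # contiguous in memory
order = visual_order(text)
neighbours = sorted(order[p] for p in (1, 2))  # contiguous on screen
```

Here the logical range 1..3 covers visual positions 1 and 3 (a split selection), while the visually adjacent positions 1 and 2 map to logical indices 1 and 3 (a selection that is not contiguous in memory).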



As a bonus, try to understand what is going on here:











Lord... I don't want to comment on this.



The thing about input methods



Software that translates keystrokes into text input is called an input method or input method editor. For the Latin alphabet this software is not very interesting, since each keystroke maps directly to inserting a single character. But many scripts have more characters than fit on a keyboard, so you have to get creative. For example, in some input methods for Chinese, the user types sounds and gets a list of similar-sounding characters:











This field is sometimes called the composing region, and it often appears above the underlined text. Sometimes the input method needs to style it. For example, the Japanese input method on Android uses background color to mark out a phrase-splitting region:









(Thanks to Shae for the screenshot!)



Do all these selections and composing regions interact with bidirectional text? Let's not think about it.



Input methods should work everywhere, even inside a terminal:











Nothing is sent to Vim until a Chinese character is picked from the list. You are probably thinking: “But how does that work in Vim's command mode?” Not very well. This is why, on the web, text input and keystrokes are separate events. In a terminal they are mixed together, which causes problems.
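The "nothing is sent until a candidate is picked" behavior can be modeled as a text field with two separate pieces of state. This is a toy sketch under stated assumptions, not a real platform API: `TextField`, `set_composing`, and `commit` are hypothetical names, and square brackets stand in for the underline that usually marks the composing region.

```python
from dataclasses import dataclass


@dataclass
class TextField:
    committed: str = ""  # text the application actually receives
    composing: str = ""  # in-progress input still owned by the input method

    def set_composing(self, text):
        """The input method updates its in-progress text (e.g. typed sounds)."""
        self.composing = text

    def commit(self, text):
        """The user picked a candidate: it becomes real text, the region clears."""
        self.committed += text
        self.composing = ""

    def display(self):
        """What is drawn: committed text followed by the marked composing region."""
        if self.composing:
            return self.committed + "[" + self.composing + "]"
        return self.committed


field = TextField()
field.set_composing("ni")  # the user has typed a sound; nothing is committed yet
pending = field.display()
field.commit("\u4f60")     # the user picks a character from the candidate list
```

While the composing region is active, `committed` stays empty, which is exactly why a program that only sees committed text (like Vim in a terminal) receives nothing until the selection is made.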



This is just one example of the many different ways to enter text. (Don't forget keyboardless input methods such as voice and handwriting!) Fortunately, the operating system provides all of these methods for you. Unfortunately, your text field must speak the common text input protocol that all of them use. On Windows, that means the 128 interfaces listed at the beginning of this article. On other operating systems the interfaces are simpler, but still hard to implement.



You may also have noticed that the input method is a separate process, so both the input method and the application can change the state of the text field. This is effectively a concurrent editing protocol. Windows solves the problem with eight (8!) types of locks. Holding a lock across process boundaries may seem dubious, but most other platforms just use imperfect heuristics to paper over concurrency issues. Or they simply hope that the race condition never happens. In my experience, prayer is not a very effective concurrency primitive.
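The race itself is easy to reproduce in miniature. The sketch below is purely illustrative and is not how Windows or any other platform handles it: it assumes a single shared buffer with a revision counter, and rejects any edit computed against an older revision. `SharedText` and `StaleEdit` are hypothetical names.

```python
class StaleEdit(Exception):
    """Raised when an edit was computed against an outdated revision."""


class SharedText:
    def __init__(self, text=""):
        self.text = text
        self.revision = 0  # bumped on every successful edit

    def insert(self, offset, s, based_on):
        """Insert `s` at `offset`, but only if the caller saw the latest text."""
        if based_on != self.revision:
            raise StaleEdit(f"edit based on revision {based_on}, "
                            f"buffer is at {self.revision}")
        self.text = self.text[:offset] + s + self.text[offset:]
        self.revision += 1


doc = SharedText("world")

# The input method looks at the buffer and plans to append "!" at offset 5.
ime_revision, ime_offset = doc.revision, 5

# Meanwhile the application inserts text at the front.
doc.insert(0, "hello ", based_on=doc.revision)

# The input method's offset is now stale. Without the revision check the
# result would be "hello! world"; with it, the edit is rejected instead.
try:
    doc.insert(ime_offset, "!", based_on=ime_revision)
    rejected = False
except StaleEdit:
    rejected = True
```

Rejecting the edit only detects the conflict; the input method still has to re-read the buffer and recompute its offset, which is where the real protocols get complicated.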



Why is everything so difficult??



Jonathan Blow, in a talk on software decline, brings up Ken Thompson's text editor, which Thompson wrote in a week. Much of the difficulty described in this article is accidental complexity. Does Windows really need 128 interfaces and 8 kinds of locks for text input? Absolutely not. Are the bugs in TextEdit the result of an overly complex editing model? Yes. Is the spread of bugs in modern software something to worry about? For me, at least, it is.



However, Ken Thompson's editor was also much, much simpler than what we expect from a modern text editor. Unicode supports almost all living languages in the world (there are about 7,000 of them), and many dead ones too. There are different scripts, text directions, and input methods, each of which imposes complex (and in some cases unsolvable) constraints on any editor. And the editor must also support screen readers.



Enormous complexity inevitably accumulates, and this article has only scratched the surface. It is a genuine miracle of programming that you can just slap a <textarea> on a web page and instantly provide text input for every Internet user in the world.


