Posted by Victor Mair
https://languagelog.ldc.upenn.edu/nll/?p=72879
The Chinese Computer: Competition or Cooperation?
book review by
David Moser
Beijing Capital Normal University
Thomas Mullaney’s The Chinese Computer is a fascinating account of the decades-long effort by linguists, computer scientists and engineers to incorporate Chinese characters into the digital age. Drawing on a vast body of historical and scientific sources, the book offers the reader a lively account of the formidable technical challenges involved in creating practical and intuitive input methods for one of the world’s most complex writing systems. The reader will come away with an increased awareness of the contributions that Chinese computing brought to modern computer science.
Chinese scholars and sinologists working in the 1980s and 90s will recall the early generations of Chinese word processors—slow, unreliable, and crash-prone—when every incremental gain in speed or compatibility felt like a small miracle. Thanks to the ingenuity and innovation of computer input developers, today anyone on the planet can create Chinese texts using an impressive ecosystem of powerful and user-friendly tools.
One surprising takeaway of Mullaney’s book is that certain Chinese character entry methods are now faster overall, sometimes much faster, than English input. Mullaney argues that the speed of the new Chinese input methods is due to an increasingly common mode of digital-age writing that he calls “hypography.” Simply put, hypography is “writing-by-retrieval”: the sequence of alphanumeric symbols the user types does not directly represent the output text, but is instead used to retrieve the intended characters, which then appear as visible text on the screen. This mode of writing contrasts with the direct “what-you-type-is-what-you-get” principle of entering alphanumeric symbols on the keyboard. Mullaney provides an interesting example:
Loading up a phonetic IME [Input Method Editor], for example, a user can enter the string zhrmghg and watch as it correctly suggests 中华人民共和国 (Zhonghua renmin gongheguo or “the People’s Republic of China”). If a user prefers an example from deeper in antiquity, entering the input string xmyjj is a possibility. Chances are reasonable that the Cloud IME might recommend a stanza from “Parting” by famed Tang-dynasty poet Wang Wei: 下马饮君酒 (xiama yin jun jiu “I dismount from my horse and offer you wine . . .”) Admittedly, this is one of the best known of all Tang poems, yet I invite the same user to switch their computer back to English-language mode and enter the string sicttasdtamlamt. Did your machine catch this comparably famous passage by Shakespeare? Chances are slim. (Mullaney, p. 12)
Slim indeed. English-language input systems are unable to take advantage of this acronymic technique. This feature of phonetic IMEs affords great savings in keystrokes (thanks to pinyin, it must be noted) and is now a default function on most Chinese digital platforms. If both Chinese and English systems use the roman alphabet, why can’t English take advantage of this kind of hyper-abbreviated input, as pinyin does?
The answer lies in the morphology of Chinese. The vast majority of Chinese syllables are morphemes, and each written character corresponds to a single-syllable morpheme. Chinese has so many homophones because its inventory of syllables is small: ignoring tone, there are only about 400 distinct syllable spellings, and they form a closed set. English, by contrast, has a much more complex morphological system, with morphemes of varying syllable length and inconsistent spelling, so the number of possible spellings is enormous.
The number of English words beginning with the letter ‘c’ is estimated at 35,000–40,000 (Merriam-Webster Unabridged). In contrast, the Chinese syllables beginning with the letter ‘c’ (ignoring tone) amount to just 35 spellings, forming a finite inventory of syllables. Here is the complete list:
ca, cai, can, cang, cao, ce, cen, ceng, cha, chai, chan, chang, chao, che, chen, cheng, chi, chong, chou, chu, chua, chuai, chuan, chuang, chui, chun, chuo, ci, cong, cou, cu, cuan, cui, cun, cuo
These phonological units, with 100% consistent spelling rules, are precisely what makes this particular acronym-based IME so powerful in Chinese and so impractical in English. Typing the letter ‘c’ under the “first letter” algorithm in English would entail a brute-force search of tens of thousands of possible words.
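To make “writing-by-retrieval” concrete, here is a minimal Python sketch of first-letter (acronymic) lookup. It assumes a tiny, hypothetical phrase dictionary holding only the two examples from Mullaney’s passage; a real cloud IME draws on a vast lexicon and ranks many candidates. The syllable list is the one given above.

```python
# Minimal sketch of acronym-based ("first-letter") phrase retrieval.
# PHRASES is a hypothetical two-entry dictionary, not a real IME lexicon.

# The toneless pinyin syllables beginning with "c", as listed above.
C_SYLLABLES = (
    "ca cai can cang cao ce cen ceng "
    "cha chai chan chang chao che chen cheng chi chong chou chu "
    "chua chuai chuan chuang chui chun chuo "
    "ci cong cou cu cuan cui cun cuo"
).split()

# Each phrase is stored with its toneless pinyin syllables.
PHRASES = {
    "中华人民共和国": ["zhong", "hua", "ren", "min", "gong", "he", "guo"],
    "下马饮君酒": ["xia", "ma", "yin", "jun", "jiu"],
}


def acronym(syllables):
    """Abbreviate a phrase to the first letter of each syllable."""
    return "".join(s[0] for s in syllables)


def build_index(phrases):
    """Map each acronym to every stored phrase it could stand for."""
    index = {}
    for phrase, syllables in phrases.items():
        index.setdefault(acronym(syllables), []).append(phrase)
    return index


INDEX = build_index(PHRASES)


def lookup(abbrev):
    """Retrieve candidate phrases for a typed acronym (empty if none)."""
    return INDEX.get(abbrev, [])


print(len(C_SYLLABLES))   # 35
print(lookup("zhrmghg"))  # ['中华人民共和国']
print(lookup("xmyjj"))    # ['下马饮君酒']
```

Note that only stored phrases are retrievable: an acronym that no entry produces, such as “sicttasdtamlamt” in this toy index, returns nothing, much as an obscure poem absent from the Cloud could not be retrieved.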
Mullaney points out that the Wang Wei poem is quite well known and has therefore already been encoded into the Cloud. A more obscure poem might not have been uploaded to the Cloud, in which case the user would have no recourse other than straightforward pinyin entry.
But let us go back to Mullaney’s taunt about the futility of entering the string “sicttasdtamlamt.” The string is, of course, an acronym for the opening two lines of Shakespeare’s Sonnet 18: “Shall I compare thee to a summer’s day? / Thou art more lovely and more temperate.” Since English has no counterpart to the Chinese acronymic IME, we can only enter the text letter by letter. So let’s see how many keystrokes it takes to retrieve the first line of the poem using Google:

With just seven keystrokes, Google’s predictive text function yields the first line of the poem, so there is no need to type the entire sentence. And when I continue typing the next line, predictive text supplies that line as well:

With just one keystroke, ‘T’, the second line of the poem appears in the user’s text box. This process could be continued, depending on the user’s needs. In most situations, however, the user’s goal would simply be to get the entire poem, in which case one can copy and paste it into the file and edit it as needed.
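Google’s actual ranking method is not described here, so the following is only a generic sketch of prefix completion, using a hypothetical four-entry corpus with invented popularity scores. It shows the basic mechanism: the typed prefix narrows a sorted list to a contiguous range of candidates, which are then ranked.

```python
import bisect

# Toy corpus with made-up popularity scores. This illustrates prefix-based
# completion in general; it is NOT a model of Google's autocomplete.
CORPUS = {
    "Shall I compare thee to a summer's day?": 950,
    "Shall I go now?": 40,
    "Thou art more lovely and more temperate": 700,
    "The quick brown fox": 300,
}
SORTED = sorted(CORPUS)  # sorted keys let us locate a prefix range with bisect


def complete(prefix, k=3):
    """Return up to k entries starting with prefix, most popular first."""
    start = bisect.bisect_left(SORTED, prefix)
    matches = []
    for entry in SORTED[start:]:
        if not entry.startswith(prefix):
            break  # entries sharing a prefix are contiguous in sorted order
        matches.append(entry)
    return sorted(matches, key=lambda e: CORPUS[e], reverse=True)[:k]


print(complete("Shall I")[0])  # Shall I compare thee to a summer's day?
print(complete("T")[0])        # Thou art more lovely and more temperate
```

A production system would also condition on the preceding text, which is presumably how the second line surfaces after a single ‘T’; this sketch ranks by popularity alone.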
As Mullaney recounts, we have the Chinese to thank for predictive text, which was pioneered out of necessity in the 1950s by Chinese linguists to address the inefficiency of the Chinese typewriter. It is now a standard feature in computer systems throughout the world.
Mullaney also describes a comparable scenario in Chinese, in which a user enters well-known texts with a predictive-text IME:
Imagine you are a literary scholar, employing a “connected thought” Chinese IME plug-in focused on Chinese poetry and literature. (If you’re a medical professional, an aeronautical engineer, a physicist, a pharmacist . . . there are IME plug-ins for you, too.) As soon as you enter the first few characters of a well-known poem, let’s say, your input system offers up a long passage from your desired text. Upon confirming, your composition window fills up with 10 characters—or perhaps 20 or 30. In a matter of a few seconds, you will have entered anywhere between 300 and 1,800 “characters per minute.” (p. 220)
The scenario he describes is quite similar to our experience using Google’s predictive text feature. Depending on context, the Chinese input might be somewhat faster, but in my online experience the results, mutatis mutandis, are in the same ballpark. Mullaney’s boast about the speed differential between the two systems begins to seem less impressive.
So far these examples come from the alphabetic pinyin world, but pinyin is not the only entry method. Mullaney informs us that, in terms of speed, pinyin-based input systems are much slower than “structure-based” entry methods. Unlike pinyin, which relies on how a character sounds, structure-based methods break Chinese characters down into their fundamental components, radicals, and strokes, and map these features to the standard QWERTY keyboard. Mullaney recounts the astounding performance of Huang Zhenyu, who used a structure-based IME to take first prize in the 2013 National Chinese Characters Typing Competition, attaining one of the fastest typing speeds ever recorded:
He [Huang Zhenyu] transcribed the first 31 Chinese characters of Hu Jintao’s speech in roughly 5 seconds, for an extrapolated speed of 372 Chinese characters per minute. By the close of the grueling 20-minute contest, one extending over thousands of characters, he crossed the finish line with an almost unbelievable speed of 221.9 characters per minute. That’s 3.7 Chinese characters every second. In the context of English, Huang’s opening 5 seconds would have been the equivalent of around 375 English words-per-minute, with his overall competition speed easily surpassing 200 WPM—a blistering pace unmatched by anyone in the Anglophone world (using QWERTY, at least).
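The rates in this passage are easy to check. The short Python sketch below recomputes them from the figures given; the only derived number is the total character count, which simply multiplies the average rate by the 20-minute duration and is consistent with a contest “extending over thousands of characters.”

```python
def chars_per_minute(chars, seconds):
    """Convert a character count over a time span into characters per minute."""
    return chars * 60 / seconds


opening_rate = chars_per_minute(31, 5)  # first 31 characters in ~5 seconds
per_second = 221.9 / 60                 # overall contest rate, per second
total_chars = 221.9 * 20                # 20-minute contest at that average rate

print(opening_rate)          # 372.0
print(round(per_second, 1))  # 3.7
print(round(total_chars))    # 4438
```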
Huang made use of Wubi (五笔), a structure-based entry method that was popular in the 1980s and 90s. As fast as the method is, mastering the Wubi system constitutes a very steep barrier for the vast majority of Chinese people, who have already learned Hanyu pinyin in grade school. While Wubi is still used in certain technical contexts, pinyin entry dominates.
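Structure-based methods still retrieve characters from typed codes; they simply spell a character by its shape rather than its sound. The Python sketch below assumes a hypothetical five-component key map and four hand-picked decompositions; it shows the principle only and does not reproduce Wubi’s actual key assignments.

```python
# Simplified sketch of a structure-based input code. The component-to-key
# map and the decompositions below are hypothetical illustrations; they are
# NOT the real Wubi assignments, and Wubi's additional coding rules are omitted.
COMPONENT_KEYS = {"口": "k", "丨": "h", "木": "s", "日": "j", "一": "g"}

DECOMPOSITIONS = {
    "中": ["口", "丨"],
    "杏": ["木", "口"],  # tree above mouth
    "呆": ["口", "木"],  # mouth above tree: same parts, different order
    "旦": ["日", "一"],
}

MAX_KEYS = 4  # structure-based codes such as Wubi use at most four keys


def encode(char):
    """Spell a character as the keys of its components, in writing order."""
    keys = "".join(COMPONENT_KEYS[part] for part in DECOMPOSITIONS[char])
    return keys[:MAX_KEYS]


def build_code_table():
    """Invert the encoding so that a typed code retrieves its character(s)."""
    table = {}
    for char in DECOMPOSITIONS:
        table.setdefault(encode(char), []).append(char)
    return table


TABLE = build_code_table()
print(encode("杏"), TABLE["sk"])  # sk ['杏']
print(encode("呆"), TABLE["ks"])  # ks ['呆']
```

Because 杏 and 呆 share the same two components in different positions, their codes differ only in order. This hints at why shape-based codes can pin down a character with few candidates, and also why users must first learn the decompositions.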
Is Mullaney putting too much emphasis on speed? Most computer users would prefer a system that is intuitive and user-friendly over one with an exorbitant learning curve, no matter how fast it may be. In buying a car, we usually want a vehicle that is reliable and easy to operate, not a sports car that can go from zero to 60 mph in 5 seconds. Similarly, most of us who use computer keyboards do not aspire to be record-breaking speed typists. So it is with input systems, where speed takes second place to criteria such as “intuitive” and “user-friendly.”
Nevertheless, Mullaney focuses on speed of input as paramount, arguing that mastering the more complicated hypographic systems can be worth the effort:
When it came to modern information technologies, that is to say, Chinese was consistently one of the slowest writing systems in the world… What changed? How did a script so long disparaged as cumbersome and helplessly complex suddenly rival—exceed, even—computational typing speeds clocked in other parts of the world? Even if we accept that Chinese computer users are somehow able to engage in “real time” coding, shouldn’t Chinese IMEs result in a lower overall “ceiling” for Chinese text processing as compared to English? Chinese computer users have to jump through so many more hoops, after all, over the course of a cumbersome, multistep process: the IME has to intercept a user’s keystrokes, search in memory for a match, present potential candidates, and wait for the user’s confirmation. Meanwhile, English-language computer users need only depress whichever key they wish to see printed on screen. What could be simpler than the “immediacy” of “Q equals Q,” “W equals W,” and so on?… Even though Chinese human-computer interaction relies upon forms of mediation unseen in mainstream Anglophone computing, these additional layers of mediation can result in speeds that equal or surpass those of the seemingly “unmediated” world of what-you-type-is-what-you-get. Counterintuitively, the addition of mediation can lead to the subtraction of time. (p. 9)
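The multistep process Mullaney lists (intercept keystrokes, search for a match, present candidates, wait for confirmation) can be sketched as a small state machine. The class below is hypothetical: its one-entry lexicon and its frequency-based reordering are assumptions chosen to illustrate a simple form of adaptive memory, not the design of any particular IME.

```python
from collections import Counter


class ToyIME:
    """Hypothetical IME loop: buffer keys, look up, rank, await confirmation."""

    def __init__(self, lexicon):
        self.lexicon = lexicon  # maps a typed code to candidate outputs
        self.buffer = ""        # keystrokes intercepted so far
        self.usage = Counter()  # adaptive memory: how often each was chosen
        self.output = []        # committed text

    def key(self, ch):
        # Step 1: intercept the keystroke instead of printing it directly.
        self.buffer += ch

    def candidates(self):
        # Steps 2-3: search for matches and present them, most-used first.
        # sorted() is stable, so ties keep the lexicon's original order.
        found = self.lexicon.get(self.buffer, [])
        return sorted(found, key=lambda c: -self.usage[c])

    def confirm(self, index=0):
        # Step 4: the user picks a candidate; it is committed and remembered.
        choice = self.candidates()[index]
        self.output.append(choice)
        self.usage[choice] += 1
        self.buffer = ""
        return choice


ime = ToyIME({"shi": ["是", "时", "事"]})
for ch in "shi":
    ime.key(ch)
print(ime.candidates())  # ['是', '时', '事']
print(ime.confirm(1))    # 时
for ch in "shi":
    ime.key(ch)
print(ime.candidates())  # ['时', '是', '事']
```

After one selection, the chosen homophone moves to the front of the list, so the next time the user can confirm it with the default choice.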
Mullaney is saying that increasing layers of mediation, that is, advances in hypography techniques, will pay off in the long run with input speeds surpassing those of standard alphabetic systems. This may well be true. But in the current era, how important is this extra speed if the learning process is more time-consuming? Super-fast typists using QWERTY keyboards are capable of astonishing speeds unattainable by us mere mortals. For us, the upper limit on QWERTY typing speed is the clumsiness of human hands, not the layout of the keyboard. In the present era, most users cannot even achieve the theoretical speeds of existing input methods, much less those of the more hypographic ones that promise even faster performance.
Of course, Mullaney notes that despite the array of hypographic methods available, their use in English-language input remains limited to a few specialist areas:
What sets Chinese computing apart, then, is not the existence of hypography, but the scale and intensity of its usage. In English, hypography remains a hyperspecialized practice, reserved for specific domains of work (court stenography, for example), or in cases when practical limitations or physical abilities make the use of “conventional” QWERTY-style typing untenable or unattractive (as with the Palm Pilot and other small electronic devices). In Chinese, by contrast, hypography is ubiquitous. (p. 222)
Why did QWERTY migrate so effortlessly from the typewriter to the computer keyboard, acquiring only a few of the hypographic features that Chinese input required? It would seem obvious that inventors and developers working in the “Alphabet World” simply had fewer challenges to tackle. The old maxim “Necessity is the mother of invention” is a cliché, but it still applies.
Mullaney seems to imply that Anglophone text input is “missing the boat” when it comes to the adoption of highly mediated input systems. A self-satisfied complacency with the status quo results in a loss of the advantages such systems offer:
“…(W)ould average users want to dedicate precious time and energy to learning complex, highly mediated systems of textual input when they already enjoy the “best of all possible worlds”: immediacy? When presented with anything that falls outside this core—be it chording keyboards, autocompletion, predictive text algorithms, or otherwise—Anglophone computer users always have the option to regard these things as optional, auxiliary, or extra.” (p. 225)
His observation is somewhat valid. With English language input there was less urgency for the development and use of such systems. But once developments like autocompletion and predictive text became available, they were easily added to the array of input features, which thus became part of a new default “core” function.
When it comes to “user-friendly,” everything is a tradeoff, and speed is only one factor to take into consideration. Stenography is an abbreviated symbolic writing method that vastly increases speed and brevity of writing, enabling court stenographers to type faster than any typewriter speed champion. But for obvious reasons, this system has never been considered as a default computer keyboard format.
Mullaney is quite aware that these kinds of tradeoffs can result in a kind of cultural inertia that preserves the familiar and eschews the new. Language processing systems become default for many different reasons. In his book he raises the case of the QWERTY keyboard as an example of how the habituation of existing technology makes incremental improvements difficult:
Ironically, some of the clearest illustrations of the enduring power of “normal” typing comes in Western criticisms of the QWERTY keyboard. In his famous 1985 essay, “Clio and the Economics of QWERTY,” the economist Paul David posed a basic question: Given how inefficient the QWERTY keyboard is, in terms of how the letters of the Latin alphabet are arranged, why did it remain dominant? Why was it never displaced by other keyboard arrangements in history that were, he argued, demonstrably better? How did inefficiency win the day? His answer became a mainstay of economic thought: economies are shaped, not merely by rational choice, but also by the sedimented accumulation of decisions from the past. Economic paths are “path dependent.” (p. 225)
It is true that satisfactory is often the enemy of the perfect. At this stage of history, the superior Dvorak keyboard will never replace the familiar QWERTY keyboard. Pinyin was developed on the basis of many compromises, and, as Mullaney stresses, was probably not the best possible system for Chinese character input. (No system could be.) But due to many factors (including the mandate and support of the PRC government), generations of users have become accustomed to this method, and it is permanently entrenched in Chinese online culture. English spelling is famously inconsistent, and for many years there were various plans to systematize the orthography. Then came computers and automatic spell-check, and now users need not grapple with the chaos of English spelling. (Though President Trump does not seem to have discovered this digital advantage.) Flawed as they are, the technologies that first catch on have legs. As my computer-savvy father used to say, tongue-in-cheek, “The best software is the one that you’re most familiar with.”
An important historical perspective in Mullaney’s book is that Western alphabetic input did dominate the domain of information technology during the 20th century, resulting in an orthographic hegemony that did not accommodate other scripts such as Chinese, Arabic, Hebrew, Cyrillic and Devanagari. These cultures, like China, faced similar challenges in developing digital tools that could enable their writing systems to participate in the digital world. Mullaney submits that these non-Western technical innovations were a godsend to the Western-designed computer, which was late in incorporating the advantages of hypography:
It was not the Western-designed computer that saved China and the non-Western world. It was China and the non-Western world that saved the Western-designed computer—saved it, that is, from its foundational limitations, both conceptual and material. Without Input Method Editors, contextual shaping, dynamic ligatures, rendering engines, layout engines, adaptive memory, contextual analysis, autocompletion, predictive text, the “modding” of the BIOS; the hacking of printer drivers, “Chinese-on-a-chip,” and, above all, an embrace of hypography, no Western-built computer could have achieved a meaningful presence in the world beyond the Americas and Europe. Today, hypography is the global norm. Hypography made global computing possible. (p. 229)
One of the contributions of Mullaney’s historical narrative is the realization of how early these technical developments were taking place, and to what extent Chinese computer scientists were actively involved. His account is a corrective to the common assumption that computer technology was primarily a product of the West.
However, Mullaney too often characterizes this progress as a kind of digital arms race between Western and Chinese computers, both vying for first place in an all-out global contest. To state that China and the non-Western world “saved the Western-designed computer” is an exaggeration, and this binary framing is at odds with the historical account he so ably documents in his book. Even a cursory reading of The Chinese Computer reveals the process as a multinational, multigenerational collaborative effort. The invention of the computer itself was a collaborative, global effort taking place in the late 1930s and 1940s, with key milestones in Germany, the United Kingdom, and the United States. Chinese scientists, on board early and heavily engaged with the goals of Chinese information technology, began making important contributions to the nascent computer technology. From the outset, scientists and linguists throughout the world were working on pieces of the puzzle.
Every tool in human history, from the abacus to the computer, eventually becomes the shared fruits of all peoples and civilizations. The tools developed by the cast of characters in Mullaney’s book are all built upon the breakthroughs of previous eras, and these breakthroughs are modified, improved, and in the end shared by the whole world. The contributions of the Chinese scientists were substantial indeed, but none of them were working in isolation.
Mullaney also tends to anthropomorphize the Chinese and English languages, speaking as if the advancements in Chinese input represent a victory over the alphabet due to some mysterious inherent power embodied in the Chinese characters themselves. This excerpt from an interview in the Los Angeles Review of Books is typical of his characterizations:
In the Western world, people began to assume that the Latin alphabet had finally “conquered” Chinese — just like they assumed it always would. But nothing could be further from the truth.
What actually did happen?
If anything, Chinese conquered the alphabet, not the other way around.
Let’s look closely at the QWERTY keyboard in China. When we do, we find that it’s not at all how one might expect. In the Western world — or really in the “Alphabetic World” — we use the computer keyboard in a dumb, what-you-type-is-what-you-get kind of way…
And that’s not what happens with Chinese?
No. Chinese input uses the QWERTY keyboard in an entirely different manner. In China, the QWERTY keyboard is “smart,” in the sense that it makes full use of modern-day computer power to augment and accelerate the input process.
(Los Angeles Review of Books)
Again, this kind of binary thinking is not constructive. Chinese characters didn’t “conquer” the alphabet, because scripts, like languages, do not compete with each other. Rather, it was the thousands of brilliant computer scientists on both sides of the Pacific who “conquered” the daunting task of integrating Chinese characters into the digital era. And in the process, many new tools and frameworks emerged, to the betterment of computer science as a whole. It is certainly not racist to simply recognize that the Chinese characters represented a unique challenge in comparison with alphabetic writing systems. This fact does not imply that Chinese characters are a deficient or backward writing system. Every script has its own functions and flaws. Currently the most common Chinese character input methods are pinyin based. Surely it would be absurd to claim that the alphabet “conquered” Chinese because pinyin afforded a more user-friendly input than the numerous other methods. The widespread adoption of pinyin was not a “defeat” for the Chinese characters. It was a victory.
As someone who types Chinese characters into a computer on a daily basis, I feel we are living in a golden age. For me, the problem of character input has effectively been solved. There may be a need for super-fast input in certain highly technical domains, but for the vast majority of Chinese input users, the status quo is already more than adequate. I’m not a particularly proficient typist, but I’m now able to switch smoothly from English to Chinese input, effortlessly search English or Chinese texts, and take full advantage of features like predictive text and autofill in both languages. Chinese, English and increasingly the scripts of other cultures now exist side-by-side in cyberspace, easily accessed and processed. And these amazing science-fiction conveniences are due to the cast of characters in Mullaney’s book – they are my heroes.
Finally, a topic closely related to the triumph of digital input is the phenomenon of “character amnesia” (in Chinese, ti bi wang zi 提笔忘字), the increasing inability to write characters by hand. Mullaney addresses this issue at the very beginning of the book, seeming to mock the public hand-wringing about the problem:
How do we make sense of these astonishing accounts [of character amnesia]? Is it another case of moral panic in the digital age—whether concerns over “Textspeak,” emoticons, the decline of handwriting, or other matters of “language hygiene?” Or could it be that twenty-first century China is home to hundreds of millions of newly illiterate aphasics, or dysgraphic amnesiacs? If so, why don’t we find evidence of this crisis everywhere we look? A cratering economy? The collapse of higher education, perhaps? How, then, can China be one of the world’s most vibrant and wealthiest digital economies? How is it possible, moreover, that the Chinese-language internet is boiling over with activity, with an estimated 900 million internet users in mainland China alone, engaged in a frenetic, nonstop traffic in Chinese-language content? If China’s most connected, tech-savviest individuals are “incapable of writing” (a baseline definition of dysgraphia), who exactly is doing all this Chinese writing? (p. 2)
At first, the logic of this passage baffled me, and the sarcasm seemed misdirected. Surely Mullaney knows that “character amnesia” is defined as the increasing inability to write Chinese characters by hand. It wasn’t until I read the very last paragraph of the book that I understood Mullaney’s stance:
Returning one final time to Huang Zhenyu and the 2013 Chinese input competition, we might ask: Would Huang Zhenyu have been able to write out President Hu Jintao’s speech by hand, with just a pen and paper? And if he had proven incapable of doing so—if he, too, had “lifted his brush, but forgotten the character,” would we really feel comfortable calling him amnesiac, aphasic, or illiterate? Writing has changed. Our frameworks for understanding it must change as well. (pp. 231-32)
Amen. I was a bit slow in understanding Mullaney’s point, but if I’m understanding him correctly, we are in solid agreement. The upshot is that character amnesia is no longer considered a crisis, because the act of writing itself (mutatis mutandis) continues apace in daily life, and with increased speed and efficiency. Thus, counter-intuitively, character amnesia entails no fear of imminent societal collapse because communication via Chinese characters continues as usual – only digitally.
There is, of course, a loopy irony here. Character amnesia is certainly not a new phenomenon. For millennia the task of memorizing and writing Chinese characters has posed a challenge to human memory resources, and has required an inordinate investment in time. Now in the 21st century digital input has “solved” the problem of character amnesia – by exacerbating it.
Typing English (or any other language with an alphabet or limited character set) does not entail a loss of handwriting ability, because the act of typing reinforces the orthography. Unlike their counterparts in China, citizens of the Alphabet World are definitely not forgetting how to write by hand. Typing Chinese on the computer, by contrast, whether using pinyin or any other input method, disengages the user from the physical act of writing characters by hand. As time goes on, the “muscle memory” deteriorates, and the overt awareness of character components fades. The irony here is that the increase in character amnesia is in large part due to the miraculous Chinese character input tools that Mullaney describes in his fascinating book.
In a roundabout way Mullaney is asking: Is the deterioration of writing ability really a significant loss? Or do we simply find ourselves in a digital world where the activity of “writing” is carried out in a more efficient way?
There is a Chinese idiom, you de you shi 有得有失, “You win some, you lose some.” Obviously there is a cultural loss. The traditionalists are lamenting: “We’re losing touch with our 5,000 years of culture!” The Ministry of Education mandates an increase in handwriting requirements and calligraphy in the school curricula. State television airs character-writing competitions such as Chinese Character Heroes (汉字英雄) and the Chinese Characters Dictation Competition (中国汉字听写大赛) in an effort to make handwriting skill seem “cool” to the younger generation. How long should we continue to require that high school graduates be able to write 4,500 characters by hand? Culture enthusiasts have been loath even to consider dropping the requirement. Meanwhile, average citizens going about their daily tasks have already accepted the new norm of digital writing. Virtually all my Chinese students, colleagues and acquaintances will readily attest to the increasing loss of ability to write characters by hand. When I ask “Does it bother you?” they merely shrug and reply “Not at all. I just look up the character on my mobile phone.”
It’s difficult to predict where this is all heading, but based on the advancements of the past two decades, the future of writing may look very different. Speech-to-text systems are rapidly becoming more accurate and reliable. Medical science is on the road to developing brain-to-text systems, or Brain-Computer Interfaces (BCIs), enabling paralyzed individuals to translate mental, heard or spoken language directly from neural activity into text. Perhaps in the future not only will pen and paper be obsolete, but even computer keyboards will be a quaint artifact of the early 21st century. But whatever technology we use, it will be, as ever, the collective product of the ingenuity and dreams of the entire human race.
Mullaney, Thomas. The Chinese Computer: A Global History of the Information Age. The MIT Press, 2024.
“It’s Time to Get Over QWERTY”: A Q&A with Tom Mullaney on Alphabets, Chinese Characters, and Computing. Los Angeles Review of Books (LARB), May 4, 2016.