Might Qwerty be optimal on touchscreens?

It’s a common misconception that the Qwerty keyboard is designed to slow users down to prevent typewriters jamming. In fact, it’s designed to keep commonly consecutive letter pairs apart, so that two adjacent levers won’t collide.

(A more fun, but irrelevant, Qwerty story is that it is also designed such that the word ‘typewriter’ is all on the top row, to make demonstrating it easy. This story, if true, is itself fun but sucks all the fun out of the fact that the longest word that can be typed on the top row of a typewriter is ‘typewriter’. One of these is a fun fact, but I’ve no idea which.)

Nowadays, obviously, there are no swinging arms to collide, so we want the commonly used keys to be easy to reach and, where possible, consecutive letters to alternate between hands. Dvorak and Coleman have each had a stab at designing a better layout, but both aimed at the computer keyboard.

But increasingly, I type on my phone, using one very mobile thumb. I can get to any point on the screen more or less right away, but sometimes I miss, and usually the phone figures out what I meant and autocorrects it. So maybe the most important thing about any given keyboard layout is how likely it is that a typo will produce a real word, which the phone has no way of knowing isn’t what I meant.

I wondered if suddenly Qwerty might be optimal again: separating pairs of letters that can be swapped to make another real word, and separating pairs that appear next to each other in English words, aren’t totally different goals. So I thought I’d investigate.

So first I loaded the CSW12 Scrabble word list, and worked out a big table of how many places in the list you can replace each letter with each other letter to create a new word.
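The counting step described above might look something like the following. This is a minimal sketch rather than the original routine: the function name `substitution_counts` and the five-word toy list are made up for illustration, and the real run would load the full CSW12 list instead.

```python
from collections import defaultdict
from string import ascii_uppercase


def substitution_counts(words):
    """For each ordered letter pair (a, b), count the places in the word
    list where replacing an a with a b gives another word in the list."""
    vocabulary = {word.upper() for word in words}
    counts = defaultdict(int)
    for word in vocabulary:
        for i, original in enumerate(word):
            for replacement in ascii_uppercase:
                if replacement == original:
                    continue
                candidate = word[:i] + replacement + word[i + 1:]
                if candidate in vocabulary:
                    counts[original, replacement] += 1
    return counts


# Toy stand-in for the real word list (hypothetical data, not CSW12).
toy_words = ["CAT", "COT", "CUT", "BAT", "BAD"]
table = substitution_counts(toy_words)
print(table["A", "O"], table["O", "A"])
```

Because the same list supplies both the typed words and the valid words, every substitution is counted once in each direction, so the resulting table is symmetric.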

       A     B     C     D     E     F     G     H     I     J     K     L     M     N     O     P     Q     R     S     T     U     V     W     X     Y     Z
A     -   176   681   305  5253   182   216   252  4180    15   208   477   166   401  4717   297     5   483   953   453  2898    52   201    54   585    30
B   176     -  1240  1238   284  1157  1123   775    91   360   406  1003  1383   761   157  1579    14  1407  1127  1344    74   398   836    70   366   151
C   681  1240     -  1109   375   876  1218   843   162   261  1102  1004  1036  1379   265  1418    24  1117  1832  1783   120   418   816   153   274   209
D   305  1238  1109     -   715   776  1445   709   205   299   942  1600  1467  1838   227  1242    18  7549 10979  2348   107   549   726   135   616   307
E  5253   284   375   715     -   186   553   454  4712    22   459   938  1010   683  3434   470     4   956  2123  1734  1893    87   291    57  2162    57
F   182  1157   876   776   186     -   723   592    76   261   357   846   829   632   126  1040    11   782  1037  1121    57   387   620    30   203    89
G   216  1123  1218  1445   553   723     -   556   186   333   747   812   810  1013   191  1018    39   914  1194  1514   103   373   669   130   371   180
H   252   775   843   709   454   592   556     -   137   261   641  1186   979   635   260  1098     6  1079  1215  1453    81   239   805    34   356   137
I  4180    91   162   205  4712    76   186   137     -    35   129   519   130   380  2786   154     3   387   525   331  2671    42   200    23  1118    28
J    15   360   261   299    22   261   333   261    35     -   117   311   301   224    27   331     2   344   320   346     1   125   196     7   136    66
K   208   406  1102   942   459   357   747   641   129   117     -   947   779   892   145   911    50   816   929  1340    71   379   517   119   273   180
L   477  1003  1004  1600   938   846   812  1186   519   311   947     -  1420  1938   479  1363    15  3271  1876  2049   286   599   899   142   394   248
M   166  1383  1036  1467  1010   829   810   979   130   301   779  1420     -  1085   238  1898    14  1321  1225  2912   119   575   747   130   380   250
N   401   761  1379  1838   683   632  1013   635   380   224   892  1938  1085     -   286  1367     7  2439  1925  2091   301   529   711   259   440   250
O  4717   157   265   227  3434   126   191   260  2786    27   145   479   238   286     -   296     6   574   531   389  2291    58   314    36   596    27
P   297  1579  1418  1242   470  1040  1018  1098   154   331   911  1363  1898  1367   296     -    11  1255  1528  2061   166   580  1011   141   377   243
Q     5    14    24    18     4    11    39     6     3     2    50    15    14     7     6    11     -    12    30    16     0     5    10     2     2     2
R   483  1407  1117  7549   956   782   914  1079   387   344   816  3271  1321  2439   574  1255    12     -  4806  2173   447   591   885   205   613   250
S   953  1127  1832 10979  2123  1037  1194  1215   525   320   929  1876  1225  1925   531  1528    30  4806     -  3126   327   617   887   232  2621  6540
T   453  1344  1783  2348  1734  1121  1514  1453   331   346  1340  2049  2912  2091   389  2061    16  2173  3126     -   256   682  1187   215   602   429
U  2898    74   120   107  1893    57   103    81  2671     1    71   286   119   301  2291   166     0   447   327   256     -    43   416    15   239    12
V    52   398   418   549    87   387   373   239    42   125   379   599   575   529    58   580     5   591   617   682    43     -   353    96   142   154
W   201   836   816   726   291   620   669   805   200   196   517   899   747   711   314  1011    10   885   887  1187   416   353     -   108   400   136
X    54    70   153   135    57    30   130    34    23     7   119   142   130   259    36   141     2   205   232   215    15    96   108     -    74    58
Y   585   366   274   616  2162   203   371   356  1118   136   273   394   380   440   596   377     2   613  2621   602   239   142   400    74     -   108
Z    30   151   209   307    57    89   180   137    28    66   180   248   250   250    27   243     2   250  6540   429    12   154   136    58   108     -


As you can see, the letters involved in typos that are genuine words are also the most common letters — except C and P. (The frequency values are on an arbitrary scale to match the typo figures.)

Then I wrote a Python routine to generate a ‘badness’ score for each layout, which is the total number of words you can make by replacing a letter of another word with one of the (up to six) keys adjacent to it. Running it on 10,000 random layouts, the average badness is around 83,603, with a standard deviation of 14,024.
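A badness routine along those lines could be sketched as below. The original code isn't shown, so the details are assumptions: the grid is taken to be Qwerty-shaped (10, 9 and 7 keys) with each row staggered half a key from the one above, two keys count as adjacent when they touch on that hexagonal grid, and `table` is a hypothetical dictionary keyed by ordered letter pairs. The toy table here is invented purely to exercise the code.

```python
import random
import statistics
from string import ascii_uppercase

ROW_LENGTHS = (10, 9, 7)       # Qwerty-shaped grid
ROW_OFFSETS = (0.0, 0.5, 1.0)  # assumed half-key stagger between rows


def key_positions(layout):
    """Map each letter of a 26-character layout string to (row, x)."""
    positions, start = {}, 0
    for row, (length, offset) in enumerate(zip(ROW_LENGTHS, ROW_OFFSETS)):
        for col, letter in enumerate(layout[start:start + length]):
            positions[letter] = (row, col + offset)
        start += length
    return positions


def adjacent_pairs(layout):
    """Yield every ordered pair of letters whose keys touch (at most six each)."""
    positions = key_positions(layout)
    for a, (row_a, x_a) in positions.items():
        for b, (row_b, x_b) in positions.items():
            if a == b:
                continue
            same_row = row_a == row_b and abs(x_a - x_b) == 1
            next_row = abs(row_a - row_b) == 1 and abs(x_a - x_b) == 0.5
            if same_row or next_row:
                yield a, b


def badness(layout, table):
    """Sum the substitution counts over all ordered pairs of adjacent keys."""
    return sum(table.get(pair, 0) for pair in adjacent_pairs(layout))


def random_layout_stats(table, trials=10_000, seed=0):
    """Mean and standard deviation of badness over random layouts."""
    rng = random.Random(seed)
    letters = list(ascii_uppercase)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(badness("".join(letters), table))
    return statistics.mean(scores), statistics.stdev(scores)


# Invented toy counts, only to show the call shape.
QWERTY = "QWERTYUIOPASDFGHJKLZXCVBNM"
toy_table = {("Q", "W"): 3, ("W", "Q"): 3, ("A", "Z"): 5, ("Z", "A"): 5,
             ("Q", "P"): 100, ("P", "Q"): 100}
print(badness(QWERTY, toy_table))
print(random_layout_stats(toy_table, trials=200))
```

Summing over ordered pairs means each adjacent pair of keys is counted in both directions, so with a symmetric table the score always comes out even.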

Here are some other layouts I tried:

Layout         Badness   Standard deviations above mean
Qwerty         119,170    2.54
Dvorak         121,458    2.70
Colemak        112,354    2.05
Best random     46,414   −2.65
Worst random   151,438    4.84
Alphabetic      74,064   −0.68
Best I found    31,992   −3.68
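The last column is just each score’s distance from the random-layout average, measured in standard deviations. As a quick check of the arithmetic, using only the figures quoted above (the variable and function names are mine):

```python
MEAN = 83_603   # average badness of the random layouts, from the text
STDEV = 14_024  # their standard deviation, from the text

badness_scores = {
    "Qwerty": 119_170,
    "Dvorak": 121_458,
    "Colemak": 112_354,
    "Best random": 46_414,
    "Worst random": 151_438,
    "Alphabetic": 74_064,
    "Best I found": 31_992,
}


def stds_above_mean(score):
    """Distance of a score from the mean, in standard deviations."""
    return (score - MEAN) / STDEV


for name, score in badness_scores.items():
    print(f"{name:<13}{score:>8,}  {stds_above_mean(score):+.2f}")
```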

(Predictable answer to question in title: “haha, no”.) Alphabetic uses the same key layout as Qwerty: 10 on the top row, 9 on the second and 7 on the bottom. The ‘best I found’ layout was derived from a random board on that Qwerty grid (since actually Dvorak and Coleman don’t really fit on a phone), by swapping letter pairs at random and keeping each swap only if it lowered the badness score. (This is a very crude ‘genetic algorithm’: with a single candidate and no crossover, it is closer to what is usually called hill climbing.) I think I did 5,000 steps, five or six times. Here’s the layout it found:

D W E B K R I T Q S
  O J V U Z F X A M  
  C L G H N Y P  
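The swap-and-keep search described above can be sketched roughly as follows. This is an illustration under assumptions, not the original code: `hill_climb` is a made-up name, a swap is kept only when it strictly lowers the score, and `toy_score` is a throwaway stand-in for the real badness function so that the example runs on its own.

```python
import random
from string import ascii_uppercase


def hill_climb(layout, score, steps=5_000, seed=None):
    """Swap random letter pairs, keeping a swap only if it lowers the score."""
    rng = random.Random(seed)
    current = list(layout)
    best = score("".join(current))
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        trial = score("".join(current))
        if trial < best:
            best = trial  # keep the improvement
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return "".join(current), best


def toy_score(layout):
    """Stand-in score: number of vowels sitting next to another vowel."""
    return sum(a in "AEIOU" and b in "AEIOU" for a, b in zip(layout, layout[1:]))


start = "".join(random.Random(1).sample(ascii_uppercase, 26))
found, found_score = hill_climb(start, toy_score, steps=2_000, seed=1)
print(start, toy_score(start))
print(found, found_score)
```

Repeating the run from several random starting boards and keeping the best result guards against getting stuck in a poor local minimum, which is presumably why it was run five or six times.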

The most obvious thing it’s done is put S (the most typoable letter) in a corner and shoved Q up against it. One potential improvement to the model would be to account for second-nearest neighbours, since flagging an error but correcting it to the wrong thing isn’t much better than missing it.

Another thing it’s done is put all the rarest letters in the middle where they have lots of neighbours — almost precisely the opposite of what Dvorak and Coleman did. Which makes sense, both intuitively and because all the standard layouts are in the worst 5% of all layouts (assuming normal distribution).
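For the ‘worst 5%’ figure, the standard-deviation scores can be turned into tail fractions. Here is a small sketch using Python’s `statistics.NormalDist`, which takes the normal-distribution assumption at face value (the function name is mine):

```python
from statistics import NormalDist

# Standard deviations above the mean, from the table of layouts.
z_scores = {"Qwerty": 2.54, "Dvorak": 2.70, "Colemak": 2.05}


def fraction_worse(z):
    """Share of layouts expected to score worse, if badness were normal."""
    return 1 - NormalDist().cdf(z)


for name, z in z_scores.items():
    print(f"{name}: {fraction_worse(z):.2%} of layouts would be worse")
```

On that assumption each of the three comes out well inside the worst 5%.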

Anyway, I think we can all agree this is plainly the best possible keyboard layout for smartphones, and we should name it Taylak and petition Apple and Google to include it as the default for everything ever. I certainly can’t imagine how using the same layout on phones and computers could possibly be more desirable than this.

Here, to end on, is the worst layout I could find, with 204,290 = μ + 8.61σ possible real-word typos:

V N T M B G E I J Q
  Z S D P C K A O X  
  Y R L F H W U  

Nobody use that layout.

9 thoughts on “Might Qwerty be optimal on touchscreens?”

  1. “since actually Dvorak and Coleman don’t really fit on a phone”

    ?

    I use Dvorak on my phone and don’t have a problem with it – either portrait or landscape :)

    There’s even a decent dictionary with it that catches most of my typos. It’s slowly learning the ‘typos’ that aren’t actually typos but computer/programming related words that aren’t normally common.

    • I guess I mean that they’re funny shapes. It’s fine on a physical keyboard because you just move the punctuation to fit, but there is no punctuation on an iPhone keyboard so that’s not really an option.

      Qwerty is 10 keys wide. Colemak and Dvorak are 10.5 keys wide. You can get them on the screen, sure, but it’s a weird thing to do to a touchscreen unless you already use one of those layouts on your PC (and even then, see #2). Also, Colemak and Dvorak have funny spurs and letters out on their own, which artificially reduces their badness scores: those keys have fewer letter neighbours, but in practice a miss there is likely to end the word by mistake. My model doesn’t account for that, and in any case, whether they fit or not, they’re a perverse shape to use for an all-new phone keyboard layout, so I didn’t use them.

      • It’s not an iPhone :) And the (portrait) layout is (including non-letter keys) 9, 10, 9 (6)

        [Case] p y f g c r l
        a o e u i d h t n s
        q j k x b m w v z
        [?123][opt][speak][ s p a c e ][.][enter]

        Landscape separates that into two halves and puts a (4,4,4) numpad in the middle.

        Punctuation (apart from period) is (largely) accessible by holding down a key (1-0 are across the home row e.g.) or by using the [?123] button to switch the keyboard.

        I tried searching for images but couldn’t find anything close enough to match.

    • In fact it may be better to use a 6,7,7,6 layout for phones. It’d probably have a higher ‘badness’ score, but squarer keys. It would depend whether people were better at aiming thumbs horizontally or vertically (and whether you like typing portrait or landscape).

      Also my model assumes the standard hexagonal layout — which on a phone should surely have hexagonal keys, even if the on-screen graphic is square? I don’t know if there’s any good reason to use that on a phone. Indeed, the iPhone Qwerty board has square tessellation at the bottom so the numbers here aren’t directly transferable.

      There are many refinements that could be made.

  2. I used Colemak on my phone for a while and found it quite awkward to use. On the other hand, I haven’t found using different layouts on my phone and on proper keyboards to be a problem at all.

    If I can work out an easy way to put your layout on my phone I might try it out, just for kicks.

    • I use Colemak on my PCs (used to use Dvorak but the keyboard shortcuts drove me a bit mad — having paste next to ‘please delete all my work’ isn’t good) and I quite like having Qwerty on my phone so I don’t unlearn it completely.

  3. Really I should have used a passage of real English for the source of typos (nobody sits down and types out the Scrabble word list on their phone), but my experience with the Scrabble hacking a while back suggests that the word list behaves much like real English. Using the same data for typed and valid words also gives a symmetrical table that’s faster to compute, and a handy parity check on the badness scores (they are all even).
