It’s a common misconception that the Qwerty keyboard is designed to slow users down to prevent typewriters jamming. In fact, it’s designed to keep commonly consecutive letter pairs apart, so that two adjacent levers won’t collide.
(A more fun, but irrelevant, Qwerty story is that it is also designed such that the word ‘typewriter’ is all on the top row, to make demonstrating it easy. This story, if true, is itself fun but sucks all the fun out of the fact that the longest word that can be typed on the top row of a typewriter is ‘typewriter’. One of these is a fun fact, but I’ve no idea which.)
Nowadays, obviously, there are no swinging arms to collide, so we want the commonly used keys to be easy to reach and, where possible, to alternate between hands. Dvorak and Colemak have each had a stab at designing a better layout, but both were aimed at the computer keyboard.
But increasingly, I type on my phone, using one very mobile thumb. I can get to any point on the screen more or less right away, but sometimes I miss, and usually the phone figures out what I meant and autocorrects it. So maybe the most important thing about a keyboard layout is how likely it is that a typo will produce a real word, which the phone has no way of knowing isn’t what I meant.
I wondered whether Qwerty might suddenly be optimal again: separating pairs of letters that can be swapped to make another real word, and separating pairs that appear next to each other in English words, aren’t totally different goals. So I thought I’d investigate.
So first I loaded the CSW12 Scrabble word list and worked out a big table counting, for each pair of letters, how many places in the list you can replace the first letter with the second and still get a valid word.
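A minimal sketch of how such a table could be built. It assumes the word list has already been read into a list of strings containing only A to Z. The post doesn’t show its code, so `substitution_counts` is a hypothetical name. For each word and each position, the function tries every other letter and counts the substitution if the result is also in the list:

```python
from collections import defaultdict
from string import ascii_uppercase


def substitution_counts(words):
    """Map (a, b) to the number of (word, position) slots holding letter a
    where substituting b yields another word in the list.

    Hypothetical helper: assumes `words` contains only A-Z strings.
    """
    vocab = {w.upper() for w in words}
    counts = defaultdict(int)
    for word in vocab:
        for i, a in enumerate(word):
            for b in ascii_uppercase:
                if b != a and word[:i] + b + word[i + 1:] in vocab:
                    counts[(a, b)] += 1
    return counts


# Toy example: CAT/BAT/CAR/BAR are linked by C<->B and T<->R substitutions.
counts = substitution_counts(["CAT", "BAT", "CAR", "BAR"])
print(counts[("C", "B")], counts[("T", "R")])  # 2 2
```

Every typed word is also a valid word, so each substitution is found once from each end. This makes the table symmetric.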
|   | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | – | 176 | 681 | 305 | 5253 | 182 | 216 | 252 | 4180 | 15 | 208 | 477 | 166 | 401 | 4717 | 297 | 5 | 483 | 953 | 453 | 2898 | 52 | 201 | 54 | 585 | 30 |
| B | 176 | – | 1240 | 1238 | 284 | 1157 | 1123 | 775 | 91 | 360 | 406 | 1003 | 1383 | 761 | 157 | 1579 | 14 | 1407 | 1127 | 1344 | 74 | 398 | 836 | 70 | 366 | 151 |
| C | 681 | 1240 | – | 1109 | 375 | 876 | 1218 | 843 | 162 | 261 | 1102 | 1004 | 1036 | 1379 | 265 | 1418 | 24 | 1117 | 1832 | 1783 | 120 | 418 | 816 | 153 | 274 | 209 |
| D | 305 | 1238 | 1109 | – | 715 | 776 | 1445 | 709 | 205 | 299 | 942 | 1600 | 1467 | 1838 | 227 | 1242 | 18 | 7549 | 10979 | 2348 | 107 | 549 | 726 | 135 | 616 | 307 |
| E | 5253 | 284 | 375 | 715 | – | 186 | 553 | 454 | 4712 | 22 | 459 | 938 | 1010 | 683 | 3434 | 470 | 4 | 956 | 2123 | 1734 | 1893 | 87 | 291 | 57 | 2162 | 57 |
| F | 182 | 1157 | 876 | 776 | 186 | – | 723 | 592 | 76 | 261 | 357 | 846 | 829 | 632 | 126 | 1040 | 11 | 782 | 1037 | 1121 | 57 | 387 | 620 | 30 | 203 | 89 |
| G | 216 | 1123 | 1218 | 1445 | 553 | 723 | – | 556 | 186 | 333 | 747 | 812 | 810 | 1013 | 191 | 1018 | 39 | 914 | 1194 | 1514 | 103 | 373 | 669 | 130 | 371 | 180 |
| H | 252 | 775 | 843 | 709 | 454 | 592 | 556 | – | 137 | 261 | 641 | 1186 | 979 | 635 | 260 | 1098 | 6 | 1079 | 1215 | 1453 | 81 | 239 | 805 | 34 | 356 | 137 |
| I | 4180 | 91 | 162 | 205 | 4712 | 76 | 186 | 137 | – | 35 | 129 | 519 | 130 | 380 | 2786 | 154 | 3 | 387 | 525 | 331 | 2671 | 42 | 200 | 23 | 1118 | 28 |
| J | 15 | 360 | 261 | 299 | 22 | 261 | 333 | 261 | 35 | – | 117 | 311 | 301 | 224 | 27 | 331 | 2 | 344 | 320 | 346 | 1 | 125 | 196 | 7 | 136 | 66 |
| K | 208 | 406 | 1102 | 942 | 459 | 357 | 747 | 641 | 129 | 117 | – | 947 | 779 | 892 | 145 | 911 | 50 | 816 | 929 | 1340 | 71 | 379 | 517 | 119 | 273 | 180 |
| L | 477 | 1003 | 1004 | 1600 | 938 | 846 | 812 | 1186 | 519 | 311 | 947 | – | 1420 | 1938 | 479 | 1363 | 15 | 3271 | 1876 | 2049 | 286 | 599 | 899 | 142 | 394 | 248 |
| M | 166 | 1383 | 1036 | 1467 | 1010 | 829 | 810 | 979 | 130 | 301 | 779 | 1420 | – | 1085 | 238 | 1898 | 14 | 1321 | 1225 | 2912 | 119 | 575 | 747 | 130 | 380 | 250 |
| N | 401 | 761 | 1379 | 1838 | 683 | 632 | 1013 | 635 | 380 | 224 | 892 | 1938 | 1085 | – | 286 | 1367 | 7 | 2439 | 1925 | 2091 | 301 | 529 | 711 | 259 | 440 | 250 |
| O | 4717 | 157 | 265 | 227 | 3434 | 126 | 191 | 260 | 2786 | 27 | 145 | 479 | 238 | 286 | – | 296 | 6 | 574 | 531 | 389 | 2291 | 58 | 314 | 36 | 596 | 27 |
| P | 297 | 1579 | 1418 | 1242 | 470 | 1040 | 1018 | 1098 | 154 | 331 | 911 | 1363 | 1898 | 1367 | 296 | – | 11 | 1255 | 1528 | 2061 | 166 | 580 | 1011 | 141 | 377 | 243 |
| Q | 5 | 14 | 24 | 18 | 4 | 11 | 39 | 6 | 3 | 2 | 50 | 15 | 14 | 7 | 6 | 11 | – | 12 | 30 | 16 | 0 | 5 | 10 | 2 | 2 | 2 |
| R | 483 | 1407 | 1117 | 7549 | 956 | 782 | 914 | 1079 | 387 | 344 | 816 | 3271 | 1321 | 2439 | 574 | 1255 | 12 | – | 4806 | 2173 | 447 | 591 | 885 | 205 | 613 | 250 |
| S | 953 | 1127 | 1832 | 10979 | 2123 | 1037 | 1194 | 1215 | 525 | 320 | 929 | 1876 | 1225 | 1925 | 531 | 1528 | 30 | 4806 | – | 3126 | 327 | 617 | 887 | 232 | 2621 | 6540 |
| T | 453 | 1344 | 1783 | 2348 | 1734 | 1121 | 1514 | 1453 | 331 | 346 | 1340 | 2049 | 2912 | 2091 | 389 | 2061 | 16 | 2173 | 3126 | – | 256 | 682 | 1187 | 215 | 602 | 429 |
| U | 2898 | 74 | 120 | 107 | 1893 | 57 | 103 | 81 | 2671 | 1 | 71 | 286 | 119 | 301 | 2291 | 166 | 0 | 447 | 327 | 256 | – | 43 | 416 | 15 | 239 | 12 |
| V | 52 | 398 | 418 | 549 | 87 | 387 | 373 | 239 | 42 | 125 | 379 | 599 | 575 | 529 | 58 | 580 | 5 | 591 | 617 | 682 | 43 | – | 353 | 96 | 142 | 154 |
| W | 201 | 836 | 816 | 726 | 291 | 620 | 669 | 805 | 200 | 196 | 517 | 899 | 747 | 711 | 314 | 1011 | 10 | 885 | 887 | 1187 | 416 | 353 | – | 108 | 400 | 136 |
| X | 54 | 70 | 153 | 135 | 57 | 30 | 130 | 34 | 23 | 7 | 119 | 142 | 130 | 259 | 36 | 141 | 2 | 205 | 232 | 215 | 15 | 96 | 108 | – | 74 | 58 |
| Y | 585 | 366 | 274 | 616 | 2162 | 203 | 371 | 356 | 1118 | 136 | 273 | 394 | 380 | 440 | 596 | 377 | 2 | 613 | 2621 | 602 | 239 | 142 | 400 | 74 | – | 108 |
| Z | 30 | 151 | 209 | 307 | 57 | 89 | 180 | 137 | 28 | 66 | 180 | 248 | 250 | 250 | 27 | 243 | 2 | 250 | 6540 | 429 | 12 | 154 | 136 | 58 | 108 | – |
As you can see, the letters most often involved in typos that produce genuine words are also the most common letters, except for C and P. (The frequency values are on an arbitrary scale to match the typo figures.)
Then I wrote a Python routine to generate a ‘badness’ score for each layout: the total number of words you can make by replacing a letter of another word with one of the (up to) six keys adjacent to it. Across 10,000 random layouts, the mean badness is 83,603, with a standard deviation of 14,024.
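A sketch of the scoring step under stated assumptions, not the post’s own routine. The layout is given as three strings of 10, 9 and 7 letters. Each row is modelled as sitting half a key to the right of the row above, so an interior key has six neighbours. `counts` is a letter-pair table stored as a dict keyed by `(letter, replacement)` tuples. The function names are illustrative:

```python
import random
import statistics
from string import ascii_uppercase

ROW_LENGTHS = (10, 9, 7)  # Qwerty-shaped grid: 10 keys, then 9, then 7


def neighbours(rows):
    """Yield ordered (key, neighbour) pairs on a staggered hexagonal grid.

    Assumes each row sits half a key right of the row above, so a key at
    (r, c) touches (r, c-1), (r, c+1), (r-1, c), (r-1, c+1), (r+1, c-1), (r+1, c).
    """
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            for rr, cc in ((r, c - 1), (r, c + 1), (r - 1, c), (r - 1, c + 1),
                           (r + 1, c - 1), (r + 1, c)):
                if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr]):
                    yield key, rows[rr][cc]


def badness(rows, counts):
    """Sum substitution counts over every ordered pair of adjacent keys."""
    return sum(counts.get(pair, 0) for pair in neighbours(rows))


def random_layout(rng):
    """Shuffle A-Z and split it into rows of ROW_LENGTHS."""
    letters = list(ascii_uppercase)
    rng.shuffle(letters)
    rows, start = [], 0
    for n in ROW_LENGTHS:
        rows.append("".join(letters[start:start + n]))
        start += n
    return rows


def random_layout_stats(counts, trials=10_000, seed=0):
    """Mean and sample standard deviation of badness over random layouts."""
    rng = random.Random(seed)
    scores = [badness(random_layout(rng), counts) for _ in range(trials)]
    return statistics.mean(scores), statistics.stdev(scores)
```

With a full CSW12 table, `random_layout_stats(counts)` is the analogue of the mean and standard deviation quoted above. The exact figures depend on the neighbour model and the random sample.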
Here are some other layouts I tried:
| Layout | Badness | Standard deviations above mean |
|---|---|---|
| Qwerty | 119,170 | 2.54 |
| Dvorak | 121,458 | 2.70 |
| Colemak | 112,354 | 2.05 |
| Best random | 46,414 | −2.65 |
| Worst random | 151,438 | 4.84 |
| Alphabetic | 74,064 | −0.68 |
| Best I found | 31,992 | −3.68 |
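The last column is a z-score: (badness − mean) / standard deviation, using the random-layout figures of 83,603 and 14,024. A quick check of the arithmetic:

```python
MEAN, STD = 83_603, 14_024  # random-layout mean and standard deviation

badness = {
    "Qwerty": 119_170,
    "Dvorak": 121_458,
    "Colemak": 112_354,
    "Best random": 46_414,
    "Worst random": 151_438,
    "Alphabetic": 74_064,
    "Best I found": 31_992,
}

# Standard deviations above the mean, rounded to two places as in the table.
z_scores = {name: round((score - MEAN) / STD, 2) for name, score in badness.items()}

for name, z in z_scores.items():
    print(f"{name:<13} {z:+.2f}")
```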
(Predictable answer to the question in the title: “haha, no”.) Alphabetic uses the same key layout as Qwerty: 10 keys on the top row, 9 on the second and 7 on the bottom. The ‘best I found’ layout started from a random board on that Qwerty grid (since actually Dvorak and Colemak don’t really fit on a phone), then repeatedly swapped a random pair of letters, keeping the swap if it seemed to help. (This is sometimes loosely called a ‘genetic algorithm’, though with no population or crossover it’s really just crude hill climbing.) I think I did 5,000 steps, five or six times. Here’s the layout it found:
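A minimal sketch of that search. It assumes a hexagonal-neighbour model, with rows offset by half a key, and a count table stored as a dict keyed by `(letter, replacement)` tuples. The function names and the accept-if-not-worse rule are assumptions, not the post’s code. This is plain stochastic hill climbing with random restarts:

```python
import random
from string import ascii_uppercase

ROW_LENGTHS = (10, 9, 7)  # Qwerty-shaped grid


def to_rows(letters):
    """Split a 26-letter sequence into rows of ROW_LENGTHS."""
    rows, start = [], 0
    for n in ROW_LENGTHS:
        rows.append("".join(letters[start:start + n]))
        start += n
    return rows


def badness(rows, counts):
    """Sum counts over ordered adjacent key pairs (rows offset by half a key)."""
    total = 0
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            for rr, cc in ((r, c - 1), (r, c + 1), (r - 1, c), (r - 1, c + 1),
                           (r + 1, c - 1), (r + 1, c)):
                if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr]):
                    total += counts.get((key, rows[rr][cc]), 0)
    return total


def hill_climb(counts, steps=5_000, rng=None):
    """Start from a random board; each step swaps two random letters and
    keeps the swap only if it does not increase badness."""
    rng = rng or random.Random()
    letters = list(ascii_uppercase)
    rng.shuffle(letters)
    best = badness(to_rows(letters), counts)
    for _ in range(steps):
        i, j = rng.sample(range(len(letters)), 2)
        letters[i], letters[j] = letters[j], letters[i]
        score = badness(to_rows(letters), counts)
        if score <= best:
            best = score
        else:
            letters[i], letters[j] = letters[j], letters[i]  # undo the swap
    return to_rows(letters), best


def search(counts, restarts=6, steps=5_000, seed=0):
    """Run several independent climbs and return the best (rows, score)."""
    rng = random.Random(seed)
    return min((hill_climb(counts, steps, rng) for _ in range(restarts)),
               key=lambda result: result[1])
```

Accepting swaps that leave the score unchanged lets the search drift across plateaus. Because it only ever keeps one candidate, it can get stuck in a local minimum, which is why running it several times from different random starts helps.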
| Row | Keys |
|---|---|
| Top | D W E B K R I T Q S |
| Middle | O J V U Z F X A M |
| Bottom | C L G H N Y P |
The most obvious thing it’s done is put S (the most typo-prone letter) in a corner and shove Q up against it. One possible improvement to the model would be to account for second-nearest neighbours as well, since flagging an error but correcting it to the wrong thing isn’t much better than missing it.
Another thing it’s done is put all the rarest letters in the middle, where they have the most neighbours: almost precisely the opposite of what Dvorak and Colemak did. That makes sense, both intuitively and because all the standard layouts are in the worst 5% of all layouts (assuming a normal distribution).
Anyway, I think we can all agree this is plainly the best possible keyboard layout for smartphones, and we should name it Taylak and petition Apple and Google to include it as the default for everything ever. I certainly can’t imagine how using the same layout on phones and computers could possibly be more desirable than this.
Here, to end on, is the worst layout I could find, with 204,290 possible real-word typos (μ + 8.61σ, where μ and σ are the random-layout mean and standard deviation):
| Row | Keys |
|---|---|
| Top | V N T M B G E I J Q |
| Middle | Z S D P C K A O X |
| Bottom | Y R L F H W U |
Nobody use that layout.
“since actually Dvorak and Colemak don’t really fit on a phone”
?
I use Dvorak on my phone and don’t have a problem with it – either portrait or landscape :)
There’s even a decent dictionary with it that catches most of my typos. It’s slowly learning the ‘typos’ that aren’t actually typos but computer- and programming-related words that aren’t normally common.
I guess I mean that they’re funny shapes. It’s fine on a physical keyboard because you just move the punctuation to fit, but there is no punctuation on an iPhone keyboard so that’s not really an option.
Qwerty is 10 keys wide. Colemak and Dvorak are 10.5 keys wide. You can get them on the screen, sure, but it’s a weird thing to do to a touchscreen unless you already use one of those layouts on your PC (and even then, see #2). Also, Colemak and Dvorak have funny spurs and letters out on their own, which artificially reduces their badness scores, since in reality a miss there is likely to end the word by mistake. My model doesn’t account for that, and in any case, whether they fit or not, they’re a perverse shape to use for an all-new phone keyboard layout, so I didn’t use them.
It’s not an iPhone :) And the (portrait) layout is (including non-letter keys) 9, 10, 9 (6)
[Case] p y f g c r l
a o e u i d h t n s
q j k x b m w v z
[?123][opt][speak][ s p a c e ][.][enter]
Landscape separates that into two halves and puts a (4,4,4) numpad in the middle.
Punctuation (apart from the period) is largely accessible by holding down a key (e.g. 1-0 are across the home row) or by using the [?123] button to switch keyboards.
I tried searching for images but couldn’t find anything close enough to match.
In fact, it may be better to use a 6,7,7,6 layout for phones. It’d probably have a higher ‘badness’ score, but squarer keys. It would depend on whether people are better at aiming thumbs horizontally or vertically (and whether you like typing in portrait or landscape).
Also, my model assumes the standard staggered (effectively hexagonal) layout. On a phone, shouldn’t that surely mean hexagonal touch targets, even if the on-screen graphic is square? I don’t know if there’s any good reason to use that layout on a phone. Indeed, the iPhone Qwerty board has a square tessellation at the bottom, so the numbers here aren’t directly transferable.
There are many refinements that could be made.
I used Colemak on my phone for a while and found it quite awkward to use. On the other hand, I haven’t found using different layouts on my phone and on proper keyboards to be a problem at all.
If I can work out an easy way to put your layout on my phone I might try it out, just for kicks.
I use Colemak on my PCs (used to use Dvorak but the keyboard shortcuts drove me a bit mad — having paste next to ‘please delete all my work’ isn’t good) and I quite like having Qwerty on my phone so I don’t unlearn it completely.
Really, I should have used a passage of real English as the source of typos; nobody sits down and types out the Scrabble word list on their phone. But my experience with the Scrabble hacking a while back suggests that the word list behaves much like real English. Using the same data for typed and valid words also gives a symmetrical table, which is faster to compute and gives a handy parity check on the badness scores: they are all even, because every real-word typo between two neighbouring keys is counted once in each direction.
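The parity argument can be made concrete with a toy word list. The sketch below uses hypothetical helper names and is quadratic in the list size, so it is only suitable for small lists. It finds pairs of words that are one letter apart and credits each pair to both (a, b) and (b, a). The keyboard neighbour relation is also symmetric, so every term in a badness sum then appears twice, and the total is even:

```python
from collections import Counter
from itertools import combinations


def one_letter_pairs(words):
    """Yield (a, b) for each unordered pair of words that differ in exactly
    one position, where a and b are the letters at that position.
    Quadratic in the list size: fine for a toy list, far too slow for CSW12."""
    for w1, w2 in combinations(sorted(set(words)), 2):
        if len(w1) != len(w2):
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:
            yield diffs[0]


def symmetric_counts(words):
    """Each word pair is a possible typo in both directions, so it adds one
    to (a, b) and one to (b, a): the resulting table is symmetric."""
    counts = Counter()
    for a, b in one_letter_pairs(words):
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    return counts


# Toy check: four words, four one-letter-apart pairs, eight ordered entries.
counts = symmetric_counts(["CAT", "BAT", "CAR", "BAR"])
assert all(counts[(a, b)] == counts[(b, a)] for a, b in counts)
print(sum(counts.values()))  # 8
```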
Ben Goldacre just posted this link
http://norvig.com/mayzner.html
Loads of better data there for doing this more thoroughly.