Might Qwerty be optimal on touchscreens?

July 2012

It’s a common misconception that the Qwerty keyboard is designed to slow users down to prevent typewriters jamming. In fact, it’s designed to keep commonly consecutive letter pairs apart, so that two adjacent levers won’t collide.

(A more fun, but irrelevant, Qwerty story is that it is also designed such that the word ‘typewriter’ is all on the top row, to make demonstrating it easy. This story, if true, is itself fun but sucks all the fun out of the fact that the longest word that can be typed on the top row of a typewriter is ‘typewriter’. One of these is a fun fact, but I’ve no idea which.)

Nowadays, obviously, there are no swinging arms to collide, so we want the commonly-used keys to be reachable and, where possible, the hands to alternate. Dvorak and Coleman have each had a stab at designing a better layout, but both aimed at the computer keyboard.

But increasingly, I type on my phone, using one very mobile thumb. I can get to any point on the screen, more-or-less right away – but sometimes I miss, and usually the phone figures out what I meant and autocorrects it. So maybe the most important thing about any given keyboard layout is how likely it is that a typo will result in a real word, which the phone has no way of knowing isn’t what I meant.

I wondered if suddenly Qwerty might be optimal again – keeping apart pairs of letters that appear next to each other in English words, and keeping apart pairs that can be swapped to make another real word, aren’t totally different goals. So I thought I’d investigate.

So first I loaded the CSW12 Scrabble word list, and worked out a big table of how many places in the list you can replace each letter with each other letter to create a new word.
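A minimal sketch of how a table like this could be built. It is not the original code: the function name `substitution_counts` is made up here, and a toy word list stands in for CSW12, which isn't reproduced in this post.

```python
from collections import Counter
from string import ascii_uppercase


def substitution_counts(words):
    """For each ordered letter pair (old, new), count the positions in the
    word list where replacing `old` with `new` gives another listed word."""
    word_set = {w.upper() for w in words}
    counts = Counter()
    for word in word_set:
        for i, old in enumerate(word):
            for new in ascii_uppercase:
                if new == old:
                    continue
                if word[:i] + new + word[i + 1:] in word_set:
                    counts[old, new] += 1
    return counts


# Toy stand-in for the real word list (an assumption for illustration only).
toy_words = ["cat", "cot", "cut", "bat", "bit", "but", "dog"]
counts = substitution_counts(toy_words)

print(counts["A", "O"])  # cat -> cot
print(counts["C", "B"])  # cat -> bat, cut -> but
```

Counted this way the table comes out symmetric, since every substitution that turns one word into another also works in reverse.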

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z
A		176	681	305	5253	182	216	252	4180	15	208	477	166	401	4717	297	5	483	953	453	2898	52	201	54	585	30
B	176		1240	1238	284	1157	1123	775	91	360	406	1003	1383	761	157	1579	14	1407	1127	1344	74	398	836	70	366	151
C	681	1240		1109	375	876	1218	843	162	261	1102	1004	1036	1379	265	1418	24	1117	1832	1783	120	418	816	153	274	209
D	305	1238	1109		715	776	1445	709	205	299	942	1600	1467	1838	227	1242	18	7549	10979	2348	107	549	726	135	616	307
E	5253	284	375	715		186	553	454	4712	22	459	938	1010	683	3434	470	4	956	2123	1734	1893	87	291	57	2162	57
F	182	1157	876	776	186		723	592	76	261	357	846	829	632	126	1040	11	782	1037	1121	57	387	620	30	203	89
G	216	1123	1218	1445	553	723		556	186	333	747	812	810	1013	191	1018	39	914	1194	1514	103	373	669	130	371	180
H	252	775	843	709	454	592	556		137	261	641	1186	979	635	260	1098	6	1079	1215	1453	81	239	805	34	356	137
I	4180	91	162	205	4712	76	186	137		35	129	519	130	380	2786	154	3	387	525	331	2671	42	200	23	1118	28
J	15	360	261	299	22	261	333	261	35		117	311	301	224	27	331	2	344	320	346	1	125	196	7	136	66
K	208	406	1102	942	459	357	747	641	129	117		947	779	892	145	911	50	816	929	1340	71	379	517	119	273	180
L	477	1003	1004	1600	938	846	812	1186	519	311	947		1420	1938	479	1363	15	3271	1876	2049	286	599	899	142	394	248
M	166	1383	1036	1467	1010	829	810	979	130	301	779	1420		1085	238	1898	14	1321	1225	2912	119	575	747	130	380	250
N	401	761	1379	1838	683	632	1013	635	380	224	892	1938	1085		286	1367	7	2439	1925	2091	301	529	711	259	440	250
O	4717	157	265	227	3434	126	191	260	2786	27	145	479	238	286		296	6	574	531	389	2291	58	314	36	596	27
P	297	1579	1418	1242	470	1040	1018	1098	154	331	911	1363	1898	1367	296		11	1255	1528	2061	166	580	1011	141	377	243
Q	5	14	24	18	4	11	39	6	3	2	50	15	14	7	6	11		12	30	16	0	5	10	2	2	2
R	483	1407	1117	7549	956	782	914	1079	387	344	816	3271	1321	2439	574	1255	12		4806	2173	447	591	885	205	613	250
S	953	1127	1832	10979	2123	1037	1194	1215	525	320	929	1876	1225	1925	531	1528	30	4806		3126	327	617	887	232	2621	6540
T	453	1344	1783	2348	1734	1121	1514	1453	331	346	1340	2049	2912	2091	389	2061	16	2173	3126		256	682	1187	215	602	429
U	2898	74	120	107	1893	57	103	81	2671	1	71	286	119	301	2291	166	0	447	327	256		43	416	15	239	12
V	52	398	418	549	87	387	373	239	42	125	379	599	575	529	58	580	5	591	617	682	43		353	96	142	154
W	201	836	816	726	291	620	669	805	200	196	517	899	747	711	314	1011	10	885	887	1187	416	353		108	400	136
X	54	70	153	135	57	30	130	34	23	7	119	142	130	259	36	141	2	205	232	215	15	96	108		74	58
Y	585	366	274	616	2162	203	371	356	1118	136	273	394	380	440	596	377	2	613	2621	602	239	142	400	74		108
Z	30	151	209	307	57	89	180	137	28	66	180	248	250	250	27	243	2	250	6540	429	12	154	136	58	108

As you can see, the letters involved in typos that are genuine words are also the most common letters – except C and P. (The frequency values are on an arbitrary scale to match the typo figures.)

Then I wrote a Python routine to generate a ‘badness’ score for each layout, which is the total number of words you can make by replacing a letter of another word with one of the six keys adjacent to it. Running it on 10,000 random layouts, the average badness is around 83,603, with a standard deviation of 14,024.
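Here is a rough sketch of what such a routine might look like. The geometry is an assumption: rows of 10, 9 and 7 keys, each row shifted half a key to the right of the one above, so that a key touches at most six others. The names `key_positions`, `neighbours` and `badness` are invented for this sketch, and the tiny `toy_counts` dictionary stands in for the full substitution table, so it won't reproduce the figures quoted here.

```python
import math
import random
import statistics
from string import ascii_uppercase

ROW_LENGTHS = (10, 9, 7)       # Qwerty-shaped grid
ROW_OFFSETS = (0.0, 0.5, 1.0)  # assumed stagger of each row, in key widths


def key_positions(layout):
    """Map each letter of a 26-character layout string to its key centre."""
    positions, start = {}, 0
    for row, (length, offset) in enumerate(zip(ROW_LENGTHS, ROW_OFFSETS)):
        for col, letter in enumerate(layout[start:start + length]):
            positions[letter] = (col + offset, float(row))
        start += length
    return positions


def neighbours(layout, reach=1.2):
    """Return {letter: set of touching letters}; at most six per key."""
    pos = key_positions(layout)
    return {a: {b for b in pos if b != a and math.dist(pos[a], pos[b]) < reach}
            for a in pos}


def badness(layout, counts):
    """Total substitution count over every (key, adjacent key) pair."""
    return sum(counts.get((a, b), 0)
               for a, adjacent in neighbours(layout).items()
               for b in adjacent)


def random_layout(rng):
    """Shuffle the alphabet onto the same grid."""
    return "".join(rng.sample(ascii_uppercase, 26))


QWERTY = "QWERTYUIOPASDFGHJKLZXCVBNM"
# Toy counts only; the real input would be the 26 x 26 table above.
toy_counts = {("Q", "W"): 3, ("W", "Q"): 3, ("S", "Z"): 5, ("Q", "P"): 100}
print(badness(QWERTY, toy_counts))  # Q and P are not adjacent, so 100 is ignored

rng = random.Random(0)
scores = [badness(random_layout(rng), toy_counts) for _ in range(1000)]
print(statistics.mean(scores), statistics.stdev(scores))
```

Each adjacent pair is visited in both directions, because typing W for Q and typing Q for W are different typos.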

Here are some other layouts I tried:

Layout	Badness	STDs above mean
Qwerty	119,170	2.54
Dvorak	121,458	2.70
Colemak	112,354	2.05
Best random	46,414	−2.65
Worst random	151,438	4.84
Alphabetic	74,064	−0.68
Best I found	31,992	−3.68
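The last column is just (badness − mean) / standard deviation, using the random-layout figures of 83,603 and 14,024. A quick sketch that recomputes it from the badness values in the table follows. The `upper_tail` helper is an extra: it converts a score into the fraction of layouts that would be worse, and it is only meaningful if the scores are roughly normally distributed, which is an assumption.

```python
import math

MEAN, SD = 83_603, 14_024  # random-layout mean and standard deviation

badness_by_layout = {
    "Qwerty": 119_170,
    "Dvorak": 121_458,
    "Colemak": 112_354,
    "Best random": 46_414,
    "Worst random": 151_438,
    "Alphabetic": 74_064,
    "Best I found": 31_992,
}


def sds_above_mean(score, mean=MEAN, sd=SD):
    """Standard score: how many standard deviations above the mean."""
    return (score - mean) / sd


def upper_tail(z):
    """Fraction of a normal distribution lying above z (assumes normality)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


for name, score in badness_by_layout.items():
    z = sds_above_mean(score)
    print(f"{name:13s} {z:+.2f}  worse layouts: {upper_tail(z):.1%}")
```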

(Predictable answer to question in title: “haha, no”.) Alphabetic uses the same key layout as Qwerty: 10 on the top row, 9 on the second and 7 on the bottom. The ‘best I found’ layout was derived from a random board on that Qwerty grid (since actually Dvorak and Colemak don’t really fit on a phone), by swapping letter pairs at random and keeping the change if it seemed to work. I think I did 5,000 steps, five or six times. Here’s the layout it found:

The most obvious thing it’s done is put S (the most typoable letter) in a corner and shoved Q up against it. One potential improvement to the model would be to account for second-nearest neighbours – since flagging an error but correcting it to the wrong thing isn’t much better than missing it.

Another thing it’s done is put all the rarest letters in the middle where they have lots of neighbours – almost precisely the opposite of what Dvorak and Coleman did. Which makes sense, both intuitively and because all the standard layouts are in the worst 5% of all layouts (assuming a normal distribution).
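For the curious, the swap-and-keep search that produced this layout can be sketched roughly as below. This is a guess at the shape of it, not the original code: it reads “keeping the change if it seemed to work” as “keeping the swap if the score went down”, and `toy_score` is a made-up stand-in so the snippet runs on its own. A real run would pass in the badness function instead.

```python
import random
from string import ascii_uppercase


def hill_climb(layout, score, steps=5000, seed=None):
    """Swap two random keys per step; keep the swap only if the score drops."""
    rng = random.Random(seed)
    keys = list(layout)
    best = score("".join(keys))
    for _ in range(steps):
        i, j = rng.sample(range(len(keys)), 2)
        keys[i], keys[j] = keys[j], keys[i]
        candidate = score("".join(keys))
        if candidate < best:
            best = candidate
        else:
            keys[i], keys[j] = keys[j], keys[i]  # undo the swap
    return "".join(keys), best


def toy_score(layout):
    """Stand-in score (hypothetical): vowels sitting side by side in the string."""
    vowels = set("AEIOU")
    return sum(a in vowels and b in vowels for a, b in zip(layout, layout[1:]))


start = "AEIOUBCDFGHJKLMNPQRSTVWXYZ"
found, final_score = hill_climb(start, toy_score, steps=5000, seed=1)
print(toy_score(start), "->", final_score, found)
```

Because a search like this can get stuck in a local minimum, repeating it from several random starting boards and keeping the best result is a common way to use it.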

Anyway, I think we can all agree this is plainly the best possible keyboard layout for smartphones, and we should name it Taylak and petition Apple and Google to include it as the default for everything ever. I certainly can’t imagine how using the same layout on phones and computers could possibly be more desirable than this.

Here, to end on, is the worst layout I could find, with 204,290 = μ + 8.61σ possible real-word typos:

Nobody use that layout.