Question 1

What are the most common letters in English?

Accepted Answer

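In order of frequency: E (12.7%), T (9.1%), A (8.2%), O (7.5%), I (7.0%), N (6.7%), S (6.3%), H (6.1%), R (6.0%), D (4.3%), L (4.0%), C (2.8%). The mnemonic "ETAOIN SHRDLU" approximates the 12 most common letters; U (about 2.8%) is nearly tied with C, and exact rankings vary slightly by corpus. The sequence comes from the first two columns of keys on Linotype machines, which were arranged by letter frequency, so old Linotype operators knew it by heart.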
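To check these figures against your own text, a minimal sketch along the following lines may help. The function name `letter_frequencies` is hypothetical, and the sketch assumes only the ASCII letters A-Z should be counted, with case ignored.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percent) pairs, most frequent first.

    Assumption: only ASCII letters count; digits, spaces, punctuation,
    and accented characters are skipped.
    """
    letters = [ch.upper() for ch in text if ch.isascii() and ch.isalpha()]
    total = len(letters)
    if total == 0:
        return []
    counts = Counter(letters)
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, 100.0 * count / total) for letter, count in ranked]


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"
    for letter, percent in letter_frequencies(sample)[:5]:
        print(f"{letter}: {percent:.1f}%")
```

Short samples will deviate noticeably from the percentages above; the published figures come from large corpora.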

Question 2

What is the difference between character frequency and word frequency?

Accepted Answer

Character frequency counts individual letters; most implementations ignore spaces and punctuation. Word frequency counts whole words as tokens. For cryptanalysis, character frequency matters most. For NLP and content analysis, word frequency is more useful, often normalized as relative frequency or weighted as TF-IDF.
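The distinction can be sketched in a few lines. The helper names below (`char_counts`, `word_counts`, `relative_frequencies`) are hypothetical, and the tokenizer is a deliberately simple assumption: lowercase runs of a-z, optionally joined by apostrophes.

```python
import re
from collections import Counter


def char_counts(text: str) -> Counter:
    """Count letters only, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def word_counts(text: str) -> Counter:
    """Count whole words as tokens (simple regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))


def relative_frequencies(counts: Counter) -> dict[str, float]:
    """Normalize raw counts so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: value / total for key, value in counts.items()}


if __name__ == "__main__":
    text = "The cat and the hat."
    print(char_counts(text).most_common(3))
    print(word_counts(text).most_common(3))
    print(relative_frequencies(word_counts(text)))
```

TF-IDF is left out here because it needs a collection of documents, not a single text.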

Question 3

What is Zipf's Law in word frequency?

Accepted Answer

Zipf's Law states that in natural language, the frequency of a word is roughly inversely proportional to its rank: the 2nd most common word appears roughly half as often as the 1st, the 3rd roughly a third as often, and so on. In English, "the" appears about twice as often as "of" and about three times as often as "and". This power-law distribution appears in almost all natural language corpora.
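As a rough illustration, the idealized relationship (count at rank r ≈ top count / r) can be compared with observed counts. This is a sketch under stated assumptions: both function names are hypothetical, and the `exponent` parameter is added for illustration, with 1.0 corresponding to the form of the law described above.

```python
from collections import Counter


def zipf_expected_counts(top_count: float, n_ranks: int,
                         exponent: float = 1.0) -> list[float]:
    """Idealized Zipf counts for ranks 1..n_ranks, given the rank-1 count."""
    return [top_count / (rank ** exponent) for rank in range(1, n_ranks + 1)]


def observed_vs_expected(tokens: list[str], n_ranks: int = 5):
    """Pair each top-ranked word's observed count with its Zipf prediction."""
    ranked = Counter(tokens).most_common(n_ranks)
    if not ranked:
        return []
    expected = zipf_expected_counts(ranked[0][1], len(ranked))
    return [(word, count, exp) for (word, count), exp in zip(ranked, expected)]


if __name__ == "__main__":
    tokens = "the cat saw the dog and the dog saw the cat".split()
    for word, count, expected in observed_vs_expected(tokens):
        print(f"{word}: observed={count}, zipf={expected:.1f}")
```

On a text this small the fit is poor; the pattern only becomes visible on corpora with many thousands of tokens.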

Question 4

How does Index of Coincidence differ from simple frequency analysis?

Accepted Answer

Simple frequency analysis counts character occurrences. Index of Coincidence (IC) measures the probability that two characters chosen at random from the text are the same. English plaintext has IC ≈ 0.065; uniformly random text over 26 letters has IC ≈ 0.038 (1/26). IC is used to detect polyalphabetic ciphers: a Vigenère ciphertext with key length N has an IC between the random and English values. This helps determine the key length before frequency analysis: when the ciphertext is split into N columns for the correct N, each column's IC rises toward the English value.
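A minimal sketch of the calculation, using the standard formula IC = Σ nᵢ(nᵢ − 1) / (N(N − 1)), where nᵢ is the count of each letter and N is the total number of letters. The function names are hypothetical, and the sketch assumes only the letters A-Z are counted. The second function splits the ciphertext into columns for a candidate key length and averages their IC values.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # IC is undefined for fewer than two letters
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic_for_key_length(ciphertext: str, key_length: int) -> float:
    """Average IC of the columns formed by taking every key_length-th letter."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    columns = ["".join(letters[i::key_length]) for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


if __name__ == "__main__":
    ciphertext = "LXFOPVEFRNHR"  # too short for a reliable estimate
    for length in range(1, 5):
        print(length, round(average_ic_for_key_length(ciphertext, length), 3))
```

In practice, candidate key lengths whose average column IC is closest to about 0.065 are the ones worth trying first, and the estimate is only meaningful on ciphertexts of a few hundred letters or more.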

