Methodology

This page explains how we test theories, score results, and determine whether a cipher approach has been eliminated.

The K4 Ciphertext

K4 consists of 97 characters, and all 26 letters of the English alphabet appear at least once. The full ciphertext is shown below; the known plaintext positions are listed after it:

OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR

Known plaintext (positions numbered from 0, confirmed by Jim Sanborn):

Positions 21–33: EASTNORTHEAST (13 characters)
Positions 63–73: BERLINCLOCK (11 characters)

These 24 known characters are the foundation of our scoring system. They also reveal the internal key values the cipher uses at those positions, which constrain what methods and keys are possible.
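As a minimal sketch of how the cribs reveal key values, the snippet below derives the key at each known position under the three additive variants named later on this page. It assumes the standard A=0 alphabet and the usual textbook conventions for each variant; the names key_value and crib_keystream are illustrative and are not taken from the project codebase.

```python
# K4 ciphertext and cribs as given on this page (positions numbered from 0).
CT = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTT"
      "MZFPKWGDKZXTJCDIGKUHUAUEKCAR")
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}

# position -> known plaintext letter (24 entries in total)
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def key_value(c: str, p: str, variant: str) -> int:
    """Key value implied by one ciphertext/plaintext pair (assumes A=0 ... Z=25)."""
    ci, pi = ord(c) - 65, ord(p) - 65
    if variant == "vigenere":          # C = P + K  ->  K = C - P
        return (ci - pi) % 26
    if variant == "beaufort":          # C = K - P  ->  K = C + P
        return (ci + pi) % 26
    if variant == "variant_beaufort":  # C = P - K  ->  K = P - C
        return (pi - ci) % 26
    raise ValueError(f"unknown variant: {variant}")


def crib_keystream(variant: str) -> dict[int, int]:
    """Key values at the 24 known positions under the chosen variant."""
    return {pos: key_value(CT[pos], p, variant) for pos, p in KNOWN.items()}


vig = crib_keystream("vigenere")
print("".join(chr(65 + vig[i]) for i in range(21, 34)))  # BLZCDCYYGCKAZ
print("".join(chr(65 + vig[i]) for i in range(63, 74)))  # MUYKLGKORNA
```

Under a different alphabet ordering the numeric key values would change, but the same procedure applies.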

Scoring: Crib Matching (0–24)

For any proposed decryption method and key, we check how many of the 24 known plaintext positions produce the correct letter. This gives a score from 0 to 24.

Score	Classification	Meaning
0–9	NOISE	Expected performance from a random key/method. Not recorded.
10–17	INTERESTING	Above noise floor. Recorded for completeness, but almost certainly coincidental.
18–23	SIGNAL	Statistically unusual. Warrants investigation, but may be a false positive at high periods.
24	FULL MATCH	All 24 cribs match. Requires Bean constraint validation, quadgram analysis, and human review before any claim is made.

These thresholds are conservative. For a single fixed random key and a 26-letter alphabet, each known position matches with probability 1/26, so you’d expect 24/26 ≈ 0.9 matching letters by pure luck; for such a key, a score of 6 is already far beyond chance. Searches that fit the key to the cribs score much higher than that, so the SIGNAL threshold at 18 is set high: longer keys can produce misleadingly high scores (see below).
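The scoring rule and the classification bands in the table above can be sketched in a few lines. This is an illustration under the assumption that a candidate is supplied as a full plaintext string aligned to ciphertext positions; crib_score and classify are hypothetical names, not the project's own functions.

```python
# Known plaintext from this page, keyed by 0-indexed position (24 letters).
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def crib_score(candidate: str) -> int:
    """Count the known positions where the candidate plaintext has the right letter."""
    return sum(
        1
        for pos, ch in KNOWN.items()
        if pos < len(candidate) and candidate[pos] == ch
    )


def classify(score: int) -> str:
    """Map a 0-24 crib score onto the bands used in the table above."""
    if score == 24:
        return "FULL MATCH"
    if score >= 18:
        return "SIGNAL"
    if score >= 10:
        return "INTERESTING"
    return "NOISE"


# Chance baseline for one fixed, untuned key: 24 positions, each matching
# with probability 1/26.
expected_by_chance = len(KNOWN) / 26  # roughly 0.92

# A toy candidate that has both cribs in place and filler everywhere else.
perfect = "X" * 21 + "EASTNORTHEAST" + "X" * 29 + "BERLINCLOCK" + "X" * 23
print(crib_score(perfect), classify(crib_score(perfect)))  # 24 FULL MATCH
```

A FULL MATCH from this function is only the first gate; it says nothing about whether the rest of the text is English.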

Expected Random Scores by Period

A critical subtlety: how well a random key scores depends on the key’s length. Longer keys have fewer checks per key position, so random matches become more common, inflating scores artificially.

Period	Approx. Expected Random Score	Discriminative?
2–7	∼8 / 24	Yes, meaningful range
8	∼10 / 24	Marginally
13	∼14 / 24	Weak
17	∼17 / 24	No (random matches 17+)
24	∼19 / 24	No (random matches 19+)
26	∼20 / 24	No

These values are approximate, based on how many known-letter positions share each key position at a given key length. The key insight: at key length 17 the expected random score (about 17) already sits just below our SIGNAL threshold of 18, and by key length 24 it exceeds it, so high scores at those lengths are meaningless. Only scores at key length 7 or below are reliable.
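To see why longer keys are less discriminative, one can count how the 24 known positions distribute over the key positions of a repeating key. The sketch below assumes a key slot is simply position mod period. It does not attempt to reproduce the approximate figures in the table; it only computes a lower bound: under an additive cipher, every key slot touched by at least one crib can be set to satisfy one of its cribs for free.

```python
from collections import Counter

# The 24 known-plaintext positions from this page (0-indexed).
CRIB_POSITIONS = list(range(21, 34)) + list(range(63, 74))


def checks_per_slot(period: int) -> Counter:
    """How many crib positions fall on each key slot (position mod period)."""
    return Counter(pos % period for pos in CRIB_POSITIONS)


def touched_slots(period: int) -> int:
    """Number of key slots constrained by at least one crib.

    Each touched slot can be chosen to match one of its cribs, so this is a
    lower bound on the best score a freely chosen key of this period can reach.
    """
    return len(checks_per_slot(period))


for period in (7, 8, 13, 17, 24):
    slots = checks_per_slot(period)
    print(period, touched_slots(period), max(slots.values()))
```

At period 7 every slot is checked by two to four cribs, so a wrong key is quickly exposed; at period 24 most touched slots are checked only once, so matches there carry little information.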

Bean Constraints

In 2021, Richard Bean published additional constraints on the K4 keystream, derived from the repeated letter P at positions 27 and 65 of the ciphertext (both of which decrypt to R). This gives:

Equality: The key values at positions 27 and 65 must be equal
242 inequalities: Hundreds of pairs of key positions that must differ, regardless of which cipher variant is used

These constraints hold no matter which cipher variant is used (Vigenère, Beaufort, or Variant Beaufort). Any valid solution must satisfy all of them, provided the cipher uses an additive key model (CT[i] = f(PT[i], K[i]) mod 26). If K4 uses a non-additive cipher (lookup table, physical overlay, or grid-based system), Bean constraints do not apply.
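A candidate keystream can be screened against constraints of this kind with a simple pairwise check. The sketch below is illustrative only: satisfies_bean is a hypothetical helper, the single equality pair comes from this page, and the 242 inequality pairs are not reproduced here, so they must be supplied by the caller.

```python
CT = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTT"
      "MZFPKWGDKZXTJCDIGKUHUAUEKCAR")

# Both positions carry ciphertext P and plaintext R, so any cipher of the form
# CT[i] = f(PT[i], K[i]) that is invertible in K forces the same key value.
assert CT[27] == CT[65] == "P"

BEAN_EQUALITIES = [(27, 65)]


def satisfies_bean(key, equalities, inequalities) -> bool:
    """Check a candidate keystream (indexable by position) against the constraints.

    `inequalities` is a list of (i, j) position pairs whose key values must
    differ; the real list is NOT included in this sketch.
    """
    return (
        all(key[a] == key[b] for a, b in equalities)
        and all(key[a] != key[b] for a, b in inequalities)
    )


# Toy example: a period-19 repeating key. Positions 27 and 65 are 38 apart,
# so they share a key slot and the equality holds automatically.
toy_key = [i % 19 for i in range(97)]
print(satisfies_bean(toy_key, BEAN_EQUALITIES, inequalities=[]))  # True
```

Passing the equality alone means little; the elimination argument rests on the full set of inequalities as well.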

Combined with a counting argument about repeated key values, these constraints show that no repeating key of any length can produce the K4 plaintext under direct positional correspondence (CT[i] → PT[i]) and additive-key assumptions. This is a deterministic proof (Level A), reproducible from the codebase. It does not apply if a transposition layer reorders positions before substitution, or if the cipher is non-additive.

Two-System Model

Jim Sanborn stated at the Kryptos dedication that “there are two systems of enciphering the bottom text.” Ed Scheidt stated that he “masked the English language so it’s more of a challenge” and that solvers need to “solve the technique first and then go for the puzzle.” Combined with the mathematical impossibility of periodic substitution at any period, this makes the leading hypothesis that K4 uses two layers: a steganographic layer (null mask) and a substitution cipher. The “two systems” statement is a public fact; the specific interpretation as null insertion plus substitution is our working model, not proven.

Pure transposition is independently impossible: the ciphertext contains only 2 E’s, but the known cribs require 3. Therefore at least one layer must be substitution. The working model (not proven) is:

PT (73 chars) → substitution cipher → filler insertion (24 chars) → carved text (97 chars)
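The letter-count argument against pure transposition given above can be checked directly: a transposition only reorders letters, so the ciphertext would need at least as many of each letter as the cribs contain. A minimal sketch using only the ciphertext and cribs from this page:

```python
from collections import Counter

CT = ("OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTT"
      "MZFPKWGDKZXTJCDIGKUHUAUEKCAR")
CRIB_TEXT = "EASTNORTHEAST" + "BERLINCLOCK"

ct_counts = Counter(CT)
crib_counts = Counter(CRIB_TEXT)

# Letters the cribs need more often than the ciphertext supplies.
shortfall = {
    letter: (needed, ct_counts[letter])
    for letter, needed in crib_counts.items()
    if needed > ct_counts[letter]
}
print(ct_counts["E"], crib_counts["E"])  # 2 3
print("E" in shortfall)                  # True
```

A single shortfall is enough to rule out a cipher that only rearranges letters.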

Under one cipher model (KA-autokey Vigenère), a candidate null palette {B,G,I,K,O,W,Z} was identified at 17 positions (p ≈ 1 in 16,000 under that model). However, matched controls (April 2026) showed this palette is not privileged: nearly every random 7-letter palette produces equal or better cross-model mask agreement. The convergence improvement from palette constraints is a generic combinatorial property, not evidence for these specific letters. The palette identity has been retired as a finding. Palette constraints remain useful as a search regularizer.

The cipher layer remains open regardless of which two-layer model is correct. Periodic keys are proven impossible at all periods under direct positional correspondence. Running keys from 60,230 publicly available English texts (Project Gutenberg, 106 billion offset-checks) produced zero signal. The surviving possibilities include: a running key from a non-public source, a bespoke procedural cipher, or a one-time key.

What Has Been Exhaustively Tested

Over 475 experiments spanning 671.0B+ individual hypothesis evaluations have been run (with overlaps across experiments), eliminating:

All repeating-key ciphers (every variant, both alphabets, all key lengths): proven impossible under direct positional correspondence and additive-key assumptions (Level A)
All self-keying ciphers (every variant, all primers): proven structurally impossible under additive-key assumptions (Level A)
All substitution + rearrangement combinations (~1.2 billion evaluations): 14 rearrangement families
All Cold War-era cipher models (VIC family): extensively tested across multiple variants
All letter-pair ciphers (Four-Square, Playfair, Two-Square): apparent high scores are overfitting artifacts
Every specialized cipher we could find: Gromark (8.74 billion keys), plus Feistel, Gronsfeld, Porta, and more
Running key from 60,230 public texts (106 billion offset-checks): zero signal
Rearrangements of the 73-character text: 4.5 million rearrangements tested with each cipher variant (a sample; the full space of 73! orderings is far too large to enumerate)
Sculpture reading paths as keys: 10,777 paths tested, all noise
Grid-position-based keys: key derived from position on the grid, zero signal
Morse code hypotheses: multiple Morse-based encoding schemes, all noise

73-Character Hypothesis

The carved text has 97 characters. One working model proposes that 24 characters are nulls (97 − 24 = 73 real ciphertext characters). This is a hypothesis, not a proven fact.
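Mechanically, the model amounts to removing the characters at a set of null positions before attempting decryption. The sketch below shows the forward and reverse steps on toy data; the function names and the example mask are hypothetical, and no actual null positions for K4 are implied.

```python
def insert_fillers(real_text: str, null_positions, fillers: str) -> str:
    """Forward step: place filler letters at the given positions of the output."""
    nulls = set(null_positions)
    if len(nulls) != len(fillers):
        raise ValueError("need exactly one filler letter per null position")
    total = len(real_text) + len(nulls)
    if any(pos < 0 or pos >= total for pos in nulls):
        raise ValueError("null position outside the carved text")
    real, fill = iter(real_text), iter(fillers)
    return "".join(next(fill) if i in nulls else next(real) for i in range(total))


def strip_fillers(carved: str, null_positions) -> str:
    """Reverse step: drop the characters at the null positions."""
    nulls = set(null_positions)
    return "".join(ch for i, ch in enumerate(carved) if i not in nulls)


# Toy round trip (not K4 data): 5 real letters plus 2 fillers.
mask = [1, 4]
carved = insert_fillers("HELLO", mask, "ZQ")
print(carved)                       # HZELQLO
print(strip_fillers(carved, mask))  # HELLO
```

For K4 the same reverse step would take the 97 carved characters and a 24-position mask and return 73 characters. Positions in the extracted text shift left by the number of nulls that precede them, so crib positions must be remapped accordingly.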

The primary quantitative support was the palette restriction (7 of 26 letters at candidate null positions under KA-autokey Vigenère). However, matched controls (April 2026) retired this as evidence: the convergence effect is generic to palette-constrained search, not specific to {B,G,I,K,O,W,Z}. The 73-character model remains an open hypothesis but currently lacks quantitative support beyond Sanborn’s statement that K4 uses two systems.

Note: The number 24 also appears in other K4 contexts (24 crib characters, Berlin Clock has 24 facets, K3 chart has 24 rows). These coincidences are not evidence; many small integers recur naturally. They are documented here only because they are frequently mentioned in community discussions.

The Null Palette (Retired April 2026)

Under KA-autokey Vigenère, simulated annealing identified 17 candidate filler positions using only 7 letters {B,G,I,K,O,W,Z} (p ≈ 1/16,000 under that model). This was initially treated as a key observation.

Matched controls disproved the palette’s specificity. Among 100 random 7-letter palettes, BGIKOWZ ranked near the bottom for cross-model mask agreement (1st percentile). The convergence improvement from palette constraints is generic and largely combinatorial. Any sufficiently restrictive palette produces similar or better cross-model agreement. 76 of 133 single-letter-swap neighbors also outperformed BGIKOWZ.
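The figure of 133 single-letter-swap neighbors follows from simple counting: each of the 7 palette letters can be replaced by any of the 19 letters outside the palette, giving 7 × 19 = 133. A short sketch that enumerates them (the helper name is illustrative):

```python
import string

PALETTE = frozenset("BGIKOWZ")


def single_swap_neighbors(palette: frozenset) -> set:
    """All palettes obtained by replacing exactly one letter with an outside letter."""
    outside = set(string.ascii_uppercase) - palette
    return {
        frozenset((palette - {old}) | {new})
        for old in palette
        for new in outside
    }


neighbors = single_swap_neighbors(PALETTE)
print(len(neighbors))  # 133
```

Each neighbor is itself a 7-letter palette, so it can be run through the same matched-control comparison as the original.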

Palette constraints remain useful as a computational technique (they make mask-inference problems tractable and exhaustively solvable), but BGIKOWZ is not a privileged signal. The Beaufort A=0 keystream enrichment (7/8 palette letters at the BERLINCLOCK (BCL) crib positions, uncorrected p = 0.0006) is a separate Level C observation that is ciphertext-intrinsic under the Beaufort convention but does not survive correction for the project’s full search breadth.

What Remains

Null mask + periodic substitution is proven impossible for ANY choice of 24 null positions at ANY period (algebraic proof extended to the 73-character extracted ciphertext, CT73, via Bean inequalities across all 11,440 candidate masks). If the 73-character model is correct, the cipher operating on the extracted characters must use a non-periodic key: a running key from an unknown source, a bespoke procedural method, or a one-time key.

Validation Criteria

A candidate solution is not accepted unless it passes all of the following simultaneously:

Crib score: 24/24
Bean constraints: PASS
Text quality: letter patterns must match normal English (measured by how common its 4-letter sequences are)
Letter frequency: must match the statistical profile of English text
Readability: must produce meaningful English with recognizable words (human review required)

How to Read Claims on This Site

We classify every claim by its evidence strength. When reading results on this site:

Level A: Proven within stated assumptions: Mathematical proof or complete enumeration. Always conditioned on explicit assumptions (e.g., correct cribs, additive key model). If you disagree with the assumptions, the proof does not apply.
Level B: Exhaustively negative within tested scope: Every configuration in a defined parameter space was tested and produced noise. Does not extend to untested variants or multi-layer combinations. “Ruled out within tested scope” means exactly that.
Level C: Descriptive anomaly: A pattern discovered post-hoc (from the data, not predicted in advance). P-values are uncorrected for search breadth unless stated otherwise. Does not prove how K4 was encrypted.
Level D: Hypothesis: A plausible conjecture without quantitative support. Labeled “hypothesis” or “open question.”

All p-values on this site are uncorrected for the project’s full search breadth (~1000 experiments) unless explicitly stated. Over that many tests, individually “significant” results are expected by chance. We document them for transparency, not as proof.
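To make the effect of search breadth concrete, the sketch below applies a Bonferroni-style adjustment, which is one common and deliberately conservative correction. It is an assumption for illustration, not necessarily the correction used by this project, and it treats the tests as independent. The inputs are the uncorrected p = 0.0006 and the figure of roughly 1000 experiments quoted on this page.

```python
def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)


def prob_at_least_one(p: float, n_tests: int) -> float:
    """Chance that at least one of n independent null tests reaches p or lower."""
    return 1.0 - (1.0 - p) ** n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null tests that fall below the threshold alpha."""
    return alpha * n_tests


p_uncorrected = 0.0006
n_experiments = 1000  # approximate search breadth quoted above

print(bonferroni(p_uncorrected, n_experiments))         # about 0.6
print(prob_at_least_one(p_uncorrected, n_experiments))  # about 0.45
print(expected_false_positives(0.05, n_experiments))    # about 50
```

In other words, a result at p = 0.0006 is unremarkable once it is viewed as the best of about a thousand attempts.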

Truth Taxonomy

Every claim in our database is classified by its evidence level:

[PUBLIC FACT]: Verified by reputable public reporting or primary-source statements.
[DERIVED FACT]: Deterministic consequence of public facts, reproducible by a provided command.
[INTERNAL RESULT]: Empirical result from this project. Includes artifact pointers and a reproduction command.
[HYPOTHESIS]: Plausible claim not yet proven. Includes a test plan.

Confidence Tiers

Tier 1: Mathematical Proof: Algebraic proof that the method cannot produce the known plaintext. Permanently valid unless crib positions or ciphertext transcription are wrong.
Tier 2: Exhaustive Search: Every possible configuration was tested under stated assumptions. Solid for the specific model tested, but does not eliminate multi-layer variants.
Tier 3: Partial/Statistical: Sampling-based or incomplete coverage. May warrant re-testing under different assumptions.
Tier 4: Untested: Never properly tested. Fully open.

Reproducibility

Every elimination includes a reproduction command you can run yourself. The entire codebase is open source. Clone the repo, install Python 3.11+, and run any experiment with PYTHONPATH=src.