671.0B+ configurations tested and eliminated

Methodology

This page explains how we test theories, score results, and determine whether a cipher approach has been eliminated.

The K4 Ciphertext

K4 consists of 97 characters, and all 26 letters of the English alphabet appear at least once. The ciphertext is shown below, followed by the known plaintext positions:

OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR

Known plaintext (positions numbered from 0, confirmed by Jim Sanborn):

Positions 21–33: EASTNORTHEAST
Positions 63–73: BERLINCLOCK

These 24 known characters are the foundation of our scoring system. They also reveal the internal key values the cipher uses at those positions, which constrain what methods and keys are possible.

Scoring: Crib Matching (0–24)

For any proposed decryption method and key, we check how many of the 24 known plaintext positions produce the correct letter. This gives a score from 0 to 24.

Score   Classification   Meaning
0–9     NOISE            Expected performance from a random key/method. Not recorded.
10–17   INTERESTING      Above noise floor. Recorded for completeness, but almost certainly coincidental.
18–23   SIGNAL           Statistically unusual. Warrants investigation, but may be a false positive at high periods.
24      FULL MATCH       All 24 cribs match. Requires Bean constraint validation, quadgram analysis, and human review before any claim is made.
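A minimal sketch of crib scoring and the classification bands, assuming the crib placement listed earlier (EASTNORTHEAST at 21–33, BERLINCLOCK at 63–73, 0-indexed) and plain Vigenère decryption with A=0 as the example method. The function names are illustrative, not the project's actual API.

```python
# Known plaintext ("cribs") at 0-indexed positions of the 97-character ciphertext.
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
CT = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"


def crib_positions() -> dict[int, str]:
    """Expand the crib words into a {position: letter} map (24 entries)."""
    return {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def crib_score(candidate_pt: str) -> int:
    """Count how many of the 24 known plaintext letters the candidate reproduces."""
    return sum(
        1
        for pos, ch in crib_positions().items()
        if pos < len(candidate_pt) and candidate_pt[pos] == ch
    )


def vigenere_decrypt(ct: str, key: str) -> str:
    """Example method only: PT = CT - K (mod 26, A=0) with a repeating key."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + ord("A")) for i, c in enumerate(ct)
    )


def classify(score: int) -> str:
    """Map a 0-24 crib score onto the NOISE/INTERESTING/SIGNAL/FULL MATCH bands."""
    if score == 24:
        return "FULL MATCH"
    if score >= 18:
        return "SIGNAL"
    if score >= 10:
        return "INTERESTING"
    return "NOISE"


score = crib_score(vigenere_decrypt(CT, "KRYPTOS"))
print(score, classify(score))
```

Any decryption method can be plugged in ahead of `crib_score`; the score only looks at the 24 crib positions, so it says nothing about whether the rest of the plaintext is English.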

These thresholds are conservative. With a random key and a 26-letter alphabet, each known position matches with probability 1/26, so you'd expect fewer than 1 letter out of 24 (24/26 ≈ 0.92) to match by pure luck. A score of 6 is already far beyond chance. The SIGNAL threshold at 18 is set high because longer keys can produce misleadingly high scores (see below).
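The "fewer than 1 in 24" figure follows from a binomial model: a single random key, fixed in advance, matches each of the 24 crib positions independently with probability 1/26. A sketch under that simplifying assumption (it does not model keys that are tuned to fit the cribs):

```python
from math import comb

N_CRIBS = 24
P_MATCH = 1 / 26  # assumed chance that one crib position matches under a random, fixed key


def expected_matches(n: int = N_CRIBS, p: float = P_MATCH) -> float:
    """Mean of Binomial(n, p): the average crib score of one random key."""
    return n * p


def tail_probability(k: int, n: int = N_CRIBS, p: float = P_MATCH) -> float:
    """P(score >= k) under the same Binomial(n, p) model."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"expected score: {expected_matches():.2f}")  # 24/26 -> 0.92
print(f"P(score >= 6):  {tail_probability(6):.1e}")
```

Under this model the chance of a fixed random key reaching 6 is well under 1 in 1,000.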

Expected Random Scores by Period

A critical subtlety: when a search picks the best key of a given length, the score a wrong key can reach depends on that length. Longer keys have fewer known-letter checks per key position, so each key letter can satisfy its few checks by coincidence, inflating scores artificially.

Period   Approx. Expected Random Score   Discriminative?
2–7      ∼8 / 24                         Yes, meaningful range
8        ∼10 / 24                        Marginally
13       ∼14 / 24                        Weak
17       ∼17 / 24                        No (random matches 17+)
24       ∼19 / 24                        No (random matches 19+)
26       ∼20 / 24                        No

These values are approximate, based on how many known-letter positions share each key position at a given key length. The key insight: at key length 17 or above, random keys routinely score at or near our SIGNAL threshold of 18, making those scores meaningless. Only scores at key length 7 or below are reliable; key length 8 is marginal.
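The effect can be estimated by simulation. The sketch below assumes that, for a wrong method, the key value each crib position would need behaves like an independent uniform random letter, and that the search picks the best key of the given period. Both are simplifying assumptions, so its output need not match the rounded figures in the table.

```python
import random
from collections import Counter, defaultdict

# 0-indexed crib positions: EASTNORTHEAST at 21-33, BERLINCLOCK at 63-73.
CRIB_POSITIONS = list(range(21, 34)) + list(range(63, 74))


def best_fit_score(needed: dict[int, int], period: int) -> int:
    """Highest crib score any key of this period can reach.

    `needed` maps each crib position to the key value that would decrypt it
    correctly. Positions with the same residue (pos % period) share one key
    value, so the best choice per residue is its most common needed value.
    """
    groups: dict[int, Counter] = defaultdict(Counter)
    for pos, value in needed.items():
        groups[pos % period][value] += 1
    return sum(max(counts.values()) for counts in groups.values())


def expected_random_score(period: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo mean of best_fit_score when needed key values are uniform random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        needed = {pos: rng.randrange(26) for pos in CRIB_POSITIONS}
        total += best_fit_score(needed, period)
    return total / trials


for period in (2, 7, 8, 13, 17, 24, 26):
    print(period, round(expected_random_score(period), 1))
```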

Bean Constraints

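In 2021, Richard Bean published additional constraints on the K4 keystream. The central one comes from the repeated letter P at positions 27 and 65 of the ciphertext, both of which decrypt to R: since the same ciphertext letter encrypts the same plaintext letter at both positions, the key values there must be equal, K[27] = K[65]. Bean also derived a set of inequalities between other crib positions (the Bean inequalities).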

These constraints hold no matter which additive cipher variant is used: Vigenère (CT[i] = PT[i] + K[i]), Beaufort (CT[i] = K[i] − PT[i]), or Variant Beaufort (CT[i] = PT[i] − K[i]), all mod 26. Any valid solution must satisfy all of them, provided the cipher follows such an additive key model. If K4 uses a non-additive cipher (lookup table, physical overlay, or grid-based system), Bean constraints do not apply.
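The equality can be checked directly by solving each variant for the key at every crib position (A=0). A sketch, with hypothetical helper names; `all_variant_inequalities` lists crib pairs whose key values differ under every variant and is shown only to illustrate the idea, not as a reproduction of Bean's published list.

```python
CT = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def num(ch: str) -> int:
    return ord(ch) - ord("A")


# Key value at one position, solved from each variant's encryption rule (mod 26):
# Vigenère CT = PT + K, Beaufort CT = K - PT, Variant Beaufort CT = PT - K.
VARIANTS = {
    "vigenere": lambda c, p: (c - p) % 26,
    "beaufort": lambda c, p: (c + p) % 26,
    "variant_beaufort": lambda c, p: (p - c) % 26,
}


def keystream(variant: str) -> dict[int, int]:
    """Key value implied at every crib position under one variant."""
    solve = VARIANTS[variant]
    return {pos: solve(num(CT[pos]), num(pt)) for pos, pt in KNOWN.items()}


def identity_equalities() -> list[tuple[int, int]]:
    """Crib pairs with the same CT and the same PT letter: equal keys in every variant."""
    pos = sorted(KNOWN)
    return [(i, j) for a, i in enumerate(pos) for j in pos[a + 1:]
            if CT[i] == CT[j] and KNOWN[i] == KNOWN[j]]


def all_variant_inequalities() -> list[tuple[int, int]]:
    """Crib pairs whose key values differ under all three variants."""
    streams = [keystream(v) for v in VARIANTS]
    pos = sorted(KNOWN)
    return [(i, j) for a, i in enumerate(pos) for j in pos[a + 1:]
            if all(s[i] != s[j] for s in streams)]


for v in VARIANTS:
    ks = keystream(v)
    print(v, ks[27], ks[65])
print(identity_equalities())  # [(27, 65)]
```

Vigenère and Variant Beaufort keys are negatives of each other, so they always agree on which pairs are equal; Beaufort is the variant that can differ.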

Combined with a counting argument about repeated key values, these constraints show that no repeating key of any length can produce the K4 plaintext under direct positional correspondence (CT[i] → PT[i]) and additive-key assumptions. This is a deterministic proof (Level A), reproducible from the codebase. It does not apply if a transposition layer reorders positions before substitution, or if the cipher is non-additive.
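A weaker, crib-only version of the periodic check is easy to sketch: under Vigenère with direct positional correspondence, every crib position with the same residue modulo the period must need the same key value. This is not the project's proof (which additionally uses the Bean constraints and a counting argument); it only shows where the cribs alone already conflict.

```python
CT = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}


def vigenere_key_values() -> dict[int, int]:
    """Key value K = CT - PT (mod 26, A=0) required at each crib position."""
    return {
        start + i: (ord(CT[start + i]) - ord(ch)) % 26
        for start, word in CRIBS.items()
        for i, ch in enumerate(word)
    }


def period_consistent(keys: dict[int, int], period: int) -> bool:
    """True if each residue class (pos % period) requires a single key value."""
    required: dict[int, int] = {}
    for pos, k in keys.items():
        if required.setdefault(pos % period, k) != k:
            return False
    return True


keys = vigenere_key_values()
surviving = [p for p in range(1, 97) if period_consistent(keys, p)]
print(surviving)
```

Periods of 53 or more always pass this check: the cribs span positions 21 to 73, so no two crib positions can share a key slot at those periods, and eliminating them needs the extra arguments described above.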

Two-System Model

Jim Sanborn stated at the Kryptos dedication that “there are two systems of enciphering the bottom text.” Ed Scheidt stated that he “masked the English language so it’s more of a challenge” and that solvers need to “solve the technique first and then go for the puzzle.” Combined with the mathematical impossibility of periodic substitution at any period (under direct positional correspondence), the leading hypothesis is that K4 uses two layers: a steganographic layer (null mask) and a substitution cipher. The “two systems” statement is a public fact; the specific interpretation as null insertion plus substitution is our working model, not proven.

Pure transposition is independently impossible: the ciphertext contains only 2 E’s, but the known cribs require 3. Therefore at least one layer must be substitution. The working model (not proven) is:

PT (73 chars) → substitution cipher → filler insertion (24 chars) → carved text (97 chars)
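The transposition argument can be checked by counting letters: a pure transposition only reorders characters, so the plaintext must contain exactly the ciphertext's letters, and any crib letter needed more often than the ciphertext supplies rules it out. A short check (the helper name is illustrative):

```python
from collections import Counter

CT = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIB_WORDS = ["EASTNORTHEAST", "BERLINCLOCK"]


def letter_shortfalls(ct: str, words: list[str]) -> dict[str, tuple[int, int]]:
    """Letters the cribs need more often than the ciphertext contains: {letter: (needed, available)}."""
    have = Counter(ct)
    need = Counter("".join(words))
    return {ch: (n, have[ch]) for ch, n in need.items() if n > have[ch]}


print(letter_shortfalls(CT, CRIB_WORDS))  # {'E': (3, 2)}
```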

Under one cipher model (KA-autokey Vigenère), a candidate null palette {B,G,I,K,O,W,Z} was identified at 17 positions (p ≈ 1 in 16,000 under that model). However, matched controls (April 2026) showed this palette is not privileged: nearly every random 7-letter palette produces equal or better cross-model mask agreement. The convergence improvement from palette constraints is a generic combinatorial property, not evidence for these specific letters. The palette identity has been retired as a finding. Palette constraints remain useful as a search regularizer.

The cipher layer remains open regardless of which two-layer model is correct. Periodic keys are proven impossible at all periods under direct positional correspondence. Running keys from 60,230 publicly available English texts (Project Gutenberg, 106 billion offset-checks) produced zero signal. The surviving possibilities include: a running key from a non-public source, a bespoke procedural cipher, or a one-time key.
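A running-key test of this kind can be sketched as a sliding scan: for each offset into a source text, read key letters consecutively from that offset and count how many crib positions they decrypt correctly. The sketch assumes plain Vigenère with A=0 and direct positional correspondence; other variants or alphabets would need different required letters, and the function names are hypothetical.

```python
from collections.abc import Iterator

CT = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}


def required_key_letters() -> dict[int, str]:
    """Vigenère key letter needed at each crib position: K = CT - PT (mod 26)."""
    return {
        start + i: chr((ord(CT[start + i]) - ord(ch)) % 26 + ord("A"))
        for start, word in CRIBS.items()
        for i, ch in enumerate(word)
    }


def running_key_scan(source: str) -> Iterator[tuple[int, int]]:
    """Yield (offset, crib score) for every alignment of `source` as a running key."""
    text = "".join(ch for ch in source.upper() if "A" <= ch <= "Z")
    needed = required_key_letters()
    for offset in range(len(text) - len(CT) + 1):
        yield offset, sum(text[offset + pos] == k for pos, k in needed.items())
```

For a real source text, `max(running_key_scan(text), key=lambda t: t[1])` returns the best-scoring offset; only letters A–Z are kept, so punctuation and spacing in the source are ignored.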

What Has Been Exhaustively Tested

Over 475 experiments spanning 671.0B+ individual hypothesis evaluations have been run (with overlaps across experiments), eliminating:

73-Character Hypothesis

The carved text has 97 characters. One working model proposes that 24 characters are nulls (97 − 24 = 73 real ciphertext characters). This is a hypothesis, not a proven fact.

The primary quantitative support was the palette restriction (7 of 26 letters at candidate null positions under KA-autokey Vigenère). However, matched controls (April 2026) retired this as evidence: the convergence effect is generic to palette-constrained search, not specific to {B,G,I,K,O,W,Z}. The 73-character model remains an open hypothesis but currently lacks quantitative support beyond Sanborn’s statement that K4 uses two systems.

Note: The number 24 also appears in other K4 contexts (24 crib characters, Berlin Clock has 24 facets, K3 chart has 24 rows). These coincidences are not evidence; many small integers recur naturally. They are documented here only because they are frequently mentioned in community discussions.

The Null Palette (Retired April 2026)

Under KA-autokey Vigenère, simulated annealing identified 17 candidate filler positions using only 7 letters {B,G,I,K,O,W,Z} (p ≈ 1/16,000 under that model). This was initially treated as a key observation.

Matched controls disproved the palette’s specificity. Among 100 random 7-letter palettes, BGIKOWZ ranked near the bottom for cross-model mask agreement (1st percentile). The convergence improvement from palette constraints is generic and largely combinatorial. Any sufficiently restrictive palette produces similar or better cross-model agreement. 76 of 133 single-letter-swap neighbors also outperformed BGIKOWZ.
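The control sets are simple to generate. A 7-letter palette has 7 × 19 = 133 single-letter-swap neighbors (replace one member with one of the 19 letters outside it). The sketch below builds the neighbors and random control palettes and computes a percentile rank; the actual statistic (cross-model mask agreement) is project-specific and is not implemented here, so any scoring function plugged in is an assumption.

```python
import random
import string

PALETTE = frozenset("BGIKOWZ")


def single_swap_neighbors(palette: frozenset[str]) -> list[frozenset[str]]:
    """All palettes that replace exactly one member with a letter outside the palette."""
    outside = [c for c in string.ascii_uppercase if c not in palette]
    return [(palette - {old}) | {new} for old in sorted(palette) for new in outside]


def random_palettes(n: int, size: int = 7, seed: int = 0) -> list[frozenset[str]]:
    """Matched controls: n random palettes of the same size."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(string.ascii_uppercase, size)) for _ in range(n)]


def percentile_rank(observed: float, control_scores: list[float]) -> float:
    """Percentage of control scores strictly below the observed score."""
    return 100 * sum(s < observed for s in control_scores) / len(control_scores)


print(len(single_swap_neighbors(PALETTE)))  # 133
```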

Palette constraints remain useful as a computational technique (they make mask-inference problems tractable and exhaustively solvable), but BGIKOWZ is not a privileged signal. The Beaufort A=0 keystream enrichment (7/8 palette letters at BERLINCLOCK crib positions, uncorrected p = 0.0006) is a separate Level C observation that is ciphertext-intrinsic under the Beaufort convention but does not survive correction for the project’s full search breadth.

What Remains

Null mask + periodic substitution is proven impossible for ANY choice of 24 null positions at ANY period (algebraic proof extended to CT73 via Bean inequalities across all 11,440 candidate masks). If the 73-character model is correct, the cipher operating on the extracted characters must use a non-periodic key: a running key from an unknown source, a bespoke procedural method, or a one-time key.

Validation Criteria

A candidate solution is not accepted unless it passes all of the following simultaneously:

How to Read Claims on This Site

We classify every claim by its evidence strength. When reading results on this site:

Level A: Proven within stated assumptions
Mathematical proof or complete enumeration. Always conditioned on explicit assumptions (e.g., correct cribs, additive key model). If you disagree with the assumptions, the proof does not apply.
Level B: Exhaustively negative within tested scope
Every configuration in a defined parameter space was tested and produced noise. Does not extend to untested variants or multi-layer combinations. “Ruled out within tested scope” means exactly that.
Level C: Descriptive anomaly
A pattern discovered post-hoc (from the data, not predicted in advance). P-values are uncorrected for search breadth unless stated otherwise. Does not prove how K4 was encrypted.
Level D: Hypothesis
A plausible conjecture without quantitative support. Labeled “hypothesis” or “open question.”

All p-values on this site are uncorrected for the project’s full search breadth (~1000 experiments) unless explicitly stated. Over that many tests, individually “significant” results are expected by chance. We document them for transparency, not as proof.
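The arithmetic behind this caution, as a sketch that assumes independent tests (the project's overlapping experiments are not independent, so treat these as rough guides):

```python
def expected_false_positives(alpha: float, m: int) -> float:
    """Tests expected to cross threshold alpha by chance among m null tests."""
    return alpha * m


def bonferroni(p: float, m: int) -> float:
    """Bonferroni-adjusted p-value for one result out of m tests."""
    return min(1.0, p * m)


def prob_any_below(p: float, m: int) -> float:
    """Chance that at least one of m independent null tests gives a p-value <= p."""
    return 1 - (1 - p) ** m


m = 1000
print(expected_false_positives(0.05, m))
print(bonferroni(0.0006, m))
print(prob_any_below(0.0006, m))
```

With m = 1000, a 0.05 threshold yields about 50 chance "hits", an uncorrected p = 0.0006 becomes roughly 0.6 after Bonferroni correction, and there is roughly a 45% chance that at least one of 1,000 independent null tests reaches p ≤ 0.0006.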

Truth Taxonomy

Every claim in our database is classified by its evidence level:

[PUBLIC FACT]
Verified by reputable public reporting or primary-source statements.
[DERIVED FACT]
Deterministic consequence of public facts, reproducible by a provided command.
[INTERNAL RESULT]
Empirical result from this project. Includes artifact pointers and a reproduction command.
[HYPOTHESIS]
Plausible claim not yet proven. Includes a test plan.

Confidence Tiers

Tier 1: Mathematical Proof
Algebraic proof that the method cannot produce the known plaintext. Permanently valid unless crib positions or ciphertext transcription are wrong.
Tier 2: Exhaustive Search
Every possible configuration was tested under stated assumptions. Solid for the specific model tested, but does not eliminate multi-layer variants.
Tier 3: Partial/Statistical
Sampling-based or incomplete coverage. May warrant re-testing under different assumptions.
Tier 4: Untested
Never properly tested. Fully open.

Reproducibility

Every elimination includes a reproduction command you can run yourself. The entire codebase is open source. Clone the repo, install Python 3.11+, and run any experiment with PYTHONPATH=src.