Methodology

This page explains how we test theories, score results, and determine whether a cipher approach has been eliminated.

The K4 Ciphertext

K4 consists of 97 characters, and all 26 letters of the English alphabet appear at least once. The ciphertext is shown below with known plaintext positions highlighted:

OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR

Known plaintext, confirmed by Jim Sanborn (positions numbered from 0):

Positions 21–33: EASTNORTHEAST (13 characters)
Positions 63–73: BERLINCLOCK (11 characters)

These 24 known characters are the foundation of our scoring system. They also reveal the internal key values the cipher uses at those positions, which constrain what methods and keys are possible.
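The facts stated above can be checked mechanically. The following is a minimal sketch, not the project's own code; the names K4, CRIBS, and crib_map are illustrative assumptions.

```python
import string

# K4 ciphertext and cribs exactly as given on this page (positions are 0-based).
K4 = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}


def crib_map(cribs):
    """Expand {start: word} into {position: plaintext letter}."""
    return {start + i: ch for start, word in cribs.items() for i, ch in enumerate(word)}


KNOWN = crib_map(CRIBS)

print(len(K4))                                 # 97
print(set(K4) == set(string.ascii_uppercase))  # True: all 26 letters appear
print(len(KNOWN), min(KNOWN), max(KNOWN))      # 24 21 73
```

Keeping the cribs as a position-to-letter map makes every later check (scoring, key recovery) a simple lookup by position.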

Scoring: Crib Matching (0–24)

For any proposed decryption method and key, we check how many of the 24 known plaintext positions produce the correct letter. This gives a score from 0 to 24.

Score	Classification	Meaning
0–9	NOISE	Expected performance from a random key/method. Not recorded.
10–17	INTERESTING	Above noise floor. Recorded for completeness, but almost certainly coincidental.
18–23	SIGNAL	Statistically unusual. Warrants investigation, but may be a false positive at high periods.
24	FULL MATCH	All 24 cribs match. Requires Bean constraint validation, quadgram analysis, and human review before any claim is made.

These thresholds are conservative. With a random key and a 26-letter alphabet, each of the 24 positions matches with probability 1/26, so you’d expect fewer than 1 letter out of 24 (24/26 ≈ 0.92) to match by pure luck. A score of 6 is already far beyond chance. The SIGNAL threshold at 18 is set high because longer keys can produce misleadingly high scores (see below).
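As a rough illustration of the scoring rule and the classification bands in the table above, here is a minimal sketch. The function names crib_score and classify are assumptions made for this example, not the project's API.

```python
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def crib_score(candidate: str) -> int:
    """Count known-plaintext positions where the candidate has the right letter."""
    return sum(
        1 for pos, ch in KNOWN.items()
        if pos < len(candidate) and candidate[pos] == ch
    )


def classify(score: int) -> str:
    """Map a 0-24 crib score to the labels used in the scoring table."""
    if score == 24:
        return "FULL MATCH"
    if score >= 18:
        return "SIGNAL"
    if score >= 10:
        return "INTERESTING"
    return "NOISE"


# Chance baseline for one fixed, unrelated candidate: each of the 24
# positions matches with probability 1/26.
expected_by_chance = 24 / 26  # about 0.92

print(classify(crib_score("A" * 97)))
```

A candidate made entirely of the letter A scores 2, because the cribs contain two A's, and is classified as NOISE.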

Expected Random Scores by Period

A critical subtlety: how well a random key scores depends on the key’s length. Longer keys have fewer checks per key position, so random matches become more common, inflating scores artificially.

Period	Approx. Expected Random Score	Discriminative?
2–7	∼8 / 24	Yes, meaningful range
8	∼10 / 24	Marginally
13	∼14 / 24	Weak
17	∼17 / 24	No (random matches 17+)
24	∼19 / 24	No (random matches 19+)
26	∼20 / 24	No

These values are approximate, based on how many known-letter positions share each key position at a given key length. The key insight: at key length 17 or above, random keys routinely score at or near our SIGNAL threshold of 18 (and above it by length 24), making those scores meaningless. Only scores at key length 7 or below are reliable.
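The idea of crib positions "sharing" a key position can be made concrete with a short sketch. This is not the calculation behind the table and is not meant to reproduce its values. It only counts, under the assumption of a periodic additive key with direct positional correspondence, how many key slots the cribs touch and how many crib positions are left over to act as real checks. The function names are illustrative.

```python
from collections import Counter

CRIB_POSITIONS = list(range(21, 34)) + list(range(63, 74))  # the 24 known positions


def checks_per_key_slot(period: int) -> Counter:
    """Count how many crib positions land on each key slot (position mod period)."""
    return Counter(pos % period for pos in CRIB_POSITIONS)


def occupied_slots(period: int) -> int:
    """Key slots touched by at least one crib position.

    Each occupied slot can always be set to satisfy one of its crib
    positions, so only the remaining positions constrain the key.
    """
    return len(checks_per_key_slot(period))


for period in (7, 13, 17, 24):
    slots = occupied_slots(period)
    print(period, slots, 24 - slots)  # period, occupied slots, remaining checks
```

At period 7 the cribs touch 7 slots and leave 17 positions as genuine checks; at period 24 they touch 19 slots and leave only 5, which is why high scores at long periods carry so little information.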

Bean Constraints

In 2021, Richard Bean published additional constraints on the K4 keystream, derived from the repeated letter P at positions 27 and 65 of the ciphertext (both of which decrypt to R). This gives:

Equality: The key values at positions 27 and 65 must be equal
242 inequalities: Pairs of key positions whose key values must differ, regardless of which cipher variant is used

These constraints hold no matter which cipher variant is used (Vigenère, Beaufort, or Variant Beaufort). Any valid solution must satisfy all of them, provided the cipher uses an additive key model (CT[i] = f(PT[i], K[i]) mod 26). If K4 uses a non-additive cipher (lookup table, physical overlay, or grid-based system), Bean constraints do not apply.

Combined with a counting argument about repeated key values, these constraints show that no repeating key of any length can produce the K4 plaintext under direct positional correspondence (CT[i] → PT[i]) and additive-key assumptions. This is a deterministic proof (Level A), reproducible from the codebase. It does not apply if a transposition layer reorders positions before substitution, or if the cipher is non-additive.
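The equality constraint can be verified directly from the ciphertext and the cribs. The sketch below recovers the key value at each crib position under the three additive variants, using the usual textbook conventions noted in the comments. It assumes direct positional correspondence, and the names VARIANTS and key_values are illustrative. It does not derive the 242 inequalities.

```python
K4 = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}

A = ord("A")

# Key recovery under the three additive variants (letters as 0-25).
VARIANTS = {
    "vigenere": lambda c, p: (c - p) % 26,          # C = P + K
    "beaufort": lambda c, p: (c + p) % 26,          # C = K - P
    "variant_beaufort": lambda c, p: (p - c) % 26,  # C = P - K
}


def key_values(variant: str) -> dict[int, int]:
    """Key value implied at each crib position, assuming CT[i] maps to PT[i]."""
    recover = VARIANTS[variant]
    return {pos: recover(ord(K4[pos]) - A, ord(pt) - A) for pos, pt in KNOWN.items()}


for name in VARIANTS:
    k = key_values(name)
    print(name, k[27], k[65], k[27] == k[65])
```

Because both positions pair ciphertext P with plaintext R, the recovered key values agree under every variant (24, 6, and 2 respectively), which is why the equality does not depend on the variant chosen.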

Two-System Model

Jim Sanborn stated at the Kryptos dedication that “there are two systems of enciphering the bottom text.” Ed Scheidt, who advised Sanborn on the sculpture’s cryptography, stated that he “masked the English language so it’s more of a challenge” and that solvers need to “solve the technique first and then go for the puzzle.” The “two systems” quote is public evidence. Any specific interpretation of that quote is a hypothesis, not a fact.

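Pure transposition is independently impossible: transposition only reorders letters, so the plaintext would have to contain exactly the same letters as the ciphertext. The ciphertext contains only 2 E’s, but the known cribs require 3. Therefore at least one layer must be substitution. Beyond that, the architecture remains open. Lines of investigation the project currently treats as live include: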

Layered classical models: heavily tested in bounded structured families, but not globally exhausted.
Procedural or physical constructions: still live, but only as explicit, finite, testable procedures.
W-delimiter segmentation: the five carved W positions explain the old width-21 anomaly. As of April 2026 the single-layer construction is saturated (80+ tested, no signal) and is now admissible only as a multi-layer component; the interpretation is otherwise still open.
Null extraction: still possible in principle, but the old score-conditioned null-palette result is retired and cannot be cited as evidence.

The April 2026 audit explicitly rejected treating any single two-system story as the project’s established model. The correct public stance is: K4 likely involves a technique beyond straightforward single-layer classical encryption, but the exact composition is unknown.
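The letter-count argument against pure transposition, made earlier in this section, takes only a few lines to check. This is a minimal sketch with illustrative variable names.

```python
from collections import Counter

K4 = "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
CRIB_TEXT = "EASTNORTHEAST" + "BERLINCLOCK"

have = Counter(K4)         # letters available in the ciphertext
need = Counter(CRIB_TEXT)  # letters the known plaintext requires

# Letters the cribs need more often than the ciphertext supplies,
# stored as letter -> (needed, available).
shortfall = {ch: (need[ch], have[ch]) for ch in need if need[ch] > have[ch]}
print(shortfall)
```

A transposition leaves letter counts unchanged, so any non-empty shortfall rules out transposition as the only layer.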

What Has Been Exhaustively Tested

Over 522 experiments spanning 671.1B+ individual hypothesis evaluations have been run (with overlaps across experiments), eliminating:

All repeating-key ciphers (every variant, both alphabets, all key lengths): proven impossible under direct positional correspondence and additive-key assumptions (Level A)
All self-keying ciphers (every variant, all primers): proven structurally impossible under additive-key assumptions (Level A)
Many structured substitution + rearrangement combinations (~1.2 billion evaluations): 14 rearrangement families under the registered bounded search programs
All Cold War-era cipher models (VIC family): extensively tested across multiple variants
All letter-pair ciphers (Four-Square, Playfair, Two-Square): apparent high scores are overfitting artifacts
Every specialized cipher we could find: Gromark (8.74 billion keys), plus Feistel, Gronsfeld, Porta, and more
Running key from 60,000+ public texts (106 billion position-checks): zero signal. A narrower follow-up (April 2026) found that running-key × columnar widths 6/8/9 is blocked by the current 242-inequality Bean constraint set regardless of source text; other transposition families and non-English sources are not covered by that result.
Two-layer compositions tested (105,692 branches): additive × transposition, transposition × periodic substitution, 6 stateful families — zero Bean passes, max crib score 6/24 within the registered layer families and default keyword sets
Three-layer non-columnar compositions tested (838,350 branches, April 2026): {additive, Vig, Beau} × {myszkowski, rail fence, route, block transposition} × {additive, Vig, Beau} — max crib score 7/24. This covers the enumerated layer registry with default parameter generators; non-registered outer families (e.g., homophonic, bifid, four-square as composition outer) are not included.
Rearrangements of the 73-character text: 4.5 million rearrangements tested with each cipher variant, a bounded set rather than all 73! possible orderings
Sculpture reading paths as keys: 10,777 paths tested, all noise
Grid-position-based keys: key derived from position on the grid, zero signal
Morse code hypotheses: multiple Morse-based encoding schemes, all noise

73-Character Hypothesis

The carved text has 97 characters. One working model proposes that 24 characters are nulls (97 − 24 = 73 real ciphertext characters). This is a hypothesis, not a proven fact.

The strongest current public support is not the old null-palette result; that evidence was retired in April 2026. The cleaner live structural observation is that the five carved Ws create a bounded segmentation hypothesis. Even that does not prove nulls. The 73-character idea remains open, but it currently lacks independent, model-neutral quantitative support.

Note: The number 24 also appears in other K4 contexts (24 crib characters, Berlin Clock has 24 facets, K3 chart has 24 rows). These coincidences are not evidence; many small integers recur naturally. They are documented here only because they are frequently mentioned in community discussions.

The Null Palette (Retired April 2026)

A score-conditioned null-palette result was once treated as a key observation. It is retained in the repo only as a cautionary historical case.

Matched controls disproved the palette’s specificity. The apparent convergence advantage was generic to restrictive palette-constrained search and did not justify treating that letter set as real evidence about K4.

Palette constraints remain useful as a computational technique in some search programs, but the site no longer treats any retired palette identity as a cryptographic clue.

What Remains

Null mask + periodic substitution is proven impossible for ANY choice of 24 filler positions at every repeat length from 1 to 23. The algebraic argument depends only on how many filler letters fall in each of the three known-plaintext segments, and every possible split fails. (Repeat lengths 24–26 are too long to be decided either way on a 73-letter text.) If the 73-character model is correct, the cipher operating on the extracted characters must use a non-periodic key: a running key from an unknown source, a bespoke procedural method, or a one-time key.

On 2026-04-08 we ran an adversarial internal audit of the scope of our own testing and reclassified the frontier into “testable now” (bounded, reproducible campaigns we can run), “weakly testable” (requires better detection apparatus, not more compute), and “untestable under current clues” (requires new primary-source evidence). We are aware that classical cipher space is infinite and cannot be literally exhausted; this classification describes the scope of what we have tested under our specific assumptions, not a claim about K4 as a mathematical object. The full audit record is in docs/exhaustion_audit_2026_04_08.md in the research repository.

If you have an idea we have not tested, the Submit a Theory page is the direct path. We want to be wrong about anything we have classified as ruled out.

Validation Criteria

A candidate solution is not accepted unless it passes all of the following simultaneously:

Crib score: 24/24
Bean constraints: PASS
Text quality: letter patterns must match normal English (measured by how common the candidate’s 4-letter sequences, or quadgrams, are in English text)
Letter frequency: must match the statistical profile of English text
Readability: must produce meaningful English with recognizable words (human review required)
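The text-quality criterion above is commonly implemented as a quadgram log-probability score. The sketch below shows one generic way to do that. It is not the project's scorer: the function name, the floor penalty for unseen quadgrams, and the toy count table are all assumptions, and a real run would load counts from a large English corpus.

```python
import math


def quadgram_score(text: str, counts: dict[str, int]) -> float:
    """Average log10 probability per 4-letter window.

    Higher (less negative) values mean the text looks more like the
    corpus that the counts were taken from.
    """
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    total = sum(counts.values())
    floor = math.log10(0.01 / total)  # assumed penalty for quadgrams not in the table
    windows = [letters[i:i + 4] for i in range(len(letters) - 3)]
    if not windows:
        return floor
    logs = [math.log10(counts[w] / total) if w in counts else floor for w in windows]
    return sum(logs) / len(logs)


# Toy table for illustration only; these are not real English statistics.
toy_counts = {"TION": 100, "THER": 90, "NTHE": 80}
print(quadgram_score("NTHER", toy_counts) > quadgram_score("QXZJK", toy_counts))
```

Averaging per window keeps scores comparable between texts of different lengths, which matters when comparing a 97-character candidate with a 73-character one.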

How to Read Claims on This Site

We classify every claim by its evidence strength. When reading results on this site:

Level A (Proven within stated assumptions): Mathematical proof or complete enumeration. Always conditioned on explicit assumptions (e.g., correct cribs, additive key model). If you disagree with the assumptions, the proof does not apply.
Level B (Exhaustively negative within tested scope): Every configuration in a defined parameter space was tested and produced noise. Does not extend to untested variants or multi-layer combinations. “Ruled out within tested scope” means exactly that.
Level C (Descriptive anomaly): A pattern discovered post-hoc (from the data, not predicted in advance). P-values are uncorrected for search breadth unless stated otherwise. Does not prove how K4 was encrypted.
Level D (Hypothesis): A plausible conjecture without quantitative support. Labeled “hypothesis” or “open question.”

All p-values on this site are uncorrected for the project’s full search breadth (~1000 experiments) unless explicitly stated. Over that many tests, individually “significant” results are expected by chance. We document them for transparency, not as proof.
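To see why uncorrected p-values need this caveat, consider the arithmetic for roughly 1000 tests. The sketch below assumes independent tests and an illustrative significance level of 0.05; neither assumption comes from the project, and the function names are made up for the example.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that look significant when no effect is real."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n, alpha = 1000, 0.05  # illustrative values, not project settings
print(expected_false_positives(n, alpha))  # about 50
print(prob_at_least_one(n, alpha))         # effectively 1
print(bonferroni_threshold(n, alpha))      # about 0.00005
```

Under these assumptions, about 50 individually "significant" results would be expected from chance alone, and a single result would need a p-value below roughly 0.00005 to survive a Bonferroni correction.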

Truth Taxonomy

Every claim in our database is classified by its evidence level:

[PUBLIC FACT]: Verified by reputable public reporting or primary-source statements.
[DERIVED FACT]: Deterministic consequence of public facts, reproducible by a provided command.
[INTERNAL RESULT]: Empirical result from this project. Includes artifact pointers and a reproduction command.
[HYPOTHESIS]: Plausible claim not yet proven. Includes a test plan.

Confidence Tiers

Tier 1 (Mathematical Proof): Algebraic proof that the method cannot produce the known plaintext. Permanently valid unless crib positions or ciphertext transcription are wrong.
Tier 2 (Exhaustive Search): Every possible configuration was tested under stated assumptions. Solid for the specific model tested, but does not eliminate multi-layer variants.
Tier 3 (Partial/Statistical): Sampling-based or incomplete coverage. May warrant re-testing under different assumptions.
Tier 4 (Untested): Never properly tested. Fully open.

Reproducibility

Every elimination includes a reproduction command you can run yourself. The entire codebase is open source. Clone the repo, install Python 3.11+, and run any experiment with PYTHONPATH=src.