671.0B+ configurations evaluated across recorded experiments

Methodology

This page explains how we test theories, score results, and determine whether a cipher approach has been eliminated.

The K4 Ciphertext

K4 consists of 97 characters, and all 26 letters of the English alphabet appear at least once. The ciphertext is shown below with known plaintext positions highlighted:

OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR
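
The two counts stated above (97 characters, every letter present) are easy to confirm. A minimal Python check, in which the names `K4`, `length`, and `missing` are illustrative labels rather than identifiers from the project codebase:

```python
import string

# K4 ciphertext as transcribed on this page.
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

length = len(K4)
# Letters of A-Z that never occur in the ciphertext (empty if all 26 appear).
missing = sorted(set(string.ascii_uppercase) - set(K4))

print(length)
print(missing)
```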

Known plaintext (positions numbered from 0, confirmed by Jim Sanborn):

Positions 21–33: EASTNORTHEAST
Positions 63–73: BERLINCLOCK

These 24 known characters are the foundation of our scoring system. They also reveal the internal key values the cipher uses at those positions, which constrain what methods and keys are possible.
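
As a sketch of how known plaintext exposes key values, the snippet below derives them under the Vigenère convention K = (C − P) mod 26. The crib placement used here (EASTNORTHEAST starting at position 21, BERLINCLOCK starting at position 63, numbered from 0) is the commonly cited one and is treated as an assumption of the sketch; the function names are hypothetical, not taken from the project codebase.

```python
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Assumed crib placement: 0-indexed start position -> plaintext word.
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}


def known_plaintext(cribs):
    """Expand {start: word} into {position: plaintext letter}."""
    return {start + i: ch for start, word in cribs.items() for i, ch in enumerate(word)}


def vigenere_key_values(ct, known):
    """Key value K = (C - P) mod 26 at each known position (Vigenere convention)."""
    return {pos: (ord(ct[pos]) - ord(p)) % 26 for pos, p in known.items()}


known = known_plaintext(CRIBS)
keys = vigenere_key_values(K4, known)
# Key values written as letters (0 -> A, 25 -> Z), in position order.
key_letters = "".join(chr(keys[pos] + 65) for pos in sorted(keys))

print(len(known), key_letters)
```

Other additive variants (Beaufort, Variant Beaufort) would give different key values at the same positions, so the letters printed here are specific to the Vigenère assumption.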

Scoring: Crib Matching (0–24)

For any proposed decryption method and key, we check how many of the 24 known plaintext positions produce the correct letter. This gives a score from 0 to 24.

| Score | Classification | Meaning |
| --- | --- | --- |
| 0–9 | NOISE | Expected performance from a random key/method. Not recorded. |
| 10–17 | INTERESTING | Above noise floor. Recorded for completeness, but almost certainly coincidental. |
| 18–23 | SIGNAL | Statistically unusual. Warrants investigation, but may be a false positive at high periods. |
| 24 | FULL MATCH | All 24 cribs match. Requires Bean constraint validation, quadgram analysis, and human review before any claim is made. |
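
The scoring and classification described above can be sketched as follows. This is a minimal illustration, not the project's scorer: it assumes a repeating-key Vigenère decryption, the commonly cited crib placement (EASTNORTHEAST from position 21, BERLINCLOCK from position 63, numbered from 0), and hypothetical function names.

```python
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Assumed crib placement: 0-indexed start position -> plaintext word.
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}


def vigenere_decrypt(ct, key):
    """Decrypt with a repeating key: P = (C - K) mod 26."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + 65) for i, c in enumerate(ct)
    )


def crib_score(plaintext):
    """Count how many of the 24 known positions hold the expected letter."""
    return sum(1 for pos, ch in KNOWN.items() if plaintext[pos] == ch)


def classify(score):
    """Map a 0-24 crib score onto the bands in the table above."""
    if score >= 24:
        return "FULL MATCH"
    if score >= 18:
        return "SIGNAL"
    if score >= 10:
        return "INTERESTING"
    return "NOISE"


# Example: score an arbitrary candidate key.
candidate = vigenere_decrypt(K4, "KRYPTOS")
score = crib_score(candidate)
print(score, classify(score))
```

Swapping `vigenere_decrypt` for another decryption function leaves `crib_score` and `classify` unchanged, since both only look at the resulting plaintext.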

These thresholds are conservative. With a single fixed random key and a 26-letter alphabet, each known position matches with probability 1/26, so you’d expect fewer than 1 letter out of 24 (24/26 ≈ 0.92) to match by pure luck. For such a key, a score of 6 is already far beyond chance. The SIGNAL threshold at 18 is set high because longer keys can produce misleadingly high scores (see below).
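
To put a number on "far beyond chance", the sketch below models the score of a single fixed random key as a binomial variable. It assumes the 24 positions match independently with probability 1/26 each; that independence is a modelling assumption of this example, and the function names are hypothetical.

```python
from math import comb

N_CRIBS = 24      # known plaintext positions
P_MATCH = 1 / 26  # chance that one position matches under a uniform random letter


def expected_matches(n=N_CRIBS, p=P_MATCH):
    """Mean of Binomial(n, p)."""
    return n * p


def tail_probability(k, n=N_CRIBS, p=P_MATCH):
    """P(score >= k) for Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


print(f"expected matches: {expected_matches():.3f}")
for k in (3, 6, 10, 18):
    print(f"P(score >= {k}) = {tail_probability(k):.3e}")
```

This model applies only to a key chosen without reference to the cribs. It does not describe keys that are fitted to the cribs, which is why the period-dependent baseline matters.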

Expected Random Scores by Period

A critical subtlety: how well a random key scores depends on the key’s length. Longer keys have fewer checks per key position, so random matches become more common, inflating scores artificially.

| Period | Approx. Expected Random Score | Discriminative? |
| --- | --- | --- |
| 2–7 | ~8 / 24 | Yes, meaningful range |
| 8 | ~10 / 24 | Marginally |
| 13 | ~14 / 24 | Weak |
| 17 | ~17 / 24 | No (random matches 17+) |
| 24 | ~19 / 24 | No (random matches 19+) |
| 26 | ~20 / 24 | No |

These values are approximate, based on how many known-letter positions share each key position at a given key length. The key insight: at key length 17 or above, random keys routinely score at or near our SIGNAL threshold of 18 (and above it by key length 24), making those scores meaningless. Only scores at key length 7 or below are reliably discriminative; key length 8 is marginal.
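
The "positions sharing a key position" idea can be made concrete by counting how many crib positions fall on each key slot (position mod period). The sketch assumes the commonly cited crib spans (positions 21–33 and 63–73, numbered from 0) and hypothetical function names. It also assumes a key fitted slot by slot, in which case every occupied slot can always be made to match at least one crib, so the number of occupied slots is a floor on the best achievable score. It is not the formula behind the approximate figures in the table.

```python
from collections import Counter

# Assumed crib spans, 0-indexed and inclusive: 21-33 and 63-73.
CRIB_POSITIONS = list(range(21, 34)) + list(range(63, 74))


def checks_per_slot(period):
    """How many crib positions land on each key slot (position mod period)."""
    return Counter(pos % period for pos in CRIB_POSITIONS)


def occupied_slots(period):
    """Number of key slots constrained by at least one crib position."""
    return len(checks_per_slot(period))


for period in (7, 8, 13, 17, 24, 26):
    counts = checks_per_slot(period)
    print(
        f"period {period:2d}: {occupied_slots(period):2d} occupied slots, "
        f"{min(counts.values())}-{max(counts.values())} checks per occupied slot"
    )
```

As the period grows, each slot is checked by fewer cribs, which is the effect the paragraph above describes.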

Bean Constraints

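In 2021, Richard Bean published additional constraints on the K4 keystream, derived from the repeated letter P at positions 27 and 65 of the ciphertext (both of which decrypt to R). Because the same ciphertext letter maps to the same plaintext letter at both positions, the key values there must be equal: K[27] = K[65]. The full constraint set also includes inequalities between key values (the Bean inequalities).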

These constraints hold no matter which cipher variant is used (Vigenère, Beaufort, or Variant Beaufort). Any valid solution must satisfy all of them, provided the cipher uses an additive key model (CT[i] = f(PT[i], K[i]) mod 26). If K4 uses a non-additive cipher (lookup table, physical overlay, or grid-based system), Bean constraints do not apply.
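
A small check of the variant-independence claim for positions 27 and 65: under each additive convention, the key value is a function of the ciphertext and plaintext letters alone, so identical letter pairs give identical key values. The variant formulas below are the standard textbook conventions; the dictionary and function names are hypothetical.

```python
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Key recovery K = g(C, P) for three additive conventions.
VARIANTS = {
    "vigenere": lambda c, p: (c - p) % 26,          # encrypts as C = P + K
    "beaufort": lambda c, p: (c + p) % 26,          # encrypts as C = K - P
    "variant_beaufort": lambda c, p: (p - c) % 26,  # encrypts as C = P - K
}


def key_value(variant, ct_letter, pt_letter):
    """Key value implied by one ciphertext/plaintext letter pair."""
    c, p = ord(ct_letter) - 65, ord(pt_letter) - 65
    return VARIANTS[variant](c, p)


for name in VARIANTS:
    k27 = key_value(name, K4[27], "R")
    k65 = key_value(name, K4[65], "R")
    print(f"{name}: K[27]={k27}, K[65]={k65}, equal={k27 == k65}")
```

The numeric key value differs between variants, but the equality holds in all three. The sketch covers only this one equality, not the rest of the constraint set.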

Combined with a counting argument about repeated key values, these constraints show that no repeating key of any length can produce the K4 plaintext under direct positional correspondence (CT[i] → PT[i]) and additive-key assumptions. This is a deterministic proof (Level A), reproducible from the codebase. It does not apply if a transposition layer reorders positions before substitution, or if the cipher is non-additive.
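
The simplest ingredient of such an argument is a self-consistency check: under direct positional correspondence and a repeating key, all crib positions that share a key slot must demand the same key value. The sketch below performs only that check, under the Vigenère convention and the commonly cited crib placement (both assumptions of this example, with hypothetical function names). It does not implement the Bean constraints or the counting argument, so a period that passes it is not thereby viable.

```python
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Assumed crib placement: 0-indexed start position -> plaintext word.
CRIBS = {21: "EASTNORTHEAST", 63: "BERLINCLOCK"}
KNOWN = {start + i: ch for start, word in CRIBS.items() for i, ch in enumerate(word)}

# Key value each crib position demands under Vigenere: K = (C - P) mod 26.
REQUIRED = {pos: (ord(K4[pos]) - ord(p)) % 26 for pos, p in KNOWN.items()}


def conflicting_slots(period):
    """Key slots whose crib positions demand more than one distinct key value."""
    slots = {}
    for pos, k in REQUIRED.items():
        slots.setdefault(pos % period, set()).add(k)
    return {slot: vals for slot, vals in slots.items() if len(vals) > 1}


def is_self_consistent(period):
    """True if no key slot is asked for two different values."""
    return not conflicting_slots(period)


for period in range(1, 14):
    print(period, is_self_consistent(period), len(conflicting_slots(period)))
```

A key as long as the message passes this filter trivially, because no two crib positions share a slot.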

Two-System Model

Jim Sanborn stated at the Kryptos dedication that “there are two systems of enciphering the bottom text.” Ed Scheidt stated that he “masked the English language so it’s more of a challenge” and that solvers need to “solve the technique first and then go for the puzzle.” The “two systems” quote is public evidence. Any specific interpretation of that quote is a hypothesis, not a fact.

Pure transposition is independently impossible: the ciphertext contains only 2 E’s, but the known cribs require 3. Therefore at least one layer must be substitution. Beyond that, the architecture remains open. Current live project surfaces include:

The April 2026 audit explicitly rejected treating any single two-system story as the project’s established model. The correct public stance is: K4 likely involves a technique beyond straightforward single-layer classical encryption, but the exact composition is unknown.
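
The letter-count argument against pure transposition in this section can be checked directly: a transposition only rearranges letters, so the ciphertext must contain at least as many of each letter as the known plaintext requires. The sketch assumes the commonly cited crib words (EASTNORTHEAST and BERLINCLOCK); the function name is hypothetical.

```python
from collections import Counter

K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# Assumed known plaintext; only letter counts matter for this check.
CRIB_TEXT = "EASTNORTHEAST" + "BERLINCLOCK"


def transposition_shortfalls(ciphertext, required_plaintext):
    """Letters the plaintext needs more often than the ciphertext supplies.

    Returns {letter: (count_in_ciphertext, count_required)}.
    """
    have, need = Counter(ciphertext), Counter(required_plaintext)
    return {ch: (have[ch], n) for ch, n in need.items() if have[ch] < n}


shortfalls = transposition_shortfalls(K4, CRIB_TEXT)
print(shortfalls)
```

Any non-empty result rules out pure transposition for that ciphertext and crib set, whatever the permutation.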

What Has Been Exhaustively Tested

Over 502 experiments spanning 671.0B+ individual hypothesis evaluations have been run (with overlaps across experiments), eliminating:

73-Character Hypothesis

The carved text has 97 characters. One working model proposes that 24 characters are nulls (97 − 24 = 73 real ciphertext characters). This is a hypothesis, not a proven fact.

The strongest current public support is not the old null-palette result; that evidence was retired in April 2026. The cleaner live structural observation is that the five carved Ws create a bounded segmentation hypothesis. Even that does not prove nulls. The 73-character idea remains open, but it currently lacks independent, model-neutral quantitative support.
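
For readers who want to see the segmentation the five Ws induce, the sketch below lists their positions (numbered from 0) and the lengths of the runs between them. How those segments are used in the hypothesis is not specified in this section, so the code only reports the raw structure; the variable names are illustrative.

```python
K4 = (
    "OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"
)

# 0-indexed positions of every W in the ciphertext.
w_positions = [i for i, ch in enumerate(K4) if ch == "W"]

# Runs of text between the Ws (the Ws themselves are dropped by split).
segments = K4.split("W")
segment_lengths = [len(s) for s in segments]

print(w_positions)
print(segment_lengths)
```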

Note: The number 24 also appears in other K4 contexts (24 crib characters, the Berlin Clock’s 24 lamps, a K3 chart with 24 rows). These coincidences are not evidence; many small integers recur naturally. They are documented here only because they are frequently mentioned in community discussions.

The Null Palette (Retired April 2026)

A score-conditioned null-palette result was once treated as a key observation. It is retained in the repo only as a cautionary historical case.

Matched controls disproved the palette’s specificity. The apparent convergence advantage was generic to restrictive palette-constrained search and did not justify treating that letter set as real evidence about K4.

Palette constraints remain useful as a computational technique in some search programs, but the site no longer treats any retired palette identity as a cryptographic clue.

What Remains

Null mask + periodic substitution is proven impossible for ANY choice of 24 null positions at ANY period (algebraic proof extended to CT73 via Bean inequalities across all 11,440 candidate masks). If the 73-character model is correct, the cipher operating on the extracted characters must use a non-periodic key: a running key from an unknown source, a bespoke procedural method, or a one-time key.

On 2026-04-08 we ran an adversarial internal audit of the scope of our own testing and reclassified the frontier into “testable now” (bounded, reproducible campaigns we can run), “weakly testable” (requires better detection apparatus, not more compute), and “untestable under current clues” (requires new primary-source evidence). We are aware that classical cipher space is infinite and cannot be literally exhausted; this classification describes the scope of what we have tested under our specific assumptions, not a claim about K4 as a mathematical object. The full audit record is in the internal status audit.

If you have an idea we have not tested, the Submit a Theory page is the direct path. We want to be wrong about anything we have classified as ruled out.

Validation Criteria

A candidate solution is not accepted unless it passes all of the following simultaneously:

How to Read Claims on This Site

We classify every claim by its evidence strength. When reading results on this site:

Level A: Proven within stated assumptions
Mathematical proof or complete enumeration. Always conditioned on explicit assumptions (e.g., correct cribs, additive key model). If you disagree with the assumptions, the proof does not apply.
Level B: Exhaustively negative within tested scope
Every configuration in a defined parameter space was tested and produced noise. Does not extend to untested variants or multi-layer combinations. “Ruled out within tested scope” means exactly that.
Level C: Descriptive anomaly
A pattern discovered post-hoc (from the data, not predicted in advance). P-values are uncorrected for search breadth unless stated otherwise. Does not prove how K4 was encrypted.
Level D: Hypothesis
A plausible conjecture without quantitative support. Labeled “hypothesis” or “open question.”

All p-values on this site are uncorrected for the project’s full search breadth (~1000 experiments) unless explicitly stated. Over that many tests, individually “significant” results are expected by chance. We document them for transparency, not as proof.
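
The effect of search breadth can be illustrated with simple arithmetic. The sketch assumes independent tests and an illustrative per-test threshold of 0.05; neither figure comes from the project, and the Bonferroni correction is shown as one common adjustment, not as the method used on this site. Function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests that pass by chance when no effect is real."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


n, alpha = 1000, 0.05  # ~1000 experiments; alpha is illustrative
print(expected_false_positives(n, alpha))
print(prob_at_least_one(n, alpha))
print(bonferroni_threshold(n, alpha))
```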

Truth Taxonomy

Every claim in our database is classified by its evidence level:

[PUBLIC FACT]
Verified by reputable public reporting or primary-source statements.
[DERIVED FACT]
Deterministic consequence of public facts, reproducible by a provided command.
[INTERNAL RESULT]
Empirical result from this project. Includes artifact pointers and a reproduction command.
[HYPOTHESIS]
Plausible claim not yet proven. Includes a test plan.

Confidence Tiers

Tier 1: Mathematical Proof
Algebraic proof that the method cannot produce the known plaintext. Permanently valid unless crib positions or ciphertext transcription are wrong.
Tier 2: Exhaustive Search
Every possible configuration was tested under stated assumptions. Solid for the specific model tested, but does not eliminate multi-layer variants.
Tier 3: Partial/Statistical
Sampling-based or incomplete coverage. May warrant re-testing under different assumptions.
Tier 4: Untested
Never properly tested. Fully open.

Reproducibility

Every elimination includes a reproduction command you can run yourself. The entire codebase is open source. Clone the repo, install Python 3.11+, and run any experiment with PYTHONPATH=src.