Live Forensic Intelligence

Forensic Dossiers

Multi-layer cryptographic evidence of training data provenance. Each dossier combines structural probes, deep crystallography, resonance fingerprinting, behavioral analysis, and genomic correlation — autonomously generated, machine-verifiable.

How to read forensic evidence

This registry presents converging forensic signals across monitored models. Each card combines directional evidence, confidence estimates, similarity fingerprints, and internal activation patterns to help interpret whether a model shows signs consistent with a tracked evidence profile.

Pipeline Progress

Completed Dossiers

Each card presents converging evidence — from interpretive verdict to underlying signal layers.
⚠ CDS Recalculation Notice: CDS (Calibrated Differential Surprise) and Gene Y scores shown in dossier cards are being recalculated with validated Pile test texts (v4). Current values are preliminary and may change. Dataset Genome, Gotcha Report, and ZK Certificates sections below use independently verified data.

Pipeline Queue

Models awaiting completion of one or more forensic analysis stages.

Dataset Genome Project

Cross-model × cross-dataset forensic fingerprinting. Each cell shows a size-normalized familiarity score (0–100). A higher score means the model is disproportionately familiar with that dataset relative to the other models.

🚨 Anomalous Familiarity Report

Forensic evidence of training data familiarity patterns that are not explained by developers' declared training sources. Each finding is backed by bits-per-character (BPC) measurements on 1,128 real text excerpts, size-normalized z-scores, and cross-family validation.
Methodology Note: These findings show statistical anomalies in model behavior, not definitive proof of training data inclusion. High familiarity with a dataset may result from: (a) undisclosed training data, (b) content overlap between web datasets, (c) knowledge distillation from larger models, or (d) emergent generalization. We present the evidence and let readers draw conclusions. All measurements are reproducible from published model weights and public dataset excerpts.
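
As a rough illustration of the BPC quantity these findings rest on, the sketch below converts per-token log-probabilities into a per-character bit cost. It assumes that BPC means bits per character, that the model reports natural-log probabilities, and that characters are counted as Unicode code points. The function name `bits_per_character` is hypothetical and is not taken from the registry's pipeline.

```python
import math


def bits_per_character(token_logprobs, text):
    """Average number of bits the model needs per character of `text`.

    token_logprobs: natural-log probabilities, one per token of `text`
                    (assumption: the scoring model exposes these).
    text:           the excerpt that was scored.
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_nats = -sum(token_logprobs)       # negative log-likelihood in nats
    total_bits = total_nats / math.log(2)   # convert nats to bits
    return total_bits / len(text)           # normalize by character count


# Illustrative input: four tokens, each with probability 0.5, over 8 characters.
example_bpc = bits_per_character([math.log(0.5)] * 4, "abcdefgh")
```

Lower BPC on an excerpt means the model finds it less surprising, which is why the report treats unusually low values as a familiarity signal rather than as proof of inclusion.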

ZK Forensic Certificates

Cryptographically signed certificates for each model. Each certificate combines Pedersen commitments (secp256k1) on BPC values, a Merkle tree over the genome vector, a Fiat-Shamir proof of knowledge, and an Ed25519 signature.
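
To make the Merkle-tree component concrete, here is a minimal sketch of hashing a genome vector down to a single root. The certificate format itself is not specified on this page, so every detail is an assumption: SHA-256 as the hash, big-endian float64 leaf encoding, domain-separation prefix bytes, and duplication of the last node on odd-sized levels. The names `leaf_hash`, `node_hash`, and `merkle_root` are hypothetical.

```python
import hashlib
import struct


def leaf_hash(value: float) -> bytes:
    # Assumption: each genome entry is encoded as a big-endian float64.
    # The 0x00 prefix separates leaf hashes from internal-node hashes.
    return hashlib.sha256(b"\x00" + struct.pack(">d", value)).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # The 0x01 prefix marks an internal node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(values) -> bytes:
    """Fold a list of floats into one 32-byte Merkle root."""
    if not values:
        raise ValueError("genome vector must be non-empty")
    level = [leaf_hash(v) for v in values]
    while len(level) > 1:
        if len(level) % 2:              # odd count: duplicate the last node
            level.append(level[-1])
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Illustrative vector only; not real registry data.
root_hex = merkle_root([0.12, -1.5, 2.25]).hex()
```

Changing any single entry of the vector changes the root, so a signature over the root covers the whole vector.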

Forensic Methodology

Five independent analysis engines converge on each model. No single signal determines a verdict.

Structural Probe

Sonar v2 latent crystallography — detects domain-specific structural patterns in model embeddings without accessing training data.

Observed Signal

Deep Crystallography

Extracts attention-layer tensors and measures domain-specific crystallization gaps against negative controls. Comparing surface and deep gaps indicates where a training signal may be present.

Observed Signal

Resonance Probe

Vocabulary-space cluster analysis. Discovers token communities via attention resonance, measures separation from noise floor, and traces assimilation chains.

Observed Signal

CDS v4

Calibrated Differential Surprise — measures statistical deviation from reference models on dataset-specific texts with bootstrap CI and neutral calibration.

Derived Similarity
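
The CDS v4 formula is not given on this page, so the following is only a generic sketch of the two ingredients the description names: a surprise gap against a reference model that is recentred on neutral texts, and a percentile bootstrap confidence interval on the mean gap. The function names, the sign convention (reference minus target), and the resampling settings are all assumptions.

```python
import random
import statistics


def calibrated_gaps(target, reference, neutral_target, neutral_reference):
    """Per-text surprise gap (reference minus target), recentred on neutral texts.

    All inputs are per-text surprise values (for example BPC) in matching order.
    """
    neutral = [r - t for t, r in zip(neutral_target, neutral_reference)]
    offset = statistics.fmean(neutral)   # baseline gap on neutral texts
    return [(r - t) - offset for t, r in zip(target, reference)]


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Mean of `values` with a percentile bootstrap confidence interval."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int(n_resamples * alpha / 2)]
    upper = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return statistics.fmean(values), lower, upper
```

A positive mean gap whose interval excludes zero would suggest the target model is less surprised by the dataset-specific texts than the reference model, after accounting for the baseline difference on neutral texts.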

Gene Y

Ranking fingerprint correlation. Models trained on shared data produce correlated perplexity rankings. Spearman ρ across Pile-style test texts.

Derived Similarity
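
The Spearman ρ mentioned for Gene Y can be sketched in plain Python as a Pearson correlation over ranks. This is a textbook implementation with average ranks for ties, not the registry's own code; the names `average_ranks` and `spearman_rho` are hypothetical, and the inputs are assumed to be per-text perplexities from two models over the same texts in the same order.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman rank correlation between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ra, rb = average_ranks(a), average_ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_b)
```

Because only the ordering of texts matters, two models of very different size can still score a high ρ if they find the same texts easy and the same texts hard.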

Dataset Genome

Size-normalized BPC fingerprinting across 13 datasets × 16 models. Z-score normalization removes model-size bias, so that what remains reflects dataset-specific familiarity rather than overall model capability.

Observed Signal
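
One plausible reading of the size normalization described above is to z-score each model's BPC values across datasets, which removes the overall level and spread that come with model size. The page does not specify the exact procedure, so this is an assumed variant; the function name `row_z_scores` and the dataset keys are hypothetical.

```python
import statistics


def row_z_scores(bpc_by_dataset):
    """Z-score one model's BPC values across datasets.

    bpc_by_dataset: mapping of dataset name to that model's BPC on it.
    A negative z-score marks a dataset the model finds easier than its
    own average, which is the direction of higher familiarity.
    """
    values = list(bpc_by_dataset.values())
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)   # population standard deviation
    if spread == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return {name: (v - mean) / spread for name, v in bpc_by_dataset.items()}


# Two illustrative models with the same relative pattern at different scales.
small_model = row_z_scores({"dataset_a": 1.0, "dataset_b": 2.0, "dataset_c": 3.0})
large_model = row_z_scores({"dataset_a": 0.5, "dataset_b": 1.0, "dataset_c": 1.5})
```

In this toy input the larger model has uniformly lower BPC, yet both models end up with the same z-score profile, which is the effect the normalization is meant to achieve.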

ZK Forensic Certificate

Combines Pedersen commitments on BPC values, a Merkle tree over the genome vector, a Fiat-Shamir proof, and an Ed25519 signature. Together these provide tamper-evident cryptographic attestation of forensic results.

Cryptographic Proof
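
The commitment step can be illustrated with a toy Pedersen commitment. The certificates are stated to use secp256k1; the sketch below deliberately swaps in a tiny multiplicative group (the order-1019 subgroup modulo the prime 2039) so that it runs with the standard library alone. These parameters are insecure and purely illustrative, the generators `G` and `H` are arbitrary choices, and the fixed-point encoding of BPC as an integer is an assumption.

```python
# Toy parameters (NOT secure, NOT secp256k1): P = 2 * Q + 1 with P, Q prime.
P = 2039   # group modulus
Q = 1019   # order of the subgroup of squares modulo P
G = 4      # generator of that subgroup (4 = 2^2)
H = 9      # second generator (9 = 3^2); in a real scheme the discrete log
           # of H with respect to G must be unknown to the committer


def commit(message: int, blinding: int) -> int:
    """Pedersen commitment C = G^message * H^blinding mod P."""
    return (pow(G, message % Q, P) * pow(H, blinding % Q, P)) % P


def verify_opening(commitment: int, message: int, blinding: int) -> bool:
    """Check that (message, blinding) opens the given commitment."""
    return commit(message, blinding) == commitment


# Assumed encoding: a BPC of 0.87 stored as the fixed-point integer 87.
bpc_fixed_point = 87
blinding_factor = 314   # would be drawn at random in practice
commitment = commit(bpc_fixed_point, blinding_factor)
```

The commitment hides the value until the blinding factor is revealed and is additively homomorphic: multiplying two commitments yields a commitment to the sum of the values under the sum of the blinding factors.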

Verification Stack

Sonar v2, Deep Crystallography, Resonance Probe, CDS v4, Gene Y, Dataset Genome, ZK Forensic Certificate