Forensic Dossiers
Multi-layer cryptographic evidence of training data provenance. Each dossier combines structural probes, deep crystallography, resonance fingerprinting, behavioral analysis, and genomic correlation — autonomously generated, machine-verifiable.
How to read forensic evidence
This registry presents converging forensic signals across monitored models. Each card combines directional evidence, confidence estimates, similarity fingerprints, and internal activation patterns to help interpret whether a model shows signs consistent with a tracked evidence profile.
Dataset Fingerprinting
METHOD 1
Does the model know the exact published text or just the story? Ratio above 1.0 = memorized the specific character sequence.
Memorization Scaling Law
METHOD 2
Larger models memorize more copyrighted content. BPC falls by 10-14% with each step up in model size, across three independent model families.
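BPC is not defined on this page. A common definition, assumed here, is the total negative log-likelihood of a text in bits divided by its character count. The sketch below converts per-token natural-log probabilities into BPC and computes the relative reduction between a smaller and a larger model. The function names and the log-probability values are hypothetical and purely illustrative, not measured results.

```python
import math

def bits_per_character(token_logprobs, text):
    """Total negative log-likelihood in bits, divided by character count.

    token_logprobs: natural-log probabilities, one per token of `text`.
    """
    total_nats = -sum(token_logprobs)
    return total_nats / math.log(2) / len(text)

def relative_reduction(bpc_small, bpc_large):
    """Fractional drop in BPC when moving from the smaller to the larger model."""
    return (bpc_small - bpc_large) / bpc_small

# Illustrative (made-up) per-token log-probabilities for the same passage.
text = "It was the best of times"
small_logprobs = [-2.0, -1.5, -1.0, -2.5, -0.5, -1.5]
large_logprobs = [-1.8, -1.3, -0.9, -2.2, -0.45, -1.3]

bpc_small = bits_per_character(small_logprobs, text)
bpc_large = bits_per_character(large_logprobs, text)
reduction = relative_reduction(bpc_small, bpc_large)
print(f"BPC small={bpc_small:.3f} large={bpc_large:.3f} reduction={reduction:.1%}")
```

Because both models are scored on the same characters, the reduction does not depend on how each tokenizer splits the text.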
Evasion Detection
METHOD 3
Does RLHF or instruction tuning remove copyright knowledge? Min-K% (K = 20) membership inference, compared across base and instruct model pairs.
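As a rough illustration of the Min-K% idea, the sketch below averages the log-probabilities of the 20% least likely tokens in a passage. A higher (less negative) score suggests the passage is more familiar to the model. The function name and the log-probability values are hypothetical, and the registry's actual scoring pipeline is not shown on this page.

```python
import math

def min_k_percent_score(token_logprobs, k=0.20):
    """Mean log-probability of the k fraction of tokens the model found least likely."""
    n = max(1, math.ceil(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n

# Illustrative (made-up) log-probabilities for one passage under two models.
base_logprobs = [-0.1, -0.2, -0.05, -0.3, -0.15, -0.1, -0.25, -0.2, -0.1, -0.05]
instruct_logprobs = [-0.4, -1.2, -0.3, -2.0, -0.6, -0.5, -1.5, -0.9, -0.4, -0.2]

score_base = min_k_percent_score(base_logprobs)
score_instruct = min_k_percent_score(instruct_logprobs)
print(f"base={score_base:.3f} instruct={score_instruct:.3f}")
```

Comparing the two scores on the same passage gives one way to ask whether tuning lowered the model's familiarity with it; a decision threshold would need calibration on known member and non-member texts.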
Extraction Bypass
CRITICAL
Simple prompts extract verbatim copyrighted text from safety-trained models. No jailbreaking required.
Cross-Organizational BPC
81 MODELS
Average BPC by text category. Green = low BPC (memorized). Red = high BPC (unfamiliar). Pattern is universal across all organizations.
Completed Dossiers
Pipeline Queue
Dataset Genome Project
🚨 Anomalous Familiarity Report
ZK Forensic Certificates
Forensic Methodology
Structural Probe
Sonar v2 latent crystallography — detects domain-specific structural patterns in model embeddings without accessing training data.
Observed Signal
Deep Crystallography
Extracts attention layer tensors, measures domain-specific crystallization gaps vs negative controls. Surface vs deep gap analysis reveals training signal.
Observed Signal
Resonance Probe
Vocabulary-space cluster analysis. Discovers token communities via attention resonance, measures separation from noise floor, and traces assimilation chains.
Observed Signal
CDS v4
Calibrated Differential Surprise — measures statistical deviation from reference models on dataset-specific texts with bootstrap CI and neutral calibration.
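The bootstrap confidence interval mentioned above can be sketched as follows. The code assumes the quantity of interest is the mean per-text BPC difference between a reference model and the target model, which is an assumption about CDS rather than its documented definition. The neutral calibration step is not covered, and all values are made up for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Illustrative (made-up) per-text BPC values on dataset-specific texts.
reference_bpc = [1.10, 1.05, 1.20, 1.15, 1.08, 1.12]
target_bpc = [0.80, 0.85, 0.95, 0.90, 0.82, 0.88]

# Positive difference = the target model is less surprised than the reference.
diffs = [r - t for r, t in zip(reference_bpc, target_bpc)]
low, high = bootstrap_mean_ci(diffs)
print(f"mean diff={statistics.fmean(diffs):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

An interval that excludes zero indicates the difference is unlikely to come from text-to-text noise alone, given the sampled texts.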
Derived Similarity
Gene Y
Ranking fingerprint correlation. Models trained on shared data produce correlated perplexity rankings. Spearman ρ across Pile-style test texts.
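A minimal sketch of the ranking correlation: each model assigns a perplexity to the same set of test texts, and Spearman ρ is the Pearson correlation of the resulting ranks. The model names and perplexity values below are hypothetical, and the implementation uses only the standard library so the arithmetic stays visible.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Illustrative (made-up) perplexities on the same five test texts.
model_a = [12.1, 8.4, 15.0, 9.9, 20.3]
model_b = [30.5, 22.0, 41.2, 25.1, 39.0]  # different scale, similar ordering
model_c = [18.0, 25.0, 11.0, 30.0, 14.0]  # different ordering

rho_ab = spearman_rho(model_a, model_b)
rho_ac = spearman_rho(model_a, model_c)
print(f"rho(a,b)={rho_ab:.2f} rho(a,c)={rho_ac:.2f}")
```

Because only ranks are compared, the absolute perplexity scale of each model does not affect the result.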
Derived Similarity
Dataset Genome
Size-normalized BPC fingerprinting across 13 datasets × 16 models. Z-score normalization removes model-size bias, revealing true training data signal.
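One way to read the normalization step, assuming z-scores are taken within each model across datasets: subtracting a model's own mean BPC and dividing by its own spread removes the overall level that mostly reflects model size, leaving a per-dataset profile. The model names, dataset names, and BPC values below are hypothetical.

```python
import statistics

def zscore_rows(bpc_by_model):
    """Z-score each model's BPC values across datasets (one row per model)."""
    normalized = {}
    for model, row in bpc_by_model.items():
        values = list(row.values())
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        if spread == 0:
            raise ValueError(f"{model}: identical BPC on every dataset")
        normalized[model] = {ds: (v - mean) / spread for ds, v in row.items()}
    return normalized

# Illustrative (made-up) BPC matrix: the larger model is lower everywhere.
bpc = {
    "small-model": {"dataset_a": 1.20, "dataset_b": 1.00, "dataset_c": 0.80},
    "large-model": {"dataset_a": 0.90, "dataset_b": 0.75, "dataset_c": 0.60},
}

genome = zscore_rows(bpc)
for model, row in genome.items():
    print(model, {ds: round(z, 3) for ds, z in row.items()})
```

In this toy input the two models differ only in overall level and scale, so their normalized profiles coincide; a dataset where one model's z-score is unusually low would stand out against that baseline.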
Observed Signal
ZK Forensic Certificate
Pedersen commitments on BPC values, Merkle tree on genome vector, Fiat-Shamir proof, Ed25519 signature. Tamper-proof cryptographic attestation of forensic results.
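Of the components listed, the Merkle tree is the simplest to illustrate with the standard library alone. The sketch below builds a SHA-256 Merkle root over a hypothetical genome vector and verifies an inclusion proof for one entry. The leaf encoding, the domain-separation prefixes, and the choice to duplicate the last node on odd levels are assumptions made for illustration; the Pedersen commitments, Fiat-Shamir proof, and Ed25519 signature are not shown.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(value: float) -> bytes:
    # Assumed encoding: fixed 6-decimal text, prefixed to separate leaves from nodes.
    return _h(b"\x00" + format(value, ".6f").encode())

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # simplification: duplicate the last node
    return [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(leaves):
    """Root hash of a non-empty list of leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each flagged if it sits on the left."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof

def verify(leaf, proof, root):
    acc = leaf
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root

# Illustrative (made-up) genome vector of normalized scores.
genome_vector = [0.5, -1.2, 0.0, 1.3, -0.7]
leaves = [leaf_hash(v) for v in genome_vector]
root = merkle_root(leaves)
proof = merkle_proof(leaves, 4)
print(root.hex(), verify(leaves[4], proof, root))
```

Changing any entry of the vector changes the root, so a signed root commits to the whole vector while a proof reveals only one entry and its sibling hashes.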
Cryptographic Proof