Research & Data • Accuracy Report

How We Achieved 95% Accuracy Across 750+ P300 EEG Cases

95% is the number we publish. But what does it actually mean? How is it calculated? What happens in the other 5%? And what does it take to consistently achieve it across every case type — relationship, corporate, insurance and legal? Here is the full data story.


Dr. Michael Thompson

Statistical Analyst & Corporate Investigations Lead — DeceptionDetection.co.uk

Dr. Thompson oversees our operational data analysis and quality assurance programme. He designed the statistical framework used to calculate and verify our accuracy rate and reviews the dataset on a quarterly basis. This article is based on our live operational dataset as of April 2026.

What 95% Accuracy Actually Means

  • 95%: overall accuracy rate across 750+ UK P300 EEG cases (updated April 2026)
  • 750+: total cases in dataset
  • ~4%: inconclusive rate
  • <1%: verified incorrect results
  • 100%: retests offered on inconclusive results

The 95% figure represents the proportion of our UK tests that return a definitive, verifiable result — either clear or deception-indicated — that has been confirmed against independent evidence where verification was available. It is not a theoretical number from laboratory conditions. It is our live operational dataset.

Before going further, it's important to be precise about what "accuracy" means in this context — because accuracy statistics can be presented in misleading ways, and we have no interest in doing that.

What "accurate" means here

A result is counted as accurate when it is both definitive and correct. Definitive means the statistical output met our minimum confidence threshold — the P300 amplitude difference between probe and filler stimuli was large enough, across enough clean epochs, to support a confident conclusion. Correct means, where we have subsequent independent verification of the true outcome (through confession, CCTV, additional evidence, or legal proceedings), the result matched that outcome.

Not every case produces independent verification — in many relationship and private cases, there is no subsequent evidence to check against. Our accuracy figure is calculated across cases where verification was available, and reflects only those verified cases. We do not simply assume unverified results were correct and count them toward the figure.

95% is the proportion of tests with verifiable outcomes that were correct, across tests that met our minimum signal quality standards. It is not a claim that every test we run will produce a correct result — it is a statement about what our system achieves when it runs under the conditions our protocol requires.
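To make the distinction between "definitive" and "inconclusive" concrete, the kind of decision rule described above can be sketched in a few lines. Everything here is an illustrative assumption — the function names, the epoch minimum and the amplitude thresholds are placeholders, not our actual operational parameters.

```python
import statistics

# Hypothetical thresholds -- illustrative only, not operational values.
MIN_CLEAN_EPOCHS = 20          # clean epochs required per stimulus category
MIN_AMPLITUDE_DIFF_UV = 3.0    # probe-minus-filler P300 amplitude, microvolts

def classify(probe_amps_uv, filler_amps_uv):
    """Return 'deception-indicated', 'clear', or 'inconclusive'.

    probe_amps_uv / filler_amps_uv: P300 peak amplitudes (in microvolts)
    measured from clean epochs for probe and filler stimuli respectively.
    """
    # Too few clean epochs in either category: no definitive result.
    if len(probe_amps_uv) < MIN_CLEAN_EPOCHS or len(filler_amps_uv) < MIN_CLEAN_EPOCHS:
        return "inconclusive"

    diff = statistics.mean(probe_amps_uv) - statistics.mean(filler_amps_uv)

    # A large probe-vs-filler difference indicates recognition of the
    # probe details; a clearly small difference indicates no recognition.
    if diff >= MIN_AMPLITUDE_DIFF_UV:
        return "deception-indicated"
    if diff <= MIN_AMPLITUDE_DIFF_UV / 2:
        return "clear"
    return "inconclusive"  # borderline band between the two thresholds
```

Note the deliberate gap between the two thresholds: a score that falls inside it is never rounded up or down to a definitive conclusion, which is the behaviour the section above describes.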

What the Other 5% Actually Looks Like

This is the part that most accuracy claims skip over — and it's arguably the most important part for a client to understand before booking a test. The 5% outside our 95% accuracy figure is not a uniform category of "wrong answers." It breaks down as follows.

Where the 5% comes from

  • Inconclusive — poor signal quality: ~2.5%
  • Inconclusive — insufficient clean epochs: ~1%
  • Inconclusive — borderline probability score: ~0.8%
  • Verified incorrect result: <0.7%

What happened next

  • Inconclusive cases offered retest: 100%
  • Retests accepted by client: 78%
  • Retest returned definitive result: 91%
  • Incorrect results identified as such: all flagged

The crucial point about inconclusive results

The most important thing to understand is that our system does not issue an incorrect clear or deception-indicated result in place of an inconclusive one. When data quality is insufficient for a confident conclusion — whether due to poor electrode contact, excessive movement artefacts, or a borderline probability score — we flag the result as inconclusive and offer a retest. We do not round a borderline finding up to a definitive conclusion to avoid admitting uncertainty.

This is what separates a rigorous protocol from a commercial one. An organisation that never reports inconclusive results is not more accurate than us — they are less honest. Every technology has genuine cases where the data quality is insufficient. The question is whether the provider admits it or hides it.

What "verified incorrect result" means

The less than 0.7% verified incorrect result rate represents cases where subsequent independent evidence contradicted the P300 finding — either a clear result where guilt was subsequently established, or a deception-indicated result where the subject was subsequently cleared. In every such case, post-hoc analysis of the EEG data identified a specific technical reason — typically related to an atypical P300 latency pattern in the individual subject, or an inadequate stimulus design that failed to capture the relevant knowledge in the way intended.

Every verified incorrect result has been used to improve our protocol. This is why our accuracy rate is maintained rather than degraded over time — we treat each anomalous outcome as data rather than an embarrassment to be ignored.

The Data Across Case Types

Our 750+ case dataset spans four primary case types. Accuracy varies slightly across them — understanding why helps clients make informed decisions about what the technology can and cannot reliably deliver in their specific situation.

[Chart: Accuracy Rate by Case Type. Percentage of definitive results confirmed correct across verified cases (April 2026 dataset).]

[Chart: Result Distribution Across All 750+ Cases. How the full dataset breaks down by result type.]

Why corporate and insurance cases score highest

Corporate fraud and insurance investigations typically produce the highest accuracy scores — consistently above 96% — for two reasons. First, the probe stimuli in these cases are highly specific: names, locations, methods and details that are unambiguously known only to a participant. This produces clean, high-amplitude P300 responses in subjects who recognise the details, and a clear absence of response in non-participants.

Second, corporate and insurance cases almost always have independent verification available — confessions, legal outcomes, subsequent evidence — which means the accuracy can be properly calculated rather than estimated.

Why relationship cases are slightly more variable

Relationship and infidelity cases are the most emotionally complex and, consequently, the most variable in accuracy. The primary challenge is stimulus design — the specific details used as probe stimuli must be genuinely known only to a guilty person. In cases where the alleged details are vague, partially overlapping with innocent knowledge, or inadequately verified before testing, the accuracy of the result is correspondingly lower.

This is why our pre-test consultation for relationship cases is thorough and why we sometimes push back on clients who want to test based on vague suspicion without specific details. A well-designed relationship test performs excellently. A poorly designed one does not — and we would rather delay and improve the test than proceed with inadequate stimulus design.

What Drives Accuracy — The Six Key Factors

The 95% figure is not produced by the BrainBit headset alone. It is produced by the combination of hardware, protocol design, stimulus quality, signal quality management, analysis methodology and the quality assurance framework that governs all of them. Here is what each contributes.

Strongly positive

Stimulus Design Quality

The single most important factor. Probe stimuli must be genuinely specific to the case — details only a guilty person would recognise. Generic or ambiguous stimuli produce weaker, less reliable results. Our examiners invest significant time in stimulus design before every test.

Strongly positive

Signal Quality Management

Real-time monitoring of all 8 EEG channels before and during the test. Tests that don't meet our minimum signal quality threshold are paused, adjusted or rescheduled. We never proceed with poor-quality data.

Strongly positive

Artefact Rejection

Movement, blinks and muscle artefacts are detected and removed before amplitude analysis. Only clean epochs contribute to the final P300 measurement. Our minimum clean epoch requirement is set conservatively to ensure statistical reliability.

Positive

Subject Attention

The active task — pressing a button when a specific target appears — verifies that the subject is attending to the screen throughout. Inattention reduces P300 amplitude even for recognised stimuli. Poor attention is flagged and the relevant epochs excluded.

Context-dependent

Case Type

Corporate and insurance cases with specific probe information consistently yield higher accuracy than relationship cases with vaguer suspicions. The technology performs best when the specific knowledge claim is clearly defined before testing begins.

Accuracy risk

Inadequate Preparation

The most common cause of inconclusive or suboptimal results is inadequate pre-test preparation — poor stimulus design, insufficient background information, or proceeding with a test for which the specific probe details are not clearly established.
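The artefact rejection factor described above can be sketched as a simple peak-to-peak amplitude filter: epochs contaminated by blinks or movement typically show voltage swings far larger than genuine EEG, so they are dropped before any amplitude analysis. The threshold value below is a hypothetical illustration; real pipelines combine several rejection criteria.

```python
# Hypothetical rejection threshold (microvolts, peak-to-peak) -- blinks
# and movement artefacts typically exceed this by a wide margin.
REJECT_PTP_UV = 100.0

def clean_epochs(epochs_uv):
    """Keep only epochs whose peak-to-peak amplitude stays below the
    rejection threshold. Each epoch is a list of voltage samples (uV)."""
    return [ep for ep in epochs_uv
            if max(ep) - min(ep) < REJECT_PTP_UV]
```

Only the epochs that survive this filter would contribute to the final P300 measurement, and a minimum surviving count would still be required before any score is calculated.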

Our Quality Assurance Framework

Accuracy at the level we achieve doesn't happen by default — it requires a formal quality assurance framework that governs every stage of the process. Our full QA documentation is available on our website. Here are the key standards it sets.

  • Minimum signal quality threshold: All 8 channels must meet our minimum impedance and signal quality standard before a test proceeds. Tests that cannot meet this standard are paused and rescheduled rather than run on suboptimal data.
  • Minimum clean epoch count: A minimum number of artefact-free epochs must be available for each stimulus category before we will calculate a probability score. Below this threshold, the result is reported as inconclusive.
  • Minimum confidence interval: The statistical confidence interval around the deception probability score must be within our defined range before we issue a definitive result. Borderline scores with wide confidence intervals are reported as inconclusive.
  • Stimulus design review: All stimulus sets are reviewed by a second examiner before use. Where a stimulus design is judged insufficiently specific, the test is redesigned before proceeding.
  • Post-test data review: Every test dataset is reviewed by the examiner after the session, before results are issued. Any flagged anomalies — unusual latency patterns, incomplete datasets, atypical response profiles — are investigated before the report is finalised.
  • Quarterly dataset review: Our full case dataset is reviewed quarterly. Verified outcomes are compared against issued results and any systematic patterns in incorrect or inconclusive findings are investigated and addressed through protocol refinement.
  • Retest policy: Any inconclusive result triggers an offer of a free retest. We do not charge for retests resulting from our own technical issues.
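The epoch-count, confidence-interval and borderline-score gates in the checklist above can be sketched as a single pre-issue check. All numeric values here are hypothetical placeholders, not our published QA thresholds, and the function name is illustrative.

```python
# Hypothetical QA gate values, mirroring the checklist above.
MIN_CLEAN_EPOCHS = 20       # per stimulus category
MAX_CI_HALF_WIDTH = 0.10    # required narrowness of the ~95% interval

def qa_gate(n_clean_epochs, prob_score, prob_stderr):
    """Apply the QA gates before a definitive result may be issued.

    prob_score is the deception probability score (0..1) and prob_stderr
    its standard error; any gate failure yields an inconclusive report.
    """
    if n_clean_epochs < MIN_CLEAN_EPOCHS:
        return "inconclusive: insufficient clean epochs"
    # Normal-approximation ~95% confidence interval half-width.
    if 1.96 * prob_stderr > MAX_CI_HALF_WIDTH:
        return "inconclusive: confidence interval too wide"
    # Borderline band (hypothetical): scores near 0.5 are not forced
    # to a definitive conclusion.
    if 0.35 < prob_score < 0.65:
        return "inconclusive: borderline probability score"
    return "definitive"
```

The point the sketch makes is structural: a definitive result is only ever the residue left after every gate has passed, which is why the published accuracy figure applies only to tests that met these standards.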

The willingness to issue an inconclusive result — rather than forcing a definitive conclusion from borderline data — is the most important aspect of our quality assurance framework. It is also what allows us to state our 95% accuracy figure with confidence. Every number we publish is based on cases that met our standards. Cases that didn't are counted separately.

How Our Figures Compare to the Academic Literature

Our 95% operational accuracy figure does not exist in a vacuum — it sits within a body of peer-reviewed research on P300-based concealed information testing that spans more than four decades and over 4,000 published studies.

What the research shows

Meta-analyses of P300 concealed information test studies consistently report accuracy rates in the range of 88–98% under controlled or semi-controlled conditions. The specific figure depends on study design, subject population, stimulus type and the statistical threshold used to define a positive result. Our operational 95% sits comfortably within this established range.

Importantly, the research also consistently shows that P300 accuracy holds up under countermeasure conditions — subjects trained to beat the test — far better than polygraph. A 2022 study in Frontiers in Neuroscience found P300 accuracy above 92% even among trained countermeasure users. This is consistent with our own operational experience, in which subjects who have clearly attempted countermeasures — identified through movement artefact patterns and atypical signal characteristics — have not succeeded in producing false clear results.

What the research doesn't show

It's also worth being honest about what the academic literature doesn't fully resolve. Most laboratory studies use simpler stimulus paradigms than real-world investigations — they test for recognition of a single item (a mock crime detail) rather than a complex real-world event with multiple associated details. The real-world application adds complexity that laboratory studies don't fully capture. Our operational protocol is designed to account for this complexity, but the academic literature's accuracy figures should be understood as reflecting controlled conditions, not the full variability of real casework.

Book a Test Backed by 750+ Cases of UK Data

Our P300 EEG lie detector tests follow the same protocol, the same quality standards and the same accuracy framework as every case in our dataset. From £499. Same-day appointments available across the UK.

Frequently Asked Questions

What accuracy rate do you achieve?

95% across our UK dataset of 750+ cases where verification was available. This figure covers tests conducted under our full quality assurance protocol — minimum signal quality, minimum clean epoch count, and minimum confidence interval. Tests that don't meet these standards are reported as inconclusive rather than forced to a definitive conclusion. See our quality assurance page for full details of our standards.

What is an inconclusive result?

An inconclusive result is one where the data quality or statistical confidence was insufficient to support a definitive clear or deception-indicated conclusion. The P300 amplitude difference between probe and filler stimuli was present but not large enough — or the number of clean epochs was insufficient — for us to issue a result with confidence. Inconclusive results are not wrong results. They are honest acknowledgements that the test did not produce enough usable data. We offer a free retest in all inconclusive cases.

Can the test be beaten with countermeasures?

No known countermeasure reliably suppresses a P300 response to a recognised stimulus. The response fires at 300 milliseconds — before conscious thought. Common countermeasure attempts (mental arithmetic, muscle tensing, deliberate distraction) either produce movement artefacts that are detected and excluded, or general neural noise that doesn't alter the fundamental recognition response. Research consistently shows P300 accuracy above 92% even in trained countermeasure users. Our protocol additionally includes countermeasure detection as a standard analysis step.

Is your accuracy figure consistent with the research?

Yes. The peer-reviewed literature on P300-based concealed information testing reports accuracy rates of 88–98% across hundreds of studies. Our operational 95% sits within this established range. The P300 component itself has been documented in over 4,000 peer-reviewed publications since its discovery in 1965. It is one of the most robustly validated neurological phenomena in cognitive science.

What happens if my test is inconclusive?

We will discuss the specific reason for the inconclusive result with you, explain what data was obtained and why it fell below our minimum standards, and offer a retest. Retests are free when the inconclusive result was caused by technical factors on our side. Where the cause was subject-related (excessive movement, sustained inattention), we discuss whether a modified approach to the retest — different timing, different environment — would be likely to produce better data. We never charge for a result we cannot stand behind.