What 95% Accuracy Actually Means
The 95% figure represents the proportion of our UK tests that return a definitive, verifiable result — either clear or deception-indicated — that has been confirmed against independent evidence where verification was available. It is not a theoretical number derived from laboratory conditions; it comes from our live operational dataset.
Before going further, it's important to be precise about what "accuracy" means in this context — because accuracy statistics can be presented in misleading ways, and we have no interest in doing that.
What "accurate" means here
A result is counted as accurate when it is both definitive and correct. Definitive means the statistical output met our minimum confidence threshold — the P300 amplitude difference between probe and filler stimuli was large enough, across enough clean epochs, to support a confident conclusion. Correct means, where we have subsequent independent verification of the true outcome (through confession, CCTV, additional evidence, or legal proceedings), the result matched that outcome.
Not every case produces independent verification — in many relationship and private cases, there is no subsequent evidence to check against. Our accuracy figure is calculated across cases where verification was available, and reflects only those verified cases. We do not simply assume unverified results were correct and count them toward the figure.
95% is the proportion of tests with verifiable outcomes that were correct, across tests that met our minimum signal quality standards. It is not a claim that every test we run will produce a correct result — it is a statement about what our system achieves when it runs under the conditions our protocol requires.
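As an illustration of how that calculation works, the sketch below (with entirely hypothetical field names and structure, not our actual records system) counts accuracy only over cases that were both definitive and independently verified, so unverified results never enter the denominator:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CaseResult:
    definitive: bool         # statistical output met the minimum confidence threshold
    verified: bool           # independent evidence later became available
    correct: Optional[bool]  # result matched the verified outcome (None if unverified)

def operational_accuracy(cases: List[CaseResult]) -> Optional[float]:
    """Accuracy over definitive cases with independent verification only.

    Unverified results are excluded entirely rather than assumed correct.
    """
    verified = [c for c in cases if c.definitive and c.verified]
    if not verified:
        return None  # nothing to verify against
    return sum(1 for c in verified if c.correct) / len(verified)
```

The key design point is the filter before the division: a case with no independent verification contributes nothing to the figure, in either direction.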
What the Other 5% Actually Looks Like
This is the part that most accuracy claims skip over — and it's arguably the most important part for a client to understand before booking a test. The 5% outside our 95% accuracy figure is not a uniform category of "wrong answers": the large majority of it consists of results we report as inconclusive, with verified incorrect results accounting for less than 0.7% of cases.
The crucial point about inconclusive results
The most important thing to understand is that our system does not issue an incorrect clear or deception-indicated result in place of an inconclusive one. When data quality is insufficient for a confident conclusion — whether due to poor electrode contact, excessive movement artefacts, or a borderline probability score — we flag the result as inconclusive and offer a retest. We do not round a borderline finding up to a definitive conclusion to avoid admitting uncertainty.
This is what separates a rigorous protocol from a commercial one. An organisation that never reports inconclusive results is not more accurate than us — they are less honest. Every technology has genuine cases where the data quality is insufficient. The question is whether the provider admits it or hides it.
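The decision logic described above — report inconclusive rather than round a borderline score up to a definitive call — can be sketched roughly as follows. The thresholds here (30 clean epochs, a 0.30–0.70 borderline band) are illustrative placeholders, not our actual protocol values:

```python
def classify(prob_deception: float, clean_epochs: int,
             min_epochs: int = 30, lo: float = 0.30, hi: float = 0.70) -> str:
    """Return a result label, refusing to force a definitive call.

    Thresholds are illustrative only, not the provider's real values.
    """
    if clean_epochs < min_epochs:
        return "inconclusive"   # insufficient clean data, regardless of score
    if lo < prob_deception < hi:
        return "inconclusive"   # borderline score is never rounded up
    return "deception-indicated" if prob_deception >= hi else "clear"
```

Note that a strong score still yields an inconclusive result if the clean epoch count is too low: data quality gates the score, not the other way round.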
What "verified incorrect result" means
Our verified incorrect result rate of less than 0.7% covers cases where subsequent independent evidence contradicted the P300 finding: either a clear result for a subject whose guilt was later established, or a deception-indicated result for a subject who was later cleared. In every such case, post-hoc analysis of the EEG data identified a specific technical cause, typically an atypical P300 latency pattern in the individual subject, or a stimulus design that failed to capture the relevant knowledge as intended.
Every verified incorrect result has been used to improve our protocol. This is why our accuracy rate is maintained rather than degraded over time — we treat each anomalous outcome as data rather than an embarrassment to be ignored.
The Data Across Case Types
Our 750+ case dataset spans four primary case types. Accuracy varies slightly across them — understanding why helps clients make informed decisions about what the technology can and cannot reliably deliver in their specific situation.
Why corporate and insurance cases score highest
Corporate fraud and insurance investigations typically produce the highest accuracy scores — consistently above 96% — for two reasons. First, the probe stimuli in these cases are highly specific: names, locations, methods and details that are unambiguously known only to a participant. This produces clean, high-amplitude P300 responses in subjects who recognise the details, and a clear absence of response in non-participants.
Second, corporate and insurance cases almost always have independent verification available — confessions, legal outcomes, subsequent evidence — which means the accuracy can be properly calculated rather than estimated.
Why relationship cases are slightly more variable
Relationship and infidelity cases are the most emotionally complex and, consequently, the most variable in accuracy. The primary challenge is stimulus design — the specific details used as probe stimuli must be genuinely known only to a guilty person. In cases where the alleged details are vague, partially overlapping with innocent knowledge, or inadequately verified before testing, the accuracy of the result is correspondingly lower.
This is why our pre-test consultation for relationship cases is thorough and why we sometimes push back on clients who want to test based on vague suspicion without specific details. A well-designed relationship test performs excellently. A poorly designed one does not — and we would rather delay and improve the test than proceed with inadequate stimulus design.
What Drives Accuracy — The Six Key Factors
The 95% figure is not produced by the BrainBit headset alone. It is produced by the combination of hardware, protocol design, stimulus quality, signal quality management, analysis methodology and the quality assurance framework that governs all of them. Here is what each contributes.
Stimulus Design Quality
The single most important factor. Probe stimuli must be genuinely specific to the case — details only a guilty person would recognise. Generic or ambiguous stimuli produce weaker, less reliable results. Our examiners invest significant time in stimulus design before every test.
Signal Quality Management
Real-time monitoring of all 8 EEG channels before and during the test. Tests that don't meet our minimum signal quality threshold are paused, adjusted or rescheduled. We never proceed with poor-quality data.
Artefact Rejection
Movement, blinks and muscle artefacts are detected and removed before amplitude analysis. Only clean epochs contribute to the final P300 measurement. Our minimum clean epoch requirement is set conservatively to ensure statistical reliability.
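A common, minimal form of artefact rejection is an amplitude bound on each epoch. The sketch below uses that approach for illustration; the ±100 µV bound and the plain list-of-samples epoch format are assumptions for the example, not a description of our actual pipeline:

```python
def reject_artefacts(epochs, max_abs_uv: float = 100.0):
    """Keep only epochs whose samples all stay within an amplitude bound (in µV).

    Large deflections typically indicate blinks, movement or muscle activity;
    only the surviving (clean) epochs feed the P300 amplitude measurement.
    """
    return [epoch for epoch in epochs
            if max(abs(sample) for sample in epoch) <= max_abs_uv]
```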
Subject Attention
The active task — pressing a button when a specific target appears — verifies that the subject is attending to the screen throughout. Inattention reduces P300 amplitude even for recognised stimuli. Poor attention is flagged and the relevant epochs excluded.
Case Type
Corporate and insurance cases with specific probe information consistently yield higher accuracy than relationship cases with vaguer suspicions. The technology performs best when the specific knowledge claim is clearly defined before testing begins.
Inadequate Preparation
The most common cause of inconclusive or suboptimal results is inadequate pre-test preparation — poor stimulus design, insufficient background information, or proceeding with a test for which the specific probe details are not clearly established.
Our Quality Assurance Framework
Accuracy at the level we achieve doesn't happen by default — it requires a formal quality assurance framework that governs every stage of the process. Our full QA documentation is available on our website. Here are the key standards it sets.
- Minimum signal quality threshold: All 8 channels must meet our minimum impedance and signal quality standard before a test proceeds. Tests that cannot meet this standard are paused and rescheduled rather than run on suboptimal data.
- Minimum clean epoch count: A minimum number of artefact-free epochs must be available for each stimulus category before we will calculate a probability score. Below this threshold, the result is reported as inconclusive.
- Minimum confidence interval: The statistical confidence interval around the deception probability score must be within our defined range before we issue a definitive result. Borderline scores with wide confidence intervals are reported as inconclusive.
- Stimulus design review: All stimulus sets are reviewed by a second examiner before use. Where a stimulus design is judged insufficiently specific, the test is redesigned before proceeding.
- Post-test data review: Every test dataset is reviewed by the examiner after the session, before results are issued. Any flagged anomalies — unusual latency patterns, incomplete datasets, atypical response profiles — are investigated before the report is finalised.
- Quarterly dataset review: Our full case dataset is reviewed quarterly. Verified outcomes are compared against issued results and any systematic patterns in incorrect or inconclusive findings are investigated and addressed through protocol refinement.
- Retest policy: Any inconclusive result triggers an offer of a free retest. We do not charge for retests resulting from our own technical issues.
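Taken together, the first three gates amount to a simple decision sequence. The sketch below uses invented threshold values (a 20 kΩ impedance limit, 30 clean epochs, a 0.15 confidence-interval width) purely for illustration:

```python
def qa_gate(impedances_kohm, clean_epochs: int, ci_width: float,
            max_impedance: float = 20.0, min_epochs: int = 30,
            max_ci_width: float = 0.15) -> str:
    """Return the next action under the QA gates (illustrative thresholds only)."""
    if any(z > max_impedance for z in impedances_kohm):
        return "reschedule"      # signal quality gate failed before testing
    if clean_epochs < min_epochs or ci_width > max_ci_width:
        return "inconclusive"    # data gate failed; a free retest is offered
    return "report"              # all gates passed; a definitive result is issued
```

The ordering matters: signal quality is checked before any statistics are computed, so a score is never calculated on channels that failed impedance checks.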
The willingness to issue an inconclusive result — rather than forcing a definitive conclusion from borderline data — is the most important aspect of our quality assurance framework. It is also what allows us to state our 95% accuracy figure with confidence. Every number we publish is based on cases that met our standards. Cases that didn't are counted separately.
How Our Figures Compare to the Academic Literature
Our 95% operational accuracy figure does not exist in a vacuum — it sits within a body of peer-reviewed research on P300-based concealed information testing that spans more than four decades and over 4,000 published studies.
What the research shows
Meta-analyses of P300 concealed information test studies consistently report accuracy rates in the range of 88–98% under controlled or semi-controlled conditions. The specific figure depends on study design, subject population, stimulus type and the statistical threshold used to define a positive result. Our operational 95% sits comfortably within this established range.
Importantly, the research also consistently shows that P300 accuracy holds up under countermeasure conditions — subjects trained to beat the test — far better than the polygraph. A 2022 study in Frontiers in Neuroscience found P300 accuracy above 92% even among trained countermeasure users. This is consistent with our own operational experience, in which subjects who have clearly attempted countermeasures — identified through movement artefact patterns and atypical signal characteristics — have not succeeded in producing false clear results.
What the research doesn't show
It's also worth being honest about what the academic literature doesn't fully resolve. Most laboratory studies use simpler stimulus paradigms than real-world investigations — they test for recognition of a single item (a mock crime detail) rather than a complex real-world event with multiple associated details. The real-world application adds complexity that laboratory studies don't fully capture. Our operational protocol is designed to account for this complexity, but the academic literature's accuracy figures should be understood as reflecting controlled conditions, not the full variability of real casework.
Book a Test Backed by 750+ Cases of UK Data
Our P300 EEG lie detector tests follow the same protocol, the same quality standards and the same accuracy framework as every case in our dataset. From £499. Same-day appointments available across the UK.