Credibility assessment

Assessing the Credibility of Testimony: Cognitive and Forensic Perspectives

Evaluating testimonial credibility is a central task in criminal justice systems, particularly in cases where evidence is scarce and judicial decisions rely primarily on statements from victims, witnesses, or suspects. Despite its importance, assessing the veracity of testimony remains highly challenging, as judges and jurors are generally poor at distinguishing truthful from deceptive accounts and tend to rely on subjective indicators that are not reliably linked to accuracy (Brewer et al., 1999; DePaulo et al., 2003; Porter & ten Brinke, 2009). Research consistently shows that credibility judgments are often shaped by perceived cues such as detail richness, coherence, and consistency, even though these cues do not always correspond to actual truthfulness (Peace et al., 2015).

A robust finding in this field is that statement content influences perceived credibility. Testimonies rich in detail are typically judged as more believable than sparse accounts, whereas inconsistencies tend to undermine credibility assessments (Brewer et al., 1999). However, such judgments are frequently confounded by factors unrelated to veracity, including the decision-maker’s expectations, beliefs about deception cues, and individual differences such as fantasy proneness (Peace et al., 2015).

There is no reliable set of behavioral cues that unequivocally distinguishes truth from lies, and both professionals and laypersons typically perform at or only slightly above chance level when detecting deception (Bond & Uysal, 2007; Vrij, 2008). This has led to a growing emphasis on the systematic analysis of statement content rather than on nonverbal behavior.

Criteria-Based Content Analysis (CBCA) is the most extensively researched and empirically supported content-based method for assessing testimonial credibility. CBCA is grounded in the so-called “Undeutsch hypothesis,” which posits that statements derived from memories of self-experienced events differ qualitatively from fabricated accounts (Undeutsch, 1967, as cited in Steller & Köhnken, 1989). According to this framework, truthful statements are expected to contain more specific, contextualized, and experiential details than fabricated narratives, which are typically constructed from generalized cognitive scripts (Steller & Köhnken, 1989). Empirical research has shown that several CBCA criteria—such as the quantity of details, unstructured production, contextual embedding, and the reproduction of conversations—are more prevalent in genuine than in fabricated accounts (Vrij, 2008).

However, CBCA was never intended to serve as a standalone diagnostic tool. Köhnken and Steller emphasized that content quality is influenced not only by veracity but also by factors such as the nature of the event, the witness’s cognitive abilities, and interview conditions. Consequently, CBCA must be applied within the broader framework of Statement Validity Assessment (SVA), which incorporates case-file analysis and the evaluation of alternative hypotheses (Köhnken & Steller, 1988).

Another influential theoretical approach to credibility assessment is Reality Monitoring, which focuses on differences between memories of perceived events and memories of imagined or internally generated events (Johnson & Raye, 1981). Reality Monitoring research suggests that memories of real experiences contain more sensory, contextual, and affective information, whereas memories of imagined events contain more information about cognitive operations and reflective processes. Empirical studies have supported these distinctions, showing that perceived events are typically richer in sensory and contextual details than imagined ones (Johnson et al., 1988; Kealy et al., 2006).

Research findings also indicate that truthful and fabricated trauma narratives differ not only in externally observable features but also in how individuals subjectively experience and describe their memories (Porter & Peace, 2007). Fabricated accounts tend to exaggerate emotional intensity and vividness, reflecting strategic self-presentation rather than genuine recollection.

A critical challenge for credibility assessment arises in distinguishing truthful statements from accounts based on false memories. Unlike deliberate fabrications, false memories are subjectively experienced and deemed true by the person reporting them, resulting in a psychological status that closely resembles that of genuine witnesses (Volbert & Steller, 2014). Once full-scale false memories have developed, content-based criteria may no longer reliably differentiate them from experience-based accounts (Bruck & Ceci, 2009; Blandón-Gitlin et al., 2009). Consequently, high content quality alone cannot be taken as evidence of truthfulness when suggestive influences are present.

Given these findings, contemporary credibility assessment emphasizes not only the qualitative characteristics of a statement but also the manner in which it was formed and has evolved over time. True memories typically emerge spontaneously and remain bounded by the original experience, whereas false memories often appear discontinuously and may continue to develop through repeated questioning or cognitive elaboration (Volbert & Steller, 2014).

Criteria for the Assessment of Statement Validity

Köhnken and Steller compiled the criteria of Criteria-Based Content Analysis (CBCA) as a core component of Statement Validity Assessment. These criteria were initially designed to assess the reliability of children’s testimony, particularly in cases of alleged sexual victimization, and were later applied to adolescents and adults. Their introduction marked a transition from unstructured, intuitive judgments by legal practitioners to a more evidence-informed, transparent method of credibility assessment (Köhnken & Steller, 1988).

The original CBCA system comprises 19 criteria, grouped into five conceptual categories. These criteria are not intended to serve as direct indicators of truth or deception but rather as probabilistic signs that a statement is more likely to be grounded in actual experience than in fabrication.

19 Criteria of the CBCA system

I. General characteristics

1- Logical structure

2- Unstructured production

3- Quantity of details

II. Specific content

4- Contextual embedding

5- Descriptions of interactions

6- Reproduction of conversations

7- Unexpected complications during the incident

III. Peculiarities of content

8- Unusual details

9- Superfluous details

10- Accurately reported details misunderstood

11- Related external associations

12- Accounts of subjective mental state

13- Attribution of perpetrator’s mental state

IV. Motivation-related contents

14- Spontaneous corrections

15- Admitting lack of memory

16- Raising doubts about one’s own testimony

17- Self-deprecation

18- Pardoning the perpetrator

V. Offense-specific elements

19- Details characteristic of the offense

The first group, general characteristics, concerns the overall structure and richness of the statement. Logical structure refers to the narrative’s internal coherence: even if not perfectly organized, truthful statements typically follow a comprehensible sequence and lack fundamental contradictions. Unstructured production captures the spontaneous, often non-linear manner in which genuine memories are reported, reflecting natural recall processes rather than rehearsed storytelling. Finally, quantity of details refers to the presence of concrete, specific, and experiential information, which tends to be more pronounced in experience-based accounts.

The second group, specific content, addresses how events are embedded and described. Contextual embedding refers to integrating the event within a broader temporal, spatial, or situational context, such as what happened before or after the incident. Descriptions of interactions involve reciprocal actions and reactions between individuals, rather than isolated or schematic descriptions. Reproduction of conversations, whether verbatim or paraphrased, suggests episodic memory retrieval and is less common in fabricated accounts. Unexpected complications during the incident (such as interruptions, mistakes, or unplanned changes) are also considered indicative of genuine experiences, as they are difficult to invent convincingly and often deviate from stereotypical scripts.

The third category, peculiarities of content, includes elements that do not necessarily advance the core allegation but nonetheless support experiential grounding. Unusual details are idiosyncratic or odd pieces of information that lack an obvious functional purpose. Superfluous details are peripheral elements that appear irrelevant to the main event but reflect natural memory processes. Accurately reported details misunderstood are particularly relevant in child testimony: they occur when witnesses correctly describe aspects of an event they do not fully comprehend, which is very difficult to fabricate because an invented account is limited by the fabricator’s own level of understanding. Related external associations involve spontaneous links to other experiences or memories triggered by the event. Finally, accounts of subjective mental state and attribution of the perpetrator’s mental state refer to descriptions of one’s own thoughts and emotions, as well as inferences about the emotions or intentions of others during the incident.

The fourth group, motivation-related content, focuses on features that run counter to strategic self-presentation. Spontaneous corrections occur when witnesses revise or clarify their statements without external prompting. Admitting lack of memory reflects a willingness to acknowledge uncertainty or gaps in recall. Raising doubts about one’s own testimony involves explicit expressions of uncertainty about accuracy. Self-deprecation refers to negative self-evaluations, while pardoning the perpetrator involves statements that mitigate or excuse the alleged offender’s behavior. These elements are considered unlikely in fabricated accounts, because a person who is deliberately lying and wants to appear credible would tend to avoid content that might undermine that impression.

The final criterion, details characteristic of the offense, captures information typical of the alleged criminal event but unlikely to be known without direct experience, especially in cases involving children or uncommon offenses.

Köhnken and Steller explicitly emphasized that these criteria were never intended to be exhaustive or mechanically applied. Rather, they were proposed as examples of content characteristics that could inform two central diagnostic questions: whether a witness could produce a statement with this specific content quality without having experienced the event, and whether a witness would produce such a statement if the account were not experience-based (Steller & Köhnken, 1989). Importantly, the presence or absence of individual criteria must always be interpreted in light of the witness’s cognitive abilities, the nature of the event, and the conditions under which the statement was elicited.

Subsequent theoretical and empirical work has refined and expanded this original system. Later models reorganized the criteria by distinguishing more explicitly between memory-related characteristics and features linked to strategic self-presentation, and by drawing on insights from reality monitoring research (Volbert & Steller, 2014). These refinements emphasized sensory, spatial, and temporal information; peripheral and script-deviant details; indicators of reconstructability and vividness; and explicit markers of memory-related shortcomings. The revised approach also highlighted that some criteria are more resistant to coaching and fabrication than others and therefore deserve greater weight in credibility assessment.

Bizarre Details and Perceived Credibility: Evidence from Experimental Research

This section draws exclusively on the experimental research conducted by Peace and colleagues (2015), which systematically examined how the presence, quantity, and type of bizarre details influence credibility judgments by mock jurors. The authors addressed a long-standing assumption in credibility assessment—also reflected in Criteria-Based Content Analysis—that unusual or bizarre details may serve as indicators of truthfulness. Their work provides important empirical nuance to this assumption by showing that the relationship between bizarreness and perceived credibility is neither linear nor straightforward.

Across three experimental studies, the researchers examined how variations in eyewitness testimony affected judgments of credibility, believability, and plausibility. The central focus was on “bizarre” details—defined as information that deviates from common expectations about how crimes and criminals typically appear—and on how such details interact with observers’ cognitive schemas of criminal events.

In Study 1, the authors examined whether credibility judgments varied as a function of crime perspective (victim versus witness account of events) and level of bizarreness (baseline, mild, moderate, and extreme). Participants acting as mock jurors evaluated written eyewitness statements that systematically varied along these dimensions. The results revealed a nonlinear effect of bizarreness. Statements containing baseline to mildly bizarre details were rated as relatively credible, whereas statements containing extremely bizarre details were judged as significantly less credible. In other words, a small degree of unusual information appeared to enhance credibility, but excessive bizarreness led observers to regard the testimony as implausible or deceptive.

Importantly, this effect was not dependent on whether the testimony was provided from a victim or witness perspective, although witness statements were generally perceived as more credible. The authors suggest that witnesses, as opposed to victims, may be perceived as less emotionally overwhelmed and therefore more capable of noticing and reporting unusual peripheral details. Additionally, individuals with prior victimization experience tended to rate testimonies as less plausible overall, possibly because they evaluated the statements against their own personal experiences of crime.

Contrary to expectations, fantasy proneness did not significantly influence credibility judgments. Although individuals high in fantasy proneness are often considered more receptive to unusual or imaginative content, Peace and colleagues (2015) found no consistent evidence that this trait increased acceptance of bizarre testimony. This finding suggests that credibility judgments may be guided more strongly by shared social schemas than by individual differences in imaginative tendencies.

Study 2 further refined these findings by disentangling the effects of the degree of bizarreness from the number of bizarre details. The authors manipulated both variables independently and found that the quantity of bizarre details played a more decisive role than their extremity. Testimonies containing a small number of bizarre details were judged credible regardless of their level of bizarreness. In contrast, statements containing a large number of bizarre details—particularly when combined with moderate or extreme bizarreness—were rated as significantly less credible.

These results indicate that observers are tolerant of limited deviations from expectations but become skeptical as unusual information accumulates. Rather than reinforcing each other, multiple bizarre details appear to undermine the overall plausibility of a narrative. This finding is particularly relevant to credibility assessment practices, as it suggests that an abundance of unusual details may be interpreted as excessive or contrived rather than as a marker of authenticity.

Study 3 examined whether the type of bizarre detail differentially affected credibility judgments. The authors distinguished between bizarre details related to the perpetrator’s appearance, the perpetrator’s actions, and the event itself. The results showed that bizarre action and event details were associated with lower credibility ratings, especially at higher levels of bizarreness. These details were perceived as violating observers’ schemas about how criminal events typically unfold.

In contrast, bizarre details about the perpetrator’s appearance did not reduce credibility and were often rated as more believable. Peace et al. (2015) interpret this finding in light of social stereotypes of criminals, which often portray offenders as strange, deviant, or visibly abnormal. As a result, unusual descriptions of perpetrators may align with observers’ expectations, whereas deviations in the structure or sequence of criminal actions are more likely to be rejected as implausible.

From a forensic perspective, this research has important implications. It suggests that jurors may overvalue or undervalue testimony based on how well it aligns with their expectations of criminal behavior rather than on its actual accuracy. Accordingly, unusual details should be interpreted with caution in content-based credibility assessments, and their evidentiary value should always be considered alongside narrative coherence, contextual plausibility, and the broader framework for statement evaluation.

The Limited Diagnostic Value of Nonverbal Behavior in Credibility Assessment

Research on eyewitness testimony has long assumed that nonverbal behavior—such as facial expressions, gaze direction, posture, and gestures—provides meaningful cues for evaluating credibility and accuracy. This assumption is deeply embedded in everyday social cognition and widely shared by legal professionals, including judges, jurors, and police officers, who frequently report attending to demeanor when assessing witness credibility (Chalmers et al., 2022; Denault et al., 2024). However, a substantial body of empirical research has increasingly challenged the diagnostic value of nonverbal cues, demonstrating that their relationship to truthfulness and accuracy is weak, inconsistent, and highly context-dependent (Hartwig & Bond, 2011; Luke, 2019).

Both early and contemporary studies show that observers routinely incorporate nonverbal behavior into credibility judgments, often believing that behaviors such as nervousness, avoidance of eye contact, fidgeting, or emotional incongruence reliably indicate deception (Chalmers et al., 2022; Denault et al., 2024). Jurors, for example, frequently report evaluating not only what a witness says but also how it is delivered, interpreting gaze aversion or physical agitation as signs of dishonesty. Judges likewise report relying on posture, gestures, and facial expressions when judging credibility, despite substantial disagreement about how such cues should be interpreted and what they signify (Denault et al., 2024). These practices persist even though decades of deception research have shown that differences in nonverbal behavior between truth-tellers and liars are generally small, unreliable, and sensitive to situational factors (DePaulo et al., 2003; Hartwig & Bond, 2011).

A central limitation of demeanor-based assessments is that nonverbal behavior often reflects factors unrelated to deception or memory accuracy. Emotional arousal, stress, anxiety, trauma, and situational pressure can all influence how a witness appears while testifying, irrespective of whether their account is accurate (Kaufmann et al., 2003; Rogers et al., 2015). Consequently, a witness accurately recalling a distressing event may display behaviors commonly associated with uncertainty or nervousness—such as reduced eye contact or restless movements—whereas a deceptive witness may present in a calm and controlled manner. In such cases, incongruence between verbal content and nonverbal behavior may lead observers to discount accurate testimony and, conversely, to overestimate the credibility of inaccurate or fabricated accounts (Wessel et al., 2006).

Research further indicates that reliance on nonverbal behavior can systematically bias credibility judgments against particular groups of witnesses. Differences in expressive style related to language proficiency, cultural background, or ethnicity are often misinterpreted as indicators of low credibility rather than as normal variations in communication (Lindholm, 2008; Chalmers et al., 2022). Studies comparing witnesses from different ethnic and language groups have found that other-group and non-native witnesses are judged as less credible, even when their testimonies are equally accurate (Lindholm, 2008; Raver et al., 2025). Observers may erroneously attribute differences in facial expressiveness, gesture use, or speech-related behaviors to deception or uncertainty, raising serious concerns about fairness and equality in legal decision-making.

A recent study by Raver and colleagues (2025) directly addressed this issue by examining a wide range of nonverbal behaviors in eyewitness testimonies and testing their association with recall accuracy, witnesses’ self-reported confidence, and observers’ credibility judgments. Importantly, this research moved beyond deception paradigms to focus on eyewitnesses who were not attempting to mislead but were instead reporting events as they remembered them.

The findings reported by Raver and colleagues (2025) provide strong evidence against the diagnostic utility of nonverbal behavior in credibility assessment. Nonverbal cues commonly believed to signal dishonesty or uncertainty did not reliably distinguish accurate from inaccurate statements. This pattern held for both native and non-native speakers, indicating that nonverbal behavior offered no reliable information about memory accuracy across language groups.

References

Blandón-Gitlin, I., Pezdek, K., Lindsay, D. S., & Hagen, L. (2009). Criteria-based content analysis of true and suggested accounts of events. Applied Cognitive Psychology, 23(7), 901–917. https://doi.org/10.1002/acp.1505

Bond, C. F., Jr., & Uysal, A. (2007). On lie detection “wizards.” Law and Human Behavior, 31(2), 109–115. https://doi.org/10.1007/s10979-006-9061-9

Brewer, N., Potter, R., Fisher, R. P., Bond, N., & Luszcz, M. A. (1999). Beliefs and data on the relationship between consistency and accuracy of eyewitness testimony. Applied Cognitive Psychology, 13(4), 297–313. https://doi.org/10.1002/(SICI)1099-0720(199908)13:4<297::AID-ACP578>3.0.CO;2-S

Bruck, M., & Ceci, S. J. (2009). Discriminating true from false memory: Problems and promises of litigation reforms. Psychology, Public Policy, and Law, 15(3), 257–295. https://doi.org/10.1037/a0016938

Chalmers, J., Leverick, F., & Munro, V. E. (2022). Handle with care: Jury deliberation and demeanour-based assessments of witness credibility. International Journal of Evidence & Proof, 26(4), 381–406. https://doi.org/10.1177/13657127221120955

Denault, V., Leclerc, C., & Talwar, V. (2024). The use of nonverbal communication when assessing witness credibility: A view from the bench. Psychiatry, Psychology and Law, 31(1), 97–120. https://doi.org/10.1080/13218719.2023.2175068

DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74–118. https://doi.org/10.1037/0033-2909.129.1.74

Hartwig, M., & Bond, C. F., Jr. (2011). Why do lie-catchers fail? A lens model meta-analysis of human lie judgments. Psychological Bulletin, 137(4), 643–659. https://doi.org/10.1037/a0023589

Johnson, M. K., Foley, M. A., Suengas, A. G., & Raye, C. L. (1988). Phenomenal characteristics of memories for perceived and imagined events. Journal of Experimental Psychology: General, 117(4), 371–376.

Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88(1), 67–85. https://doi.org/10.1037/0033-295X.88.1.67

Kaufmann, G., Drevland, G. C., Wessel, E., Overskeid, G., & Magnussen, S. (2003). The importance of being earnest: Displayed emotions and witness credibility. Applied Cognitive Psychology, 17(1), 21–34. https://doi.org/10.1002/acp.842

Kealy, K. L., Kuiper, N. A., & Klein, D. N. (2006). Characteristics of real versus imagined events. Applied Cognitive Psychology, 20(7), 897–914.

Köhnken, G., & Steller, M. (1988). The evaluation of the credibility of child witness statements in the German procedural system. Issues in Criminological and Legal Psychology, 13, 1–20.

Lindholm, T. (2008). Validity in judgments of high- and low-accurate witnesses of own and other ethnic groups. Legal and Criminological Psychology, 13, 107–121. https://doi.org/10.1348/135532506X152949

Luke, T. J. (2019). Lessons from Pinocchio: Cues to deception may be highly exaggerated. Perspectives on Psychological Science, 14(4), 646–671. https://doi.org/10.1177/1745691619838258

Peace, K. A., Brower, K. L., & Rocchio, A. (2015). Is truth stranger than fiction? Journal of Police and Criminal Psychology, 30(1), 38–49. https://doi.org/10.1007/s11896-014-9140-7

Porter, S., & Peace, K. A. (2007). The credibility of traumatic memories. Applied Cognitive Psychology, 21(8), 1113–1127.

Porter, S., & ten Brinke, L. (2009). Dangerous decisions. Psychological Science in the Public Interest, 10(1), 1–35.

Rogers, H., Fox, S., & Herlihy, J. (2015). The importance of looking credible: The impact of the behavioural sequelae of post-traumatic stress disorder on the credibility of asylum seekers. Psychology, Crime & Law, 21(2), 139–155. https://doi.org/10.1080/1068316X.2014.951643

Steller, M., & Köhnken, G. (1989). Criteria-based content analysis. In D. C. Raskin (Ed.), Psychological methods in criminal investigation and evidence (pp. 217–245). Springer.

Volbert, R., & Steller, M. (2014). Is this testimony truthful, fabricated, or based on false memory? Credibility assessment 25 years after Steller and Köhnken (1989). European Psychologist, 19(3), 207–220. https://doi.org/10.1027/1016-9040/a000200

Vrij, A. (2008). Detecting lies and deceit: Pitfalls and opportunities (2nd ed.). Wiley.

Wessel, E., Drevland, G. C., Eilertsen, D. E., & Magnussen, S. (2006). Credibility of the emotional witness: A study of ratings by court judges. Law and Human Behavior, 30(2), 221–230. https://doi.org/10.1007/s10979-006-9024-1