Test or Deny? How Invalid Psych Testing Affects Foreign DBA Claims

The Defense Base Act has a “test or deny” problem. Insurance carriers argue that DBA benefits cannot be awarded for work-related psychological injuries or disabilities absent psychological testing. That is false. The American Psychological Association does not require psychological testing prior to diagnosis and treatment. The current Diagnostic and Statistical Manual of Mental Disorders (or “DSM-5”) does not require psychological testing as a part of its diagnostic criteria. And DBA caselaw makes it abundantly clear that an employee need not meet the criteria of a specific diagnosis to successfully state the existence of an injury.

Yet, the “test or deny” problem persists–especially for psychological claims involving foreign nationals.

The administration of psychological tests to foreign DBA claimants raises some questions:

Is it appropriate for defense experts to administer psychological tests to foreign nationals in the absence of normative or validation data to which the examinee’s answers can be compared?
Is it appropriate to administer English versions of psychological tests orally through an Albanian or Macedonian interpreter?
Is it appropriate to compare a foreign examinee’s test scores only to U.S. normative samples?
Are non-normed, non-validated personality assessments really “objective”?

In my opinion, this is not “objective” testing. Even if the tests are objective when appropriately administered in the United States, defense experts administer tests to foreign nationals in a way that undermines each test’s reliability and validity. My opinion finds support in scientific literature:

The generalizability of psychological tests with specific populations is an insidious problem in clinical psychology and may have significant implications in the context of a medical-legal examination. For example, if psychologists are asked to objectively substantiate the breadth, severity and veracity of subjective symptomatology and come to a diagnostic opinion, it is essential that such opinions are based on firm scientific grounds in order to meet legal standards and be accepted by the courts.

Eliyas Jeffay et al., Reliability of the French-Canadian adaptation of the Personality Assessment Inventory: Medical-legal implications, 28 Psychiatry, Psych. and L. 135, 135-48 (2021).

Below, I address these “generalizability” concerns and explore the insidious “test or deny” problem plaguing foreign national DBA claims. The use of non-normed, non-validated psychological tests as a gatekeeping tool to “prove” an entitlement to medical and compensation benefits runs contrary to the Defense Base Act.

How the Issue Arises:

For this article, I focus on two countries: Kosovo and North Macedonia. However, this problem persists in other countries where DBA claimants reside–e.g., Peru and Uganda.

Typically, my firm receives a notice of defense medical examination (“DME”) from either a law firm or an investigative company retained by the insurance carrier. The notice lists the date, time, and location of the examination (often a hotel conference room or a carrier-retained investigation company’s office). Further, the notice lists the identity of the U.S.-based, English-speaking expert.

The Kosovar or North Macedonian claimant (who speaks Albanian or Macedonian, respectively) participates in the DME through an interpreter hired by the carrier’s investigation company. Effectively, the insurance carrier’s hired interpreter then becomes the claimant’s mouthpiece for the insurance carrier’s adversarial examination of the claimant.

During the examination, the defense expert may administer psychological tests to the injured worker. Longer tests, like the 567-item MMPI-2, the 338-item MMPI-2-RF, or the 344-item Personality Assessment Inventory (or “PAI”), are read aloud to the claimant. The insurance carrier’s interpreter purportedly translates the test item into the claimant’s language, and the claimant responds. Often, the claimant does not record their own answer; the insurance carrier’s interpreter records claimant’s answers for claimant. Sometimes, the defense experts leave the examination during the testing, allowing a non-medical translator to administer the psychological tests. The expert then issues a report comparing the claimant’s test scores to U.S.-based normative groups.

To sum up, claimants must take psychological tests written in foreign languages, translated through their adversary’s interpreter who simultaneously acts as the claimant’s mouthpiece. Each claimant’s entitlement to medical care and compensation benefits depends on how their answers–again, interpreted by their adversary’s interpreter–compare to people who live half a world away in one of the most affluent countries on the planet. To make matters worse, lawyers for Fluor Federal Global Projects and Starr Indemnity recently threatened a claimant with monetary sanctions–despite the fact those sanctions are not allowed–because the claimant wanted to speak for himself at a defense medical examination instead of using a carrier-retained interpreter. You read that sentence correctly. Apparently some employers and carriers will only let a claimant speak when the employer and carrier control the claimant’s words.

The (Ongoing) Problems with Administering Psychological Assessment Tests to Foreign Examinees Despite the Absence of Normative Data:

For this section of the post, I focus on the MMPI-2, MMPI-2-RF, MMPI-3, and PAI. The administration problems are not, however, limited to those tests. Judging from the scientific literature, defense expert test administration of psychological assessments to foreign nationals in Kosovo and North Macedonia violates basic tenets of standardized administration, as well as ethical standards.

First, the publishers have not issued Albanian or Macedonian translations of these tests. The University of Minnesota’s website identifies available translations for the MMPI-2 and MMPI-2-RF. Neither Albanian nor Macedonian is an available translation for the MMPI-2 and MMPI-2-RF. The MMPI-3 is so new that only English, U.S. Spanish, and French for Canada versions are available. Just like the MMPI tests, the PAI does not have Albanian or Macedonian translations. Yet, defense doctors administer these tests by orally reading each item through an Albanian or Macedonian interpreter. The claimant does not read each test item. Instead, the claimant responds to a carrier-retained interpreter’s verbal interpretation of each item. The carrier’s expert or the carrier’s interpreter may record the claimant’s answers…and later complain when the claimant exercises their right to review their own “test data,” as that term is defined by Ethical Standard 9.04 of the APA’s Ethical Principles of Psychologists and Code of Conduct.

Second, normative data for the MMPI-2, MMPI-2-RF, MMPI-3, and PAI do not exist for Kosovo and North Macedonia. Defense experts compare each examinee’s answers to U.S. normative groups. The first MMPI normative groups were developed in the U.S. in the 1940s. See Ben-Porath, Interpreting the MMPI-2 1-5 (Univ. of Minn. Press 2012). The MMPI-2 also collected a normative group from U.S. samples. Id. at 21. The MMPI-2-RF used the same normative groups as the MMPI-2, with a few tweaks. See Greene, The MMPI-2/MMPI-2-RF: An Interpretive Manual 22 (Allyn & Bacon 3d ed. 2011). While gender was removed as a consideration in the MMPI-2-RF, age, education and ethnicity remained. The ethnicities of the MMPI-2-RF test takers included: White (81.8%), Black (11.6%), Native American (3.1%), Hispanic (2.9%), and Asian American (0.6%). The recently published MMPI-3 adjusted its normative sample to reflect the U.S. Census. There were 1,620 English-language participants in the normative sample. The ethnicity of the normative sample included (approximately): White (60%), Black (12%), Other (4%), Hispanic (14%), Asian (5%), and Mixed Race (4%). Test authors collected English-language norms for 810 men and 810 women. Since the authors prepared their test based on the U.S. census, the test contains no normative data for Kosovar and North Macedonian examinees. The publisher’s website clearly states that there “are no separate cultural norms” for the MMPI-2-RF. (As an aside, Peruvian DBA claims must be discussed, too. U.S. Spanish-language norms were collected for 275 men and 275 women in the United States. For the MMPI-2 and MMPI-2-RF, three different Spanish translations exist: Spanish for Mexico & Central America; Spanish for Spain, South America & Central America; and Spanish for the U.S. So, even if an expert can administer the MMPI-2 and MMPI-2-RF in Spanish, Peruvian DBA claimants may want to ask whether the defense expert administered the correct Spanish translation.)

Third, the existence of a translation does not confirm the reliability of a translation. Recently, the French-Canadian version of the PAI came under fire because of “the lack of reliability in the French translation . . . .” See Jeffay et al., supra, at 146. Indeed, experts “must be sure that patients are able to understand and correctly interpret the intention of the items before test scores can be considered an accurate representation of the individual’s functioning.” Id. at 147. This same issue existed in the original shoddy Dutch translation of the MMPI, which “lacked acceptable norms.” See James N. Butcher, Personality Assessment Without Borders: Adaptation of the MMPI-2 Across Cultures, 83 J. of Personality Assessment 90, 92 (2004). If a personality assessment translation already in existence can garner such criticism, then surely an “on the fly” live interpretation should garner criticism, too.

Fourth, ad lib live interpretations of personality assessment inventories during a defense medical examination have not been back-translated. Back-translation (or backward translation) is the process whereby an original text is translated–in writing, of course–to a new language and then translated back into the source language for comparison against the original. It is a multi-step process that cannot logistically exist during live interpretation:

[Backward translation] “is a three-step procedure”; firstly, the original version of the test is translated into the target language; secondly, a different translator translates that version back into the source language; finally, the original and back-translated versions are compared by both psychologists and translators in order to consider possible deviations, and correct them.

Alicia Bolaños-Medina and Víctor González-Ruiz, Deconstructing the Translation of Psychological Tests, 57 Meta 715, 724 (2012).

Fifth, just because normative data may exist for one foreign country does not mean normative data exists for the examinee’s country. Still, defense examiners often opine that widespread test administration is appropriate no matter the lack of normative data matching the test taker’s demographics. The test publishers disagree. In fact, officially translated tests may have items and stimuli that “vary from the English version because of cultural and linguistic differences between the countries and their language.” Moreover, the American Psychological Association (along with other research associations) developed safeguards against inappropriate score interpretations. Tests lacking validity data for particular subgroups should not be used to disadvantage individuals in those subgroups. See Am. Psych. Ass’n, The Standards for Educational and Psychological Testing 70 (2014).

Sixth, the test format may be unfamiliar to foreign examinees, making instructions hard to understand. If the format of an official translation proves awkward, just imagine the inherent problems affecting an ad lib oral translation. Professor James Butcher highlighted this test format problem in a statement about the initial introduction of the MMPI in Israel:

When the MMPI was first introduced in Israel back in the 1970s, some of the students in one study noted that the true-false format was somewhat awkward at first–and referred to the test format as “American tests” because such questions were associated with American universities and not widely used in universities in Israel. Instructions had to be clarified to explain the task (Butcher & Gur, 1974).

See Butcher, supra, at 92.

Seventh, an expert may violate psychological ethical standards by inappropriately administering personality assessment tests. For example, the American Psychological Association’s Ethical Standard 9.02(b) requires psychologists to “use assessment instruments whose validity and reliability have been established for use with members of the population tested.” If validity or reliability has not been established, then the expert must “describe the strengths and limitations of test results and interpretation.” Next, the APA’s Ethical Standard 9.02(c) requires psychologists to “use assessment methods that are appropriate to an individual’s language preference and competence, unless the use of an alternative language is relevant to the assessment issues.” These ethical standards are not new, either:

Psychologists have long recognized the need to examine the generalizability of their test findings with specific populations and not to facilely assume that comparable results are obtainable irrespective of cultural and ethnic status. As members of the profession, psychologists are mandated by ethical standards to identify limitations within their assessment methods due to race, ethnicity, and national origin (Standard 2.04; American Psychological Association, 1992). Anastasi (1988) observed that attempts to achieve linguistic equivalence often fall far short of the mark and do not constitute an adequate test of validity. Even if linguistic equivalence were achievable, cultural and ethnic influences on the clinical presentation and subsequent diagnosis of mental disorders would require external validation.

See Richard Rogers et al., Initial Validation of the Personality Assessment Inventory-Spanish Version With Clients from Mexican American Communities, 64 J. Personality Assessment 340, 340-41 (1995) (cautioning that “psychologists should be very circumspect in applying any decision rules about responses styles to minority populations, because of the lack of data on feigning and defensiveness for multi scale inventories”).

Eighth, publishers must grant written permission to translate tests. PAR, the publisher of many tests used in foreign national DBA claims–e.g., the PAI and TSI-2–“will consider request for permission to reproduce, modify, or translate any copyrighted publication.” “Translating a test into a language for which it is not currently available” requires written permission. Translation of PAR tests like the lengthy PAI is not as simple as reading a test through an interpreter. Indeed, translations must be approved and back-translated. PAR’s “[b]ack-translations have been conducted by an individual unfamiliar with the English version of the test and the back-translation has been forwarded to the author/PAR for review and approval.” I have never seen a statement in defense expert reports on Kosovar and North Macedonian claimants confirming that the test publisher or author granted permission to translate the tests into Albanian and Macedonian. Finally, considering the work that Pearson put into development of the MMPI-3, I cannot imagine that it approves of unlicensed translations.

I could keep going, but I think you get the point. Personality assessment tests are not plug-and-play evaluation options for all cultures. Stating otherwise, either in a report or in testimony, is disingenuous and wrong.

Ignoring Testing Guidelines Hurts Examinees:

Guidelines exist for translating psychological tests for cross-cultural use. Failure to follow those guidelines increases the risk of misdiagnosis, false-positive hits on validity scales, and the denial of medical treatment.

In 2004, Professor James Butcher, who was instrumental in the development of the MMPI-2, outlined basic test translation procedures. Those procedures include:

“Assure that issues of copyright and future publication arrangements have been resolved before work begins.”
“Translators should be true bilinguals, people who have lived and functioned in both languages for a substantial time. It is usually desirable in the translation process to have two or more bilingual psychologists do an initial translation. Then after each translator has completed the task, they meet to discuss each item rendering and arrive at the best possible meaning.”
“Once the initial translation has been completed, it is important to conduct a back translation of the instructions and the item pool.”
“When the back translation is complete, it is usually desirable to conduct a bilingual test-retest study.”
“Having a completed translation with a back translation and bilingual test-retest study completed, the investigators can begin to accumulate data to complete the adaptation process. In many cases, it will be necessary to conduct a normative study–either to assure that existing norms (such as American norms) would be appropriate or else to accumulate a normative population specific to the target country.”
“External validity data and appropriate research samples should be collected to enable psychometric analyses (such as factorial studies) to be completed to further verify the translation.”

See Butcher, supra, at 92.

Genuine translations of other psychological tests follow similar procedures to ensure reliability. For example, here are the steps that were taken to create a German translation of the Personality Assessment Inventory:

First, three independent translators translated all 344 PAI items.
Second, researchers developed a consensus of the three translations. They “decided on the final wording of each of the items that best reflected the original content of the item and showed high readability in German.”
Third, they back-translated the German translations into English.
Fourth, they compared the consensus translation to the original test. Some items required small, content-related changes.
Fifth, they tested the translation for item equivalence. The first test participants were 38 bilingual women who were familiar with both English and German.
Sixth, the researchers “calculated differences between the item raw scores of both language versions and tested therefor significance.” Nineteen items were changed–some by changing qualifying words, some reworded “more extensively.”
Seventh, German standardization testing began. The standardization sample consisted of 749 adults, with a social research institute collecting the normative data.

See Julia A. Groves and Rolf R. Engel, The German Adaptation and Standardization of the Personality Assessment Inventory (PAI), 88 J. of Personality Assessment 49, 50-51 (2007).

Still, after all of that effort to translate the PAI into German, the German version required additional validation. As the researchers recognized:

Our clinical database is currently too small and too biased for a comparison. The validation of a new instrument is usually a long process involving the collection and validation of data over several years. In the United States, the publication of the PAI in the early 1990s has led to a growing amount of literature that has demonstrated its increasing popularity especially in the fields of clinical and forensic psychology. Future steps for the validation of the German PAI are focused on the collection of clinical data; a larger clinical sample will provide greater opportunities for validity testing.

Id. at 55.

Importantly, Americans and Germans scored differently on clinical scales, treatment scales, and a validity scale:

The German standardization sample scored higher than the American census sample on one Validity scale, PIM; the Clinical scales SOM, PAR, and DEP; and the two Treatment scales, NON and RXR.

Id. at 53 (referencing the Positive Impression (PIM) validity scale; the Somatic Complaints (SOM), Paranoia (PAR), and Depression (DEP) Clinical scales; and the Nonsupport (NON) and Treatment Rejection (RXR) Treatment scales).

Researchers addressing other written translations of the PAI likewise cautioned that scoring differs based on the population tested when compared to American norms. For example, the initial translation of the Argentinean PAI showed scoring differences between Argentineans and Americans.

When comparing this sample with the American standardization sample (Morey 1991), differences in most scales and sub scales were found, with higher scores mainly for the Argentinean sample. That stresses the above-mentioned importance of adaptations for every population, such high differences might be manifested according to the country, the region, the culture or subculture, or the kind of specific population involved. Moreover, 24 years had elapsed between the two studies, suggesting changes in symptomatology patterns in time, possibly due to multiple factors. Issues around the translation and the use of idioms must be reviewed in depth in future studies, as well as in the professional use of the PAI.

Juliana B. Stover et al., Personality Assessment Inventory: Psychometric Analyses of its Argentinean Version, 117 Psych. Reports 799, 820 (2015).

So, is it accurate that the PAI has been administered outside of the United States? The most accurate answer is that there are translations for some locales outside of the United States, but certainly not all. For the most part, researchers administered the test after rigorous translations, back-translations, testing, re-testing, and development of normative data with many people across an extended period of time.

Even with translation protocols in place, the normative data generated for each translated test revealed differences on test scales when the tested population’s scores were compared to American standardization samples. These differences are incredibly important for Defense Base Act claims because defense experts often use scores on validity scales as a basis to impugn a foreign claimant’s credibility. If a scientifically translated test can produce higher validity scale scores, then just imagine the varying scores that are produced when a non-normed, non-validated test is live-translated by a single interpreter hired by the claimant’s adversary. Those scores should not be used to deprive a claimant of medical treatment or compensation benefits.
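To make the point concrete, here is a minimal sketch of how a norm-referenced score depends on the comparison group. It assumes a simple linear T-score (mean 50, standard deviation 10), which is a common way to standardize scale scores; the actual scoring rules of any particular test may differ. Every number below, including the raw score, the two sets of norms, and the cutoff, is hypothetical and chosen only for illustration. None comes from a published manual or study.

```python
def linear_t_score(raw_score, norm_mean, norm_sd):
    """Convert a raw scale score to a linear T-score (mean 50, SD 10)
    relative to the normative sample described by norm_mean and norm_sd."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw_score - norm_mean) / norm_sd


# Hypothetical values for illustration only; not taken from any test manual.
raw_score = 30          # one examinee's raw score on a single scale
hypothetical_cutoff = 60  # an assumed threshold an examiner treats as "elevated"

# The same raw score compared against two different (hypothetical) normative samples.
t_vs_us_norms = linear_t_score(raw_score, norm_mean=20, norm_sd=8)
t_vs_local_norms = linear_t_score(raw_score, norm_mean=26, norm_sd=8)

print(f"T-score against U.S. norms:  {t_vs_us_norms}")    # 62.5
print(f"T-score against local norms: {t_vs_local_norms}")  # 55.0
print("Elevated against U.S. norms: ", t_vs_us_norms >= hypothetical_cutoff)
print("Elevated against local norms:", t_vs_local_norms >= hypothetical_cutoff)
```

In this sketch, the identical set of answers looks “elevated” against one normative group and unremarkable against the other. The examinee did not change; only the comparison group did.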

Conclusion:

I will say it again: the Defense Base Act has a “test or deny” problem. Carriers argue that “objective” testing is required to corroborate the injured worker’s subjective symptoms, even though that is not the law. Then, carriers exacerbate the problem by hiring experts who do not objectively administer personality assessment tests to foreign nationals. The administration of long personality assessments lacking in normative standards through a single interpreter hired by the claimant’s adversary is in no way objective.

The bias is palpable. Everything from the selection of the test to administer, through the administration of the test via live interpretation, to the comparison of scores to U.S. normative groups demonstrates bias.

The validity of an examinee’s test scores should not matter when the test administration lacked validity in the first place. One of the hallmarks of standardized testing is standardized administration. When administration destroys test validity, scores on the invalid test should not deprive foreign nationals of medical care or disability benefits. Wholesale discrimination against foreign nationals through unfair psychological testing administration has no place in the Defense Base Act.

Strongpoint Law Firm