Debra Comer. "Part I. An Evaluation of Fitness-for-Duty Testing." Presented at: 103rd Annual Convention of the American Psychological Association. New York, NY. Aug 15, 1995.
This research assesses computerized fitness-for-duty testing, which has recently been adapted commercially as an alternative or supplement to biochemical drug testing in the workplace. After presenting the scant literature on fitness-for-duty testing programs, addressing criticisms of these programs, and discussing concurrent research on fitness-for-duty testing, the author presents a field study of two such devices. Specifically, archival data from test manufacturers are analyzed, as are responses to semi-structured interviews with managers of fitness-for-duty-testing organizations and to questionnaires completed by their employees. Practical implications for implementing fitness-for-duty testing are explored.
Overview
In light of the intrusiveness and questionable effectiveness of workplace drug testing programs, performance-based fitness-for-duty testing has emerged as a promising option for employers interested in preventing employees from working while impaired. However, the validity and feasibility of fitness-for-duty testing have not been systematically investigated. In particular, employees' attitudes toward these tests have not been examined.
This paper reviews the scant anecdotal research on performance-based testing programs, addresses criticisms raised by detractors of performance testing, and reviews concurrent research on fitness-for-duty tests. Next, a field study of two commercially used tests is presented. Archival data from test manufacturers are analyzed, as are semi-structured interviews with managers of organizations using fitness-for-duty testing and questionnaires completed by their employees. Implications of implementing fitness-for-duty testing are considered.
Introduction
In just a few short years, workplace drug testing has become a common organizational practice. Whereas in 1987, only 21.5% of the respondents surveyed by the American Management Association reported that they conducted some form of drug testing, 87.29% indicated that they tested in 1994 (Greenberg, Canzoneri, & Straker, 1994). Nevertheless, workplace drug testing has been questioned on the grounds that it invades employees' privacy by providing their employers with information about their off-hour activities (see, e.g., Abbasi, Hollman, & Murrey, 1988; Flaig, 1990; Haas, 1990; Hanson, 1988; Kupfer, 1988; Lewis, 1990; Malia, 1989; Maltby, 1990; Mendelson & Libbin, 1988; Orentlicher, 1990; Pavlovich, 1989). Desjardins and Duska (1987), for example, have argued that if an employee is using drugs, so long as he or she performs responsibly, the organization does not need to know about this drug use, and that if the employee's performance is compromised by the drugs, the employer may rightfully discipline or dismiss the employee but still does not need to know the root of the impeded performance (see also Caste, 1992; Moore, 1989).
Employees' responses to workplace drug testing programs
The proliferation of workplace drug testing and controversy over its intrusiveness have led to research on the effects of various drug testing practices on the attitudinal and behavioral responses of employees and job applicants. This research suggests, indeed, that the perceived appropriateness of drug testing has important implications for employees' attitudes and behaviors.
When Stone and Kotch (1989) asked blue-collar employees at a manufacturing firm to respond to scenarios involving hypothetical drug testing practices, those who read a scenario in which advance notice of drug testing was provided to employees responded more favorably to the testing than did those who read about a program requiring employees to give urine samples without warning. Those who read a scenario in which employees with positive test results were placed in rehabilitation responded more favorably than did those reading about a program that fired employees who tested positive. Similarly, Murphy, Thornton, and Reynolds's (1990) college student respondents viewed drug testing less hospitably when a failed test led to applicant rejection or employee termination. These students also reported feeling more positively disposed toward both testing on suspicion and post-accident testing for both current employees and applicants than toward random testing or periodic testing of all individuals. Stone and Vine (1989) reported that drug tests that provided organizations with medical information about their employees were perceived as having violated privacy. Latessa, Travis, and Cullen (1988) and Murphy, Thornton, and Prue (1991) found that drug testing was opposed when it was not limited to individuals performing dangerous or safety-sensitive work.
Hanson (1990) surveyed and solicited written comments from railroad and chemical workers about drug testing at their own organizations. Respondents generally viewed testing as justifiable only for current employees who seemed to be under the influence or for job applicants. These employees found it especially distasteful to be subjected to post-accident testing in cases where the mishap clearly resulted from nonhuman error. Their comments also revealed their perception of random or periodic testing without reasonable suspicion as a humiliating intrusion on their privacy, as well as a sign that managers distrusted them and discounted their years of good service. Some railroad employees feared that drug testing was being used to reduce the workforce. In Axel's (1990) large-scale survey, companies that conducted drug testing reported employee resentment as a major problem. The drug testing program at a major hospital had to be scaled down, just months after its inception, in response to the complaints of irate employees (Hopkins backs off..., 1991).
Drug testing may take its toll on job applicants as well as current employees. Karren (1989) reported that the drug testing policy of a hypothetical organization was a key factor in about 20% of his student subjects' decisions whether to join it. On the basis of their empirical findings, Murphy et al. (1990) cautioned that high-caliber job candidates may refuse employment offers from organizations with offensive testing practices. Likewise, in a study by Crant and Bateman (1990) that asked students to assume the role of a recent graduate seeking a job and to respond to a description of a fictitious organization, those presented with a scenario about an organization that did not conduct drug testing had more positive attitudes toward the organization and reported greater intentions to apply for a job than did those who read about one that did test. In a similarly designed experiment by Ross, Ringer, and Miller (1992), in which all students assumed the role of a job-seeker who had been invited for a second interview, those asked to submit a urine specimen expressed less positive attitudes toward the application process and the organization and reported lower intentions of continuing the application process and accepting a job if offered than did those not asked to take a urine test.
How effective are workplace testing programs?
Not only does drug testing elicit negative responses from employees, but there is a lack of definitive evidence of its effectiveness at achieving organizational safety or productivity (see Cropanzano & Konovsky, 1993; Harris & Heft, 1992; Morgan, 1991; Normand, Lempert, & O'Brien, 1994; Thompson, Riccucci, & Ban, 1991). That is, despite conventional wisdom, drug testing has not been shown to have predictive validity as a measure of organizational effectiveness (Hoffman & Lovler, 1989; Vodanovich & Reyna, 1988). Because certain demographic factors may simultaneously affect one's decision to use drugs and one's work behavior (see Holcom, Lehman, & Simpson, 1993; McDaniel, 1988), only some of those members of the workforce who use drugs are compromising their performance by using them on the job, and some off-hours drug use may even boost productivity (Gill & Michaels, 1992; Register & Williams, 1992). Although proponents of drug testing presume it will discourage drug use, those current employees most deterred by the prospect of workplace testing are likely the most casual off-hours users who abstain from their at-home use in order to avoid potential embarrassment and job loss. Because addicts have less control over their drug intake, testing may not so easily deter their substance abuse (Orentlicher, 1990), which has greater potential for personal and organizational harm. Thus, applicant screening may keep both serious and casual users from gaining employment in the first place (although some members of this former group may thwart detectors by altering their own urine or even substituting drug-free urine samples; see Bearman, 1988; Crown & Ross, 1991; Hanson, 1988), but testing cannot prevent incumbents' drug use in the workplace.
Another problem with drug tests is their inability to detect drug use in time to prevent it from causing harm. Testing can only distinguish between someone who has used or been exposed to a drug and someone who has not; it cannot tell when the former took the drug, how much was taken, how frequently this person has taken this drug, or the effect of the drug on the user (Axel, 1990; Lundberger, 1986; Morgan, 1987). By the time an employee's test result has been interpreted as positive, any drug-impaired behavior would already have taken its toll. On the other hand, the metabolites in a person's urine that produce a positive test result do not necessarily mean the person cannot work, as any effects of the drug could have long worn off by the time the test was administered. As explained by Orentlicher (1990), "a test that is positive for drug use may be falsely positive for drug impairment" (p. 1039).
Because there is considerable evidence that drug testing can have a potentially negative impact on employees' attitudes and behaviors, and that it can neither dependably deter incumbents' workplace drug use nor detect their impaired performance resulting from such drug use, its usefulness as a management tool should be questioned.
Fitness-for-duty testing
What is needed is an alternative to biological testing that is better suited to identifying the substandard work that should ultimately concern managers -- before it occurs -- while preserving employees' rights: actual performance-based fitness-for-duty testing. The technology of performance testing was developed 30 years ago, but it has been applied commercially only since the late 1980s (Stevens, 1990). According to accounts in the popular press, the computer-based critical tracking test, a sort of video game in which employees engage in the same psychomotor skills their jobs require, is less intrusive and more valid than urine testing for determining impairment that may or may not stem from drug use (Flaig, 1990; Frieden, 1990; Maltby, 1990; Stevens, 1990; Yes, test..., 1991). Test-takers use a control knob to try to keep a constantly and randomly veering cursor centered on their video screen. Employees' scores at a given trial, based on speed and accuracy, are compared to their average previous performance.
Although test-takers' baseline performance may eventually improve, a complex algorithm precludes mastery (Maltby, 1990; Stevens, 1990). Further, the test has been designed so that an employee cannot trick the computer into establishing a low baseline (so as to pass the test later while impaired), because an unusual lack of progress on baseline-setting trials is recognizable (Fine, 1992).
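The comparison of a trial score with an employee's own average can be made concrete with a minimal sketch. It assumes, hypothetically, that a trial passes when its score falls no more than a fixed fraction below the mean of the employee's previous scores. The sources cited here do not report the vendors' actual decision rule, so the function name, the 10% tolerance, and the sample scores are illustrative only.

```python
from statistics import mean


def passes_baseline(previous_scores, trial_score, tolerance=0.10):
    """Return True if trial_score is acceptably close to the personal baseline.

    Hypothetical rule: the baseline is the mean of the employee's previous
    scores, and a trial fails when it falls more than `tolerance` (expressed
    as a fraction of the baseline) below that mean.
    """
    if not previous_scores:
        raise ValueError("at least one previous score is needed for a baseline")
    baseline = mean(previous_scores)
    return trial_score >= baseline * (1 - tolerance)


history = [82, 85, 88, 85]            # illustrative scores; their mean is 85
print(passes_baseline(history, 80))   # 80 is above the 76.5 threshold
print(passes_baseline(history, 70))   # 70 is below the 76.5 threshold
```

Because each score is judged only against the same person's history, a rule of this kind says nothing about how one employee compares with another.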
Compared to urinalysis, the critical tracking test takes less time to collect data (about a minute); it costs approximately 100 to 200 dollars per employee per year, affording more tests than urinalysis programs that charge about 40 to 60 dollars per test per employee (Fine, 1992); and results are immediately available (Fine, 1992; Frieden, 1990; Hanson, 1990; Maltby, 1990; Stevens, 1990; Warshauer, 1991). Further, because the test measures employees' reaction time and coordination, it assesses job performance rather than personal matters. Maltby (1990) has even reported that performance testing has detected drug use in cases that eluded urinalysis. For instance, whereas cocaine will not show up in the urine of an employee who has used it just minutes before a test, a (valid) performance test will readily and unequivocally disclose the employee's impaired skills.
Fitness-for-duty testing has generally been used to assess the motor skills of drivers, pilots, etc.; indeed, a leading commercial supplier of the critical tracking test recommends its product especially for individuals whose jobs do not provide them with an opportunity to correct mistakes of hand-eye coordination (Sandra Jaffe, Director of Marketing, Performance Factors, Inc., personal communication, 7/92). However, another test battery has potential applications for a variety of occupations. The Essex Corporation's computer-based Delta-WP system can test short-term memory, perception, reasoning, linguistic/symbolic manipulation, and spatial relations, in addition to psychomotor functioning (Delta-WP Manager's Manual, 1994). Licensing for the software costs from 2,495 dollars per organization. Optional testing materials and training and consultation are available for additional fees.
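The cost comparison reduces to simple arithmetic. The rough sketch below uses only the dollar figures quoted from Fine (1992) and assumes, hypothetically, that the critical tracking test fee is a flat annual charge however often an employee is tested. The function name is illustrative.

```python
def break_even_tests(annual_fee, per_test_fee):
    """Number of urine tests per year whose total cost equals the flat annual fee."""
    return annual_fee / per_test_fee


# Figures quoted in the text (dollars): 100 to 200 per employee per year for
# the critical tracking test, 40 to 60 per test per employee for urinalysis.
low = break_even_tests(100, 60)    # cheapest tracking fee vs. costliest urine test
high = break_even_tests(200, 40)   # costliest tracking fee vs. cheapest urine test
print(f"break-even: {low:.1f} to {high:.1f} urine tests per employee per year")
```

Under these assumptions, the tracking test is the cheaper option at every quoted price once an employee would otherwise be urine-tested more than five times a year. Below that frequency the comparison depends on where in the quoted ranges the actual fees fall.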
As skills testing has only recently been adapted for the workplace, it is not yet widely known or understood beyond what little has appeared in a handful of articles in the popular press.
Indeed, not one of 50 drug-testing organizations in a recent survey practiced performance testing; and representatives of the 84 nontesting organizations in the sample generally viewed performance testing as less appropriate than drug testing (Comer & Buda, in press). Likewise, a mere six (.756%) of the 794 organizations responding to the American Management Association's latest survey (Greenberg et al., 1994) reported that they used performance testing. It is also notable that the concept and technology of fitness-for-duty testing are generally more familiar to transportation experts and biomedical engineers than to psychologists in either academia or industry. Indeed, Drasgow, Olson, Keenan, Moberg, and Mead's (1993) recent review of the advantages of computerized tests over more traditional paper-and-pencil tests did not mention the application of the former to determine fitness for duty. Yet skills testing seems promising for organizations that care more about their employees' readiness to perform than the cause of any impairment. Organizations that use performance tests have reported that they are more effective and efficient than urine tests; that they have the added capacity to detect the harmful effects on performance of illness, sleep deprivation, and emotional preoccupation (as well as drug use); and that employees prefer them to drug tests (McGinley, 1992; Stevens, 1990). For example, according to the vice president of a gasoline and diesel fuel transporter, a significant drop in accidents and mistakes followed his company's implementation of performance testing for truck drivers (McGinley, 1992). Nonetheless, as such accounts are anecdotal, and information about these testing programs is more journalistic than scientific, it is necessary to conduct a less haphazard evaluation of the impact of impairment testing.
Indeed, Harris and Heft (1992) and Gilliland and Schlegel (1993) recently commented on the lack of "scholarly writing" about performance tests, and called for studies on their validity and efficacy. And Normand et al. (1994), recognizing the potential of this form of assessment, likewise recommended: "further research is a high priority" (p. 206).
The remainder of this paper attempts to provide a more systematic evaluation of fitness-for-duty testing. First, criticisms of these performance-based tests are addressed. Then, after describing concurrent research on these tests, the author presents her own field study. Archival data from two test manufacturers are examined, as are semi-structured interviews with managers of organizations using these tests and questionnaires completed by their employees.
Criticisms of fitness-for-duty testing
As discussed earlier in this paper, drug testing has been readily accepted, despite the absence of strong evidence that it is enhancing workplace safety and productivity (Cropanzano & Konovsky, 1993; Morgan, 1991; Normand et al., 1994). Indeed, governmental and organizational decisions to implement drug testing have sometimes stemmed more from sociopolitical or symbolic than rational practical factors (Cavanaugh & Prasad, 1994; Guthrie & Olian, 1991; Karper, Donn, & Lyndaker, 1994; Thompson, Riccucci, & Ban, 1991; Zimmer & Jacobs, 1992). In contrast, many in the U.S. will not even consider fitness-for-duty testing until extensive evaluation has occurred. Blanton, Kidwell, and Bennett's comment, that "until the reliability of performance testing is firmly established, criticism of it will continue" (1992, p. 355), is instructive. Performance testing, it appears, is dismissed out of hand by those wedded ideologically to urine drug testing, even as it is viewed skeptically by those made cautious by the momentum of the drug-testing movement.
Fitness-for-duty testing has been faulted for not ascertaining if an employee who fails an impairment test is using drugs (see, e.g., Blanton et al., 1992; Butler, 1993). As discussed, however, a major asset of behaviorally-based tests is that they protect employee rights by not disclosing to employers the cause of impairment. Rather, unlike biochemical tests, they are designed to determine whether or not the employee is impaired. As summed up by erstwhile urinalysis proponents Normand et al.: "If...the goal is to identify impaired workers to either refer them to treatment or to prevent them from performing a task that may endanger themselves or others, then behavioral indicators provide a more direct means than drug tests of identifying employees unable to perform at required levels....[T]o minimize hazardous behavior and other performance problems [b]ehavioral indicators may be a better means...than are chemical tests" (1994, p. 206).
Nonetheless, much is at stake in terms of the national commitment and investment toward biochemical testing. For example, in the transportation industry, drug testing efforts are expanding: recently passed Department of Transportation regulations require biochemical testing for mass-transit workers and intrastate truckers and bus drivers, which more than doubles the number of workers who must submit to these tests (Newman, 1994). Organizations that must conduct biochemical testing, as well as those that have taken it upon themselves to do so, can be expected to have less incentive to incur the additional expense of performance testing -- regardless of the relative merits of the two approaches.
A related criticism is that the critical tracking test, a major type of performance test, does not always detect impairment when relatively low levels of alcohol or marijuana have been consumed (Butler, 1993). The particular concern is that the CTT does not consistently pick up impairment at the blood alcohol level used to arrest individuals for drunk driving. Apparently, the standard used to determine legal driving has been reified and deemed infallible. However, the same blood alcohol level cannot realistically be appropriate for all drivers (John Morgan, M.D., Professor of Pharmacology, City of New York University School of Medicine, personal communication, 12/8/94). The degree of impairment associated with a given BAC is not constant and may vary among individuals. This may be explained in part by the phenomenon of tolerance. Tolerance is a decrease in the magnitude of an effect of a given dose of a drug after repeated exposure to the drug ("Alcohol-related impairment," p. 2).
Indeed, more experienced alcohol drinkers exhibit less perceptual, motor, and cognitive impairment than their less experienced counterparts across a range of blood alcohol levels (Chesher & Greeley, 1992; Morrow, Yesavage, Leirer, & Dolbert, 1993). Relatedly, in the case of marijuana, Grinspoon and Bakalar explain that some chronic users develop behavior tolerance...learning to compensate for the effects of the high. [This] may explain why farm workers in some third world countries are able to perform heavy physical labor while smoking a great deal of marihuana....Behavioral tolerance substantially reduces the effects of intoxication on attention and motor coordination in long-term users (1993, p. 146).
Again, the real issue should not be whether an individual has been engaging in certain behavior (that is, ingesting a particular quantity of a drug) that could compromise some individuals' performance, but whether this individual's performance has, in fact, been impaired. A valid fitness-for-duty test, unlike a physiological test, can assess an employee's actual readiness for work. On the other hand, "the tolerance acquired for a specific task or in a specific environment is not readily transferable to new conditions" ("Alcohol and tolerance", 1995, p. 2). Hence, an employee whose tolerance enables him to compensate, and thus easily pass his well-learned fitness-for-duty test and competently perform familiar aspects of his job, may imperil himself and others if faced with a novel situation at work.
Blanton et al. (1992) have criticized performance tests on account of "the complexity involved in developing proper employer responses for all the contingencies that can lead to an unacceptable score" (p. 358). Again, they have missed the point. The cause of impairment -- whether it be lack of sleep, emotional stress, substance use, etc. -- is unimportant. What matters is that job performance may suffer. If a pattern of impairment emerges, then the employer may understandably be concerned about the employee and perhaps recommend that the employee seek counseling, but still does not need to know the cause of the impairment (see Desjardins & Duska, 1987). And surely, any decent administrator or human resources manager should be able to design an appropriate system for addressing test failures.
Butler (1993) is concerned that some performance-based tests may not appropriately assess performance on tasks and skills relevant to those that employees use to do their jobs. Indeed, the job-related face validity of fitness-for-duty tests may be important insofar as Rynes and Connerley (1993) and Schuler (1993) observed that individuals have more favorable views toward selection devices they deem job-relevant. Yet, one's performance on a behavioral test of any type would seem more closely related to one's ability to do one's job than would the results of a biochemical test. Thus, it has been argued that employees would be more apt to accept a fitness-for-duty vs. biochemical test on the grounds of job-relevance (Gilliland & Schlegel, 1993).
More importantly, Gilliland and Schlegel (1993) note that because readiness-to-perform tests are used primarily to detect the presence of performance-impairing risk factors, their job-related face validity, i.e., whether they appear to be assessing actual job performance, may be less relevant. Gilliland and Schlegel (1993) do acknowledge that job-related face and criterion validity may be important to bolster the legal defensibility of using fitness-for-duty tests. On the other hand, they point out a major disadvantage of claiming job-relevant criterion validity: If a test has only risk-factor (and not job-related) criterion validity, then individual differences on the test do not matter, because individuals' test scores are simply interpreted in light of their own baseline. However, suppose there is evidence that the test has job-related criterion validity. Suppose further that Jack meets his baseline but underperforms Jill, who fails to meet hers. It may not be equitable or sensible to prevent Jill from climbing a steep hill while allowing her co-worker, Jack, to do so (see also Gross, 1991). Thus, if job-related criterion validity has been established for a test, manufacturers and clients need to address questions regarding the equity and practicality of using relative vs. absolute impairment to determine readiness to perform.
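The Jack and Jill dilemma can be illustrated with a small sketch that contrasts the two decision rules. All scores, the job-wide cutoff, and the function names are hypothetical and chosen only to reproduce the situation described. They do not come from any vendor's scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    baseline: float   # the employee's own established baseline score
    score: float      # today's score


def fit_relative(r):
    """Relative rule: fit if today's score meets the employee's own baseline."""
    return r.score >= r.baseline


def fit_absolute(r, cutoff):
    """Absolute rule: fit if today's score meets a single job-wide cutoff."""
    return r.score >= cutoff


# Illustrative numbers only: Jack meets his baseline yet scores below Jill,
# who falls short of hers.
jack = Result("Jack", baseline=60, score=61)
jill = Result("Jill", baseline=90, score=80)
CUTOFF = 70  # hypothetical job-wide standard

for r in (jack, jill):
    print(r.name, "relative:", fit_relative(r), "absolute:", fit_absolute(r, CUTOFF))
```

With these numbers the two rules reach opposite verdicts for both employees, which is the equity question that arises once a test is claimed to have job-related criterion validity.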
Another criticism leveled against performance-based fitness-for-duty tests (often by proponents of biochemical testing) is that employees can and will cheat on them, by performing beneath their capabilities on their baseline-establishing trials, so as to pass later when they are under the influence (Clark, 1994). Vendors of testing devices claim (naturally) that their products have been designed to prevent cheating. Likewise, Gilliland (personal communication, 10/12/94) asserts that "falsing" is not the serious problem some make it out to be. Rather, he believes that although it is possible to suppress one's performance one day, variability in performance from day to day would make it "awfully hard to consistently fall short of your [true level of] performance." Dennis Attwood (personal communication, 6/26/95), a human factors engineer at Exxon, disagrees. Although he supports fitness-for-duty testing in theory, he questions the validity of tests that measure voluntary responses because test-takers can "fudge up or down." He says that he would have greater confidence in a test that monitored changes in an involuntary response, such as pupil diameter.
Fitness-for-duty tests have also been faulted for not being able to prevent a worker from using an impairing substance after passing his or her early-in-the-shift test (Butler, 1993). (Of course, there is no guarantee that a worker will refrain from risk-impairing behavior after taking a drug test.) Gilliland and Schlegel (1993), also, raise the question as to the sufficiency of daily testing, but are concerned about detecting fatigue rather than drug use. They recommend testing "following breaks or lunch prior to returning to work" (p. 31). However, inasmuch as testing employees more than once a day may increase logistical and administrative burdens, a reasonable compromise might be to test all employees at the beginning of their shift and randomly select a percentage of them each day for a second, post-break test.
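The compromise proposed here, testing everyone at the start of the shift and a random percentage again after a break, could be administered along the following lines. This is a minimal sketch: the employee identifiers, the 25% retest fraction, and the function name are assumptions made for illustration.

```python
import random


def select_for_retest(employee_ids, fraction, rng=None):
    """Pick a random subset of employees for a second, post-break test."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    k = round(len(employee_ids) * fraction)
    # sample() draws without replacement, so nobody is selected twice
    return rng.sample(list(employee_ids), k)


crew = [f"E{n:03d}" for n in range(1, 21)]   # 20 hypothetical employees
retest = select_for_retest(crew, fraction=0.25, rng=random.Random(0))
print(len(retest), "of", len(crew), "selected for a post-break test")
```

Drawing a fresh sample each day keeps the second test unpredictable for any individual while holding the daily administrative load to a fixed share of the workforce.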
Whether test-takers can "recruit" -- that is, motivate or psych themselves up to give the performance test all their energy and attention for the short period of time it requires -- is another important question (Gilliland, personal communication, 10/12/94). Some people may be able to perform at very high levels for a minute or so -- even in the face of high impairment. In particular, there may be certain risk factors (e.g., emotional stress) that will not be detected by a brief behavioral test but that can still detract from job performance (Gilliland & Schlegel, 1993). Also, as noted earlier, employees who are too impaired to perform non-routine tasks may still be capable of performing a well-learned fitness-for-duty test or accustomed aspects of their jobs. If so, the predictive validity of the testing methodology to detect these risk factors would be compromised.
According to anecdotal reports, employees respond positively to impairment tests. But Butler (personal communication, 9/26/94) posits that performance testing can be just as intrusive as drug testing. Indeed, employees may resent having to take performance tests, especially more than once daily. Relatedly, Gilliland and Schlegel (1993) posit that failing one's performance test "can, and probably will, be viewed as stigmatizing in the same manner as a positive biochemical drug test" (p. 30). They further reason that because a performance test failure may stem from any number of factors, hapless employees may be the target of rumors and doubts. Moreover, Blanton et al. (1992) argue that performance testing is no more desirable than urine testing because it is equally susceptible to corrupt managerial prerogatives.
Yet another potential problem with performance testing is the hidden administrative costs it may generate, including the expense of procedures for follow-up in cases where employees fail their performance tests (Gilliland & Schlegel, 1993). Conducting biochemical tests, offering counseling, and/or reassigning skilled employees to nonskilled non-safety-sensitive jobs and replacing them compound the costs of computer equipment and training.
Moreover, even though it takes just a minute or so to run one trial of a performance test, if an employee fails several consecutive trials, at least 10 minutes must elapse before the employee can try again. This waiting period is sure to break the flow of operations at an organization. Also, in large organizations with many employees or sites, testing apparatus and program coordinators are needed at every station or site, thus driving up the initial costs of a fitness-for-duty program (Transportation Research Associates, Inc., 1994).
In sum, the following questions remain about fitness-for-duty tests:
1) Do fitness-for-duty tests detect the presence of performance-impairing risk factors and/or do they predict actual job performance?
2) To what extent can employees pass their fitness-for-duty tests when they are capable of performing routine aspects of their job but unfit to meet the challenge of novel non-routine tasks?
3) To what extent can employees recruit -- motivate themselves sufficiently to pass their fitness-for-duty test -- even when they are too impaired to work?
4) What kinds of policies have organizations devised to administer these tests and address test failures?
5) To what extent are fitness-for-duty test administration and interpretation susceptible to managerial prerogatives?
6) What are the administrative costs of fitness-for-duty testing?
7) To what extent can employees cheat on fitness-for-duty tests?
8) How do employees view fitness-for-duty testing?
These questions will be revisited after considering concurrent research on these tests, archival data from two test manufacturers, interviews with managers of former and current fitness-for-duty-testing client organizations, and questionnaires completed by currently tested employees.
Concurrent research
As Gilliland and Schlegel advised in a critical report on readiness to perform (RTP) testing: "The degree to which an RTP measure is related to either job performance or a risk factor cannot be assumed -- it must be verified empirically. Further, it should be verified by comparing the specific RTP measure in question with actual job performance measures or with task performance measures while in an experimentally-manipulated risk-factor state" (1993, p. 11).
Gilliland and Schlegel are presently analyzing data they generated in a "synthetic work environment" (personal communication, Gilliland, 10/13/94). They have, in essence, conducted the very sort of empirical study called for in their 1993 technical report. For four weeks, college student research participants "worked" for two hours per day, taking five readiness-to-perform test batteries and two job performance tests expected to be linked with performance on the RTPs. (Before the experimental trials began, they performed training trials to establish baseline scores.) The investigators collected objective data on the effects of alcohol, sleep loss, and antihistamines on RTP tests and task performance, as well as subjective data concerning how performers felt after exposure to the various stressors. Their data will thus provide evidence as to the risk-factor and job-related criterion validity of each test battery.
Meanwhile, the Transportation Cooperative Research Program (TCRP) is sponsoring its own evaluation of fitness-for-duty devices. The purpose of the evaluation, led by David Kerr of Battelle, is to determine if any commercially available test works well enough to be used in the transportation industry. According to Kerr (personal communication, 6/26/95), five of the six vendors of fitness-for-duty tests invited to participate in the laboratory study accepted. Stephen Andrle of TCRP emphasized that although test vendors have made impressive claims about their products, it is necessary to assess their validity and feasibility. He explained that due to "quirks" in the experimental environment, results of the investigation, in which subjects performed fitness-for-duty tests after drinking screwdrivers, were inconclusive. A second phase of the project, which will repeat the first phase but also include a placebo group, is to be completed by the spring of 1996 (personal communication, 6/29/95).
Attwood (personal communication, 6/26/94) insists, however, that what is missing is a solid field study in which actual employees take a fitness-for-duty test under true workplace conditions. Such a study would need to expose these employees to stressors (such as alcohol and fatigue), and then use manipulation checks to ensure that the stressors had their intended effects and to determine if the stressors affected performance on the fitness-for-duty test. He underscores the need for a field (vs. laboratory) study because a performance test must be "sensitive to the factors you care about" but immune to factors like light and noise in the testing room, "which are the reality in the testing situation." Indeed, an investigation he and his colleagues conducted suggests that distractors in the test-taking environment can seriously undermine the usefulness of a computerized fitness-for-duty test: "[T]asks that are intended to measure fitness-for-duty must be located in an area that is free from distraction so that the measured behavior is not confounded by other mental activity" (Attwood, Nicholich, & Muise, 1994, p. 5).
The study I will now present does involve real employees who take fitness-for-duty tests under actual workplace conditions. Because I did not conduct an experiment, but obtained data less obtrusively, my methodology is less rigorous than that used by G illiland and Schlegel or by the TCRP (or, presumably, that required to satisfy Attwood). Nevertheless, this study, unlike those of these other investigators, is better able to provide insights into how well fitness-for-duty programs work in real organiza tions and how real employees view fitness-for-duty tests.
Archival data from two manufacturers of fitness-for-duty tests
Two prominent commercial fitness-for-duty tests are the Essex Corporation's Delta-WP and Performance Factors, Inc.'s Factor 1000.
Delta-WP has been carefully developed according to sound psychological testing theory. Research by Turnage and Kennedy (Turnage & Kennedy, 1992; Turnage, Kennedy, Smith, Baltzley, & Lane, 1992) documents the extensive empirical evidence of the reliability, stability, and construct and predictive validity (for risk factors) of the tests, which involve selecting on one's keyboard the correct answer about the images on one's screen. The seven tests in the Delta-WP repertoire that measure different factors important for cognitive, perceptual, and psychomotor skills are combined into various batteries to match the skill and educational levels of the test-taking population at a given client company (Delta-WP Manager's Manual, 1994; personal communication, Janet Turnage, 9/28/94). A test-taker's score is the sum of the scores on the (usually) three subtests in the battery.
According to David Parry, Manager of Information Systems and Product Manager of Delta-WP at the Essex Corporation (personal communication, 10/11/94), Delta-WP measures impairment broadly and generally, and its capability to use job-specific test batteries is therefore neither especially necessary nor practical (for clients, who would have to pay for a job analysis). In essence, he asserts that the test has risk-factor criterion validity, but makes no claims about its job-relevant criterion validity. Indeed, the Delta-WP Manager's Manual (1994) clearly specifies that empirical research on the relationship between an employee's test score and his or her performance on the job has not yet been conducted. Only an indirect relationship between test score and future on-the-job performance is suggested in the Manual, which posits that the implementation of Delta-WP in a workplace may contribute to a reduced number of accidents because a) impaired employees who are detected will not have an opportunity to make mistakes or perform in a hazardous fashion and b) the knowledge that they will be taking a fitness-for-duty test will discourage employees from arriving at work in an impaired state.
A passing score for an employee takes into consideration the employee's own baseline as well as the passing score standard set by the company, which is based on tradeoffs between the costs of preventing employees who are actually fit from working on a given day (false positives) and the costs of potential damage caused by employees who pass but are not fit to work (false negatives). The client works with the test manufacturer, Essex, to determine, and make adjustments to, this standard, thus meeting Gilliland and Schlegel's (1993) recommendation of consumer involvement in standard-setting.
The passing score standard is set depending on the employer's comfort level with test scores that deviate a little vs. a lot, such that a passing score for a given employee indicates that the employee is performing within n% of his or her baseline.
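The baseline-relative standard described above can be illustrated with a minimal sketch. It assumes that a higher score means better performance and that "within n% of baseline" means scoring no more than n% below baseline; neither detail is specified in the Delta-WP materials cited here, and the function name and parameters are hypothetical.

```python
def passes_standard(score: float, baseline: float, tolerance_pct: float) -> bool:
    """Return True if score falls no more than tolerance_pct below baseline.

    Assumption (not from the test manufacturer): higher scores are better,
    and only shortfalls below the baseline count against the test-taker.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if tolerance_pct < 0:
        raise ValueError("tolerance_pct must be non-negative")
    cutoff = baseline * (1 - tolerance_pct / 100)
    return score >= cutoff


# A stricter employer (small n) sends more fit employees home (false positives);
# a more lenient one (large n) lets more impaired employees pass (false negatives).
print(passes_standard(score=92.0, baseline=100.0, tolerance_pct=10.0))
print(passes_standard(score=92.0, baseline=100.0, tolerance_pct=5.0))
```

Under these assumptions, the same score of 92 against a baseline of 100 passes a 10% standard but fails a 5% standard, which is the tradeoff the employer weighs when setting n.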
Parry (personal communication, 10/11/94), echoing Gilliland's (1994) views on falsing, asserts that it would be difficult for an employee to deflate performance artificially on the baseline-setting trials so as to pass later when impaired, because it would be too tough to repeat the pattern of continued depression.
Because it is highly unusual for a test-taker to give more than a very few incorrect responses per test, doing so would alert test administrators. Thus, a test-taker would have to depress his or her response time, which, according to Parry, would be tricky.
Although there is no research specifically addressing the use of Factor 1000, a particular version of the critical tracking test, Performance Factors, Inc. asserts that it acquired the license for the critical tracking test in 1988, maintaining its validity and embellishing upon its applicability for the workplace with the Factor 1000 system (Factor 1000 Fitness-for-Work Assessment Program Technology Overview and Technical Validation, 1991-1994). There is, indeed, much empirical evidence of the risk-factor criterion validity of the critical tracking test (Allen, Jex, & Stein, 1984; Allen, Stein, & Jex, 1981; Allen, Stein, & Miller, 1990; Belleville, Dorey, & Bellville, 1979; Burns & Moskowitz, 1980; Jex, 1988; Klein & Jex, 1975; O'Hanlon, 1981; Stoller & Bellville, 1976). Evidence of its job-related criterion validity is considerably sparser (but see Allen, Stein, & Jex, 1981).
At companies using Factor 1000, a test-taker sits in front of a video screen showing a cursor that is beneath a target area and surrounded by a boundary marker on each side. The test-taker uses the control knob to try to keep a randomly careening cursor centered between the markers. Each time the cursor is returned by the test-taker to its position between the markers, it accelerates, until, finally, the test-taker can no longer control it. As the brief testing session progresses, it becomes increasingly difficult to control the cursor. In a sense, controlling the cursor is akin to trying to stay perched atop a bucking bronco that bucks more vehemently as the rider persists. Scoring depends on how far the cursor veers from the center each time, how long the test-taker takes to regain control of the cursor each time, and how long the cursor is kept centered. A passing score is one that meets or exceeds the employee's baseline in at least one of eight attempts (Factor 1000 Fitness for Work Program: Common Questions and Answers, 1993). Because a test-taker's baseline is set so that he or she has a 60% chance of passing any given trial, there is only a .06553% probability of his or her passing none of the eight trials, according to Marc Silverman, Chief Technology Officer of Performance Factors, Inc. (personal communication, 11/21/94). Silverman adds that the computer algorithm can detect if an employee is purposely underperforming, and will alert test administrators accordingly.
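The arithmetic behind Silverman's figure can be reproduced with a short sketch. It assumes that the eight trials are statistically independent and that the 60% pass rate holds on every trial; the source does not state either assumption explicitly, and the function name is hypothetical.

```python
def prob_fail_all(p_pass_per_trial: float, attempts: int) -> float:
    """Probability of passing none of the attempts.

    Assumption (not stated in the source): trials are independent and
    share the same per-trial pass probability.
    """
    if not 0.0 <= p_pass_per_trial <= 1.0:
        raise ValueError("p_pass_per_trial must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return (1.0 - p_pass_per_trial) ** attempts


# An unimpaired employee with a 60% per-trial pass rate and eight attempts.
p_fail = prob_fail_all(0.6, 8)
print(f"{p_fail:.8f} ({p_fail * 100:.4f}%)")
```

With these inputs the result is 0.4 raised to the eighth power, or about 0.00065536, which is roughly 0.0655% and agrees with the reported figure. If an employee's trials were positively correlated (for example, on a day when the testing room is noisy), the chance of an unimpaired employee failing all eight would be higher than this estimate.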
An employee who has not taken the Factor 1000 critical tracking test within seven days (e.g., during a leave of absence for a reason unrelated to impairment) has several practice trials to "refresh" his or her skills at the test (Factor 1000 Fitness for Work Program Policy and Procedure Development Workbook, 1994).
In contrast with Essex, the manufacturer of Delta-WP, Performance Factors, Inc. does claim that Factor 1000 has job-related validity and advises that the test be administered only to those performing work requiring hand-eye coordination: Hand-eye coordination is a basic requirement for jobs which involve manipulating or controlling equipment or devices with the hands or feet. Therefore, hand-eye coordination is the job function tested through Factor 1000 (Factor 1000 Fitness for Work Program Common Questions and Answers, 1993, p. 5).
Nonetheless, Silverman has displayed greater confidence in Factor 1000's risk-factor validity than its job-related validity, telling me that not meeting one's baseline means that one is at risk, not that one will necessarily have an accident (personal communication, 11/21/94).
Because Performance Factors, Inc. believes that performance on the Factor 1000 test can predict work performance, its client companies are encouraged to set specific productivity and safety goals so that the impact of Factor 1000 can be assessed (Factor 1000 Fitness for Work Program Development Workbook, 1994).
Performance Factors also asserts that use of Factor 1000 deters risky behavior because employees know they will have to pass a performance test (Factor 1000 Fitness for Work Program Common Questions, 1993).
The job-related criterion validity of Factor 1000 also requires Performance Factors, Inc. to address the situation in which an employee who fails to meet his or her personal baseline, but outperforms a co-worker, is considered impaired: Workers learn to perform their jobs given the set of skills each possesses, and if a critical skill is missing they are not able to compensate (Factor 1000 Fitness for Work Program Common Questions and Answers, 1993, p. 14). So, even if Paul outperforms Richard, if Paul does not meet his baseline, he is not deemed fit to perform because he is unaccustomed to working under these conditions. If Richard meets his baseline, even though he underperforms Paul, he is allowed to perform a safety-sensitive job because he will be working under skill conditions that are normal for him.
Interviews with former users of fitness-for-duty testing
It is instructive to consider the experiences of former users of Factor 1000. (There are no former users of the Delta-WP, a product that has only very recently been available for commercial use.) Newspaper articles from a few years ago reported on other Factor 1000 customers, some of whom, I discovered, had discontinued their testing programs.
One former Factor 1000 customer had gone out of business and another had, according to the EAP representative, stopped administering Factor 1000 due to union problems (I was unable to contact the former program administrator, who had since left the organization). But I was able to ask the general managers of 1) a small Midwestern tool-and-die company, 2) its sister organization, a tire-and-wheel distributor, and 3) a West Coast transportation concern why their companies had discontinued using Factor 1000 and how they, their managers, and their employees had viewed fitness-for-duty testing.
In the case of two of the companies, it was not possible to determine whether using Factor 1000 had improved safety or productivity because no "before" measures had been taken. At the third, the safety record while Factor 1000 was in place was not significantly better than that of other branches of the company, which were not using Factor 1000.
The three general managers described the following types of problems:
Logistical glitches. One general manager complained that reassigning employees to non-safety-sensitive work and finding replacements for them stressed his operations: "Having to place the one who failed elsewhere created grief." Another problem was that daily testing took a lot of time because there were not enough terminals (which are expensive). Because employees could not start working before their test, time was wasted. Staff time was also consumed by monitoring the testing process. Additionally, some employees have schedules not conducive to testing. For example, salespeople ordinarily do not report to the office until they have made their morning calls in the field, and long-distance truck drivers work odd shifts.
Validity concerns. One general manager asserted that employees who felt and behaved normally could not always pass the test, even if they thought they were doing well. Another reported that some days, the program seemed much harder than others. On "easy" days, he explained, the cursor moved very little; on "hard" days, it moved all over. Employees at these organizations grew to believe that whether they passed or failed depended on chance. And one doubtful supervisor, who experimented during his off hours by trying out the test when he was inebriated, concluded that the test lacked validity after he passed it in his impaired state.
Challenges to privacy. Some employees perceived testing as an imposition, rather than as a boon for their welfare. Those with excellent performance records complained about having to submit to daily testing. Further, people knew who failed and there was a stigma attached to failing.
Test-taking anxiety. Respondents told me that some older employees had computerphobia. (Indeed, Sharit and Czaja, 1994, observed that older people may have difficulty learning computer skills.) Additionally, some other employees, regardless of their age, lost sleep worrying about their daily duel with the computer. Although Burke, Normand, and Raju (1987) found that attitudes toward computer-administered tests were generally positive, it should be noted that their sample consisted of clerical office workers -- a group that does not ordinarily perform safety-sensitive work and is therefore not ordinarily made to take a fitness-for-duty test.
Supervisors. Not only did supervisors dislike dealing with subordinates' complaints, but they thought the testing program preempted their authority. They resented that the test result prevailed as the deciding factor even if they deemed their subordinates fit for work.
When asked to account for these complaints, Marc Silverman, Chief Technology Officer of Performance Factors, Inc. (personal communication, 11/21/94), countered that technology (i.e., the hardware and software of the test itself) is less important in the success of Factor 1000 than how management implements the testing program.
Having consulted with clients to design appropriate policies and procedures, he believes that organizations that are concerned about their employees' safety and welfare and have amicable labor-management relations generally have greater success with an impairment testing program. He seems to attribute failures of organizational testing efforts to faulty client implementation rather than to any flaws with his company's product.