Guardians Ass'n of the New York City Police Department v. Civil Service Commission

484 F.Supp. 785 (1980)

The GUARDIANS ASSOCIATION OF the NEW YORK CITY POLICE DEPARTMENT, the Hispanic Society of the New York City Police Department, Nydia I. Diaz, James Michael Hidalgo, Wilfred Cebollero, Andre Lopez, Reinaldo Salgado, Denise Santos, Deborah Holmes, and Pamela Obey, Individually and on behalf of all those similarly situated, Plaintiffs,
v.
CIVIL SERVICE COMMISSION OF the CITY OF NEW YORK, Department of Personnel of the City of New York, and the New York City Police Department, Defendants.

No. 79 Civ. 5314.

United States District Court, S. D. New York.

January 23, 1980.

*786 *787 The Puerto Rican Legal Defense and Education Fund, Inc., New York City, for plaintiffs; Kenneth Kimerling and Peter A. Bienstock, New York City, of counsel.

Robert B. Fiske, Jr., U. S. Atty. for the Southern District of New York, New York City, Drew S. Days, III, Asst. Atty. Gen., Civil Rights Div., U. S. Dept. of Justice, Washington, D. C., for amicus curiae United States of America; Nancy E. Friedman, Richard N. Papper, and Dennison Young, Jr., Asst. U. S. Atty., New York City, David L. Rose and Steven H. Rosenbaum, Attys., U. S. Dept. of Justice, Washington, D. C., of counsel.

Ira H. Leibowitz, Garden City, N. Y., for amicus curiae Policewomen's Endowment Association.

Allen G. Schwartz, Corp. Counsel of the City of New York, New York City, for defendants; Judith A. Levitt, Maureen McCabe, New York City, and Steven Goldberg, Brooklyn, N. Y., of counsel.

REVISED OPINION[1]

ROBERT L. CARTER, District Judge.

Summary of Proceedings

On June 30, 1979, Examination No. 8155 was given to provide an eligibility list of applicants for appointment as police officers on the New York City police force. Approximately 36,797 applicants took the examination. The police department had indicated that it could absorb an eligibility list of some 12,000 candidates over the next four years. Because the 12,000th candidate scored 94, that became the cut off point for placement on the eligibility list. Roughly 13,000 persons qualified with scores of 94 or above. The city proposes to select applicants for the police force by their ranked order of combined examination scores and veterans' preference points until the list is exhausted. In November, 1979, defendants hired 415 police recruits from the eligible list: 318 white males, 49 white females, 11 black males, 3 black females, 29 hispanic males, 2 hispanic females and 3 oriental males.

Plaintiffs, the Guardians Association of the New York City Police Department, Inc., the Hispanic Society of the New York City Police Department, Inc. and Nydia I. Diaz, James Michael Hidalgo, Wilfred Cebollero, Andre Lopez, Reinaldo Salgado, Denise Santos, Deborah Holmes and Pamela Obey, individual black and hispanic applicants who either received a passing grade too low on the eligibility list to afford an expectation of appointment in the foreseeable future, or who scored below the qualifying 94, instituted this litigation. They challenge the legality of the examination and seek a preliminary and permanent injunction barring the planned use of the test's results. Defendants, the Civil Service Commission, the Department of Personnel and the Police *788 Department of the City of New York, are responsible for the preparation, administration and use of the examination.

As indicated, the eligibility list has already been utilized, and defendants propose to hire another class of recruits from the list on Monday, January 14, 1980. With the parties' consent the hearing on the preliminary injunction was consolidated with a trial on the merits which took place on November 13, 14 and 15.

At the hearing defendants presented testimony indicating how the examination had been devised and structured. Plaintiffs presented similar testimony by individual participants in the preparation of the test, as well as evidence concerning the adverse effect on minorities of the examination and of use of an eligibility list with ranking based on the candidates' scores. Both sides presented expert testimony concerning the test's validity. The parties filed their briefs in support of their contentions on or about December 3, 1979, but neither side filed proposed findings of fact and conclusions of law until on or about December 13, 1979. On December 10, 1979, the government filed a brief amicus curiae asserting that Examination No. 8155 was unlawful, and thereafter participated as amicus curiae in the proceedings.

On December 17, 1979, the court advised the parties orally that it was of the view that the examination and the rank order selection of designated eligible candidates violated Title VII of the Civil Rights Act, 42 U.S.C. § 2000e et seq., and that no appointments from the list in ranked order should be made. The court anticipated that this oral indication of its determination would enable the city to make temporary adjustments in the selection process to bring the contemplated January appointments into compliance with the law by taking affirmative action to eliminate discrimination against blacks and hispanics. At a subsequent hearing on defendants' order to show cause and motion for reconsideration, counsel for defendants stated that defendants would not modify the selection process by the appointment of black and hispanic applicants out of rank order, and that they were determined to utilize the discriminatory listing. The matter was then heard by the Court of Appeals on defendants' petition for a writ of mandamus. The Court of Appeals ordered this court not to bar the city from making the contemplated appointments without findings of fact and conclusions of law being filed in compliance with Rules 52(a) and 65(d), F.R.Civ.P. The court was further ordered to refrain from enjoining defendants from selecting police recruits from the existing list until 48 hours after such findings and conclusions had been filed.

On January 11, 1980, a hearing on relief was held. The Policewomen's Endowment Association was allowed to intervene as amicus curiae. Plaintiffs offered testimony on the black-hispanic make-up of the relevant labor pool from which applicants for employment with the New York City police force were expected to come. Defendants have qualified for immediate appointment on January 14, 1980, some 380 candidates, of whom 280 are white males and 62 are white females, 9 are black males and 5 are black females, 20 are hispanic males and 3 are hispanic females, and 1 is an oriental male.

Preparation of Examination No. 8155

The development of the test and the job analysis employed was described at the November hearings by Esther Juni, administrative staff analyst of the department of personnel and chief of the police unit of the criminal justice task force. Juni disclaimed expertise in both testing and police work. This was only the second entry level examination she had ever worked on, the first being an examination for police administrative aides manning the 911 system where the skills tested were basically clerical and communicative. Moreover, no outside experts in testing had been brought in to aid defendants in devising a test in compliance with the Uniform Guidelines on Employee Selection Procedures ("Uniform Guidelines") promulgated by the Equal Employment Opportunity Commission, Department of Justice, Department of Labor and the *789 United States Civil Service Commission, 5 C.F.R. § 300.103(e), 28 C.F.R. § 50.14, and 29 C.F.R. § 1607, effective September 25, 1978.

Juni's testimony was hardly a model of clarity. It was often confusing, contradictory and obscure. However, from her testimony and the report outlining the procedures followed in job analysis and in preparation of Examination No. 8155, which I believe she testified she prepared or whose preparation she supervised, the following facts were gleaned.

Preparation of the test began in the fall of 1978 when, as a first step, the department of personnel asked the police department to supply 10 staff analysts to interview 49 police officers and 49 of their supervisors. These officers, in Juni's words, were chosen on the basis of "good job performance record, their ethnicity, the sexual composition (sic), types of duty they performed, the tours they worked on, the boroughs they worked in and it was made to be a comprehensive analysis of the positions of the police officers" (Tr. 9-10). The officers were asked to identify the tasks performed by police officers, their frequency and importance, the amount of time spent thereon, and the knowledge, skills and abilities (labeled KSA's)[2] necessary to perform each identified function. By this process some 71 tasks were identified.

Presumably this group of officers was chosen as "job knowledge experts,"[3] but except that they were apparently selected to comprise a cross section of good performers, we do not know their level of expertise or indeed whether they qualify as job knowledge experts[4] as that term is used in the department of personnel manual on job analysis ("Procedures"). There is no evidence that Juni or anyone else examined the officer-consultants on this question before asking them to define the tasks police perform.

Next, a panel of police officers was chosen to refine the identified tasks further. Again, except that the panel was chosen by officers on the police force, using the same criteria Juni described above, we were not advised whether the panel members possessed any special job knowledge expertise warranting their selection over others or even insuring their professional competence to do the task assigned. At any rate, the panel reduced the 71 identified tasks to 42. Thereafter a questionnaire was sent to 5,600 officers for their rating of the 42 tasks in terms of frequency, time spent and level of importance. Some 2,500 responses were received. At the same time, observers from John Jay College, watching police officers at work, recorded their observations of 28 of the identified 42 tasks. These observations were then correlated with the officers' responses to the questionnaires as to the frequency with which tasks were performed. The questionnaires were fed into a computer programmed to take data on time, frequency and importance, average them and produce a list of the 42 tasks in hierarchical order based on these three factors.

The tasks were next divided into 5 clusters of tasks "that seem to us to go together on a rational basis," "that looked somewhat similar," "had some kind of common factor" (Tr. 19). A panel of officers knowledgeable *790 about the tasks involved was chosen for each cluster. If, for example, one of the clusters was the arrest process, police officers very familiar with the arrest process, "meaning they were in a precinct that maybe made a large number of arrests, etc." were chosen (Tr. 20). These panels were to determine the KSA's needed for each task within the cluster. Again the job knowledge expertise of these people is not identified or defined, nor was that factor determined by Juni or anyone else from the department of personnel. Juni's report refers to this group as "job knowledge experts" without further elaboration. The KSA's to be assigned to each task by the panels were characterized as filling out forms; human relations, including communication techniques; understanding written instructions and applying appropriate procedures; comprehending and applying sections of the law; and recalling details and facts. The KSA's were never more specifically defined, although the human relations KSA's were described as interacting with people.

The 5 panels never exchanged ideas or concepts, and we do not know whether the description and identification of the KSA's were the same or different from panel to panel. The KSA's were then rated by the 5 panels as to the relative importance each had in the successful completion of each task within the assigned cluster. John Galea, one of the officers detailed to one of the panels assigning the relevant KSA's to each task, testified, however, that his group made no effort to differentiate the human relations KSA components assigned to different tasks. The panels' conclusions were fed into a computer which multiplied the weight of each task by the weight of the KSA's necessary to accomplish it. Based on the weighted result, the test plan was devised with an appropriate percentage of the examination being allocated to each of the 5 KSA's.
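The weighting step described above, in which each task's weight is multiplied by the weight of the KSA's needed for it and the products are converted into percentages of the examination, can be sketched as follows. This is an illustration only: the record does not disclose the actual weights or the program used, so the function name `allocate_test_plan`, the task labels and every number below are hypothetical.

```python
from collections import defaultdict

def allocate_test_plan(task_weights, ksa_ratings):
    """Return the share of the examination allocated to each KSA.

    task_weights: {task: overall weight of the task}
    ksa_ratings:  {task: {ksa: importance of that KSA to the task}}
    """
    totals = defaultdict(float)
    for task, weight in task_weights.items():
        for ksa, rating in ksa_ratings[task].items():
            # Multiply the task weight by the KSA weight, as the opinion describes.
            totals[ksa] += weight * rating
    grand_total = sum(totals.values())
    return {ksa: value / grand_total for ksa, value in totals.items()}

# Hypothetical figures for illustration; none of them come from the record.
task_weights = {"task A": 3.0, "task B": 1.0}
ksa_ratings = {
    "task A": {"filling out forms": 1, "human relations": 3},
    "task B": {"filling out forms": 2, "recalling details and facts": 2},
}

plan = allocate_test_plan(task_weights, ksa_ratings)
for ksa, share in plan.items():
    print(f"{ksa}: {share:.1%} of the examination")
```

A scheme of this kind is only as meaningful as the ratings fed into it, which is why the definition of the KSA's matters to the result.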

Then police officers were asked to propose questions covering the 5 KSA's. According to Juni the police department was asked to choose for this assignment "police officers who were familiar with the job of police officer" (Tr. 24). Some 10-11 officers were given this responsibility. They were not shown the KSA lists or the tasks with KSA weight accorded to them. Gerald Burkhalter, one of this group, testified that he was asked by an officer in the personnel bureau of the police department to participate. He was told that they were looking for people of various ethnic groups and that he had patrol experience and a college background. Presumably these factors and his being black qualified him for the assignment. Between December, 1978 and June, 1979, he spent each Friday writing questions. The first two months he wrote questions based on materials from the Police Academy. He was not shown the tasks until he was over two months into the assignment and never saw the KSA's. He stated that he had raised the issue of the adverse impact of written tests on minorities several times with Juni and in the equal opportunity office of the police department where he worked (Tr. 408). Although police officers proposed some of the questions, a good proportion of the questions were written by employees of the department of personnel.

A panel of police officers reviewed the questions to determine whether the police procedures described in the examination were consonant with current practices. Nicholas Estavillo, asked to work with this panel, joined it at its second meeting. The group was told what the questions dealt with, and discussed whether there was any conflict between the questions and police department procedures. Estavillo concluded that what was being tested was reading comprehension, and expressed this view to Juni and others on several occasions (Tr. 440). Department of personnel employees reviewed all of the proposed questions to make certain they were stated clearly, simply, and grammatically, but these employees had no clear idea of what KSA's the questions they clarified and simplified purported to test. There was no final review by police or department of personnel officials to determine whether the questions as finalized actually tested for the KSA's they purported to measure.

*791 The examination was divided into 100 multiple choice questions. Questions 1-15 dealt with recalling details and facts; questions 16-24 with filling out forms; questions 25-38 with comprehending and applying sections of the law; questions 39-62, 64-68, and 71-74 with understanding written instructions and applying appropriate procedures; and all the rest with human relations, including communications techniques. It was decided not to pre-select a passing score, but to qualify the 12,000 top scores.
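The allocation of questions among the 5 KSA's can be tallied from the ranges just listed. The sketch below assumes that "all the rest" means every question number from 1 to 100 that falls outside the four stated groups; the variable names are illustrative.

```python
# Inclusive question ranges for four of the KSA's, as stated in the opinion.
ranges = {
    "recalling details and facts": [(1, 15)],
    "filling out forms": [(16, 24)],
    "comprehending and applying sections of the law": [(25, 38)],
    "understanding written instructions and applying appropriate procedures": [
        (39, 62), (64, 68), (71, 74),
    ],
}
TOTAL_QUESTIONS = 100

# Count the questions in each stated range.
counts = {
    ksa: sum(high - low + 1 for low, high in spans)
    for ksa, spans in ranges.items()
}

# Assumption: human relations receives every question not listed above.
counts["human relations, including communication techniques"] = (
    TOTAL_QUESTIONS - sum(counts.values())
)

for ksa, n in counts.items():
    print(f"{n:3d} questions: {ksa}")
```

On that assumption the human relations questions make up a little under a third of the examination, broadly in line with the 30% weight the opinion attributes to those KSA's.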

Juni testified that she did not measure police work behavior but, rather, the KSA's necessary to perform a police officer's functions; that the test did not mirror the tasks of officers on the job, but the KSA's needed for the job (Tr. 48). She conceded that the work product that is the police officer's basic function was not completely described, and testified that while some of the tasks concern work product, work product must be inferred from others (Tr. 50). When asked for an operational definition of the KSA's, she replied that the KSA's are themselves simply stated. The KSA's refer to one or many tasks, and "the tasks themselves are operational definitions in the sense that the tasks themselves are things" (Tr. 51).

At one point we were told that none of the 42 tasks relate to training at the academy (Tr. 67); subsequently we were advised that the police recruits are trained in the tasks at the academy, and that the 42 tasks had to be used as a basis for isolating the KSA's (Tr. 68). What was being tested, Juni said, was whether the candidate could apply the information given him. Procedures were taught at the academy, but not how to apply them. She conceded, however, that if "you understand procedures, you know the way it is applied" (Tr. 70).

After stating what the 5 KSA's were said to be, she could describe them no further. Police officers were the ones who knew that. They also were the ones who could tell us whether the level of difficulty and complexity or abilities needed differ among tasks. She could not tell whether the examination questions actually measured the relevant KSA's. If the police approved a human relations question, that meant they believed the question measured the human relations KSA's, but she could not independently verify this.

The KSA's were identified by one group, questions were written by a second group, and evaluation was done by a third group. No panel was shown the basis for another panel's KSA determinations, because it was felt that the stated KSA's would be understandable to police officers without further elaboration. The KSA's were said to differentiate the good from the bad officer, but how that was done was not explained.

The next witness, Dr. Jay Finkelman, defendants' expert, was impressive, but his central thesis was unpersuasive. Dr. Finkelman argued that the test was content valid, and that one could validate a test for content if it tested for work behavior or for the KSA's necessary for work behavior. He defined the KSA's as present capacity to perform a certain function and skills requisite to work behavior (Tr. 138), but he could not say whether the various panels involved had been operating pursuant to his definition (Tr. 141). While he did not completely understand what a particular task entailed, or what the police officer meant in listing the KSA's, he asserted that he, as an expert, could testify as to the test's content validity (Tr. 141), as long as he understood the procedures through which the KSA's were derived, and the manner in which the KSA's determined the final test (Tr. 209).

He agreed that a content validation model was inappropriate for testing constructs (Tr. 149), but argued that the Uniform Guidelines were not violated by the test's overlapping the training the successful candidate would receive prior to employment. Moreover, he could not see how what was being tested could be separated from training, since the tasks comprising the test were common tasks performed by the police on the job. The skills measured — for example the human relations KSA's — were not the human relations skills themselves, but a procedure testing ability to follow guidelines *792 and instructions (Tr. 197); thus, he concluded, the examination was really a paper and pencil test of procedure-following ability.

Prudence Opperman testified that she had analyzed the test to determine its level of reading difficulty. Based on two formulas (Dale-Chall and Fry), she concluded that the level of reading difficulty was no higher than the 12th grade (Tr. 270). On cross examination, she conceded that some of the questions did require college level reading skills (Tr. 286-289), but stated that people reading at the 12th grade level can handle some material at the college level. She characterized this as "loading" (Tr. 283).

Plaintiffs' two experts, Dr. Richard Barrett and Dr. James J. Kirkpatrick, both of whom helped draft the Uniform Guidelines, testified that the examination was akin to an aptitude test and that it had not been shown to be content valid. They testified that it did not measure job skills, but the skills of following given instructions and applying them to a hypothetical situation. The test measured constructs, not work behavior; further, it was the type of test that typically disfavors minorities. Measured against the standards of the American Psychological Association ("APA") and the Uniform Guidelines, the test was said to be deficient.

They both contended that it had not been established that the examination measured abilities which differentiate a good from a poor police officer. For that reason, they felt rank order selection was inappropriate. While conceding the difficulty of constructing a test for police officers that could be validated as having job relatedness, they both testified that if tests such as this were used, the adverse impact on minorities should be eased by maintaining separate lists for blacks, hispanics and whites, and selecting from these separate lists in a way that would insure equal opportunity to the minorities. They were the more credible witnesses.

The Examination Results

The test as given presented a series of issues or problems that police purportedly met on the job and for which the applicants were asked to select the correct responses. Of the 36,797 persons taking the test, 16.7% (6,142) were black, and 14.2% (5,239) were hispanic. 37% (13,749) of all applicants qualified with scores of 94 or above. Of this group only 7.6% (1,046) were black and 7.8% (1,074) hispanic. The list is roughly 84% white. Thus, although blacks and hispanics comprised 30.9% of those taking the test, they constituted only 15.4% of those on the eligibility list. To put it another way, 46% of the white applicants (9,094 of 19,798) made the eligibility list with scores of 94 or above, while only 17% of the black applicants (1,046 of 6,142) and 20.5% of the hispanic applicants (1,074 of 5,239) were similarly successful. Plaintiffs' Ex. 3, applying standard deviation analysis, reveals that the disparity between the number of blacks and hispanics who could be expected to qualify on Examination No. 8155 and the number who actually qualified is some 39 standard deviations.
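The standard deviation figure can be approximated from the counts reported above. The sketch below assumes a binomial model, in which each of the 13,749 qualifiers is treated as an independent draw from an applicant pool that was 30.9% black and hispanic. The exhibit's actual method is not described in the opinion, so this is an illustrative reconstruction, and the function name is hypothetical.

```python
import math

def binomial_z(pool_total, pool_minority, selected_total, selected_minority):
    """Number of standard deviations between the expected and observed
    minority count among those selected, under a binomial model
    (an assumption; the opinion does not state the exhibit's method)."""
    p = pool_minority / pool_total                 # minority share of applicants
    expected = selected_total * p                  # expected minority qualifiers
    sd = math.sqrt(selected_total * p * (1 - p))   # binomial standard deviation
    return (expected - selected_minority) / sd

# Counts taken from the examination results summarized in the opinion.
applicants = 36_797
minority_applicants = 6_142 + 5_239    # black + hispanic test takers
qualified = 13_749
minority_qualified = 1_046 + 1_074     # black + hispanic qualifiers

z = binomial_z(applicants, minority_applicants, qualified, minority_qualified)
print(f"Disparity: about {z:.1f} standard deviations")
```

Under these assumptions the result falls between 39 and 40 standard deviations, consistent with the figure the opinion reports.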

Hearing on Relief

On Friday, January 11, a further hearing was held on relief. Plaintiffs presented, through Dr. Mark Killingsworth, a labor economist, evidence based on the Survey of Income and Education, 1976, United States Census Department ("SIE"), to the effect that the relevant labor pool of those who by virtue of age, residence and education would be expected to be applicants for the job of police officer was 34.9% black and hispanic. The geographic area used by Dr. Killingsworth was New York City and Rockland, Westchester, Nassau and Suffolk Counties. The group included persons aged 17-29 with 12 or more years of education. He utilized the same geographic base as the area from which those who took Examination No. 8155 came, except that he excluded Orange and Putnam Counties. However, only 0.31% of the applicants who took the test at issue here came from Orange and only 0.10% came from Putnam. Defendants object to the pool as defined. They presented no evidence to the contrary, but requested permission to file whatever documents they desired to present after the *793 hearing. Defendants have filed an affidavit of Martin Oling.[5]

Determination

I

Section 4D of the Uniform Guidelines states that any "selection rate for any race, sex or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate of the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact. . . ." See, e. g., United States v. City of San Diego, 20 E.P.D. ¶ 20,154 (S.D.Cal. 1979); United States v. City of Montgomery, 19 E.P.D. ¶ 9239 (M.D.Ala. 1979). The Uniform Guidelines are, of course, entitled to deference and should be followed, unless the employer demonstrates some cogent reason to the contrary. Cf. Griggs v. Duke Power Co., 401 U.S. 424, 433-34, 91 S.Ct. 849, 854-55, 28 L.Ed.2d 158 (1971); Albemarle Paper Co. v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280 (1975); United States v. City of Chicago, 549 F.2d 415, 430 (7th Cir.), on remand, 437 F.Supp. 256 (N.D.Ill.), aff'd, 567 F.2d 730 (7th Cir. 1977), cert. denied, 436 U.S. 932, 98 S.Ct. 2832, 56 L.Ed.2d 777 (1978), 434 U.S. 875, 98 S.Ct. 225, 54 L.Ed.2d 155 (1977). Moreover, the Supreme Court in Hazelwood School District v. United States, 433 U.S. 299, 308 n.17, 97 S.Ct. 2736, 2743 n.17, 53 L.Ed.2d 768 (1977), and Castaneda v. Partida, 430 U.S. 482, 496 n.17, 97 S.Ct. 1272, 1281 n.17, 51 L.Ed.2d 498 (1977), has recognized that "if the difference between the expected value and the observed number is greater than two or three standard deviations then the hypothesis that the [selection] was random would be suspect to a social scientist." Ibid. Here, the uncontroverted evidence is that the disparity between the number of blacks and hispanics expected to qualify on Examination No. 8155 and the number who qualified in fact is some 39 standard deviations. As the Court stated in Castaneda, such a difference is statistically significant and could not have occurred by chance.

This result compounds and perpetuates a pattern of disparity in minority group representation on the New York City police force. See Def. Ex. A-1 (Report of Test Development), Tables 5 and 6 at 15, which reveal that as of June 30, 1978, of some 20,926 members of the police force, 1,860 (8.9%) were black, and 799 (3.8%) were hispanic. Census figures for 1978 show that the relevant labor force of the New York Standard Metropolitan Statistical Area (SMSA) is more than 30% black and hispanic. See Geographic Profile of Employment and Unemployment: States 1978, Metropolitan Areas 1977-78, September 1979, Bureau of Labor Statistics.
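The four-fifths comparison in Section 4D of the Uniform Guidelines can be applied to the qualifying rates reported in the examination results. The sketch below is a minimal illustration that assumes white applicants are the group with the highest rate, since they are the only other group for which the opinion gives counts. The function name is hypothetical.

```python
def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    top = max(rates.values())
    return {group: rate / top for group, rate in rates.items()}

# Qualifiers divided by applicants, using the counts stated in the opinion.
rates = {
    "white": 9_094 / 19_798,
    "black": 1_046 / 6_142,
    "hispanic": 1_074 / 5_239,
}

ratios = impact_ratios(rates)

# Groups whose rate is below four-fifths (80%) of the highest rate.
flagged = {group: ratio for group, ratio in ratios.items() if ratio < 0.8}

for group, ratio in ratios.items():
    print(f"{group}: rate {rates[group]:.1%}, ratio to highest rate {ratio:.2f}")
```

On these figures both the black and hispanic qualifying rates fall well below four-fifths of the white rate.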

*794 Where it has been shown, as here, that the selection process qualifies applicants for hire or promotion in a racial pattern significantly different from that of the pool of applicants, see McDonnell Douglas Corp. v. Green, 411 U.S. 792, 802, 93 S.Ct. 1817, 1824, 36 L.Ed.2d 668 (1973), a prima facie case of discrimination has been established. Albemarle Paper Co. v. Moody, supra, 422 U.S. at 425, 95 S.Ct. at 2375. A prima facie case may be established by evidence of statistical disparities alone. Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S.Ct. 2720, 2726, 53 L.Ed.2d 786 (1977). The burden then shifts to the employer to demonstrate that the process used is job related or, to put it more precisely, has a "manifest relationship to the employment in question." Griggs v. Duke Power Co., supra, 401 U.S. at 432, 91 S.Ct. at 854; Guardians Association v. Civil Service Commission of New York, 431 F.Supp. 526, 538 (S.D.N.Y.) (Carter, J.), vacated and remanded on other grounds, 562 F.2d 38 (2d Cir. 1977) ("Guardians II"), on remand, 466 F.Supp. 1273 (S.D.N.Y.1979) (Carter, J.), app. pending, No. 79-7377 (2d Cir.) ("Guardians III"); Vulcan Society of the New York Fire Department v. Civil Service Commission, 360 F.Supp. 1265, 1268 (S.D.N.Y.) (Weinfeld, J.), aff'd in part, remanded in part on other grounds, 490 F.2d 387 (2d Cir. 1973). If the employer meets this burden, the plaintiff is then entitled to show that alternative processes would meet the employer's needs without the adverse racial impact that the complained of process inflicts on the minority group. Albemarle Paper Co. v. Moody, supra, 422 U.S. at 425, 95 S.Ct. at 2375.

II

We now examine the record to determine whether the test processes employed meet the required standard of a "manifest relationship" to the job of police officer in the City of New York. Since 1972, when Title VII first became applicable to municipalities, the police department of New York City has been under attack in courts for selection procedures alleged to be discriminatory against blacks and hispanics, see Guardians Association of New York City Police Department, Inc. v. Civil Service Commission, 72 Civ. 928 (S.D.N.Y.) (Ryan, J.), aff'd, 490 F.2d 400 (2d Cir. 1973) ("Guardians I"); Guardians II, supra; Guardians III, supra; and against women, see Acha v. Beame, 401 F.Supp. 816 (S.D.N.Y.1975), rev'd, 531 F.2d 648 (2d Cir. 1976), on remand, 438 F.Supp. 70 (S.D.N.Y.1977) (Duffy, J.), aff'd, 570 F.2d 57 (2d Cir. 1978). Examination No. 8155 is the first examination for entry level police officer given since 1973, and the above cases surely put city authorities on notice that this examination would be tested under Title VII strictures.

One of the accepted indicia of a lack of job relatedness is proof of a flawed job analysis. In Vulcan Society v. Civil Service Commission, supra, 490 F.2d at 396, this circuit indicated that the poorer the preparation of the test, the greater the need to demonstrate its "manifest relationship" to the job.

In Guardians II, the job analysis for examinations given prior to 1972 had come under heavy attack. Preparation for Examination No. 8155 began in the fall of 1978. At about that time, the department of personnel published Procedures, a guide setting out the requisites of job analysis for an examination which would conform to APA's professional standards and satisfy the Uniform Guidelines. Esther Juni was given the responsibility for designing the 1979 test. She took some of the steps required by Procedures, APA standards and the Uniform Guidelines but deviated therefrom in some respects. Unfortunately, her departures fatally flawed the test's preparation and in essence reduced that effort from the required professionalism to the level of a homemade test.

The most fatal flaw of all is that Juni as the test designer lost control of various critical aspects of the test's preparation. She went to police officers for help in defining the work behaviors and abilities in a policeman's job performance. While Procedures approves of this practice, it is made clear that the test maker's responsibility is to verify for herself that the KSA's or *795 behaviors to be tested are in fact those actually performed on the job. The police were asked to furnish officers to describe the tasks performed, to define the abilities needed and to rate them in terms of frequency, importance, and time spent, and to frame appropriate questions. She requested that these officers constitute a fair cross section of the department and that past good performance records be a basis for selection. No one from the department of personnel, however, seems to have reviewed the selection to determine whether the officers were indeed "job knowledge experts" for the assigned tasks. The casual manner in which at least one of the participants was chosen (see Burkhalter's testimony) leaves some question as to what standards the police were using. At any rate, the test maker cannot demonstrate that the consultants utilized were suited for the assignment.

While this deficiency might be considered de minimis had the test maker been in control of actually defining what police officers' work behaviors encompass, the latter task was left entirely to the selected officers. At trial defendants could not define the governing KSA's except to tell us what the police said they were. Counsel for defendants, Juni and Finkelman contended that it was sufficient for the police to know what the KSA's were. Neither they, nor we, had to be enlightened on that issue. That proposition must surely be incorrect. It is not possible for the test maker to design an examination to measure work behaviors, or in this case KSA's, that she cannot explain, analyze or define. She must know that the KSA's identified by those with knowledge of the job actually relate to a substantial portion of the job in issue. Procedures would seem to require that the test designer, not the work knowledgeable consultants, be in charge of analyzing the KSA's and work behaviors in structuring the test. Here much of the test's preparation was done by various teams of police officers without the test maker understanding precisely what had been done.

The Uniform Guidelines, § 14C(2), require that a "job analysis focus on work behavior(s) and the tasks associated with them." The 71 tasks initially identified and subsequently reduced to 42 were not delineated with precision; e. g., interacts with the community and provides information and assistance to the public; performs stop and frisk of appropriate individuals; interacts with juveniles in non-arrest situations. These are vague definitions and do not provide the test maker with a mirror image of the police officer's work behavior. Nor was the process of identifying the KSA's sufficiently controlled to insure that they were "operationally defined" in terms of observable aspects of work behavior on the job. See Uniform Guidelines, § 14C(4) and Procedures, passim.

In addition, the five KSA's are not denoted with specificity; and the KSA's most imprecisely defined of all — human relations, including communication techniques — were scored as constituting 30% of a police officer's needed abilities and skills. These KSA's must entail oral communication at least in some degree. However, no oral examination was contemplated to test this aspect of the human relations KSA's. In such a case, Procedures suggests the KSA's must be eliminated from consideration. This was not done, and the examination must be rated unreliable by department of personnel's own yardstick.

Finally, it is not clear at what level of employment — entry or later — a police officer performs the tasks and uses the KSA's reflected in the examination. The job analysis done here did not even conform to professional standards of the department of personnel's own guide, Procedures. Thus the "manifest relationship" of Examination No. 8155 to the work behavior of police officers on the New York police force is, at best, highly suspect.

III

There was a consensus among the parties' experts that in operation, the examination was a paper and pencil test that required applicants to read and comprehend a set of *796 facts and multiple choice questions about those facts and to select a correct answer from the possibilities presented. Defendants' witnesses testified that the test was designed to measure the KSA's necessary to function adequately as a police officer, and not to determine whether applicants possessed those abilities requisite to successful training for the job. Thus, it was argued that the test possessed content validity. Yet all successful applicants undergo a period of specialized training before actually assuming any duties. There was some unwillingness by defendants to concede that the tasks defined encompassed those tasks for which the successful candidate would be given training before being assigned to duty as a member of the police force, but all 42 tasks seem to speak to some aspect of the ongoing functions of policemen. Hence, the applicant must secure training in substantially all of the designated tasks.

Validation of the test as content valid is accordingly inappropriate. The Uniform Guidelines, § 14C(1) state that content validation is not appropriate when the process of selection "involves . . . abilities which an employee will be expected to learn on the job." Moreover, a content validation model may be utilized only when the test corresponds closely to those tasks that are actually performed on the job. United States v. City of Chicago, 573 F.2d 416, 425 (7th Cir. 1978). Neither Finkelman nor Juni contended that the test measured on the job behavior of police officers. It was agreed that, at best, the test gauged applicants' ability to comprehend and follow instructions.

Although much was made of the KSA's as identifying abilities, it is clear that the test measured constructs, not abilities or skills. While abilities are "present competence to perform an observable behavior," Uniform Guidelines, § 16A, Tr. 138, 312, constructs are intangibles, such as "intelligence, aptitude, personality, judgment." Uniform Guidelines, § 14C(1), Tr. 309. The examination attempted to test for verbal aptitude or reasoning. Both defendants' and plaintiffs' experts agreed that content validation procedures are inappropriate for determining the validity of a test for constructs. Defendants concede that the test did not measure present competence since they do not argue that the successful applicant could perform the job of police officer without first undergoing a period of training. Accordingly, the test merely identified some underlying aptitudes or propensities, and these are surely constructs. Plaintiffs' experts, Dr. Richard Barrett (Tr. 320) and Dr. James J. Kirkpatrick (Tr. 461), concluded that the test was a generalized aptitude test which measured competence to read instructions and apply them to a simulated situation.

The Uniform Guidelines, § 14C(4), require:

to be content valid, a [test] measuring a skill or ability should either closely approximate an observable work behavior, or its product should closely approximate an observable work product . . .. As the content of the [test] less resembles a work behavior, or the setting and manner of the administration of the [test] less resemble the work situation, or the result less resembles a work product, the less likely the [test] is to be content valid, and the greater the need for other evidence of validity.

Almost a third of the test purported to reveal the candidates' human relations, including communications techniques KSA's. Yet, it is obvious that the KSA's to perform well in this area could not be tested in a written multiple choice examination. The applicant was asked to choose the best response to a simulated situation. No one could argue seriously, however, that the chosen response indicates how an applicant would actually react in real life to a situation where he must interact with real people. Judge Foley of the Northern District of New York met that contention head on in United States v. State of New York, 77 CV 343 (Sept. 6, 1979) (Findings of Fact and Conclusion of Law); see also 475 F.Supp. 1103 (N.D.N.Y. 1979), and I quote his finding # 175:

*797 The 1975 written examination for the position of trooper was a situations test which, in essence, sought "will do" responses to situations that normally do not occur behind a desk. The fact that someone selects a particular course of action as appropriate on such an examination does not mean that the same course of action would be followed by that individual under different circumstances in a real life situation. Certainly, in such situations, an individual would not consider and determine what would be the worst course of action to be followed. The situations on the written examination are simulated situations that bear little resemblance, marginal at best, to actual situations in which the behavior might occur. Sitting at a desk selecting the best alternative and the worst alternative in response to a written situation, I find, is quite different from being in a real life situation deciding what to do and doing it effectively.

See also Guardians II, supra, 431 F.Supp. at 433; United States v. San Diego County, supra, 20 E.P.D. ¶ 30,154 at 11,812.

Nor does the test fare any better in relation to the other KSA's. The questions testing the KSA's for filling out forms were based on a situation involving the robbery of a delicatessen and arrest of the suspect. A simulated arrest form was reproduced, and the questions asked the candidate to choose relevant information to be placed in blanks on the form. Four possible choices were presented for each question. A police officer in real life, however, must select the correct form and fill in the required information without help from suggested alternatives, one of which is appropriate.

In testing the recalling details KSA's, the candidate was given ten minutes to study a booklet containing a story about a burglary and a detailed description of the suspect, vehicles, scene, etc. The booklet was collected after ten minutes, and test questions were distributed which related to a theft of $500,000 worth of gold from a jewelry company and required the recall of specific details. Being asked about details immediately after studying a text for ten minutes does not mirror the recall a police officer may have to exercise in the performance of his duties. He may have to remember details of what he saw when under the pressure of physical danger to himself or others, the risk of the escape of the suspect, while making up his mind whether to seek assistance or to use firearms, and a myriad of other crises. A police officer's memory under such critical circumstances indicates whether he has optimum recalling details KSA's in police work, and it cannot be measured in a paper and pencil test.

The questions concerning comprehending and applying sections of the law were based on 19 definitions of crimes. Each question presented a fact pattern, and the candidate was to choose from four options the crime corresponding to the fact pattern described. Again, this does not reflect the KSA's involved in a police officer's comprehending and applying sections of the law. In a real situation, the officer sees activity and must determine rather quickly whether the activity is illegal, with no definitional aids before him. He must operate on instinct and experience.

All that any of the questions tested were the KSA's involved in reading, comprehending and following instructions, not the KSA's needed to perform police work.

Defendants, in an apparent effort to meet criticism that the test was a reading test, sought to design the examination at a 12th grade readability level. At this level it was evidently felt that the adverse impact of the test on blacks and hispanics might be diminished or removed. Of course, where such alternatives are available and consistent with the employer's legitimate interests, they must be utilized. See Albemarle Paper Co. v. Moody, supra, 422 U.S. at 425, 95 S.Ct. at 2375.

Defendants did not altogether succeed, however, in writing an examination at the 12th grade reading level. They conceded that some of the questions, explanations or instructions required college equivalent reading skills. In any event, basing eligibility on rank order neutralized whatever *798 gains had been achieved in lessening the adverse impact of the test on blacks and hispanics by simplifying the wording of the questions. Moreover, a test easier to read may make it possible for a larger number of blacks and hispanics to succeed, but it qualifies a larger number of whites as well, and the overall ratio of success among the three groups may remain virtually unchanged. Indeed, if one compares the distribution of scores by race on this test with that of the tests at issue in Guardians II, supra, 431 F.Supp. at 551-53 (appendix), the results are basically indistinguishable. When rank order is the basis of selection on an aptitude test such as this, whites in the main will be clustered in the upper ranks and blacks and hispanics on lower levels. In qualifying the 12,000 highest scores, more blacks and hispanics may have been eliminated than would have been had a pass-fail grade been set. Unless it is demonstrated that those with the highest scores will perform best as police officers, rank order as the basis of eligibility for appointment cannot be justified on the ground that this form of selection is an insurance of competence. The police department apparently presumes that the higher scorers possess more police-related KSA's, but there is nothing in the record to support that presumption.

Since the adverse impact of Examination No. 8155 has been established and defendants have failed to prove that the test has a "manifest relationship" to a police officer's duties, Title VII has been violated. Accordingly, the examination is held to be invalid under Title VII, and defendants are permanently enjoined from using the eligibility list in rank order to make appointments to the police force.

Relief

This case is not one of first impression but involves evidence of a pattern, long continued, of discrimination against blacks and hispanics in the hiring and appointment of police officers on the New York City police force. As pointed out earlier the issue of the selection process for police officer has been in this court at least since 1972. Examinations given in 1968 and 1970 were held to have an adverse impact on minority applicants. See Guardians II, supra, 431 F.Supp. at 552, 553, Tables 4 and 6. A discriminatory pattern, therefore, was found to exist in Guardians II and a similar pattern has been evidenced here. Moreover, in light of Guardians II, it would have been expected that defendants in structuring any new examinations would be especially careful to follow requisite guidelines and procedures to make certain that new tests complied with Title VII. Instead, defendants have persisted in devising and utilizing testing procedures that continue to discriminate against blacks and hispanics. Examination No. 8155 fails to meet even the standards established by the department of personnel itself. This studied adherence to discriminatory procedures must at this point be deemed conscious and deliberate, and insistence on maintaining procedures for the selection of police officers known to have an adverse impact on blacks and hispanics cannot be dismissed as inadvertent but must be the result of deliberate policy. Since there is no proof that police work is performed more efficiently and effectively by persons who score high on this examination and its adverse racial impact has been proven, there is no rational support for the rank order selection process.

Accordingly, I find defendants have designed an examination which they knew or should have known tests for reading skills and does not demonstrably reveal those skills that differentiate between a good police recruit and a poor one. The test advantages whites and disadvantages blacks and hispanics without proof that it appropriately identifies those persons best suited to be police officers. Griggs v. Duke Power Co., supra, and cognate cases defining the scope and reach of Title VII have uniformly struck down such practices as unlawful. Defendants' assertions that the examination is a test of a policeman's skills are misleading. Moreover, testimony revealed that during the preparation phases of the test several police officers warned the test makers that the test would disfavor blacks and hispanics. I find and hold that Examination No. 8155 was designed either with a *799 deliberate intention to discriminate against blacks and hispanics or with reckless disregard of whether the test would have that result.

While the 1973 examination for appointment to the police force has not been, as far as I know, the subject of attack, the most recent examinations prior thereto were condemned as discriminatory in Guardians II. In that case Dr. Bernard Cohen, testifying on the basis of a study by the Rand Institute, authorized by the parties and paid for in part by the city, to determine the significance of the distribution by race of scores on entry level examinations given in 1968 and 1970 for appointment to the New York City police force, stated that the chances of such substantial and significant differences between the distributions of the scores of whites and of blacks and hispanics occurring on a racially neutral test were "statistically less than one in a billion." See Guardians II, supra, 431 F.Supp. at 540. Now defendants have repeated the process in the current examination. The imbalance between hispanics and blacks on the one hand and whites on the other on the New York City police force is directly caused by past and current discriminatory practices. See Association Against Discrimination in Employment v. City of Bridgeport, 594 F.2d 306 (2d Cir. 1979).

Affirmative action is mandated as an interim measure either until such discrimination has been totally eliminated or until defendants proceed to select police officers under procedures that are in full compliance with Title VII. Requiring defendants to take positive steps to eliminate the imbalance will not have adverse consequences for "`a small number of readily identifiable' non-minority members." Id. at 310, citing Kirkland v. New York State Department of Correctional Services, 520 F.2d 420, 427, 429 (2d Cir. 1975), rehearing en banc denied, 531 F.2d 5 (2d Cir. 1975), cert. denied, 429 U.S. 823, 97 S.Ct. 73, 50 L.Ed.2d 84 (1976). At the January 11, 1980 hearing on relief, defendants' testimony showed that only approximately 20% of those on the eligibility list in each group called thus far had been able to meet the additional qualifications required for appointment. At this qualification rate, it is likely that all white applicants who scored 94 or higher on the test and who meet the additional requirements will be appointed, even with affirmative action being mandated.

Defendants are enjoined from using Examination No. 8155, except that defendants may use the eligibility list derived from the above examination both to achieve minority composition of the police force comparable to the labor force of the relevant hiring area, which is at least 30% black and hispanic, and to appoint during the interim 50% of their entry level officers from among qualified blacks and hispanics.

Plaintiffs are entitled to their court costs and reasonable attorneys' fees to date. Jurisdiction is retained for such further relief as may be appropriate to ensure compliance with the requirements of Title VII.

IT IS SO ORDERED.

NOTES

[1] This opinion replaces the opinion filed on January 11, 1980, and that opinion is withdrawn.

[2] While Juni's testimony implied that the KSA's which the examination was structured to measure were knowledge, skills and abilities, Dr. Finkelman testified that the KSA's did not measure knowledge but only abilities and skills needed to perform the police officer's function.

[3] This is a term used in the manual entitled "Department of Personnel Procedures for Job Analysis and Preparation of Notice of Examination" ("Procedures"). On page one, the manual describes itself as a "guide . . . to provide examiners of the Department of Personnel with a tool for developing job-related and content-valid examinations for appointments and promotions. . . . " Procedures goes on to state that the instructions embodied in the manual are designed to be consistent with the Uniform Guidelines, American Psychological Association's (APA) Standards and recent court decisions. Ibid. Juni testified that Procedures was in effect at the time the instant test was being structured.

[4] The report indicates the length of service of the various police officers. Long experience, however, does not necessarily translate into job knowledge expertise.

[5] Defendants were given permission to make further submissions after the January 11, 1980 hearing on relief. Defendants have now submitted an affidavit, dated January 15, 1980, from Martin Oling, a statistician employed by the New York City department of planning, challenging Dr. Killingsworth's analysis. Accompanying the affidavit are various exhibits and charts from the 1970 United States census, extrapolations apparently prepared by the affiant from that data, extracts from Technical Documentation, 1976 Survey of Income and Education ("SIE"), Computer Microdata File, with appendices dealing with standard error computations and various other documents related to 1976 SIE.

I cannot consider this affidavit and the exhibits attached to it. The parties had been informed on December 17, 1979, that the court desired to hear live testimony on the issue of relief. Plaintiffs advised the court that defendants knew by December 27, 1979, that they planned to call Killingsworth and were given a computer print-out from which the witness was expected to testify. At the January 11 hearing, defendants' counsel cross examined Killingsworth at some length. Plaintiffs allege that Oling was in the courtroom at the time. The court cannot attest to that, but there were a number of people present who appeared to be assisting defendants' counsel, and she interrupted her cross examination of Killingsworth a number of times to consult with several such persons.

Acceptance of Oling's affidavit with its voluminous exhibits is not appropriate without first allowing plaintiffs the opportunity to cross examine him, as defendants were permitted to cross examine plaintiffs' expert. Accordingly, defendants' affidavit is not properly before the court and cannot be considered at this time. This does not bar the defendants, however, at some future time, from requesting a hearing to present evidence establishing that the court's order should be modified on the ground that blacks and hispanics do not constitute 30% of the relevant labor pool from which the police department is expected to draw its applicants.