Vanguard Justice Society, Inc. v. Hughes

592 F.Supp. 245 (1984)

VANGUARD JUSTICE SOCIETY, INCORPORATED, et al.
v.
Harry HUGHES, Governor of the State of Maryland, et al.

Civ. No. 73-1105-K.

United States District Court, D. Maryland.

June 14, 1984.

*246 *247 Norris C. Ramsey, and Anthony W. Robinson and Donald Jones, Baltimore, Md., for plaintiffs.

Benjamin L. Brown, City Sol., and J. Shawn Alcarese and Millard S. Rubenstein, Asst. City Sols., Baltimore, Md., for defendants.

FRANK A. KAUFMAN, Chief Judge:

In an opinion filed March 29, 1979[1], this Court held, inter alia, that the sergeant's promotional examinations used by the Police Department ("Department") of Baltimore City ("City") in 1972, 1973, 1974, 1976 and 1977 violated Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., ("Title VII") because those exams had a racially adverse impact upon blacks and because defendants had not shown that the exams were job-related.[2] In that opinion, certain questions relating to relief were held sub curia pending further presentation of evidence and legal argument. Those outstanding issues are now ripe for decision.

Plaintiffs[3] herein challenge the validity of the 1982 written sergeant's promotional exam. That written exam was designed by Baltimore Civil Service Commission ("Commission") personnel and was administered to 605 candidates for sergeant on May 8, 1982. During a nonjury trial held on January 5-6 and January 17-20, 1984, a number of expert and lay witnesses testified. Subsequently, pre- and post-trial memoranda, along with other documents, were filed. After careful review of the entire record in this litigation, this Court holds that the 1982 written exam is invalid and that appropriate relief as set forth infra is required. Findings of fact and conclusions of law, in accordance with Federal Civil Rule 52(a), are set forth below.

I. FACTS

The written exam in question, designated Exam No. 820508, was one part of a three-step *248 promotional procedure employed in 1982 to promote police officers to the rank of sergeant. The other two parts consisted of a promotional appraisal and an oral examination. All three components of the promotional procedure were designed and administered by the Commission, under the supervision of Robert G. Wendland, Deputy Personnel Director. The 605 candidates for sergeant, in addition to sitting for the 115-question written exam, were evaluated by their supervisors on the basis of a supervisory appraisal, called the promotional appraisal. The promotional appraisal was designed to test seven skills, deemed by the Commission to be "significant elements of a sergeant's job."[4] A candidate's scores on the written exam and the promotional appraisal were scaled, multiplied by the weight assigned to the written exam (40%) and to the promotional appraisal (30%) and added together to produce an overall score for each candidate on those two components. Then, each candidate's weighted, composite score was ranked. Only the top 95 candidates, of the original 605, were given the oral exam. The oral exam (weighted 30%) consisted of problem analysis exercises and was designed to test for six skills which the Commission deemed "essential"[5] to the sergeant's job. After all three component scores were computed, the Commission published an eligibility list, ranking the top 95 candidates.[6]

The eligibility list is designed to be used until the list is exhausted, a new selection procedure is developed, or the list expires. Individuals are promoted, in accordance with their ranking, from the eligibility list as vacancies occur. At the time of trial on January 20, 1984, 15 persons — 12 whites and 3 blacks — had been promoted from the 1982 eligibility list.[7] The eligibility list is scheduled to expire in accordance with applicable law in October, 1984.

Plaintiffs in this litigation are presently challenging only the written exam. Plaintiffs concede that any racially adverse impact of the overall promotional procedure is attributable solely to the written exam.[8] The 115-question, multiple-choice written exam was developed by a two-phase process: first, a job analysis was devised and second, the test itself was constructed. A thorough job analysis was prepared by Management Scientists, Inc. ("MSI") experts hired by the City in connection with the 1981 sergeant's promotional procedure.[9] No independent job analysis was performed for the 1982 promotional procedure. Rather, in the Commission's 1982 Validation Report, "the reader is referred to the Ford report[10] and volume one of the MSI report for a comprehensive discussion of the task analysis and the linkage of *249 measured knowledges, skills and abilities to the individual tasks required on the job."[11] In other words, while an expert psychometric firm, MSI, was consulted and did prepare a job analysis for the 1981 promotional procedure, no new job analysis was prepared in 1982. Instead, the City, acting without expert assistance, relied on the 1981 job analysis, the 1981 Ford report, and earlier MSI reports, in devising its 1982 promotional procedure and, in particular, the written exam. Mr. Wendland testified at trial that the 1982 Validation Report "piggybacked" upon the 1981 job analysis, although some modifications were made to the job analysis in 1982. Thus, in considering the validity of the job analysis used in the 1982 sergeant's promotional procedure, review of both the 1981 MSI job analysis and the modifications made thereto in 1982 is required.

The job analysis prepared by MSI in 1981 consisted of five separate steps.[12] First, based on interviews, full shift observation and questionnaires completed by 73 randomly selected sergeants, MSI developed a list of "tasks" related to various "job components" which sergeants generally perform.[13] As part of that step, the incumbent sergeants assigned point values to each task and to each job component based on frequency of occurrence, importance, complexity and criticality, and also based on the amount of time normally spent performing that task or job component. Second, SKAP lists[14] (lists of Skills, Knowledges, Abilities and Personal Characteristics) were generated for each job component found to be a part of the sergeant's job.

In the third step, the MSI Project Director, David Wagner, prepared a SKAP rating questionnaire to measure the importance, or to determine the relative weight, of each SKAP with respect to the successful performance of a sergeant's job. The SKAP rating questionnaire was distributed to fifty randomly selected incumbent sergeants who assigned points to each SKAP to reflect the importance of that SKAP to successful performance of each job component. The sergeants also indicated whether they were of the opinion that a sergeant, on the first day he was on the job, needed to perform a particular SKAP.

In the fourth step, MSI determined the importance of each SKAP in relation to the job of police sergeant. To so do, MSI multiplied the average SKAP weight by each job component value. That produced a total value for a SKAP within a given job component. The values for SKAPs which appeared in more than one job component were then added together to arrive at a total SKAP value.[15] In its simplest form, that procedure assigned a numerical value to each SKAP which each sergeant needed in order to perform the job of police sergeant. *250 The relevant SKAPs and their values are set forth in Table 1, "Description of SKAP Clusters with Associated Job Analysis Points."[16]

As a final step in the 1981 job analysis, MSI determined which of three measurement components—the written test, the performance appraisal[17] or the assessment center[18] — would be most effective for testing each SKAP cluster.[19] MSI determined that the Knowledges SKAP, assigned 4182 job analysis points, was to be measured on the written exam, while 27 other SKAPs were to be measured by the other two test components.[20] MSI calculated the appropriate *251 weights for the written exam (25%), the performance appraisal (30%) and the assessment center (45%) by determining the percentage of job analysis points allocated to each test component.

As noted earlier, the 1981 MSI job analysis was modified substantially by Commission personnel in 1982. Only 12 of the 28 SKAPs identified in the 1981 job analysis were included in the 1982 job analysis and were the subject of testing in the 1982 promotional procedure. Among those SKAPs deleted in 1982 was the second most-heavily weighted SKAP, that is, supervisory ability.[21] Of the 12 SKAPs included in the 1982 job analysis, two, knowledges and reading comprehension, were tested for in the 1982 written exam. The 10 remaining SKAPs were measured by the promotional appraisal and/or the oral exam. Additionally, the weightings of the three test components were substantially altered in 1982. In explaining the reweighting, the Commission stated:

Some of the factors [SKAPs] rated or assessed in 1981 were deleted from the 1982 examination. Based on the new set of factors, the relative weights for the written test, supervisory rating and oral examination are as follows:
Written Test: 40%
*252 Supervisory Promotional Appraisal: 30%
Oral Examination: 30%[22]

The reweighting resulted in the written exam, which is the only test component alleged to have a racially adverse impact,[23] being weighted most heavily in the 1982 promotional procedure.

The second major phase in developing the 1982 promotional procedure was the construction of the three testing components. Because the only component of the 1982 promotional procedure challenged in this litigation is the written exam, discussion in this opinion is limited to the construction of that exam. It bears emphasizing that the procedures used in constructing the 1982 written exam differed significantly from those employed by MSI in constructing the 1981 written exam. In 1981, for example, MSI first conducted a series of workshops for Commission personnel to train them in writing multiple choice test items. MSI then reviewed the job analysis to decide what major knowledge areas were to be tested. The number of test items for each knowledge area reflected the relative number of job analysis points associated with that knowledge area.

The next step involved in construction of the 1981 written exam was the identification of viable primary and secondary sources. Thereafter, a test outline was prepared based on a 100-item test.[24] Finally, construction of the 1981 written exam involved drafting the 100 multiple-choice test items. In so doing, the Commission retrieved items from previously administered sergeant, lieutenant and captain examinations from the Civil Service item pools. The items were then classified, if possible, into one of the knowledge areas and were reviewed to determine whether they possessed current vitality. Each previously used item was also classified according to its level of difficulty,[25] level of discrimination,[26] and level of racial impact.[27] All previously used items which had questionable or unacceptable impact levels were deemed nonusable, as were all items which were poor discriminators.

After determining the number of previously used items available for pretesting in 1981, the Commission staff drafted additional items in order to produce three times *253 the number of items for each knowledge area as were in the end considered to be needed. In other words, more than 300 usable items were developed for the 100-item test. The 300-question test was then pilot-tested in parts in three cities, St. Louis, Detroit and Denver. Each pilot-tested item was analyzed for difficulty, disparate racial impact, internal consistency, validity and relevance. Based on those analyses, "the MSI Project Director selected for inclusion in the [final 100-item multiple-choice] test those items which possessed the best psychometric information."[28] An item was considered "desirable" if, among other things, it demonstrated "no difference between minorities and non-minorities on [sic] Difficulty level."[29]

In constructing the 1982 written exam neither MSI nor any other expert in psychometrics played any active role. Instead, Commission personnel determined the 1982 written test content and wrote test items "in-house." In its 1982 Validation Report, the Commission described the basis for construction of the 1982 written exam as follows:

The specific areas of knowledge listed in the job analysis were evaluated based on their statistical ratings and trivial knowledges were eliminated. The remaining knowledges were grouped into broader categories of related knowledge, and the categories were then statistically analyzed to determine approximately what proportion of the test should be related to each category. Reference materials and source documents for each area were identified by Police Department personnel.
The major change in test content was the inclusion of reading comprehension items. Since this skill was rated so heavily in the job analysis, since it is appropriate to test reading comprehension in a written test and since Dr. Barrett made unsubstantiated references to reading level differences between black and white populations, the Commission felt it was important to measure reading comprehension. In addition, areas measured by fewer than 5% of the items were dropped because of problems in sampling, reliability and importance. The test was therefore lengthened to 115 items in 1982 compared to 100 items in 1981.

1982 Validation Report, supra, at 2. The Commission's Report does not indicate which Commission personnel were item-writers in 1982, nor whether such personnel received any training in item-writing.[30] Moreover, the Report states: "The test items were not reviewed by experts within the Police Department prior to the examination for test security purposes, and the CSC [Civil Service Commission] did not have the expertise to spot the highly technical flaws in [some] questions."[31] Further, the Report indicates that "in 1982, pretesting multiple choice items was not possible."[32] As an alternative to pilot testing, the Commission "elected to administer the test and then delete those items which might later prove to be defective"[33] based on candidates' appeals. Thus, at the time the written test was administered, "candidates were given a form on which they could comment on specific parts or questions in the test."[34] Based on a review of those comments by Department and Commission *254 staff, eighteen items were deleted from the written test prior to final scoring.[35]

The test was then scored from 0 to 97 (115 original questions minus 18 deleted questions) with one point given for each correct answer. The raw scores were converted to standard scores based upon standard deviations.

As noted earlier, the combined scores on the written test (weighted 40%) and the promotional appraisal (weighted 30%) were then computed for all 605 candidates. The top 95 candidates, with their respective rankings achieved on the basis of their written exams and promotional appraisal scores, were then given the oral examination (weighted 30%). Only those 95 candidates were provided the opportunity to proceed through all three stages of the promotional procedure because the Department estimated that that number would meet the Department's needs for 40 sergeants during the two-year life of the eligibility list. In that regard, the Department stated in its 1982 Validation Report that the cutoff point of 95 was "based on the number of anticipated vacancies," not on "the level of performance below which a candidate could not function as a Sergeant."[36]

Of the 605 applicants who participated in the first two components of the 1982 promotional procedure, 460 were white, 128 were black, and 8 were members of other minority groups. The whites constituted 76% of the total applicants, while blacks constituted 22% of the total applicant pool.

Of the white candidates, 17.4% passed the promotional procedure, i.e., were given the chance to proceed to the oral examination and were placed on the eligibility list. By contrast, only 10.9% of the black candidates passed the procedure. Thus, the black pass rate was about three-fifths of the white pass rate, substantially below the four-fifths minority-to-white pass ratio deemed acceptable by the Uniform Guidelines § 4(D).[37] Because the results of the promotional appraisal were almost identical for blacks and whites, any adverse impact is attributable to the written exam.[38]

II. LAW — TITLE VII

Title VII, and more particularly 42 U.S.C. § 2000e-2(h), forbids the use of employment "practices, procedures or tests, neutral on their face, and even neutral in terms of intent" if they are discriminatory in effect. Griggs v. Duke Power Co., 401 U.S. 424, 430, 91 S.Ct. 849, 853, 28 L.Ed.2d 158 (1971). See also Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S.Ct. 2720, 2726, 53 L.Ed.2d 786 (1977); Albemarle Paper Co. v. Moody, 422 U.S. 405, 430, 95 S.Ct. 2362, 2377, 45 L.Ed.2d 280 (1975). In Griggs, Chief Justice Burger wrote:

Nothing in the Act precludes the use of testing or measuring procedures; obviously they are useful. What Congress has forbidden is giving these devices and mechanisms controlling force unless they are demonstrably a reasonable measure of job performance. Congress has not commanded that the less qualified be preferred over the better qualified simply *255 because of minority origins. Far from disparaging job qualifications as such, Congress has made such qualifications the controlling factor, so that race, religion, nationality, and sex become irrelevant. What Congress has commanded is that any tests used must measure the person for the job and not the person in the abstract.

Id. 401 U.S. at 436, 91 S.Ct. at 856.

In a disparate impact case, such as the one at bar, a tripartite analysis must be applied. First, a Title VII plaintiff bears the initial burden of making out a prima facie case of discrimination. To so do, such a plaintiff need only show "that the tests in question select applicants for hire or promotion in a racial pattern significantly different from that of the pool of applicants." Albemarle Paper Co. v. Moody, 422 U.S. at 425, 95 S.Ct. at 2375. A prima facie case may be established by evidence of statistical disparities alone. Dothard v. Rawlinson, supra, 433 U.S. at 329, 97 S.Ct. at 2726; International Brotherhood of Teamsters v. United States, 431 U.S. 324, 329, 97 S.Ct. 1843, 1851, 52 L.Ed.2d 396 (1977).[39] Once a plaintiff establishes a prima facie case, the burden shifts to the employer to demonstrate that the selection process which has produced the disparate racial impact is job related or, to state it more precisely, that the selection process has a "manifest relationship to the employment in question." Griggs v. Duke Power Co., supra, 401 U.S. at 432, 91 S.Ct. at 854. Finally, if the employer meets that burden, the plaintiff may still prevail by persuading the trier of fact that the challenged selection process was a mere pretext for discrimination in hiring or promotion. Connecticut v. Teal, 457 U.S. 440, 446-47, 102 S.Ct. 2525, 2530-31, 73 L.Ed.2d 130 (1982). In a Title VII cause of action, discriminatory purpose need not be proven. Albemarle Paper Co. v. Moody, supra, 422 U.S. at 422, 95 S.Ct. at 2373; Griggs v. Duke Power Co., supra, 401 U.S. at 432, 91 S.Ct. at 854.[40]

A. Adverse Impact

The parties have asked this Court to assume, on a binding basis for all purposes of this litigation, that the 1982 written exam had a disparate impact as defined by the Equal Employment Opportunity Commission ("EEOC") Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607 (1983) ("Guidelines").[41] Under the Guidelines, "[a] selection rate for any race, sex, or ethnic group which is less than four-fifths ( 4/5 ) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact ...." Section 4(D). See Connecticut v. Teal, supra, 457 U.S. at 443 n. 4, 102 S.Ct. at 2529 n. 4. Plaintiffs here have proffered evidence that the written exam had an adverse impact ratio of 62.9% and that the promotional procedure as a whole had an adverse impact that fluctuated from an acceptable ratio of 89.9% to an unacceptable ratio of between 50% and 75%. However, plaintiffs' expert, Dr. Richard S. Barrett, testified that if relatively few of the 95 candidates on the eligibility list were promoted, e.g., 23 or fewer, the 80% rule of thumb established in *256 the Guidelines would not be violated because several black candidates scored disproportionately well on the written exam. Additionally, Dr. Barrett noted that if more than 23 candidates were promoted from the eligibility list, adverse impact would become apparent, fluctuating between unacceptable ratios of 50% and 75%, depending upon the number promoted.

The City has informed the Court that as of the time of trial only 15 eligible candidates — 12 whites and 3 blacks — had been promoted to sergeant. Thus, in terms of actual promotions made, the 80% selection ratio has not been violated.

On the basis of this initial testimony, the parties agreed not to litigate the issue of adverse racial impact but to assume that such adverse impact existed for the 1982 sergeant promotional procedure as a whole and for the written exam in particular.[42] The parties further agreed that no additional promotions would be made from the eligibility list pending this Court's ruling with regard to job-relatedness and validity of the 1982 written exam.[43] That agreement not to promote further from the 1982 eligibility list was designed to preserve a promotion ratio of black to white sergeants which did not, in actual practice, offend the 80% Guidelines rule.

B. Job Relatedness

In view of defendants' concession of adverse impact, defendants must rebut plaintiffs' prima facie case of discrimination by demonstrating that the 1982 written exam is job-related. See, e.g., Connecticut v. Teal, supra, 457 U.S. at 446-47, 102 S.Ct. at 2530-31; Albemarle Paper Co. v. Moody, supra, 422 U.S. at 425, 95 S.Ct. at 2375; Griggs v. Duke Power Co., supra, 401 U.S. at 432, 91 S.Ct. at 854; Vanguard I, 471 F.Supp. at 698-99. To so do, defendants must show that the written exam is "demonstrably a reasonable measure of job performance." Griggs, supra, 401 U.S. at 436, 91 S.Ct. at 856.

"The threshold task in determining the validity or job-relatedness of a challenged examination is to select the appropriate measure for assessing its job-relatedness." Guardians Ass'n of New York City v. Civil Service Comm'n, 630 F.2d 79, 91 (2d Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981) ("Guardians IV"). The Guidelines, which draw heavily upon professional standards of test validation established by the American Psychological Association, recognize three validation techniques: content validation, construct validation and criterion-related validation. Guidelines §§ 5(B), 14. The Guidelines specify when each technique is appropriate and also specify the requirements for successfully validating an examination by use of each technique. Justice White, in Washington v. Davis, 426 U.S. 229, 247 n. 13, 96 S.Ct. 2040, 2051 n. 13, 48 L.Ed.2d 597 (1976), recognized those three methods of test validation as set forth in the Guidelines and described them as follows:

Professional standards developed by the American Psychological Association in its Standards for Educational and Psychological Tests and Manuals (1966), accept three basic methods of validation: "empirical" or "criterion" validity (demonstrated by identifying criteria that indicate successful job performance and then correlating test scores and the criteria so identified); "construct" validity (demonstrated by examinations structured to measure the degree to which job applicants have identifiable characteristics that have been determined to be important *257 in successful job performance); and "content" validity (demonstrated by tests whose content closely approximates tasks to be performed on the job by the applicant).

Further, Chief Justice Burger in Griggs v. Duke Power Co., 401 U.S. 424, 433-34, 91 S.Ct. 849, 854-55, 28 L.Ed.2d 158 (1971), and Justice Stewart in Albemarle Paper v. Moody, 422 U.S. 405, 431, 95 S.Ct. 2362, 2378, 45 L.Ed.2d 280 (1975), concluded that the EEOC Guidelines are entitled to great deference. This Court earlier in this litigation surveyed the substantial authority favoring deference to the Guidelines and observed that "a determination as to whether any test is job-related requires consideration of the EEOC Guidelines," Vanguard I, supra, 471 F.Supp. at 728. Other courts, too, have expressed similar viewpoints.[44] In Guardians IV, supra, the Second Circuit discussed extensively the proper weight to be given the EEOC Uniform Guidelines. Judge Newman, in cautioning against viewing every deviation from the Guidelines as an automatic violation of Title VII, wrote:

To the extent that the Guidelines reflect expert, but non-judicial opinion, they must be applied by courts with the same combination of deference and wariness that characterizes the proper use of expert opinion in general .... Thus, the Guidelines should always be considered, but they should not be regarded as conclusive unless reason and statutory interpretation support their conclusions. As this Court has previously stated: "If the EEOC's interpretations go beyond congressional intent, the Guidelines must give way."

Id. at 91 (citations omitted). In order for an examination to be found job-related, it should usually comply, not 100%, but nevertheless in substantial measure, with the reasonably attainable requirements set forth in the Guidelines. Thus, in this case, where defendants contend that the written exam is both content valid and criterion valid,[45] defendants will be required to demonstrate substantial compliance with the Guidelines requirements governing content and/or criterion validation.

C. Content Validity

A test's degree of content validity has been defined as "the degree to which the subject matter of the test relates to those significant parts of the curriculum for which the test is intended to predict the applicants' capability." Rivera v. City of Wichita Falls, 665 F.2d 531, 537 (5th Cir. 1982). Another oft-quoted definition of content validity is set forth by Judge Weinfeld in Vulcan Society of the New York City Fire Dep't, Inc. v. Civil Service Comm'n, 360 F.Supp. 1265 (S.D.N.Y.), aff'd in part, remanded in part, 490 F.2d 387 (2d Cir.1973):

An examination has content validity if the content of the examination matches the content of the job. For a test to be content valid, the aptitudes and skills required for successful examination performance must be those aptitudes and skills required for successful job performance. It is essential that the examination test these attributes both in proportion *258 to their relative importance on the job and at the level of difficulty demanded by the job.

Id. at 1274 (footnotes omitted). In short, an examination is content valid if it tests knowledges, skills and abilities critical to a job and thereby rates applicants on the basis of their ability to perform that job.

While the Guidelines describe various aspects of content validation, they "do not neatly list ingredients of an adequate exam." Guardians IV, supra, at 95. Yet, courts have been able to distill certain factors from the Guidelines for consideration in determining whether an examination possesses sufficient content validity to justify its use, notwithstanding its disparate racial impact. To begin with, constructing a content valid exam requires proof of a thorough job analysis. Thus, in United States v. County of Fairfax, Va., 629 F.2d 932, 943 (4th Cir.1980), cert. denied, 449 U.S. 1078, 101 S.Ct. 858, 66 L.Ed.2d 801 (1981), Judge Winter wrote: "Usually the starting point in proof of validity is evidence of a thorough job analysis."[46] Second, the test-makers must have used "reasonable competence" in constructing the examination itself.[47] Third, the designers of a test must develop a test whose content has a direct relationship with the content of the job.[48] Fourth, the content of the test must be representative of the content of the job.[49] Finally, the test must be used or scored in such a manner as to assure selection, with some precision, of those applicants best able to perform the job.[50]

1. Job Analysis

Judge Weinfeld has described a job analysis as "a thorough survey of the relative importance of the various skills involved in the job in question and the degree of competency required in regard to each skill." Vulcan Society, supra, 360 F.Supp. at 1274. According to the Guidelines, a job analysis consists of an assessment "of the important work behavior[s] required for successful performance and their relative importance." § 14(C)(2). In the case at bar, the Commission did not devise an independent job analysis for the 1982 promotional procedure.[51] Rather, as noted supra, the Commission relied largely upon the 1981 job analysis which was prepared by MSI based upon information gathered in 1979 and 1980.

The 1981 job analysis seemingly comported with the Guidelines requirements. First, the 1981 job analysis identified the important work behaviors required for successful performance of the sergeant's job. As discussed in more detail, supra, the sergeant's job was divided into "job components" and associated "tasks" based upon questionnaires distributed to 73 incumbent sergeants. Those job components and tasks were defined with a relatively high degree of precision. Each of the 73 sergeants completing the task questionnaire was asked to indicate, inter alia, whether each component and each task was a part of his job and the relative importance of each component and of each task. The *259 record herein calls for the conclusion that the first part of the Guidelines standard, i.e., "identification of important job behaviors," was satisfied by the 1981 job analysis.

Second, the MSI job analysis met the requirement of determining the relative importance of the identified work behaviors. The City assessed the importance of each job component and task by means of the task questionnaires referred to above. Further, the City identified a list of SKAPs which a sergeant needed to possess in order to perform each component of his job, and established the relative weight of each SKAP by virtue of SKAP rating questionnaires distributed to 50 randomly selected sergeants. Finally, in order to determine the overall importance of each SKAP to the job of sergeant, the City calculated an overall SKAP value by multiplying the SKAP weight by each job component weight. The care taken by MSI first to determine the relative importance of work behaviors and then to identify the relative importance of critical SKAPs associated with those work behaviors was in full accord with the Guidelines. Moreover, not only was the job analysis for the 1981 promotional procedure as a whole adequate, but the three components of that procedure — the written exam, the performance appraisal and the assessment center — were properly weighted in accordance with the number of job analysis points tested by each component. In 1981, the written exam was weighted 25%, the performance appraisal, 30% and the assessment center, 45%. Finally, the distribution of questions on the 1981 written exam, which tested only for the SKAP of knowledges, reflected the number of job analysis points assigned to each identified knowledge area.

The piggybacking of the 1982 promotional procedure onto the 1981 job analysis presents difficulties. To begin with, there exists some question as to whether work behaviors (job components and tasks), identified as early as 1979, retained vitality as late as 1982. The 1982 Validation Report makes no mention of any inquiry as to whether the sergeant's job, and the work behaviors or job components associated therewith, changed somewhat between 1979 and 1982. Cf. Vanguard I, supra at 740. Further, and of much greater concern, are the unexplained modifications made by the Commission in 1982 in connection with the 1982 use of the 1981 job analysis. The 1982 Validation Report states that "[s]ome of the factors rated or assessed in 1981 were deleted from the 1982 examination." 1982 Validation Report, supra, at 2. In fact, of the 28 SKAPs measured in 1981, only twelve were tested for in the 1982 promotional procedure. Among the SKAPs eliminated in 1982 was supervisory ability, which was deemed to be the second most important SKAP in the 1981 job analysis, receiving 968 job analysis points out of a total of 15,160 job analysis points.[52] Indeed, the failure to test in 1982 for such a critical work behavior might alone be sufficient to defeat defendants' claim of content validity. In Firefighters Institute for Racial Equality v. United States, 549 F.2d 506, 511-14 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76, the Court concluded that a fire captain's examination, which did not test for supervisory ability, was fatally deficient. The Court there noted that that supervision was the fourth most important task of a fire captain's job, and that supervision was "the only major job attribute that separates a firefighter from a fire captain." 549 F.2d at 511. Here, too, supervision is one of the major factors separating the job of a police officer from that of a police sergeant. In 1981, the City did test for supervisory ability by way of the assessment center. 
It did not do so in 1982 because the assessment center had been eliminated and replaced with a more limited *260 oral examination.[53] The Court in Firefighters did not accept the contention that the City of St. Louis could delete a major job attribute because "an Assessment Center is too expensive." 549 F.2d at 512. This Court concurs.

Other significant SKAPs no longer tested for by the 1982 promotional procedure included written communication skills (725 job analysis points in 1981), honesty (612 job analysis points in 1981), objectivity (43 job analysis points in 1981) and logical reasoning (422 job analysis points in 1981). It is also to be noted that while the Commission in 1982 eliminated some SKAPs with significant job analysis point values, it continued to test for SKAPs with relatively low job analysis point values, including Ability to Act under Stress (190 job analysis points in 1981) and Report Preparation Ability (239 job analysis points in 1981). Those changes, in and of themselves, impaired the integrity of the 1981 MSI job analysis. Accordingly, it cannot be said that the 1982 job analysis, which tested for only 12 SKAPs, accurately identified important work behaviors of a sergeant's job. See Guidelines § 14(C)(2).

Further, the relative importance of each SKAP was incorrectly assessed in 1982 because the Commission re-weighted the three components of the promotional procedure (written examination — 40%; promotional appraisal — 30%; oral examination — 30%).[54] The Commission's seemingly random inclusion and exclusion of SKAPs from the 1981 job analysis resulted in an end product in 1982 which omitted important work behaviors and overemphasized the relative weights of certain work behaviors which were tested for.

2. The Test Construction Process

With a job analysis of dubious accuracy, defendants must shoulder an unusually heavy burden to demonstrate that the written exam, as constructed, is content valid.[55] "Because of the unlikelihood that an examination prepared without benefit of a probing job analysis will be content valid, ... in the absence of such an analysis the proponent of the examination carries a greater burden of persuasion on the issue of job-relatedness." Guardians Ass'n of New York City v. Civil Service Comm'n (Guardians V), 633 F.2d 232, 242-43 (2d Cir.1980), cert. denied, ___ U.S. ___, 103 S.Ct. 3568, 77 L.Ed.2d 1410 (1983). In a similar vein, Judge Weinfeld observed that a showing of a substandard job analysis must be met by "the most convincing testimony as to job-relatedness," Vulcan Society, supra, 360 F.Supp. at 1276. In affirming Vulcan on appeal, Judge Friendly characterized with seeming approval Judge Weinfeld's approach as follows: "[T]he poorer the quality of the test preparation, the greater must be the showing that the examination was properly job-related, and vice versa". Vulcan Society of New York City Fire Dept., Inc. v. Civil Service Comm'n, 490 F.2d 387, 396 (2d Cir.1973). It is in this context, then, that the written test itself must be analyzed.

*261 The 1982 written exam was developed "in-house" by staff members of the Commission. MSI, which had played an instrumental role in developing the 1981 sergeant's promotional procedure, did not participate in designing the 1982 written exam. Rather, MSI, at most, answered questions propounded to it by the City and assisted the City in post-exam administration validation analyses. While, as Judge Newman has written, "the law should not be designed to subsidize specialists, ... employment testing is a task of sufficient difficulty to suggest that an employer dispenses with expert assistance at his peril.... [T]he decision to forgo such assistance should require a Court to give the resulting test careful scrutiny." Guardians IV, supra, 630 F.2d at 96. Moreover, it is worthy of note that many "in-house" examinations scrutinized by the courts have failed to pass muster.[56]

Exam No. 820508 (i.e., the 1982 written exam) does not meet the Guidelines standard required for test construction. That written exam was designed to measure 18 different knowledges,[57] and reading comprehension. Reading comprehension had not been tested for in the 1981 written exam because MSI personnel believed that it could best be measured through the assessment center.[58] Because of the addition of reading comprehension to the written exam, the 1982 written test was expanded from 100 items in 1981 to 115 items in 1982.

In 1982 the Commission determined how many questions to allocate to each identified knowledge by reference to the number of job analysis points associated with each knowledge. Yet despite the Commission's initial care in 1982 in testing for each knowledge in proportion to its relative importance to the sergeant's job, the test, as finally scored, was not proportionately representative of each knowledge area. The *262 deletion of 18 questions by the Commission prior to final scoring of the written exam resulted in a written exam which, as scored, placed far too little emphasis on the knowledge areas of "Reports" and "Search and Seizure."[59]
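The allocation approach described above — setting the number of questions for each knowledge area in proportion to its job analysis points — can be sketched as follows. The knowledge-area names and point values below are invented for illustration; they are not the Commission's actual 1982 figures.

```python
# Hypothetical sketch of allocating exam questions in proportion to
# job analysis points. Uses largest-remainder apportionment so the
# allocations sum exactly to the question total.

def allocate_questions(points_by_area, total_questions):
    """Apportion total_questions across areas in proportion to points."""
    total_points = sum(points_by_area.values())
    exact = {area: total_questions * pts / total_points
             for area, pts in points_by_area.items()}
    # Start with the integer part of each exact share.
    alloc = {area: int(share) for area, share in exact.items()}
    # Hand remaining questions to the largest fractional remainders.
    leftover = total_questions - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda a: exact[a] - alloc[a],
                          reverse=True)
    for area in by_remainder[:leftover]:
        alloc[area] += 1
    return alloc

# Invented point values, not the 1981 job analysis figures.
points = {"crime-related matters": 900, "reports": 700,
          "search and seizure": 600, "warrants": 400}
allocation = allocate_questions(points, 40)
```

Under this scheme, deleting questions after the fact (as the Commission did with 18 items) necessarily disturbs the proportionality that the initial allocation was designed to achieve.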

Further, and of far greater concern, is the fact that the 115 questions were apparently drafted in a haphazard manner. The questions were written by unidentified Commission personnel, who have not been shown to be expert in the art of test-writing. Indeed, according to the Commission's 1982 Validation Report, the item-writers lacked the expertise to spot "highly technical flaws in the questions."[60] Additionally, the 1982 Validation Report does not state whether, and to what extent, the item-writers relied on the job analysis materials. Nor, as far as the record herein reflects, were the 1982 questions ever reviewed by persons within or without the Department for accuracy or reliability. The 1982 Validation Report explains that the items were not reviewed by Department personnel for "security reasons." Thus, incumbent sergeants had no input into the test-construction process and were not provided an opportunity to comment on whether the items tested for the knowledges which they purported to test for. Equally disturbing is the fact that there was no pre-test administration review to assure that the questions were not ambiguous, overly complex, overly specialized, dependent on prior knowledge, or based on information which would be acquired during a training period.[61] Nor were the questions tested on a sample population, even though such pilot testing had been employed by the City in selecting the items for inclusion in the 1981 written exam. Moreover, no item-analysis was performed on the questions prior to their use, despite the fact that such was considered necessary by MSI in constructing the 1981 written exam. Specifically, prior to administering the 1981 written exam, item-impact and item-discrimination analyses were performed. Those analyses, which were designed to select the items most predictive of job performance and with the least racially adverse impact, were not explored prior to test administration in 1982.
In short, the Commission in 1982 made far too little effort to ensure that the test items were reliable, comprehensible, unambiguous, or racially neutral in impact.

Not surprisingly, the 1982 test construction process resulted in a written exam of questionable validity. The Commission itself found 18, or 15.7%, of the items to be defective and removed them before scoring the exam. Two other items were rekeyed on the grounds that they had been miskeyed initially.[62] Further, as Dr. Barrett noted in his March 23, 1983 Critique, 17, or 14.8%, of the items were answered correctly by 90% or more of the candidates and, therefore, contributed little to the value of *263 the exam.[63] Additionally, Dr. Barrett expressed the view that eight items were "inappropriate distractors," in that an incorrect alternative was selected more often than the keyed answer.[64] In his October 13, 1983 affidavit, Dr. Barrett also expressed the view that of the 80 "knowledge" items reviewed by the panelists, only 56%, or 45 questions, were part of the sergeant's job and that 26% of the questions were the responsibility of someone else.[65] Illustrative of this type of question is item 25 which reads: "Who makes the final decision on a complaint of excessive force filed against a member of the Dept.? The (a) Police Commissioner; (b) Complaint Evaluation Board; (c) Director of the Internal Investigation Division; (d) Commanding Officer of the Person charged." That question, in addition to being the responsibility of someone other than a sergeant, is ambiguously worded. That is true because although the Police Commissioner takes final action on a complaint of excessive force, the Complaint Evaluation Board is charged, by statute, with making final decisions on such complaints. Moreover, question 25, like so many others, does not test for critical knowledge. Indeed, of the 45 questions which were deemed to be job-related, many were, at best, marginally so. An example of a question identified as marginally related to the sergeant's job is item 38, which states:

According to the Digest of Laws, concerning domestic violence which of the following acts between family members would be considered "abuse"?
(a) Verbally assaulting another;
(b) Putting another in constant fear of bodily harm;
(c) Malnutrition of children;
(d) None of the above.

It is of marginal importance that a sergeant know the Digest of Laws definition of domestic abuse. A prosecutor will be responsible for making an official charge of abuse. The sergeant's role centers more on what action should be taken if he is confronted with any of the situations delineated in question 38(a)-(c).

At trial, Dr. Barrett testified that "a big problem with the test is ambiguity," specifically observing that items could not be deemed job-related if a test-taker could not understand what was being asked. Dr. Barrett identified at least 15 items, out of the 80 knowledge items, which he considered to be ambiguous. This Court agrees with Dr. Barrett that far too many of the 80 knowledge items are ambiguous.

Moreover, Dr. Barrett was not the only witness who testified to the fact that a significant number of questions were either ambiguous or unrelated to satisfactory performance of the sergeant's job. The City's own witness, Major Norris, testified to that effect. In particular, Major Norris identified 17 of the first 80 questions on the written exam as not job-related.[66] Of those 17 questions, Major Norris stated that five items either had more than one correct answer or no correct answer. Additionally, the Major expressed the view *264 that at least seven of those questions covered information wholly or substantially outside the scope of a sergeant's job and that other challenged questions were ambiguous.

While this Court is mindful of Judge Newman's warning that the "`burden of judicial examination-reading' ... need not inevitably be assumed," Bridgeport Guardians v. Bridgeport Police Dep't, 432 F.Supp. 931, 937 (D.Conn.1977), the assumption of that burden is appropriately undertaken by this Court because it is important herein to determine the approximate number of questions which are subject to attack as non-job-related. Further, the subject matter of the 1982 written exam, namely, the knowledge required to be a good sergeant, is not too far removed from judicial competence.

Herein, defendants have the burden of showing, in the face of conceded disparate racial impact, that the written exam tested for those knowledges which "are critical and not merely peripherally related to successful job performance," Kirkland, supra, 374 F.Supp. at 1372. In the words of the Uniform Guidelines, the employer "should show that (a) the selection procedure measures and is a representative sample of that knowledge, skill, or ability; and (b) that knowledge, skill, or ability is used in and is a necessary prerequisite to performance of critical or important work behavior(s)." Guidelines § 1607.14(C)(4). In this case, this Court is unconvinced that the 1982 written exam adequately tested for the knowledge areas which it was designed to test for. And, the examination also seemingly failed to test for those knowledges which are a necessary prerequisite to the performance of critical or important aspects of a sergeant's job. Approximately 17 of the first 80 questions of the 1982 written exam relate to information which is only of marginal importance to a sergeant's job.[67] Further, an additional 12 of the first 80 questions are so ambiguous or incomprehensible that they cannot be said to measure the knowledge areas which they were intended to test for.[68]

To sum up, 14 of the first 80 questions were deemed deficient by the Commission and were deleted. An additional 29 questions out of the first 80 are substandard for reasons set forth above in this opinion. Thus, considering only the first 80 items, arguably there remain at the most 37 "good" questions. That finding alone warrants a conclusion that the 1982 written exam is fatally deficient. In addition, it is to be noted that questions 101 through 115 dealt with Department procedures and were not vigorously analyzed or challenged by plaintiffs. Yet, defendants' witness, Major Norris, when asked to comment on questions 101 through 115, expressed the view that 6 of those 15 items were not job-related.[69] Thus, of the final 97-item test (115 items minus 18 items deleted by the Commission), 35 (29 out of the first 80 and 6 out of the last 15) are not job-related. Those numbers speak for themselves and strongly suggest that Exam No. 820508 is not content valid. However, while the substantial shortcomings of the job analysis and test construction process might alone compel the conclusion that the written exam is not content valid, it would appear helpful, at least in terms of the future, for this Court to reach and discuss other factors related to content validity.
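The item tallies in the preceding paragraph reduce to simple arithmetic, restated here as a sketch so the figures can be checked directly; the counts are taken from the opinion itself.

```python
# Arithmetic check of the opinion's item tallies.
total_items = 115
deleted_by_commission = 18            # removed before final scoring
scored_items = total_items - deleted_by_commission

first_80_deleted = 14                 # deletions falling in the first 80 items
first_80_substandard = 29             # additional substandard items
first_80_good = 80 - first_80_deleted - first_80_substandard

last_15_not_job_related = 6           # per Major Norris, items 101-115
not_job_related_on_scored_exam = first_80_substandard + last_15_not_job_related
```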

3. The Direct Relationship Requirement

As the Second Circuit has observed, the central requirement of Title VII is "relationship of test content to job content." Guardians IV, supra, 630 F.2d at 97-98. In this case, it is questionable, at best, whether Exam No. 820508 tests for those knowledges which are requisite to successful performance of a police sergeant's job. *265 While the 1981 job analysis adequately identified the relevant knowledge areas necessary to performance of the sergeant's job and while those same knowledge areas ostensibly formed the basis for the 1982 written exam, there is considerable doubt as to whether the 1982 test, as constructed, measured such knowledge areas.[70] In other words, although the knowledge areas sought to be tested, including "evidence collection," "warrants," "discipline," "crime-related matters," "reports," and "court-related matters," appear to be related to information which a "good" sergeant should know, the questions relating to each knowledge area may in fact test for unimportant rather than important information.

For example, the knowledge area of "crime related matters" received the most job analysis points in 1981 and was most heavily tested for in 1982.[71] Yet, of the 15 questions devoted to that knowledge area, most did not test for crime related information which is central to a sergeant's successful job performance. Thus, question 35 asked: "Which of the following states does not participate in the Non-Resident Violator Compact?" There were some testimonial differences during trial concerning whether such a compact in fact exists. Even assuming one does exist, it is doubtful, at best, whether a sergeant needs to memorize the information. Rather, it would seem likely that a sergeant could refer to reference materials when the need to know arises. Moreover, the knowledge tested for does not appear to go to the core of crime-related matters. The same can be said of question 37 which asked: "How the word wilful, used in the term first degree murder, was defined according to the Digest of Laws." Again, it is hard to see how knowledge of such a definition in the Digest of Laws is critical to a sergeant's job. Someone other than a sergeant will presumably determine whether a murder is chargeable as first degree murder. Questions of this sort predominate throughout the 1982 written exam to far too great an extent, and militate against a finding that the 1982 written exam satisfies the direct relationship requirement of content validation.

4. The Representative Requirement

The Guidelines require that a test be "a representative sample of the job." Guidelines § 14(C)(4). That requirement is construed primarily to mean that the content of the test must be proportionately representative of the content of the job. Many courts have construed that requirement strictly. For example, in Kirkland v. New York State Dep't, supra, 374 F.Supp. at 1372, Judge Weinfeld observed that the test must be shown "to examine all or substantially all the critical attributes of the sergeant's position in proportion to their relative importance to the job and at the level of difficulty which the job demands." Similarly, Judge Keady commented: "A content valid examination must not only test required skills and aptitudes, it must test them in proportion to their relative importance on the job." Walls v. Mississippi State Dep't of Public Welfare, 542 F.Supp. 281, 312 (N.D.Miss. 1982). The Standards for Educational and Psychological Tests of the American Psychological Association state: "[A]n employer cannot justify an employment test on the grounds of content validity if he cannot demonstrate that the content universe includes all, or nearly all, important parts of the job" (at 29), cited favorably in United States v. City of Chicago, 573 F.2d 416, 425 (7th Cir.1978) (emphasis added by Chief Judge Fairchild). In Guardians IV, supra, the Second Circuit rejected the contention that an examination must test for all abilities of a job and for each in its proper proportion, and adopted the following more relaxed standard:

The reason for a requirement that the content of the exam be representative is to prevent either the use of some minor aspect of the job as the basis for the selection procedure or the needless elimination *266 of some significant part of the job's requirements from the selection process entirely; this adds a quantitative element to the qualitative requirement — that the content of the test be related to the content of the job. Thus, it is reasonable to insist that the test measure important aspects of the job, at least those for which appropriate measurement is feasible, but not that it measure all aspects, regardless of significance, in their exact proportions.

630 F.2d at 99 (emphasis in original).

Precise proportionality of each knowledge in relation to its importance need not be demonstrated. Similarly, measurement of each and every knowledge area is not required. Yet significant knowledge areas, comprising up to 30% of the job, may not be omitted from a written promotional exam simply because testing for such knowledge areas is difficult.[72] Rather, in order for a written exam to pass muster under the representativeness requirement, the exam must measure all, or nearly all, of the significant knowledge areas of a job in approximate proportion to each knowledge area's relative importance to the job.

Assessed in the light of that standard, the 1982 written exam is not adequately representative. While the written exam purported to measure the relative importance of 18 knowledge areas, the exam failed to measure those knowledge areas because of faulty test construction. Moreover, the deletion of 18 test items skewed the actual weight attributed to each knowledge area. Finally, knowledge areas and reading comprehension, the two SKAPs ostensibly measured by the written exam, were weighted too heavily in 1982 because of the reweighting of the written exam component.

5. The Scoring Requirement

As described supra, the 1982 promotional procedure was scored as follows: A candidate's scaled written exam score (weighted 40%) was combined with his promotional appraisal score (weighted 30%). On the basis of that composite score, the top 95 candidates were administered the oral examination (weighted 30%). Thus, a cutoff score was established after two parts of the procedure were administered. That cutoff score was not a score which indicated a candidate's ability to perform the job but was simply the composite score of the 95th candidate at that stage of the procedure. After the oral exam was scored, the 95 candidates were placed on the eligibility list in order of their rank. The rankings were derived by adding a candidate's weighted, scaled score on each of the three promotional procedure components. Although each candidate was not ranked solely on the basis of his written exam score, that score was a significant factor because of the heavy weighting given to it, i.e., 40% of the total promotional procedure score. Similarly, the written exam score was the most important factor in the cutoff score (57% of the composite score before administration of the oral exam).
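The weighting arithmetic described above can be sketched as follows. The candidate names and scaled scores are invented for illustration; only the component weights (40%/30%/30%) and the two-stage cutoff structure come from the record.

```python
# Sketch of the 1982 two-stage scoring scheme with hypothetical scores.
# Written exam weighted 40%, promotional appraisal 30%, oral exam 30%;
# the oral was administered only to the top candidates by pre-oral rank.

WRITTEN_W, APPRAISAL_W, ORAL_W = 0.40, 0.30, 0.30

def pre_oral_composite(written, appraisal):
    """Composite used for the cutoff before the oral exam."""
    return WRITTEN_W * written + APPRAISAL_W * appraisal

# Written exam's share of the pre-oral composite: 0.40 / 0.70, about 57%.
written_share = WRITTEN_W / (WRITTEN_W + APPRAISAL_W)

# Hypothetical scaled scores: (written, appraisal) per candidate.
candidates = {"A": (90, 80), "B": (70, 95), "C": (85, 85), "D": (60, 60)}
ranked = sorted(candidates,
                key=lambda c: pre_oral_composite(*candidates[c]),
                reverse=True)
top = ranked[:3]  # analogous to the top-95 cutoff out of 605

def final_score(written, appraisal, oral):
    """Full three-component score used to rank the surviving candidates."""
    return WRITTEN_W * written + APPRAISAL_W * appraisal + ORAL_W * oral
```

The 57% figure shows why the written exam, though nominally weighted 40%, dominated the cutoff decision: candidates eliminated at the pre-oral stage were judged predominantly on the written score.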

The Guidelines provide that rank ordering should be employed only if it can be shown that "a higher score ... is likely to result in better job performance." Guidelines § 14(C)(9). Further, the Guidelines state that "evidence which may be sufficient to support the use of a selection procedure on a pass/fail (screening) basis may be insufficient to support the use of the same procedure on a ranking basis ...." Guidelines § 1607.5(G).[73]

*267 One court has summarized those Guidelines requirements by stating that "[i]f test scores do not vary directly with job performance, ranking the candidates on the basis of their scores will not select better employees." Guardians IV, supra, 630 F.2d at 100. Further, the inference that higher scores closely correlate with better job performance must be "closely scrutinized." Id. This close scrutiny is required because "[a] test may have enough validity for making gross distinctions between those qualified and unqualified for a job, yet may be totally inadequate to yield passing grades that show positive correlation with job performance." Id. at 100.

Courts have consistently disapproved of ranking where a test's content validity is suspect. In Guardians IV, supra, the Second Circuit stated that "the defects we noted in the job analysis and the test construction are substantial enough to preclude an inference that passing scores will correlate with job performance closely enough to justify rank-ordered selections." 630 F.2d at 101. Similarly, in Firefighters Institute v. City of St. Louis, 616 F.2d 350, 358 (8th Cir.1980), cert. denied sub nom. City of St. Louis v. United States, 452 U.S. 938, 101 S.Ct. 3079, 69 L.Ed.2d 951 (1981), the Court noted that the EEOC's "Questions and Answers ... specifically require empirical evidence that mastery of more knowledge is linked with better performance on the job." (Emphasis in original). Finding that no empirical evidence had been adduced demonstrating an association between levels of performance on the multiple choice exam and the job, the Court there held the rank-ordering was invalid. Id. at 357-60. Finally, in Berkman v. City of New York, supra, the Court criticized the City's rank-ordering of a firefighter's exam, stating, "neither the job analysis instrument, the test instrument, or the validation instruments appear able to perform their tasks with the precision necessary to justify the rank-ordering used here." 536 F.Supp. at 212. The Court went on to note that:

This conclusion does not necessarily mean that a future exam must be administered on a pass/fail basis. It does suggest, however, that some larger use of random selection within fewer, more rationally grounded ranks will have to be substituted for the present system, unless finer test instruments can be found.

Id. See also EEOC Questions and Answers, supra n. 76, Q. 62 which states that it is "easier" to make the inference of a relationship between higher scores and better job performance "the more closely and completely the selection procedure approximates the important work behaviors."

In the case at bar, the rank-ordering of the 1982 promotional procedure cannot be justified because the written exam, which was a substantial ingredient of a candidate's ranking, possessed insufficient validity and reliability.[74] Here, as described in detail, supra, there exist substantial defects in the written exam which militate against any use of the test, particularly a scaled usage, which forms the basis for ranking. The 1982 written exam fails to test adequately for important knowledges, and there is no indication that there exists any association between levels of performance on the written exam and on the job. The written exam lacks another essential feature required before rank-ordering can be justified, namely, reliability. Specifically, an exam is said to be reliable if there is some likelihood that the exam will produce *268 consistent results among applicants who repeatedly take it or a similar exam. "Like content validity, reliability is not an all or nothing matter. It too comes in degrees. What is required is not perfect reliability, but rather a sufficient degree of reliability to justify the use being made of the test results." Guardians IV, supra, 630 F.2d at 101.

In the case at bar, no evidence has been produced to indicate that the 1982 written exam is reliable. Indeed, important indicia of reliability are absent.[75] The exam questions are not of high quality. Thus, there is no reason to believe success on one question will correlate with success on other questions.[76] Further, no reliability analyses were performed by the test designers prior to test administration. No pilot testing of questions was undertaken on a sample population, despite the fact that such was done in 1981. Nor was a "reliability estimate" computed. See, e.g., Rivera v. City of Wichita Falls, 665 F.2d 531, 537 (5th Cir.1982). While an upper-group/lower-group item analysis was performed after the 1982 test was administered, the results of that analysis were not relied upon by the City to demonstrate reliability and, indeed, the results seemingly do not indicate such. In addition, the reliability of the questions is suspect because of the high number of easy questions (17) and the high number of inappropriate distractors (8). Easy questions — ones answered correctly by 90% or more of the candidates — contribute little to the value of a test because they do not differentiate among candidates based on their respective capabilities and "magnif[y] effects that may make scoring arrangements unjustified." Guardians IV, supra, 630 F.2d at 103 n. 19. Inappropriate distractors — "wrong" answers which attract more responses than the keyed "correct" response — also cast doubt upon the reliability of the questions because they suggest that another alternative might actually be correct.
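A "reliability estimate" of the kind the opinion notes was never computed is conventionally an internal-consistency coefficient; one standard choice for dichotomously scored (right/wrong) items is the Kuder-Richardson formula 20 (KR-20). The following is a minimal sketch, with a tiny invented answer matrix; the record does not indicate which estimator, if any, the Commission would have used.

```python
# KR-20 internal-consistency reliability for a right/wrong-scored test.
# The 4x3 answer matrix below (rows = examinees, cols = items, 1 = correct)
# is invented for illustration; the real exam had 605 takers and 97 items.

def kr20(scores):
    n = len(scores)          # number of examinees
    k = len(scores[0])       # number of items
    # Item difficulty: proportion answering each item correctly.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum_pq / var)

answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
r = kr20(answers)
```

Very easy items (high p) contribute almost nothing to the total-score variance, which is the quantitative counterpart of the opinion's observation that questions answered correctly by 90% or more of candidates add little value.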

In a given case, rank-ordering may well be valid. So may other alternatives such as random selection of candidates from within a group determined to have passed a content valid exam. See Association Against Discrimination in Employment v. City of Bridgeport, 594 F.2d 306, 313 n. 19 (2d Cir.), on remand, 479 F.Supp. 101 (D.Conn.1979), aff'd in part, vacated in part, and remanded, 647 F.2d 257 (2d Cir.1981), cert. denied, 455 U.S. 988, 102 S.Ct. 1611, 71 L.Ed.2d 847 (1982). As to the City's contention that the written exam cannot be graded on a pass/fail basis because Maryland state law requires rank-ordering, "Title VII explicitly relieves employers from any duty to observe a state hiring provision `which purports to require or permit' any discriminatory employment practice. 42 U.S.C. § 2000e-7 (1976)." Guardians IV, supra, 630 F.2d at 105.

6. Cutoff Score

Despite the absence of a cutoff score attributable solely to the written exam, there was a cutoff score for the first two components (written test and promotional appraisal) of the promotional procedure. The Guidelines require that a cutoff score "should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force." Guidelines § 5(H). Here, the Commission has stated that the cutoff score was set at the point which met the City's anticipated needs for sergeant and not at the point where a sergeant could no *269 longer perform satisfactorily.[77] A cutoff score based on anticipated vacancies is appropriate only if the underlying scoring is valid. See Guardians IV, supra, 630 F.2d at 105. Because the rank-ordering of the 1982 promotional procedure is invalid, the cutoff score is likewise suspect. Varying performance on the 1982 written exam has not been found to be correlated with job requirements; thus, it cannot be said that candidate 95 would perform the sergeant's job better than candidate 96. In short, the cutoff score of 95 appears to be neither valid nor reliable. Moreover, like the rank-ordering procedure, the cutoff score appears to have contributed to the adverse racial impact and, accordingly, is invalid.

In sum, defendants have not met their burden of demonstrating that Exam No. 820508 is content valid. Significant defects exist in the 1982 written exam with respect to the job analysis, the test construction process, the direct relationship requirement, the representative requirement, and the scoring requirement.

D. CRITERION-RELATED VALIDITY

Defendants' criterion-related validity claim cannot withstand analysis. Criterion-related validity requires a demonstration from empirical data that the challenged test successfully predicts job performance. Guidelines §§ 5(B), 14(B). The required data "may be obtained by studying correlations of test scores of accepted candidates with their subsequent job performances, or correlations of the test scores of present employees with their current job performances." Guardians IV, supra, 630 F.2d at 92 n. 11. Developing the required data is difficult and criterion-related validation claims have often failed.[78] Here, defendants did not conduct a criterion-related study, an important, if not essential, first step to criterion-related validation under the Guidelines. Guidelines §§ 5(B), 14(B). Rather, in response to Dr. Barrett's criticisms of the written exam set forth in his October 12, 1983 affidavit, the City in November, 1983 informally attempted to compare candidates' performances on the written exam with their promotional potential appraisal scores in order to ascertain whether a correlation existed. The City has admitted that it was able only to measure the correlation between performance on the written exam and performance as a police officer, not as a police sergeant. The City was not able to measure whether scores on the sergeant's written exam correlated with subsequent performance as a sergeant because only 15 candidates had been promoted to sergeant when the City did its correlation measurement. The City has recognized further that a sample size of 15 is too small to be deemed reliable. Indeed, the Guidelines so indicate. Guidelines, §§ 1607.14(B)(1) & 1607.16(U).[79] The Guidelines provide that a user is not required "to hire or promote persons for the purpose of making it possible to conduct a criterion-related study," § 1607.14(B)(1). In this case it appears that a criterion-related study may not have been feasible.
Performance as a sergeant would not appear to be predictable from performance as a police officer in the absence of empirical data to support such an extrapolation.[80] Dr. Barrett expressed *270 such a view in his trial testimony. Additionally, Dr. Wagner, a MSI Vice President, stated during trial that the criterion "study" did not meet Guidelines standards under section 1607.14(B), and was only an "additional and supportive study."

Further, the promotional appraisals were completed after the results of the 1982 written exam had been made public. Thus, as Mr. Wendland testified, supervisors who knew that a candidate's written-exam score was high may well have given that candidate high ratings, thereby producing an artificially high correlation. The Guidelines specifically caution against such possible "contamination," and state, "proper safeguards should be taken to insure that scores on selection procedures do not enter into any judgments of employee adequacy that are to be used as criterion measures." Guidelines § 14(B)(3). In Albemarle, supra, 422 U.S. at 432, 95 S.Ct. at 2379, Justice Stewart wrote: "While [the Guidelines] allow the use of supervisorial rankings in test validation, the Guidelines quite plainly contemplate that the rankings will be elicited with far more care than was demonstrated here." (Footnote omitted). The same applies in the case at bar.

Finally, the Guidelines contemplate a correlation between performance in the new job (as demonstrated by supervisory appraisals in that job) and the test used to select candidates for the new job. There exists no proof of any such correlation in the record before the Court. The only evidence of any such correlation was Dr. Wagner's conclusory opinion to that effect, expressed during trial. No studies, however, were introduced into evidence.[81]

For all of the reasons set forth supra, this Court concludes that defendants have not proven the 1982 written sergeant's examination to be job-related under any validation technique.

III. RELIEF

Once a violation of Title VII has been established, a district court possesses broad power to fashion appropriate relief, limited, however, by the goals of Title VII.[82] Title VII is designed to prevent discrimination, to achieve equal employment opportunity in the future, and to make whole the victims of past discriminatory practices. Id. The Second Circuit has subdivided Title VII relief into three categories: compliance relief, compensatory relief, and affirmative relief.[83] "Compliance relief is designed to erase the discriminatory effect of the challenged practice and to assure compliance with Title VII in the future.... Compensatory relief is designed to `make whole' the victims of the defendant's discrimination.... Affirmative relief is ... designed principally to remedy the effects of discrimination that may not be cured by the granting of compliance or compensatory relief.... Affirmative relief is normally justified only if the defendant's discrimination has been intentional ... or there has been a long-continued pattern of egregious discrimination." Berkman, supra at 595-96.

In this case, full relief has already been provided in connection with Title VII violations found to exist in the sergeant's promotional exams challenged in Vanguard I. An interim examination was administered on October 15, 1979 to all minority police officers and, as a result of that examination, 15 minority officers were promoted to the rank of sergeant by Order of this Court *271 dated November 27, 1979.[84] Subsequently, the parties entered into a partial settlement agreement on March 24, 1980 "for the purpose of full and final settlement of all issues ... arising out of or in connection with the police sergeant examination used in and by the City of Baltimore for the years 1972 to 1977 inclusive and the [1979] interim sergeants promotional examination ...."[85] That partial settlement agreement, which was approved by this Court, provided that 22 additional black candidates who had failed the interim sergeant's examination were "given the opportunity to proceed through the assessment center." ¶ 3. That additional "catch-up" type of relief was required in order to correct significant numerical imbalances and to provide plaintiffs with "make whole" relief up to and including the year 1979.

Because of such prior relief accorded to plaintiffs, this Court is not now presented with a situation of substantial unremedied discrimination extending over a number of years. Following 1979, the next sergeant's promotional procedure was administered in 1981. That procedure is unchallenged herein. Thus, only the effects of the 1982 sergeant's promotional procedure require redress. The City has made 15 promotions from the 1982 eligibility list; of those promoted, 12 candidates were white and 3 were black. As noted earlier, that ratio does not offend the Uniform Guidelines, nor does it add up to any meaningful disparity between black and white police officers seeking promotion to sergeant.

In that context, compliance relief in this litigation at this time requires an Order of this Court which will accomplish the following:

(1) Enjoin further use of the 1982 sergeant's examination and of the eligibility list resulting therefrom, except as otherwise approved by this Court in order to permit sufficient promotions by the Commissioner to the rank of sergeant to enable the Department to operate at top efficiency.

(2) Order that a new sergeant's examination be developed as expeditiously as possible, pursuant to a sergeant's promotional procedure which will comply with the Uniform Guidelines and the standards discussed in this opinion. While the City, of course, retains ultimate responsibility for the design of any sergeant's promotional procedure, it may well be useful if an expert selected by plaintiffs participates in each stage of the promotional procedure design process and reviews any written examination for its validity prior to that exam's administration. Such participation and pre-test review may go a long way toward ensuring that subsequent exams are not the subject of protracted litigation. So that, if the needs of the Department so require, interim promotion of sergeants can take place prior to the development and administration of a new, valid sergeant's promotional procedure, the parties are required to consult and to submit to this Court, in writing, on or before July 2, 1984, suggested procedures with respect to such interim promotions.

With respect to long-term remedies in this case, compensatory or "make whole" relief must include the opportunity for all members of the plaintiff classes to take and to attempt to pass a valid promotional procedure as soon as possible. Any member of the classes herein certified who performs successfully pursuant to such procedure and who is thereafter promoted shall receive back pay, retroactive seniority, and other appropriate retroactive benefits, if any. Because the first appointments from the invalid 1982 eligibility list were made in September, 1982, backpay, seniority and other benefits should begin to accrue as of the date that list was first used.

Plaintiffs argue that affirmative relief is required for individual class members who were not promoted to sergeant on the basis of the 1982 sergeant's promotional procedure. Specifically, plaintiffs contend that *272 black candidates should be permitted to undergo a reevaluation and that those black candidates deemed qualified should be promoted to the rank of sergeant. That procedure, which envisions promotion to the position of sergeant of only blacks, would constitute quota-type relief which this Court will not grant in the context of this case. Rather, each individual class member will be given full relief by this Court by the provision of a new, valid promotional procedure and the opportunity to obtain retroactive benefits if such class member is promoted.

"In the absence of intentional discrimination, affirmative relief requires some demonstrated pattern of significant prior discrimination." Guardians IV, supra, 630 F.2d at 112. Herein, the disparity between black and white sergeant promotions, based on the 1982 sergeant's promotional procedure and in light of the fact that relief for all discrimination prior to 1982 has already been given, does not constitute "significant prior discrimination." In contrast to the situation which existed within the Department in late 1979 when this Court issued its opinion in Vanguard I, there is presently no gross imbalance in the number of black and white sergeant promotions. In those cases in which courts have utilized quotas, flagrant disparity has existed.[86] In each Fourth Circuit published opinion known to this Court, less extreme remedies than the use of quotas have been deemed appropriate.[87] In Sledge v. J.P. Stevens & Co., 585 F.2d 625, 648-49 (4th Cir.1978), cert. denied, 440 U.S. 981, 99 S.Ct. 1789, 60 L.Ed.2d 241 (1979), the Court reversed the district court's determination that the use of a quota was the appropriate method of correcting past discrimination in that case and in so doing wrote: "[A]ssuming quotas are permissible elements of remedial decrees in employment discrimination cases, they are appropriate only under limited and `compelling' circumstances." 585 F.2d at 646. The use of quotas is hardly required when other effective relief is available or when an employer has made progress toward equal hiring. Quota-type relief has been said to be appropriate only if "the discrimination to be remedied has been egregious, purposive or blatant." Sledge, supra at 647. The instant case does not fall within the latter description.

As a final note, the United States Commission on Civil Rights' recent policy statement condemning the use of quotas in employment contexts, while not binding on this Court, is instructive, and highlights the problems often associated with such relief.[88] The Commission stated that it "deplores *273 the [City of Detroit's] use of a racial quota in its promotion of sergeants as one of the methods for achieving its laudable objectives." 52 U.S.L.W. 2147 (1984).

In sum, affirmative relief is not justified in this case at this time. Compliance and compensatory relief is all that is needed. The parties are directed to confer and to submit a proposed Order to this Court on or before July 2, 1984 in the light of this opinion.

NOTES

[1] Vanguard Justice Society, Inc. v. Hughes, 471 F.Supp. 670, 675 (D.Md.1979) ("Vanguard I").

[2] Id. at 722-42.

[3] Plaintiffs are members of the Vanguard Justice Society, Inc., a group of black police officers. Four plaintiff race classes have previously been certified by this Court. See Vanguard I, supra, at 677.

[4] See "Baltimore City 1982 Police Sergeant Examination Validation Report" ("1982 Validation Report"), Defendants' Exhibit 1, at 5.

[5] Id. at 6.

[6] Eligibility List, Nov. 19, 1982, Plaintiffs' Exhibit 7. There exists some confusion about whether 95 or 96 candidates were administered the oral exam and placed on the eligibility list. The 1982 Validation Report indicates that 95 candidates were given the oral exam, while some witnesses at trial testified that 96 candidates were evaluated on the basis of the oral exam. Because the precise number of candidates who took the oral exam and who were placed on the eligibility list is not controlling herein, this Court assumes that the number, 95, is accurate.

[7] Mr. Wendland, Deputy Personnel Director of the Civil Service Commission, testified at trial that the first 16 promotional rankings had been utilized, including 13 whites and 3 blacks, but that the number one ranked candidate, who was white, resigned from the Department. Thus, the actual promotion statistics to date include 15 persons, 12 of whom are white and 3 of whom are black.

[8] Plaintiffs' expert, Dr. Richard S. Barrett, testified that "the severe adverse impact was based on the results of the written examination."

[9] The validity of the 1981 promotional procedure is not per se an issue in this litigation. Although plaintiffs have alleged in this case at various times that the 1981 promotional procedure, and all of its component parts, had a racially adverse impact, plaintiffs have never sought to have the Court determine the legality of the 1981 promotional procedure.

[10] The Ford Report was prepared by Hilda E. Ford, Personnel Director of the Commission on May 25, 1981, in connection with the 1981 promotional procedure.

[11] 1982 Validation Report, supra n. 4 at 1.

[12] MSI's test development procedures are described in detail in "Report on Development/Validation/Administration of Police Sergeant Examination for City of Baltimore: Volume 1: Job Analysis & Measure Development" ("1981 Job Analysis Report"), prepared by MSI, August 8, 1981, Defendants' Exhibit 2. MSI's job analysis and test construction procedures were designed to comply with the Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607 (1983) ("Guidelines"). The Guidelines establish general standards which should be adhered to in order to design valid selection procedures.

[13] By way of example, the "tasks" which a sergeant performs include "Reviews bulletins ... to become updated on new departmental policies," and "Completes quarterly performance evaluation reports on subordinates." "Job components" are broader in scope than specific tasks and include "General Report Review" and "Evaluating Subordinates' Performance." See 1981 Job Analysis Report, supra, at Summary Statistics on Job Components & Task Ratings. That first step of identifying tasks and job components and their relative importance was designed to meet the standards set forth in Guidelines, § 14A and § 14C(2).

[14] See note 16, infra, for Table of SKAP Clusters.

[15] By way of example, if the SKAP of supervisory ability were needed to perform the job components of "Direct Supervision" and "Evaluating Subordinates' Performance," the SKAP values for supervisory ability associated with each of those two job components would be added together.

[16] See 1981 Job Analysis Report at 11. Table 1, Description of Police Sergeant SKAP Clusters with Associated Job Analysis Points, provides:

  Knowledges                                              4182
  Ability to Size-up/Evaluate Situations/Conditions        650
  Decisiveness                                             239
  Judgement                                                655
  Objectivity                                              437
  Planning/Organizing                                      927
  Resourcefulness                                          313
  Sensitivity                                              683
  Supervisory Ability                                      968
  Listening/Attending to/Attentiveness                     314
  Oral Communication Skill                                 683
  Record Organization/Maintenance/Retrieval                207
  Report Preparation Ability                               236
  Written Communication Skill                              725
  Scheduling/Assigning                                     173
  Report Review Ability                                    236
  Reading Comprehension                                    434
  Logical Reasoning Ability                                422
  Verbal Comprehension                                     191
  Dependability/Responsibility                             313
  Honesty/Truthfulness/etc.                                612
  Ability to Act Under Stress/Pressure                     190
  Ability to Relate to Others (Interpersonal
       Skill)                                              406
  Calmness/Self Control/Poise                              268
  Patience                                                 303
  Neat, Appropriate Personal Appearance                    170
  Self-Restraint/Self-Disciplines                          134
  Memory for Facts/Ideas/Operations                         89

[17] A supervisory appraisal known as a "performance appraisal" was used in the 1981 promotional procedure. In 1982, the supervisory evaluation was redesignated as the "promotional appraisal."

[18] An assessment center was used in the 1981 promotional procedure while an oral examination was used in the 1982 procedure. Essentially, the assessment center was a "mock laboratory" where candidates were evaluated by trained raters. Candidates were given problems to read and were required to prepare a response and orally to defend that response. The oral examination, used in 1982, was a more limited oral interview rather than a full-scale problem-solving exercise.

[19] MSI based its decision on which measurement component to use for each SKAP on, inter alia, two factors: first, the known or suggested disparate impact of that component; and second, the time, cost and efficiency of each component. That decision dictated the content and weight of each of the three components of the 1981 promotional procedure. There is no indication that the disparate impact factor was considered in 1982 in deciding which test component should be used to measure each SKAP.

[20] The SKAPs to be measured by each test component are set forth in Table 2 of the 1981 Job Analysis Report at 13:

Table 2: Results of Measurement Mode Determination by Measurement Expert Panel
-------------------------------------------------------------------------------------------------
SKAP CLUSTER                               TYPE OF MEASUREMENT CATEGORY
                                             WRITTEN     ASSESSMENT        PERF.           NOT
                                              TEST          CENTER       APPRAISAL     MEASURABLE
-------------------------------------------------------------------------------------------------
Knowledge Areas                                X
Ability to Size-up/Evaluate                                   X              X
Decisiveness                                                  X              X
Judgement                                                     X              X
Objectivity                                                   X              X
Planning/Organizing                                           X
Resourcefulness                                               X              X
Sensitivity                                                   X              X
Supervisory Ability                                           X
Listening/Attending
    to/Attentiveness                                          X              X
Oral Communication Skill                                      X
Record Organization/Maintenance/Retrieval                     X              X

Report Preparation Ability                                    X              X
Written Communication Skill                                   X
Report Review Ability                                         X
Scheduling/Assigning Ability                                  X
Reading Comprehension                                         C
Logical Reasoning Ability                                     C
Verbal Comprehension                                          C
Dependability/Responsibility                                                 X
Honesty/Truthfulness/etc.                                                    X
Ability to Act Under
    Stress/Pressure                                                          X
Ability to Relate to Others                                                  X
Calmness/Self-Control/Poise                                                  X
Patience                                                                     X
Neat, Appropriate Personal
    Appearance                                                               X
Self Restraint/Self Discipline                                               X
Memory for Facts/Ideas/Operations                             C
(X indicates direct measurement by measurement mode; C indicates measurement is indirect through
content of the exercises used).

[21] The following SKAPs were not tested for in the 1982 selection procedure:

                                        Job
                                      Analysis
                  SKAP                 Points
Objectivity                              437
Resourcefulness                          313
Supervisory Ability                      968
Record Organization/Maintenance
  Retrieval                              207
Scheduling/Assigning                     173
Report Review Ability                    236
Logical Reasoning Ability                422
Verbal Comprehension                     191
Dependability/Responsibility             313
Honesty/Truthfulness                     612
Ability to Relate to Others
  (Interpersonal Skill)                  406
Calmness/Self Control/Poise              268
Patience                                 303
Neat, Appropriate Personal Appearance    170
Self-Restraint/Self-Disciplines          134
Memory for Facts/Ideas/Operations         89

Thus, out of a total of 15,160 job analysis points, 5,242, or 34%, were not included in the 1982 job analysis and were not tested for.
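The 34% figure in note 21 is straightforward arithmetic. The following Python sketch is purely illustrative and not part of the record; the point values are copied from the table above, and the 15,160 total is the overall number of job analysis points from the 1981 job analysis:

```python
# Illustrative check of the note 21 arithmetic: sum the job analysis
# points of the SKAPs not tested for in 1982 and express that sum as a
# share of the 15,160 total points from the 1981 job analysis.
untested_skap_points = {
    "Objectivity": 437,
    "Resourcefulness": 313,
    "Supervisory Ability": 968,
    "Record Organization/Maintenance/Retrieval": 207,
    "Scheduling/Assigning": 173,
    "Report Review Ability": 236,
    "Logical Reasoning Ability": 422,
    "Verbal Comprehension": 191,
    "Dependability/Responsibility": 313,
    "Honesty/Truthfulness": 612,
    "Ability to Relate to Others (Interpersonal Skill)": 406,
    "Calmness/Self Control/Poise": 268,
    "Patience": 303,
    "Neat, Appropriate Personal Appearance": 170,
    "Self-Restraint/Self-Discipline": 134,
    "Memory for Facts/Ideas/Operations": 89,
}
TOTAL_POINTS = 15160  # total job analysis points, 1981 job analysis

untested = sum(untested_skap_points.values())  # 5,242 points untested
share = untested / TOTAL_POINTS                # fraction of total

print(untested, int(share * 100))  # prints: 5242 34
```

The truncated percentage (34%) matches the figure stated in the note.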

[22] 1982 Validation Report, supra, at 2.

[23] See note 8, supra.

[24] Knowledges, assigned 4182 job analysis points, was the only SKAP tested for in the written test in 1981. The number of test items drawn from each knowledge area in 1981 was as follows:

        KNOWLEDGE AREA                    # ITEMS
Knowledge concerning Arson                    1
Knowledge of Evidence Collection,
 Preservation, Retrieval, Etc.               12
Knowledge concerning Warrants                 4
Knowledge of Traffic Laws/Accident
 Investigation                                2
Knowledge of Search and Seizure               4
Knowledge of Arrest                           3
Knowledge concerning Discipline/Grievance     9
Knowledge of Crime-Related Matters           14
Knowledge concerning Reports/Record
 Keeping                                     10
Knowledge of Court-Related Activities         4
Knowledge of Patrol-Related Matters           5
Knowledge of Department
 Organization/Functions                      12
Knowledge of Personnel-Related
 Matters                                      5
Knowledge of Surveillance-Related             1
Knowledge of Non-Criminal
 Investigation                                2
Knowledge of Emergency
 Practices/Crime Prevention                   5
Knowledge of Inspection-Related               3
Knowledge of Equipment Use                    4

[25] Items were rated "easy," "moderate," or "difficult."

[26] An item is said to be a good discriminator if it is predictive of and consistent with a candidate's overall score on an examination. In 1981, MSI calculated item discrimination levels by comparing the percentage of the top 27% answering the item correctly and the percentage of the bottom 27% answering the item correctly. See 1981 Job Analysis Report, supra, at 20.
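The upper-lower 27% method described in note 26 can be sketched as follows. This is an illustrative reconstruction, not MSI's actual procedure; the subtraction of the two percentages (the classic discrimination index) is an assumption, since the note says only that the two percentages were "compared":

```python
# Illustrative sketch of an upper-lower 27% item discrimination index:
# the proportion of the top 27% of overall scorers answering an item
# correctly, minus the proportion of the bottom 27% answering correctly.
def discrimination_index(total_scores, item_correct):
    """total_scores: overall exam score per candidate;
    item_correct: 1 (correct) or 0 (incorrect) per candidate."""
    n = len(total_scores)
    k = max(1, int(n * 0.27))  # size of each comparison group
    # Rank candidates by overall exam score, best first.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    top, bottom = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom
```

An item answered correctly mostly by high scorers yields an index near 1.0; an index near zero, or a negative one, marks a poor discriminator.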

[27] The level of disparate impact was classified as follows:

100%             - no disparate impact
80-99.9%         - minimum acceptable impact
60-79.9%         - questionable disparate impact
Less than 60%    - unacceptable impact
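The classification in note 27 tracks the four-fifths (80%) rule of thumb in the Guidelines, under which adverse impact is gauged by the ratio of the minority selection rate to the majority selection rate. A minimal illustrative sketch (the function name and the thresholds-as-code are assumptions, not the Commission's):

```python
# Illustrative sketch of the note 27 classification: the impact ratio
# is the minority selection rate divided by the majority selection
# rate, expressed as a percentage and mapped to the four bands above.
def classify_impact(minority_rate, majority_rate):
    """Return the note 27 category for a pair of selection rates."""
    ratio = 100.0 * minority_rate / majority_rate
    if ratio >= 100.0:
        return "no disparate impact"
    if ratio >= 80.0:
        return "minimum acceptable impact"
    if ratio >= 60.0:
        return "questionable disparate impact"
    return "unacceptable impact"
```

On this scale, the 62.9% overall figure reported in the 1982 Validation Report (see note 42) falls within the "questionable" band.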

[28] Report on Development/Validation/Administration of Police Sergeant Examination for City of Baltimore, Volume Two: Pilot Testing/Validation, at 10.

[29] Id. at 11.

[30] Although it is certainly possible that some of the Commission personnel trained by MSI to develop test items in 1981 were the same personnel who wrote items for the 1982 written exam, that is not reflected in the 1982 Validation Report. Of course, even were there considerable or complete overlap in personnel, training in the writing of items is only one step in assuring that test items are properly constructed.

[31] 1982 Validation Report, supra, at 3.

[32] Id.

[33] Id.

[34] Id.

[35] Twelve items were deleted because they were "too difficult," three because there were "two answers," and the remaining three because there was "no answer." 1982 Validation Report at 3.

[36] Id. at 8. The Department explains the selection of 95 candidates on this basis: given the weight the Department assigned to each component of the test and given the Department's assumption that scores in connection with the two components of the selection procedure given to all candidates correlated with a candidate's score on the oral interview, a candidate with a low composite score on the written exam and promotional appraisal would be unlikely to rank among the top 40. Forty was deemed by the Commission to be the maximum number of officers expected to be promoted during the two-year life of the list. Therefore, a cutoff score was established at the point where the Department believed that "there was very little chance of a Candidate [sic] scoring in the top 40 places." Id. at 9.
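The two-component ranking described in note 36 can be sketched as follows. This is an illustrative sketch only: the candidate data and function names are hypothetical, and the raw scores are assumed to have already been scaled as described in the opinion:

```python
# Illustrative sketch of the two-component composite: scaled written-exam
# and promotional-appraisal scores weighted 40% and 30% respectively,
# with the top-ranked composite scores advancing to the oral exam.
def composite(written_scaled, appraisal_scaled):
    """Weighted composite of the two components given to all candidates."""
    return 0.40 * written_scaled + 0.30 * appraisal_scaled

def advance_to_oral(candidates, cutoff=95):
    """candidates: list of (name, written_scaled, appraisal_scaled) tuples.
    Returns the top `cutoff` candidates ranked by composite score."""
    ranked = sorted(candidates,
                    key=lambda c: composite(c[1], c[2]),
                    reverse=True)
    return ranked[:cutoff]
```

Under this scheme a candidate with a low composite necessarily ranks low overall, which is the premise behind the Commission's cutoff at 95.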

[37] See note 41 infra.

[38] See "Critique of Report on 1982 Sergeant Police Validation Report," ("March 23, 1983 Critique") by Richard S. Barrett, March 23, 1983 at 2 (Plaintiffs' Exhibit 2). See also note 8, supra.

[39] See also United States v. City of Chicago, 549 F.2d 415, 428 (7th Cir.), cert. denied, 434 U.S. 875, 98 S.Ct. 225, 54 L.Ed.2d 155 (1977); Kirkland v. New York State Dept. of Corrections, 520 F.2d 420, 425 (2d Cir.1975), cert. denied, 429 U.S. 823, 97 S.Ct. 73, 50 L.Ed.2d 84 (1976); Barnett v. W.T. Grant Co., 518 F.2d 543, 549 (4th Cir.1975).

[40] See also United States v. County of Fairfax, Va., 629 F.2d 932, 942 (4th Cir.1980); Ensley Branch of N.A.A.C.P. v. Seibels, 616 F.2d 812, 823 n. 27 (5th Cir.1980); Firefighters Institute for Racial Equality v. City of St. Louis, 549 F.2d 506, 510 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76 (1977).

[41] In 1970, the EEOC promulgated its Guidelines on Employee Selection Procedures, 35 Fed. Reg. 12333 (Aug. 1, 1970). Those EEOC Guidelines were superseded in 1978 when the EEOC adopted the Uniform Guidelines on Employee Selection Procedures, 43 Fed.Reg. 38290, 40223 (Aug. 25, 1978, Sept. 11, 1978). Both the EEOC Guidelines and the Uniform Guidelines deal with job relatedness. The differences between the two are seemingly not significant herein. See, e.g., Vanguard I, 471 F.Supp. at 730.

[42] The 1982 Validation Report states that the adverse impact of the 1982 promotional procedure as a whole was 62.9%, which falls below the permissible 80% figure established in the Guidelines. See 1982 Validation Report at 10. Dr. Barrett testified that the oral examination and the promotional appraisal had no adverse racial impact violative of the 80% rule of thumb. Accordingly, Dr. Barrett attributed the adverse impact to the written exam. See March 23, 1983 Critique, supra, at 2.

[43] The parties originally agreed at trial on January 5, 1984 not to promote past candidate 23 on the eligibility list, the point at which adverse impact would appear. Later, on January 17, 1984, the parties agreed to freeze promotions at 15.

[44] See, e.g., Ensley Branch of N.A.A.C.P. v. Seibels, 616 F.2d 812, 822 n. 22 (5th Cir.), cert. denied, 449 U.S. 1061, 101 S.Ct. 783, 66 L.Ed.2d 603 (1980); United States v. City of Chicago, 549 F.2d 415, 430 (7th Cir.), on remand, 437 F.Supp. 256 (N.D.Ill.), aff'd, 567 F.2d 730 (7th Cir.1977), cert. denied, 436 U.S. 932, 98 S.Ct. 2832, 56 L.Ed.2d 777 (1978); Firefighters Institute for Racial Equality v. City of St. Louis, 549 F.2d 506, 510 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76 (1977); Kirkland v. New York State Dept. of Correctional Service, 520 F.2d 420, 426 (2d Cir.1975), cert. denied, 429 U.S. 823, 97 S.Ct. 73, 50 L.Ed.2d 84 (1976); Douglas v. Hampton, 512 F.2d 976, 986 (D.C.Cir.1975); Burney v. City of Pawtucket, 559 F.Supp. 1089, 1101 (D.R.I.1983); Corley v. City of Jacksonville, 506 F.Supp. 528, 536 (M.D.Fla.1981); Vulcan Society of New York City Fire Dept., Inc. v. Civil Service Comm'n, 360 F.Supp. 1265, 1273 n. 23 (S.D.N.Y.), aff'd in part, remanded in part, 490 F.2d 387 (2d Cir.1973) ("Vulcan Society").

[45] Defendants have repeatedly asserted that Exam No. 820508 is content valid under the Guidelines. Near the close of trial defendants additionally contended that the 1982 written exam is also valid under the Guidelines standards for criterion-related validation.

[46] See also Firefighters Institute for Racial Equality v. City of St. Louis, supra, 549 F.2d at 511; Vanguard I, 471 F.Supp. at 736-40.

[47] Berkman v. City of New York, 536 F.Supp. 177, 208 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2d Cir.1983); Guardians IV, supra at 95. See also County of Fairfax, supra, 629 F.2d at 942.

[48] See Guardians IV, supra at 95.

[49] Kirkland v. New York State Dep't of Correctional Services, 374 F.Supp. 1361, 1378 (S.D.N.Y. 1974), aff'd in relevant part, 520 F.2d 420 (2d Cir.1975), cert. denied, 429 U.S. 823, 97 S.Ct. 73, 50 L.Ed.2d 84 (1976); Vulcan Society, supra, 360 F.Supp. at 1274.

[50] Firefighters Institute v. City of St. Louis, supra, 616 F.2d at 357-59; Ensley Branch of N.A.A.C.P. v. Seibels, supra, 616 F.2d at 822; Berkman v. City of New York, supra, 536 F.Supp. at 210-12; Ass'n Against Discrimination v. City of Bridgeport, 454 F.Supp. 751, 756-57 (D.Conn.1978), vacated and remanded on other grounds, 594 F.2d 306 (2d Cir.1979).

[51] While it is only the written exam which is challenged herein, the job analysis discussion in this opinion relates to all of the 1982 promotional procedure and not merely to the written exam. Thus, in discussing the 1982 job analysis, reference will be made to all parts of that procedure, rather than merely the written exam itself.

[52] Knowledges, receiving 4182 job analysis points in 1981, was the most important SKAP and was the only SKAP tested for in the written exam component of the 1981 promotional procedure.

[53] Mr. Wendland of the Commission testified at trial that the assessment center was too expensive to administer and, accordingly, was replaced by an oral examination of a far narrower nature than that used in connection with the 1981 promotional procedure.

[54] The 1981 weightings were as follows: written exam, 25%; performance appraisal, 30%; assessment center, 45%. It is apparent that the new weightings had a racially adverse effect, because the written exam, weighted more heavily in 1982, was the promotional procedure component which caused the disparate impact.

While the City claimed at trial that it was unaware that the written exam had an adverse impact when the new weightings were established, the City's decision to abide by the new weightings, in the face of the demonstrated adverse impact of the written exam, may in and of itself violate the Guidelines standards on alternate selection procedures.

[55] While all three portions of the promotional procedure were devised on the basis of a suspect job analysis, content validity itself is investigated only with respect to the written exam, as it is only that part of the 1982 promotional procedure which impacted adversely upon black candidates.

[56] See, e.g., Guardians IV, supra, 630 F.2d at 106; Kirkland, supra, 520 F.2d at 426; Vulcan Society, supra, 360 F.Supp. at 1275; Berkman v. City of New York, 536 F.Supp. 177, 210 (E.D.N.Y.1982), aff'd, 705 F.2d 584 (2d Cir.1983). Cf. Bridgeport Guardian v. Bridgeport Police Dept., 431 F.Supp. 931 (D.Conn.1977), in which Judge Newman, then a district judge, held valid a test prepared by a management consultant firm. See also Guidelines, § 1607.9(B), which provides:

B. Encouragement of professional supervision. Professional supervision of selection activities is encouraged but is not a substitute for documented evidence of validity. The enforcement agencies will take into account the fact that a thorough job analysis was conducted and that careful development and use of a selection procedure in accordance with professional standards enhance the probability that the selection procedure is valid for the job.

[57] The knowledges to be tested for on the 1982 written exam, and the number of test items allocated to each, were:

Arson                                          0
Evidence Collection                           13
Warrants                                       5
Traffic Laws & Investigations                  0
Search/Seizure                                 5
Arrest                                         0
Disciplinary Actions/Grievances               10
Crime-Related Matters                         15
Reports-Recordkeeping                         11
Court-Related                                  5
Patrol                                         5
Departmental Organization                     15
Personnel Rules                                5
Surveillance                                   0
NonCriminal Investigation                      0
Emergency Practices/Crime Prevention           6
Inspections                                    0
Equipment Use                                  0
Reading Comprehension                         20
                                             ___
TOTAL                                        115

[58] In its 1982 Validation Report the Commission justifies inclusion of reading comprehension in three ways. First, the Commission claims reading comprehension was included in the 1982 written exam because it was "rated so heavily in the job analysis." In the 1981 job analysis, to which the Report refers, reading comprehension received 434 points. While this is not an insignificant number of points, other SKAPs with larger point totals were eliminated in 1982, at the same time that the Commission added reading comprehension. Second, the Commission states that inclusion was justified because "it is appropriate to test reading comprehension in a written test." That assertion contradicts the 1981 conclusion by defendants' own expert, MSI, that it was inappropriate to test reading comprehension directly. (See note 20 supra). Finally, the Commission seeks to justify the inclusion of reading comprehension by pointing to Dr. Barrett's comment concerning reading level differences between whites and blacks. But that does not necessarily require that reading comprehension be tested for in the written exam.

[59] The knowledge area of "Reports" received 343 job analysis points in 1982 and was to be allocated eleven test items. Four questions were deleted by the Commission in 1982 before scoring, leaving only seven questions in this important knowledge area. Similarly, the knowledge area of "Search and Seizure" received 137 job analysis points and was to be tested for in five questions. However, after three search and seizure questions were deleted before scoring, only two questions measured that knowledge area.

[60] 1982 Validation Report, supra, at 3.

[61] Guidelines § 1607.5(F), entitled "Caution against selection on basis of knowledges, skills, or ability learned in brief orientation period," states:

Caution against selection on basis of knowledges, skills, or ability learned in brief orientation period. In general, users should avoid making employment decisions on the basis of measures of knowledges, skills, or abilities which are normally learned in a brief orientation period, and which have an adverse impact.

In a similar vein, section 1607.14(C)(1), which outlines the standards for content validity studies, provides, in relevant part:

Content validity is also not an appropriate strategy when the selection procedure involves knowledges, skills, or abilities which an employee will be expected to learn on the job.

[62] See March 23, 1983 Critique, supra note 38, at 8.

[63] See March 23, 1983 Critique, supra, at 9.

[64] Id.

[65] See October 12, 1983 affidavit of Dr. Richard S. Barrett, Plaintiffs' Exhibit 4 at 9, in which Dr. Barrett concluded "that the 1982 Police Sergeant Examination is not professionally developed ... and is not job-related according to the standards set forth in the Uniform Guidelines on Employee Selection Procedures."

In response to Dr. Barrett's criticisms, Mr. Wendland testified at trial that the Commission eliminated the 20 items to which Dr. Barrett had objected in his October 12, 1983 affidavit and rescored the written exam. Mr. Wendland testified that the correlation between the 97-item test and the "rescored" 77-item test was 96%. Mr. Wendland further stated that rescoring did not cause any significant change with respect to adverse racial impact and that, of the 95 candidates on the 1982 Eligibility List, 85 would have appeared on the list based on the rescored 77-question written exam.

The record reveals, however, that defendants at no time have submitted any data supporting their 96% correlation. Moreover, even if this Court were to accept as correct the 96% correlation, such a slender reed would not in itself support a finding of job-relatedness in the face of the record herein in its entirety.

[66] The 17 questions were: 3, 8, 11, 22, 29, 35, 37, 38, 42, 59, 61, 64, 65, 68, 70, 71 and 74.

[67] Those questions include: 19, 22, 30, 31, 32, 35, 36, 37, 40, 41, 42, 46, 48, 52, 61, 70, 71.

[68] Those questions include: 2, 3, 8, 11, 29, 34, 38, 59, 64, 65, 68, 74.

[69] In that regard, Major Norris referred specifically to questions 101, 102, 104, 106, 111 and 114.

[70] See notes 66-69 and accompanying text.

[71] Crime related matters received 486 job analysis points.

[72] In Guardians IV, supra, Judge Newman observed that "[t]he inadequate assessment of human relations skill lessens the representativeness of the exam and consequently lessens its degree of content validity, but this deficiency is not fatal, especially in light of the difficulty of assessing such an abstract ability." 630 F.2d at 99.

[73] Section 1607.5(G) of the Guidelines states in full:

G. Method of use of selection procedures. The evidence of both the validity and utility of a selection procedure should support the method the user chooses for operational use of the procedure, if that method of use has a greater adverse impact than another method of use. Evidence which may be sufficient to support the use of a selection procedure on a pass/fail (screening) basis may be insufficient to support the use of the same procedure on a ranking basis under these guidelines. Thus, if a user decides to use a selection procedure on a ranking basis, and that method of use has a greater adverse impact than use on an appropriate pass/fail basis (see section 5H below), the user should have sufficient evidence of validity and utility to support the use on a ranking basis. See sections 3B, 14B(5) and (6), and 14C(8) and (9).

[74] Rank-ordering also may not be appropriate because the 1982 promotional procedure, as a whole, fails to test for critical work behaviors, such as supervisory ability. The exclusion of significant work behaviors casts doubt upon the ability of the promotional procedure to determine with precision those candidates best able to perform as sergeants.

[75] Only two post-test item analyses were performed in 1982: an upper-group/lower-group item analysis and a racial impact analysis.

The results of both item analyses are set forth in the 1982 Validation Report.

[76] An item discrimination (upper-group/lower-group) analysis was conducted by the Commission subsequent to the administration of the written exam. However, there was no testimony as to whether the test items exhibit a high or low degree of discrimination. Further, while the 1982 Validation Report includes charts of the item discrimination analyses, those charts do not demonstrate a high degree of discrimination for most items.

[77] 1982 Validation Report at 8.

[78] See, e.g., United States v. Chicago, 549 F.2d 415, 430-32 (7th Cir.), cert. denied, 434 U.S. 875, 98 S.Ct. 225, 54 L.Ed.2d 155 (1977); Douglas v. Hampton, 512 F.2d 976, 985-86 (D.C.Cir.1975); Vulcan Society, supra, 490 F.2d at 395 & n. 10; Bridgeport Guardians, Inc. v. Civil Service Commission, 354 F.Supp. 778, 792 (D.Conn.), aff'd in part, rev'd in part, 482 F.2d 1333 (2d Cir.1973), cert. denied, 421 U.S. 991, 95 S.Ct. 1997, 44 L.Ed.2d 481 (1975).

[79] See, e.g., Burney v. City of Pawtucket, 559 F.Supp. 1089, 1102 (D.R.I.1983) (sample size of 30 police officers noted as "too scanty" for criterion-related validity study).

[80] Had defendants proved that successful performance as a sergeant can be accurately predicted from successful performance as a police officer, then defendants could have relied exclusively on the promotional appraisal as a promotional procedure. No written exam would have been required. After all, if the promotional appraisal, which has no adverse impact, does predict success as a sergeant, then it might well be the preferable promotional procedure.

[81] In addition, other flaws exist in the City's criterion-related validity claim. The City did not conduct a fairness study, which is required by the Guidelines where technically feasible. § 1607.14(B)(8). Further, the flaws in the job analysis, which cast doubt upon defendants' claim of content validity, have the same effect here, as an accurate job analysis is required for criterion-related validation. Guidelines § 1607.15(B)(2).

[82] See International Brotherhood of Teamsters v. United States, 431 U.S. 324, 364, 97 S.Ct. 1843, 1869, 52 L.Ed.2d 396 (1977); Franks v. Bowman Transp. Co., 424 U.S. 747, 764, 96 S.Ct. 1251, 1264, 47 L.Ed.2d 444 (1976); Albemarle, supra, 422 U.S. at 417-18, 95 S.Ct. at 2371-72; Griggs, supra 401 U.S. at 429-30, 91 S.Ct. at 852-53.

[83] Berkman v. City of New York, 705 F.2d 584, 595 (2d Cir.1983).

[84] See court file, Docket No. 119.

[85] See court file, Docket No. 161.

[86] Association Against Discrimination, supra, 594 F.2d at 308 (minorities constituted 0.2% of employees, 41% of population; quota vacated for reconsideration and findings); EEOC v. Local 14, International Union of Operating Engineers, 553 F.2d 251, 256 (2d Cir. 1977) (minorities constituted 2.8% of union members, at least 16.2% of relevant labor force; judgment including quota vacated for further findings); Patterson, supra, 514 F.2d at 770, 772 (minorities constituted 2.45% of union and union-affected job-seekers, 30% of relevant labor force; quota sustained); Bridgeport Guardians, supra, 482 F.2d at 1335 (minorities constituted 3.6% of employees, 25% of population; hiring quota sustained).

Guardians IV, supra, 630 F.2d at 113.

[87] See, e.g., Sledge v. J.P. Stevens & Co., Inc., 585 F.2d 625, 648-49 (4th Cir.1978), cert. denied, 440 U.S. 981, 99 S.Ct. 1789, 60 L.Ed.2d 241 (1979); White v. Carolina Paperboard Corp., 564 F.2d 1073, 1091-92 (4th Cir.1977); Harper v. Kloster, 486 F.2d 1134, 1136 (4th Cir.1973).

[88] Discrimination

RACIAL DISCRIMINATION —

Use of racially preferential quotas in promotions should be avoided as remedy for racial discrimination in employment.

The City of Detroit is commended for its desire to eradicate racial discrimination in its police department and to increase the number of blacks in its force. However, the U.S. Commission on Civil Rights deplores the city's use of a racial quota in its promotion of sergeants as one of the methods for achieving its laudable objectives.

The Detroit Police Department engaged in pervasive discrimination against blacks from at least 1943 to the 1970s in all phases of its operations. In 1974, the city voluntarily adopted an affirmative action plan, which alters the method whereby sergeants are promoted to lieutenants. Prior to 1974, candidates for promotion who scored a minimum of 70 on a written test were ranked on a single list, and accorded a numerical rating based upon a number of factors. Promotions were given to the highest ranking candidates on the list in numerical order.

The affirmative action plan does not change the basic criteria for determining which sergeants receive promotions. However, the plan requires that two separate lists be compiled — one for black sergeants and the other for white sergeants. Promotions are made alternately from each list so that one black officer is promoted for each white officer until 50 percent of the lieutenants are black, an event not expected to occur until 1990.

Enforcement of nondiscrimination law in employment must provide that all discriminatory practices cease, and that any identifiable, direct victim of discrimination be returned to the place he or she would have had in the absence of the discrimination. Such relief should also accord the victim a higher seniority status than that of an innocent employee, who would have been junior to the victim in the absence of the discrimination. The innocent third party properly must share the burden of his or her employer's discrimination against identifiable victims.

The use of affirmative action techniques as tools to enhance equal opportunity for all citizens, rather than as devices to penalize some on account of their non-preferred racial, gender, or other status, should be required of employers found to have discriminated, and encouraged for all employers who wish to improve the quality of their work force. These techniques include additional recruiting efforts aimed at qualified minority or female applicants, and training, educational, and counseling programs for applicants and employees, targeted to attract minority participants.

"Simple justice" is not served, however, by preferring nonvictims of discrimination over innocent third parties solely on account of their race. Such racial preferences merely constitute another form of unjustified discrimination, create a new class of victims, and, when used in public employment, offend the constitutional principle of equal protection of the law. The Detroit Police Department's promotion quota benefits nonvictims as well as victims of discrimination, in derogation of the rights of innocent third parties, solely because of their race. Accordingly, it is a device that should be eschewed, not countenanced.

The Commission also rejects an "operational needs" justification for racial quotas. The city asserts that the promotion quota was necessary to increase black police officers at all ranks, in order to achieve more effective law enforcement and reduce discriminatory treatment against black citizens. This amounts to little more than a claim that only black police officers can effectively provide law enforcement services to black citizens or supervise lower-ranking black officers. Such a claim has no place in a free, pluralistic society. If accepted, it would justify a claim that members of a racial or ethnic group can be properly served or treated only by fellow members of that group. This would turn the clock back to the "separate but equal" days of the past, when public entities dispensed benefits, entitlements, and penalties of all kinds on the basis of a person's skin color.

— U.S. Commission on Civil Rights; Statement, 1/17/84

52 U.S.L.W. 2147 (1984) (emphasis in original).