Firefighters Institute for Racial Equality v. City of St. Louis

HEANEY, Circuit Judge.

I.

PROCEDURAL HISTORY.

This is the third appeal to come before us involving the employment practices of the St. Louis Fire Department. We initially held that a 1974 promotional examination for the position of fire captain had a racially disparate effect. We remanded the matter to the district court with directions that it maintain jurisdiction until it approved a promotional examination validated in accordance with EEOC guidelines. Firefighters Institute, Etc. v. City of St. Louis, 549 F.2d 506 (8th Cir.), cert. denied, 434 U.S. 819, 98 S.Ct. 60, 54 L.Ed.2d 76 (1977). In the second appeal, we noted that a valid promotional examination had still not been approved by the district court, that the number of black fire captains had decreased from four to one, and that fifty fire captain positions were vacant. We again remanded the matter to the district court with directions to it to order St. Louis to immediately promote twelve qualified black firemen, and authorized it to order St. Louis to promote an equal number of white firefighters. We fixed a deadline of January 1, 1979, for validating a promotional examination. Firefighters Institute v. City of St. Louis, Mo., 588 F.2d 235 (8th Cir. 1978), cert. denied, - U.S. -, 99 S.Ct. 3096, 61 L.Ed.2d 872 (1979).

Pursuant to our remand, the district court, on December 19, 1978, ordered the immediate promotion of twelve black firefighters to the position of fire captain and directed St. Louis to develop a valid promotional examination by January 1, 1979.

On December 29, 1978, St. Louis filed with the district court a copy of its “Validation Report for the Position of Fire Captain in the City of St. Louis.” This report had apparently been under preparation since the summer of 1978. The report called for a two-part examination consisting of a written multiple choice portion, which would be weighted 30% in the total score, and an assessment center portion, which would be weighted 70% in the total score. The government received a copy of the report on January 3, 1979.

By letter of January 29, 1979, the United States expressed to St. Louis two areas of concern with respect to the proposed examination: the weight to be given the written portion of the test and the manner of administration and supervision of the assessment center portion of the test. The United States suggested that the selection process proceed without resolving these concerns so that the existence of adverse impact on blacks could be determined prior to any further litigation, a process which would additionally delay filling the vacancies. The Firefighters Institute for Racial Equality (FIRE), the black firefighters organization, did not object to the suggestion. The white firefighters organization, intervenors in the action, approved it. St. Louis accepted the suggestion and, on February 1, 1979, it filed a report informing the district court of the agreement to proceed with the examination with all parties reserving their rights to challenge the examination for failure to comport with the guidelines after the results of the examination were known.

St. Louis administered the multiple choice portion of the examination on February 27, 1979, and informed the United States of the results on March 13. The results showed a substantial difference in the mean scores of blacks and whites. Although the United States again noted its concern that this portion of the test was not content valid, it agreed with St. Louis that the selection process could proceed with the government reserving its right to challenge the validity of the test if it was found to impact adversely on blacks. The assessment center portion of the examination was administered on April 9, 1979, and thereafter was graded.

On Thursday, April 26, 1979, the Mayor of St. Louis, without consulting either of the plaintiffs or the City Attorney’s office, announced the results of the combined portions of the test, proceeded to have an eligibility list certified and promulgated, and set in motion the promotion process. Sixteen white firefighters were appointed fire captains the following Monday morning, and seven additional whites and one black were scheduled to be appointed that afternoon. The United States learned that an eligibility list had been certified on April 26, 1979, but was assured by counsel for St. Louis that no appointments would be made for at least a week. Attorneys for the United States and St. Louis agreed to meet on Monday afternoon; but by the time of their meeting, sixteen white captains had already been sworn in and thirty-two more were scheduled to be sworn in, eight on Monday afternoon and twenty-four on Wednesday. The United States agreed that the eight promotions scheduled for Monday afternoon could be made, and St. Louis agreed to refrain from further promotion until at least May 11.1 The United States then sought and secured a temporary restraining order from the United States District Court blocking further appointments. A week later, on May 22, a hearing on the government’s motion for a preliminary injunction was commenced. On June 4, 1979, the district court, after an extensive hearing, denied preliminary injunctive relief, vacated its restraining order and denied a motion for a stay pending appeal.

In the memorandum accompanying the denial, the district court noted that notwithstanding its prior explicit directives that any proposed selection process not be used until properly validated, the plaintiffs had chosen to ignore the procedures outlined by the court and had not contested the examination prior to its administration.2 The district court, assuming that the black firefighters would suffer irreparable injury if the appointments were to be made on the basis of the examination results, held that because the plaintiffs had not established a likelihood of success on the merits, they were not entitled to preliminary injunctive relief. It reasoned that even though the plaintiffs had established a prima facie case of discriminatory impact, the defendant had rebutted that case by establishing the validity and job-relatedness of the examination process. The court concluded:

[T]he criticisms levelled at the examination simply do not defeat the overwhelming evidence that this examination was valid. * * *
Based upon the record presented to this Court, the Court must conclude that plaintiffs have not fulfilled their burden of establishing probable success on the merits. Although the examination resulted in a disparate impact upon black applicants, it was sufficiently validated. Thus, defendants have fulfilled their obligations under the law.

II.

CONTENTIONS OF THE PARTIES

The United States contends on appeal that the district court erred in holding that the appellants were not likely to succeed on the merits of their challenge to the examination. It asks this Court to reverse the district court and to direct St. Louis to promptly fill at least 25% of the present fire captain vacancies with black firefighters. It also asks that St. Louis be directed to continue to fill vacancies on a ratio of three whites to one black until a new eligibility list based on a properly validated examination is developed. Its contentions are generally supported by FIRE.

St. Louis asks us to affirm the district court’s denial of preliminary relief. It argues that the district court correctly held that the plaintiffs were not likely to succeed in their challenge to the validity of the examination and, further, that even if the court erred in that determination, it correctly denied preliminary relief because the plaintiffs had suffered no irreparable injury. Finally, St. Louis contends that even if we find the district court incorrectly denied preliminary relief, the matter must be remanded for (1) the district court’s determination of the scope of such preliminary relief, and (2) the district court’s final determination of the validity of the examination.

III.

LIKELIHOOD OF SUCCESS ON THE MERITS

In our first opinion in this case, we determined that the EEOC guidelines applied to the development of a procedure for selecting which of St. Louis’s firefighters should be promoted to fire captain positions. Firefighters Institute, Etc. v. City of St. Louis, supra, 549 F.2d at 510. We ordered St. Louis to develop a promotional examination that was valid under those guidelines. Id. at 513. The issue of likelihood of success on the merits is, therefore, to be decided on the basis of whether the examination is valid under the EEOC guidelines.3 This requires a two-step analysis: (1) Did the selection procedure have an adverse impact on blacks? (2) If so, has St. Louis nevertheless shown that the examination complies with the EEOC’s rules for determining the validity of the selection procedure? We agree with the district court’s finding that the procedure had an adverse impact on blacks, but disagree with its determination that the examination was valid.4

A. Adverse Impact.

To determine whether blacks were adversely impacted by the examination, we have computed the number of blacks and whites who will be selected (1) if the sixty-two existing vacancies in the position of fire captain are filled from the eligibility list resulting from the examination, and (2) if the 120 expected promotions are made from the same eligibility list.5

An analysis of the examination results reveals that, in filling the existing 62 vacancies, 58 of the 348 white firefighters who completed the examination (16.7%) will be promoted while only four of the 56 blacks who completed the examination (7.1%) will be selected. It follows that the black selection rate will only be 42.5% that of whites. This rate is substantially below the 80% rate established in the Uniform Guidelines on Employee Selection Procedures 6 as a rule of thumb for determining whether employer policies or practices have an adverse impact on employment opportunities for any race.7

A similar analysis reveals that 112 of the whites who completed the examination (32.2%) will be selected for promotion to the 120 expected positions while only eight of the blacks (14.3%) will be selected. Thus, the selection rate for blacks will be only 44.4% that of whites. This rate is also substantially below the 80% rate established in the guidelines.8 It follows that the examination impacts adversely on blacks.
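
The selection-rate comparison in the two preceding paragraphs can be reproduced with a few lines of arithmetic. What follows is a minimal sketch in Python and is not part of the opinion; the function names and the threshold constant are illustrative assumptions built on the 80% rule of thumb described above. Computed from unrounded rates, the first ratio comes to about 42.9%; the 42.5% figure stated above is what results from dividing the rounded rates (7.1% by 16.7%). Either way the ratio falls well below 80%.

```python
# Illustrative sketch of the 80% (four-fifths) rule of thumb.
# All names here are hypothetical; they do not come from the Guidelines.

FOUR_FIFTHS_THRESHOLD = 0.80


def selection_rate(selected: int, examinees: int) -> float:
    """Fraction of a group's examinees who are selected."""
    if examinees <= 0:
        raise ValueError("examinees must be positive")
    return selected / examinees


def impact_ratio(group_rate: float, comparison_rate: float) -> float:
    """Ratio of one group's selection rate to the comparison group's rate."""
    if comparison_rate <= 0:
        raise ValueError("comparison_rate must be positive")
    return group_rate / comparison_rate


def below_four_fifths(ratio: float) -> bool:
    """True when the ratio falls under the 80% rule-of-thumb threshold."""
    return ratio < FOUR_FIFTHS_THRESHOLD


# Figures taken from the opinion: 348 white and 56 black examinees.
WHITE_EXAMINEES = 348
BLACK_EXAMINEES = 56

# Scenario 1: the 62 existing vacancies (58 white, 4 black selections).
ratio_62 = impact_ratio(
    selection_rate(4, BLACK_EXAMINEES),
    selection_rate(58, WHITE_EXAMINEES),
)

# Scenario 2: the 120 expected promotions (112 white, 8 black selections).
ratio_120 = impact_ratio(
    selection_rate(8, BLACK_EXAMINEES),
    selection_rate(112, WHITE_EXAMINEES),
)

for label, ratio in (("62 vacancies", ratio_62), ("120 promotions", ratio_120)):
    print(f"{label}: ratio {ratio:.1%}, below 80%: {below_four_fifths(ratio)}")
```

The two scenarios are kept as separate calls so that each can be checked against the figures in the text independently.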

B. Validity of the Examination.

Having determined that the selection procedure had an adverse impact on blacks, we turn to the question of whether the procedure is valid under the Uniform Guidelines. St. Louis attempted to establish the validity of the examination9 through a process known under the guidelines as content validation. Uniform Guidelines 5A, 43 Fed. Reg. 38,298 (1978).

(1) The Multiple Choice Portion of the Examination.

In our view, the multiple choice portion of the examination is not content valid under the Guidelines.10 Because the test is a written, multiple choice examination purporting to select those firefighters who can be expected to perform the best in a physical, stressful job, empirical evidence that the examination will actually accomplish that goal is required. None has been presented.

The job of fire captain in the St. Louis Fire Department involves the fighting of fires, the supervision of firefighters at the scene of a fire, the instruction and training of firefighters at the firehouse, and the maintenance of good morale and working relationships within the captain’s group. The captain’s job does not depend on the efficient exercise of extensive reading or writing skills, the comprehension of the peculiar logic of multiple choice questions, or excellence in any of the other skills associated with outstanding performance on a written multiple choice test. Because of the dissimilarity between the work situation and the multiple choice procedure, greater evidence of validity is required.11

The multiple choice test consisted of 130 items. The items were drawn from a list of knowledges and abilities,12 which were developed from a list of a fire captain’s job tasks.13 The candidates were required to choose one correct response for each item, and the tests were graded exclusively on the basis of the number of “right” answers. These scores were then computed with the assessment center scores to determine each candidate’s rank in the overall selection process. Because these test results were used to rank candidates, St. Louis must prove that the results are associated with different levels of job performance.14

The EEOC’s “Questions and Answers,” 15 which provide uniform interpretations and explanations of the Guidelines, explicitly address the requirements for using written examinations which measure knowledge to rank job candidates. They specifically require empirical evidence that mastery of more knowledge is linked with better performance on the job.

Any conclusion that a content validated procedure is appropriate for ranking must rest on an inference that higher scores on the procedure are related to better job performance. The more closely and completely the selection procedure approximates the important work behaviors, the easier it is to make such an inference. * * *
Where the content and context of the selection procedure are unlike those of the job, as, for example, in many paper-and-pencil job knowledge tests, it is difficult to infer an association between levels of performance on the procedure and on the job. * * * To justify use of such a test for ranking, it would also [in addition to tying tested knowledges to work behaviors] have to be demonstrated from empirical evidence either that mastery of more difficult work behaviors, or that mastery of a greater scope of knowledge corresponds to a greater scope of important work behaviors.

Question and Answer No. 62, 44 Fed.Reg. 12,005 (1979) (emphasis added).

Nothing in the record can be construed as empirical evidence of an association between levels of performance on the multiple choice examination and on the job.16 St. Louis, in its description of the development of the test items, explained that each member of the expert panel, which consisted of two fire captains and two deputy chiefs, examined each test item and indicated whether he thought it required basic knowledge or advanced knowledge. The form on which the panel members made this evaluation described “advanced” as performance-differentiating. St. Louis’s expert also testified that the expert panel analyzed the tested knowledges and abilities to determine whether they were performance-differentiating. These exercises by the panel members, however well-intentioned, are not a form of empirical evidence. They are basically opinion and conjecture, not actual observation of the correlation between the extent of mastery of the knowledges and abilities sought to be measured by the test and job performance.17

The requirement of empirical evidence to sustain the validity of the multiple choice test is logical and consistent with the spirit of the Guidelines. A procedure that selects candidates on the basis of their performance on a test that closely mirrors actual job behavior would, understandably, be more likely to accurately predict how well the candidates will do on the job. The Questions and Answers offer an example of such a selection procedure. An employer may use a typing test to select persons to fill a job that consists almost entirely of typing. Question and Answer No. 62, 44 Fed.Reg. 12,005 (1979). It is fairly easy to infer that such a test is job related and that, if fairly administered, its results might be used in spite of a showing of adverse impact.

A different situation exists here, however, and justifies the requirement of additional evidence of validity. A fire captain’s job is a physical, hands-on job. It involves complex behaviors, good interpersonal skills, the ability to make decisions under tremendous pressure, and a host of other abilities — none of which is easily measured by a written, multiple choice test. The development of this type of test requires many stages and levels of analysis and a mistake at any stage can destroy the validity of the examination. If the knowledges chosen to be tested are not appropriate to the job, are poorly articulated, are incomplete in some respect or over-emphasize some aspect of work behavior, the examination will be invalid. If the knowledges are not accurately weighted, or if the test disproportionally samples the knowledges, the examination will be invalid. If the test questions are poorly drawn, incomplete or simply inappropriate for sampling the knowledges sought, the examination’s validity is destroyed to the extent of those deficiencies. Because of all these potentials for error, it is logical and reasonable to require something concrete to validate an examination which has an adverse impact on blacks.

The multiple choice test has not been shown to be content valid for the additional reason that St. Louis has not shown that the selection procedure measures “those aspects of performance which differentiate among levels of job performance.” Uniform Guideline 14C(9), 43 Fed.Reg. 38,303 (1978). We are not satisfied with the expert panel’s method of predicting the ability of an item to differentiate job performance based on whether it required basic or advanced knowledge. Dr. Richard Barrett, the plaintiff’s expert witness, testified that the task of predicting the actual difficulty of test items is itself very difficult, even for a trained industrial psychologist. He expressed great doubts about the ability of an expert panel, untrained in matters of test construction, to make that determination. Even if the panel was capable of making this determination, however, its results do not support use of the test for ranking. The panel decided that 77 of 130 test items were basic and that only 53 were advanced or performance-differentiating. There is no showing in the record that the number of each type of item in the advanced group was proportionate to the weight which the expert panel assigned to the corresponding knowledge or ability. We are left with a group of 53 items which the panel declared to be performance-differentiating but which have not been shown to be correlated with the importance of the knowledges and abilities they tested.

The results of the multiple choice examination reveal another problem with its use as a ranking device. Ninety percent of the examinees correctly answered 30% of the 130 items, and 80% of the candidates correctly answered 50% of the questions. This again demonstrates a reliance on a relatively small number of test items to rank the candidates. Further, there has been no showing that these more difficult items correlate in number or difficulty with the knowledges and abilities thought to be performance-differentiating.

Moreover, a large portion of the multiple choice test, viewed by itself, reveals additional infirmities. Twenty of the 130 items purport to test a candidate’s “ability to size up a fire.” Since this is a test of an ability rather than a knowledge, the Uniform Guidelines require the test itself to “closely approximate an observable work behavior.” Uniform Guideline 14C(4). St. Louis does not argue that writing a multiple choice examination closely approximates firefighting. Thus, the pool of items on which St. Louis can rely in arguing that its measurement of test performance differentiates among levels of job performance is again reduced. What remains is a series of questions that we cannot say are representative of the necessary performance-differentiating knowledges and abilities.

Many additional criticisms of the multiple choice examination were offered at the hearing by Dr. Barrett and by Captain Daniel Austin, a twenty-seven-year fire department veteran. We do not intend to go into them in detail. We note, however, that several were substantive in nature, suggesting that the examination measured items of knowledge that were not necessary for successful performance of a fire captain’s job, and several were more technical, criticizing the construction of the test items themselves.18 To the extent that these criticisms are valid, they again reduce the pool of potentially performance-differentiating items and weaken the case for the examination’s validity.

On the basis of the record before us, we cannot find that St. Louis has shown the multiple choice portion of the examination to be valid under the Uniform Guidelines. We hold that the district court incorrectly applied the Guidelines in determining that the plaintiffs had little probability of success on the merits of their action.

(2) The Assessment Center.

We turn now to the validity of the assessment center portion of the examination. This section of the examination consisted of three parts: a fire-scene simulation, in which the candidates were shown slides of a large fire and were asked to respond in writing to questions regarding their observations and what orders they might give; a training simulation, in which each candidate prepared and presented an informational lecture, similar to one he might give at the firehouse, from printed materials given to him; and an interview simulation, in which each candidate, playing the role of a fire captain, interacted with a person playing the role of a firefighter involved in a personal confrontation with another firefighter. In each of the second two simulations, the candidates were physically observed by three assessors who later conferred to evaluate the candidates.

The fire scene simulation is also a paper-and-pencil test which is far removed from the content and context of the candidate’s actual work behavior. To justify the use of this portion of the examination as a ranking device, St. Louis is again required to demonstrate “from empirical evidence either that mastery of more difficult work behaviors, or that mastery of a greater scope of knowledge corresponds to a greater scope of important work behaviors.” Question and Answer No. 62, 44 Fed.Reg. 12,005 (1979). Once again, we find no such empirical evidence in the record.

Further, the interpretations of the Guidelines specifically discuss the content validity of a paper-and-pencil test which is intended to replicate work behaviors. Question and Answer No. 78 provides that

[p]aper-and-pencil tests which are intended to replicate a work behavior are most likely to be appropriate where work behaviors are performed in paper and pencil form (e. g., editing and bookkeeping). Paper-and-pencil tests of effectiveness in interpersonal relations (e. g., sales or supervision), or of physical activities (e. g., automobile repair) or ability to function properly under danger (e. g., firefighters) generally are not close enough approximations of work behaviors to show content validity.

44 Fed.Reg. 12,007 (1979).

The fire scene simulation was a paper-and-pencil test that required written answers to questions about a slide presentation. It was intended to replicate the work behaviors which a fire captain would undertake in response to the scenes depicted in the slides. The test sought to measure the candidate’s ability to observe the situation depicted in the slides and to decide on the correct course of responsive action, including his projected physical actions, his supervision of the firefighters under his command and his ability to deal with the dangers of the situation. It seems to fall squarely within the scope of Question and Answer No. 78 as a test which is not a close enough approximation of a work behavior to show content validity. Once again, this outcome is supported by logic and reason. The fire scene simulation, like the multiple choice examination, cannot avoid testing the candidate’s proficiency in the written exercise of verbal skills which is certainly not a critical or necessary job behavior for a fire captain. The candidates may be very proficient at assessing the scene of a fire and issuing the appropriate oral orders but ineffectual in communicating those orders in writing. It may be that such a simulation test requiring oral responses could be shown to be valid by a content validity study.19

The appellants also criticize two aspects of the validity of the administration of the assessment center portion of the examination. First, they note that each candidate was observed for a very short period of time. The candidates were not observed at all during the fire scene simulation and were observed for a total of only about thirty minutes during the other two exercises. We consider this to be a substantial criticism, especially considering the Guidelines’ requirement of additional evidence to validate a procedure for ranking candidates, rather than for ascertaining minimum competence. Second, they criticize the assessor’s role in the administration of the examination. They argue that the steps taken to assure the thoroughness of the training of the assessors and uniformity of evaluation among the various assessor groups were not sufficient.20 These arguments are not without merit. The Validation Report filed in December, 1978, anticipated the use of fifty to sixty assessors for whom at least three to four days of training would be necessary to assure standardization of assessment. In fact, seventy-eight assessors were used. Those assessing the interview and training simulations received only two days of training, and those assessing the fire scene received one day of training.

During this training, twenty-six assessors participated in an exercise designed to demonstrate the reliability of their scores. Although the district court was persuaded by the results of this exercise, we are not. The raw data show substantial variance among the ratings given by the assessors in the exercise. The statistical coefficients of correlation derived by St. Louis from this exercise give an incomplete picture of the reliability of the procedure. First, the coefficient does not measure the differences in scores given by the assessors, but measures only the correlations in their relative ratings. For example, one assessor might give three scores to three individuals — a four and two fives — and another assessor might give scores of one, two and two. The analysis used to produce the coefficients of correlation results in those sets of scores correlating perfectly, even though the actual ratings differ. Second, the analysis does not address the reliability of the assessment of the candidate’s behavior, but only of the scoring procedure. This missing piece of analysis is an important aspect of the reliability of the assessment center portion of the examination.
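
The point about correlation in the preceding paragraph can be checked numerically. The sketch below is not part of the opinion. It assumes a Pearson product-moment coefficient, since the opinion does not identify which statistic St. Louis used, and the function names are hypothetical. It applies the coefficient to the two example score sets and also reports the average gap between the two assessors' scores, which the coefficient does not capture.

```python
import math

# Assumption: a Pearson product-moment coefficient stands in for the
# unspecified "coefficient of correlation" mentioned in the opinion.


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_difference(xs, ys):
    """Average size of the gap between paired scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# The example from the opinion: the same three individuals, two assessors.
assessor_a = [4, 5, 5]
assessor_b = [1, 2, 2]

r = pearson(assessor_a, assessor_b)
gap = mean_absolute_difference(assessor_a, assessor_b)

print(f"correlation: {r:.3f}")
print(f"average score gap: {gap:.1f} points")
```

Under this assumption the two score sets correlate perfectly because each of the second assessor's scores is exactly three points lower than the first assessor's, a constant shift that a correlation coefficient ignores.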

Although we share the plaintiffs’ concerns over these aspects of the assessment center portion of the examination, we would be hesitant to hold that the district court erred in sustaining the validity of the interview and training portions of the examination. These latter portions more closely comply with the spirit of the Guidelines than does either the multiple choice examination or the fire scene simulation. But these sections do not stand alone. They are a part of an overall selection procedure which we hold to be inconsistent with the Guidelines. We include our comments only for the purpose of guiding the parties in formulating a new selection procedure that will be entirely consistent with the Guidelines.

IV.

EQUITABLE RELIEF

A. Interim Relief.

Having held that the appellants are likely to succeed on the merits, we must remand this matter to the district court.21 Because St. Louis promoted twenty-four firefighters without awaiting a determination of whether its selection procedure was content valid, the question of the scope of preliminary injunctive relief is a very difficult one. We do not hesitate to require interim relief to ameliorate the effects of the twenty-four promotions, especially in view of the harm that may accrue to the careers of qualified black candidates due to St. Louis’s untimely promotions. See note 21 supra. However, in light of the already chaotic and demoralized condition of the fire department, we do not feel that the demotion of the twenty-three whites and one black is an appropriate alternative.

The only other available alternative is to require that a number of blacks be promoted to the position of fire captain and that their effective promotion date be the same as that of the twenty-four previously appointed captains. While such promotions will undoubtedly also contribute to the tension within the department, we feel that the history of discrimination against blacks in the St. Louis Fire Department requires such affirmative relief. Firefighters Institute v. City of St. Louis, Mo., supra, 588 F.2d at 240-241. See Albemarle Paper Co. v. Moody, supra, 422 U.S. at 418, 95 S.Ct. 2362; United States v. City of Chicago, 573 F.2d 416, 429 (7th Cir. 1978); United States v. City of Chicago, 549 F.2d 415, 436-437 (7th Cir.), cert. denied, 434 U.S. 875, 98 S.Ct. 225, 54 L.Ed.2d 155 (1977); Kirkland v. New York St. Dept. of Correctional Serv., 520 F.2d 420, 429-430 (2d Cir. 1975), cert. denied, 429 U.S. 823, 97 S.Ct. 73, 50 L.Ed.2d 84 (1976).

We next must determine the number of black firefighters to be promoted to fire captain and the standard by which those to be promoted will be selected. Although there is no ideal solution, it appears that the best solution is to require St. Louis to immediately promote eight additional black firefighters to the position of fire captain. The promotion of eight additional black firefighters will bring the percentage of black fire captains to 16.6%, which will be an important step towards equating the proportion of black fire captains to that of black firefighters in the department.22 There is also no ideal solution to the question of which black firefighters should receive these promotions. In our view, the assessment center portion of the examination comes the closest to comporting with the Guidelines and would, thus, be the fairest basis for the selection of the eight black firefighters.

B. Longer-Term Relief.

In Section III of this opinion, we held that St. Louis had not established the content validity of its examination. Although the district court heard the evidence on a motion for preliminary relief, the parties presented, and the court considered, exhaustive evidence contesting and supporting the issue of validity. We are not aware of any excluded evidence in support of validity which would be brought forward in the final hearing on the merits of the action. It appears that the district court viewed its decision as a final determination of the merits of the action. In light of these facts, we are inclined to consider our opinion dispositive of the question of validity. If, however, the district court is convinced that St. Louis can offer additional, noncumulative evidence to prove the validity of the examination under this Court’s interpretation of the Uniform Guidelines, the district court may receive such evidence and make its decision on the merits of the case on the entire record. If not, however, St. Louis has no recourse but to develop a new examination for the position of fire captain which is fully consistent with this opinion and the Guidelines, and to do so promptly. The United States and FIRE are obligated to participate with St. Louis in developing such an examination. FIRE has a further obligation to encourage its members to participate actively in the formulation of a valid examination. It would be highly inappropriate for FIRE to “lie in the weeds” until the results of a new examination are known. As we have indicated earlier, their participation in the preparation of an examination does not bar their right to contest the validity of the examination if it proves to have an adverse impact.

All of the parties and their experts should work together to formulate an examination that will be fully consistent with the Guidelines and this opinion. The district court should, upon the request of FIRE and the white firefighters’ organization, require St. Louis to pay a reasonable fee to experts who can assist each of the groups in the preparation of a valid examination.

This leaves yet another question for our determination. St. Louis stated at oral argument that sixty-two vacancies presently exist in the fire captain position, and fifty-eight additional vacancies are predicted to occur over the next two years.23 St. Louis has filled twenty-four of the sixty-two present vacancies, and we have required it to fill eight more in this opinion.

St. Louis has predicted additional problems with department morale if promotions are made from an examination which has not been validated, and we share its concern over this possibility. Nevertheless, we believe, for several reasons, that the present vacancies must be filled promptly and at least thirty of those which arise between now and the development of an eligibility list from a properly validated examination must be filled as they occur. First, St. Louis has urged the importance of restoring the department to full strength, stressing that the efficiency of its firefighting services has been jeopardized by the lack of permanent fire captains. Second, as we previously stated, permitting additional delays in the promotional process will increase the harm which will accrue to black candidates by delaying their possible promotions to a higher rank. Third, a field of candidates eligible for promotion exists because all the firefighters who completed the assessment center portion of the examination had met the department’s requirement of five years’ satisfactory service (“an average rating of ‘Adequate’ or above on the most recent Service Rating”) as a firefighter or fire prevention inspector. Moreover, all of the persons who took the assessment center portion of the examination had received a passing grade in the multiple-choice portion of the examination. Fourth, we suspect there has been a certain amount of foot-dragging by St. Louis,24 which has benefited financially by leaving many of the positions of fire captain vacant and using firefighters to perform the fire captain’s duties. Fifth, we feel that the new examination can be prepared, given and graded before more than thirty additional vacancies occur.

We, therefore, feel that the only satisfactory solution to this problem lies in the immediate promotion of firefighters to fill the existing vacancies, and to fill the first additional thirty vacancies as they occur. St. Louis must fill these vacancies on the basis of one black firefighter for each two white firefighters that are promoted to the position of fire captain.25 These firefighters are to be chosen for promotion on the basis of their rank on the assessment center portion of the examination.

V.

SUMMARY

The district court’s order of June 2, 1979, is vacated and the matter is remanded to the district court, which shall take the following action:

1. It shall order the City of St. Louis to promptly appoint to the position of fire captain those eight black firefighters who received the highest scores on the assessment center portion of the examination, with effective promotion dates of April 30, 1979.

2. It shall order St. Louis to promote firefighters to fill the remaining thirty existing vacancies and the first thirty additional vacancies that occur prior to a final resolution of this matter in accordance with Section IV of this opinion.

3. It shall promptly determine whether it will receive additional evidence of the validity of the examination under the Guidelines.

4. If it decides it will not receive additional evidence of the validity of the examination, it shall

(a) order the parties and intervenors to proceed to promptly develop a valid examination; and
(b) require St. Louis, on the request of FIRE and the intervenors, to make reasonably sufficient funds available to FIRE and the intervenors to permit expert assistance for each group.

5. If it decides it will receive additional evidence of the validity of the examination, it shall hold its hearing on this issue as soon as practicable. If it then determines that the examination is invalid, it shall enter an order in conformance with paragraph 4 above.

1. The United States consented to these promotions for two reasons: First, the eight firefighters had already undergone the required physical examinations and the United States preferred not to disappoint their expectations at that late stage. Second, the fire department had requisitioned thirty-three additional fire captain appointments, and the United States felt that the appropriate number of black candidates could be assured immediate promotions.

2. The court’s December 19, 1978, order read as follows:

IT IS * * * ORDERED * * * that the City of St. Louis utilize best efforts to develop a properly validated examination for the position of fire captain by January 1, 1979. The Court shall assist in the expedition of resolving any parties’ objections to said examination.

Its January 26, 1978, order read:

If defendants wish to assert the validity of an examination, they shall submit to counsel for all parties at least sixty days prior to any intended use, evidence of the validity of the elements of the process, including a copy of the validation study and all underlying documents or data concerning the development of the selection process and its validity. If the parties are unable to agree upon the validity of the process, the process shall not be utilized unless and until the Court determines, upon motion and such evidentiary hearing as it deems appropriate, that the process has been properly validated.

The report to the court, filed February 1, 1979, detailed the parties’ agreement to proceed with the administration of the examination while reserving the right to challenge its validity. For some unexplained reason, the report was not brought to the court’s attention. Given the specific language in the court’s orders, the United States and FIRE were remiss in their failure to file their objections prior to the date the examination was scheduled to be administered. There is, however, some justification for this failure. An after-the-fact examination of the test results is essential to a determination of the examination’s adverse impact, and a selection process which has no adverse impact is usually not held to be in violation of Title VII and does not need to be validated. See Uniform Guideline 1B, 43 Fed.Reg. 38,296 (1978); Albemarle Paper Co. v. Moody, 422 U.S. 405, 425, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975); Moore v. Southwestern Bell Telephone Co., 593 F.2d 607, 608 (5th Cir. 1979).

The agreement to delay a challenge to the validity of the examination, however, does not justify FIRE’s failure to participate in the development of the examination or to encourage its members’ participation. While we recognize that FIRE has only limited funds and while we understand that it is expensive to employ an expert in industrial psychology or a related field, FIRE could have asked the court to require St. Louis to provide money for it to hire an expert. We also find unpersuasive FIRE’s argument that its participation in the process would have operated as a waiver of its right to challenge the examination. FIRE could certainly have participated while preserving its right to challenge the use of the examination in light of any adverse impact.

We agree with the district court that the hardening of attitudes by St. Louis, FIRE and the white intervenors has exacerbated the already serious problems of tension and low morale within the fire department, but this hardening of attitude is not a reason to deny the black firefighters relief if the examination does not comport with the guidelines.

3. Four government agencies, including the Equal Employment Opportunity Commission, have adopted a revision of those guidelines since our earlier opinion. Uniform Guideline 1A, 43 Fed.Reg. 38,296 (1978). The parties agree that these revised Uniform Guidelines control this action.

4. In deciding to deny a preliminary injunction, the district court applied the “traditional standard” under which the plaintiffs were obligated to establish (1) a strong probability that they would succeed on the merits at trial, and (2) a likelihood that they would suffer irreparable harm without preliminary injunctive relief. Doran v. Salem Inn, Inc., 422 U.S. 922, 931, 95 S.Ct. 2561, 45 L.Ed.2d 648 (1975); Young v. Harris, 599 F.2d 870 (8th Cir. 1979). We accept this standard for the purposes of this appeal. We reverse the district court because we believe that certain of its factual findings were unsupported in the record and because it made errors of law which are set forth in detail in the opinion.

5. These figures include the twenty-four promotions which have already been made on the basis of this examination.

6. Section 4D, 43 Fed.Reg. 38,297 (1978), provides:

A selection rate for any race * * * which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact.

We find no merit to St. Louis’s contention that the pool of black applicants was too small to exhibit adverse impact under the Guidelines. Harper v. Trans World Airlines, Inc., 525 F.2d 409 (8th Cir. 1975), is inapposite because the selection pool in that case consisted of only five persons while sixty-five blacks took the written examination in this case. Moore v. Southwestern Bell Telephone Co., 593 F.2d 607 (5th Cir. 1979), relied on by the appellees, is also inapposite because no adverse impact under the 80% rule was shown where the black selection rate was approximately 93% that of the whites. We likewise find no merit to St. Louis’s contention that the selection pool was inappropriate or atypical because of the prior court-ordered appointment of twelve blacks. Twelve whites were also appointed and the pool of black qualified applicants may have been larger had blacks not been discriminated against in hiring by the fire department for many years.
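The four-fifths comparison quoted from Section 4D reduces to a short calculation. What follows is a minimal illustrative sketch only; it is not part of the opinion or of the Guidelines. The function names and the applicant counts are hypothetical and were chosen purely to show the arithmetic, not to reproduce any figure in the record.

```python
def impact_ratio(selected_a, total_a, selected_b, total_b):
    """Return the lower group selection rate divided by the higher one.

    All four arguments are hypothetical counts supplied by the caller.
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    highest = max(rate_a, rate_b)
    if highest == 0:
        raise ValueError("no candidate was selected from either group")
    return min(rate_a, rate_b) / highest


def indicates_adverse_impact(ratio, threshold=0.8):
    """Apply the four-fifths (eighty percent) rule of thumb to a ratio."""
    return ratio < threshold


# Hypothetical pool: 10 of 50 selected in one group, 30 of 60 in the other.
example_ratio = impact_ratio(10, 50, 30, 60)
example_flag = indicates_adverse_impact(example_ratio)
```

Under this sketch, a ratio of roughly 93%, such as the one described in Moore, falls above the four-fifths threshold and would not be flagged. Section 4D speaks only of what will "generally" be regarded as evidence, so the threshold is a rule of thumb and not a conclusive test.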

7. If the written portion of the test alone is considered, the selection rate for blacks would only be 20.1% that of whites. If only the assessment center portion is considered, the black selection rate would be 54.3% that of whites.

8. If the written portion of the test is considered alone, the selection rate for blacks would only be 42.7% that of whites. If the assessment center portion only is considered, the black selection rate would be 75.6% that of whites.

9. Although the selection procedure included physical, year-in-service and satisfactory grade requirements as well as the examination, we do not understand the plaintiffs to be challenging any portions of the procedure other than the two-part examination.

10. Neither St. Louis’s good intentions nor the size of its expenditure is determinative of the issue of whether the examination is content valid. See United States v. San Diego County, 20 EPD 30,154 (S.D.Cal. July 6, 1979).

11. “[As] the setting and manner of the administration of the selection procedure less resemble the work situation, * * * the less likely the selection procedure is to be content valid, and the greater [is] the need for other evidence of validity.” Uniform Guideline 14C(4) (emphasis added). 43 Fed.Reg. 38,302 (1978).

12. A test of knowledges and abilities may be used if it measures a representative sample of knowledges, skills or abilities that (a) are necessary to performance of that job, and (b) are operationally defined in Guideline 14C(4). Uniform Guideline 14C(1). 43 Fed.Reg. 38,302 (1978). If a knowledge is to be measured, it must be defined in terms of behavior and it must be part of a body of learned information that is actually used in and necessary for critical or necessary job behaviors that are observable. Uniform Guideline 14C(4), 43 Fed.Reg. 38,302 (1978). If an ability is to be measured, the ability must be defined in terms of observable aspects of job behavior and should be an ability actually used in and necessary for the performance of critical or important work behaviors. Id. Any selection procedure measuring an ability should closely approximate an observable work behavior. Id.

13. The plaintiffs do not contest the adequacy of the description of job tasks and behaviors under the Guidelines.

14. “Where a selection procedure supported solely or primarily by content validity is used to rank job candidates, the selection procedure should measure those aspects of performance which differentiate among levels of job performance.” Uniform Guideline 14C(9), 43 Fed.Reg. 38,303 (1978).

“The evidence of both the validity and utility of a selection procedure should support the method the user chooses for operational use of the procedure, if that method of use has a greater adverse impact than another method of use. Evidence which may be sufficient to support the use of a selection procedure on a pass/fail (screening) basis may be insufficient to support the use of the same procedure on a ranking basis under these guidelines.” Uniform Guideline 5G, 43 Fed.Reg. 38,298 (1978).

15. These Questions and Answers were adopted by the four agencies that promulgated the Uniform Guidelines. Their intent is to “interpret and clarify, but not to modify, the provisions of the Uniform Guidelines.” 44 Fed.Reg. 11,996 (1979). Such interpretations are entitled to great deference by the court, especially where, as here, the Guidelines themselves are highly technical and somewhat difficult for those untrained in test construction to comprehend.

“Since this involves an interpretation of an administrative regulation a court must necessarily look to the administrative construction of the regulation if the meaning of the words used is in doubt. * * * [T]he ultimate criterion is the administrative interpretation, which becomes of controlling weight unless it is plainly erroneous or inconsistent with the regulation.”

Udall v. Tallman, 380 U.S. 1, 16, 17, 85 S.Ct. 792, 801, 13 L.Ed.2d 616 (1965), citing Bowles v. Seminole Rock & Sand Co., 325 U.S. 410, 413-414, 65 S.Ct. 1215, 89 L.Ed. 1700 (1945). See generally K. DAVIS, Administrative Law Treatise § 7.22 (2d ed. 1979).

16. Webster’s New World Dictionary defines “empirical” as:

1. relying or based solely on experiment and observation rather than theory [the empirical method] 2. relying or based on practical experience without reference to scientific principles [an empirical remedy].

See Western Addition Community Organization v. Alioto, 360 F.Supp. 733, 736 n.5 (N.D. Cal. 1973).

17. We do not rest our decision on the validity of the multiple-choice test on Dr. Richard Barrett’s testimony that the demonstrated reliability coefficient of .84 was not sufficiently high for a valid examination. Mr. Edmund Knowles, the expert witness for St. Louis, testified that a reliability coefficient of .84 was sufficient and we do not quarrel with the district court’s finding that the test was reliable. However, a showing of reliability does not end the matter. All such an analysis shows is that those persons who did the best on the examination tended to do the best on most of the same questions and that those who did less well on the test tended to do less well on most of the same questions. The principal criticism of the multiple choice test is that there is no showing that those persons who received the highest scores on the test would, in fact, perform the best as fire captains.

18. For example, Questions 126 and 130 require knowledge of the diameter of certain water mains and of the number of pumping stations in St. Louis, and Questions 62 and 63 require knowledge of certain elements of window and stair construction. Captain Austin testified that these are not items of knowledge which are necessary for the performance of a fire captain’s duties. Examples of questions of which technical criticisms were made include Question 90, which refers to the phases of a fire in terms that Captain Austin testified are not commonly used by fire captains, and Question 86, which contains double negatives that make it difficult to comprehend.

19. The Guidelines provide that the validity of a selection procedure may be shown by criterion-related validity studies, content validity studies or construct validity studies. 29 C.F.R. § 1607.5 (1979). Here, the City did not rely on criterion-related studies or construct studies but attempted to establish content validity. As noted above, the circumstances of a multiple-choice examination are so dissimilar from the work situation as to make it impossible to establish job relatedness through a content validity study. Similarly, requiring written answers to the simulation test questions tends to vitiate any showing of content validity.

20. Each assessor evaluated sixteen candidates, and three assessors simultaneously evaluated each candidate’s performance on the simulation. Each group of three assessors met one to three days later to attempt to reach a consensus on the candidate’s performance on each portion of the evaluation.

21. In denying plaintiffs’ motion for a preliminary injunction, the district court assumed without deciding that the denial of preliminary relief would constitute irreparable harm. In our view, the plaintiffs have demonstrated irreparable injury as a matter of law. Twenty-three white firefighters have been promoted on the basis of an examination which has not been validated. The promotion of still more firefighters on the basis of the present eligibility list, nearly all of whom would be white, would work a great injustice on the black fire captain candidates whose promotions have been delayed for years. The effects of these continual delays in the promotions of blacks in the department do not end at the fire captain position. At oral argument, counsel for the United States pointed out that an additional five years of service is required as a prerequisite for eligibility for promotion to the battalion chief position. Since the promotional examination for that position is given fairly infrequently, any delay in the promotions of blacks will adversely affect their chances of further advancement in the department.

22. Presently, 22% of the department’s approximately 900 uniformed personnel are black, and 27% of those in the entry-level firefighter position are black.

23. The number of expected vacancies in the position in the near future is greater than usual because of a recent change in the department’s retirement rules.

24. We extensively discussed the City’s recalcitrance in our last opinion, Firefighters Institute v. City of St. Louis, Mo., 588 F.2d 235, 240-241 (1978), cert. denied, 443 U.S. 904, 99 S.Ct. 3096, 61 L.Ed.2d 872 (1979).

25. If all these promotions are made on this basis, blacks will hold approximately 26% of the fire captain positions.