This is a class action under Title VII of the Civil Rights Act of 1964 on behalf of the 39 women who failed the physical agility test given by the Evanston fire department to applicants for firefighting jobs in 1983. Eighty-five percent of the women who took the test failed (only seven percent of the men failed) and were thereby disqualified, and there are no women among Evanston’s 106 firefighters although at one time there were two. The test is conceded to have had a “disparate impact” on women. So unless the test (more specifically the method of scoring it, which is the focus of the plaintiff’s attack) serves a legitimate interest of the employer, it violates Title VII. The district judge found a violation and gave judgment for the class. 695 F.Supp. 922 (N.D.Ill.1988). The city appeals, challenging the finding of liability; the plaintiff also appeals, challenging the adequacy of the equitable relief that the judge ordered. The case was tried before the Supreme Court’s decision in Wards Cove Packing Co. v. Atonio, ___ U.S. ___, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989), changed the ground rules for disparate impact litigation. In another case decided today, Allen v. Seidman, 881 F.2d 375 (7th Cir.1989), we have offered our interpretation of Wards Cove, and rather than repeat it here we shall simply apply it.
The physical agility test that the Evanston Fire Department used in 1983 (and also in 1981 and 1985) consisted of a group of tasks which were to be performed consecutively by each applicant without a break, while wearing a firefighter’s uniform. The tasks were: climbing to the top of a 70-foot ladder; climbing an extension ladder twice while carrying a hose pack; removing a ladder from a firetruck, carrying the ladder to a wall, leaning it up against the wall, and then removing it and returning it to the truck; connecting a hose to a fire hydrant, turning the hydrant on and off, and disconnecting the hose; and dragging a section of hose filled with water fifty feet, dragging a tarpaulin to the top of a hill, carrying the tarp through ten tires, and again dragging a section of hose filled with water fifty feet. The test was timed. The mean time in 1983 was 628 seconds, and the Fire Department chose one standard deviation above this mean as the passing score, with the result that anyone who took more than 767 seconds to complete the test flunked.
The physical agility test is only the first hurdle an applicant must clear to become a firefighter. Next come tests of intelligence and of psychological stability, and in the end only nine of the 839 persons who applied for firefighter jobs in 1983 were hired — all men. The fire department’s choice of one standard deviation above the mean as the passing score was not consistent. In 1985 the passing score was 915 seconds, which was 2.8 standard deviations above the mean for that year. In 1981 the passing score had been 890 seconds, which was 1.7 standard deviations above the mean, but had been raised in order to enable three of the four women who took the test to pass it.
The district judge found that the test itself was fine, and this is unquestionably correct under the relaxed standard of Wards Cove and is not seriously contested by the plaintiff. The test was designed by firefighters, consists of tasks that faithfully imitate the tasks that firefighters are called on to perform in their work, tests for speed, skill, endurance in — in a word, aptitude for — performing those tasks, and was pretested on the Evanston firefighter force before being given to applicants. It seems clearly related to the employer’s legitimate need for physically strong firefighters, and the plaintiff has suggested no alternative that would serve that need as well yet be less difficult for women. A similar test was upheld in Berkman v. City of New York, 812 F.2d 52, 59-60 (2d Cir.1987); see also the earlier opinion in that case reported at 705 F.2d 584, 592 n. 10 (2d Cir.1983).
The rub is in the scoring of the test. Since men are on average stronger and faster than women, the higher the passing score on a test such as Evanston’s physical agility test (that is, the shorter the time in which it must be completed) the smaller the percentage of women likely to pass it. To satisfy its burden of producing evidence that the test — which means all aspects of the test including the method of scoring it — served a legitimate employer purpose, the city was obliged to produce evidence that the method of determining who passed the test in 1983 was related to the city’s need for a physically capable firefighting force. Cf. Guardians Association v. Civil Service Commission, 630 F.2d 79, 105-06 (2d Cir.1980); Thomas v. City of Evanston, 610 F.Supp. 422, 431 (N.D.Ill.1985).
The city did produce evidence relating to this question but it consisted of little more than testimony that one standard deviation above the mean is a frequent cut-off point on tests and that the cut-off point for the physical agility test was generous to the candidates and quite possibly should have been lower. It is not surprising that Judge Zagel was not persuaded by this evidence. The choice of one standard deviation above the mean was a decision to pass 84 percent of the test takers, and this meant that the passing score would depend on the average performance of those who happened to take it. But the ability to perform firefighting tasks adequately depends not on relative but on absolute test performance. If one year all the applicants were superbly fit, it would be irrational to disqualify the entire bottom 16 percent. For it is not only physical abilities that the fire department is after, as is made plain by the fact that no preference is given to candidates who do exceptionally well on the physical agility test, as opposed to those who barely pass it. The department wants firefighters who are intelligent and stable, as well as strong and swift. If it cuts off from further consideration persons who are perfectly able physically (although less so than some other applicants who may, however, be their inferiors in intelligence and stability), it is shooting itself in the foot. No explanation was offered, moreover, for why different pass rates were selected in 1981 and 1985, the effect being to enlarge markedly the time allowed to complete the test compared to what it had been in 1983. There was some evidence that the weather was bad in 1985, but that would explain only why the mean would be higher, not why the department would allow a higher number of standard deviations above the mean.
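The arithmetic behind these figures can be checked with a short sketch. It assumes that completion times are roughly normally distributed, which is the assumption under which a cut-off one standard deviation above the mean passes about 84 percent of test takers; the record as described here does not establish the actual distribution. The function name `normal_cdf` and the variable names are illustrative only.

```python
import math


def normal_cdf(z: float) -> float:
    """Share of a standard normal population falling at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Figures stated in the opinion for 1983 (seconds).
MEAN_1983 = 628
CUTOFF_1983 = 767

# The cut-off was set one standard deviation above the mean,
# so the implied standard deviation is the difference.
implied_sd_1983 = CUTOFF_1983 - MEAN_1983

# Cut-offs expressed in standard deviations above each year's mean,
# as reported in the opinion.
cutoffs_in_sd = {1981: 1.7, 1983: 1.0, 1985: 2.8}

# Expected share of test takers finishing within the cut-off,
# ASSUMING normally distributed times (an assumption, not a finding).
pass_share = {year: normal_cdf(z) for year, z in cutoffs_in_sd.items()}

if __name__ == "__main__":
    print(f"Implied 1983 standard deviation: {implied_sd_1983} seconds")
    for year in sorted(pass_share):
        print(f"{year}: about {pass_share[year]:.1%} expected to pass")
```

On that assumption the 1983 cut-off excludes roughly the slowest 16 percent of whoever happens to take the test, while the more generous cut-offs of 1981 and 1985 exclude far fewer, which is the inconsistency for which no explanation was offered.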
One would think the rational way of scoring the physical agility test would be to determine the maximum time in which a firefighter who had no training or practice — for remember that the test is for applicants — ought to be able to complete the test, and make that the cut-off point. Applicants who passed the test would then take the other two tests (intelligence and stability), which presumably would have their own cut-offs. Among those who passed all three tests, those whose composite score, weighted by the relative importance of the tests, was the highest would be hired. This was not the procedure followed by the Evanston Fire Department (the record is unclear on what procedure was followed), and no satisfactory reasons for departing from it were presented.
So feeble was the city’s effort to justify the cut-off point for the physical agility test that it can be argued that the city did not even carry its burden of production. But we reject this argument. The plaintiff had challenged the entire test, and the city put in a great deal of justificatory evidence which succeeded in justifying everything about the test except the scoring method. That was enough to satisfy the burden of production and shift inquiry to whether, with all the evidence in, the plaintiff proved — since after Wards Cove it is the plaintiff that has the burden of persuasion — that the test, because of its method of scoring, did not serve the legitimate ends of the employer but instead unreasonably excluded women. On this issue the district judge should have first crack, for while it is quite likely that he thought the scoring method no good, we are given pause by the following statement in his opinion: “I do not say that the 1983 cut-off score of 767 seconds is unjustifiable. I say it was not justified by Evanston.” 695 F.Supp. at 929 n. 3. This is, or at least may be, the language of burden of persuasion; and if that is what it is, the judge (quite understandably, considering the state of the law when he decided the case) placed the burden on the wrong party. There must be a remand for reconsideration in light of Wards Cove, and Judge Zagel may if he wants take additional evidence to help him answer the question whether the plaintiff has shown that the method of scoring the test was unreasonable under the standard of that decision.
If Judge Zagel again finds the method of scoring to be unlawful, the issue of the appropriate scope of equitable relief will recur, so we shall consider the plaintiff’s challenge to his order now. The judge ordered the city to submit for his consideration a new test (or rather a new method of scoring the old test), and neither side questions that relief. But he refused to order the city to hire any of the members of the plaintiff class or even to allow them to advance to the next test. The plaintiff argues that those class members whose times of completing the 1983 test were within the passing range on the 1985 test (915 seconds, compared to only 767 in 1983) should be excused from having to retake the physical agility test and be allowed to move on to the other tests.
Judge Zagel was within his remedial discretion in declining to take this step. When he issued his order, it was five years since the class members had taken the physical agility test, and they offered no evidence that their agility had not declined in the interim. The importance of competent firefighting to the safety of the people of Evanston, as well as of the firefighters themselves, whose safety depends in part anyway on each other’s physical fitness and agility, justified the city in insisting that all applicants have taken the test in the recent rather than remote past. An equity court must always consider the possible impact of a decree on innocent third parties. See United States v. City of Chicago, 870 F.2d 1256, 1262-63 (7th Cir.1989), and cases cited there.
While there is much talk in the cases about “make whole” relief, see, e.g., Albemarle Paper Co. v. Moody, 422 U.S. 405, 418-22, 95 S.Ct. 2362, 2372-74, 45 L.Ed.2d 280 (1975); Franks v. Bowman Transportation Co., 424 U.S. 747, 762, 96 S.Ct. 1251, 1263, 47 L.Ed.2d 444 (1976), this talk has reference to cases where it is reasonably clear that, had it not been for the discriminatory behavior, the plaintiff would have got (or retained) the job or other employment benefit in issue, and where making the plaintiff whole would not unduly injure innocent third parties. Cf. Martin v. Wilks, ___ U.S. ___, 109 S.Ct. 2180, 104 L.Ed.2d 835 (1989). The second condition would be met in this case if the relief took the form of backpay, but not the first. As only 1.2 percent of the applicants who passed the Evanston fire department’s physical agility test were actually hired, what the class members lost was not a job but a long-shot chance at a job. They will be restored to the place they would have occupied if they pass a new physical agility test approved by the district court. Depending on their performance on that test and on the other tests required of applicants, they may eventually be in a position to show that but for unfair scoring of the 1983 test they would have been hired in 1983, and if so they can then claim additional backpay. The judge awarded a tiny amount of backpay, computed as follows: women were 4.68 percent of the applicants in 1983, so he awarded them an amount equal to 4.68 percent of the salary of those who were hired. In awarding this amount the judge was discounting backpay by (a measure of) the probability that the women would have obtained the pay but for the discriminatory scoring of the physical agility test; the analogy is to damages for loss of life expectancy, on which see DePass v. United States, 721 F.2d 203, 206-10 (7th Cir.1983) (dissenting opinion), and references cited there.
Whether this is a proper form of equitable relief either generally or under Title VII, and if so whether the judge used a proper measure of the probability that the women would have been hired, are not issues we need resolve, since the defendants do not challenge the judge’s award of backpay. However, the additional relief that the women have sought and the judge denied probably is premature and certainly was not required as a matter of law.
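The discounting method described above amounts to multiplying a salary figure by a probability of having been hired. A minimal sketch follows. The 4.68 percent figure comes from the opinion; the salary amount is purely hypothetical, since the opinion states none, and the function name `discounted_backpay` is illustrative. The sketch takes no position on whether the women's share of applicants is the right measure of that probability, a question left open here.

```python
# Share of 1983 applicants who were women, as stated in the opinion.
WOMEN_SHARE_OF_APPLICANTS = 0.0468

# HYPOTHETICAL salary figure for illustration only; not from the record.
HYPOTHETICAL_SALARY = 25_000.0


def discounted_backpay(salary: float, probability: float) -> float:
    """Backpay discounted by the probability the pay would have been earned."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return salary * probability


# Discounted award on the hypothetical salary.
award = discounted_backpay(HYPOTHETICAL_SALARY, WOMEN_SHARE_OF_APPLICANTS)

if __name__ == "__main__":
    print(f"Discounted award on the hypothetical salary: {award:.2f}")
```

A different measure of the probability of hire would simply replace the second argument; the structure of the calculation is unchanged.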
The case is remanded for further consideration in light of this opinion. Circuit Rule 36 shall not apply on remand.
Vacated and Remanded.