dissenting.
I concur in this difficult case as to the scope of our remand in Brunet I. I believe, however, that there was insufficient evidence to support the district court’s finding that ranking applicants based on a hiring device that weighted the CAT more heavily than the PCT was not “substantially equally valid” to ranking applicants based on a hiring device that weights the two tests equally. I therefore DISSENT.
I. STANDARD OF REVIEW
On remand, the district court concluded that “a procedure that selects applicants who are ranked by giving the CAT greater weight than the PCT is not substantially equally valid as a procedure that chooses applicants ranked by giving equal weight to the CAT and PCT.” In support of its finding, the district court relied on Dr. Landy’s 1985 content validity study1 and Dr. Landy’s testimony at an evidentiary hearing on May 21, 1992. At the hearing, Dr. Landy restated his conclusions from the content validity study and explained that the firefighters’ test equally weighted the CAT and PCT because the job analysis (i.e., content validity study)2 indicated that the skills necessary to perform the job were equally divided between cognitive and physical attributes. Noting that the fifty-fifty split was not selected arbitrarily, Dr. Landy concluded that, if the CAT was weighted sixty or seventy percent, it “would be clearly open to objection because it [would] contradict[ ] the job analysis.” Though both the majority and district court opinions treat Dr. Landy’s testimony and content validity study as separate pieces of evidence, I find the substance of the testimony and study to be indistinguishable. Dr. Landy’s testimony added nothing to the content validity evidence in the job analysis, which merely showed that physical and cognitive skills are evenly represented in the job of a firefighter.
The district court’s factual findings cannot be set aside unless clearly erroneous. Fed.R.Civ.P. 52(a). Clear error exists “ ‘when although there is evidence to support [a finding], the reviewing court on the entire evidence is left with the definite and firm conviction that a mistake has been committed.’ ” Anderson v. City of Bessemer City, 470 U.S. 564, 573, 105 S.Ct. 1504, 1511, 84 L.Ed.2d 518 (1985) (quoting United States v. United States Gypsum Co., 333 U.S. 364, 395, 68 S.Ct. 525, 542, 92 L.Ed. 746 (1948)).
This court cannot reverse the district court “simply because it is convinced that it would have decided the case differently.” Id. “Where there are two permissible views of the evidence, the factfinder’s choice between them cannot be clearly erroneous.” Id. at 574, 105 S.Ct. at 1511. Thus, we must affirm if the content validity evidence supports a reasonable inference that giving greater weight to the CAT would render the firefighters’ test less predictive of future performance than the present test.
II. SCOPE OF REMAND
At the outset, it is necessary to identify the precise issue that was before the district court on remand. It is undisputed that the PCT has an adverse impact on female applicants when used for rank-order hiring. Thus, under Uniform Guideline § 1607.3 A, the City had to demonstrate that ranking candidates based on their PCT scores was job related. It did so to the district court’s satisfaction, and we affirmed. See 1 F.3d 390, 410-11 (6th Cir.1993), cert. denied, — U.S. —, 114 S.Ct. 1190, 127 L.Ed.2d 540 (1994). As we noted in Brunet I, however, the inquiry does not end there. Under Uniform Guideline § 1607.3 B, the City had to demonstrate that ranking candidates based on a hiring device that weighted the CAT more heavily than the PCT was not “substantially equally valid” to a hiring procedure that weighted the CAT and PCT equally.
How, then, could the City satisfy its burden of proof? The district court concluded that the City had justified equal weighting because the evidence in the record showed that “the abilities necessary to properly perform the job of a firefighter are evenly distributed between physical and cognitive abilities.” I find the district court’s reasoning flawed as a matter of logic. Evidence of content validity only establishes that a hiring device “is representative of important aspects of performance on the job.” 29 C.F.R. § 1607.16 D (emphasis added). The Uniform Guidelines provide that
[e]vidence which may be sufficient to support the use of a selection procedure on a pass/fail ... basis may be insufficient to support the use of the same procedure on a ranking basis.... Thus, if a user decides to use a selection procedure on a ranking basis, and that method of use has a greater adverse impact than use on an appropriate pass/fail basis[,] ... the user should have sufficient evidence of validity and utility to support use on a ranking basis.
Id. § 1607.5 G. Therefore, where ranking is at issue, the focus is not on whether a hiring device is representative of the job but whether it properly measures those aspects of future performance that distinguish the superior employee from the average employee. See Gilbert v. City of Little Rock, 799 F.2d 1210, 1215 (8th Cir.1986). “A test may have enough validity for making gross distinctions between those qualified and unqualified for a job, yet may be totally inadequate to yield passing grades that show positive correlation with job performance.” Guardians Ass’n of New York City Police Dep’t v. Civil Serv. Comm’n, 630 F.2d 79, 100 (2d Cir.1980), cert. denied, 452 U.S. 940, 101 S.Ct. 3083, 69 L.Ed.2d 954 (1981). Simply because a hiring procedure facially mirrors the distribution of skills needed to perform a particular job does not necessarily indicate that the selection procedure will differentiate between those skills that contribute most to superior performance. Cf. Williams v. Vukovich, 720 F.2d 909, 924 (6th Cir.1983). Indeed, the Uniform Guidelines require that, “[w]here a selection procedure supported solely or primarily by content validity is used to rank job candidates, the selection procedure should measure those aspects of performance which differentiate among levels of job performance.” 29 C.F.R. § 1607.14 C(9). Dr. Landy’s content validity study clearly supports the conclusion that only fifty percent of the skills needed to be a firefighter are cognitive. To reiterate, however, simply because cognitive and physical skills are both prerequisites to performance of the job does not negate the possibility that cognitive skills more often distinguish the superior firefighter from the merely qualified firefighter. Therefore, the City could not simply show on remand that the firefighters’ test, as it is presently configured, is representative of the job (i.e., content valid).
Likewise, the City could not rely on evidence which merely established that giving equal weight to the CAT and PCT was valid for ranking purposes. In other words, the City could not simply show that better performance on the present test correlates with better performance on the job. The issue on remand was not whether equal weighting was predictive of job performance, but whether equal weighting was the only weighting that would yield substantially the same degree of predictability. Even if the content validity study conclusively established — and I do not believe it did — that ranking candidates based on a hiring device that gives equal weight to the CAT and PCT is predictive of varying levels of job performance, the study is not probative as to whether ranking applicants based on a hiring device that gives greater weight to the CAT is less predictive.
Instead, the City had to demonstrate that ranking candidates based on a hiring device that weighted the CAT fifty-five or sixty percent, for example, is not as predictive of future performance as equal weighting. There was but one way to accomplish this end. The City had to offer evidence that cognitive and physical abilities equally contribute to superior performance as a firefighter.
III. CONTENT VALIDITY EVIDENCE
The district court approved equal weighting of the CAT and PCT because it found that giving greater weight to the CAT “would violate the concept that the content of the examination should be representative of the content of the job.” Implicit in the district court’s conclusion is the assumption that cognitive and physical ability equally contribute to superior performance on the job. Although Dr. Landy’s content validity study suggests that cognitive and physical skills are equally represented in the job of a firefighter, it does not demonstrate that cognitive and physical skills equally contribute to superior performance.
In Brunet I, we approved the use of rank-order hiring based on an applicant’s PCT score. See 1 F.3d at 411. In doing so, however, we expressly declined to rely on Dr. Landy’s 1985 content validity study. Id. at 410. Instead, we concluded that the results of Dr. Landy’s 1987 concurrent criterion-related study3 provided a sufficient statistical correlation between PCT scores and future job performance to permit the City to rank applicants based on their PCT scores. Id. at 411. Thus, the content validity study merely demonstrated that physical and cognitive skills are equally necessary to perform the job of a firefighter. I am aware of no evidence, however, that the job analysis established that physical and cognitive skills equally contribute to superior job performance. I believe that, in Brunet I, we declined to rely on the content validity study to support rank-order hiring, because the job analysis did not indicate which skills are more important in distinguishing superior performance.
Thus, while evidence of content validity is relevant to show that the firefighters’ test is representative of the job, it does not support the district court’s conclusion that ranking candidates based on a hiring device that gives greater weight to the CAT than the PCT is less predictive of future performance than the present test. As a consequence, I cannot agree that the content validity evidence was probative on remand.
I dissent with some reluctance because this controversy has been dragging on for far too long and my disagreement with the majority is on a technical and complex issue. Nevertheless, the City had an affirmative obligation, under Uniform Guideline § 1607.3 B, to demonstrate that giving greater weight to the CAT would render the firefighters’ test less predictive of future performance than the present test. Yet, the content validity evidence relied upon by the district court does not support that conclusion. Moreover, the majority acknowledges that there was substantial evidence in the record — evidence which the district court failed to even consider on remand — which tended to support the Brunet plaintiffs’ claim that cognitive skills are more predictive of future performance than physical skills. For example, the 1980 job analysis performed by the City’s chief of testing (the Ingram/Kriska Job Analysis) and the concurrent criterion-related validity report (Landy Report) seem to show that cognitive skills are more related to superior performance than either physical skills or a combination of both. Indeed, the Landy Report contradicts Dr. Landy’s own testimony.
Despite this empirical evidence, the majority affirms the district court’s finding because Dr. Landy testified that assigning greater weight to the CAT would violate the job analysis and, impliedly, render the test unrepresentative of the job. Dr. Landy’s testimony would indeed be probative if there was some evidence indicating that the job analysis demonstrated that physical and cognitive skills equally contribute to superior performance as a firefighter. I am unaware, however, of any evidence in the record to that effect. The 1985 job analysis only establishes which skills are prerequisites for the job and does not draw conclusions about which skills distinguish the superior from the adequate firefighter. Furthermore, while we agreed in Brunet I that the Landy Report provided a sufficient basis for rank-order hiring from the PCT, we never held that the Landy Report supported equal weighting. Id. In fact, we noted, in ordering remand, that the CAT was arguably more predictive of superior job performance. Id. at 412.
Under the circumstances, I believe evidence of content validity was not enough. The problem in this case is not with the quantity of the evidence, but rather with the character of the evidence. Absent some proof that the job analysis showed that physical and cognitive ability equally contribute to superior performance, I would remand to the district court with instructions to hold a hearing at which the parties could present evidence on the single question of whether cognitive skills disproportionately contribute to superior performance as a firefighter.
1. A content validity study establishes that a hiring device “is representative of important aspects of performance on the job for which the candidates are to be tested.” 29 C.F.R. § 1607.5 B (emphasis added).
2. The Uniform Guidelines define a job analysis as a “detailed statement of work behaviors and other information relevant to the job.” 29 C.F.R. § 1607.16 K.
3. Criterion-related validity studies show that “the selection procedure is predictive of or significantly correlated with important elements of job performance.” 29 C.F.R. § 1607.5 B.