OPINION OF THE COURT
MANSMANN, Circuit Judge.
In this appeal, we must determine the appropriate legal standard to apply when evaluating an employer’s business justification in an action challenging an employer’s cutoff score on an employment screening exam as discriminatory under a disparate impact theory of liability. We hold today that under the Civil Rights Act of 1991, a discriminatory cutoff score on an entry level employment examination must be shown to measure the minimum qualifications necessary for successful performance of the job in question in order to survive a disparate impact challenge. Because we find that the District Court did not apply this standard in evaluating the employer’s business justification for its discriminatory cutoff score in this case, we will reverse the District Court’s judgment and remand for reconsideration under this standard. In light of our decision to remand on this basis, we need not reach the parties’ other assertions of error.
I.
This appeal comes to us from a judgment entered by the District Court in favor of the Southeastern Pennsylvania Transportation Authority (“SEPTA”) after a twelve day bench trial in January of 1998. Although the parties generally do not dispute the facts relevant to this appeal, to the extent there are favorable inferences to be drawn, we must draw them in favor of SEPTA as the prevailing party. In addition, because we must not disturb the factual findings of the District Court unless clearly erroneous, much of the following background is adopted from the facts as found by the District Court in its extensive memorandum opinion. See Lanning v. Southeastern Pennsylvania Transp. Auth., 1998 WL 341605, at *1-*52 (E.D.Pa. June 25, 1998).
A.
SEPTA is a regional mass transit authority that operates principally in Philadelphia, Pennsylvania. In 1989, in response to a perceived need to upgrade the quality of its transit police force, SEPTA initiated an extensive program designed to *482 improve the department. As part of this program, SEPTA dedicated its transit officers primarily to patrolling the subways and limited their responsibilities to serve as guards at other SEPTA property. In addition, SEPTA increased the number of its officers from 96 to 200 and introduced a “zone concept” for the areas they patrol.1 SEPTA also began to consider methods by which it might upgrade the physical fitness level of its police officers.
In 1991, SEPTA hired Dr. Paul Davis to develop an appropriate physical fitness test for its police officers.2 Dr. Davis initially met with SEPTA officials in order to ascertain SEPTA’s objectives. Dr. Davis determined that SEPTA was interested in enhancing the level of fitness, physical vigor and general productivity of its police force. Once Dr. Davis had determined SEPTA’s objectives, he went on a ride-along with SEPTA transit police and, over the course of two days and approximately twenty hours, rode the SEPTA trains in order to obtain a perspective on the expectations of SEPTA transit officers.
Dr. Davis next conducted a study with twenty experienced SEPTA officers, designated “subject matter experts” (SMEs), in an effort to determine what physical abilities are required to perform the job of SEPTA transit officer. From the responses Dr. Davis received in this study, he determined that running, jogging, and walking were important SEPTA transit officer tasks and that SEPTA officers were expected to jog almost on a daily basis.
Dr. Davis then asked the SMEs to determine what level of physical exertion was necessary to perform these tasks. The SMEs estimated that it was reasonable to expect them to run one mile in full gear in 11.78 minutes. Dr. Davis rejected this estimate as too low based upon his determination that any individual could meet this requirement. Ultimately, Dr. Davis recommended a 1.5 mile run within 12 minutes. Dr. Davis explained that completion of this run would require that an officer possess an aerobic capacity of 42.5 mL/kg/min, the aerobic capacity that Dr. Davis determined would be necessary to perform the job of SEPTA transit officer.3
Dr. Davis recommended that SEPTA use the 1.5 mile run as an applicant screening test. Dr. Davis understood that SEPTA officers would not be required to run 1.5 miles within 12 minutes in the course of their duties, but he nevertheless recommended this test as an accurate measure of the aerobic capacity necessary to perform the job of SEPTA transit police officer. Based upon Dr. Davis’ recommendation, SEPTA adopted a physical fitness screening test for its applicants which included a 1.5 mile run within 12 minutes. Beginning in 1991, the 1.5 mile run was administered as the first component of the physical fitness test; if an applicant failed to run 1.5 miles in 12 minutes, the applicant would be disqualified from employment as a SEPTA transit officer.
It is undisputed that for the years 1991, 1993, and 1996, an average of only 12% of women applicants passed SEPTA’s 1.5 mile run in comparison to the almost 60% *483 of male applicants who passed.4 For the years 1993 and 1996, the time period in question in this litigation, the pass rate for women was 6.7% compared to a 55.6% pass rate for men. In addition, research studies confirm that a cutoff of 12 minutes on a 1.5 mile run will have a disparately adverse impact on women.5 SEPTA concedes that its 1.5 mile run has a disparate impact on women.
In conjunction with the implementation of its physical fitness screening test, SEPTA also began testing incumbent officers for aerobic capacity in 1991. SEPTA policy requires any officer who fails any portion of the incumbent fitness test to retest on the failed element within three months. For each portion of the physical fitness test that an incumbent officer fails, an interim goal is set for that officer.
SEPTA initially disciplined those incumbent officers who failed the fitness test. Due to protests by the incumbent officers’ union, however, SEPTA discontinued its discipline policy and instead implemented an incentive program that rewarded incumbent officers for passing their interim fitness goals.
According to SEPTA’s internal documents, significant percentages of incumbent officers of all ranks have failed SEPTA’s physical fitness test.6 By 1996, however, 86% of incumbent officers reached SEPTA’s physical fitness standards. SEPTA has never taken any steps to determine whether incumbent officers who have failed the physical fitness test have adversely affected SEPTA’s ability to carry out its mission.
SEPTA has promoted incumbent officers who have failed some or all of the components of the physical fitness test. SEPTA has also given special recognition, commendations, and satisfactory performance evaluations to incumbent officers who have failed the physical fitness test. SEPTA has never disciplined, terminated, removed, reassigned, suspended or demoted any transit officer for failing to perform the physical requirements of the job.
In addition, due to a clerical error, SEPTA hired a female officer in 1991 who failed the 1.5 mile run. This officer has subsequently been “decorated” by SEPTA and has been nominated repeatedly for awards such as Officer of the Year and Officer of the Quarter. SEPTA has commended her for her outstanding performance as a police officer and has chosen her to serve as one of SEPTA’s two defensive tactics instructors.
SEPTA employs an extremely low number of women in its transit police force. The District Court found that, as of July 1997, SEPTA employed only 16 women in its 234 member police force. Only two of these women hold ranks higher than that of patrol officer. See Lanning, 1998 WL 341605 at *27.
*484 B.
On January 28, 1997, after satisfying all administrative prerequisites, five women who failed SEPTA’s 1.5 mile run brought a Title VII class action against SEPTA on behalf of all 1993 female applicants, 1996 female applicants and future female applicants for employment as SEPTA police officers who have been or will be denied employment by reason of their inability to meet the physical entrance requirement of running 1.5 miles in 12 minutes or less. On February 18, 1997, the Department of Justice, after conducting the appropriate investigation of SEPTA’s employment practices and meeting all conditions precedent under Title VII, also filed suit on behalf of the United States challenging SEPTA’s entire physical fitness test, including the 1.5 mile run. The District Court properly exercised jurisdiction over these Title VII actions challenging SEPTA’s hiring practices pursuant to 28 U.S.C. § 1331. On April 21, 1997, the District Court consolidated the two actions for all purposes up to and including trial.
After litigation commenced, SEPTA hired expert statisticians to submit reports examining the statistical relationship between the aerobic capacity of SEPTA’s officers and their number of arrests, “arrest rates”7 and number of commendations. In these reports, the statisticians concluded that there was a statistically significant correlation between high aerobic capacity and arrests, arrest rates and commendations. In addition, one expert prepared a report that estimated that 51.9% of the persons arrested for serious crimes between 1991 and 1996 had an aerobic capacity of 48 mL/kg/min and 27% of those arrested had an aerobic capacity of less than 42 mL/kg/min.8 Based upon these reports, the District Court held that SEPTA established that its aerobic capacity requirement is job related and consistent with business necessity. See Lanning, 1998 WL 341605 at *35.
The District Court also found support for this conclusion in an expert report submitted on behalf of SEPTA by Dr. Robert Moffatt. Dr. Moffatt simulated a training course and concluded that officers with aerobic capacities of 45 mL/kg/min or better had a 7-8% decrement in their ability to perform physical activities after a run of approximately three minutes; officers with an aerobic capacity of less than 45 mL/kg/min exhibited a 30% decrement in physical ability after the same run. The District Court found that Dr. Moffatt’s study demonstrates “the manifest relationship of aerobic capacity to the critical and important duties of a SEPTA transit police officer.... ” Id. at *68.
The District Court entered judgment in favor of SEPTA on all claims. Both the individual plaintiffs and the United States have taken appeals from the District Court’s final judgment, over which we have jurisdiction pursuant to 28 U.S.C. § 1291. On appeal, the individual plaintiffs assert that the District Court applied incorrect legal standards in evaluating SEPTA’s business necessity defense and that the District Court made erroneous findings of fact in determining that SEPTA’s 1.5 mile run does not violate Title VII. Although the United States initially challenged SEPTA’s implementation of its entire physical fitness test, on appeal the United States joins the individual plaintiffs in asserting error solely with respect to the District Court’s determination that SEPTA’s 1.5 mile run is not violative of Title VII. Because the issue of whether the District Court applied the correct legal *485 standard is one of law, our review is plenary.
II.
Under Title VII’s disparate impact theory of liability, plaintiffs establish a prima facie case of disparate impact by demonstrating that application of a facially neutral standard has resulted in a significantly discriminatory hiring pattern. See Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S.Ct. 2720, 53 L.Ed.2d 786 (1977). Once the plaintiffs have established a prima facie case, the burden shifts to the employer to show that the employment practice is “job related for the position in question and consistent with business necessity....” 42 U.S.C. § 2000e-2(k). Should the employer meet this burden, the plaintiffs may still prevail if they can show that an alternative employment practice has a less disparate impact and would also serve the employer’s legitimate business interest. See Albemarle Paper Co. v. Moody, 422 U.S. 405, 425, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975).
Because SEPTA concedes that its 1.5 mile run has a disparate impact on women, the first prong of the disparate impact analysis is not at issue in this appeal.9 Rather, this appeal focuses our attention on the proper standard for evaluating whether SEPTA’s 1.5 mile run is “job related for the position in question and consistent with business necessity” under the Civil Rights Act of 1991. Because the Act instructs that this standard incorporates only selected segments of prior Supreme Court jurisprudence on the business necessity doctrine, we examine the history of this doctrine in order to resolve this threshold issue.
A.
The disparate impact theory of discrimination under Title VII was judicially created in the seminal case of Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971). In embracing disparate impact, the Court recognized that Title VII was meant not only to proscribe overt discrimination, but also to prohibit “practices that are fair in form, but discriminatory in operation.” Griggs, 401 U.S. at 431, 91 S.Ct. 849. The Court made clear that what is required by Title VII is “the removal of artificial, arbitrary, and unnecessary barriers to employment when the barriers operate invidiously to discriminate on the basis of racial or other impermissible classification.” Id. Accordingly, the Court announced that in evaluating practices fair in form but discriminatory in operation, “[t]he touchstone is business necessity.” Id.
The Court, however, was unclear in articulating what an employer must show to demonstrate business necessity. The Court couched the employer’s burden in terms of showing that its practice is “related to job performance”; “bear[s] a demonstrable relationship to successful performance of the jobs for which it was used”; has “a manifest relationship to the employment in question”; and is “demonstrably a reasonable measure of job performance.” Id. at 431, 432, 436, 91 S.Ct. 849. In applying this standard, however, the Court rejected the employer’s justification in Griggs that its standardized intelligence tests and diploma requirements generally would improve the overall quality of the work force in its power plant. The Court held that, although these requirements may be useful, they could not be used to exclude disproportionately a protected group when the employer failed to show that they do not test an applicant’s ability to perform the job in question. Id. at 431-33, 91 S.Ct. 849.
*486 The Court next spoke to the issue of business necessity in Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975). In Albemarle, an employer sought to justify the use of verbal exam and high school diploma requirements in determining whether to promote employees to more skilled positions in its paper mill. Albemarle, 422 U.S. at 408-11, 95 S.Ct. 2362. In preparation for trial, the employer hired an industrial psychologist to complete validation studies showing that the tests were job related because they had a statistically significant correlation with supervisorial ratings in several groups of the jobs in question. Id. at 429-30, 95 S.Ct. 2362. The Court, nevertheless, rejected the employer’s contention that its requirements were job related.
The Court held that “discriminatory tests are impermissible unless shown, by professionally acceptable methods, to be ‘predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.’” Id. at 431, 95 S.Ct. 2362 (quoting 29 CFR § 1607.4(c)). In so holding, the Court noted that the Equal Employment Opportunity Commission (EEOC) Guidelines for professional standards of test validation are entitled to great deference in determining whether an employer has demonstrated that its requirements are job related. Id. at 430-31, 95 S.Ct. 2362. The Court rejected the employer’s validation studies as inadequate in several respects under the EEOC Guidelines. For example, the Court rejected the studies because they focused on the most qualified employees near the top of the line of progression, stating:
The fact that the best of those employees working near the top of a line of progression score well on a test does not necessarily mean that that test, or some particular cutoff score on the test, is a permissible measure of the minimal qualifications of new workers entering lower level jobs.
Id. at 434, 95 S.Ct. 2362. The Court accordingly held that consideration must be given to the possible use of testing as a promotion device rather than as a screen for entry into lower level jobs. Id. Due to several inadequacies of the employer’s validation studies, the Court held that the employer had failed to show that its requirements were job related to the position in question. Id. at 435-36, 95 S.Ct. 2362.
The next Title VII case to raise the business necessity issue for the Court’s consideration was Dothard v. Rawlinson, 433 U.S. 321, 97 S.Ct. 2720, 53 L.Ed.2d 786 (1977).10 In Dothard, female applicants challenged a prison’s minimum height and weight requirements for its prison guard positions as violative of Title VII. On the issue of business necessity, the Court made clear that “a discriminatory employment practice must be shown to be necessary to safe and efficient job performance to survive a Title VII challenge.” Dothard, 433 U.S. at 332 n. 14, 97 S.Ct. 2720. The Court rejected the prison’s assertion that height and weight requirements have a relationship to the unspecified amount of strength essential to effective job performance, holding that if strength is a bona fide job related quality, the prison could test for it directly by adopting and validating a fairly administered strength test. Id. at 331-32, 97 S.Ct. 2720.
The Court’s next definitive statement on the business necessity doctrine is found in Wards Cove Packing Co., Inc. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989), where a majority of the Court deviated from its previous business necessity jurisprudence in adopting a more *487 liberal test for business necessity.11 According to the Court:
[T]he dispositive issue is whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer. The touchstone of this inquiry is a reasoned review of the employer’s justification for his use of the challenged practice. A mere insubstantial justification in this regard will not suffice, because such a low standard of review would permit discrimination to be practiced through the use of spurious, seemingly neutral employment practices. At the same time, though, there is no requirement that the challenged practice be “essential” or “indispensable” to the employer’s business for it to pass muster....
Wards Cove, 490 U.S. at 659, 109 S.Ct. 2115 (citations omitted). In addition, the Court made clear that at the business necessity stage of Title VII litigation, the employer bears only the burden of production; the burden of persuasion remains on the disparate impact plaintiff at all times. Id. As we have previously recognized, the Wards Cove standard may reasonably be viewed as a departure from the more stringent business necessity standard under Griggs and its progeny. See Newark Branch, N.A.A.C.P. v. Town of Harrison, New Jersey, 940 F.2d 792, 803 (3d Cir.1991)(noting that Wards Cove “arguably diluted the business necessity burden” under Griggs).
B.
In response to Wards Cove, Congress enacted the Civil Rights Act of 1991. One of the primary purposes of the Act was “to codify the concepts of ‘business necessity’ and ‘job related’ enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).” Civil Rights Act of 1991, Pub L. No. 102-166, § 3, 105 Stat. 1071, 1071 (1992). As part of this codification of Griggs, the Act made clear that both the burden of production and the burden of persuasion in establishing business necessity rest with the employer. See 42 U.S.C. § 2000e-2(k).
In addition, the Act codified the business necessity doctrine, by using the following language:
An unlawful employment practice based on disparate impact is established under this subchapter only if—
(i) a complaining party demonstrates that a respondent uses a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin and the respondent fails to demonstrate that the challenged practice is job related for the position in question and consistent with business necessity; or
(ii) the complaining party makes the demonstration described in subparagraph (C) with respect to an alternative employment practice and the respondent refuses to adopt such alternative employment practice.
*488 42 U.S.C. § 2000e-2(k)(1)(A) (emphasis added). The Act further instructs that in interpreting its business necessity language, “[n]o statements other than the interpretive memorandum ... shall be considered legislative history of, or relied upon in any way as legislative history....” Civil Rights Act of 1991, Pub L. No. 102-166, § 105(b), 105 Stat. 1071, 1075 (1992). The interpretive memorandum referenced in this portion of the Act states in relevant part:
The terms “business necessity” and “job related” are intended to reflect the concepts enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).
137 Cong. Rec. 28,680 (1991). After the passage of the Act, proponents of both a strict test for business necessity and a more liberal requirement claimed victory in the standard adopted by the Act.12
III.
The Supreme Court has yet to interpret the “job related for the position in question and consistent with business necessity” standard adopted by the Act. In addition, our sister courts of appeals that have applied the Act’s standard to a Title VII challenge have done so with little analysis. See, e.g., Fitzpatrick v. City of Atlanta, 2 F.3d 1112, 1117-18 (11th Cir.1993)(noting that Civil Rights Act of 1991 statutorily reversed Wards Cove but ruling in favor of employer because practice was demonstrably necessary to meet an “important business goal”); Bradley v. Pizzaco of Nebraska, Inc., 7 F.3d 795, 797-98 (8th Cir.1993)(noting that Griggs standard was reinstated by the Act and holding that employer failed to meet Griggs standard).
Because the Act proscribes resort to legislative history with the exception of one short interpretive memorandum endorsing selective caselaw, our starting point in interpreting the Act’s business necessity language must be that interpretive memorandum. The memorandum makes clear that Congress intended to endorse the business necessity standard enunciated in Griggs and not the Wards Cove interpretation of that standard. By Congress’ distinguishing between Griggs and Wards Cove, we must conclude that Congress viewed Wards Cove as a significant departure from Griggs. Accordingly, because the Act clearly chooses Griggs over Wards Cove, the Court’s interpretation of the business necessity standard in Wards Cove does not survive the Act.13
*489 We turn now to articulate the standard for business necessity, one most consistent with Griggs and its pre-Wards Cove progeny. The laudable mission begun by the Court in Griggs was the eradication of discrimination through the application of practices fair in form but discriminatory in practice by eliminating unnecessary barriers to employment opportunities. In the context of a hiring exam with a cutoff score shown to have a discriminatory effect, the standard that best effectuates this mission is implicit in the Court’s application of the business necessity doctrine to the employer in Griggs, i.e., that a discriminatory cutoff score is impermissible unless shown to measure the minimum qualifications necessary for successful performance of the job in question. Only this standard can effectuate the mission begun by the Court in Griggs; only by requiring employers to demonstrate that their discriminatory cutoff score measures the minimum qualifications necessary for successful performance of the job in question can we be certain to eliminate the use of excessive cutoff scores that have a disparate impact on minorities as a method of imposing unnecessary barriers to employment opportunities.
The evolution of the Court’s articulation of the business necessity doctrine in both Albemarle and Dothard reinforces the conclusion that this standard is both implicit in Griggs and central to its mission. In Albemarle, the Court explained that discriminatory tests must be validated to show that they are “predictive of ... important elements of work behavior which comprise ... the job ... for which candidates are being evaluated” and that the scores of the higher level employees do not necessarily validate a cutoff score for the minimum qualifications to perform the job at an entry level. Albemarle, 422 U.S. at 431, 434, 95 S.Ct. 2362. This is simply another way of saying that discriminatory cutoff scores must be validated to show they measure the minimum qualifications necessary for successful performance of the job. Similarly, in Dothard, the Court made clear that “a discriminatory employment practice,” such as a discriminatory cutoff score on an entry level exam, “must be shown to be necessary to safe and efficient job performance to survive a Title VII challenge.” Dothard, 433 U.S. at 332 n. 14, 97 S.Ct. 2720.
Taken together, Griggs, Albemarle and Dothard teach that in order to show the business necessity of a discriminatory cutoff score an employer must demonstrate that its cutoff measures the minimum qualifications necessary for successful performance of the job in question. Furthermore, because the Act instructs us to interpret its business necessity language in conformance with Griggs and its pre-Wards Cove progeny, we must conclude that the Act’s business necessity language incorporates this standard.
Our conclusion that the Act incorporates this standard is further supported by the business necessity language adopted by the Act. Congress chose the terms “job related for the position in question” and “consistent with business necessity.” Judicial application of a standard focusing solely on whether the qualities measured by an entry level exam bear some relationship to the job in question would impermissibly write out the business necessity prong of the Act’s chosen standard. With respect to a discriminatory cutoff score, the business necessity prong must be read to demand an inquiry into whether the score reflects the minimum qualifications necessary to perform successfully the job in question. See also EEOC Guidelines, 29 C.F.R. § 1607.5(H) (noting that cutoff scores should “be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”).
In addition, Congress’ decision to emphasize the importance of the policies underlying the disparate impact theory of discrimination through its codification supports application of this standard to discriminatory cutoff scores. The disparate *490 impact theory of discrimination combats not intentional, obvious discriminatory policies, but a type of covert discrimination in which facially neutral practices are employed to exclude, unnecessarily and disparately, protected groups from employment opportunities. Inherent in the adoption of this theory of discrimination is the recognition that an employer’s job requirements may incorporate societal standards based not upon necessity but rather upon historical, discriminatory biases.14 A business necessity standard that wholly defers to an employer’s judgment as to what is desirable in an employee therefore is completely inadequate in combating covert discrimination based upon societal prejudices.
Only a business necessity doctrine that examines discriminatory cutoff scores in light of the minimum qualifications that are necessary to perform the job in question successfully can address adequately this subtle form of discrimination.15
Accordingly, we hold that the business necessity standard adopted by the Act must be interpreted in accordance with the standards articulated by the Supreme Court in Griggs and its pre-Warcfe Cove progeny which demand that a discriminatory cutoff score be shown to measure the minimum qualifications necessary for the successful performance of the job in question in order to survive a disparate impact challenge.16
*491IV.
Although the District Court purported to apply the Act’s “job related to the position in question and consistent with business necessity” standard to SEPTA’s cutoff score on its 1.5 mile run, it is clear from the District Court’s memorandum opinion that it did not apply the standard we have found to be implicit in Griggs and incorporated by the Act. The District Court rejected the formulation of the Griggs standard found in Dothard, characterizing it as dicta, and relied instead upon language found in New York City Transit Auth. v. Beazer, 440 U.S. 568, 99 S.Ct. 1355, 59 L.Ed.2d 587 (1979). As our prior discussion makes clear,17 the Beazer language is dicta and the Dothard standard is binding under the Act. Moreover, the Beazer dicta upon which the District Court relied mirrors the standard adopted by Wards Cove. Compare Banning, 1998 WL 341605 at *54 (noting that in Beazer, the Court “implicitly approves employment practices that significantly serve, but are neither required by nor necessary to, the employer’s legitimate business interests”) with Wards Cove, 490 U.S. at 659, 109 S.Ct. 2115 (stating . that standard is “whether a challenged practice serves, in a significant way, the legitimate employment goals of the employer” and noting that there is no requirement that the practice be essential). As we previously stated, the-Wards Cove standard does not survive the Act.
The District Court’s application, of its understanding of business necessity to SEPTA’s business justification further illustrates that the District Court did not apply the correct legal standard. As an initial matter, the District Court seemed to conclude that Dr. Davis’ expertise alone is sufficient to justify the 42.5 mL/kg/min aerobic capacity cutoff measured by the 1.5 mile run.18 .This conclu*492sion disregards the teachings of Griggs, Albemarle and Dothard in which the Court made clear that judgment alone is insufficient to validate an employer’s discriminatory practices.19 More fundamentally, however, nowhere in its extensive opinion did the District Court consider whether Dr. Davis’ 42.5 mL/kg/min cutoff reflects the minimum aerobic capacity necessary to perform successfully the job of SEPTA transit police officer.
Instead, the District Court upheld this cutoff because it was “readily justifiable.” Lanning, 1998 WL 341605 at *57.20 The validation studies of SEPTA’s experts upon which the District Court relied to support this conclusion demonstrate the extent to which this standard is insufficient under the Act. The general import of these studies is that the higher an officer’s aerobic capacity, the better the officer is able to perform the job. Setting aside the validity of these studies, this conclusion alone does not validate Dr. Davis’ 42.5 mL/kg/ min cutoff under the Act’s business necessity standard.21 At best, these studies show *493that aerobic capacity is related to the job of SEPTA transit officer. A study showing that “more is better,” however, has no bearing on the appropriate cutoff to reflect the minimal qualifications necessary to perform successfully the job in question.
Dr. Siskin’s testimony is particularly instructive on this point. Dr. Siskin testified that in view of the linear relationship between aerobic capacity and the arrest parameters, any cutoff score can be justified since higher aerobic capacity levels will get you more field performance (i.e., “more is better”). See Lanning, 1998 WL 341605 at *41. Under the District Court’s understanding of business necessity, which requires only that a cutoff score be “readily justifiable,” SEPTA, as well as any other employer whose jobs entail any level of physical capability, could employ an unnecessarily high cutoff score on its physical abilities entrance exam in an effort to exclude virtually all women by justifying this facially neutral yet discriminatory practice on the theory that more is better.22 This result contravenes Griggs and demonstrates why, under Griggs, a discriminatory cutoff score must be shown to measure the minimum qualifications necessary to perform successfully the job in question.23
*494V.
For the foregoing reasons, it is clear to us that the District Court did not employ the business necessity standard implicit in Griggs and incorporated by the Act which requires that a discriminatory cutoff score be shown to measure the minimum qualifications necessary for successful performance of the job in question in order to survive a disparate impact challenge. We will therefore vacate the judgment of the District Court and remand this appeal for the District Court to determine whether SEPTA has carried its burden of establishing that its 1.5 mile run measures the minimum aerobic capacity necessary to perform successfully the job of SEPTA transit police officer.24 Because this is the first occasion we have had to clarify the Act’s business necessity standard, on remand the District Court may wish to exercise its discretion to allow the parties to develop further the record in keeping with the standard announced here.
. Under the zone concept, SEPTA designated eight separate zones covering the subway system. In a typical zone, one Lieutenant is assigned to command the zone. Two Sergeants are also assigned to the zone. Three shifts of officers per day tour the zone. Beats within the zones are assigned to the individual officers. Beats are reassigned periodically to familiarize the officers with the entire zone. Officers patrol their beats alone and on foot.
. Dr. Davis is an expert exercise physiologist who has extensive experience in designing physical fitness employment tests for various law enforcement agencies.
. Dr. Davis initially decided that an aerobic capacity of 50 mL/kg/min was necessary to perform the job of SEPTA transit police officer. After determining that institution of such a high standard would have a draconian effect on women applicants, however, Dr. Davis decided that the goals of SEPTA could be satisfied by using a 42.5 mL/kg/min standard.
. SEPTA contends that it did not seek applicants in 1992. Credited testimony was offered, however, that each of the six or seven women who took the 1.5 mile test in 1992 failed. Relying on this testimony, the District Court found that the disparate impact on women was slightly more pronounced than the 1991, 1993, and 1996 figures reflect. See Lanning, 1998 WL 341605 at *28.
. For example, one proffered study showed that approximately 47% of men between the ages of 20 to 29 can perform a 1.5 mile run in 12 minutes where only 12% of women in the same age category can achieve this time. As noted by the District Court, testimony was offered that this study may not be entirely reliable because the women who participated in the study were predominately white women of higher socioeconomic status. Other research studies, however, were offered which show that men generally have a higher aerobic rate than women due to physiological differences between the sexes.
. The District Court pointed to one document, for example, indicating that between July 1, 1994 and August 22, 1995, the percentage of uniformed personnel who failed the fitness test was as follows: a) Age group 20-30: 10% of all officers; b) Age group 30-40: 30% of all officers and 12% of all supervisors; c) Age group 40-50: 45% of all officers and 52% of all supervisors; d) Age group 50-60: 55% of all officers and 40% of all supervisors. See Lanning, 1998 WL 341605 at *31.
. “Arrest rates” were tabulated by expressing the number of arrests made by an officer as a percentage of the number of incident reports involving that officer. See App. at 3040-41 (Siskin Expert Report).
. The category of “serious crimes” includes homicide, rape, robbery, aggravated assault, burglary, theft, and auto theft. This category of arrests accounts for approximately ten percent of all reported incidents and seven percent of all reported arrests. See App. at 3040 (Siskin Expert Report).
. On appeal, SEPTA offered evidence to establish that the individual female applicants who failed SEPTA’s 1.5 mile run demonstrated a cavalier attitude in preparing for and taking the test. As aptly noted by plaintiffs’ counsel at oral argument, this evidence has no bearing upon our analysis in this appeal because SEPTA has conceded that its test has a severe disparate impact on women.
. Prior to Dothard, the Court included some language related to the business necessity doctrine in Washington v. Davis, 426 U.S. 229, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976), an equal protection case. Because Washington is not a Title VII case, however, we cannot treat the language in Washington as reflective of the pre-Wards Cove business necessity doctrine applicable to Title VII cases.
. Two cases prior to Wards Cove forecast some of the changes to come. In New York City Transit Auth. v. Beazer, 440 U.S. 568, 99 S.Ct. 1355, 59 L.Ed.2d 587 (1979), the Court disposed of a Title VII case by holding that the plaintiffs failed to establish a prima facie case of disparate impact. The Court, however, commented on the business necessity doctrine in dicta. In a footnote, the Court stated that even if a prima facie case had been established, the employer would have shown business necessity by establishing that its practice significantly serves its legitimate business goals of safety and efficiency. Beazer, 440 U.S. at 587 n. 31, 99 S.Ct. 1355. Similarly, a plurality opinion in Watson v. Fort Worth Bank & Trust, 487 U.S. 977, 108 S.Ct. 2777, 101 L.Ed.2d 827 (1988), suggested that employers could meet their burden of establishing business necessity simply by advancing a legitimate business reason for the practice in question. Watson, 487 U.S. at 998, 108 S.Ct. 2777. While the language in these cases clearly foreshadowed the Court’s holding in Wards Cove, this language had never been embraced by a majority of the Court as the binding standard for business necessity prior to Wards Cove.
. See Andrew C. Spiropoulos, Defining the Business Necessity Defense to the Disparate Impact Cause of Action: Finding the Golden Mean, 74 N.C. L.Rev. 1479, 1516-20 (1996)(outlining the respective positions of both sides to the debate); compare also Michael Carvin, Disparate Impact Claims Under the New Title VII, 68 Notre Dame L.Rev. 1153 (1993)(arguing that Wards Cove is still good law after Civil Rights Act of 1991); with Susan S. Grover, The Business Necessity Defense in Disparate Impact Discrimination Cases, 30 Ga. L.Rev. 387 (1996)(arguing for a strict business necessity standard under the Act); Note, The Civil Rights Act of 1991: The Business Necessity Standard, 106 Harv. L.Rev. 896 (1993)(asserting that Wards Cove does not survive the Act).
. We are cognizant that a contrary argument has been advanced in which it is asserted that Wards Cove remains the controlling standard. See Carvin, supra note 12, at 1157-64. Pursuant to the argument, the business necessity standard announced in Wards Cove simply clarified Griggs and therefore is not inconsistent with the Act’s command to apply the standard enunciated in Griggs. In addition, it is asserted that due to the legislative history of the Act, it would be improper to apply a strict business necessity standard. This argument, however, ignores two important aspects of the Act which constrain our interpretation of the standard adopted. First, the interpretive memorandum’s distinction between Griggs and Wards Cove casts significant doubt on the assertion that Congress read Wards Cove as simply a clarification of Griggs. Second, the Act precludes us from considering the legislative history upon which this argument relies for support. Accordingly, we find this argument to be devoid of merit.
. For an interesting discussion on male-oriented biases in the labor market see Maxine N. Eichner, Getting Women Work That Isn’t Women’s Work: Challenging Gender Biases in the Workplace Under Title VII, 97 Yale L.J. 1397 (1988). See also, Hurley v. The Atlantic City Police Dept., 174 F.3d 95, 104 n. 5 (3d Cir.1999)(noting egregious sexual harassment to which a female police officer was subjected by her male colleagues); Mazus v. Department of Transp., Com. of Pa., 629 F.2d 870, 876 (3d Cir.1980)(Sloviter, J., dissenting)(noting allegations demonstrating prevalent male attitude that construction work is not the “type of work” women should perform).
. We need not be concerned that implementation of this standard will result in forcing employers to adopt quotas, a result that would be inconsistent with the mandates of Title VII. If an employer can demonstrate that its discriminatory cutoff score reflects the minimum qualifications necessary for successful job performance, it will be able to continue to use it. If not, the employer must abandon that cutoff score, but is free to develop either a non-discriminatory practice which furthers its goals, or an equally discriminatory practice that can meet this standard. Nothing in the Griggs business necessity standard requires employers to hire employees in numbers to reflect the ethnic, racial or gender make-up of the community.
The following example based upon the facts of this case illustrates this point. Assuming that SEPTA’s 1.5 mile run has a disparate impact on women and that SEPTA can not show that the 12 minute cutoff measures the minimum aerobic capacity necessary to be a successful transit officer, it does not follow that SEPTA would then be required to hire women in equal proportion to men. Several options would be available to SEPTA. For example, SEPTA could: 1) abandon the test as a hiring requirement but maintain an incentive program to encourage an increase in the officers’ aerobic capacities; 2) validate a cutoff score for aerobic capacity that measures the minimum capacity necessary to successfully perform the job and maintain incentive programs to achieve even higher aerobic levels; or 3) institute a non-discriminatory test for excessive levels of aerobic capacity such as a test that would exclude 80% of men as well as 80% of women through separate aerobic capacity cutoffs for the different sexes. Each of these options would help SEPTA achieve its stated goal of increasing aerobic capacity without running afoul of Title VII and none of these options require hiring by quota.
. Relying upon Spurlock v. United Airlines, Inc., 475 F.2d 216 (10th Cir.1972), and like cases from our sister courts of appeals, the dissent asserts that this standard should not apply to SEPTA because the job of SEPTA transit officer implicates issues of public safety. Under the Act, however, our interpretation of the business necessity language is limited to “the concepts enunciated by the Supreme Court in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), and in the other Supreme Court decisions prior to Wards Cove Packing Co. v. Atonio, 490 U.S. 642, 109 S.Ct. 2115, 104 L.Ed.2d 733 (1989).” See 137 Cong. Rec. 28,680 (1991)(emphasis added). Because the Supreme Court never adopted the holding of Spurlock prior to Wards Cove, it is clear that, under the Act, we are not to consider Spurlock as authoritative. Furthermore, if Congress had intended to endorse *491the holding of Spurlock, it could have done so affirmatively. Accordingly, because the Act limits our interpretation to Supreme Court jurisprudence and does not otherwise endorse Spurlock, we are not at liberty to adopt the holding of Spurlock at this juncture. Moreover, to the extent that Spurlock and other cases from our sister courts of appeals can be read to suggest that minimum qualifications do not apply to certain types of employment, these cases are inconsistent with the teachings of Griggs and are accordingly uninformative under the Act.
Furthermore, to the limited extent that the Supreme Court’s pre-Wards Cove jurisprudence instructs that public safety is a legitimate consideration, application of the business necessity standard to SEPTA is consistent with that jurisprudence because the standard itself takes public safety into consideration. If, for example, SEPTA can show on remand that the inability of a SEPTA transit officer to meet a certain aerobic level would significantly jeopardize public safety, this showing would be relevant to determine if that level is necessary for the successful performance of the job. Clearly a SEPTA officer who poses a significant risk to public safety could not be considered to be performing his job successfully. We are accordingly confident that application of the business necessity standard to SEPTA is fully consistent with the Supreme Court’s pre-Wards Cove jurisprudence as required by the Act.
. See supra note 11.
. While relying predominately upon Dr. Davis’ expertise, the District Court does point to a study which Dr. Davis completed for Anne Arundel County, Maryland in which he concluded that a 42.5 mL/kg/min aerobic capacity predicted success as an Anne Arundel County police officer. Absent a finding that the work of an Anne Arundel County police officer is comparable to SEPTA transit officer work, a finding the District Court did not make, reliance on this validation study is misplaced. See 29 C.F.R. § 1607.7(B)(2); see also 29 C.F.R. § 1607.7(B)(3)(explaining that validation studies created for other employers must also include a study of “test fairness”). Furthermore, it is unclear from Dr. Davis’ report whether the Anne Arundel study’s 42.5 mL/kg/min cutoff actually measures for qualities significant to SEPTA transit police performance. Compare App. at 3134 (Davis Report) (noting that 42.5 mL/kg/min level for Anne Arundel study is significant for carrying an unspecified amount of weight and generally effecting arrests) with App. at 3132 (Davis Report) (stating “[t]ransit police officers are more likely to have incidents come to them, as opposed to responding to the scene of an event. By mission, the presence of the officer *492is that of a deterrent, maintaining maximum visibility. Occasionally, officers will come upon criminal activities such as assaults or robberies, but for the most part, the officer will attempt to control a situation such as disorderly conduct or force compliance (paying fares) without having to make an arrest.”); see also App. at 3139 (Davis Report)(quoting experienced officer as stating “[t]he most important factors in my opinion of being a good officer is to be able to think clearly at all times an [sic] verbalize and or articulate when dealing with all people.... Running quickly is physically demanding, although in the transit system, most dealings are close, physical altercations.”).
In addition, it is unclear from the record whether the Anne Arundel study itself was properly validated.
. The danger of allowing an employer to carry its burden by relying simply upon an expert’s unvalidated judgment as to an appropriate cutoff score in a testing device is illustrated by this case. In determining an appropriate cutoff for aerobic capacity, Dr. Davis rejected the SMEs’ estimate of the minimal qualifications necessary to perform the job even though these SMEs were experienced transit officers. Dr. Davis then determined that “a SEPTA transit officer needs an aerobic capacity of 50 ml/kg/min to successfully perform a number of tasks.” Lanning, 1998 WL 341605 at *16 (emphasis added). Dr. Davis, however, revised this requirement, finding that “the goals of SEPTA could be satisfied by using a 42.5 mL/kg/min standard” after determining that the higher limit would have a “draconian” effect on women. Id. There is no indication in the District Court’s opinion as to how Dr. Davis determined that the lower standard would be sufficient. Where, as here, the cutoff score chosen has a discriminatory disparate impact, Griggs prohibits the establishment of exactly this type of arbitrary barrier to employment opportunities.
. The District Court seems to have derived this standard from the Principles for the Validation and Use of Personnel Selection Procedures (“SIOP Principles”), principles published by the Society for Industrial and Organizational Psychology as a professional guideline for conducting validation research and personnel selection. To the extent that the SIOP Principles are inconsistent with the mission of Griggs and the business necessity standard adopted by the Act, they are not instructive.
. The Court has cautioned that studies done in anticipation of litigation to validate discriminatory employment tests that have already been given must be examined with great care due to the danger of lack of objectivity. Albemarle, 422 U.S. at 433 n. 32, 95 S.Ct. 2362. We also have warned in a disparate impact context that “the story statistics tell depends, not unlike beauty, upon the eye and ear of the beholder” and that “we must apply a critical and cautious ear to one dimensional statistical presentation.” Bryant v. International Sch. Servs., Inc., 675 F.2d 562, 573 (3d Cir.1982). A critical evaluation of the statistical studies relied upon by the District Court in this case reveals several aspects of these studies that we find to be, at a minimum, disconcerting.
The following concerns are only a representative sample of possible deficiencies in these studies: 1) While the ability to make an arrest may be an important aspect of the job, the absolute number of arrests or “arrest rates” do not necessarily correlate with successful job performance. See App. at 3132 (noting that SEPTA officer should generally attempt to control a situation without having to make an arrest); 2) The study on arrests and arrest rates examined a disproportionately large number of officers with an aerobic capacity over 42 mL/kg/min compared to the number of officers with an aerobic capacity under that *493level which likely skewed the results. See, e.g., App. at 3053 (comparing arrests of 231 officers with aerobic capacities under the 42 mL/kg/min with arrests of 813 officers with aerobic capacities over the 42 mL/kg/min); see also, 29 C.F.R. § 1607.14(B)(6)(noting that “[r]eliance upon a selection procedure which is significantly related to a criterion measure, but which is based upon a study involving a large number of subjects and has a low correlation coefficient will be subject to close review if it has a large adverse impact.”); 3) The comparison of aerobic capacity with commendations is not helpful absent a finding as to the subjective considerations involved in awarding commendations. See Albemarle, 422 U.S. at 432-33, 95 S.Ct. 2362; 4) The studies’ emphasis on arrests for “serious crimes” is suspect; these arrests account for only 7% of all arrests and therefore represent only a small aspect of the job. See generally 29 C.F.R. § 1607.14(B)(6)(noting that reliance on single selection instrument which is related to only one of many job duties will be subject to close review); 5) SEPTA’s table on the field performance of its officers belies the contention that there is a strict linear relationship of arrests to aerobic capacity; officers at less than 37 mL/kg/min had average arrests of 13.6 compared to officers with at least a 48 mL/kg/min level who had average arrests of 13.9. See App. at 3065 (Defendant’s Exhibit 52D); 6) The study on the average aerobic capacity of perpetrators has little meaning unless SEPTA can show that arrests of these perpetrators are typically aerobic contests; because SEPTA police are armed, such a showing is unlikely.
Because we are remanding for the District Court to reconsider this evidence in light of the Griggs standard, we need not rule on whether any of the District Court's prior findings as to these studies were clearly erroneous. We comment here on the validity of these studies only to draw the District Court's attention to these concerns and to encourage the District Court to take a critical look at these studies, if necessary, on remand.
. Such a result has the potential to have a significant detrimental impact on the amount and type of employment opportunities available to women. Obviously, under a “more is better” theory, employers such as police departments, fire departments and correctional facilities could develop physical tests with unnecessarily high cutoffs that would effectively exclude women from their ranks. Perhaps less obvious, however, is the impact that this result could have on industries where strength is even minimally related to the job in question. For example, all companies engaged in delivery, construction or any other type of physical labor would be permitted to develop unnecessary strength requirements on the theory that “more is better” or “the stronger the worker, the faster the job gets done.” This result is clearly unacceptable given the policies underlying both Title VII and the disparate impact theory of discrimination.
. This is not to say that studies that actually prove that “more is better” are always irrelevant to validation of an employer’s discriminatory practice. For example, a content validated exam, such as a typing exam for the position of typist, which demonstrates that the applicants who score higher on the exam will exhibit better job performance may justify a rank-ordering hiring practice that is discriminatory. In such a case, a validation study proving that “more is better” may suffice to validate the rank-order hiring. This is true, however, in only the rarest of cases where the exam tests for qualities that fairly represent *494the totality of a job’s responsibilities. It is unlikely that such a study could validate rank-hiring with a discriminatory impact based upon physical attributes in complex jobs such as that of police officer in which qualities such as intelligence, judgment, and experience surely play a critical role. This is especially true in SEPTA’s case, where the record indicates that SEPTA patrol officers encounter “running assists,” the most strenuous task upon which SEPTA’s aerobic capacity testing predominately was justified, at an average rate of only twice per year. Compare Lanning, 1998 WL 341605 at *5 (finding that SEPTA has approximately 380 running assists per year) with id. at *27 (noting that SEPTA has 190 patrol officers).
. In addition to the law review commentaries cited by the majority, see also Rosemary Alito, Disparate Impact Discrimination Under the 1991 Civil Rights Act, 45 Rutgers L.Rev. 1011, 1033 (1993) ("Only ... cases requiring proof of job-relatedness and a reasonable need for the challenged practice accord[ ] with both the statutory language of the 1991 Act and the applicable Supreme Court precedent.”); Kingsley R. Browne, The Civil Rights Act Of 1991: A "Quota Bill," A Codification Of Griggs, A Partial Return To Wards Cove, Or All Of The Above?, 43 Case W. Res. L.Rev. 287, 349 (1993) ("business necessity” has the same meaning as the Wards Cove phrase "serves, in a significant way”); Linda Lye, Comment, Title VII’s Tangled Tale: The Erosion and Confusion of Disparate Impact and the Business Necessity Defense, 19 Berkeley J. Employment & Lab. L. 315, 358 (1998) (a challenged practice must be a "reasonable predictor of effective performance of job duties,” defined in light of "important business goals”).
. The fear of quota hiring was behind the President’s refusal to sign earlier versions of the bill. See Statement of President George Bush Upon Signing S. 1745, reprinted in 1991 U.S.C.C.A.N. 768 (stating that the Act promotes the goals of ridding discrimination, allowing employers to hire, on the “basis of merit and ability without the fear of unwarranted litigation,” without leading to quotas or incentives for needless litigation). For a discussion of the drafting of the Civil Rights Act of 1991, see, 2 Lex K. Larson, Employment Discrimination § 23.04[1] (2d ed.1999). For analysis of the rejected 1990 bill, see Cynthia L. Alexander, The Defeat of the Civil Rights Act of 1990: Wading Through the Rhetoric In Search of Compromise, 44 Vand. L.Rev. 595 (1991).
. The District Court rejected as irrelevant the plaintiffs’ evidence that incumbent officers had failed the physical fitness test yet successfully performed the job and that other police forces function well without an aerobic capacity admission test. See Lanning, 1998 WL 341605 at *68-*70. Under the standard implicit in Griggs and incorporated into the Act, this evidence tends to show that SEPTA’s cutoff score for aerobic capacity does not correlate with the minimum qualifications necessary to perform successfully the job of SEPTA transit officer. Accordingly, this evidence is relevant and should be considered by the District Court on remand.