Roberta Ottaviani, Individually and on Behalf of All Other Persons Similarly Situated, Carolee Schneemann, Joan Marie De La Cova, Dorothy Jessup, Individually and on Behalf of All Other Persons Similarly Situated, Plaintiffs-Intervenors-Appellants v. State University of New York at New Paltz, and Clifton R. Wharton, Jr., in His Capacity as Chancellor of the State University of New York, Harriet Klapper, Plaintiff-Intervenor-Appellee v. State University of New York at New Paltz, Clifton R. Wharton, Jr., Alice Chandler, Peter Vukasin, and the Trustees of the State University of New York

875 F.2d 365

51 Fair Empl.Prac.Cas. 330,
50 Empl. Prac. Dec. P 39,019, 53 Ed. Law Rep. 1082

Roberta OTTAVIANI, Individually and on Behalf of all other
persons similarly situated, Plaintiff-Appellant,
Carolee Schneemann, Joan Marie de la Cova, Dorothy Jessup,
Individually and on behalf of all other persons
similarly situated,
Plaintiffs-Intervenors-Appellants,
v.
STATE UNIVERSITY OF NEW YORK AT NEW PALTZ, and Clifton R.
Wharton, Jr., in his capacity as Chancellor of the
State University of New York,
Defendants-Appellees.
Harriet KLAPPER, Plaintiff-Intervenor-Appellee,
v.
STATE UNIVERSITY OF NEW YORK AT NEW PALTZ, Clifton R.
Wharton, Jr., Alice Chandler, Peter Vukasin, and
the Trustees of the State University of
New York, Defendants-Appellants.

No. 49, Docket 88-7159.

United States Court of Appeals,
Second Circuit.

Argued Oct. 17, 1988.
Decided May 9, 1989.

Eleanor Jackson Piel, New York City, for plaintiffs-appellants.

Judith T. Kramer, Asst. Atty. Gen., New York City (Robert Abrams, Atty. Gen. State of N.Y., Jan P. Ryan, Marilyn T. Trautfield, Asst. Attys. Gen., of counsel), for defendants-appellees.

Before VAN GRAAFEILAND, CARDAMONE and PIERCE, Circuit Judges.

PIERCE, Circuit Judge:

This is an appeal from a judgment of the United States District Court for the Southern District of New York, Kram, J., in which the court found in favor of defendants on all of the Title VII claims asserted by individual faculty members and a class of similarly situated plaintiffs, following a lengthy bench trial.1 The decision of the district court is published in a thorough and lengthy opinion at 679 F.Supp. 288 (S.D.N.Y.1988), familiarity with which is assumed herein. Appellants contend the district court erred in its decision and principally attack the district court's treatment of the evidence presented in support of their Title VII claims. For the reasons that follow, we affirm.

BACKGROUND

This complicated Title VII suit was commenced by and on behalf of full-time, academic rank female faculty members at the State University of New York ("SUNY") at New Paltz ("the University") who were employed in the University's Division of Liberal Arts and Sciences at any time between academic years 1973 and 1984. The plaintiffs alleged that between 1973 and 1984, the University discriminated against female members of its faculty on the basis of gender in three separate categories: (1) placement in initial faculty rank at the University, (2) promotion into higher rank, and (3) salary. Judge Kram conducted a nine-month bench trial on all of the plaintiffs' claims, and both parties presented extensive evidence to the court. For the sake of brevity, we will discuss only so much of the proceedings below as is relevant to our discussion of the key issues raised on appeal.

During the trial, the district court basically considered two types of evidence--objective statistical evidence and extensive "anecdotal" evidence. The statistical evidence presented by both sides consisted primarily of data produced by means of various "multiple regression analyses." Depending upon the party presenting the statistical evidence, the data was intended to either demonstrate or rebut the plaintiffs' claim of a pattern of ongoing discrimination against women within the University in all three of the contested categories.

A. The Statistical Evidence

Multiple regression analysis is a statistical tool commonly used by social scientists to determine the influence that various independent, predetermined factors (so-called "independent variables") have on an observed phenomenon (the so-called "dependent variable"). See Eastland v. Tennessee Valley Auth., 704 F.2d 613, 621 (11th Cir.1983), cert. denied, 465 U.S. 1066, 104 S.Ct. 1415, 79 L.Ed.2d 741 (1984); Fisher, Multiple Regression in Legal Proceedings, 80 Colum.L.Rev. 702, 702, 705-06 (1980). In disparate treatment cases involving claims of gender discrimination, plaintiffs typically use multiple regression analysis to isolate the influence of gender on employment decisions relating to a particular job or job benefit, such as salary. See, e.g., Sobel v. Yeshiva Univ., 839 F.2d 18, 21-22 (2d Cir.1988); EEOC v. Sears, Roebuck & Co., 839 F.2d 302, 324-25 & n. 22 (7th Cir.1988); Palmer v. Schultz, 815 F.2d 84, 90-91 (D.C.Cir.1987).
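The following Python sketch uses invented data to illustrate how a multiple regression can isolate the influence of gender on salary. It assumes an ordinary least squares model with two hypothetical independent variables, years of experience and a 0/1 female indicator. The helper names `ols` and `solve` are illustrative only and are not drawn from any study in the record. The invented salaries were generated with a built-in $1,500 shortfall for women, so the fitted coefficient on the female indicator should recover that figure.

```python
# Hypothetical illustration only: the data below are invented, not from the record.

def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with its right-hand side so row operations apply to both.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares: coefficients minimizing the sum of squared residuals.

    Each row holds one employee's independent variables; an intercept column of
    1s is prepended automatically, so the first coefficient is the intercept.
    """
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Independent variables: (years of experience, female indicator 0/1).
employees = [(2, 0), (5, 0), (8, 0), (12, 0), (3, 1), (6, 1), (9, 1), (11, 1)]
# Hypothetical salaries generated as 30000 + 1000*years - 1500*female.
salaries = [30000 + 1000 * yrs - 1500 * fem for yrs, fem in employees]

intercept, per_year, female_effect = ols(employees, salaries)
print(round(per_year), round(female_effect))  # 1000 -1500
```

Real salary data are never an exact linear function of the chosen variables, so an actual study would also report how precisely the gender coefficient is estimated.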

The first step in such a regression analysis is to specify all of the possible "legitimate" (i.e., nondiscriminatory) factors that are likely to significantly affect the dependent variable and which could account for disparities in the treatment of male and female employees. See Sobel, 839 F.2d at 20-21; Segar v. Smith, 738 F.2d 1249, 1261 (D.C.Cir.1984), cert. denied, 471 U.S. 1115, 105 S.Ct. 2357, 86 L.Ed.2d 258 (1985); Fisher, supra, at 713-14. By identifying those legitimate criteria that affect the decision-making process, individual plaintiffs can make predictions about what job or job benefits similarly situated employees should ideally receive, and then can measure the difference between the predicted treatment and the actual treatment of those employees. If there is a disparity between the predicted and actual outcomes for female employees, plaintiffs in a disparate treatment case can argue that the net "residual" difference represents the unlawful effect of discriminatory animus on the allocation of jobs or job benefits. See Palmer, 815 F.2d at 90-91; D. Baldus & J. Cole, Statistical Proof of Discrimination Sec. 3.2, at 94 (1980); id. Sec. 8.02, at 245-46.2
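Here is a minimal sketch of the predicted-versus-actual comparison described above, using hypothetical data. A benchmark equation is fitted to male employees using a single legitimate factor, years of experience. Using only one factor is an assumption made for brevity. Each woman's salary is then predicted from that equation, and the average of actual minus predicted salary gives the net residual. Fitting the benchmark to men only is one possible design choice, assumed here for simplicity.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


# Hypothetical data: (years of experience, salary).
men = [(2, 32000), (5, 35000), (8, 38000), (12, 42000)]
women = [(3, 31800), (6, 34800), (9, 37800), (11, 39800)]

intercept, slope = fit_line([x for x, _ in men], [y for _, y in men])
# Residual = actual minus predicted; a negative mean is a net "residual" shortfall.
residuals = [y - (intercept + slope * x) for x, y in women]
print(round(fmean(residuals)))  # -1200
```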

In this case, the parties' statistical experts each determined what factors they thought were relevant to the setting of salaries and rank at the University, and used those factors as independent variables in their multiple regression analyses. By accounting for all of the "legitimate" factors that could affect salary and rank in general, the plaintiffs hoped to prove that there was a net "residual" difference or disparity between the predicted and actual salaries and rank of female faculty members that could only be attributed to ongoing gender discrimination within the University. Conversely, the defendants sought to attribute observed disparities in the pay and rank of male versus female faculty members to "legitimate" factors such as unequal job qualifications.

1. Plaintiffs' Proof of Salary Discrimination

a. Plaintiffs' Main Salary Study

The plaintiffs' main salary study was contained in Trial Exhibit 882 and purported to demonstrate the difference in salaries between male and female faculty members at New Paltz. According to the plaintiffs' statistical expert, Dr. Mary Gray, women actually earned from $1,036 to $2,277 less than their predicted salaries in each year of the class period. The defendants challenged these findings on several grounds, but principally attacked the plaintiffs' study for its failure to include certain independent variables which the defendants claimed were influential in the setting of faculty salaries at the University.

The plaintiffs' main salary study incorporated the following independent variables: (1) number of years of full time teaching experience prior to hire at New Paltz; (2) number of years' teaching experience in academic rank at New Paltz; (3) possession of a doctorate degree; (4) number of years since obtaining the doctorate degree; (5) number of publications; (6) other experience prior to hire at New Paltz; and (7) years of full-time high school teaching experience. The plaintiffs' statistical expert, however, did not include academic rank variables in her main salary study such as prior rank, current rank, and years in current rank. Although Dr. Gray conceded that these three factors may influence salary decisions, she maintained that academic rank itself was subject to discrimination at New Paltz, and that the use of rank variables would therefore be inappropriate.

In connection with this assertion, the plaintiffs attempted to demonstrate that female faculty members were placed in lower academic ranks at New Paltz than their male counterparts, and promoted more slowly into higher academic ranks than their male counterparts, solely because of their gender.3 The defendants' statistical expert, Dr. Judith Stoikov, responded by attempting to prove that rank at New Paltz was not discriminatory. After considering all the evidence as to rank, the district court rejected plaintiffs' proof as "unpersuasive," and concluded that plaintiffs had "failed to prove that rank at New Paltz was discriminatory." Ottaviani, 679 F.Supp. at 306.

The district court's rejection of plaintiffs' claims as to discrimination in rank at New Paltz had two important consequences for the plaintiffs' case. First, the court's ruling eliminated two of the contested categories of discrimination at New Paltz, and left the salary discrimination claim as plaintiffs' only remaining Title VII claim. Second, and equally important from the plaintiffs' perspective, the court's ruling "validated" academic rank as one of the legitimate factors to consider in accounting for salary disparities between male and female faculty members. Since the court considered the academic rank of faculty members to be a legitimate influencing factor on faculty salaries at New Paltz, and since the plaintiffs' main salary study failed to include academic rank variables, the court found the plaintiffs' principal study to be fundamentally flawed and less probative of discrimination than it otherwise might have been.
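Invented figures can show the effect of leaving rank out of a salary comparison. In this hypothetical, pay depends only on rank and is identical for men and women of the same rank, but women are concentrated in the lower rank. A simple within-rank comparison stands in here for adding rank variables to a regression. Whether rank is a legitimate control or is itself tainted by discrimination was the question the district court resolved on the evidence. The code shows only the arithmetic consequence of each choice.

```python
from statistics import fmean

# Hypothetical faculty: (gender, rank, salary). Within each rank, pay is identical.
faculty = [
    ("M", "Assistant", 40000), ("M", "Associate", 50000),
    ("M", "Associate", 50000), ("M", "Associate", 50000),
    ("F", "Assistant", 40000), ("F", "Assistant", 40000),
    ("F", "Assistant", 40000), ("F", "Associate", 50000),
]


def mean_salary(gender, rank=None):
    """Average salary for one gender, optionally restricted to a single rank."""
    return fmean(s for g, r, s in faculty if g == gender and (rank is None or r == rank))


# Ignoring rank: the gap reflects the different rank mix of men and women.
raw_gap = mean_salary("M") - mean_salary("F")
# Comparing only within the same rank removes that compositional effect.
within_rank_gaps = {rank: mean_salary("M", rank) - mean_salary("F", rank)
                    for rank in ("Assistant", "Associate")}
print(raw_gap, within_rank_gaps)  # 5000.0 {'Assistant': 0.0, 'Associate': 0.0}
```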

b. Plaintiffs' Other Salary Studies

Apart from their main salary study, the plaintiffs had also performed salary regressions which did include rank variables. Since these other studies did include what the court considered to be most of the relevant legitimate factors which could influence salary at New Paltz, the court accordingly looked primarily to these studies to determine whether the plaintiffs had made out a prima facie case of gender discrimination.

After considering and weighing all the evidence presented, the district court reached certain conclusions with respect to both the plaintiffs' and the defendants' statistical evidence. While the district judge found some of the plaintiffs' statistical evidence "persuasive," she thought that it was insufficient to establish a prima facie case of gender discrimination. On the other hand, the district judge did not believe that defendants' statistical evidence was sufficient to rebut the plaintiffs' discrimination claims altogether. Since the judge found the statistical evidence to be inconclusive one way or the other, she ruled that whether or not the plaintiffs could prevail on their discrimination claims would depend upon whether the totality of the evidence adduced at trial supported a finding of discrimination. Accordingly, the district judge next considered whether the extensive anecdotal evidence proffered by plaintiffs supported their claims of discrimination.

B. The Anecdotal Evidence

The anecdotal evidence at trial consisted of various narrative descriptions of events at the University which the plaintiffs contended illustrated or proved that the University had discriminated against its female faculty members. Specifically, the plaintiff class members sought to establish that: (1) the University did not have a viable affirmative action program; (2) New Paltz's methods for identifying and correcting existing salary inequities from 1973 to 1984 were either flawed or non-existent; (3) the University either retrenched or eliminated faculty positions to the detriment of its female faculty members; and (4) the University demonstrated a disdain for women's issues through its handling of the Women's Studies Program at New Paltz. Eleven witnesses also testified about individual instances of alleged salary discrimination at New Paltz, which the plaintiffs contended were illustrative of the administration's policies toward women as a whole.4

On rebuttal, the defendants sought to negate the plaintiffs' claims through the specific testimony of University administrators and faculty members, and other types of anecdotal evidence. The defendants contended that such evidence demonstrated that there were nondiscriminatory reasons for all of the actions taken by the University during the period in question which negatively affected its female faculty members, and that none of the employment practices at issue were motivated by discriminatory animus.

After reviewing the anecdotal evidence, the district judge held that the plaintiffs had not proven their Title VII claims against the University. Although she found that the anecdotal evidence supported an inference of prima facie discrimination in a few of the individual class members' cases, in each of those cases she either accepted the defendants' explanations for the pay disparities, or found that the isolated incidents of discrimination were insufficient to support the class' claim of a pattern or practice of gender discrimination. Accordingly, the district court entered judgment in favor of defendants on all of the Title VII claims.

On appeal, the appellants contend that the district court erred in its treatment and analysis of the evidence in several key respects, and that as a result, the court's finding of no discrimination was erroneous. First, appellants challenge the district court's determination that the statistical evidence was inconclusive. Appellants contend that the statistical evidence adduced at trial was more than sufficient to establish a prima facie case of gender discrimination as a matter of law. Moreover, they also contend that the district judge's decision to allow allegedly "tainted" variables such as "rank" to be used in the multiple regression analyses minimized the overall impact of defendants' alleged discriminatory treatment of female faculty members, and resulted in weaker statistical proof. Appellants also take issue with the district court's rejection of the proffered anecdotal evidence of discrimination. Finally, appellants contend that the district court erroneously excluded or ignored evidence of pre-Title VII discrimination, in contravention of the Supreme Court's decision in Bazemore v. Friday, 478 U.S. 385, 106 S.Ct. 3000, 92 L.Ed.2d 315 (1986).

For the reasons that follow, we hold that Judge Kram did not clearly err in finding in favor of the defendants, and we affirm the decision of the district court.

DISCUSSION

We begin by noting that the district court correctly stated the familiar legal standards to be applied in Title VII cases. Since the plaintiff class herein had raised a "disparate treatment" claim under Title VII, the claimants bore the burden of establishing not only discriminatory intent on the part of SUNY administrators, but also that "unlawful discrimination [was] a regular procedure or policy followed by [the University]." International Bhd. of Teamsters v. United States, 431 U.S. 324, 360, 97 S.Ct. 1843, 1867, 52 L.Ed.2d 396 (1977); see Coser v. Moore, 739 F.2d 746, 749 (2d Cir.1984) ("In order to prevail on their claim of a pattern and practice of discrimination, plaintiffs had to show by a preponderance of the evidence that [the defendant]'s 'standard operating procedure--the regular rather than the unusual practice' is to discriminate on the basis of sex.") (quoting Teamsters, 431 U.S. at 336, 97 S.Ct. at 1855).

The Supreme Court has established a three-step process for evaluating disparate treatment claims brought pursuant to Title VII. McDonnell Douglas Corp. v. Green, 411 U.S. 792, 802-04, 93 S.Ct. 1817, 1824-25, 36 L.Ed.2d 668 (1973); Woodbury v. New York City Transit Auth., 832 F.2d 764, 769 (2d Cir.1987). First, the plaintiffs bear the burden of establishing a prima facie case of discrimination by a preponderance of the evidence, McDonnell Douglas Corp., 411 U.S. at 802, 93 S.Ct. at 1824, which gives rise to a rebuttable presumption of unlawful discrimination, Texas Dep't of Community Affairs v. Burdine, 450 U.S. 248, 254 & n. 7, 101 S.Ct. 1089, 1094 & n. 7, 67 L.Ed.2d 207 (1981). Once they have done so, the burden of production shifts to the defendants to "articulate some legitimate, nondiscriminatory reason" for the challenged employment practice. McDonnell Douglas Corp., 411 U.S. at 802, 93 S.Ct. at 1824. In cases where the plaintiff has relied on statistical evidence to establish a prima facie case of discrimination, the defendants may also attempt to undermine the plaintiffs' prima facie case by attacking the validity of that statistical evidence, or by introducing statistical evidence of their own showing that the challenged practice did not result in disparate treatment. See Berger v. Iron Workers Reinforced Rodmen Local 201, 843 F.2d 1395, 1412 (D.C.Cir.1988) (citing Teamsters, 431 U.S. at 360, 97 S.Ct. at 1867); Coble v. Hot Springs School Dist. No. 6, 682 F.2d 721, 730 (8th Cir.1982). If the defendants meet this burden, the plaintiffs must then show either that the defendants' statistical proof is inadequate, or that the defendants' explanation for the challenged practice is merely a pretext for discrimination. See Zahorik v. Cornell Univ., 729 F.2d 85, 92 (2d Cir.1984).

Even though the burden of production shifts to the defendants during the second stage of the process, the ultimate burden of persuasion rests always with the plaintiffs to prove their claims of discrimination. See United States Postal Serv. Bd. of Governors v. Aikens, 460 U.S. 711, 716, 103 S.Ct. 1478, 1482, 75 L.Ed.2d 403 (1983); Burdine, 450 U.S. at 253, 101 S.Ct. at 1093.5 As stated earlier, appellants contend herein that they met their burden of proof, and that the evidence at trial clearly established a pattern and practice of discrimination by University administrators with respect to faculty rank and salary. We will consider plaintiffs' principal claims of error seriatim, beginning with the district court's treatment of the statistical evidence adduced at trial.

A. Significance of Plaintiffs' Statistical Evidence

At trial, the plaintiffs herein contended that the statistical evidence alone was sufficient to establish a prima facie case of discrimination. According to plaintiffs, female faculty members were clearly treated less favorably than their male counterparts, and that unfavorable, disparate treatment was due solely to gender bias. The district court, however, found that the plaintiffs' statistical evidence was not "statistically significant" enough to establish a prima facie case of discrimination. For the reasons that follow, we conclude that the district court did not clearly err in ruling that the plaintiffs' proffered statistical evidence was not dispositive of their Title VII claims.

As discussed earlier, plaintiffs in a disparate treatment case frequently rely on statistical evidence to establish that there is a disparity between the predicted and actual treatment of employees who are members of a disadvantaged group, and to argue that such disparities exist because of an unlawful bias directed against those employees. Not all disparities, however, are probative of discrimination. Before a deviation from a predicted outcome can be considered probative, the deviation must be "statistically significant."

Statistical significance is a measure of the probability that a disparity is simply due to chance, rather than any other identifiable factor. See Segar, 738 F.2d at 1282; see also Castaneda v. Partida, 430 U.S. 482, 496 n. 17, 97 S.Ct. 1272, 1281 n. 17, 51 L.Ed.2d 498 (1977); Mister v. Illinois Cent. Gulf R. Co., 832 F.2d 1427, 1430-31 (7th Cir.1987), cert. denied, --- U.S. ----, 108 S.Ct. 1597, 99 L.Ed.2d 911 (1988). Because random deviations from the norm can always occur, see Frazier v. Consolidated Rail Corp., 851 F.2d 1447, 1451-52 (D.C.Cir.1988); Palmer, 815 F.2d at 91; Mister, 832 F.2d at 1430-31; Fisher, supra, at 705, statisticians do not consider slight disparities between predicted and actual results to be statistically significant. See Eubanks v. Pickens-Bond Constr. Co., 635 F.2d 1341, 1347-48 (8th Cir.1980); Fisher, supra, at 706. As the disparity between predicted and actual results becomes greater, however, it becomes less likely that the deviation is a random fluctuation. When the probability that a disparity is due to chance sinks to a certain threshold level, statisticians can then infer from the statistical evidence, albeit indirectly, that the deviation is attributable to some other cause unrelated to mere chance. See D. Baldus & J. Cole, supra, Sec. 9.42, at 191-93 (Supp.1987); Berger, 843 F.2d at 1412.
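An exact permutation test can make concrete the idea that a disparity may be "simply due to chance." This is one standard technique, assumed here; it is not a method described in the record. The test asks how often a purely random assignment of the same salaries to two groups would produce a gap at least as large as the one observed. The figures are hypothetical salaries in thousands of dollars.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns the share of splits whose absolute mean
    difference is at least as large as the one observed.
    """
    pooled = group_a + group_b
    observed = abs(fmean(group_a) - fmean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # A small tolerance guards against floating-point ties.
        if abs(fmean(a) - fmean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


separated = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
interleaved = permutation_p_value([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
print(round(separated, 4))  # 0.0079
```

The two groups of five in the first call do not overlap. Only 2 of the 252 possible splits are as extreme as the observed one, so the probability of chance is under 1 percent. When the two groups' values are interleaved, a gap of the observed size arises in most random splits.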

One unit of measurement used to express the probability that an observed result is merely a random deviation from a predicted result is the "standard deviation." D. Baldus & J. Cole, supra, Sec. 2.224, at 63; id. Sec. 9.03, at 294-95; see also Berger, 843 F.2d at 1412; Coates v. Johnson & Johnson, 756 F.2d 524, 536 n. 11 (7th Cir.1985). The standard deviation "is a measure of spread, dispersion or variability of a group of numbers." D. Baldus & J. Cole, supra, at 359. Generally, the fewer the number of standard deviations that separate an observed from a predicted result, the more likely it is that any observed disparity between predicted and actual results is not really a "disparity" at all but rather a random fluctuation. Conversely, "[t]he greater the number of standard deviations, the less likely it is that chance is the cause of any difference between the expected and observed results." Coates, 756 F.2d at 536 n. 11; see Woodbury, 832 F.2d at 770; Segar, 738 F.2d at 1282. A finding of two standard deviations corresponds approximately to a one in twenty, or five percent, chance that a disparity is merely a random deviation from the norm, and most social scientists accept two standard deviations as a threshold level of "statistical significance." See Castaneda, 430 U.S. at 496 n. 17, 97 S.Ct. at 1281 n. 17; Melani v. Board of Higher Educ., 561 F.Supp. 769, 774 (S.D.N.Y.1983); Cooper v. University of Tex. at Dallas, 482 F.Supp. 187, 194 (N.D.Tex.1979) (citing N. Nie, C. Hull, J. Jenkins, K. Steinbrenner & D. Bent, Statistical Package for the Social Sciences 222 (2d ed. 1975)), aff'd, 648 F.2d 1039 (5th Cir.1981) (per curiam). When the results of a statistical analysis yield levels of statistical significance at or below the 0.05 level, chance explanations for a disparity become suspect, and most statisticians will begin to question the assumptions underlying their predictions.
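Python's standard `math.erfc` function can compute the probability associated with a given number of standard deviations. The sketch assumes a normal (bell-curve) model and a two-sided test. This is a common convention, but the opinion does not specify it. The sketch shows why two standard deviations is described as roughly a one-in-twenty chance. The exact two-sided figure is about 4.55 percent, and the 5 percent level corresponds to about 1.96 standard deviations.

```python
import math


def two_sided_p(z):
    """Probability, under a normal model, of a deviation at least |z| standard
    deviations from the expected value in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1, 2, 3):
    print(z, round(two_sided_p(z), 4))
# 1 0.3173
# 2 0.0455
# 3 0.0027
```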

Cognizant of the important role that statistics play in disparate treatment cases, the Supreme Court has held that "[w]here gross statistical disparities can be shown, they alone may in a proper case constitute prima facie proof of a pattern or practice of discrimination." Hazelwood School Dist. v. United States, 433 U.S. 299, 307-08, 97 S.Ct. 2736, 2741-42, 53 L.Ed.2d 768 (1977). The threshold question in disparate treatment cases, then, is: "[A]t what point is the disparity in selection rates ... sufficiently large, or the probability that chance was the cause sufficiently low, for the numbers alone to establish a legitimate inference of discrimination[?]" Palmer, 815 F.2d at 92 (emphasis added); see Frazier, 851 F.2d at 1452. In answer to this question, "most courts follow the conventions of social science which set 0.05 as the level of significance below which chance explanations become suspect." D. Baldus & J. Cole, supra, Sec. 9.02, at 291; see, e.g., Castaneda, 430 U.S. at 496 n. 17, 97 S.Ct. at 1281 n. 17; Frazier, 851 F.2d at 1452; Berger, 843 F.2d at 1412; Segar, 738 F.2d at 1262. The existence of a 0.05 level of statistical significance indicates that it is fairly unlikely that an observed disparity is due to chance, and it can provide indirect support for the proposition that disparate results are intentional rather than random.6 By no means, however, is a five percent probability of chance (or approximately two standard deviations) considered an "exact legal threshold." Palmer, 815 F.2d at 92.

In the present case, the three salary studies which the district court considered most probative of a pattern or practice of discrimination produced a range of standard deviations between approximately one and five, and of the total thirty-three standard deviation measures cited, twenty-four exceeded two standard deviations.7 Significantly, however, nine of the measures cited fell below two standard deviations. Also, the negative residuals associated with being female were not significant in every year of the liability period.

Given the range of standard deviations associated with their salary regressions, the plaintiffs contended that the statistical evidence clearly gave rise to a presumption of discrimination. As discussed earlier herein, however, although the district judge found the studies to be "persuasive," she nevertheless held that these levels of "statistical significance" alone were "not sufficiently high to support a prima facie claim of salary discrimination." Ottaviani, 679 F.Supp. at 309.

On appeal, appellants argue inter alia that, as a matter of law, a finding of two standard deviations should be equated with a prima facie case of discrimination. According to appellants, the district court therefore erred in finding that they had not met their burden of establishing a prima facie case. In support of this argument, appellants point out that several courts have accepted two standard deviations as prima facie proof of discrimination. See, e.g., Berger, 843 F.2d at 1412 ("if the likelihood that a fluctuation from expected results occurred by chance is five percent or less, a statistically significant difference is proved, and a prima facie case of discrimination is established") (citing Segar, 738 F.2d at 1282-83); Eldredge v. Carpenters 46 N. Cal. Counties JATC, 833 F.2d 1334, 1340 n. 8 (9th Cir.1987) (.045 level of statistical significance (approximately two standard deviations or 1 chance in 22) sufficient to give rise to an inference that discriminatory system rather than chance is responsible for women's lower admission rates to apprenticeship program), cert. denied, --- U.S. ----, 108 S.Ct. 2857, 101 L.Ed.2d 894 (1988); Dalley v. Michigan Blue Cross/Blue Shield, Inc., 612 F.Supp. 1444, 1451 n. 18 (E.D.Mich.1985) ("Most courts and commentators have accepted the .05 level," or 1 in 20 probability, as indicative of statistical significance). While appellants' argument that a finding of two standard deviations should be equated with a prima facie case of discrimination under Title VII is not without initial appeal, we are constrained to reject such a formal "litmus" test for assessing the legitimacy of Title VII claims.
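As an arithmetic check on the parenthetical describing Eldredge, the sketch below converts a .045 significance level into standard deviations and into odds. It again assumes a two-sided test under a normal model; the opinion does not state how the figure was computed. The sketch uses `statistics.NormalDist` from Python's standard library.

```python
from statistics import NormalDist

p = 0.045  # significance level cited in the Eldredge parenthetical
# Standard deviations corresponding to a two-sided p-value (assumed two-sided).
z = NormalDist().inv_cdf(1 - p / 2)
odds = 1 / p  # expressed as "1 chance in N"
print(round(z, 2), round(odds))  # 2.0 22
```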

It is certainly true that a finding of two to three standard deviations can be highly probative of discriminatory treatment. See Segar, 738 F.2d at 1282-83. As tempting as it might be to announce a black-letter rule of law, however, recent Supreme Court pronouncements instruct that there simply is no minimum threshold level of statistical significance which mandates a finding that Title VII plaintiffs have made out a prima facie case. See, e.g., Watson v. Fort Worth Bank & Trust, --- U.S. ----, 108 S.Ct. 2777, 2789 n. 3, 101 L.Ed.2d 827 (1988) ("We have emphasized the useful role that statistical methods can have in Title VII cases, but we have not suggested that any particular number of 'standard deviations' can determine whether a plaintiff has made out a prima facie case in the complex area of employment discrimination."); see also Palmer, 815 F.2d at 92 (noting that Supreme Court has not established "an exact legal threshold at which statistical evidence, standing alone, establishes an inference of discrimination"); Coser v. Moore, 739 F.2d at 754 n. 3 (a significance level of 5% probability of chance "has no talismanic importance"); EEOC v. American Nat'l Bank, 652 F.2d 1176, 1192 (4th Cir.1981) ("courts of law should be extremely cautious in drawing any conclusions from standard deviations in the range of one to three"), cert. denied, 459 U.S. 923, 103 S.Ct. 235, 74 L.Ed.2d 186 (1982); D. Baldus & J. Cole, supra, Sec. 9.4, at 188-89 (Supp.1987) (courts should use tests of statistical significance only as "an aid to interpretation" and not as a "rule of law"). Accordingly, consistent with these Supreme Court pronouncements, we must reject appellants' suggestion that this court announce a rule of law with respect to what level of statistical significance automatically gives rise to a rebuttable presumption of discrimination.

Moreover, as a practical matter, the issue of whether or not two or more standard deviations establish a prima facie case of discrimination is not pivotal in this case. It is well-established that "once a Title VII case has been 'fully tried on the merits,' the question whether the plaintiff has established a prima facie case 'is no longer relevant.' " Mitchell v. Baldrige, 759 F.2d 80, 83 (D.C.Cir.1985) (quoting Aikens, 460 U.S. at 714-15, 103 S.Ct. at 1481-82). As the Supreme Court stated in Bazemore v. Friday, "the only issue to be decided at that point is whether the plaintiffs have actually proved discrimination." 478 U.S. at 398, 106 S.Ct. at 3008 (emphasis added). Despite appellants' efforts to deemphasize on appeal what transpired at trial, the district court herein actually proceeded as though the plaintiffs had made out a prima facie case of discrimination, and it cannot be gainsaid that the plaintiffs' claims were fully litigated on the merits.

At the close of the plaintiffs' case-in-chief, the defendants moved to dismiss the action specifically on the grounds that plaintiffs had not met their burden of establishing a prima facie case. The court denied the motion without explanation, however, and directed the defendants to proceed with their case. Later, at the close of defendants' rebuttal case, the court also allowed the plaintiffs to respond to the defendants' rebuttal evidence. Accordingly, the question remaining for the district court at the close of the entire trial was whether the plaintiffs should ultimately prevail on their Title VII claims. See Bazemore, 478 U.S. at 398, 106 S.Ct. at 3008. Since the plaintiffs fully litigated those claims, any arguments on appeal directed at whether or not plaintiffs established a prima facie case are arguably misleading and misplaced. See EEOC v. Sears, Roebuck & Co., 839 F.2d at 312 n. 9.

The net import of Judge Kram's rulings regarding the significance of plaintiffs' statistical evidence is that she found the evidence to be "persuasive" but not dispositive. Contrary to appellants' assertions, it is clear from the district judge's rulings that she did not simply ignore the statistical evidence of discrimination presented by plaintiffs. The court found this evidence sufficient to cause her to deny the defendants' motion to dismiss at the end of plaintiffs' case, and to accept rebuttal evidence from the defendants. On rebuttal, however, the defendants were able to successfully undermine the plaintiffs' case by attacking the validity of the plaintiffs' statistical evidence, and by introducing statistical evidence of their own to negate the inference of discrimination that had been raised. Cf. Berger, 843 F.2d at 1416 ("Mere conjecture or general assertions of inadequacies in the opponent's statistical case, without demonstrating their effect on the results, will not suffice."); Sobel, 839 F.2d at 34 (same).

Specifically, the defendants criticized the plaintiffs' most probative studies for excluding one factor which they claimed exerted a "highly significant positive influence[ ] on current salary," namely, whether a faculty member had held a prior, full-time administrative position at SUNY New Paltz before returning to full-time teaching. The defendants also criticized these studies because the salary regressions were "fitted" only to male faculty members, i.e., they used independent variables that were derived only from the male population. The district court noted in its opinion that a "males only regression" based exclusively on values existing only in the male population might have tended to overestimate the predicted salaries of certain female faculty members, because it might not have taken into account legitimate factors existing solely in the female population which could have affected the rate of pay for women teachers at the University. See Ottaviani, 679 F.Supp. at 307. If the predicted salary for a female faculty member was overestimated, this type of regression arguably would have overestimated the discrepancies between male and female salaries at the University. Finally, the defendants criticized these studies because they inappropriately aggregated Instructors and Assistant Professors into a single "rank." The defendants pointed out at trial that when the two ranks were combined into a single rank, the predicted salary of a female Instructor would essentially be based on the higher salary of an Assistant Professor, and hence the net residual difference between the predicted and actual salaries of a female Instructor would be overstated. Apart from these criticisms of plaintiffs' statistical evidence, the defendants also offered persuasive anecdotal evidence to negate the plaintiffs' claims of discriminatory animus. After considering all of the evidence presented, both statistical and anecdotal, the district court simply found that plaintiffs had failed to preponderate on their claims.
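The defendants' aggregation criticism can be illustrated with a small sketch. The salaries below are invented for illustration and are not drawn from the record, and the "regression" is deliberately reduced to rank alone. With rank as the only predictor, a males-only fit simply predicts each woman's salary as the male average for her rank. Merging Instructors with Assistant Professors therefore raises the predicted salary of a female Instructor and inflates the unexplained residual.

```python
from statistics import mean

# Hypothetical salaries, for illustration only (not figures from the record).
male = {
    "Instructor": [20000, 21000],
    "Assistant Professor": [27000, 28000, 29000, 28000],
}
female_instructors = [20500, 20000]


def shortfall(predicted: float, actual: list[float]) -> float:
    """Predicted minus actual average salary: the unexplained residual."""
    return predicted - mean(actual)


# Males-only fit with the ranks kept separate: when rank is the only
# predictor, the fitted value for a rank is the male average in that rank.
separate_prediction = mean(male["Instructor"])

# Males-only fit with Instructors and Assistant Professors merged into one
# "rank": the fitted value becomes the male average across both ranks.
combined_prediction = mean(male["Instructor"] + male["Assistant Professor"])

separate_gap = shortfall(separate_prediction, female_instructors)
combined_gap = shortfall(combined_prediction, female_instructors)
print(f"ranks kept separate: residual {separate_gap:,.0f}")
print(f"ranks combined:      residual {combined_gap:,.0f}")
```

Because the hypothetical male pool is weighted toward the higher-paid Assistant Professor rank, the combined prediction sits well above what male Instructors actually earn, so most of the apparent shortfall comes from the aggregation rather than from any difference in treatment within the Instructor rank.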

Recent Supreme Court precedent has made it clear that this court can reverse such a factual determination "only if it is clearly erroneous in light of all the evidence in the record or if it rests on legal error." Palmer, 815 F.2d at 101 (citing, inter alia, Bazemore, 478 U.S. 385, 106 S.Ct. 3000). Especially in cases where statistical evidence is involved, " 'great deference is due the district court's determination of whether the resultant numbers are sufficiently probative of the ultimate fact in issue.' " EEOC v. Sears, Roebuck & Co., 839 F.2d at 310 (citation omitted); see Griffin v. Board of Regents of Regency Univ., 795 F.2d 1281, 1289-90 (7th Cir.1986). As the Supreme Court cautioned in the Teamsters case, "statistics are not irrefutable; they come in infinite variety and, like any other kind of evidence, they may be rebutted. In short, their usefulness depends on all of the surrounding facts and circumstances." 431 U.S. at 340, 97 S.Ct. at 1857. The district judge herein gave due consideration to all of the evidence presented, and after reviewing the record, we do not perceive a convincing basis for finding her interpretation of that evidence to be clearly erroneous. Accordingly, we affirm her rulings with respect to the statistical evidence presented.

B. Use of Rank Variables

In conjunction with their attack on the district court's assessment of the sufficiency of plaintiffs' statistical evidence, appellants also challenge the district court's determination that "rank" was an appropriate factor to consider in assessing pay disparities between male and female faculty members. According to appellants, if the court had rejected the rank variables and considered only those salary studies which excluded rank, then the number of standard deviations associated with their findings of discrimination would have been much greater, and their statistical proof would have been even more probative.

Although we recognize that the use of rank variables in testing for salary discrimination against women faculty members is not universally accepted, see Finkelstein, The Judicial Reception of Multiple Regression Studies in Race and Sex Discrimination Cases, 80 Colum.L.Rev. 737, 741-42 (1980); D. Baldus & J. Cole, supra, Sec. 8.23, at 113-14 (Supp.1987), in Sobel v. Yeshiva University, this court specifically upheld the use of rank variables in a multiple regression analysis, stating that rank could be used as a legitimate factor in explaining pay disparities so long as rank itself was clearly not tainted by discrimination, 839 F.2d at 35. As the plaintiffs' statistical expert, Dr. Mary Gray, explained in her own report: "In a bias-free system, one could use rank as a measure of productivity since the review process for promotion or hire should evaluate teaching, scholarship and service." (Emphasis added). See D. Baldus & J. Cole, supra, Sec. 8.2, at 114. The question to be resolved, then, in cases involving the use of academic rank factors, is whether rank is tainted by discrimination at the particular institution charged with violating Title VII. Although appellants reiterate on appeal their claim that rank at New Paltz was tainted, it is clear that the district judge accepted and considered evidence from the parties on both sides of this issue, and that she rejected the plaintiffs' contentions on this point.

At trial, the plaintiffs failed to adduce any significant statistical evidence of discrimination as to rank. As the district court stated in its opinion, the plaintiffs' studies of rank, rank at hire, and waiting time for promotion "were mere compilations of data" which neither accounted for important factors relevant to assignment of rank and promotion, "nor demonstrated that observed differences were statistically significant." Ottaviani, 679 F.Supp. at 306. The defendants, on the other hand, offered persuasive objective evidence to demonstrate that there was no discrimination in either placement into initial rank or promotion at New Paltz between 1973 and 1984, and the district court chose to credit the defendants' evidence. Upon review of the record, we cannot state that the court's rulings in this regard were clearly erroneous. Accordingly, the district court's decision to focus primarily on those studies which included rank as an essential independent variable was not improper, and appellants' contentions to the contrary must be rejected. See Presseisen v. Swarthmore College, 442 F.Supp. 593, 614, 619 (E.D.Pa.1977) (inclusion of rank variable appropriate when evidence showed no discrimination with respect to hiring and promotion), aff'd mem., 582 F.2d 1275 (3d Cir.1978); see also EEOC v. Sears, Roebuck & Co., 839 F.2d at 327 (court's decision to focus generally on those regression analyses which did not omit "major factors" was proper); Rossini v. Ogilvy & Mather, Inc., 798 F.2d 590, 603-04 (2d Cir.1986) (trial court's reliance on studies which incorporated controversial variables not clearly erroneous, where court's decision came after "extensive testimony from experts on both sides of the issue").

C. Anecdotal Evidence

Appellants also contend on appeal that the district court did not give sufficient weight to the anecdotal evidence adduced at trial, and that the court should have rejected the explanations proffered by University administrators to explain pay and rank inequities as "pretextual." Our review of the anecdotal evidence, however, is limited to ascertaining whether the district judge committed clear error in making her findings. See Anderson v. City of Bessemer, 470 U.S. 564, 573, 105 S.Ct. 1504, 1511, 84 L.Ed.2d 518 (1985); Pullman-Standard v. Swint, 456 U.S. 273, 287, 102 S.Ct. 1781, 1789, 72 L.Ed.2d 66 (1982). It is not the function of this court to reweigh the evidence anew, particularly when findings by a district court are based on in-court credibility determinations. Anderson, 470 U.S. at 575, 105 S.Ct. at 1512. Rather, under the clearly erroneous standard, we may only reject findings by the trial court when we are left with the "definite and firm conviction that a mistake has been committed." United States v. United States Gypsum Co., 333 U.S. 364, 395, 68 S.Ct. 525, 542, 92 L.Ed. 746 (1948).

In this case, the district court found that the defendants had successfully rebutted the plaintiffs' anecdotal proof, and that, in any event, the anecdotal evidence on its face was too limited to prove class-wide discrimination. After reviewing the entire record, we do not think that the court's decision to credit the testimony of the defendants rather than that of the plaintiffs was clearly erroneous. Since the district court's "account of the evidence is plausible in light of the record viewed in its entirety," we may not overturn the findings of the court even if we might "have weighed the evidence differently," had we been sitting as the trier of fact. Anderson, 470 U.S. at 573-74, 105 S.Ct. at 1510-11. Accordingly, we affirm the findings of the district court with respect to the anecdotal evidence presented.

D. Bazemore Claim

In Bazemore v. Friday, 478 U.S. 385, 106 S.Ct. 3000, 92 L.Ed.2d 315 (1986), the Supreme Court held that employers have an obligation to eradicate employment discrimination that began prior to the effective date of Title VII (1972), if the discrimination continues into the post-1972 liability period. Id. at 397, 106 S.Ct. at 3007. The Supreme Court also stated that statistical evidence of pre-Act discrimination can be probative of ongoing, post-Act discrimination. Id. at 402, 106 S.Ct. at 3010.

On appeal, appellants contend that the district court erroneously excluded evidence of pre-Act discrimination in violation of the Supreme Court's dictates in Bazemore. In particular, appellants claim that the district judge improperly excluded Exhibit 990, which purported to document statistically significant evidence of discrimination as to initial faculty rank. This claim is without merit, however. At trial, the defendants objected to the admission of Exhibit 990 not because it was offered to prove pre-Act discrimination, but because it was unreliable and incomplete. While the weakness of statistical evidence should not ordinarily preclude its admission, see Bazemore, 478 U.S. at 400, 106 S.Ct. at 3009, the Supreme Court has recognized that some statistical evidence may be so unreliable as to be irrelevant, see id. at 400 n. 10, 106 S.Ct. at 3009 n. 10; see also Penk v. Oregon State Bd. of Higher Educ., 816 F.2d 458, 465 (9th Cir.) ("Bazemore ... does not give blanket approval to the introduction of all evidence derived from multiple regression analyses."), cert. denied, --- U.S. ----, 108 S.Ct. 158, 98 L.Ed.2d 113 (1987). Apparently the district judge herein thought that to be the case with respect to this particular exhibit, because she sustained the defendants' objection to its admission on the grounds that it was irrelevant and unduly confusing. Upon review of the record, we do not find the district court's decision to exclude the study to be clearly erroneous, and therefore we affirm the evidentiary ruling.

Moreover, we note that appellants' reliance on this court's decision in Sobel v. Yeshiva University as support for their more generalized, Bazemore-type claims is misplaced. In Sobel, the plaintiffs introduced evidence specifically designed to prove that women were discriminated against prior to the effective date of Title VII, and argued that "Yeshiva had a legal obligation to equalize women's salaries immediately upon application of Title VII to universities." 839 F.2d at 27. In the present case, even though the Supreme Court handed down its decision in Bazemore the same month that plaintiffs' trial was commenced, the plaintiffs did not introduce any statistical evidence of substance to prove that there was discrimination at New Paltz prior to the effective date of Title VII. Instead, nearly all of the plaintiffs' studies focused on the class liability period, which covered the years 1973 to 1984. This is in marked contrast to Sobel and Bazemore, wherein the plaintiffs offered direct, independent proof of pre-Act discrimination. See Bazemore, 478 U.S. at 401, 106 S.Ct. at 3009; Sobel, 839 F.2d at 30. Accordingly, we find appellants' arguments on this point generally to be without merit.

CONCLUSION

In sum, the burden of persuasion was on the plaintiffs to prove by a preponderance of the evidence that there was a pattern or practice of discrimination at SUNY New Paltz, and they failed to meet that burden. We have considered all of the arguments presented on appeal, and find them to be without merit. For the reasons stated above, the judgment of the district court is affirmed.

The district court ruled in favor of plaintiff-intervenor Harriet Klapper on her Equal Pay Act claim. That decision is not challenged on appeal by defendants, and appellants rely on the favorable ruling only insofar as it supports their class-wide claims of discrimination.

Another way in which statisticians can measure the influence of gender on a particular employment decision is by using gender as one of the independent variables in a regression analysis. For each independent variable in a multiple regression analysis, the statistician calculates a coefficient, which is a measure of the effect that the variable has on the dependent variable being examined. If the regression coefficient for gender is sufficiently large, then it is probative of the impact that gender has on the employment decision at issue. D. Baldus & J. Cole, supra, Secs. 8.01 to 8.02, at 240-45; see, e.g., Segar, 738 F.2d at 1261-62.
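As a rough sketch of the technique this footnote describes, the example below fits an ordinary least squares regression with an intercept, years of service, and a gender indicator (1 for women, 0 for men). The data are hypothetical and were constructed, as an assumption for illustration, so that women earn $1,500 less than men at every level of service. The fitted gender coefficient recovers that gap while holding years of service constant.

```python
import numpy as np

# Hypothetical faculty data, for illustration only: years of service and a
# gender indicator (1 = female, 0 = male).
years = np.array([2, 5, 8, 12, 3, 6, 9, 15], dtype=float)
female = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Salaries built from an assumed pay rule: $29,000 base, $1,000 per year of
# service, and $1,500 less for women at every level of service.
salary = 29000 + 1000 * years - 1500 * female

# Design matrix: an intercept column, years of service, and the gender dummy.
X = np.column_stack([np.ones_like(years), years, female])

# Ordinary least squares; coef holds [intercept, per-year effect, gender effect].
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
intercept, per_year, gender_effect = coef
print(f"gender coefficient: {gender_effect:,.0f} dollars")
```

Real salary data are noisy, so in practice the gender coefficient is judged against its standard error, that is, by how many standard deviations it lies from zero, rather than by its raw size alone.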

There are four types of "academic rank" at New Paltz: (1) professor, (2) associate professor, (3) assistant professor, and (4) instructor. Faculty members in one of these academic ranks either hold tenure or are on a "tenure track."

The plaintiffs presented testimonial evidence of discrimination against: three New Paltz faculty members who were neither class members nor individual plaintiffs (i.e., Nancy Schniedewind, Susan Puretz, and Sheila Schwartz); four individual members of the class who had not brought individual Title VII claims (i.e., Susan Lehrer, Barbara Scott, Johanna Sayre, and Samantha Joe Mullen); and four members of the class who had asserted individual Title VII claims as well (i.e., Roberta Ottaviani, Dorothy Jessup, Joan Marie de la Cova, and Carolee Schneemann).

The Supreme Court's recent decision in Price Waterhouse v. Hopkins, --- U.S. ----, 109 S.Ct. 1775, 104 L.Ed.2d 268 (1989), does not affect our analysis of these plaintiffs' claims. See --- U.S. at ----, 109 S.Ct. at 1788 ("the situation before us is not the one of 'shifting burdens' that we addressed in Burdine").

The commentators are careful to point out, however, that no matter how great the number of standard deviations is, statistical tests can never entirely rule out the possibility that chance caused the disparity. See Palmer, 815 F.2d at 91; D. Baldus & J. Cole, supra, Sec. 9.42, at 191-93 (Supp.1987).

As discussed supra, two standard deviations corresponds roughly to a 1 in 20 chance that the outcome is a random fluctuation. Three standard deviations corresponds to approximately a 1 in 384 chance of randomness. Finally, a range of four to five standard deviations corresponds to a probability range of 1 chance in 15,786 to 1 chance in 1,742,160. M. Abramowitz & I. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, U.S. Government Printing Office, Applied Mathematics Series No. 55 (1966) (Tables 26.1, 26.2).
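The conversion from standard deviations to odds can be sketched as follows, assuming (as is conventional, though the footnote does not say so) that the figures are two-tailed probabilities under a normal distribution: the chance that a purely random result falls at least k standard deviations from its expected value in either direction. Under that assumption the calculation gives roughly 1 in 22 at two standard deviations, about 1 in 370 at three, and about 1 in 15,800 and 1 in 1.74 million at four and five, closely tracking the tabulated values cited above except at three standard deviations.

```python
import math


def two_tailed_probability(k: float) -> float:
    """P(|Z| > k) for a standard normal Z: the chance that a random
    fluctuation lands at least k standard deviations from the expected
    value, in either direction."""
    return math.erfc(k / math.sqrt(2))


def odds_against_chance(k: float) -> float:
    """Express that probability as '1 chance in N' and return N."""
    return 1 / two_tailed_probability(k)


for k in (2, 3, 4, 5):
    print(f"{k} standard deviations: about 1 chance in {odds_against_chance(k):,.0f}")
```

The complementary error function `math.erfc` is used because it stays accurate for very small tail probabilities, where computing one minus a cumulative probability would lose precision.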
