Dickerson v. United States Steel Corp.

472 F.Supp. 1304 (1978)

Moses DICKERSON et al.
v.
UNITED STATES STEEL CORP. et al.
Curtis C. WORTHY et al.
v.
UNITED STATES STEEL CORP. et al.

Civ. A. Nos. 73-1292, 74-2689.

United States District Court, E. D. Pennsylvania.

August 2, 1978.

*1305 *1306 Prather G. Randle, PILCOP, Alice Ballard, Philadelphia, Pa., for plaintiffs.

Henry T. Reath, Duane, Morris & Heckscher, Philadelphia, Pa., S. G. Clark, Gen. Counsel, U. S. Steel Corp., Pittsburgh, Pa., for defendants.

Robert Weinberg, Julia Penny Clark, Washington, D. C., for union.

MEMORANDUM AND ORDER

NEWCOMER, District Judge.

After a protracted 85-day trial, this Court has finally reached the conclusion of the liability stage of this employment discrimination case. Plaintiffs, a class of production and maintenance (P & M) workers, are suing their employer, United States Steel (USS) and their unions (Union) for alleged violations of Title VII, 42 U.S.C. §§ 1981, 1985(3), and 2000e et seq. At the close of plaintiffs' evidence, the Court issued a lengthy opinion dated July 25, 1977. Dickerson v. United States Steel, 439 F.Supp. 55 (E.D.Pa.1977). In that opinion, the Court dismissed certain class claims and enunciated certain factual and legal conclusions in support. Those dismissals are incorporated by reference into this final opinion. There remain for decision four class claims of discrimination: initial assignment, access to management, access to crafts, and transfer to new facilities. Furthermore, the Court must decide the claims of the two named class representatives, Moses Dickerson and Eddie Williams, as well as the related case of Curtis Worthy.

*1307 Since many of the procedural issues were discussed in July, and no significant new facts or arguments have been presented, the Court need not reiterate all the cases and legal arguments here. Defendant USS has moved for decertification for the fourth time since the class was certified. The Court will deny this motion without further discussion other than to refer to its numerous previous opinions.

In the July opinion, the Court also made findings of facts and conclusions of law concerning the issues of jurisdiction and standing to represent the class. The Court has examined the arguments of defendants on these issues and finds no persuasive new argument raised. Therefore, the Court includes the discussion on jurisdiction from that opinion in its final findings and conclusions for purposes of this opinion. However, a recent decision by this circuit's Court of Appeals has shed further light on these issues. On the issue of standing, the Court notes that the holding of the Court of Appeals in Hicks v. Abt, Inc., 572 F.2d 960 (3d Cir. 1978), supports this Court's earlier conclusion that the individual plaintiffs can represent all the claims presented in the class action. In Hicks, the appellate court ruled that the scope of a Title VII action is controlled not only by the charges actually included in an EEOC charge, but also by those which could have "reasonably been expected to grow" out of the charges made. Since the charges of Eddie Williams and Moses Dickerson involved a large number of issues, this Court finds that the EEOC could reasonably have been expected to launch a full-scale investigation of all of USS' employment practices as regards race, as well as the Union's possible involvement in them. Therefore, even as to areas not the subject of specific charges by Williams and Dickerson, this Court concludes that Hicks sanctions the plaintiffs' role as representatives in this broad-ranging class action. Furthermore, Hicks lends support to the Court's allowance of the equitable tolling concept on jurisdiction. Hicks had filed a complaint with another federal agency. The appellate court equated this complaint with a "charge" under § 704(a) of Title VII, which protects an employee from being fired for seeking a remedy for violations of the Act. This Court believes that under Hicks, as read with the cases relied upon earlier, 439 F.Supp. at 68-69, it has equitable jurisdiction over this case for the period dating from Dickerson's March, 1970 letter to the Department of Labor.

Finally, the Court incorporates by reference its discussion of the burden of proof in this action. As noted below, in a specific discussion of proof on the testing issue, the Court has carefully considered the arguments presented at this stage by all the parties. In choosing to remain with the July statement of law, the Court specifically rejects USS' argument that Title VII requires proof of discriminatory intent. In support of this position, the Court cites Washington v. Davis, 426 U.S. 229, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976); General Electric v. Gilbert, 429 U.S. 125, 97 S.Ct. 401, 50 L.Ed.2d 343 (1976) and Teamsters v. United States, 431 U.S. 324, 97 S.Ct. 1843, 52 L.Ed.2d 396 (1977). Recent decisions of circuit courts also support this conclusion. Richardson v. Penna. Dept. of Health, 561 F.2d 489 (3d Cir. 1977); Rule v. Int'l Association of Bridge Workers, 568 F.2d 558 (8th Cir. 1977); Davis v. County of Los Angeles, 566 F.2d 1334 (9th Cir. 1977); United States v. City of Chicago, 573 F.2d 416 (7th Cir. 1978); James v. Stockham Valves, 559 F.2d 310 (5th Cir. 1977). The Court feels that these cases, many of which refer to the Congressional purpose not to require plaintiffs to prove intent, adequately deal with USS' argument on the legislative history of Title VII in general. Furthermore, they all support the proposition of Griggs v. Duke Power, 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), that facially neutral practices having a disparate impact are violations of Title VII. The Court has not found a single case holding to the contrary.

Defendant USS, in a last-minute brief, has argued that the recent Supreme Court decisions in Furnco Construction Corp. v. Waters, 438 U.S. 567, 98 S.Ct. 2943, 57 L.Ed.2d 957 (1978) and University of California Regents v. Bakke, 438 U.S. 265, 98 S.Ct. 2733, 57 L.Ed.2d 750 (1978), compel this Court to hold that Title VII requires a *1308 showing of intent. This Court cannot find any language in those cases which would lead to such a holding. Bakke, decided under Title VI, did not reach the question of intent on the part of the employer. In Furnco, the Supreme Court dealt with the burden of proof in an individual suit under McDonnell Douglas Corp. v. Green, 411 U.S. 792, 93 S.Ct. 1817, 36 L.Ed.2d 668 (1973). The Court described what is included in a prima facie case. The Supreme Court's opinion, in fact, supports this Court's conclusion that the plaintiffs need not prove intent.

"A prima facie case under McDonnell Douglas raises an inference of discrimination only because we presume these acts, if otherwise unexplained, are more likely than not based on the consideration of impermissible factors . . . And we are willing to presume this largely because we know from our own experience that more often than not people do not act in a totally arbitrary manner, without any underlying reasons, especially in a business setting. Thus when all legitimate reasons for rejecting an applicant have been eliminated as possible reasons for the employer's actions, it is more likely than not the employer, whom we generally assume acts only with some reason, based his decision on an impermissible consideration such as race." 438 U.S. at 577-578, 98 S.Ct. at 2949-2950.

Since the prima facie case need only raise an inference of discrimination, which can be rebutted by showing a permissible reason for the employment decision, it is clear that proof of intent need not enter into a Title VII case. The inference of impermissible motive which is shown by a disparity of treatment or impact is clearly enough to prove plaintiffs' case, if not met by defendants with rebuttal evidence.

The Court will now, in narrative form, present its findings of facts and conclusions of law on the remaining class and individual actions.

INITIAL ASSIGNMENT

The plaintiffs' first remaining claim is that blacks at Fairless are disproportionately assigned to certain jobs. This is not a claim of absolute segregation, but of stereotypical assignment to certain allegedly unpleasant or undesirable jobs. In the July opinion, the Court opined that, at that point, plaintiffs had established a prima facie case on this issue. Their case is primarily based on statistical analyses, the Matched Pair Study (MPS) and related corroborative and substantive studies. At that time, of course, the defendants had not yet presented any expert testimony attacking the statistics. Plaintiffs have argued that the defendants now have the burden of disproving plaintiffs' statistics with affirmative evidence, such as complete studies of their own. While doing this certainly would have simplified the Court's task, as in Presseisen v. Swarthmore College, 442 F.Supp. 593 (E.D.Pa.1977), the plaintiffs' statistics need only be shown incompetent and unreliable to defeat them. Markey v. Tenneco Oil Co., 439 F.Supp. 219 (E.D.La. 1977). Defending statistical proof is part and parcel of plaintiffs' prima facie case; the burden does not shift to a defendant for affirmative proof if the prima facie case is defeated by defendant's evidence. The plaintiffs' case does not become invulnerable after they complete their case-in-chief and survive a motion to dismiss. The Court must reexamine it at the close of all evidence to ascertain if it still meets the prima facie test. Croker v. The Boeing Co. (Vertol Division), 437 F.Supp. 1138, 1183 (E.D. Pa.1977); Neloms v. Southwestern Elec. Power Co., 440 F.Supp. 1353 (W.D.La.1977); EEOC v. E.I. duPont de Nemours and Co., 445 F.Supp. 223 (D.Del.1978). This point of view is also supported by the very recent decision of the Fifth Circuit Court of Appeals in EEOC v. Datapoint Corp., 570 F.2d 1264 (5th Cir. 1978). 
There, the EEOC presented a statistical comparison of the defendant's workforce with that of the surrounding community. That was the only evidence in the prima facie case. The trial judge concluded that the plaintiff had failed to establish a prima facie case. The appellate court upheld this conclusion, saying there was ample evidence to support it. The appellate court stated that the defendant's statistical expert had "cast grave doubt upon the credibility of the EEOC's *1309 statistics." 570 F.2d at 1270. His criticisms included some of those expressed in this case, such as inaccurate data base and possible distortion of the results by selective use of information. The court concluded that the defense expert's "general opinion was that the EEOC's statistician's conclusions were not justified and were not evidence of good statistical methodology." Ibid. Here, defendants have challenged the accuracy of statistics by presenting similar testimony. Therefore, the Court must now weigh all the evidence, including evidence presented by defendants' witnesses, to see if plaintiffs' prima facie statistical case remains intact now that all the proof is in.

The Court must conclude that it does not. As in Datapoint, defendants produced highly credible experts who severely criticized both the sampling and analysis of the MPS. Although all experts on both sides were very impressive and helpful witnesses, in addition to being outstanding in their fields, the Court must give greatest weight to the testimony of Dr. Paul Meier, USS' witness and Chairman of the University of Chicago's statistical department, and Dr. Samuel Shapiro, the Union's expert from the math-science department at University of Miami.[1] Although many of their criticisms were theoretical only, having not been affirmatively established by contradictory studies, the Court nevertheless finds that these criticisms mount up into a massive challenge to the study's credibility that cannot be ignored.[2] In United States v. International Union of Elevator Constructors, 538 F.2d 1012 (3d Cir. 1976), the Court of Appeals considered challenges to the reliability of statistics as an attack on the prima facie case, rather than as a portion of defendant's burden. There, the appellate court felt that the statistics' flaws were not so great as to defeat the initial proof. In the case before this Court however, the attacks on the statistics are so significant that the Court has determined that the statistical evidence is unreliable.[3] As the *1310 Supreme Court noted in Teamsters v. United States, supra, statistics are not irrefutable and must be analyzed as to their methodology's propriety and accuracy of execution. Dickerson, supra at p. 66.

The defendants' challenges to the MPS fall into two categories: the sample selection and the methodology of the study's conclusions. Based on the numerous attacks on the sampling, which determines whether or not a study of part can in fact reflect the reality of the whole, the MPS' reliability is very questionable. However, when the critiques of methods used for drawing conclusions are added onto the sampling criticisms, it becomes clear that the study must be rejected, and the related studies supported by the MPS then fail as well.

The study was designed by Dr. Peter E. Haimes, who was qualified by the Court to testify as an equal-employment opportunity specialist. However, he did not qualify as a statistician. He was initially assisted by plaintiffs' counsel, who apparently did not have a statistical background, although she proved to have gained an impressive grasp of this field as the trial progressed. No statistician aided in designing the original MPS sample or directed its implementation. The MPS was a theoretically proper way of studying initial assignments; Dr. Siskin admitted that. However, the improperly drawn sample, designed and executed by non-statisticians, eliminates its ability to establish the probative facts.

The first problem was that the selection rules were changed in the midst of drawing the sample, as of 1968. The study was theoretically designed so that each black had white "matches;" these persons could be compared as to seniority, being hired at about the same time. Since seniority was thus controlled for, the "matches" could be used to see if contemporaneous white and black hires were assigned to jobs on a racial basis. The selection rules first matched up to five whites for each black; halfway through the study, Dr. Haimes lowered this to only two matches per black. This change caused an imbalance which could have built back in the seniority bias. Both Drs. Siskin and Shapiro testified that the disproportion of the whites in the pre- and post-1968 periods[4] placed the weight of the white seniority in the earlier periods. Since the seniority, taken as a whole, seems to shift from one time period to the next, the Court cannot conclude that this variable was properly controlled. Since Dr. Haimes and defendants' experts all agreed that this variable must be eliminated in a study of initial assignments, and the Court concludes it was not properly controlled here, the MPS cannot be considered probative.[5]

Other studies, which were considered because their conclusions closely tracked the MPS results, do not even attempt to control for seniority. The "inactives" study (discussed below), and the Rod and Wire Mill and Pipe Mill studies, and other studies based on the Affirmative Action Program reports (AAP's) do not control for this factor. Since the MPS fails on this and other grounds, these other studies must also be disregarded as to the initial assignment issue. Plaintiffs argue that the AAP's roughly approximate where persons were initially assigned. All sides agree that the AAP's represent only a "snapshot" view of Fairless; they show where each employee was working on a given date, rather than precisely where they were initially assigned. A "rough approximation" is not enough alone to carry plaintiffs' burden.

To establish that the AAP studies reflected initial assignment, plaintiffs attempted *1311 to show that transfers were so rare that most employees stayed in their initial assignments. Plaintiffs introduced Appendix J, by which Dr. Litwin estimated that at most only 2½% of the employees transferred each year. Dr. Siskin testified that, based on a USS study, one out of three employees transferred from their departments between 1964 and 1975. After considering both of these analyses, the Court has determined that neither of them is reliable. Dr. Litwin's "estimate" was just that: a last-minute effort made by the statistician to justify the AAP's.[6] Dr. Siskin's study suffers from an over-inclusive definition of "transfer," as plaintiffs made clear in cross-examination. The professor testified that he was not sure whether a temporary assignment counted as a "transfer" or not. In fact, Dr. Siskin was not at all sure how the term was defined for purposes of the study. Ruling out both these studies as unreliable, the Court is left only with the uncontrolled AAP's. Although in the July 25 opinion, the Court accepted the studies as "further evidence of initial assignment discrimination," the weight given to them and to Dr. Litwin's "study," was in large part dictated by the acceptance of the MPS, the results of which they corroborated. Standing alone as they do now, seniority controlled only by the weak proof of Dr. Litwin's estimate, the Court now feels that they are not competent evidence on the issue of initial assignment.

A second, and perhaps even more damning criticism of the MPS was the failure to sample the entire population, or universe, about which conclusions were drawn. The MPS was drawn from the Master Employee File (MEF) from 1975. The file contained the records of each employee then working at Fairless. Drawing the sample from that source meant, of course, that those employees who had left USS' employ prior to that date could not be included. While Dr. Haimes, who is not a statistician, seemed unaware of the problem that omission of those former employees presented, Dr. Litwin recognized it immediately and took steps to remedy it. He sought to study these "inactives" to complete an entire study of the universe.

A sample can only predict for those populations from which it is drawn. This basic statistical principle was explained by Dr. Meier and recognized by Judge Bechtle in Presseisen, supra. Since plaintiffs seek to prove that initial assignments are made on a discriminatory basis, they had to sample all initial assignments, not just those of still-employed individuals. The MPS failed to sample the inactives, and thus cannot really instruct the Court on the entire workforce. The "inactives" might possess different demographic characteristics than the MPS sample. For example, if whites and blacks were equally assigned to unpleasant or underpaid jobs, the whites might quit those jobs at a higher rate than blacks, since they might perceive better outside job opportunities than blacks, who might stay with the jobs rather than face unemployment due to discrimination elsewhere. Furthermore, a number of other variables might differentiate the distribution of initial assignments of the actives and inactives. In order to be able to extend the study's conclusions to the entire population, Dr. Litwin attempted to study the inactives, by drawing a sample from USS' termination file for the years 1970 through 1975.
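
The hypothetical in this paragraph can be made concrete with a little arithmetic. The sketch below uses invented figures (the hire counts, the 20% assignment rate, and both quit rates are assumptions chosen for illustration, not record evidence) to show how a sample limited to current employees could display a racial disparity even when initial assignments were made at identical rates.

```python
def share_in_undesirable_jobs(hired, assign_rate, quit_rate):
    """Share of still-employed workers who hold undesirable jobs.

    hired:       number of people originally hired (hypothetical)
    assign_rate: fraction initially assigned to undesirable jobs
    quit_rate:   fraction of those in undesirable jobs who later leave
    Workers in other jobs are assumed, for simplicity, never to leave.
    """
    undesirable = hired * assign_rate
    other = hired - undesirable
    remaining_undesirable = undesirable * (1 - quit_rate)
    return remaining_undesirable / (remaining_undesirable + other)

# Hypothetical figures only: both groups are assigned at the same 20% rate,
# but whites are assumed to leave undesirable jobs far more often.
white_active = share_in_undesirable_jobs(hired=1000, assign_rate=0.20, quit_rate=0.50)
black_active = share_in_undesirable_jobs(hired=200, assign_rate=0.20, quit_rate=0.10)

print(f"whites still employed in undesirable jobs: {white_active:.1%}")
print(f"blacks still employed in undesirable jobs: {black_active:.1%}")
```

Under these assumed rates the active workforce alone would suggest unequal assignment, although the initial assignments were identical by construction. Whether anything like this occurred at Fairless is exactly what a sample of the inactives would have been needed to show.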

This study was shown by the defendants to be improperly devised and insufficient to complete the sample of the universe. The sample was drawn by pulling white and black files from the USS termination files at prescribed intervals.[7] However, in prescribing the intervals, Dr. Litwin apparently *1312 either miscalculated or was not given the accurate total number of files. As a result, the white files were more heavily sampled in the early years, since the sampler went all the way through the file and started over in order to pull 200 whites' files. The blacks, however, were not entirely sampled, since 200 files were pulled before the end of the file was reached. Based only on Dr. Litwin's expertise in July, the Court concluded that the validity of the sample was not affected by this "wraparound." 439 F.Supp. at 79.[8] However, the Court now has the benefit of additional expert testimony, most notably that of Dr. Meier, and has decided that the sampling techniques used to devise Appendix M were improper. Dr. Litwin testified that he was not a specialist in sampling theory; plaintiffs required him to be a multi-purpose statistical witness and his testimony suffers from that pressure. Dr. Meier sharply criticized the over- and under-sampling of the two groups. This was echoed by Dr. Shapiro. He concluded that the 400 files were not a statistically competent sample of the inactives.
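
The "wraparound" problem described above can be illustrated with a small sketch. The file sizes and the interval of 10 below are hypothetical, and the sketch assumes the termination file is kept in chronological order; it is not a reconstruction of how Appendix M was actually drawn.

```python
from collections import Counter

def interval_sample(n_files, interval, target):
    """Pull every `interval`-th file until `target` files are drawn.

    Files are indexed 0..n_files-1 in assumed chronological order. If the
    end of the file is reached before `target` pulls, the sampler starts
    over from the front (the "wraparound").
    """
    return [(i * interval) % n_files for i in range(target)]

def thirds(picks, n_files):
    """Count pulls falling in the early, middle, and late third of the file."""
    counts = Counter(min(3 * p // n_files, 2) for p in picks)
    return [counts[0], counts[1], counts[2]]

# Hypothetical sizes: the interval is too large for the smaller file
# (the sampler wraps) and too small for the larger one (it stops early).
white_picks = interval_sample(n_files=1503, interval=10, target=200)
black_picks = interval_sample(n_files=3000, interval=10, target=200)

print("white pulls by third of file:", thirds(white_picks, 1503))
print("black pulls by third of file:", thirds(black_picks, 3000))
```

With these assumed sizes the smaller file is sampled twice as heavily in its earliest third, while the latest third of the larger file is never reached. This is the kind of over- and under-sampling the defense experts criticized.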

Second, Dr. Meier pointed out that Appendix M did not solve the problem of an incomplete sampling of the universe. The file used only contained those inactives whose termination dates were between 1970 and 1975. It did not study all those hired and terminated since 1952, when the plant opened. Therefore, Appendix M and the MPS sampled only two "frames" of the total initial assignments: those of persons still employed in 1975 (from the MEF) and those terminated from 1970 until 1975. It did not even sample the entire universe of initial assignments during the limitations period. Those frames were not tied in any clear way to the population being studied, to show that they all shared the same characteristics. This type of error, according to Dr. Meier, is a classic statistical flaw, and accounted for the inaccurate predictions of Thomas Dewey's victory in 1948. Dr. Meier concluded that since the total number from which the two samples were drawn consisted of only one-half of all initial assignments made, plaintiffs should have made efforts to establish that these frames accurately reflected the entire universe. Any number of factors could cause this partial sample to be biased. For example, a shutdown of one unit might distort the inactives study. Furthermore, as Dr. Shapiro pointed out, the inactive years studied were not randomly selected. Therefore, even though Appendix M sampled a portion of the inactives, it cannot be considered a statistical sample of those years, since the time periods were not chosen randomly. Since plaintiffs failed to justify statistically this partial sampling of inactives, the Court cannot assume that these two frames can serve as a basis for drawing conclusions about the entire number of initial assignments.

Taking all these criticisms together, the Court has decided that Appendix M does not properly complete the study of the population. Plaintiffs argue that these criticisms are merely theoretical and may have only a small impact on the results. As Dr. Meier testified on cross, a theoretical bias might have a negligible effect on measured quantity, but a major impact upon a study's reliability. The Court concludes that Appendix M cannot be used to rescue the MPS from its fatal flaw of excluding the inactives.

Two other points of some significance should be dealt with on the issue of the reliability of the MPS sample. First, defendants performed an analysis which purported to show that the data base contained a 28.8% error rate in assigning individuals to clusters. On cross-examination, plaintiffs showed that some of the "errors" could be in fact disagreements about interpretation *1313 of the plaintiffs' data, and the definition of "clusters."[9] However, plaintiffs did not introduce any rebuttal evidence. At best, therefore, the Court can still find a significant error rate. While Dr. Shapiro agreed that there is no reason to believe that such errors are racially biased, slanting the study toward one direction more than another, the Court still feels that such errors cast further doubt on the reliability of the MPS. This is especially true when, as Dr. Shapiro testified, a small shift in any number would have a magnified effect on the statistical significance analyses. Defendants' other criticism that merits some discussion is the inclusion of the lower level crafts employees in the data on non-craft workers. Dr. Shapiro testified, as did Dr. Haimes, that the inclusion of crafts employees mixes the possible sources of discrimination of testing and initial assignment. This is because craftsmen have to pass certain tests, such as paper-and-pencil tests for the apprentices, which might in and of themselves be discriminatory. It was for this reason that Dr. Haimes originally decided to eliminate the crafts clusters, so that the issue of initial assignment discrimination would stand alone. However, the lower level crafts employees were not removed from the MPS by the plaintiffs' staff. Defendants claim that inclusion of this predominantly white group exaggerates the differences between any heavily black clusters and the remaining mass. Although this was proven by defendants to be true, it only changed the individual cluster statistics slightly. 
It did have greater changes on the overall picture, lessening but not eliminating the disproportions. If none of the other design flaws were present, the Court would not reject the MPS because of the crafts' inclusion. However, taken with all the other potential sources of distortion, this is simply one more reason to distrust the findings of the MPS.

Even if the Court were to assume that the MPS sample was proper in all respects, the methods used to reach conclusions appear now to be so suspect that the Court must reject them. At the time of the July 25 opinion, the Court was relying only on the plaintiffs' expertise. Here again, with the testimony of the defendants' experts to consider, the Court has weighed the evidence and has concluded that the plaintiffs have not carried their burden.

The plaintiffs' core contention was that blacks were disproportionately concentrated in five job areas: the Blast Furnace, the Open Hearth, Masonry, Janitorial, and Transportation and General Services. These clusters were "undesirable" by plaintiffs' definition, 439 F.Supp. at 77, n.19. However, they did not rank highest on the plaintiffs' undesirability ranking. The clusters were picked by Dr. Haimes and plaintiffs' counsel, as the Court stated previously, because they were large, active in hiring and heavily black. All three of defendants' experts criticized this selection as "post-hoc." This is a serious flaw in a statistical study, for it hints at data manipulation. Post-hoc selection occurs when data is viewed after it is accumulated and then further analyzed only as to its unique characteristics. This distorts the differences shown in the data by removing them from a proper statistical context. As Dr. Meier recognized, this was not caused by Dr. Litwin, who was immediately aware of the problem when he came into the study which was almost completed, but by Dr. Haimes, who may not have been sophisticated enough in statistics to understand the implications. The fatal flaw in choosing the five clusters was that they were not picked because they were the most undesirable. Only two of them ranked in the "top" ten by plaintiffs' own study (G-10), and of the "top" 20, four were black clusters and four were white clusters. They were picked for a reason internal to *1314 the data, that is, because they were heavily black. Even a lay person can deduce that if five groups are selected because they are the most heavily concentrated with blacks, they will be those most likely to be disproportionately black.
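
The lay deduction in the last sentence can be checked with a short simulation. In the sketch below every figure is hypothetical (20 clusters of 50 workers, a 20% black share, a fixed random seed), and race is assigned independently of cluster, so there is no discrimination by construction. The clusters chosen after looking at the data still come out more heavily black than the workforce as a whole.

```python
import random

def simulate_post_hoc(n_clusters=20, cluster_size=50, black_rate=0.2, top_k=5, seed=0):
    """Return (overall black share, black share of the top_k clusters).

    All parameters are illustrative assumptions, not figures from the MPS.
    """
    rng = random.Random(seed)
    # Race is drawn independently of cluster: no association is built in.
    black_counts = [
        sum(rng.random() < black_rate for _ in range(cluster_size))
        for _ in range(n_clusters)
    ]
    overall_share = sum(black_counts) / (n_clusters * cluster_size)
    # Post-hoc step: inspect the data first, then keep the heaviest-black clusters.
    chosen = sorted(black_counts, reverse=True)[:top_k]
    chosen_share = sum(chosen) / (top_k * cluster_size)
    return overall_share, chosen_share

overall, chosen = simulate_post_hoc()
print(f"overall black share:          {overall:.1%}")
print(f"share in the 5 chosen groups: {chosen:.1%}")
```

Because the five groups are selected for being the most heavily black, their pooled share can never fall below the overall share, whatever the seed. A significance test applied to them afterward therefore says little unless the selection step is itself accounted for.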

When Dr. Litwin came into the case, he recognized this problem of post-hoc selection immediately. He therefore devised a study, now called the "14-cluster analysis," to rebut this argument. He took the 13 clusters with at least 25 assignments made, and grouped the remainder of the 96 original clusters into one "super-cluster." This was done so that insignificantly small clusters were not weighted evenly with large ones. He then ran statistical significance tests on these clusters. He concluded, based on these tests, that there was an association between race and initial assignment, since three of the fourteen clusters were significantly black, to a statistician, and according to chance only one should be.

The problem was that plaintiffs could not justify the cut-off point of 25. Defendants established that if smaller clusters were taken out of the "super-cluster" and counted in (with a cut-off of 20 or 15) the occurrences of the significances would have been statistically attributable to chance. Since the cut-off of 25 was arbitrary and cannot be justified by plaintiffs, the Court cannot find that plaintiffs eliminated chance as a factor. Plaintiffs attempted to use USS exhibit 81-25 to bolster their analysis. The Court finds that the exhibits used in cross-examining Dr. Siskin on this exhibit, P-3129 and P-3130 were never properly based on record evidence or authenticated by any witness. This whole examination of Dr. Siskin is thus without any basis that could serve as rebuttal evidence and cannot be considered to bolster the MPS' conclusions.
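
Why the number of clusters tested matters can be seen from a simple binomial calculation. The sketch below is not the defendants' analysis. It assumes, purely for illustration, that each cluster is tested independently at a 5% significance level, and the cluster counts of 20 and 30 are hypothetical stand-ins for what a lower cut-off might produce.

```python
from math import comb

def prob_at_least(k, n_tests, alpha=0.05):
    """P(at least k of n_tests independent tests are 'significant' by chance).

    Independence and the 5% level are simplifying assumptions.
    """
    return sum(
        comb(n_tests, j) * alpha**j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )

# 14 is the number of clusters in Dr. Litwin's analysis; 20 and 30 are
# hypothetical counts for lower cut-offs.
for n_tests in (14, 20, 30):
    print(n_tests, "clusters:", round(prob_at_least(3, n_tests), 3))
```

Under these assumptions, three chance "significant" results become steadily less surprising as more clusters are counted separately. That is why an unexplained choice of cut-off leaves the role of chance unresolved.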

Plaintiffs also attempted to rebut this criticism by conducting a second analysis of the MPS data. Dr. Litwin performed a simulation, or "Monte Carlo" technique. Although this might have eliminated the possibility that the post-hoc selection influenced the results, the Court has concluded that the simulation was not done properly. Defendants' experts testified that the simulation should have been repeated by the computer 1,000 times in order to be reliable; Dr. Litwin stated that much of it was simulated only 100 times. The Court concludes that this seriously hampered its predictive ability. Second, the data underlying the Monte Carlo analysis is not identical with the MPS, in that a different data base, with different cluster and gross totals was used. Therefore, it cannot simulate the MPS, since it is not based on the same facts. Finally, the Court finds that Dr. Litwin's program was improperly constructed. It was too narrowly defined and eliminated consideration of disparities greater than plaintiffs had originally alleged. Also, it relied on the "super-cluster" as if it were not artificially constructed—which in effect carried over one of the problems from the original 14-cluster analysis. The Monte Carlo analysis therefore cannot serve to justify the MPS' conclusions, because it is also not reliable evidence.
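
One reason the number of repetitions matters is that a simulated probability is itself an estimate, and its sampling error shrinks only with the square root of the number of runs. The following is a minimal sketch, assuming independent runs and a hypothetical true probability of 0.05; neither figure comes from the record.

```python
from math import sqrt

def monte_carlo_std_error(p, n_runs):
    """Standard error of a proportion estimated from n_runs independent runs."""
    return sqrt(p * (1 - p) / n_runs)

# 0.05 is an assumed value chosen only to show the scale of the error.
for n_runs in (100, 1000):
    se = monte_carlo_std_error(0.05, n_runs)
    print(f"{n_runs:>5} runs: standard error about {se:.4f}")
```

With 100 runs the uncertainty is roughly three times as large as with 1,000, so a result near a conventional significance threshold is harder to distinguish from chance.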

The Court has concluded that the MPS and all the related studies, after all the evidence is in, are not competent to raise an inference of discrimination in initial assignment. This conclusion has been reached with some trepidation, since the Court is not highly trained in statistical techniques. What it has attempted to do is to sift through all the expert explanations of the studies and decide which ones seem most credible. Plaintiffs' experts were unable at times to give convincing reasons for their opinions. Defendants' experts, while offering little data and mostly criticizing theory, presented fuller explanations of the statistical concepts involved and how they interact. Plaintiffs' inability to justify what they did in terms of statistical theories that seem proper to the Court caused the whole study to be viewed as unreliable. Taking all the knowledge the Court has acquired of this complex field of statistics and study construction, it has concluded that plaintiffs were unable to present a solid study for the Court's consideration.

Since the plaintiffs' statistical proof has failed, there is very little class-wide evidence for the Court to consider on this issue. Plaintiffs contend that defendants used subjective and standardless hiring *1315 procedures. Even if the Court were to find that this were true, that would be meaningless unless some discriminatory effect were shown. Subjective hiring procedures cannot be considered illegal unless they result in discrimination. 439 F.Supp. at 76. Plaintiffs' statistics are their only method of showing a disparate effect of these policies on a class-wide basis and the Court has decided that such evidence is not probative. Plaintiffs also point to their historical evidence of discrimination, as reviewed by the Court, 439 F.Supp. at 77. Since this evidence is outside the statute of limitations, it is meaningless unless it bolsters a claim inside it. United Air Lines v. Evans, 431 U.S. 553, 97 S.Ct. 1885, 52 L.Ed.2d 571 (1977). Standing alone as it does, it cannot establish a claim.

Plaintiffs finally point to their "smoking gun," as the Court had termed it. This is the comment by one of the interviewers that he assigned blacks to "hot" jobs because they could "take the heat." First, this comment has not been proven to have had its alleged effect, since the failure of the MPS eliminated plaintiffs' ability to show blacks were disproportionately assigned "hot" jobs. Second, the prime witness who testified to this hearsay comment (the speaker died before trial) was seriously damaged in her credibility by facts brought out in a mid-trial injunction action. She requested that a discipline be enjoined, since she alleged it was for the sole purpose of harassing her for testifying. After reviewing affidavits, the Court decided that she was not entitled to an injunction because she had not established that her discipline was discriminatory. This incident established her, in the Court's view, as a person with a grudge against USS and against her personnel department superiors in particular. Her credibility thus called into question, and the relevancy of the "smoking gun" remark now removed by plaintiffs' failure to otherwise prove their case, the Court cannot find that discriminatory animus was shown.

Plaintiffs therefore have failed to maintain their prima facie showing of discrimination in initial assignment under either Title VII or § 1981 against USS. Since the defendants, in their case, presented credible evidence rebutting the major portions of the plaintiffs' case-in-chief, plaintiffs have failed to carry their burden by a preponderance of the evidence. Since the evidence as a whole fails to raise an inference of discrimination as to USS' policies and practices, the vicarious liability case against the Union also must fall.

MANNING NEW FACILITIES

Plaintiffs claim that defendant USS has discriminated in the selection of initial crews for new facilities added to the Fairless Hills plant in 1968 and 1972. The plaintiffs also claim that the Union is liable for this alleged discrimination because of special memoranda of understanding that were negotiated regarding the manning of these new facilities. Although the Court earlier was impressed with plaintiffs' evidence as showing a prima facie case of discrimination on this issue, evidence presented by the defendants has altered this opinion. The Basic Labor Agreement (BLA) required that new facilities crews be manned by "qualified employees who apply for such jobs in the order of length of service . . ." 1965 BLA, § 13-N. That paragraph allowed local parties to agree on the criteria for judging qualifications and length of service. Plaintiffs claim that the BLA's provisions were diluted at the local level so as to allow unchecked management discretion which resulted in discrimination. Three facilities were placed into operation in the relevant period: Galvanizing, the Rod Mill, and Electric Furnace and Caster. The Union and the company had a memorandum regarding the Galvanizing line, which stated that the first and second crews would be selected on the basis of "work experience and employment records." Rod Mill crews were selected according to straight seniority[10] except for the first 14 *1316 members of the rolling crew. Management sought and obtained the right to hire those employees from outside sources in an effort to find experienced employees for the highspeed rolling crew. Plaintiffs claim that management used subjective and standardless criteria in picking the Galvanizing and Rod Mill rolling crews and that the resultant selection of the largely white groups was discriminatory.

The manning of the Galvanizing department and the Rod Mill occurred in late 1967 and 1968. The Court held previously that the Title VII statute of limitations extends only back to October 2, 1969. 439 F.Supp. at 69. Therefore, these incidents can only be considered as possible violations of § 1981.[11] In order to prove a claim of employment discrimination under that statute, illegal motive or intent must be shown. Croker, supra at 1181. The Court of Appeals has chosen not to decide this issue yet. Richardson v. Penna. Dept. of Health, 561 F.2d 489, 493 (3d Cir. 1977). It hinted, in Resident Advisory Board v. Rizzo, 564 F.2d 126, 140-145 (3d Cir. 1977), that § 1981 proof included evidence of discriminatory intent. This Court will follow its Croker holding, on the belief that these opinions and new authority support this position. Lewis v. Bethlehem Steel, 440 F.Supp. 949 (D.Md.1977); Milburn v. Girard, 441 F.Supp. 184 (E.D.Pa.1977) (Luongo, J.); Delgado v. McTighe, 442 F.Supp. 725 (E.D.Pa.1977) (Broderick, J.); contra, Davis v. County of Los Angeles, 566 F.2d 1334 (9th Cir. 1977).

Section 1981 is not an affirmative action program, as Judge Celebrezze noted in Long v. Ford Motor Co., 496 F.2d 500, 505 (6th Cir. 1974). He stated: "It is an equalizing provision, seeking to ensure that rights do not vary according to race. It does not require that persons be accorded preferential treatment because of their race." In Lewis v. Bethlehem Steel, supra, the Maryland district court held that in order to prevail under § 1981, a black employee would have to prove either that his treatment was intentionally dissimilar or that the policy's effect was intentionally dissimilar. Here, the evidence shows similarity of treatment. In the case of the Galvanizing crew, both blacks and whites grieved their exclusion and both races received the same results. The arbitrator decided that none of the employees were entitled to a new facility job under the Basic Labor Agreement, because none of them had the ability to perform the new jobs without training. On the Rod Mill, two whites grieved that less senior employees had been chosen because of other mill experience. Although the whites were offered jobs at the Rod Mill during the fourth step of the grievance processing, they were not allowed to "bump" those employees with less job time but more qualifications. It appears from this evidence that both white and black workers were excluded from first crews because of the application of subjective standards. The Court finds that this evidence defeats any inference of dissimilar treatment. Also, the Court notes that blacks excluded from the first crew were given positions on later crews. This also rebuts an inference of intentional discriminatory treatment.

Plaintiffs allege that the company's emphasis on ability over length of service and the alleged use of subjective standards for judging ability constitute a policy with an intentionally discriminatory effect. As evidence of the effect, plaintiffs point to the all-white composition of the first crews and the substantially white population of the second Galvanizing crew. When the Court considered this evidence at the close of the plaintiffs' case, it concluded that disparate effect had been shown. However, defendants have demonstrated that very few blacks applied for the small number of jobs available. On the Galvanizing first crew, 34 employees bid for the first operating crew's six positions, and two of those were *1317 black. The Galvanizing maintenance crew, which had two openings, was bid by 18 employees of whom one was black. One black was hired for the second crew. For the Rod Mill first crew, it appears that only one black applied. He was passed over for the first crew and placed on the second crew.

It appears that, although the evidence of all-white first crews might raise an inference of disparity, this is rebutted when the Court looks at the paucity of black applicants. Although plaintiffs' evidence showed a statistical underrepresentation, this conclusion did not take into consideration the dearth of black applicants. Since the Court concludes that the plaintiffs have failed to show disparate effect, their case must fall.

Furthermore, even if the Court were to find that the composition of the Galvanizing and Rod Mill crews were such as would show a disparate effect, the evidence does not establish that such effect was intentional. In Croker, this Court held that statistical differences alone could demonstrate illegal intent under § 1981. Accord, Richardson v. Penna. Dept. of Health, supra. However, in order to prove intent by statistical proof alone, Croker held that the differences must be dramatic. Here, where the total numbers are so small—eight for each of the Galvanizing crews and 14 picked for the Rod Mill crew—the Court cannot find that statistics alone could be probative.

Plaintiffs attempted to buttress their statistical case by showing that historically all first crews at Fairless were predominately white. While historical evidence should be considered as part of the context in judging intent, Croker, supra at 1181, these events are significantly removed in time. The plant was first staffed in 1951, with the Blast Furnace, Open Hearth and Pipe Mill opening at about that time. All these first crews were white. The Court does not find this fact to be significant enough to show intentional discrimination. The only other evidence was presented by John Bysek, a former USS manager, who testified that the plant superintendent told him not to hire blacks for the first crew. The superintendent denied any such statement in his testimony and noted USS' early policies of integration. All the evidence from which the Court might infer intent is so weak that the Court will decline to make that inference.

Finally, the Court also notes that, even assuming the company had intentionally bargained for the right to select the earlier crews in order to discriminate, or that the subjective standards resulted in discrimination, the alleged incidents of discrimination are so sporadic and isolated that they cannot serve as the basis of a class-wide claim. Cf. Teamsters v. United States, supra.

The last facility to be opened at Fairless is the Electric Furnace and Caster division. The crews were picked in compliance with the seniority system. Plaintiffs apparently do not dispute this. Since the seniority system is the basis for such selections, and there has been no evidence of a lack of its bona fides or of intentional discrimination, the Court concludes that the staffing of this facility does not violate Title VII or § 1981. As noted above, the Teamsters case immunizes seniority-based decisions under § 703(h) of Title VII. The Court concludes that plaintiffs have failed to prove a case under § 1981 as to any new facility hiring and under Title VII as to the 1972 hiring.

ACCESS TO MANAGEMENT

Plaintiffs' third class-wide claim alleges that blacks are disproportionately excluded from the ranks of first-level management at Fairless Hills. During their case-in-chief, plaintiffs substantiated this claim by showing that in 1973 and 1974, blacks were seriously underrepresented in the first-level management positions.[12] As the Court noted in July,

*1318 "This is the type of evidence that constituted a prima facie case in Croker [v. The Boeing Co. 437 F.Supp. 1138 (1977)]. The burden is now upon USS to show an affirmative defense, or to attack the validity of the analysis, both of which were done by Boeing in Croker. These statistics present a strong inference of discrimination at present." 439 F.Supp. at 81.

USS attempted both of these challenges to plaintiffs' evidence in defendant's case-in-chief. In the plaintiffs' case, Dr. Litwin based his conclusion that blacks were statistically underrepresented in management on evidence that showed that 75% of all first-level managers are drawn from the Fairless P & M workforce. To rebut this, USS introduced a study that showed that from 1972-1976, only 46.2% of all management employees were originally hourly employees. However, the Court does not find this USS exhibit persuasive, since it fails to limit itself to first-level managers, which is the category in question. Since it includes higher level managers, who may be more probably recruited through management training programs and other outside programs, this exhibit would not be effective rebuttal. Dr. Litwin, however, accepted USS' argument on rebuttal and assumed that only 50% of first-level managers were drawn from P & M. He still found their underrepresentation to be statistically significant, reinforcing the plaintiffs' prima facie case.[13] As the Court noted in its earlier opinion, 439 F.Supp. at 80-82, plaintiffs presented other evidence to buttress their statistics: testimonial evidence of a history of discriminatory informal policy, personal instances of exclusion from management jobs, and standardless criteria prone to discriminatory abuse. The evidence is very much the same as that found to uphold a similar claim in James v. Stockham Valves and Fitting Co., 559 F.2d 310 (5th Cir. 1977). First, new foremen are selected by incumbents and thus are a self-selected group. Second, the recommendations of the foremen are very discretionary with no built-in protections against bias. Third, as was shown by USS' own witnesses, there are no written guidelines and the criteria are not officially agreed upon. Fourth, as in Stockham, the criteria are quite subjective, even as spelled out in testimony by Fred Lafferty, the former superintendent of personnel.
In general, as the Court noted before, one gets to be a foreman by "doing a good job." This is as subjective as the Stockham "best man" standard. See also Watkins v. Scott Paper Co., 530 F.2d 1159, 1193 (5th Cir. 1976). Finally, as the Court also noted before, although there are programs such as the Management Candidate program theoretically designed to bring minorities into management, they are so poorly publicized that most of the black witnesses did not know of them. There is no system for posting management vacancies, a practice which the appellate court disapproved in Stockham. The Court has considered this evidence also, and incorporates its earlier discussion of it, together with Dr. Litwin's rebuttal statistics, in determining that plaintiffs met their initial burden. Therefore, in order to avoid a finding of discrimination on these statistics and the other evidence, USS must show a legitimate business reason for this disparity.

USS first points to evidence presented by Dr. Seymour Wolfbein, a labor market analyst. Dr. Wolfbein's study showed that all of the experienced black foremen in the Fairless area labor pool were already employed. Therefore, the Court is asked to infer, no more black foremen could be hired. However, as the Court observed in its earlier opinion, an external comparison is not appropriate in a discussion of first-level management, since there is no evidence that USS hires these foremen off the street. Since the evidence shows that 50-75% of the managers are from USS' own P & M workforce, the proper comparison is internal. *1319 This would determine how many blacks from Fairless should be promoted to management. This is what the plaintiffs did, and the study showed that blacks were in higher proportions in the P & M workforce than in management. Such an internal comparison is proper, since it is keyed to the pool from which such employees are drawn, under Hazelwood School District v. United States, 433 U.S. 299, 97 S.Ct. 2736, 53 L.Ed.2d 768 (1977). The Court incorporates its earlier statements on Hazelwood and this study, at 439 F.Supp. 81-82 and finds Dr. Wolfbein's analysis irrelevant on this issue.

USS next argued that the comparison of the percentage of black managers to that of black workforce was improper because all members of the workforce were not qualified to become foremen. The most often-quoted figure showed that only 20% of all P & M workers possessed the requisite qualities to become managers. However, in terms of the criteria elicited from USS' own witnesses, the Court cannot find that blacks overall are less likely to be qualified than whites. Therefore, they should be represented in the same proportions as in the P & M workforce. While USS' wage regressions raise an inference that Fairless blacks have somewhat less formal education and are older when hired than whites, these are not the qualities that would bar one from management. On-the-job skills and knowledge and good safety and discipline records appear to be most important. These qualities appear to be equally shared by blacks and whites. USS has presented no evidence to the contrary. In Croker, defendant showed that because blacks were not hired until the 1960's they had less time on the job and thus were less likely as a group to have enough tenure to meet Boeing's criteria for management. Here, USS has not shown by any measurable criteria that blacks as a group lacked any of USS' subjective qualifications to become first-level managers. The education gap between blacks and whites at Fairless is not such as to cause the whites' dominance of management positions, since the lower management positions require only basic reading and writing skills. No USS official testified that advanced formal education was a first-level management prerequisite.

USS finally argues that it was unable to hire blacks as managers because qualified blacks had turned down offers to move into management. To buttress this, it presented the testimony of four blacks. However, USS did not establish how many other offers were made to blacks and rejected. Nor did it show that whites disproportionately accepted management positions. When this testimony is balanced with the evidence by plaintiffs' witnesses of their strong but unrequited desire to enter management, the Court is not convinced that the paucity of black managers can be explained by an overall black unwillingness to serve. Nor does the Court find it credible that the blacks who refused were the only ones in the plant qualified to become managers.

This Court must hold, based on this evidence, that USS has not met plaintiffs' prima facie case with evidence showing that the management disparities are a result of any legitimate business reason. USS has no set tenure or educational criteria for promotion; no job-related tests are presently administered to identify the qualified individuals on an objective basis. Because the criteria are so subjective, this area of employment practices is especially susceptible to discriminatory abuse. As the evidence shows, a "buddy-buddy" system of promotion, unsupervised from above and unstructured in application, has resulted in persons selecting members of their own race in a self-perpetuating elite. Unless and until blacks are admitted to that group in numbers proportionate to their presence in the workforce, this pattern of discrimination will probably continue.

However, the Court cannot find the level of intent required to hold USS liable for discrimination in management selection under § 1981. Although the statistics and testimony establish that blacks were discriminated against in the management selection process involving lower-level managers, the various programs engaged in by USS at a higher level rebut an inference that such discrimination was intentional to *1320 the extent required under the Civil Rights Statute. Fairless has had a number of formal programs designed to open up its management positions to all, such as the Foreman Candidate Program and the Management Candidate Program. While it is difficult for the Court to square this evidence of good faith attempts to draw in black management by the upper echelons of USS management with the informal practice of excluding blacks by those lower-level managers (who are responsible for the actual hiring into first-level positions), it is certainly evidence that would mitigate a charge of intentional discrimination by the firm as a whole. To be sure, a company is liable for lower-level practices if known and left unchanged by supervisors.[14] Croker, 437 F.Supp. at 1194 and cases cited therein. However, the Court feels that where a single firm enunciates policies going in one direction while its practices go in another, that firm is liable not for intentional discrimination under § 1981 but the lower standard allowed under Title VII. This Court has repeatedly held, and reiterates that holding, that intent is not a requisite under 42 U.S.C. § 2000e. As the Court said in Croker, "(G)ood intentions and the existence of affirmative action programs are not defenses to a Title VII action." 437 F.Supp. at 1182. In Croker, Boeing demonstrated more than good intentions in its promotions to management. The number of blacks promoted to the Vertol Division's management was greater than blacks' proportion of the workforce during the limitations period.
Therefore, the Court inferred that Boeing was acting out its stated intentions to cease discrimination in management. USS has not demonstrated that its good intentions have borne fruit, and that blacks are achieving management positions at any greater rate than prior to the promotion programs. Therefore, although these programs are sufficient to negate the § 1981 intent by showing a desire to establish a race-blind policy, their failure to advance more blacks to management merely supports the Court's conclusions that there has been no effect in fact on hiring practices. Therefore, USS is liable under Title VII for discrimination in elevation to management, but is not liable under § 1981.

Plaintiffs have argued that USS used the position of vicing foremen as a stepping-stone to foreman and discriminated against blacks in promotion to this position as well as to a first-level management position. A vicing foreman has a supervisory position, but remains in the P & M bargaining unit. Instead of a salary, he receives his hourly pay plus 10 percent. Plaintiffs assert that these positions were widely sought after, and in fact were more desirable than the first-level management positions because of potential for higher pay.

Plaintiffs have presented very little evidence that such positions were used by USS to "groom" individuals for later management positions. Furthermore, they have not shown that blacks were under-represented among these "10 percenters." In fact, they have presented no class-wide proof on this issue. Indeed, the Court had thought at the time of the motion to dismiss, that this issue had been dropped from their case on management. The only evidence that the plaintiffs put forward on this was the testimony of a few individuals, one of whom, Arthur Johnson, has his own case pending before this Court. Since counsel agreed that the defendants need not put in evidence rebutting the individual cases, the Court will decline to comment on the merits of those individual cases at this point. However, in view of the lack of class-wide proof on the issue of vicing, the Court will not extend its holding that USS has discriminated on the basis of race in promotion to first-level management to the position of vicing foreman. Even if the individual cases were found to be valid, they could not sustain a class-wide verdict, for without some proof of a discriminatory policy or statistical proof of disparate impact, individual cases are no more than isolated incidents of discrimination. See Teamsters v. United States, supra.

*1321 ACCESS TO CRAFTS

Plaintiffs have asserted a class-wide claim that blacks at Fairless are discriminated against in access to the prestigious and high-paying crafts jobs. This claim is made against the Company and vicariously against the Union. This claim, as it was presented at trial, relates primarily to the barrier allegedly presented by the test battery which an individual must pass in order to become an apprentice. Therefore, although other class-claims were raised earlier on the issue of the crafts jobs,[15] plaintiffs' case at trial was a "testing" case that has its origins in Griggs v. Duke Power Co., supra.

The Court should first note that after most of this portion of the opinion was drafted, the Union submitted the results of a final arbitration (cited as "Award," infra) on the USS tests as a supplement to the record. Although this was based on an interpretation of the labor contract rather than the federal law, it still contained relevant information. It is interesting to note that both the arbitrators and the Court agree in the result, that the tests are not "job-related." The Court will, from time to time, use this award in its discussion of general aspects of USS testing.

A. Burden of Proof

Before considering the state of the evidence on the issue of testing, it would be helpful for the Court to discuss the legal standards for the type of proof and the respective burdens of proof that are unique to this area. The first major case on testing was Griggs v. Duke Power Co., issued by the Supreme Court in 1971. In Griggs, the High Court held that tests which have a disparate effect on minorities must be shown to be "job-related" in order to be found acceptable under Title VII. A non-job-related test was a discriminatory one if minorities passed it at a rate substantially lower than that of whites. Four years later, in Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280, the Supreme Court discussed in somewhat more detail what it meant by "job-related." In that case, the defendant had engaged an industrial psychologist to perform a validity study to demonstrate that success on the tests used was related to success on the jobs for which it was a selection device. The Albemarle Paper court looked to the guidelines promulgated by the Equal Employment Opportunity Commission (EEOC) as standards for evaluating the propriety of the psychologists' work.

"The message of these Guidelines is the same as that of the Griggs case—that discriminatory tests are impermissible unless shown by professionally acceptable methods, to be `predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.'" Albemarle Paper, at 431, 95 S.Ct. at 2378, quoting from 29 CFR § 1607.4(c).

Since these two opinions were handed down, courts have depended more and more on the guidelines of the EEOC and more recently, some new guidelines issued by the Department of Justice. The Court will discuss the meaning of the two sets of guidelines more thoroughly below. However, the lesson of the cases is clearly that in a Title VII testing case, the plaintiffs first must show that the tests have a disparate impact. *1322 The defendant is then to demonstrate, through the use of professionally developed and acceptable "validation" studies, that the tests bear a statistically and practically significant relationship to the jobs for which they are used as a selector.

The next issue is the question of which party bears the burden of proving each one of these portions of a Title VII testing case. Plaintiffs argue that the burden of persuasion shifts to defendants once plaintiffs demonstrated that the test has a disparate impact on blacks. They claim that defendant then has the burden of persuading the Court, by a preponderance of the evidence, that the test is job-related. Only if that standard is met does the burden of persuasion shift back to the plaintiffs. Since this is a very complex but important issue of law, the Court has decided to discuss this in some depth.

In this Court's last major Title VII case, Croker v. The Boeing Co. (Vertol Division), supra, the issue of burden of proof was discussed in general, rather than in terms specifically applicable to a testing case. Citing General Electric Co. v. Gilbert, 429 U.S. 125, 97 S.Ct. 401, 50 L.Ed.2d 343 (1976), this Court said that "the plaintiffs retain at all times the burden of proving discrimination by a preponderance of the evidence." 437 F.Supp. at 1183. Therefore, as in the initial assignment claim supra, plaintiffs' prima facie case of disparate treatment or impact must be reexamined at the close of all evidence. Plaintiffs never lose the burden of proving that a disparity exists. However, under Title VII, the employer may defend not only by directly attacking plaintiffs' evidence of disparity, but also by raising an affirmative defense. These defenses are the "business necessity" arguments recognized in Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975). That case noted that the employer has the "burden of proving" its affirmative defense. USS has argued that it does not have the burden of persuasion, but only the burden of going forward with the evidence on its affirmative defense. This Court, consistent with the Croker decision, does not agree. The Court of Appeals for the Third Circuit has most recently addressed the issue of burden of proof in Rodriguez v. Taylor, 569 F.2d 1231 (3d Cir. 1977). There, the court cited Ostapowicz v. Johnson Bronze Co., 541 F.2d 394 (3d Cir. 1976), and said, "(I)f an employer produces evidence of a non-discriminatory reason for an employment decision, he may bear the burden of proof on that ultimate issue." 569 F.2d at 1239. In Ostapowicz, the appellate court was interpreting McDonnell-Douglas Corp. v. Green, 411 U.S. 792, 93 S.Ct. 1817, 36 L.Ed.2d 668 (1973), which outlines the burden of proof rules for a non-class Title VII case. 
Unequivocally, the Ostapowicz court said that the burden shifts to the defendant on a "business necessity" defense. "The defendant must prove its justification by a preponderance of the evidence." 541 F.2d at 399. This Court believes that the principles of Rodriguez and Ostapowicz require USS to prove the affirmative defense by a preponderance of the evidence here. See also EEOC v. E. I. duPont de Nemours and Co., 445 F.Supp. 223 (D.Del.1978); Vulcan Society of New York City Fire Dept. v. Civil Service Comm'n., 490 F.2d 387 (2d Cir. 1973). In Croker, the Court said that the plaintiffs retained at all times their "traditional civil litigation burden" of proving their cause of action, being that of discrimination. Here, defendant likewise retains its "traditional" civil burden of proving its affirmative defense by a preponderance of the evidence. The requirement that defendant prove a business necessity defense under Title VII is no different than that imposed upon a defendant in a personal injury case to establish assumption of risk or contributory negligence. The Court sees no reason to differentiate between Title VII and other civil cases on the issue of defendants' burden, since it does not do so regarding plaintiffs' burden. As outlined above, the Court finds much support for this position both in this Circuit and by the Supreme Court.

Not only does defendant have the burden of proving that its testing procedure is job-related, by conducting validity studies, but that burden is a heavy one. Ever since Albemarle Paper, courts have closely scrutinized validity studies to ensure that they *1323 were conducted properly and fairly in all respects. In Richardson v. Penna. Dept. of Health, 561 F.2d at 491, this Circuit's appellate court said: "(S)uch validation requires a probing judicial review of the choices made by those responsible for the test." (Emphasis added.) The Seventh Circuit Court stated that defendant has the burden of showing that the test has a "manifest relation" to the job in question. United States v. City of Chicago, 549 F.2d 415 at 427, citing Griggs, 401 U.S. at 432, 91 S.Ct. 849. That court required "convincing proof" before accepting a study or expert testimony. 549 F.2d at 432.

B. Intent

Defendant USS next argues that plaintiffs must show, as part of their Title VII prima facie case, evidence that the testing procedure is intentionally discriminatory. Certainly, that is true under § 1981, as the Supreme Court held in Washington v. Davis, 426 U.S. 229, 96 S.Ct. 2040, 48 L.Ed.2d 597 (1976). However, the courts have been clear, since Griggs v. Duke Power Co., supra, that intent is not an element of a Title VII case, see discussion infra, and especially in a testing case. Defendant has put forth a number of arguments as to the issue of intent. Defendant first contends that, since both Griggs and Albemarle involved situations where pre-Act segregation was admitted, a test or selection procedure must have its "genesis" as a device intended to perpetuate such discrimination in order to invoke the holdings of those cases. In examining the language of Griggs itself, it is clear that the High Court did not base its decision on an inference that the tests were merely devices to subtly and legally continue discrimination. In fact, the Court specifically rejected intent as an issue. "(G)ood intent or the absence of discriminatory intent does not redeem employment procedures or testing mechanisms that operate as `built-in headwinds' for minority groups and are unrelated to measuring job capabilities." 401 U.S. at 432, 91 S.Ct. at 854. In fact, in spite of the power company's history of blatant discrimination, the Supreme Court felt that the company's policy of aiding employees to meet the job requirements suggested a lack of intent in instituting the questioned job criteria. "But Congress directed the thrust of the Act to the consequences of employment practices, not simply the motivation." Ibid. (Original emphasis). This "no-intent" theme was repeated in Washington v. Davis, 426 U.S. at 247, 96 S.Ct. at 2051, where the Court said:

"Under Title VII Congress provided that when hiring and promotion practices disqualifying substantially disproportionate numbers of blacks are challenged, discriminatory purpose need not be proved . . ."

Most recently, in the Teamsters case, the Court reaffirmed its decision in Griggs that the plaintiff need not show discriminatory motive in such a case. 431 U.S. at 336, n.15, 97 S.Ct. 1843, n.15.

The Court of Appeals for the Third Circuit has not read the factual limitation into Griggs that the defendant urges. In Richardson v. Penna. Dept. of Health, supra at 491, the appellate court discussed Griggs and said: "A discriminatory intent on the part of the employer need not be proved to show that a testing practice violates Title VII." Defendant claims that the decision in General Electric v. Gilbert, supra, requires that this interpretation of Griggs be altered. That argument was rejected quite recently by the Seventh Circuit appellate court in United States v. City of Chicago, 573 F.2d 416 (7th Cir., 1978). That court said that Gilbert held only that the maternity leave rule did not have a discriminatory effect, and that it reaffirmed Griggs on the issue of intent.[16] This Court must reject defendant's argument that the facts of Griggs and Albemarle require proof of a discriminatory intent or origin.

Defendant USS also claims that the testing procedure at Fairless is part of a "merit system," protected under § 703(h), 42 U.S.C. *1324 § 2000e-2(h) and that therefore, plaintiffs must prove intent. In the Teamsters decision, the Supreme Court held that a bona fide seniority system could not be attacked for its discriminatory impact because of that section's protection. The language of that section applies equally to "merit systems."[17]

Section 703(h) is a specific exception to Title VII. As construed by the Supreme Court in Teamsters, it protects certain employment practices from attack even if their effect is to discriminate so long as they are not put into operation with discriminatory intent. In Teamsters, the High Court held that a seniority system which perpetuated the effects of pre-Act discrimination was not a violation of Title VII, where plaintiffs could not show the system was designed to continue such discrimination. In the case at bar, defendant USS points out that Congress included a "merit system" in the same section. USS contends that a "merit system" is one that uses tests to find or advance the qualified applicants. Therefore, the company contends, a merit system which uses tests must be shown to have been intended to discriminate or it is protected by this section. While the Court finds that the defendant may be correct in classifying Fairless' battery as part of a "merit system" in light of the Act's legislative history, it does not agree with the defendant's conclusion that plaintiffs must show that the testing was intended to discriminate. The Court holds that the standard for finding that a "merit system" is not bona fide and thus not protected by § 703(h) is the same as finding that a test is not protected by that section's testing language under Griggs.

Section 703(h), as written, was not included in the original Civil Rights Act of 1963. Almost from the beginning of the extremely lengthy debate on the Act, the senators were concerned about the issue of testing. This concern was brought about in large part by a report of a hearing examiner in an Illinois Fair Employment Practices Commission case, Myart v. Motorola, (reprinted in full at 110 Cong.Rec. 5662). The examiner recommended that Motorola be required to cease using its applicant screening exam. He described it as "outmoded" and said that a test should be substituted "which shall reflect and equate inequalities and environmental factors among the disadvantaged and culturally deprived." Ibid. As the Supreme Court observed in Griggs, as a result of Myart "[Congressional] debate revolved around claims that [Title VII] as proposed would prohibit all testing and force employers to hire unqualified persons simply because they were part of a group formerly subject to job discrimination." 401 U.S. at 434, 91 S.Ct. at 855. Although Senators Humphrey of Minnesota, Case of New Jersey and Clark of Pennsylvania sought to reassure their colleagues that Title VII would not limit an employer's right to test applicants as to ability, the Senate went unpersuaded.

The testing issue was but one of many that formed a barrier to the passage of the Civil Rights Act. Therefore the two party leaders, Senators Mansfield of Montana and Dirksen of Illinois, submitted amendment 1052 ("the Mansfield-Dirksen substitute"). Included in that package was a § 703(h), but only the present first sentence exempting "bona fide seniority and merit systems." Texas' Senator Tower, still concerned about the testing issue, pressed for passage of his amendment 652. The amendment would have added a specific exemption for applicant and promotional testing, provided that the "test is designed to determine or predict *1325 whether such individual is suitable or trainable." 110 Cong.Rec. 13492 (June 11, 1964). Senator Case objected to the broad wording of the amendment. He stated, "Discrimination could actually exist under the guise of compliance with the statute." 110 Cong. Rec. 13504. (June 11, 1964). At the same time, Senator Humphrey opposed the amendment as redundant of the already-existing language on merit systems. He said, in response to a colleague's question, that a test covered by the Tower amendment would have to be included as part of a merit system. Therefore, a test which was part of a bona fide merit system was protected by § 703(h) without further amendment, he implied. The Senator said that the § 703(h) language of the Mansfield-Dirksen substitute was added specifically after review of Myart v. Motorola. Ibid. See also 110 Cong.Rec. 13650-51 (June 12, 1964). Amendment 652 was defeated.

Undaunted, and still concerned about the testing issue, Senator Tower introduced amendment 952, which is the present § 703(h) testing language. He explained that he had redrafted it to be more specific, agreeing that his first attempt had been too loosely drawn. This addition was not opposed. Senator Humphrey urged its adoption, stating that it was "in accord with the intent and purpose of the title." It was adopted by voice vote.

At first glance, this history might appear to be contradictory. On the one hand is Senator Case objecting to the first Tower amendment as not sufficient, still allowing for discriminatory testing. On the other hand is Senator Humphrey, who was working with Case for passage of the Act, claiming that Tower's amendment was too sufficient, because tests already were covered by the "merit system" exemption. What makes this history actually consistent is the fact of Senator Humphrey's support for the second Tower amendment. To the Court, it appears that while Senator Humphrey may have also viewed the second testing section as redundant, he recognized that explicit protection of job-related testing was necessary for eventual passage of the bill. It is this Court's opinion that Senator Humphrey acquiesced to amendment 952 because he knew that the end result would be the same—that only job-related tests would be allowed under § 703(h).

It is in light of this historical equation of the "merit system" language of § 703(h) and its testing provisions that this Court finds that Griggs' standard applies with equal force to all of § 703(h). Both sections were drafted in response to Myart, and reflected senatorial concern that valid tests might be outlawed altogether. Both seek to protect the employer's right to give valid tests. In Griggs, the Supreme Court looked at the statutory language on testing and concluded that non-job-related tests were unfair employment practices. In simple terms, a lack of job-relatedness indicates that the test is one "designed or used to discriminate" and thus not protected under Title VII. This same evidence also removes the label of "bona fide" from a merit system that uses non-job-related tests. Therefore, no matter which section is construed, the result is the same.[18] The Griggs legislative history of the testing language confirms this. Testing and merit systems are the same and are protected only if they serve a legitimate employer concern of screening for ability. It is this concern that Congress sought to protect—not just a general right to give tests.

In short, the defendant's argument that plaintiffs must show that Fairless' tests had their genesis in discrimination is clearly erroneous. The most recent precedent both in this Circuit and by the Supreme *1326 Court holds that a plaintiff need not prove a test is intentionally discriminatory in order to shift the burden to defendant. The legislative history of § 703(h) establishes that tests and merit systems having a disparate impact are only protected if they are shown to be job-related. To hold otherwise would be to reverse well-established precedent and to defeat the intent of Congress.

C. Disparate Impact

Having ruled out any need for the plaintiffs to prove intent under Title VII, the Court must still reexamine the plaintiffs' prima facie case to see if they have met their burden of proof. Two kinds of direct evidence remain in the case which plaintiffs contend establish their prima facie case. First, their experts have analyzed and compared the pass rates for blacks and whites for the various tests used during the limitations period. Second, they have introduced evidence of the number of blacks and whites in the apprentice program in 1968 and 1969, and in the crafts jobs in 1973 and 1975.[19]

The Court finds that plaintiffs have produced evidence supporting an inference of discrimination in access to the crafts and Metallurgy and Inspection because of the test battery.[20] Until November 21, 1973, candidates for the apprentice program took a form of the Wonderlic exam, (Form D and F were used until June, 1971, after which the company used forms I and II), the Revised Minnesota Paper Forms Boards test AA and BB (RMPFB), and a form of the Bennett Mechanical Comprehension test (Form AA used until June, 1971, after which Forms S and T were used). The apprentice applicant was judged on his average of the three tests' scores. After November 21, 1973, the company substituted the Differential Aptitude Test-Numerical Ability (DAT-NA) for the Wonderlic, and required the applicant to pass all three tests.[21] From 1965 to 1970, candidates for Metallurgy and Inspection had to pass the battery.

Plaintiffs showed that prior to 1973, the ratio of whites to blacks passing the battery was four to one. This difference was testified to be statistically significant. Although the scores were considered as a total average for personnel use, the Court finds that the individual breakdown is also significant. This is because a single element may alone be discriminatory. On the Wonderlic, 8.7% blacks scored a passing score or better, while 39.5% of the whites received that score. These are statistically significant and the ratio of these rates is 4.5 to one. On the Minnesota exam, the rate of blacks was 21.8% receiving zero or better compared to 50.1% of the whites, which was a statistically significant differential. The ratio is almost two to one. On the Bennett, 28.3% of the blacks scored zero or better, while 61.7% of the whites did, again making a two to one ratio. This differential is also statistically significant.

After the changes in the battery in 1973, things improved only somewhat. The white-to-black pass ratio for the battery as a whole is a statistically significant 3.6 to one. Although the ratios are better on each test (1.85 to one on the DAT-NA, 1.6 to one on the RMPFB and 1.5 to one for the Bennett), the *1327 differential between the black and white rates is still statistically significant. Forty percent of the blacks passed the DAT-NA, as compared with 74% of the whites. On the Minnesota exam, 43% of the blacks and 69% of the whites passed. More were successful on the Bennett, where 61% of the blacks and 91% of the whites passed. However, to pass the entire battery, an applicant had to pass all three tests.
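The effect of requiring a pass on every test can be illustrated with the post-1973 rates quoted above. The sketch below is not drawn from the record: it assumes, purely for illustration, that results on the three tests are statistically independent, which real test scores generally are not. Under that assumption the joint requirement compounds the per-test disparities, so the battery-wide ratio exceeds every single-test ratio. The names `black`, `white`, `per_test`, `joint` and `joint_ratio` are illustrative.

```python
# Post-1973 pass rates by test, as quoted in the opinion (fractions).
black = {"DAT-NA": 0.40, "RMPFB": 0.43, "Bennett": 0.61}
white = {"DAT-NA": 0.74, "RMPFB": 0.69, "Bennett": 0.91}

# White-to-black pass ratio on each test taken alone.
per_test = {name: round(white[name] / black[name], 2) for name in black}


def joint(rates):
    """Probability of passing every test.

    ASSUMPTION (not in the record): the tests are independent, so the
    joint pass rate is the product of the individual pass rates.
    """
    p = 1.0
    for rate in rates.values():
        p *= rate
    return p


# Ratio of joint pass rates under the independence assumption.
joint_ratio = joint(white) / joint(black)

for name, value in per_test.items():
    print(f"{name}: {value} to one")
print(f"all three tests (independence assumed): {joint_ratio:.2f} to one")
```

Because the independence assumption overstates the compounding, the figure this prints should not be expected to match the 3.6 to one ratio actually observed for the battery; the sketch shows only the direction of the effect.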

To complete their prima facie case, plaintiffs demonstrated that the jobs in question do not have a proportionate number of blacks as compared to the plant as a whole. In the main plant, 2.7% of the crafts employees were black, as compared with a plant black population of 12.2%. In 1975, the figures were 3.6% compared with 12.4%. In the Pipe Mill, the percentage of craftsmen who were black was 1.6% as compared with the plant-wide 13.8% rate. In 1975, the Pipe Mill crafts were 8.1% black and the Mill was 14.5% black. These differences are statistically significant. Dr. Litwin testified that the underrepresentation in the Rod and Wire Mill crafts is also statistically significant. Plaintiffs also showed significant underrepresentation in the apprentice program, with 2.1% of the apprentices being black in 1968 and 1.7% in 1969 compared with the plant total of 8.3%. Finally, in Metallurgy and Inspection jobs, for which the battery was a requirement until 1970, the disparities are very great. In 1973, three out of 97 employees were black. By 1975, this had changed to 11 out of 147, still less than the plant-wide percentage.
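One way to see why counts such as three black employees out of 97 are treated as underrepresentation is an exact binomial calculation. The opinion does not say which statistical test Dr. Litwin applied, so the sketch below is an illustration and not a reconstruction of his method. It assumes that each Metallurgy and Inspection employee is an independent draw from a workforce with the main plant's black share (12.2% for the 1973 figures and 12.4% for 1975); pairing those benchmarks with these jobs is itself an assumption. The function name `binom_lower_tail` is hypothetical.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """Exact probability of observing k or fewer successes in n
    independent trials, each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Metallurgy and Inspection counts quoted in the opinion.
# ASSUMPTION: the main-plant black percentages serve as the benchmark.
p_1973 = binom_lower_tail(3, 97, 0.122)    # 3 of 97 in 1973
p_1975 = binom_lower_tail(11, 147, 0.124)  # 11 of 147 in 1975

print(f"1973: chance of 3 or fewer of 97   = {p_1973:.4f}")
print(f"1975: chance of 11 or fewer of 147 = {p_1975:.4f}")
```

A small tail probability means a count that low would rarely arise by chance if selection were unrelated to race, which is the sense in which a disparity is called statistically significant.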

To attack the prima facie case directly, USS has put forward only a few arguments. First, they presented Dr. Wolfbein, who contended that blacks were actually overrepresented in the crafts as compared with their presence in the outside workforce. As in the case on access to management, the Court finds Dr. Wolfbein's testimony to be irrelevant. The testing issue here is not whether blacks applying off the street to Fairless get the crafts jobs, but really is whether internal access to these jobs is barred by the requirement of the apprentice battery. Thus, the relevant population to be studied is not the number of blacks already trained and employable on the outside, but the number of potential applicants at Fairless, which is presumed to be the entire P & M workforce. In James v. Stockham Valves, 559 F.2d at 341, that Court of Appeals rejected evidence similar to Dr. Wolfbein's, on a question of black representation in crafts. In Stockham Valves, primary access to these jobs was through the in-house apprentice program, as is true at Fairless. This Court rejects Dr. Wolfbein's evidence as irrelevant to the plaintiffs' prima facie case.

Second, defendant has attacked plaintiffs' statistics as inconclusive. It argues that the statistics are faulty because they do not allow for re-testing, which might distort the figures. Defendant argues that if blacks fail and re-take the test, and whites pass it on the first try, then the black statistics are distorted vis-a-vis the white. The problem with this argument is that it builds in an assumption of adverse impact, in that it assumes that blacks will fail (and fail again) more often than whites. An equal failure rate between the races would not affect the statistics, and the disparities would remain the same. Therefore, the Court either accepts the plaintiffs' statistics as evidence of disparate impact or accepts defendant's assumption of differential pass rates, which implies adverse impact. Either way, the burden shifts to the defendant.

Plaintiffs have shown that blacks are underrepresented to a statistically significant degree both in the apprentice programs and the crafts and other jobs in question. The Court believes that, even without the MPS figures, this fact is an adequate basis for inferring that the selection process, or one part of it, has a disparate impact. Defendant argues that in order to show a disparate impact in selection, plaintiffs cannot show only a disparate population in the crafts, but must show disparate hiring rates, as through the MPS' study of initial assignments. Although those data would have been preferable, their absence does not defeat plaintiffs' case. In Washington v. Davis, supra, the Supreme Court noted that the district court could have relied on the underrepresentation of blacks in the police department, *1328 coupled with the high black failure rate on the test, as prima facie evidence. In James v. Stockham Valves, 559 F.2d 310 (5th Cir. 1977), plaintiffs met their burden by showing gross disparity between blacks and whites in jobs requiring high scores on tests along with evidence of disparate pass rates. In an earlier case, Watkins v. Scott Paper Co., 530 F.2d 1159, 1185-1186 (5th Cir. 1976) that appellate court found that a very small percentage of blacks were in the crafts. "(A) statistical showing of black exclusion from a particular kind of job establishes a prima facie case of discrimination," that court held.

Since there is disparity in the craft population as a whole, the Court must examine the portions of the selection process to judge if any of the parts are discriminatory. Rule v. Int'l Association of Bridge Workers, 568 F.2d at 565, n.10. "Where the total selection process has an adverse impact upon a substantial racial group in the labor market, the individual components of that process—such as a screening test—are also to be evaluated for adverse impact." NAACP, Ensley Branch v. Seibels, 13 EPD ¶ 11,504 (N.D.Ala. Jan. 10, 1977). As the Court noted earlier, a large portion of those in the crafts jobs have come through the apprentice program. Since there is no evidence that the non-apprentice route to crafts jobs is discriminatory, the Court must look to the apprentice program, and the selection process for that program, as the source of a possible discriminatory barrier. United States v. City of Chicago, 549 F.2d at 429, n.12. In order to show that the apprentice battery is the discriminatory element, the plaintiffs must show 1) that it is causally linked to the population disparity and 2) that it disparately selects persons for the apprentice program.

In Neloms v. Southwestern Elec. Power Co., supra, the court said that there must be a cause and effect relationship between a selection device and the alleged disparity in population. There, the court found that no employee was denied advancement because of his failure to pass the test. In the instant case, the test battery is the primary, if not the sole, selector. This was recognized by the company in the arbitration over the testing program. Award, at 2. In the studies presented by USS, it was noted that although passage of the battery is not an absolute requirement, it was rare that employees who did not pass it were admitted into the program. In the first study, the Fairless psychologist noted that although an average standard score of zero was needed to pass the battery, a few apprentices in the validation study had scored below that. In the second Fairless study, the author said that the only persons selected for the program who had not passed the new battery were those who had qualified under the pre-1973 battery. The Court can infer from this that the successful applicant who does not pass the battery is the rare exception that proves the rule. Since the test impacts disparately and there is no other significant selection device, the Court finds that the test disparity is causally connected to, and was reflected in, the under-representation of blacks in the crafts. See also Friend v. Leidinger, 446 F.Supp. 361 (E.D.Va.1977).

Second, the Court must consider whether the differences in the pass rates are such as to imply an adverse impact. First, the Court observes that the differences are statistically significant, which means they cannot be attributed to chance or randomness. Second, the Court finds that the statistics meet the definition of "adverse impact" under the Department of Justice (DOJ) guidelines, which have been accepted by some courts as appropriate criteria for judging such impact. In Friend, the court used 28 CFR § 50.14 not as "a hard and fast rule," but only as a "rule of thumb." At 368. In Seibels, Judge Pointer also relied on § 4b of the DOJ guidelines, which state:

"A selection ratio for any racial, ethnic or sex group which is less than four-fifths ( 4/5 ) (or eighty percent) of the rate for the group with the highest rate will generally be regarded as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded as evidence of adverse impact."

The pass rate of blacks on the pre-1973 battery is approximately one-fourth that of *1329 whites.[22] Post-1973, the rate is 27%.[23] This is clear evidence of severe adverse impact under the DOJ guidelines. Looking at recent cases, the Court finds further support for a conclusion of adverse impact. In United States v. City of Chicago, 549 F.2d at 428, the appellate court supported the district court's conclusion of adverse impact where minorities failed at twice the rate of whites. Here, the differences are almost double that.
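The four-fifths comparison can be checked against the pass rates quoted earlier in this opinion. The sketch below is illustrative arithmetic only: it treats whites as the group with the highest rate, since only two groups are reported, and the names `FOUR_FIFTHS`, `rates`, `impact_ratio`, `flags` and `battery` are hypothetical.

```python
FOUR_FIFTHS = 0.80  # threshold stated in section 4b of the DOJ guidelines

# (black pass rate, white pass rate) in percent, as quoted in the opinion.
rates = {
    "Wonderlic (pre-1973)": (8.7, 39.5),
    "RMPFB (pre-1973)": (21.8, 50.1),
    "Bennett (pre-1973)": (28.3, 61.7),
    "DAT-NA (post-1973)": (40.0, 74.0),
    "RMPFB (post-1973)": (43.0, 69.0),
    "Bennett (post-1973)": (61.0, 91.0),
}


def impact_ratio(black_rate, white_rate):
    """Black pass rate expressed as a fraction of the white pass rate."""
    return black_rate / white_rate


# True where the ratio falls below four-fifths.
flags = {
    name: impact_ratio(b, w) < FOUR_FIFTHS for name, (b, w) in rates.items()
}

# For the battery as a whole the opinion reports white-to-black ratios
# (4 to one and 3.6 to one), so the impact ratio is the reciprocal.
battery = {"pre-1973": 1 / 4.0, "post-1973": 1 / 3.6}

for name, (b, w) in rates.items():
    print(f"{name}: {impact_ratio(b, w):.2f}  below 4/5: {flags[name]}")
for period, ratio in battery.items():
    print(f"battery {period}: {ratio:.2f}  below 4/5: {ratio < FOUR_FIFTHS}")
```

Every individual test, and the battery in both periods, falls below the 0.80 threshold on these figures.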

Taking this evidence as a whole, the Court concludes that plaintiffs established a prima facie case of discrimination in access to crafts, Metallurgy and Inspection jobs because of the apprentice battery. The burden of proof shifted to defendant USS. To satisfy this burden, it had to show that the tests are job-related, a special version of the "business necessity defense" as first stated in Griggs.

D. The Guidelines

To accomplish this, USS placed into evidence three validation studies: the "South Works" study, conducted at the USS plant in Chicago, Illinois, dated May 31, 1973; the February 12, 1975 study at Fairless, referred to as "Fairless I," and the May 1, 1975 Fairless study, known as "Fairless III." To judge these studies, the Court has relied on the testimony of USS' experts, Drs. Frank Schmidt and Robert Guion, and plaintiffs' expert, Dr. James Kirkpatrick. Dr. Kirkpatrick relied on studies performed by plaintiffs' statistical expert, Dr. Litwin.

In addition to case law, the Court has looked for guidance to the EEOC "guidelines on Employment Testing Procedures," 29 CFR § 1607 et seq., the DOJ guidelines, 28 CFR § 50.14 et seq., and the psychological testing industry's own guidelines, "Standards for Educational and Psychological Tests and Manuals," published by the American Psychological Association (APA) in 1970 and revised in 1974. There are conflicts among these standards and the two sides have urged the Court to resolve them in different ways.

In Albemarle Paper v. Moody, supra, 422 U.S. at 431, 95 S.Ct. at 2378, the Supreme Court's test validation decision, the Court looked at the significance of the EEOC guidelines.

"The EEOC Guidelines are not administrative `regulations' promulgated pursuant to formal procedures established by the Congress. But, as this Court has heretofore noted, they do constitute `[t]he administrative interpretation of the Act by the enforcing agency,' and consequently they are `entitled to great deference.' Griggs v. Duke Power Co., 401 U.S., at 433-434, 91 S.Ct., at 854."

In Friend, Judge Warriner discussed the DOJ guidelines. He observed that they are the product of the Equal Employment Opportunity Coordinating Council (EEOCC), which was established to eliminate inconsistencies among various federal agencies which deal with discrimination legislation. 42 U.S.C. § 2000e-14. However, when the DOJ guidelines were promulgated, the EEOC refused to accept them and continues to use its own guidelines.

"The Court is of the opinion that whatever effect these various guidelines have upon the agencies promulgating them, they are not legally binding upon either the parties to this suit or the Court. The Court can and will use both sets of guidelines as an aid to determining if either party has carried its burden of proof under Title VII, but although these guidelines are of great value as the interpretation of the law by government agencies charged with the duty of enforcing the law, the Court will not be bound by one set of guidelines over another, but only by what the Court deems as reasonable in this case." 446 F.Supp. at 369.

The Court finds this approach, of combining and modifying the two sets of guidelines, to be the most reasonable.[24] Accord, Detroit Police *1330 Officers Ass'n v. Young, 446 F.Supp. 979 (E.D.Mich.1978). The APA standards, which are incorporated by reference into the EEOC guidelines, 29 CFR § 1607.5(a), were also acknowledged in Washington v. Davis, supra, as the professionally recognized method for validating employment tests. United States v. City of Chicago, 549 F.2d at 430, n.14. The Court will look to all these guidelines, the case law and the witnesses' testimony in attempting to determine whether the batteries are job-related.

In accordance with its general discussion on burden of proof supra, the Court holds, based on a reading of Albemarle Paper and its progeny, that it is the defendant's burden to prove that its validity studies are adequate under the law as interpreted by the guidelines or by the courts. In Albemarle Paper, the Supreme Court rejected the validation study as "defective" when "measured against" the EEOC guidelines. 422 U.S. at 431, 95 S.Ct. 2362. This conclusion is also implied in the Courts of Appeals' decision in United States v. City of Chicago, 549 F.2d 415 (7th Cir. 1977) and Watkins v. Scott Paper Co., 530 F.2d 1159 (5th Cir. 1976), where those courts found validation studies to be inadequate. Finally, this conclusion is supported by the EEOC guidelines, specifically § 1607.4(b), and the DOJ guidelines generally.

The Court has decided, after reviewing the three studies and the testimony regarding them, that they are not sufficient individually or as a whole to carry defendant's burden of proving that the batteries are job-related. The defendant has characterized plaintiffs' criticisms of the studies as "nit-picking," as indeed some of them are. However, although some flaws found by the Court are individually not such as would defeat the study, taken together they cast such doubt upon the studies' adequacy that the Court cannot rely upon the studies' results. As the Court found regarding plaintiffs' initial assignment case, these criticisms, even though simply theoretical in some instances, are sufficient to defeat defendant's case when it bears the burden of proof.

E. The Studies Generally

Before going into an analysis of the studies' shortcomings and case law, the Court first should briefly give the background on the tests and the validity studies. In 1936, USS began a program to train its own crafts workers. In 1940, a young Ph.D., Stanley Seashore, developed the first apprentice battery of three tests. By 1960, the first battery in this case was in use. This consisted of the Wonderlic Personnel Profile, which was intended to test general mental ability; the Bennett Mechanical Comprehension Test, theoretically to measure general mechanical ability; and the Revised Minnesota Paper Forms Board Test (RMPFB), to evaluate ability to visualize and manipulate objects mentally. Dr. Seashore devised a standardization of the three scores, based on a passing level of zero. Each test score was converted and all were summed. Applicants with a total of zero or better were judged to have passed the battery.

In 1970, the South Works study was undertaken, and the results were published in 1973. David Braithwaite, USS' director of corporate employment, testified that the validation was begun partially in response to the Union's pressures for job-related tests, partially in compliance with evolving government regulations and partially in response to the Griggs decision, which discussed and condemned the Duke Power Company's usage of the Wonderlic.[25] The South Works and Fairless I studies were conducted on this 1960 battery.

On November 21, 1973, the battery was changed to reflect the results of the South Works study. The Wonderlic was dropped and was replaced with the Differential Aptitude Test—Numerical Ability (DAT-NA), which was specifically aimed at testing *1331 mathematical abilities. The Fairless III study was conducted to validate the new battery. In order to pass this test group, an applicant had to receive passing scores on each test individually, rather than the battery as a whole.
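The difference between the two passing rules can be sketched as follows. The opinion does not give Dr. Seashore's conversion formula or the individual cutoffs adopted in 1973, so the sketch assumes scores already expressed on a standardized scale and uses a hypothetical cutoff of zero on each test after 1973. The function names and the sample applicant are illustrative.

```python
def passes_pre_1973(standard_scores):
    """Pre-1973 rule as described in the opinion: the converted scores
    are combined, and a combined score of zero or better passes, so a
    strong score on one test can offset a weak score on another."""
    return sum(standard_scores) >= 0


def passes_post_1973(standard_scores, cutoffs=(0.0, 0.0, 0.0)):
    """Post-1973 rule: each test must be passed individually.

    ASSUMPTION: the per-test cutoffs are not in the opinion; zero on
    each standardized score is a placeholder.
    """
    return all(score >= cut for score, cut in zip(standard_scores, cutoffs))


# Hypothetical applicant: strong on the first test, weak on the second.
applicant = (1.0, -0.4, 0.2)

print("passes under the combined-score rule:", passes_pre_1973(applicant))
print("passes under the each-test rule:     ", passes_post_1973(applicant))
```

An applicant of this kind passes when the scores are combined but fails once each test becomes a separate hurdle.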

All three of the validity studies are "criterion" validity studies. This is one of three methods of test validation recognized by the APA and mentioned in Washington v. Davis, 426 U.S. at 247, n.13, 96 S.Ct. 2040, n.13, 48 L.Ed.2d 597. A criterion validity study is one using some measure of job performance or success as a standard by which to correlate success on the test. Therefore, if a good test score correlates with success in the job for which the test selects individuals, then the test is valid or "job-related." In the three studies, the criteria used were grades in classroom apprentice training, and four supervisory ratings of on-the-job training after an average of one year in the apprentice program.
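In practice a criterion validity study reduces to measuring how closely test scores track the chosen criterion, commonly with a correlation coefficient. The opinion does not reproduce the studies' data or state which coefficient they used, so the sketch below computes a Pearson correlation on invented numbers. Both `test_scores` and `classroom_grades` are hypothetical and stand in for a battery score and a classroom-training grade.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data, invented for illustration only.
test_scores = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.5]   # standardized battery score
classroom_grades = [68, 72, 75, 74, 83, 88]      # criterion measure

r = pearson_r(test_scores, classroom_grades)
print(f"validity coefficient r = {r:.2f}")
```

A coefficient near zero would mean the test says little about later performance. A real study would also ask whether the coefficient is statistically significant for the sample size used.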

F. Samples

The three studies were performed on different samples of the apprentices. The South Works sample was the largest, consisting of 142 apprentices (91 white, 42 black, nine Hispanics). Fairless I included 122 whites, five blacks and one Hispanic. Fairless III was the smallest group studied, consisting of 45 whites, two blacks and one Hispanic. All of these were persons who were members of the apprentice program and thus had already been selected by the test battery.

Plaintiffs argue specifically that the South Works sample is unrepresentative because it is a concurrent study, which compares tests given to those already on the job with their performance, as opposed to a predictive study, which compares applicants' results with their later job performance. In Albemarle Paper, the Court criticized that validity study since it dealt only with job-experienced workers. However, the Court finds that in the wake of that opinion, the courts have split on the appropriateness of concurrent validity studies. The EEOC guidelines appear to sanction them, as was observed in League of United Latin American Citizens v. Santa Ana, 11 EPD ¶ 10,818 at 7429. Courts have accepted use of concurrent studies, as in City of Chicago, 549 F.2d at 433, n.21, while noting that the predictive model is preferred. See also APA Standards at 26. In Vulcan Society of N. Y. Fire Dept. v. Civil Service Comm'n., 490 F.2d 387 (2d Cir. 1973), that court hinted that it found predictive studies to be more probative, but refused to limit the trial court to use of one method over another. Even Dr. Kirkpatrick conceded that a concurrent study is professionally acceptable. However, some courts have carried their preference for predictive analysis to rejection of concurrent studies. United States v. Georgia Power Co., 474 F.2d 906, 912 (5th Cir. 1973); Markey v. Tenneco, 439 F.Supp. 219 (E.D.La.1977).

This Court will choose what it sees as the middle ground enunciated in Santa Ana, supra at 7428. There, the court relied on plaintiffs' expert in this case, Dr. Kirkpatrick. The evidence was very different regarding the Santa Ana test than has been presented on the USS battery. In Santa Ana, Dr. Kirkpatrick testified that many of the test questions were knowledge questions, and that "those people with a year's experience on the job would as a matter of course be in a better position to answer those forty-five items than unexperienced applicants." The court there concluded that where a test "placed a premium upon experience," a sample of experienced employees could not be representative of applicants. This is what this Court believes is the true meaning of the Albemarle Paper language.

Here plaintiffs have not presented any such affirmative evidence to show that experience helps an applicant pass the test battery. Dr. Kirkpatrick did not so testify. The South Works study contained a sample of applicants, whose mean scores were lower than those of the apprentices. However, no one testified that these differences were statistically significant. From the lack of any expert testimony on these facts, the Court must assume that the importance of job experience that Dr. Kirkpatrick found in Santa Ana was not present here, and that the differences in mean scores were in fact not significant statistically. Therefore, *1332 on the basis of the use of a concurrent study method, the Court will not hold the South Works sample unrepresentative.

Plaintiffs also argue that the three studies' samples do not conform with the guidelines' requirement that the samples studied be racially "representative" of the applicant pool to whom the test will be administered. The Court concludes that under the EEOC or DOJ guidelines the samples were improper on a racial basis. Section 1607.5(b)(1) of the EEOC guidelines requires that the sample be "representative of minority groups currently included in the applicant population," where the test is given to persons already on the job. This applies to the South Works study, as a concurrent validity study. The sample there was over one-quarter black, but there is no indication that this figure was "representative" of the applicant pool at South Works, where the apprentices were 36% black. The two Fairless studies, which were predictive validity studies, were approximately four percent black. Defendant has no evidence of the number of blacks in the apprentice program at Fairless in 1975, nor of the number in the candidate group. Plaintiffs' evidence showed that in 1968 and 1969, the apprentice program was 2.3% black. The EEOC section requires that predictive studies have samples representative of the minority population available.[26] Similarly, the DOJ guidelines, § 12(B)(4), state:

"The sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or jobs available, and should insofar as feasible include the racial, ethnic and sex groups normally available in the relevant job market."

The DOJ guidelines state that it is desirable[27] for the study to describe how the research sample compares with the racial composition of the relevant labor pool. § 13(b)(6).

The Court has not been supplied with any information in the studies of the racial composition of the normal applicant pool. If the Court takes the available information on the number of apprentices in the program and assumes that blacks apply to the program in at least the same percentage as they are present in it, the South Works sample is seriously underrepresentative and the Fairless studies are somewhat overrepresentative. However, since this is merely speculative on the Court's part, it does not decide the question of the racial representativeness of the sample.

The DOJ guidelines specifically do not require an employer to hire or promote persons for the purpose of being able to conduct a study. § 12(B)(1). Certainly, if there were not enough blacks to make a representative sample feasible, this Court would not require it. At South Works, where the appearance of serious underrepresentation is most convincing, blacks were present in the apprentice program in large numbers. Under the EEOC guidelines, § 1607.4(c), it is the responsibility of the person claiming the absence of sufficient minority samples "to positively demonstrate evidence of this absence." Furthermore, the requirements of a racially representative sample are described as "minimum standards" which "must be met in the research approach and in the presentation of results . . ." 29 CFR § 1607.5(b). In the South Works study, this burden was not met. Furthermore, in the other two studies, *1333 the company did not put forward evidence that the sample was racially representative. Since the evidence upon which the Court might infer that the samples were or were not proper is absent from the record or else greatly removed in time from the two Fairless studies, this casts great doubt on the propriety of the samples.

Looking at the evidence which implies that the South Works sample is racially underrepresentative, and the lack of any contrary information or evidence that it was not technically feasible to have a representative sample, the Court concludes that defendant has not met its burden on that study. Since the only evidence bearing on the propriety of the Fairless samples is from 1968-69 and the studies were performed in 1975, the Court does not find that this evidence alone meets defendant's burden of proving that it has met the guidelines. Albemarle Paper, 422 U.S. at 435, 95 S.Ct. 2362. Since the Court concludes from the evidence, or lack of it, that the Company has not met the guidelines as a whole on this issue, and it finds that they are reasonable requirements for such studies, it must hold that the samples are not appropriate on a racial basis. United States v. City of Chicago, 385 F.Supp. 543 (N.D.Ill.1974).

G. Job Analyses

In order to conduct a proper criterion-related study, there must first be a proper job analysis to determine appropriate measures of job performance. These job analyses are required so that the study's author may select the most important behaviors or measures of job performance for correlation to the test results. In United States v. City of Chicago, supra, the appellate court rejected a study which had no job analyses in it. In James v. Stockham Valves, that appellate court sent a study back to the district court for further findings on the adequacy of the job analyses. The EEOC guidelines require "careful job analysis." 29 CFR § 1607.5(b)(3). See also DOJ guidelines, § 12(B)(2).

In the three studies at bar, USS has not developed any specific job analyses for use in the validation studies. Instead, the authors have adopted existing corporate job descriptions and apprentice training syllabi. The Court finds first that the South Works job analysis is clearly inadequate. It is important to remember that all three of these studies seek to correlate scores on the apprentice test battery with performance at the end of a year of training and classroom instruction, rather than to ultimate performance as a journeyman craftsman. Dr. Ramsay relied on two documents as his job analyses. First, he incorporated the syllabi of the apprentice classroom training. However, these do not contain any description of each course; they merely list each course to be taken. Apprentices are to take 16 different courses in about 200 hours. There is no designation of the time for each, or the relative weight of each course within those lists. There is also no indication as to whether these hours are all to be completed in the first year of the apprenticeship.[28] The wiremen, along with four other electrical crafts workers, are to take an unknown number of hours of "AC and DC Fundamentals." That appears to be the only first-year classwork for those crafts. Only the roll-turner courses were broken down individually by hours and year. The Court does not find that these are adequate descriptions of the first-year apprentice course work. Assuming that the Court could ascertain from the course titles what each one covered, the Court is still totally in the dark as to how much is covered the first year and the relative importance of each course.

Even if this coursework analysis were adequate on the South Works study, there is no analysis of the first year on-the-job training. Dr. Ramsay included in the study a copy of the corporate master Job Description and Classification (JDC) manual describing the journeyman position. First, a job "description and classification" is included. This contains a few sentences on: primary function, tools and equipment, materials, *1334 source of supervision, persons supervised, and working procedure. Second, a job "classification" is included. This is a list of 12 categories, ranging from level of training needed for the job to the externalities of the job. Each is assigned a numerical rating, and the ratings are totalled to compute the job class, which sets the worker's pay rate. Neither the descriptions nor the classifications from the JDC indicate which skills are expected to be mastered by an apprentice by the end of the first year. Indeed, the JDC does not appear to have any relation to an apprentice's work at all. The Court finds that this information, while it may constitute a superficial description of a journeyman's work, certainly does not constitute an analysis of a first-year apprentice's job.

Therefore, it is clear that the South Works study does not contain a "careful job analysis" of the job studied, that of a first-year apprentice. The course outlines are mostly uninformative and there is no information regarding the on-the-job work. Therefore, as in United States v. City of Chicago, the Court could reject the South Works study as inadequate on the lack of proper job analyses alone.

For the Fairless studies' job analyses, Dr. Wolz included not only excerpts from the JDC manual, which are irrelevant to the apprentice program, but also the USS Apprentice Training manual. This is a detailed description of 14 apprenticeship programs.[29] This information is broken down into three areas: trade analysis, job performance and related instruction. Again, the only description of the first two elements is of what an apprentice should know at the end of his full apprentice term of three or four years. These read like a final exam for the apprentice program. To term these listings as job analyses of first-year apprentices is as inappropriate as judging a first-year law student on results of a bar examination, if taken at the end of the first year. Certainly, the law student could use some of his knowledge and skills acquired during his first year, but it would not be an accurate reading of first-year performance, because it would be beyond his training. The Court cannot accept these "final exams" as descriptions or analyses of on-the-job work of the first-year apprenticeships. On the classwork issue, the Court has again been supplied with a bare list of courses taken, with no indication of how much is covered in the first year. Without some indication of the relative importance of each, the Court is not really given an "analysis" of the apprentice classroom work. In Watkins v. Scott Paper Co., job analyses were made by a study's author by observing the work at the mill. The court of appeals, which did not have the analyses available (they were either lost or destroyed), noted that the district court could have determined from the evidence what "particular skills" were needed. Here, there is no evidence that any participant in drafting the studies has observed first-year apprentices, although apparently the authors did observe crafts workers. Also, there is no basis on which the Court may determine any particular skills needed by the apprentices-in-training. Therefore, on the basis of these inappropriate analyses, the Court could also reject the two Fairless studies on this ground alone.

Since the job analyses are either completely deficient or unrelated to the first-year apprentice work, the Court could stop its analysis of the validation studies at this point. However, in order to develop a full record, and to complete its "probing judicial review" of the studies, the Court will continue its findings of facts and conclusions of law.

H. Pooling the Crafts

Perhaps one reason why the job analyses are inadequate and the authors of the three studies could not write their own is that the samples used contain a large variety of crafts. To the Court's eye, there is no broad basis of similarity from which a single job analysis could be composed. From bricklayer to electronic repairman, from welder to rigger, the USS psychologists *1335 "pooled" the crafts for study. Although defendant's expert, Dr. Schmidt, opined that this lumping-together of seemingly disparate jobs had no effect on the studies' outcome, the Court must view this procedure very skeptically.

In other reported cases, such as Watkins v. Scott Paper, supra, courts have criticized efforts to generalize all craft skills in a single unit. In Watkins, the defendant relied on an informal, unwritten job analysis by an expert as justification for extending a validation study for three crafts to all maintenance crafts. The court there rejected such attempts, saying "it seems unwise to indulge a presumption, admittedly founded upon expert testimony, that all maintenance crafts require the same job skills and that a test significant or valid for one is valid for another." 530 F.2d at 1188.

USS' "pooling" was also the basis for the arbitrators rejecting the tests as "job-related" under the contract language. They stated:

"(N)o detailed review of the respective 26 separate [crafts] job descriptions is necessary to establish that widely different skills and ability are required among them." Award, at 32.

They concluded that since the tests were not "developed in light of the specific requirements of [each] given craft," they could not be job-related to any single craft under the contract's language.

The EEOC guidelines allow tests validated for one job to be used for another only if there are "no significant differences" between the jobs. Albemarle Paper Co. v. Moody, 422 U.S. at 432, 95 S.Ct. 2362; 29 CFR § 1607.4(c)(2). It follows logically that dissimilar jobs cannot be grouped together in a sample for validation. Here, the USS studies include various crafts jobs—16 in South Works, 12 in Fairless I and eight in Fairless III. Not only are the jobs in each study different among themselves, but the three studies do not all contain the same jobs. Also, there are crafts jobs at Fairless, which were not included in any of the three validation studies.

Proper job analyses are required if a company seeks to use tests for similar jobs not included in the validation study. In Albemarle Paper, the Supreme Court held that without job analyses, it could not determine whether the jobs were similar. Therefore, the failure of the job analyses undermines any effort to justify this "pooling." However, even if here the Court were to accept USS' job analyses as appropriate for apprentices, the conclusion of similarity of apprenticeships in the study is not well-founded. As in Watkins, all the Court really has to support this conclusion is the statements of experts. In each study, the authors made statements that the crafts apprenticeships were similar. However, the evidence cited to support those conclusions reveals mostly dissimilarity.

In the South Works study, Dr. Ramsay did not attempt directly to demonstrate the apprenticeships' similarities. He simply made a conclusory statement that "based on similarity of first year apprentice instruction" he decided to combine the data. Dr. Ramsay referred to the South Works apprentice classroom outlines as evidence of the alleged similarity. No description was attempted of on-the-job instruction, but he included the Master Job Description and Classification (JDC) ratings for each craft as a journeyman. As the Court held above, these do not inform the Court as to what is included in first-year on-the-job training, which was the subject of the validity studies. Furthermore, since the JDC descriptions are unique for each craft, the Court finds they are no support for Dr. Ramsay's conclusion of similarity in on-the-job instruction. Therefore, the Court must assume that he based his conclusion of similarity on an analysis of the classroom content. However, after examining each of the course outlines, the Court finds no across-the-board similarities in course content of all the apprenticeships. The apprentice courses break down into two different types: one group requires basic skills in mathematics, print reading and drawing[30]; and the other does not require all *1336 these skills but requires mastery of fundamentals of electricity.[31] These two different types of apprenticeships requiring different basic knowledge, coupled with the wide variety of other courses offered in the first year, contradict Dr. Ramsay's conclusory statement of "similarity."

In the Fairless studies, Dr. William Wolz, the author, stated:

"All apprenticeship programs are similar in the sense that each craft has a period of related technical course instruction, each has a series of job performance requirements that must be performed satisfactorily, and each has a trade analysis requirement which includes knowledge of tools and machines used, materials used and common operations performed." USS Exhibit 105-2 at 2.

This statement was repeated in Fairless III. To support a conclusion of similarity, the studies referred to the descriptions of course content and on-the-job requirements in the corporate Apprentice Training manual. The Court has performed an analysis of the classroom course content in Table 1 which reveals minimal similarity in content. The 13 different apprenticeships included in the two studies share only two subjects— print reading and practical arithmetic. The differences in basic courses taught far outweigh the similarities. Therefore, the "similarity" is not in the classroom training. Also, the required on-the-job trade knowledge and skills reflected in the Apprentice Training manual vary so widely that the Court cannot even begin to compare them. Dr. Wolz's conclusion resolves itself to this: all apprentices must have classroom knowledge, on-the-job knowledge and specific trade skills, but the content of these three elements varies significantly.[32] To say that such a statement meets the "similarity" requirement of the guidelines would be to reduce it to a meaningless statement. The Court interprets the guidelines as requiring content or skill similarity in order to combine jobs for validation. Since the various apprentice jobs do not reflect any such singularity or commonality of content, it would appear that "pooling" would not be proper.

*1337
                                                      TABLE 1
                             Analysis of Course Similarity in Fairless Apprenticeships Studied
                  Arithmetic   Drawing   Print Reading   Physics   Chemistry   Geometry-Trig.   Mechanics   Basic Electricity
-----------------------------------------------------------------------------------------------------------------------------
Fairless I
Boilermaker           x                        x                       x
Bricklayer[*]         x                        x
Wireman               x           x            x                                     x              x               x
Inst. Repair          x           x            x                       x             x              x               x
Machinist[*]          x                        x                                     x              x
Millwright[*]         x           x            x                                                                    x
Mobile Equip.         x                        x            x                                                       x
Motor Insp.[*]                                                                                                      x
Pipefitter            x           x            x                                     x              x
Rigger                                   NOT IN APPRENTICE MANUAL
Roll Turner[*]        x           x            x            x
Welder[*]             x           x            x                       x             x
-----------------------------------------------------------------------------------------------------------------------------
Fairless III
Blacksmith            x           x            x                       x             x

*1338 The Board of Arbitration, in considering USS' testing program, also noted the differences between the crafts that militated against "pooling." The arbitrators observed that some crafts, such as painter, required far fewer hours of training than others. "At one extreme perhaps, in terms of skill and training requirements, are the Painter and Coremaker jobs while the Electronic Repairmen and Toolmaker jobs are at the other extreme." Award at 32.

However, Dr. Schmidt presented evidence which indicated that this "pooling" problem was theoretical only. He performed three statistical analyses which he said supported a proposition that the "pooling" did not affect the results of the studies. These three analyses were designed to show that there were not statistical inter-craft differences in validities, and that the results were not inflated or manipulated by this grouping. These analyses were buttressed by two sub-samples in the Fairless I study, in which the test scores of both millwrights and combined electrical crafts correlated to classroom grades and potential performance. These validity similarities indicate that the distinct differences in the crafts' content may not reflect themselves in varying levels of job-relatedness.

Dr. Wolz, in the Fairless III study, acknowledged that there were differences in the crafts. He said that separate validation studies were then underway for the different apprenticeships. Two years elapsed between that promise and the filing of defendants' pretrial statement with these "pooled" validations. It appears to the Court that USS could have presented stronger evidence supporting the "pooling," such as some individual validity studies, than Dr. Schmidt's theoretical reconstructions. The Court is impressed with his expertise and his analyses. However, the failure to show any similarity of job skills, under the guidelines, is still most troublesome. Therefore, although the Court would not base its ultimate decision as to those crafts included in the study solely on this issue, the Court will consider the "pooling" as a weakness in the validity studies. This doubt might have been conclusively rebutted by post-1975 validations conducted for each craft, but USS chose not to do so. For the Fairless crafts jobs not included in any validation "pool," the Court holds that the test battery has not been validated. This failure to include those crafts is conclusive against USS, since the Court has rejected the conclusory evidence of all-craft similarity. Furthermore, it is clear that the tests have not been validated as to use in Metallurgy and Inspection. Judgment on those jobs will clearly be for plaintiff.

I. Transportability

Both guidelines allow for use of studies "borrowed" from another unit, such as the South Works study, under certain conditions. The EEOC guidelines, in § 1607.4(c)(2), allow one plant to use another's study where "no significant differences exist between units, jobs and applicant populations." The DOJ guidelines permit cooperative use of a validation study, under § 6(b), if "the studies pertain to a job which has substantially the same major job duties as shown by appropriate job analyses." Psychologists call this "transportability."

Under both guidelines, the inadequacy of the job analyses in the two plants' studies is a fatal flaw. In Albemarle Paper, the Supreme Court held that the lack of job analyses meant that there was "no basis" for concluding that there were "no significant differences" between jobs under § 1607.4(c)(2). 422 U.S. at 432, 95 S.Ct. 2362. The DOJ guidelines, therefore, explicitly require job analyses to be the bases for determining if jobs in the two plants are similar. Here, where the Court has rejected the job analyses as inadequate, there remains no competent basis for comparing the two studies.

If, however, the Court were to accept the job analyses as adequate, and to assume that the JDC excerpts describe apprentices as well as journeymen crafts workers, there is still a major problem. Although there are variations in the apprentice classroom requirements between the South Works and the Apprentice Training syllabi, the Court is willing to accept the company's argument that apprentices for each craft at any USS plant are similarly trained. The evidence *1339 showed that a crafts position is "standard-rated." Its job description is part of the industry's Basic Labor Agreement and cannot be altered at the local level. David Braithwaite, USS' corporate personnel officer, testified that a fully-trained crafts worker was expected to be able to work at any USS plant without further instruction. Therefore, the Court may infer that training for this job generally is the same throughout the corporation. Cf. Friend v. Leidinger, supra. However, USS has failed to show that there are no differences between the applicant populations at South Works and Fairless. As noted above, USS has not provided the Court with any information regarding the applicants whatsoever. This failure, just as the absence of the job analysis in Albemarle Paper, leaves the Court with no basis for drawing conclusions on the similarity of the applicant pools. Without such similarity, under § 1607.4(c)(2), the South Works study cannot be used.

USS seeks to use the South Works study only as corroborative evidence of the two Fairless studies on most issues. However, it was included in this litigation primarily because of the large black sample, which enabled the author to test for racial fairness. A study of the tests' racial fairness, or a "differential validity analysis," is designed to discover if the validation of the study is racially biased. The EEOC guidelines, 29 CFR § 1607.5(c)(2), require that tests be validated separately for minorities and for whites. "Where a test excludes minorities at disproportionate rates, those who use the test must show that its use enhances their abilities to predict job performance for minorities as well as whites and thus legitimately excludes minorities because of deficiencies in their qualifications for the job." United States v. City of Chicago, 549 F.2d at 433. That court said that the requirement to test for racial fairness was not "a mere technicality." Ibid. The DOJ guidelines also have provisions requiring differential validity analyses, § 12(B)(7).

However, a racial fairness analysis, which is actually a "mini-validation" study computed separately for whites and for minorities, cannot be performed where the minority sample is too small. The EEOC guidelines do not specify what is considered an adequate sample size, but the DOJ guidelines, at § 12(B)(1), indicate that a sample of smaller than 30 persons is unacceptable. Therefore, only the South Works validity analysis contained a sufficiently large minority sample to conduct the differential validity analyses.
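By way of illustration only, the mechanics of such a separate-group computation can be sketched in Python. The sketch assumes the Pearson coefficient as the measure of correlation, which neither set of guidelines is quoted as prescribing here; the function names and record layout are hypothetical; and the 30-person floor is simply the figure this opinion draws from DOJ § 12(B)(1). Nothing in it is taken from the studies in evidence.

```python
from math import sqrt

# Floor taken from this opinion's reading of DOJ guidelines § 12(B)(1);
# treated here as an assumption, not as a statement of the guideline text.
MIN_GROUP_SAMPLE = 30


def pearson_r(xs, ys):
    """Pearson correlation between paired test scores and criterion values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def differential_validity(records):
    """Compute a validity coefficient separately for each group.

    records: iterable of (group, test_score, criterion_value) tuples.
    Returns {group: coefficient}, with None for any group whose sample
    is smaller than MIN_GROUP_SAMPLE and so cannot be analyzed.
    """
    by_group = {}
    for group, score, criterion in records:
        scores, criteria = by_group.setdefault(group, ([], []))
        scores.append(score)
        criteria.append(criterion)

    results = {}
    for group, (scores, criteria) in by_group.items():
        if len(scores) < MIN_GROUP_SAMPLE:
            results[group] = None  # too few persons for a separate analysis
        else:
            results[group] = pearson_r(scores, criteria)
    return results
```

A group returned as None in this sketch corresponds to the situation described above, in which no separate analysis can be performed for want of a sufficient minority sample.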

The absence of a differential validity study in the Fairless studies, and the use of a "borrowed" study to establish racial fairness, is treated differently under the two sets of guidelines. Under the DOJ guidelines, where a user is unable to conduct its own internal analyses for racial fairness, it may rely on the analyses of others if the validity studies are otherwise properly used under the guidelines as "borrowed" studies. However, under the EEOC instructions, § 1607.7, a more stringent set of rules for "transportability" is called into play when a test user seeks to use another's study in lieu of its own, rather than merely as corroboration. It must then substantiate "in detail" job comparability and must show the absence of "major difference in contextual variables or sample composition which are likely to significantly affect validity." This section does not allow the party seeking to rely on a borrowed validation to benefit from any inferences or assumptions of similarity. Since the Court has only been able to find a hint of task and content similarity through inference, and the evidence on similarity of the sample reveals distinct differences between the South Works and Fairless groups,[33] this higher standard clearly has not been met.

*1340 What this means, to this Court using the EEOC guidelines, is that the South Works study's evidence of racial fairness cannot be considered probative of the fairness of the tests at Fairless. It is not "transportable" to the Fairless situation, and cannot be used as primary evidence. The Court finds that the higher standard of the EEOC guidelines is appropriate when applied to the problem of racial validation, which is so important in judging the propriety of a test's use when it has a disparate impact. Therefore, even if the Court were to accept the South Works study as corroborative on other issues, it cannot accept it as the sole evidence of the tests' racial fairness. By rejecting this evidence, the record is now barren of evidence of the tests' racial fairness.

Under both the DOJ and EEOC guidelines, therefore, the failure of the job analyses requires rejection of the South Works study as evidence of test validity at Fairless, since there is no basis for concluding job similarity. Even if job similarity were assumed, USS has not presented evidence that the applicant pools are similar, as required under § 1607.4(c)(2). Even if the Court were to accept the South Works report as corroborative evidence, the racial fairness analyses, offered as primary evidence on that crucial issue, cannot be applied in Fairless' defense, since the study clearly does not meet the higher requirements for transportability of § 1607.7. Since the South Works differential validity study must be regarded as irrelevant, there is no competent evidence of the tests' racial fairness at Fairless, which is a serious defect in USS' validation efforts.

J. Criteria

All three validation studies in evidence are criterion-related analyses. A criterion-related study involves the correlation of job performance with success on an examination. Firefighters Institute v. City of St. Louis, 549 F.2d 605 (8th Cir. 1977). In order to correlate job performance with test results, a validation study must use certain "criteria," which are merely indices of successful job performance. United States v. City of Chicago, 549 F.2d at 430. Since the analyses of the tests' relation to the job depend on these criteria, as Judge Pointer noted, a criterion study "is subject to any limitations which those same criteria have in either predicting or assessing job performance." Seibels at 6799. As the APA standards succinctly state: "The merit of a criterion-related validity study depends on the appropriateness and quality of the criterion chosen." Quoted in City of Chicago, 549 F.2d at 431.
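For illustration only, the correlation that a criterion-related study reports can be sketched in Python. The figures below are hypothetical and are not drawn from the record, and the Pearson coefficient is assumed as the measure of correlation; the authorities cited above do not specify one.

```python
from math import sqrt


def pearson_r(test_scores, criterion_values):
    """Correlate test scores with a criterion (an index of job performance)."""
    if len(test_scores) != len(criterion_values) or len(test_scores) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_values) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_values))
    var_t = sum((t - mean_t) ** 2 for t in test_scores)
    var_c = sum((c - mean_c) ** 2 for c in criterion_values)
    if var_t == 0 or var_c == 0:
        raise ValueError("correlation undefined when a variable is constant")
    return cov / sqrt(var_t * var_c)


# Hypothetical sample: battery scores and first-year classroom grades.
battery_scores = [52, 61, 70, 48, 66, 75]
classroom_grades = [71, 78, 85, 69, 80, 90]
validity_coefficient = pearson_r(battery_scores, classroom_grades)
```

The coefficient is only as meaningful as the criterion supplied to it: an arbitrary or subjective rating passed as `criterion_values` yields a number that says nothing reliable about job performance.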

For these studies, the authors used five measures of job performance. The first was grades in the classroom work related to each apprenticeship. The others were specially-constructed supervisory ratings made once or twice during the study period by the apprentice's supervisor. The supervisors were given forms and asked to rate the individual's quality of work, quantity of work, potential for advancement and ability to work without supervision.

No description of the grading process or standards is included for the classroom work. The supervisory rating forms are attached to the studies and shed light on the other criteria. Factors included in the quality rating are: cost of inspection, number of errors, and wasted material. On the quantity rating, the foreman is to consider "the number of assignments worked on per day or the number completed within the time available." To judge ability to work without supervision, the rater is to evaluate "the amount of time needed to explain an assignment to an individual, the amount of direct supervision necessary while he is working on that project and the amount of inspection needed upon completion of that project." For the potential to advance, foremen were given this statement as a frame of reference:

"Assume that you have transferred into a new department. Your new supervisor has asked you to try to bring with you some of the personnel who do or have worked for you. You have been instructed by your new management to identify *1341 those individuals who show the greatest potential for moving into higher jobs, potential reflected by arithmetic skills, verbal skills and problem-solving ability.
Try to focalize (sic) your evaluations on the characteristics underlined above."

After examining these five criteria, the Court finds them to be inadequate, inaccurate and overly subjective. Under both guidelines and the case law, the Court finds that they fail to meet minimum standards for a criterion-related study. Therefore, this failure of the criteria, which by itself would render the studies meaningless, is another element in the Court's overall rejection of the studies.

First, both guidelines require that the criteria be generated from the job analyses. Since the job analyses are inadequate, it follows that any criteria developed from them would be similarly inadequate.[34] However, even assuming that the analyses were proper, there is no evidence that these criteria, with the exception of the classroom grades, relate in any way to the job analyses. None of the documents attached to the studies (the JDC excerpts, course outlines, and the Apprentice Training manual) use any of the terms of the supervisory ratings. The Apprentice Training manual divides apprentice training into three areas: trade analysis, which is the apprentice's knowledge of job terms—names of tools, materials and process; job performance, which is judged in terms of on-the-job skills; and related instruction, which is the classroom course element. Only two of these three, related instruction and job performance, are reflected in any way in the selected criteria and only in the most general way. The JDC does not appear to be the source of the criteria either. Basically, the criteria appear to be "common sense" measures, which could be applied to any job requiring some level of productivity, skill and responsibility. The supervisory ratings are not related to the specific skill of an apprentice or crafts worker, and could be applied to any upper-level P & M job. The guidelines require more than this for criteria to be appropriate for a particular job. This is the reason the job analyses must first be performed.

Second, both guidelines require that the criteria selected must represent significant aspects of the job itself. The DOJ guidelines, at § 12(B)(2), state that criteria must reflect "critical or important job duties, work behavior or work outcomes." The EEOC guidelines are similar, saying simply that the criteria must represent "major or critical work behaviors." 29 CFR § 1607.5(b)(3). The Court does not find that any of the USS criteria meet this requirement. In Seibels, as was done here, the study authors developed special ratings for the validation analysis. The ratings referred to personality characteristics, job knowledge and specific abilities found through the job analyses process to be relevant to job performance. Here, the criteria do not evaluate skills or job knowledge (except for classroom grades), but only test these in the most indirect way, by measuring a generalized ability to do the job.

The first criterion, grades in related instruction, suffers from certain problems. First, it is almost completely undefined. The Court has no way of knowing if it is based on paper-and-pencil tests in the classroom or more subjective criteria. The only description of the grades is in the South Works study, which stated that they were averages of the "numerical" grades for each course. No explanation of what the course grades were based upon is included. Without more information, the criterion is meaningless. It appears that this criterion was chosen primarily because of its availability. *1342 This reason for selecting a criterion was criticized in the City of Chicago case, where the court quoted from the APA standards, 549 F.2d at 431:

"[T]he logic of criterion-related validity assumes that the criterion possesses validity. All too often, tests are validated against any available criterion with no corresponding investigation of the criterion itself . . . Criterion-related validity studies based on the `criterion at hand' chosen more for availability than for a place in a carefully reasoned hypothesis, are to be deplored."

Another problem with use of course grades, if they are dependent solely upon objective paper-and-pencil tests, is that they may result in artificially high correlations with the test battery.[35] In Seibels, Judge Pointer cautioned:

"The emphasis in the (police) academies on paper-and-pencil multiple choice items, while providing objectivity, may also reflect a relationship to the paper-and-pencil screening exam not found in job performance. These concerns should cause one to be cautious in making non-empirical judgments about the usefulness of relative academy grades as a criterion measure."

Plaintiffs have argued that the use of related instruction grades places undue emphasis on verbal abilities. Dr. Schmidt, USS' expert, admitted that certain classroom instruction might be extraneous or require verbal skills not necessary to the job. This is borne out by the final correlations. The only criterion to which the test battery consistently correlated was the grades measure. In the Fairless study, Dr. Schmidt testified, the relationship between the grades and the supervisory ratings was not statistically significant and was only in the .20 to .40 area. As the Court discusses further below, it would thus appear that the grades and test battery results, while related to each other, may not be related to actual job performance in a practically significant way. This might be explained by the need for verbal ability to do well both on the battery and in the classroom, but no need for such ability to do well on the job. As the Second Circuit appellate court said,

"Performance on a written multiple-choice examination [for selection] may well correlate quite highly with the ability to learn certain skills but not with the ability to perform them on the job." Vulcan Society, supra at 396.

The Court must conclude that these results bear out Judge Pointer's fears by demonstrating that the classroom grades are inappropriate as a job performance measure.

In Seibels, the grades involved not only paper-and-pencil multiple choice tests, which may be an objective criterion, but also the instructors' subjective appraisal of the student's performance. Therefore, Judge Pointer reasoned, the classroom grades were as prone to the influence of bias, when uncontrolled, as a supervisory rating. Here, defendant has not described what factors were involved in the grades. The Court cannot judge their objectivity or lack of same and indeed is unable to determine this criterion's propriety at all. Defendant bears the burden of explaining and justifying the criteria used, stating how the grades were arrived at and what standards were used to avoid bias. Unlike the supervisory ratings, where at least foremen were given a statement of what each rating *1343 meant and a chart by which to score, there is no evidence here that course instructors' grades were standardized in any way. Therefore, the Court must assume that teachers were not regulated in their grading, which leaves a very large opportunity for bias to affect grades, even assuming uniform and objective tests. Since the grades are totally undefined and have the danger of being tied inordinately to verbal, non-job-related skills and are apparently unchecked for bias, the Court must reject them as inappropriate criteria.

The supervisory ratings pertain to attributes rather than skills. Attempting to evaluate such factors may result in highly subjective and unreliable results. The EEOC guidelines state, 29 CFR § 1607.5(b)(4):

"In view of the possibility of bias inherent in subjective evaluations, supervisory rating techniques should be carefully developed, and the ratings should be closely examined for evidence of bias."

This sentiment is echoed in the DOJ guidelines at § 12(B)(2). Dr. Schmidt judged the supervisory ratings to be professionally sound. Dr. Kirkpatrick, however, stated his opinion that criteria such as those used here were not highly regarded among psychologists because they dealt with abstract concepts and were very subject to biased impressions. In the South Works study, the author performed an analysis of the differences on ratings by racial subgroups. The differences between blacks' and whites' ratings for quality of work were found to be statistically significant. On all ratings, blacks had a lower mean score than whites, although the other differences were not statistically significant. Of the blacks, 26.2% were found to be "poor apprentices," while only 12.4% of the whites were so rated. This is not repeated in either Fairless study, since the sample sizes are so small. Dr. Ramsay concluded that the differential in supervisory scores bore out the validity of the tests: blacks score lower than whites on the test and blacks score lower than whites on the supervisory ratings. Therefore, he concluded, the test properly predicts the results of the ratings. Another conclusion that might be drawn from this data is that the supervisory ratings have a similar disparate impact upon blacks, being subjective and similarly biased. In Watkins, 530 F.2d at 1189, that court agreed with this conclusion, also drawn by Cooper and Sobol in Seniority and Testing Under Fair Employment Laws: A General Approach to Objective Criteria of Hiring and Promotion, 82 Harv.L.Rev. 1598, 1662 (1969). They said:

"[S]upervisory ratings . . . which are possibly the single most common performance measure used in validity studies, are subject to personal prejudice. When test scores are correlated with such ratings, the validation, if it can be called that, is of questionable value and may simply prove that the test has the same bias as the supervisors."

In spite of these problems with subjective ratings which disparately impacted blacks as shown in the South Works study, defendant took no precautions in the two Fairless studies to avoid bias. The same categories and rating form were used in the Fairless study. In Seibels, at 6809, n.23, Judge Pointer said that "[t]he possibility of bias is of particular concern where subjective evaluations are used as criteria and there are significant differences in those measures for different racial groups," citing DOJ § 12(B)(2). He criticized those studies for failing to use statistical techniques to control for racial bias. In League of United Latin American Citizens v. City of Santa Ana, the court noted other ways in which supervisor bias could be controlled.

"Supervisors were trained to disregard general impressions and to concentrate instead on nine `cognitive' factors believed to represent critical job dimensions. No one supervisor established a rating. Instead, supervisors met together *1344 to discuss each rating and to resolve differences in judgment." 11 EPD ¶ 10,818 at 7429.

Furthermore, additional factors were included in the ratings form to ferret out evidence of bias, to make sure that an employee's personal relations did not determine his rating. Here USS has not made any real effort to avoid or control for supervisory bias in the ratings. Although, as the court noted in Santa Ana, the guidelines do not specifically require any particular type of bias control, this Court certainly believes that when supervisory ratings reflect disparities, a greater effort must be made to avoid bias in the evaluations.

A second problem with the supervisory ratings is that they are too generalized. Repeatedly, courts have condemned subjective ratings which do not analyze specific skills, but simply judge whether an employee is "good." In Albemarle Paper, no instructions were given to raters except to determine which employee "did a better job." The High Court said that that was an extremely vague standard, "fatally open to divergent interpretations." 422 U.S. at 432, 95 S.Ct. 2362. The USS raters were given somewhat more defined criteria, but they were still, in this Court's opinion, open to divergent interpretations. In Watkins v. Scott Paper, that court condemned a supervisory rating system that did not require supervisors to judge specific skills or work functions. Although somewhat better than the ratings in those cases, the Court finds that the ratings used, most notably that of ability to work without supervision and potential to advance, are so generalized as to be essentially asking if an employee is "good," which is an invitation to discriminatory evaluation.

The Court must comment on the criteria's reliability. This is important in deciding if the ratings were standardized or fluctuated because of supervisor bias. The South Works study showed that the supervisory ratings' reliability was almost perfect. This was judged by comparing the first and second round of ratings, which were taken only two weeks apart. In the Fairless I study, Dr. Wolz performed a reliability analysis for 27 of the apprentices, comparing their ratings with those given them in conjunction with the Fairless III study three months later. Here, the intra-rater reliability was at the lower edge of the acceptable range with the exception of the potential index. Plaintiffs argue that the high level of reliability shown in the South Works study was caused by the memory of the raters who merely repeated the earlier score, since the two scores were only two weeks apart. This is supported, in the Court's opinion, by the drastically different results of the Fairless I reliability analysis.

Plaintiffs argue that intra-scorer reliability analysis (comparison of two scores by a single rater) is not sufficient evidence. They complain that the validity studies must also demonstrate inter-scorer reliability (comparison of two different raters' scores). Dr. Schmidt at first stated that the failure of the studies to include an inter-rater analysis made them professionally inadequate, but later appeared to retract that statement. Reliability analysis is not required under either of the guidelines, but as Judge Pointer noted, a measurement that is not reliable is not valid for any purpose. Seibels, 13 EPD at 6797. Normally, the Court would be satisfied with the single check for reliability. However, where the results are so widely divergent as those between the South Works and Fairless I studies, another test of reliability might have been advisable. USS contends that inter-rater reliability was impossible because only one supervisor works closely enough with each apprentice to rate properly. If that is true, then USS should have conducted a more thorough intra-rater study than the piecemeal one performed in Fairless I.

*1345 Fairless III was, by its own terms, a seriously marred study. Very few conclusions can be drawn from it because the Rolling Division supervisor rated all apprentices very highly. It is not clear if Dr. Wolz went back and reexamined Fairless I for a similar defect. At any rate, the fact that the supervisor so completely ignored the rating forms' instructions is further evidence of the ratings' subjectivity and tendency toward misuse.

The Court has concluded that the uneven reliability of the ratings, the apparent problem of following instructions as demonstrated by the results in Fairless III, and the consistently lower scores of blacks establish that the supervisory ratings are too subjective and prone to reflect bias. It is this problem that the guidelines' authors fear. Since all criteria are defective either as unspecific, subjective, potentially biased or wholly undefined, the Court cannot accept the validity studies. As the court said in City of Chicago, 549 F.2d at 433, "The entire rationale of a criterion-related study requires that the criterion with which test results are compared to be a good measure of job performance." Here, where the criteria are not clear, objective, reliable or fair measures, the Court cannot rely on them.

K. Correlations

In Albemarle Paper, the Supreme Court was concerned with consistency in the correlations of test scores to criteria. Here, the battery has been viewed as a whole and as parts and correlated to five criteria.

Even if the Court were to accept the samples and criteria as proper, and to view all three studies as appropriate, the only consistency that would emerge is that the test results correlate weakly to classroom performance.

Tables 2 and 3 show the results of correlations in the South Works and Fairless I studies. The South Works study has both sets of supervisory ratings correlated separately.

[See following illustrations.]

*1346
                                                          TABLE 2
                                   SOUTH WORKS CORRELATIONS OF TEST SCORES WITH CRITERIA

Test        Grades in   Potential   Quality     Quantity    Work w/o         Potential   Quality     Quantity    Work w/o
            classwork   I           I           I           supervision I    II          II          II          supervision II
-------------------------------------------------------------------------------------------------------------------------------
Wonderlic   .161[*]     -.076       -.077       -.102       -.095            -.081       -.092       -.115       -.076
RMPFB       .177[*]     .211[*]     .207[*]     .213[*]     .127             .215[*]     .157        .202[*]     .159[*]
Bennett     .262[**]    .060        .114        .115        .044             .099        .118        .136[*]     .090
DAT-NA      .228[**]    .104        .117        .116        .094             .109        .123        .106        .115
-------------------------------------------------------------------------------------------------------------------------------

                                   TABLE 3
              CORRELATIONS OF CRITERIA TO AVERAGE PATTERN SCORE

                 Average        Quality     Quantity    Work w/o       Potential
                 Course Grade   of Work     of Work     supervision
---------------------------------------------------------------------------------
Average Score    .26[**]        .22[*]      .24[**]     .22[**]        .27[**]
---------------------------------------------------------------------------------

*1347 As is clear, the results of the two validity studies do not appear to be consistent. Of course, the Court is in the position of comparing "apples and oranges," since the South Works study looked at each test within the battery, while Fairless I looked at the battery as a single unit. Dr. Schmidt attempted to combine the South Works results into a single score, but only as to the classroom criterion. Fairless III did not include correlations to each criterion, because of the sample problem of misunderstood directions. However, in that study Dr. Wolz did attempt to show that the test battery selects more "good" apprentices than "poor." Furthermore, Dr. Schmidt constructed correlations for Fairless III which showed tests correlated only to grades at a statistically significant level.

Even if the studies' methodology of sample, job analyses and criteria were acceptable, the Court would find these results to be less than satisfactory. The "spotty" correlation of the three tests in the South Works study is as inconclusive as those in Albemarle Paper. The Wonderlic only correlates significantly (or positively) to grades. The RMPFB correlates to four of the five criteria consistently at the significant level, but only once on the fifth criterion. The Bennett and DAT-NA look almost as bad as the Wonderlic, correlating to grades only (with Bennett having a single significance, not repeated, in one of the supervisory ratings).

The Fairless I study shows a consistent correlation of average test scores to all the criteria. This might indicate that, when the tests are grouped together, the test battery is related to all criteria. However, defendant's own expert, Dr. Schmidt, did not feel justified in drawing that conclusion. His opinion was limited to a conclusion that the test batteries were related to performance in classroom work.

Studying the results of South Works and Fairless I, the Court finds that Dr. Schmidt's conclusion of correlation to classroom grades is proper, given the assumption of overall study propriety. However, due to the inconsistency of the correlation to the other variables among the individual tests, the Court also finds that the batteries do not correlate to the other variables to an extent to be legally sufficient. The next question is: what is the legal significance of correlation to this single criterion?

The question is whether it is proper to validate the test battery against performance in the apprentice program, or whether the test battery should be checked against performance in the ultimate crafts jobs. In dicta in Washington v. Davis, the Supreme Court noted approvingly a judge's validation of a police test to training-course performance, rather than actual performance as an officer. The High Court commented that this relationship was not foreclosed by earlier cases and "it seems to us the much more sensible construction of the job-relatedness requirement." 426 U.S. at 251, 96 S.Ct. at 2053.

However, a number of judges have read into that dicta a limitation that the training program must itself be job-related. Judge Pointer in Seibels said:

"Where, as here, new employees are required to complete special training before performing their duties, successful completion of that training may properly be used as a criterion-measure, if that is, the training is intended to and does, provide skills or knowledge needed for performance of the job."

This point of view was echoed by the dissenters to a per curiam affirmance by the Supreme Court. In NEA v. South Carolina, 15 EPD ¶ 8027 (January 16, 1978), Justice White said that Washington v. Davis "did not hold that a training course, the completion of which is required for employment, need not be validated in terms of job-relatedness." Here, of course, there is a bigger problem. Even assuming that the apprentice program as a whole is related to the ultimate job, which has not been proven, USS has shown only that a single aspect of the apprentice training relates to the test *1348 battery. The significance of this relation of the battery to grades is irrelevant unless some relation is also shown from grades to the on-the-job performance. Otherwise, as the Court explained earlier, all that will be proven is that a non-job-related battery selects applicants who are successful in non-job-related classroom courses.

USS attempted to show the importance of classroom instruction in a number of ways. Three craftsmen testified that they used knowledge acquired in class on the job. A motor inspector, Timothy Piasecki, testified that to graduate from the apprentice program, one had to pass a paper-and-pencil test on electricity, which was part of the classroom work. However, classroom training is a very small portion of total apprentice training, consisting of as little as one-eighth or less of total apprentice hours.

USS also used statistical correlations between grades and the on-the-job ratings. All the correlations were statistically significant but low.[36] In the South Works study, as shown in Table 4, all the supervisory ratings correlated with classroom grades, but the coefficients were all less than .30. In the Fairless I study, Table 5, the coefficients ranged from .39 to .50. As the Court will discuss more thoroughly in the next section, a low coefficient, even though statistically significant, may indicate a low practical utility. Thus, the usefulness of classroom grades as a predictor of on-the-job skills is less than average.[37]

                            TABLE 4
          SOUTH WORKS INTER-CORRELATION OF GRADES AND
                      SUPERVISORY RATINGS

               Average Course Grade Correlated with
                         Rating I        Rating II
--------------------------------------------------------------
Potential                .231[**]        .217[**]
Quality of Work          .268[**]        .271[**]
Quantity of Work         .228[**]        .257[**]
Work W/O Supervision     .233[**]        .274[**]
--------------------------------------------------------------

*1349
                                   TABLE 5
                   FAIRLESS I INTER-CORRELATION OF CRITERIA

                        Quality      Quantity     Work w/o       Potential
                        of work      of work      supervision
-------------------------------------------------------------------------------
Avg. Course Grade       .39[**]      .41[**]      .41[**]        .50[**]
Quality of Work                      .88[**]      .88[**]        .77[**]
Quantity of Work                                  .91[**]        .82[**]
Work W/O Supervision                                             .84[**]
-------------------------------------------------------------------------------

The EEOC guidelines require that "the relationship between the test and at least one relevant criterion must be statistically significant." 29 CFR § 1607.5(c)(2). What USS has failed to do is to prove that classroom performance is a proper criterion, as discussed above, and that it is relevant. The EEOC guidelines require strict scrutiny if a test is valid against only one criterion, when testing is the sole selection device. Here, where USS has not shown any other selection device except personnel office subjectivity, and defendant's expert has concluded that the tests are valid against a single dimension of apprentice training only, the Court concludes that close examination is in order. Under such strict scrutiny especially, this validity study and the resultant correlation to this doubtfully relevant criterion certainly are no evidence of the test battery's job validity. As discussed above, defendant has not shown what is measured by the classroom grades criterion, or how the rating is arrived at. The criterion is potentially flawed by being too subjective or too tied to non-relevant skills. Although success in the classroom is statistically related to success in the other ratings, the correlations are so low as to have no practical significance. Therefore, even assuming that the validity studies were methodologically sound in sample and job analysis, the failure to demonstrate the adequacy of classroom performance, the only criterion related to the test battery, as a relevant measure of job success means that the studies have not sustained use of the batteries in whole or in part.

L. Practical Significance

In addition to requiring that the correlations of the test battery to the criteria be statistically significant, both guidelines require that the correlations indicate practical significance. Seibels, supra at 6802. Plaintiffs' expert concluded that the tests had no practical significance. Under the guidelines, the case law and statistical principles, this Court must agree.

The statistical correlations of the test battery to the criteria were all at the level of less than .30. In City of Chicago, both the district and appellate courts found no practical significance with correlations as high as .36. Some of the correlations here were negative. Negative correlations show that the test is having the opposite result: those scoring highest on the test will be least successful in job performance. Seibels, supra. Under both guidelines, low correlation coefficients are not per se evidence of a lack of practical utility, Seibels, supra, but they do trigger closer scrutiny, especially where, as here, only a single criterion consistently correlates to the test scores. Under the DOJ guidelines, § 12(B)(5), "in general, other factors remaining *1350 the same, the greater the magnitude of the coefficient the more likely it is that the test will be appropriate for use." Seibels, supra at 6803. To evaluate the meaning of these low correlations, the guidelines require the Court to look at certain factors. These factors were also considered in Santa Ana, where there was practical significance with larger correlations. Balancing these factors, this Court concludes that the guidelines mandate rejection of the test batteries.

The DOJ guidelines first look to the degree of adverse impact. Here, where the impact is quite severe, that factor would weigh heavily against use of the tests. Second, the guidelines require evaluation of the availability of other selection procedures. USS has not presented any evidence regarding other selection procedures, so this aspect cannot be evaluated. The DOJ guidelines also look to any requirement "by law or regulation," to use merit selection. Being a private employer, USS is not required to follow merit principles by government mandate. However, the extensive union agreements do require selection for these jobs according to some measurable standards. Under these three factors, the balance would tip against use of the tests, since the adverse racial impact is not counterbalanced by the other factors.

Under § 12(B)(5), the Court also is to look at "the importance or number of aspects of job performance covered by that criteria" to which the test battery correlates. The battery correlates consistently only to the related classroom instruction. In Seibels, Judge Pointer rejected such a correlation to classroom grades, finding it was not indicative of practical significance. As the Court noted above, the classroom instruction in the apprentice program is only a very small portion of the total training program, when judged in terms of hours spent. Although the witnesses testified as to the usefulness of classroom knowledge on the job, the Court cannot find that this single criterion is so important as to carry the day in light of the adverse impact.

Under the EEOC guidelines, 29 CFR § 1607.5(c)(2), the Court is to weigh three factors: 1) how many of the applicants for the jobs are selected out by the tests; 2) how many successful employees are chosen by other selection devices; and 3) how great is the economic and human risk in hiring unqualified applicants.

USS has shown that the test selects out only 25% of all applicants for the prestigious apprentice jobs. Under § 1607.5(c)(2)(i), this low selection rate would justify use of the batteries even with the low correlations.

The second factor, § 1607.5(c)(2)(ii), requires the Court to look at the adequacy of alternative selection procedures by evaluating the number of successful persons who were not selected by the battery. The evidence shows that since the test battery came into use, virtually all apprentices have been selected by it. Also, as the Court found above, a substantial number of all craftsmen came into their journeyman positions through the apprentice program. This exclusivity of selection limits the Court's ability to judge the second factor. David Braithwaite testified that a purely random process would select a group of apprentices of which only 54% are successful. The present success rate is 90%, he said. Braithwaite testified that if selection were on a random basis, USS would have to choose twice as many apprentices to allow for this failure rate, at a cost of $10 million. Of course, as plaintiffs pointed out, USS has never used a random basis for apprentice selection. To do so, when training costs $10,522 for the first year, would be foolish. It appears from looking at those jobs most recently elevated to "crafts," that craftsmen prior to the apprentice program (and test battery) were chosen by ability, demonstrated by working up through the level of helper. For example, the motor inspector and millwright jobs were "crafted" in 1965. Alfred Capriotti, a departmental assistant general superintendent, testified that one-half of the motor inspectors and 70% of the millwrights did not go through the apprentice classes. USS has presented no evidence that these non-apprenticeship-trained craftsmen are any less successful than the training program graduates selected by the test battery. Indeed, the Rolling Division *1351 superintendent, Dwight L. Torlay, said that "quite a few" of these older craftsmen developed themselves to be fine workers. There is no basis for the Court to conclude that these batteries select more successful apprentices or craftsmen than another device. Therefore, under § 1607.5(c)(2)(ii), the low correlations are indicative of a lack of practical significance.

Finally, the third factor is the element of economic and human risk. Defendant USS has established that the complete apprentice program averages $31,982, very little of which is recoverable if an apprentice fails. Therefore, the company has a substantial economic interest in selecting those who will succeed in the program. However, use of these batteries leaves in 10% of unsuccessful apprentices.[38] A pure random selection allegedly would leave in 46% of the poor apprentices. Surely an alternative job-related selection process could be found to bridge this gap. Therefore, the Court does not agree with Braithwaite's estimate of a $10 million loss from abandoning the batteries, but finds that the loss would be far below that. As to human safety, although USS has shown that steel-making is a complicated and sometimes dangerous business, it is no more important to have quality steelworkers than quality policemen, which were the subject of the Seibels case. Judge Pointer found that this factor did not outweigh the low correlations, which were somewhat higher in Seibels than those in this case. Vestibule or simulated training allows the company to eliminate those who are safety hazards early on. As in Seibels, where the "occasional incompetent may be detected and dismissed" during the early period of training, low correlations would be most inappropriate.

The conclusion that the guidelines require a finding of no practical significance is also supported by a statistical analysis. Dr. Schmidt urged use of the Brogden method, whereby a test's dollar utility could be translated by converting the correlation (e.g., .40) into a percentage (40%) of savings. Plaintiffs' Dr. Kirkpatrick stated that this method was only used when low correlations were sought to be justified. He instead suggested use of the coefficient of determination (r²), which was followed by Judge Pointer in Seibels. This computation requires that a correlation be squared (.40 would have an r² of .16). The r² value is the percentage (16%) of variance in criteria rating explainable by variance in test score. Thus, one can readily see that even on the statistically significant correlations of .30 or so, only 9% of the success on the job is attributable to success on the batteries. This is a very low level, which does not justify use of these batteries, where correlations are all below .30.
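
The squaring step described in this paragraph can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python and is not part of the record; the function name `coefficient_of_determination` is an assumption made for the example, and the two inputs are simply the correlations mentioned in the text (.40 and .30).

```python
def coefficient_of_determination(r: float) -> float:
    """Return r squared: the share of criterion variance explained by test score."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


# Correlations mentioned in the discussion above (illustrative inputs only).
for r in (0.40, 0.30):
    share = coefficient_of_determination(r)
    print(f"r = {r:.2f} -> r^2 = {share:.2f} ({share:.0%} of variance explained)")
```

On these inputs the sketch reports 16% for a correlation of .40 and 9% for a correlation of .30, matching the percentages stated above.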

In conclusion, based upon the guidelines and statistical analysis, as utilized in the few reported cases, the Court cannot find that these tests have any real practical utility. The guidelines do not permit a finding of job-relatedness where statistical but not practical significance is shown. On this final ground as well, therefore, the test batteries must be rejected.

In summary, the Court must reject the test batteries because they have an adverse impact on blacks and have not been shown by competent evidence to be job-related. This is a clear violation of Title VII under Griggs and the later cases. No evidence of intent need be shown, although the Court has found it, infra, with regard to § 1981 liability. Defendant failed to meet its burden on the affirmative defense because the studies presented to justify the tests' use were seriously flawed and at best inconclusive. Therefore, on this issue under Title VII, the Court will find for the plaintiffs.

M. Testing and § 1981

Having found that the testing program at Fairless violates Title VII, the Court must still look at liability under § 1981. The main distinction between a Title VII *1352 cause of action and one under § 1981 is the issue of intent. As the Court discussed above, under § 1981 discriminatory intent must be shown.

On the issue of testing, the Court does find that the requisite level of intent to discriminate has been established. Invidious discriminatory intent is difficult to establish, as the courts have repeatedly observed. There are a number of indirect indices of intent which the courts have recognized as probative on this score. As this Court held in Croker, intent may be shown by evidence of discriminatory impact, buttressed if needed by general and specific historical evidence. Cf. Resident Advisory Board v. Rizzo, 564 F.2d 126 (3d Cir. 1977).

Here, the proof of the discriminatory impact of the testing program is clear. The Court is of the opinion that it alone would support a finding of discriminatory intent. Defendant argues, however, that the history of the testing program does not reveal any discriminatory intent. The Court cannot agree with this. In Lewis v. Bethlehem Steel, supra, the Maryland district court considered a very similar problem. In that case, as in the instant case, there was no evidence that the tests had their genesis in discrimination. However, once blacks began to take the tests and fail them in disproportionate numbers, the court said that the issue changed. The question was not, were the tests designed to be discriminatory? It became: was the continued use of the tests, with knowledge of their racially adverse impact, evidence of intentional discrimination? The Maryland court said "yes."

The Court of Appeals for the Third Circuit recently implied its agreement with the Lewis court. In Richardson v. Penna. Dept. of Health, supra, the appellate court considered a §§ 1981 and 1983 action in the posture of a motion to dismiss. The court held that plaintiff adequately had pleaded intentional discrimination when she alleged that her employer administered a test that it knew had adverse impact and that it knew was not job-related. The court also observed that disparate impact alone could be probative of intent to discriminate.

In this case, it is clear that USS knew or should have known of the adverse racial impact of its testing program. The disparities are so significant that they are clear upon any brief inspection of the pass-fail rates. Certainly after the Griggs opinion in 1971, company officials became aware of the potential for discrimination presented by test batteries. David Braithwaite's testimony revealed that he was aware of the Griggs opinion. He stated that the court decisions had cast doubt on the use of the Wonderlic, at that time part of the Fairless battery. However, it was not until 1973, two years after Griggs, that the Wonderlic was removed from USS use. It was partially in response to Griggs as well as to union pressure, that USS increased its program of test validation. The South Works and Fairless studies were part of USS' effort to justify use of the tests. Studies had been conducted prior to Griggs as well, which the Court construes as evidence of knowledge of a need to justify the tests' racial impact as early as 1967.[39]

The district court in City of Chicago chastised defendants for continuing to attempt psychological justifications for their tests in the face of clear adverse impacts. "[R]ather than joining in the search for the cause of the disparities to the end that they might be remedied," the court said, 385 F.Supp. 540 at 554, "defendants have chosen to lead the court `deep into the jargon of psychological testing.'" Here, had USS spent its money on finding or developing new selection devices with no adverse impact instead of trying to justify the old ones, the Court might be convinced of the company's good intentions. But where all of its psychologists' energies and all of its testing money have gone into defense of these tests, which have severe and continued adverse impact, the Court is convinced that the tests were retained with discriminatory intent. In Detroit Police Officers Ass'n v. Young, 446 F.Supp. 979, 999 (E.D. *1353 Mich.1978), the police testing battery was changed a number of times because of adverse impact. The defendants there repeatedly made efforts "to find an entry level examination which would be a reasonable predictor of job success, free from cultural bias." In the case at bar, defendant's only objective seemed to be to find tests which would validate properly, regardless of racial impact.

Also, the Court does not find that the results of the validation studies, showing in USS' opinion that the tests were job-related, are a defense to the charge of intentional discrimination. USS placed its trust in very weak analyses which at best indicated marginal correlations. The company clung to its testing program in spite of, not because of, the studies' results. The Company made only one change in the battery's makeup because of the studies over the entire limitations period. This was dropping the egregious Wonderlic exam in 1973. After the South Works study mandated this change, the company refused to budge from use of the present battery, despite numerous grievances and pressure from the union.

Finally, the Court does not find the kind of affirmative action campaign for the crafts that USS expended in the area of management. It was the company's efforts to bring more blacks into management that allowed it to avoid a finding under § 1981 regarding first-level management. Here, the company has not presented evidence of any attempts to get more blacks into the crafts jobs. Therefore, the Court must conclude that the defendant's officials were content to continue the underrepresentation of blacks in the crafts jobs.

Based on these findings, the Court concludes that USS is liable for intentional discrimination in its testing program under § 1981, in addition to its liability under Title VII. The Court has been informed that because of the arbitration award, the use of the tests at all USS plants has been suspended. Of course, plaintiffs have made claims for relief other than injunctive, so this issue is not moot. Those claims for relief under both Title VII and § 1981 will be considered in a later trial for damages.

N. Union Liability

In its earlier opinion, this Court held that a union's acquiescence in discriminatory company conduct was actionable under both Title VII and § 1981.[40] A union's failure to negotiate for non-discriminatory contract terms or to protest discriminatory treatment is in itself discrimination, because of the union's duty to protect its minority members. The Union still contests the Court's interpretation of the law. However, the Court feels that it has properly read the cases and will affirm that earlier pronouncement.[41] A recent decision, Lewis v. Bethlehem Steel, supra, also supports this position.

In Lewis, the district court was faced with the same issue as in the instant case— the union's liability because of discriminatory testing. That court said, 440 F.Supp. at 974:

"Although local 2610 has contended that its lack of involvement in instituting or administering the test, as well as its negotiation of contract provisions with the Company in 1968 requiring that all tests be job-related, absolve it of any possible liability under Title VII, the court cannot agree. The Company had a history of discriminating against blacks holding Production and Maintenance jobs and if Lewis had proven that the test had a disparate impact upon blacks it would undermine Title VII's attempt to impose responsibility on both unions and employers *1354 to hold that unions' passivity at the negotiating table in such circumstances cannot constitute a violation of the Act."

However, in Lewis there was no liability on the union because the court found two facts: first, that the union had actively negotiated for abolition of tests and second, that the plaintiff failed to establish his case against the company.

Here, in spite of the successful case against the company, the Court believes that the Union has proven that its efforts, both on the local and international level, are sufficient to avoid a finding of liability. Outside of this litigation, both prior to it and during it, the Union has attacked USS' testing program on a number of fronts. First of all, since 1965, the Union has attempted, and succeeded in part, to limit management's use of discriminatory non-job-related testing. Second, based on these gradual contract changes, the Union has pressed grievances at Fairless and elsewhere. These grievances have just recently culminated in the arbitration award which resulted in suspension of the present battery's use. Finally, the Union has opposed the use of tests in the courts. In the Griggs case, the Union filed an amicus brief which argued against the allowance of such tests under Title VII. In the case at bar, the Union has stood silent on the issue of the tests' propriety, even though it would have been to its tactical advantage to join in the battery's defense.

The Court therefore finds and concludes that the Union, both at the plant and industry-wide levels, has made a genuine and concerted effort to get rid of this barrier for blacks. Although plaintiffs presented isolated incidents of failure to process test-related grievances, the Court cannot find that these few events, which are at variance with the Union's clear anti-test stance, constitute proof of a class-wide claim. Under the Teamsters case, plaintiffs must "establish by a preponderance of the evidence that racial discrimination was the [Union's] standard operating procedure— the regular rather than the unusual practice." 431 U.S. at 336, 97 S.Ct. at 1855. Having failed to meet that burden, and the Court having found the Union's practice to be to the contrary, the Court must find in favor of the Union.[42]

INDIVIDUAL PLAINTIFFS

The only remaining claims to be resolved at this time are those of the two class representatives, Moses Dickerson and Eddie Williams, and that of Curtis Worthy, who filed his own individual case. The Court resolved these claims in part in its earlier opinion and incorporates that background and discussion of claims dismissed here as its findings and conclusions. 439 F.Supp. at 90-93.

After the motion to dismiss, Dickerson has four claims remaining. They are: discriminatory denial of a crafts job, discrimination in initial assignment, unfair stigmatizing, and retaliatory disciplinary action. The first two, denial of a crafts job and discriminatory assignment, are somewhat tied together. Dickerson claims that on USS' recruitment trip to Alabama in 1969, he was promised a welding job. Instead, when he came to Pennsylvania, he was assigned to the job of janitor. USS has argued that he was found to be physically unfit for the welding job due to a bad back, and was only given the janitor job because Dickerson had made the journey from Alabama to work at Fairless. The evidence of initial assignment discrimination and discriminatory denial of the crafts job hinged on the MPS, which attempted to show that blacks were overrepresented in janitorial jobs and underrepresented in the crafts. Since this plant-wide evidence has been discredited, very little remains to establish that Dickerson was discriminatorily assigned to janitorial rather than welding. An analysis of the "Birmingham hires," many of whom were black, reveals that *1355 Dickerson was the only one assigned to janitorial. This does not show that Dickerson was treated differently on the basis of his race; in fact, it adds credence to the assertion that Dickerson was offered the job because he physically could not qualify for any others.

Dickerson has failed to meet his burden under McDonnell Douglas Corp. v. Green, 411 U.S. 792, 93 S.Ct. 1817, 36 L.Ed.2d 668 (1973). USS' evidence called his physical qualifications into question by showing that he had a back problem, revealed in early X-rays and confirmed by USS' physician, who gave him a physical limitation rating. It was part of the McDonnell Douglas prima facie case to show he was qualified.[43] Since the plaintiff's prima facie burden has not been met as to either of these claims, judgment will be entered for defendant.

Dickerson also testified that he was discriminated against by being classified as a "hard-core unemployable" under the National Association of Businessmen's "Basic Jobs" Program. Dickerson was so categorized on the basis of his unfinished high school education as well as his status as a minority group member. Another factor which contributed to the classification, although not specified on the NAB's list of criteria, is Dickerson's admitted criminal record. According to the NAB, this would make him "unemployable." The NAB program was one designed to encourage hiring of such "unemployables." Although this was considered a discriminatory stigma by the Court in its earlier opinion, it is now clear that race was only one factor in this classification, see University of California Regents v. Bakke, supra. Under Bakke, the Supreme Court indicated that race may be a factor so long as it is for affirmative action. Here, the NAB program, and thus the classification of "unemployable," was designed to help persons like Dickerson get jobs, not to prevent them from doing so. Furthermore, it appears that this "stigma" was not widely publicized, but was only used in confidential reports. So, Dickerson suffered no ill effects from it. Finally, the fact that a legitimate outside organization established these criteria removes an inference of intentional discrimination and in fact shows that the classification was made in good faith.

Dickerson contends that such classification, even if not stigmatizing, was used to discriminate against him because it kept him on job probation for a longer time. USS claims that this was a business necessity, in view of his "unemployable" traits, which made him a job risk. The Court will accept this as true, although it recommends that race not be used as the sole factor in making any such future decisions. Dickerson has failed to raise any inference of discrimination, since he has presented no evidence that only blacks were considered "unemployable," or even that they were disproportionately so classified. Therefore, the Court finds against Dickerson on this claim.

Finally, Dickerson claims that he was disciplined in retaliation for his publicizing of his claims against USS. An article appeared in the Trenton Times newspaper on March 19, regarding the complaints of Dickerson and others. The first discipline, for reporting to work drunk, arose early on March 19, before the article was published. Two followed shortly thereafter—on March 20 and 21 for unfit condition and absenteeism. All were issued by the same foreman, J. Soltis, who was not called as a witness. However, the company placed into evidence his reports of the events surrounding these disciplines.

It is the opinion of this Court, after reviewing these reports and Dickerson's testimony, that the disciplines were not retaliatory. The Court noted in its earlier opinion that there was some connection between the article and the disciplines. However, the connection appears not to be in the mind of the foreman, who probably did not know of the article at the time of the first *1356 discipline, but in Dickerson's own mind. His grievances regarding his placement weighed heavily on his mind and he viewed his contacts with the NAACP, and the resulting news article, as the beginnings of their resolution. The discipline reports of March 19, 20 and 21 all mention that Dickerson was drunk and raving about his lawsuit and the NAACP, and all mention the article. Dickerson's medical reports from Alabama reveal that he drank quite a bit, so the Court finds the allegations of the reports credible. The Court also finds Dickerson's own testimony, denying the incidents, to be self-serving and incredible. The Court concludes that the three disciplines were not retaliatory, but were caused by Dickerson's hostility toward USS and resultant drinking, which coincided with the Trenton Times article. Because of his actions while drunk, he brought the suspensions on himself. The Court concludes that judgment must be entered in favor of the company and against Moses Dickerson on all his claims.

Eddie Williams, the second class representative, also made claims of initial assignment discrimination and discriminatory denial of a crafts job. His initial assignment claim must fall with the class claim, as did Dickerson's. His claim of denial of a crafts job is tied to class-wide proof that blacks are discriminatorily barred from taking the journeyman's test. The Court has already noted the dearth of evidence on this subject, and has denied the class claim. Therefore, Williams' individual claim also falls. Williams' only remaining claim consists of alleged harassment and special work assignments given by his foreman, Mike Fraikor. Fraikor testified that he did not treat Williams differently or give him different work. This testimony was supported by Ernest Duncan, a black employee who vouched for Fraikor's lack of bias. In keeping with the Court's earlier finding that Williams exaggerated many of his experiences at USS, the Court finds Fraikor more credible than Williams. On all claims, therefore, judgment will be rendered in favor of the company and against Williams.

Finally, the Court must consider the individual case of Curtis Worthy. Worthy claims he was disciplined and forced to demote from the craneman's job because of his race. The company claims that he was not treated differently than whites and that Worthy's accident record provided a legitimate business reason for removing him from the craneman's job and for disciplining him by suspending him for ten days, on the basis of a crane collision in 1971.

The Court has examined all the evidence and has concluded that Worthy's suspension and demotion were appropriate when viewed against the handling of the white cranemen's cases. The Court considers that Worthy's safety record, of three accidents and other citations for improper crane operation, placed him at the same level as, if not worse than, white cranemen who were demoted such as Yankowski, Yandrich and Borden. The Court finds from Worthy's history that he failed on many occasions to properly operate the crane and thus endangered the safety of other men on the job. To remove him from the craneman's job, as whites who operated the cranes recklessly had been, was a rational and non-discriminatory safety measure, not a discriminatory discipline. Worthy also complains that the 10-day suspension given him at that time was excessive, in that whites who were demoted got little or no suspension, usually three days or less. Worthy had received similar, lighter suspensions for earlier accidents. Here, Worthy was not only disciplined for the improper crane operation; he also had violated two other rules that the whites apparently did not violate. First, he failed to report the accident. Second, he continued to operate the crane after the accident. The Court finds that, in view of these additional violations, which were described as serious, the 10-day discipline was not unreasonable, even when compared with lesser suspensions received by whites. The Court cannot find that this discipline was discriminatory.

ORDER

AND NOW, to wit, this 2nd day of August, 1978, it is hereby Ordered, in accordance with the attached findings of facts and conclusions of law:

*1357 1. The Court sets forth its findings of facts and conclusions of law for these cases in the attached narrative form memorandum. Except to the extent any fact or conclusions are inconsistent with those set forth in the attached memorandum, the Court also incorporates as its findings and conclusion its opinion under Rule 41(b), F.R.Civ.P. dated July 25, 1977 and reported at 439 F.Supp. 55.
2. Judgment on liability is entered in favor of the class of plaintiffs and against defendant United States Steel on the issues of testing under 42 U.S.C. § 1981 and § 2000e and access to management under 42 U.S.C. § 2000e only. A trial on relief will be scheduled.
3. Judgment on liability is entered in favor of United States Steel and against the class of plaintiffs on the issues of initial assignment, and manning new facilities.
4. Judgment is entered in favor of defendant unions (United Steelworkers of America, et al.) and against the class of plaintiffs on all remaining issues.
5. Judgment is entered in favor of United States Steel and against Moses Dickerson and Eddie Williams.
6. Judgment is entered in favor of defendant United States Steel and against Curtis Worthy on his individual case.
7. The plaintiffs' motion to admit rebuttal exhibits into evidence is GRANTED, except as to P-3129 and P-3130, as specified in the attached findings and conclusions.
8. The Union's motion to reopen the record is GRANTED. The arbitrator's award is hereby admitted.

AND IT IS SO ORDERED.

NOTES

[1] USS presented another expert, Dr. Bernard Siskin. Much of that evidence was seriously flawed by an erroneous assumption that in order for a statistical analysis to be proper evidence in a discrimination case, it must demonstrate an "adverse pattern" to blacks. This argument seems to be based on the theory that Title VII requires proof of racially discriminatory intent. Therefore, he criticized plaintiffs for using a one-tailed test, which only shows if blacks are disproportionately assigned, rather than a two-tailed significance test, which would be sensitive to both races. Furthermore, he went to some length to show that blacks were not the only ones assigned to undesirable jobs. He concluded that since the concentrations shown were not "adverse" to blacks, in his opinion, there could be no discrimination. This conclusion is legally wrong, and therefore much of his analysis was irrelevant. It does not matter whether defendant views jobs as good or bad; if blacks are arbitrarily channeled into one job over another, because of stereotypes for example, that is discrimination. Teamsters v. United States, 431 U.S. 324, 97 S.Ct. 1843, 52 L.Ed.2d 396 (1977); Dickerson v. United States Steel, 439 F.Supp. 55, 76 (E.D.Pa.1977). Dr. Siskin's testimony regarding the age and education variables seems more directed toward a "business necessity" defense than an attack on the prima facie case. Since the plaintiffs' case is defeated by failing to meet the prima facie level, the Court need not judge the business necessity defense. However, his testimony in other areas was relevant, helpful and will be further discussed below.

[2] The Court takes note of the fact that, according to the evidence, USS had the capability to produce a study of similar dimensions to the MPS, but without many of the alleged flaws. It apparently had prepared, in the course of the trial, the records of all employees at all times in computer-readable form. Plaintiffs argue that USS' failure to put any similar study forward should be construed by the Court as an affirmation of plaintiffs' conclusions—that is, since USS did not put forward a statistical analysis showing that it did not discriminate, the Court can presume that a USS study would have established that it did. However, it is not defendant's burden to put forward such evidence, if it can defeat plaintiffs' case by other means.

[3] In its July 25 opinion, the Court anticipated this situation, supra at 78, note 23:

"Of course, any conclusions that the Court has made [at the close of plaintiffs' case-in-chief] as to the validity and accuracy of the MPS or any other statistical devices and analyses have been made on the basis of the preponderance of the evidence [at this time] and are under no circumstances irrebuttable. All that the Court means is that it has decided at this juncture that the challenges raised on cross-examination and in argument do not sufficiently undermine a finding of propriety in relying on these studies. Further evidence may prove that reliance to be ill-founded." (Emphasis added).

[4] Dr. Siskin testified that prior to 1968, 49% of the whites in the study were hired, as compared to 39% of the blacks. After 1968, 51% of the whites studied were hired and 61% of the blacks.

[5] Plaintiffs attempted to show the MPS' lack of seniority bias by demonstrating the closeness of the matches through Court Exhibit # 7. This exhibit addressed only the issue of "backwards bias," an attack launched by USS on cross-examination. This challenge was apparently dropped by the defendants in their cases-in-chief. Court Exhibit # 7 did not justify the overall seniority shift caused by the alteration of the selection rules, since it only demonstrated that the matches were close in time, but not evenly distributed throughout the study in total number.

[6] Dr. Litwin was not consulted until the eleventh hour before trial, a fact which harmed the plaintiffs' case in many ways. Perhaps if he had been able to offer plaintiffs his statistical expertise earlier in the preparation of the plaintiffs' case, he would not have been required to offer guesses in the guise of studies.

[7] The Court does not give great weight to the criticism that the sample thus drawn is not "random" because it failed to use a random start. Defendants' own studies, such as Dr. Siskin's error analysis, also failed to choose a starting point at random. Since this is such a common omission, the Court does not consider it as a significant flaw.

[8] At the time, the Court accepted plaintiffs' argument that since the termination date was external to the initial assignment, a failure to sample the later black terminations and an over-sampling of earlier white ones would not affect the sample. However, as defendants have pointed out, this is by no means a random sample, simply because the initial assignments are not necessarily from the same time period. Also, the termination date is in some cases predictive of the initial assignment, since an employee terminated must have been assigned prior to that date. In that way, the heavy white sample of early dates predicts that more senior whites were included.

[9] The Court is very disturbed about the evidentiary basis for plaintiffs' clusters. In the order of June 21, 1977, the Court provisionally admitted the information supporting the clusters, the line of progression (LOP) charts, if plaintiffs established a time frame for them. Plaintiffs never did so. Since they did not comply with the order, the Court may consider that the LOP charts are not in evidence. If they are not part of the record, there is no evidentiary basis for arranging the MPS data into "clusters." The Court is not rejecting the MPS on this technical evidentiary point, but it does consider this as yet another failure of plaintiffs' case.

[10] For those crews selected under the regular seniority agreement—all those except the first and second Galvanizing and the first crew for the Rod Mill rolling line—the system is protected by § 703(h) of Title VII. Under the recent Teamsters case, any personnel action taken in compliance with a seniority system is protected unless the system is not bona fide. Finding no evidence which would support an inference that the Fairless seniority system is not legitimate, 439 F.Supp. at 71-72, the Court must only consider those crews not picked by the seniority system as possible bases of Title VII liability.

[11] That limitations period includes events back to June 11, 1967. 439 F.Supp. at 69.

[12] Exhibit P-42 showed that, in 1973, 1.3% of first-level managers were black as compared with 10.5% of the P & M workforce. In 1974, 1.7% of the managers were black, as compared with 12.01% of the total workforce.

[13] Dr. Litwin recomputed his figures for the 1973 and 1974 statistics. He found that the actual number of black managers varied from the number expected in a random distribution by more than two standard deviations. Under Hazelwood School District v. United States, 433 U.S. 299, 97 S.Ct. 2736, 53 L.Ed.2d 768 (1977), such a variance is prima facie evidence of discrimination.
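
The "two standard deviations" comparison can be illustrated with a simple binomial calculation of the kind used in Hazelwood. The sketch below is in Python and is not drawn from the record: the total of 1,000 first-level managers is a purely hypothetical figure chosen for illustration, because the exhibit cited in note 12 is described here only in percentages, and the function name is likewise an assumption. Only the 10.5% and 1.3% figures come from the 1973 data in note 12.

```python
import math


def standard_deviations_from_expected(observed: int, n: int, p: float) -> float:
    """Number of binomial standard deviations between observed and expected counts."""
    expected = n * p                      # expected count under random selection
    sd = math.sqrt(n * p * (1.0 - p))     # binomial standard deviation
    return (observed - expected) / sd


# HYPOTHETICAL headcount; the text above gives percentages, not counts.
hypothetical_managers = 1000
black_share_of_workforce = 0.105   # 1973 P & M workforce share (note 12)
observed_black_managers = 13       # 1.3% of the hypothetical 1,000 managers

z = standard_deviations_from_expected(
    observed_black_managers, hypothetical_managers, black_share_of_workforce
)
print(f"observed count differs from expected by {z:.1f} standard deviations")
```

Under these assumed numbers the shortfall is far more than two standard deviations; with a smaller actual headcount the figure would shrink, so the sketch shows only the method and not Dr. Litwin's actual result.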

[14] Here, the evidence shows that the upper-level management was aware of the small numbers of black managers. In fact, it was due to such concern that the management selection programs were born.

[15] The Court opined earlier, 439 F.Supp. at 82-83, that plaintiffs had made a prima facie case of discriminatory access to the crafts via the journeyman skill exam. However, they have presented no class-wide data as to the black-white pass rate for this examination. Since the MPS has been ruled out as a competent measure of initial assignments, the evidence of disparity in assignments that the Court relied upon in July is no longer available. The disparity in the crafts overall, as the Court will discuss below, can be explained more logically by the apprentice program disparities, since a large percentage of all crafts workers enter their jobs through this program—at least 100 a year. Without the MPS data, the AAP "snapshots" do not establish an inference of discrimination in initial assignment to crafts, because of the variable of the apprentice system. The subjective procedures in selecting journeymen, without evidence of a disparity causally linked to these procedures, are not a Title VII violation. Hester v. Southern Railway Co., 497 F.2d 1374 (5th Cir. 1974), cited in Nath v. General Electric, 438 F.Supp. 213 (E.D.Pa. 1977) (Huyett, J.).

[16] The argument that the Gilbert case required a showing of intent was also rejected by the courts in Officers for Justice v. Civil Service Commission, 14 EPD ¶ 7548 (N.D.Calif. Jan. 31, 1977) and Neloms v. Southwestern Elec. Power Co., 440 F.Supp. 1353 (W.D.La.1977).

[17] The pertinent portions of that section read: "Notwithstanding any other provision of this subchapter, it shall not be an unlawful employment practice for an employer to apply different standards of compensation, or different terms, conditions or privileges of employment pursuant to a bona fide seniority or merit system . . . provided that such differences are not the result of an intention to discriminate because of race, color, religion, sex, or national origin, nor shall it be an unlawful employment practice for an employer to give and to act upon the results of any professionally developed ability test provided that such test, its administration or action upon the results is not designed or used to discriminate because of race, color, religion, sex or national origin . . ." 42 U.S.C. § 2000e-2(h). (Emphasis added.)

[18] Another way of construing Griggs and the merit system language would be to hold that both require defendant to prove a lack of "intent" before a testing device could be held to be protected by § 703(h). Such lack of "intent" would be established by proof that the test is job-related. This lack of "intent" would not require proof of a subjective non-racial animus, but would only require justification of the test as a predictor of ability. Since this has the same result as the Court's discussion above, in that under this theory plaintiffs would still not have the burden of proving intent in their prima facie case, the Court finds it is clearer to eliminate all references to "intent" in a testing case.

[19] Plaintiffs also had statistics which purported to analyze initial assignments to crafts jobs as part of their Matched Pair Study. However, since the Court has found that entire study to be unreliable, it would not be proper to consider that evidence here.

[20] In plaintiffs' post-trial brief, they have attempted to expand the testing issue to encompass individuals not included in these areas. While it is true that the tests were administered to individuals other than those seeking to obtain crafts apprenticeships or metallurgy and inspection positions, they have not presented competent evidence showing that the disparate pass rate has resulted in underrepresentation of blacks in the other positions. Certainly, since this Court has decertified the issue of hiring, the administration of the tests to applicants, from 1965-1971, is not relevant here. The lack of supporting evidence on the learner positions, maintenance planner and expediter and 1971 management candidates, requires the Court to rule out those areas from further consideration.

[21] Counsel have stipulated that the testing procedures were the same in all divisions of Fairless. Therefore, the Court's discussion will apply with equal force to the various mills involved in this litigation.

[22] This is based on the ratio given the Court by Dr. Haimes of four to one.

[23] This is based on the ratio of 3.6 to one.

[24] In NAACP, Ensley Branch v. Seibels, 13 EPD ¶ 11,504 at 6796 (N.D.Ala. Jan. 10, 1977), Judge Pointer said he was to follow both sets of guidelines absent "cogent reason." There, however, he noted that he was not presented with any conflict between the EEOC and DOJ instructions.

[25] A number of cases since Griggs have resulted in discontinuance of the Wonderlic exam, because of unvalidated use with disparate impact. See, e. g., Albemarle Paper Co. v. Moody, supra; James v. Stockham Valves, supra; Watkins v. Scott Paper Co., 530 F.2d 1159, 1185 (5th Cir. 1976). The Bennett was discussed in Griggs, and was dropped prior to litigation in Albemarle.

[26] Although 29 CFR § 1607.5(b)(1) requires that the applicant sample be compared to the "local labor market," this comparison would not be appropriate here, where apprentices are not hired off the street. The proper comparison would be to the total P & M population at Fairless. See discussion supra, on the relevancy of Dr. Wolfbein's labor market testimony, and cases cited there.

[27] The DOJ guidelines say that absence of evidence which is "desirable" but not "essential," as is the information on the applicant pool, "will not be a basis for considering a report incomplete." § 13(b). This is one place where, in the Court's opinion, the DOJ and EEOC guidelines conflict. As noted in this opinion, the EEOC guidelines list as a "minimum standard" proof of the representativeness of the sample. The information on the applicant pool is, in the Court's opinion, essential in this determination. Therefore, even though the DOJ guidelines do not require this information, the Court finds that its absence does render the studies incomplete, in its own judgment and in applying the EEOC guidelines.

[28] Similarly organized course outlines were included for: boilermaker, bricklayer, carpenter, machinist, millwright, mobile equipment repairman, pipefitter, sheet metal apprentice, rigger, and welder.

[29] Not included in the manual but in the Fairless I study is the rigger apprenticeship. No job analysis (except for the irrelevant JDC pages) is included for this job in that study.

[30] Of the 16 crafts included in the South Works studies, 10 fall into this category, comprising 76 members of the sample. They are: boilermaker, bricklayer, carpenter, machinist, millwright, mobile equipment inspector, pipefitter, rigger, roll turner, welder.

[31] Six South Works crafts were included in this category: armature winder, lineman, wireman, electronics repairman, motor inspector, comprising 66 members of the sample. Instrument repairmen were taught math, but no drawing or print reading and therefore are included in this group.

[32] Dr. Wolz also performed an examination in which he classified the courses and on-the-job requirements according to whether they require general mental ability (or numerical ability in the Fairless III study), mechanical comprehension or space visualization. These are the three abilities that the test batteries are supposed to measure. This informal "correlation" does not show any content similarity, but attempts to show by non-scientific methods what the validity studies are designed to demonstrate. The Court will not consider this so-called "correspondence" that Dr. Wolz claimed to have established by this clearly deficient study.

[33] For example, the South Works sample contains: more crafts (16 as compared with 12 and 8), proportionately more minorities, and older apprentices (mean age 25.6 compared with 22.1 in Fairless III). There is no information in South Works as to how many months the apprentices had been in the program, but only that it was at least one year. Fairless I's sample had a mean of apprentice service of 14.0 months, while Fairless III had a mean of 9.85. The Court sees these differences as potentially significant, and finds that USS failed to satisfy its burden of explaining them or demonstrating their non-significance.

[34] Under the DOJ guidelines, § 12(B)(3), certain quantifiable job measures, such as production rate, error rate, absenteeism and turnover, may be used without a full job analysis. The USS criteria are not such objective measures, but require consideration of many factors which together result in a subjective evaluation of quantity or quality of work.

[35] The DOJ guidelines, § 12(B)(3), allow use of "properly measured success in training" as a criterion. "Measures of training success based on pencil and paper tests will be closely reviewed for job relevance." Here, where the training grades are not defined, the Court cannot determine if the grades are "proper" measures. Since defendant bears the burden of proof here, the Court concludes that they are not appropriate measures, especially to the extent they rely on tests.

[36] The Court has used the coefficients without "correction" for range restriction. As Dr. Schmidt demonstrated, such correction would raise the coefficients, but there can then be no computation of statistical significance. This kind of statistical device, which inflates coefficients, is condemned in the DOJ guidelines, § 12(B)(6). Judge Pointer said that such correction is an assumption of normal score distribution which "is just that—an assumption." He implied that reliance on such an assumption, without any evidence of its validity, was at some risk to the factfinder. The Court declines, under these circumstances, with the general methodological unreliability found in these studies so far, to take that risk.
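
For illustration, the effect of a range-restriction "correction" of the kind discussed in note 36 can be sketched with the commonly used Thorndike Case II formula. The opinion does not identify which formula Dr. Schmidt applied, so the choice of formula, the function name, and the sample values (an observed coefficient of 0.30 and a standard-deviation ratio of 1.5) are all assumptions made for this sketch.

```python
import math

def correct_for_range_restriction(r, sd_ratio):
    """Thorndike Case II correction (assumed formula).

    r        -- correlation observed in the restricted (selected) sample
    sd_ratio -- unrestricted SD of test scores divided by the restricted SD
    """
    return (r * sd_ratio) / math.sqrt(1.0 - r**2 + (r**2) * (sd_ratio**2))

# Hypothetical values: observed r = 0.30, applicant-pool SD 1.5 times the sample SD.
observed = 0.30
corrected = correct_for_range_restriction(observed, 1.5)
print(f"observed r = {observed:.2f}, corrected r = {corrected:.2f}")
```

When the ratio exceeds one, the corrected coefficient is larger than the observed one, which is the inflation the note describes; the correction rests on assumptions about the unrestricted score distribution that the sample itself cannot confirm.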

[37] Compare this with the correlation of the supervisory ratings within themselves. According to Fairless I, a person who has high quality of work will also have high quantity, and so on. All the supervisory ratings correlate on a level of .77 or above.

[38] The Court thinks this 10% figure is too low, in light of the results of Fairless III. There, of 30 apprentices scoring average or above average on the battery, eight were rated "poor," or almost 25%. Therefore, the Court would find that the selection rate of successful apprentices is somewhere between 10% and 25%.

[39] According to the South Works study, Dr. Ramsay performed validation studies at Clairton (1967), South Works (1970) and Fairfield Works (1972) with all or part of the battery.

[40] The Court incorporates that discussion, at 439 F.Supp. 55, 62-63, as part of its final conclusions of law here.

[41] The union has argued that the Supreme Court's recent holding in Monell v. New York City Dept. of Social Services, 436 U.S. 658, 98 S.Ct. 2018, 56 L.Ed.2d 611 (1978), merits a different conclusion because of its interpretation of "causation" under § 1983. The union argues that the case outlaws vicarious liability in this situation. The Court does not agree that Monell has such far-reaching consequences; in fact, it believes the opposite is true. At any rate, in view of the Court's holding below on the facts, the issue is really unnecessary to decide.

[42] Having found no Union liability on any claim, the Court must also find against the plaintiffs on their claim under 42 U.S.C. § 1985(3), the civil rights conspiracy statute. The company cannot be liable for conspiring with itself. Jones v. Tennessee Eastman Co., 398 F.Supp. 815 (E.D.Tenn.1974); Milburn v. Blackafrica Productions, 392 F.Supp. 434 (S.D. N.Y.1974).

[43] Even if the Court were to hold that a prima facie case were to be established, USS' evidence of Dickerson's infirmity would be an adequate business necessity defense to the crafts denial claim. 439 F.Supp. at 64. Plaintiff has failed to show that this was a mere pretext, as allowed under Albemarle Paper, so his claim must fall.

[*] Also in Fairless 3

[*] Significant at .05 level

[**] Significant at .01 level
