dissenting.
Transit Mix laid off Metz and closed the Knox cement plant, which he was managing, because that plant’s sales were insufficient to justify its operation. Five months later it reopened the plant under the management of Burzloff, who made a little more than half of Metz’s $26,000 salary. The district court found that “Metz’s salary was too high to justify in light of the poor performance of the Knox plant.” The court also found that because of “differences of opinion and style between Mr. Metz and those who populate the Plymouth plant [Transit Mix’s other, larger, plant], it was legitimate and non-discriminatory” not to employ Metz at Plymouth or to ask Metz to put the Knox plant back in operation. Burzloff, who had worked for 18 years at the Plymouth plant, could be recalled to Plymouth if Knox should prove unprofitable again; Metz could not be detailed to Plymouth. The district court concluded “that each of these reasons — the greater flexibility afforded by Mr. Burzloff and the salary savings — was a determining factor in the decision to terminate Mr. Metz.” My colleagues do not hold that any of these findings is clearly erroneous. See Pullman-Standard v. Swint, 456 U.S. 273, 287-90, 102 S.Ct. 1781, 1789-91, 72 L.Ed.2d 66 (1982).
I
The district court expressed the view that Transit Mix was entitled to take Metz’s salary into account. 646 F.Supp. 286, 291-94 (N.D.Ind.1986). The majority disagrees. But we review judgments, not opinions, and it is hard to see how Metz’s salary mattered. The district court found that both salary and flexibility were “determining” factors. It is clear from the context that the court meant sufficient rather than necessary conditions. That is, Transit Mix was not going to reopen the Knox plant unless its manager could work at Plymouth too. My colleagues do not disturb this finding. Because causation is an essential part of the plaintiff’s burden in a disparate treatment case, see Dale v. Chicago Tribune Co., 797 F.2d 458, 462 (7th Cir.1986); Sherkow v. Wisconsin, 630 F.2d 498, 502 (7th Cir.1980), Metz loses no matter what we make of the district court’s approval of the salary business.* So *1212although I see why my colleagues disagree with the district court’s opinion, I do not understand why they disagree with its judgment.
The district court also found that Transit Mix was not going to pay $26,000 to a manager at Knox, because the plant did not do enough business to support such a salary. The sales of the Knox plant were less than $300,000 in each of 1982 and 1983, falling from earlier levels. “Metz’s salary was too high to justify in light of the poor performance of the Knox plant.” We have held, in common with every other court to consider the issue, that a firm may lay off or fire employees of any age when economic conditions make that prudent. E.g., Tice v. Lampert Yards, Inc., 761 F.2d 1210 (7th Cir.1985); Dorsch v. L.B. Forster Co., 782 F.2d 1421 (7th Cir.1986); Sahadi v. Reynolds Chemical, 636 F.2d 1116 (6th Cir. 1980); Price v. Maryland Casualty Co., 561 F.2d 609 (5th Cir.1977). “Economic conditions” implies a comparison of the employees’ wages with their product; a plant that is unprofitable when the average wage is $20 per hour may be a bonanza for the firm when the average wage is $10. If Transit Mix had said: “We are losing money at Knox, in part because of your high salary, so we are closing that plant”, Metz could not have complained. If Transit Mix had known what the court holds today— that it is forbidden to replace Metz with another employee at lower salary — it would have kept the Knox plant shuttered. Metz still would be out of work. He would have been fired had he been 35 years old and everything else the same. This is age discrimination?
Metz’s victory today is Pyrrhic — not for him, but for older employees in general. The court tells employers to keep their plants closed. Throw overpaid employees out of work because their salaries are high (as Tice permits) but don’t you dare hire anyone else at a lower salary to do the work. If that rule were widely followed, the Metzes of the world would be no better off, and the Burzloffs (also in the protected age group) would be worse off. They would be denied advancement, and other employees whom the Burzloffs would manage at Knox would never be hired. If Congress wants such a stultifying result, if Congress wants to hurt older workers, so be it. But judges should not go out of their way to injure protected groups. The ADEA as it exists does not prohibit consideration of the relation between an employee’s salary and his productivity.
II
My colleagues’ treatment of “wage discrimination” under the ADEA has the support of several other courts. Fair arguments may be made on both sides. But I am persuaded that my brethren, and these other courts, have settled on an approach that is too broad, and I shall try to explain why. Wage discrimination is age discrimination only when wage depends directly on age, so that the use of one is a pretext for the other; high covariance is not sufficient, and employers always should be entitled to consider the relation between a particular employee’s wage and his productivity.
Section 4(a) of the ADEA, 29 U.S.C. § 623(a), provides in part that it is unlawful for an employer:
(1) to ... discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual’s age; [or]
(2) to limit, segregate, or classify his employees in any way which would deprive or tend to deprive any individual of employment opportunities or otherwise adversely affect his status as an employee, because of such individual’s age;
Section 4(f)(1), 29 U.S.C. § 623(f)(1), adds that it is lawful “to take any action ... where the differentiation is based on reasonable factors other than age”. A natural reading is that an employer may take into account wages, which are “factors other than age”. Many people under 40 (the lower bound of the protected group) earn $26,000 or more; if such a salary exposes *1213them to discharge on economic grounds, then it should expose older employees to discharge. You do not get immunity from an otherwise lawful employment decision by growing old. As my colleagues say, the “ADEA is aimed at protecting the individual employee” (Maj. op. at 1206), but what it protects each employee against is age discrimination. The Act prohibits adverse personnel actions based on myths, stereotypes, and group averages, as well as lackadaisical decisions in which employers use age as a proxy for something that matters (such as gumption) without troubling to decide employee-by-employee who can still do the work and who can’t. The ADEA does not protect anyone against decisions based on actual performance.
The contrary view starts from the belief that wage and age are correlated. But age and ability also are correlated. For many years employees add to their skills and as a result do better work; eventually the tables turn, as mental and motor skills slip away. This proceeds at different paces for different people; the ADEA ensures that employers examine each employee’s actual performance rather than the average performance of a group defined by age. No one doubts, however, that an employer may discharge an employee, of any age, who no longer performs the job with acceptable skill. But one could say about performance on the job exactly what my colleagues say about wages: a test based on performance hurts the old relative to the young. Does it follow that this adverse impact makes inquiry into performance impermissible?
The customary response is that no one is protected by the ADEA unless qualified for the job. An older employee whose skills have diminished is not qualified. Yet there are degrees of skill; an employee is not “qualified” one day and “unqualified” the next. In business the question is not “is Jones qualified?” but “can Jones do the job well enough to cover his wage?” A welder good enough to work on simple sheet metal at $10 per hour may be unqualified for a welding job, paying $30 per hour, in a nuclear plant or on a bridge where lives depend on the quality of the joint and other, better welders compete for the position. There is no “qualified welder” in the abstract, and there is no “qualified manager of a cement plant” either. To say that someone is “qualified” to manage the Knox plant is to say that he can handle the manufacture and sale of concrete well enough that he adds to the value of the enterprise at least the cost of his salary. If he cannot do this, he is unqualified for the particular job at the particular time. It is therefore not possible to divorce the ability to do a job from the wage demanded. If the ADEA allows employers to make decisions based on performance — surely it does, even though performance is systematically related to age — then it also allows employers to make decisions based on the interaction of performance and wage. If the wage is too high for the performance, the employer may act.
My colleagues concede as much when they say (Maj. op. at 1208, 1209-10) that Transit Mix could have cut Metz’s salary. Cases such as Tice and Dorsch hold that employers also may fire workers whose productivity does not justify their wage. If these things are true, however, then the rule the majority creates — that employers cannot act on the basis of salary — cannot be right. More, the language of the ADEA will not sustain a difference between firing an employee based on salary (which my colleagues think forbidden) and reducing an employee’s salary based on salary (which my colleagues think OK). The premise of the court’s opinion is that wage is the equivalent of age, and to treat an employee adversely because of his high wage is illegal because it has a disproportionately large effect on older employees. Reducing the salary of higher-paid employees also affects older employees adversely, and therefore should be equally illegal. Section 4(a)(1) lists compensation as one forbidden ground. It would be a shocking violation of the ADEA to reduce by 50% the wages of all employees 50 and up; yet my colleagues suggest that Transit Mix should have done just that to Metz. Discharge and a reduction in salary are treated the same under § 4(a)(1). If one is off *1214limits, so is the other; neither the language nor the structure of the Act creates the sort of distinction my colleagues suggest. One of the principal reasons for enacting the ADEA was a belief that people dismissed at advanced ages cannot obtain jobs at equivalent pay elsewhere; most employees care about the discharge because of its financial consequences, not because of a sentimental attachment to their employers; discharge is really no different from a reduction in income to the salary paid by the next employer; yet the court ironically says that it is fine to reduce the pay of older employees, so long as it is reduced “at home.”
My colleagues view salary reduction as a less restrictive alternative and therefore preferable. Perhaps it is, but we have held that the ADEA does not require employers to use less restrictive alternatives such as offering employees other jobs rather than firing them. E.g., Tice, 761 F.2d at 1217. The disparate impact model of race discrimination law on which my brethren draw instead demands that tests or devices with disparate effects be validated or justified by a business necessity. See Griggs v. Duke Power Co., 401 U.S. 424, 431, 91 S.Ct. 849, 853, 28 L.Ed.2d 158 (1971); Dothard v. Rawlinson, 433 U.S. 321, 328-32, 97 S.Ct. 2720, 2726-28, 53 L.Ed.2d 786 (1977). The majority waters down the disparate impact approach even as it borrows.
If we apply disparate impact analysis rigorously to decisions based on wages, we will require some fundamental changes in the operation of American business. A firm may not close a plant or curtail its operations on the basis of high wage costs. At a minimum, the court must determine whether a general wage reduction would have restored the plant’s profitability. A firm may not give lower-paid employees a wage increase without doing the same for higher-paid employees. Many times an increase will help the lower-paid employees catch up with others; if wage discrimination is age discrimination, this differential increase is presumptively unlawful. (I pass the question whether the ADEA would require an equal percentage increase or an equal dollar increase — whether, indeed, an across-the-board percentage increase might be called discrimination against the younger employees in the protected group who receive lower absolute dollar increases.) In times of corporate austerity, firms may freeze or reduce the salaries of their managers and other well-paid employees; no more, because that is age discrimination.
III
I would accept all of this if the ADEA required it. The language of the Act does not, however, and neither does the analogy to disparate impact cases under Title VII of the Civil Rights Act of 1964. It is time to unscramble the strands of doctrine involved in this and similar cases.
Anti-discrimination law uses two forms of inquiry: disparate treatment and disparate impact. (This ugly use of “impact” instead of “effect” is ingrained, and I follow the convention.) The plaintiff in a disparate treatment case contends that the employer treated him adversely because of a forbidden characteristic. He must make a prima facie case, see McDonnell-Douglas Corp. v. Green, 411 U.S. 792, 93 S.Ct. 1817, 36 L.Ed.2d 668 (1973), after which the employer must articulate (but not prove) a neutral explanation for its action, see Texas Department of Community Affairs v. Burdine, 450 U.S. 248, 101 S.Ct. 1089, 67 L.Ed.2d 207 (1981). The plaintiff then must show that the employer treated him adversely because of the prohibited characteristic. This means intent and causation. Postal Service Board of Governors v. Aikens, 460 U.S. 711, 714-15, 103 S.Ct. 1478, 1481, 75 L.Ed.2d 403 (1983). Once the evidence is in, the presumption created by the prima facie showing “drops from the case”, Burdine, 450 U.S. at 255 n. 10, 101 S.Ct. at 1095 n. 10. See also Kier v. Commercial Union Insurance Co., 808 F.2d 1254, 1257 (7th Cir.1987); Morgan v. South Bend Community School Corp., 797 F.2d 471, 480 (7th Cir.1986). The disparate treatment model, designed for handling individual cases of discrimination, makes it easy for the plaintiff to get to court (the prima facie case is not demanding) and requires *1215an employer to produce an explanation; but once the employer does this the employee faces stiff burdens on intent and causation.
The plaintiff in a disparate impact case complains not about what happened to him (though that may play some role) but about a test or device the employer is using. The plaintiff shows that the test or device selects by race (or other prohibited characteristic). The employer then must show that there is a good reason for this selectivity. One good reason may be the “validity” of the test — that is, its ability to predict performance on the job or its association with some trait essential for the job. The other approach is “business necessity”; the employer tries to show that it can’t live without the test or practice. The disparate impact model is addressed to class-wide discrimination, usually unintentional. The plaintiff bears a heavy burden and often must amass a great deal of data (and subject it to expert statistical analysis) to show the disparate impact. The defendant has a correspondingly high burden once the plaintiff shows that the test or device systematically filters out members of the protected group.
Metz filed and litigated this case under the disparate treatment model. He was accordingly required to show intent and causation — which, as I have pointed out, he failed to do. My colleagues bail him out by merging the two models. They allow him to get into court with a prima facie case rather than with the daunting statistical showing of a class-wide disparate impact. Then they require the employer to refute this ersatz disparate impact case; it is not enough, they say, for the employer to advance a legitimate reason. And at the end, they conclude, the trier of fact must (not just may) infer intent from the unrefuted disparate impact case, so that Metz prevails. This mixture gives Metz the benefit of the easy parts of both models. Only by using the aspects of the disparate treatment and disparate impact routes most favorable to plaintiffs, and discarding the aspects of each approach favorable to employers, does the court find a violation in today’s case.
The two methods of proof should be kept separate. They are built on different premises: disparate treatment on the premise that employees are identical, so that differential treatment must be attributed to use of the prohibited characteristic, and disparate impact on the premise that because of a history of discrimination employees are different, so that employers must be prevented from using arbitrary tests and devices that play on that regrettable difference without advancing any legitimate interest. Putting the two theories together yields nothing but confusion. See Douglas Laycock, Statistical Proof and Theories of Discrimination, 49 L. & Contemp.Prob. 66 (Aut.1986). This case shows why.
As a disparate treatment case, Metz’s claim falls short. Transit Mix articulated two justifications other than Metz’s age: the need to have a manager at Knox who also could work at Plymouth, and the relation between Metz's salary and the revenues of the Knox plant. The district court credited the first as an accurate and sufficient reason. The second, too, was a sufficient reason; although perhaps related to Metz’s age in a statistical sense, such relatedness does not show discriminatory intent in a disparate treatment case. As a disparate impact case Metz’s claim is equally weak. A disparate impact claim depends on groupwide adverse effects, which Metz never offered to show. It also depends on the outcome of a validation study and an inquiry into business necessity, which no court has conducted.
The consequences of scrambling the two models are most apparent in the court’s treatment of intent. The plaintiff in a disparate treatment case must show discriminatory intent. There is no finding that Transit Mix treated Metz adversely because of his age; there is only a finding that Transit Mix considered something that is correlated with age. Yet the majority allows disparate impact to substitute for intent, although all the disparate impact cases reflect the belief that disparate impact and intent are different. They allow liability in the absence of discriminatory *1216intent. See Griggs, 401 U.S. at 431-32, 91 S.Ct. at 853-54. Intent means doing something because of, not in spite of, a particular consequence. Personnel Administrator v. Feeney, 442 U.S. 256, 279, 99 S.Ct. 2282, 2296, 60 L.Ed.2d 870 (1979). That means using wage to get at age. Metz did not claim that Transit Mix used his wage as a smokescreen; the record shows, and the district court found, that Transit Mix used Metz’s wage with indifference to his age, rather than because of it. Feeney and Washington v. Davis, 426 U.S. 229, 239-45, 96 S.Ct. 2040, 2047-50, 48 L.Ed.2d 597 (1976), reject the equation between disparate impact and intent on which my colleagues’ conclusion depends. Both cases reverse decisions that had equated the two, or used disparate effect as the sole basis of inferring intent.
The two approaches are related in the sense that if in a disparate impact case a court declares that a particular employer may not insist that bricklayers have high school degrees, that employer could not respond to a later disparate treatment claim by saying “I did not hire Smith because he lacked a high school degree.” But it is not the law that if Duke Power Co. cannot demand a high school degree of its janitors, Boeing cannot demand a high school degree of its engineers. Each new job, each new employer, requires a separate inquiry. My colleagues have fused disparate treatment and disparate impact rules in such a way that one employer’s loss in a disparate impact case means that no employer can use a particular ground of decision in a disparate treatment case. The Supreme Court has declined to allow this fusion. For example, Furnco Construction Corp. v. Waters, 438 U.S. 567, 98 S.Ct. 2943, 57 L.Ed.2d 957 (1978), held that although the employer had used a number of arbitrary and subjective factors in hiring employees, factors that may well have had a disparate impact on minority applicants, any given applicant’s suit still had to satisfy the requirements of a disparate treatment case. See also Teamsters v. United States, 431 U.S. 324, 367-71, 97 S.Ct. 1843, 1870-73, 52 L.Ed.2d 396 (1977). Once a plaintiff makes out a class-wide pattern, the court may shift to the employer the burden of showing that the discrimination did not affect the plaintiff personally; until then, each plaintiff must carry the burden on intent and causation. See Furnco, Burdine, and Aikens. The melding of the two strands of discrimination law effectively relieved Metz of his burden — indeed has allowed him to prevail even though the employer advanced, and the trier of fact credited, a sufficient reason utterly unrelated to his age. This unfortunate outcome is the wages of conceptual confusion.
IV
Perhaps, however, we could abandon the disparate treatment model in cases of this sort. I now inquire how Metz should fare if we were to explore that possibility. The ADEA was enacted in 1967, before the first of the disparate impact cases (Griggs, in 1971), so we cannot be confident that the Act adopts this method. Let us assume for the moment, however, that because of the parallel language in Title VII and § 4(a) of the ADEA this approach governs and ask whether it applies to decisions based on the relation between an employee’s wage and his productivity. I return toward the end of this opinion to the question whether there should be a disparate impact model in ADEA cases.
Griggs, which creates disparate impact analysis, identifies as the source of concern “practices, procedures, or tests neutral on their face” (401 U.S. at 430, 91 S.Ct. at 853) that affect groups differently. These tests or practices might be unrelated to any legitimate need of the business, and if so would be the kind of “artificial, arbitrary, and unnecessary barriers to employment” (id. at 431, 91 S.Ct. at 853) that are “discriminatory in operation” (ibid.). Griggs gives the employer two options when a test or practice affects a group of employees adversely: to “validate” the test (that is, to show that it predicts performance on the job and hence is not “arbitrary” and “unnecessary”) or to show “business necessity” (ibid.). If the test or practice is not “unrelated to measuring job capability” (id. at 432, 91 S.Ct. at 854) or bears a “manifest *1217relationship to the employment in question” (ibid.) or is necessary, it may be used; otherwise not.
Since Griggs both the EEOC and the courts have produced an ocean of regulations and opinions trying to define what validity means and how necessitous an employer need be to use an unvalidated test. E.g., New York City Transit Authority v. Beazer, 440 U.S. 568, 587 n. 31, 99 S.Ct. 1355, 1366 n. 31, 59 L.Ed.2d 587 (1979); Dothard, 433 U.S. at 328-37, 97 S.Ct. at 2726-31; Davis, 426 U.S. at 249-52, 96 S.Ct. at 2052-53; Albemarle Paper Co. v. Moody, 422 U.S. 405, 425-36, 95 S.Ct. 2362, 2375-80, 45 L.Ed.2d 280 (1975). Sometimes validation requires careful statistical testing; sometimes the job-relatedness of a test or device is apparent. Beazer, for example, holds that a subway system may decline to hire methadone users, although that has a racially disparate impact, because freedom from drugs is obviously related to safety. We held in Aguilera v. Cook County Police and Corrections Merit Board, 760 F.2d 844 (7th Cir.1985), that a government may make a high school degree a requirement for jobs in the police or the jail guard forces — even though Griggs itself dealt with a high school degree rule— because a minimum level of education is essential in such jobs.
Disparate impact analysis under Griggs has three steps: (1) identifying a test, device, or practice; (2) establishing that the test, device, or practice adversely affects a group protected by the statute; (3) assessing the validity or business necessity of a test that has disparate impact. The application of this approach to wage discrimination encounters problems at each step.
Where is the class-wide test, practice, or device? Decisions based on the relation between the value of the employee’s work and the pay he receives for it are scarcely arbitrary; to the contrary, they are essential in every business. This is individualized decisionmaking, the opposite of the rote and pointless tests the Supreme Court had in mind in Griggs. It is not a test, device, or practice at all.
Where is the disparate impact of considering wages? It is true, as my colleagues observe, that the average employee’s income tends to increase with age. Some employees regress (for example, lawyers earn less as judges than in the practice) but the usual direction is up. See Gary S. Becker, Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education 219 (2d ed. 1975) (time series analysis of people who completed their education in a single year); Jacob Mincer, Schooling, Experience, and Earnings 64-82, 101 (1974) (many time series analyses for different levels of education). This may occur because people do better work as time goes by, because they are better matched to their jobs, see Boyan Jovanovic, Job Matching and the Theory of Turnover, 87 J.Pol.Econ. 972 (1979), or because of other factors. Robert Topel, Job Mobility, Search, and Earnings Growth: A Reinterpretation of Human Capital Earnings Functions, 8 Research in Labor Economics 199 (1986).
The change in each person’s income with time is not the sense relevant to disparate impact analysis, however. We want to know whether wages for the body of employees rise with age: that is, do 50-year-old employees earn more than 40-year-olds at any given moment? Would an employer, bent on slashing costs, find older employees’ wages the most attractive target? This is a question about the profile of an employer’s wage bill by the age of its employees. The wage-age profile in a cross-sectional analysis (that is, the data obtained from a snapshot of everyone’s wages at an instant of time) shows wages rising through age 40 and thereafter declining. E.g., Lloyd G. Reynolds, Stanley H. Masters & Colletta H. Moser, Labor Economics and Labor Relations 234 (9th ed. 1986) (national cross section of annual wages of all employees in 1982). In 1982, employees age 60 had roughly the same average income as employees age 30. This is so in part because younger employees are better educated and therefore start at higher wages than employees did a generation ago. Data from 1981, reproduced in the National Commission for Employment *1218Policy’s Ninth Annual Report: Older Workers: Prospects, Problems and Policies 16 (1985), have a similar pattern, though they show that employees aged 40-44 have the highest earnings. See Figure 1. The Bureau of Labor Statistics’ estimate for March 1987 has the same pattern. BLS News (Apr. 28, 1987).
[Figure 1: image not reproduced]
No matter why the cross-section looks as it does, however, the foundation for a disparate impact analysis is shaky; the most one can say is that the average wage at (say) 50 is higher than the average wage at 25, even though the cross-section shows leveling off or decline after age 40. The data for hourly wages show much the same pattern. Census data from 1984 covering workers paid by the hour give median wages as follows:
Ages     Hourly Wage
16-19    $3.64
20-24    $4.94
25-29    $6.52
30-34    $7.23
35-39    $7.37
40-44    $7.17
45-49    $7.23
50-54    $7.20
55-59    $6.85
60-64    $6.45
65-69    $4.95
70+      $4.38
Department of Labor, Bureau of Labor Statistics, Monthly Labor Review 22 (Feb. 1986) (median covers all races and both sexes). No evidence in the record of this case shows that the wage-age profile at Transit Mix has an upward slope. If Metz wanted to use a disparate impact analysis, he should have built the statistical foundation. It is not appropriate to take judicial notice of a wage-age profile that creates a disparate impact problem — especially when the court is “noticing” something that is not true. See Fed.R.Evid. 201(b)(1) (judicial *1219notice appropriate only when the fact is “not subject to reasonable dispute”).
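The shape of the hourly-wage profile can be checked directly against the twelve figures in the table above. What follows is only an illustrative sketch. It uses nothing but the medians quoted from the Monthly Labor Review, and the helper names (`peak_bracket`, `highest_from`, `wage_of`) are hypothetical labels chosen for this example, not drawn from any source. It says nothing about the wage-age profile at Transit Mix, which the record does not disclose.

```python
# Median hourly wages by age bracket, copied from the table above
# (1984 Census data as reported in Monthly Labor Review, Feb. 1986).
# Brackets are listed youngest to oldest; the helpers rely on that order.
MEDIAN_HOURLY_WAGE = [
    ("16-19", 3.64),
    ("20-24", 4.94),
    ("25-29", 6.52),
    ("30-34", 7.23),
    ("35-39", 7.37),
    ("40-44", 7.17),
    ("45-49", 7.23),
    ("50-54", 7.20),
    ("55-59", 6.85),
    ("60-64", 6.45),
    ("65-69", 4.95),
    ("70+", 4.38),
]


def wage_of(rows, bracket):
    """Return the median wage recorded for one age bracket."""
    return dict(rows)[bracket]


def peak_bracket(rows):
    """Return the (bracket, wage) pair with the highest median wage."""
    return max(rows, key=lambda row: row[1])


def highest_from(rows, start_bracket):
    """Return the highest (bracket, wage) pair at or after start_bracket."""
    labels = [label for label, _ in rows]
    start = labels.index(start_bracket)
    return max(rows[start:], key=lambda row: row[1])


if __name__ == "__main__":
    print("Peak bracket overall:", peak_bracket(MEDIAN_HOURLY_WAGE))
    print("Highest bracket at 40 and up:",
          highest_from(MEDIAN_HOURLY_WAGE, "40-44"))
    print("60-64 below 25-29:",
          wage_of(MEDIAN_HOURLY_WAGE, "60-64")
          < wage_of(MEDIAN_HOURLY_WAGE, "25-29"))
```

On these figures the median peaks in the 35-39 bracket, no bracket within the protected group (40 and up) reaches that peak, and the 60-64 median falls below the 25-29 median. A comparison of medians across brackets is a cross-section only; it does not trace how any individual's wage changes over time.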
Finally, what happened to the search for validation or business necessity? Griggs does not condemn all tests or devices with disparate impact; it forbids only those that are not valid (job-related) or supported by strong business reasons. It is hard to imagine how the use of wages could not be valid; wages correspond precisely to the costs of doing business, and hence to profitability. We might have a validity problem if an employer tried to slash wages without regard to the employee’s performance. For example, a wage reduction for all employees over 60 could reflect the stereotypical belief that no one over 60 can do the job well. This is the kind of concern reflected in EEOC v. Chrysler Corp., 733 F.2d 1183 (6th Cir.1984), on which the majority relies. But a use of wage in relation to job performance — which is how Transit Mix used Metz’s wage — is “valid” almost by definition. This is why the district court was right to see a difference between across-the-board decisions and employee-by-employee decisions. My colleagues, however, have written validation out of the disparate impact test. Their opinion does not suggest that it matters whether Transit Mix had a sound business reason for taking Metz’s wages into account. Indeed, the holding of this case is that an employer may not replace an employee with one willing to accept lower pay, even though it has a sound business reason. Nothing in Griggs or any other case in the Supreme Court’s disparate impact sequence supports this.
If neither the text of the ADEA nor the disparate impact cases under Title VII support Metz, what about the ADEA’s legislative history? Little of the history is pertinent. None of the committee reports discusses the extent to which employers may take salary into account in making decisions. To the extent the legislative history addresses the subject, it suggests that employers may consider the costs of hiring older employees — and § 4(f)(2), 29 U.S.C. § 623(f)(2), writes into the statute the permission to use age as a ground of decision when costs so dictate. Section 4(f)(2) provides that employers may use age in, for example, designing insurance plans: term life insurance costs much more for 65-year-old employees than for 25-year-old employees, and § 4(f)(2) permits employers to consider that in designing packages of benefits. Senator Javits’s additional statement in the Senate Report, S.Rep. 90-723, 90th Cong., 1st Sess. 14 (1967), applauds § 4(f)(2) on the ground that without it “employers might actually have been discouraged from hiring older employees because of the increased costs involved in providing certain types of benefits to them.” See also, e.g., EEOC v. Borden’s, Inc., 724 F.2d 1390, 1395-96 (9th Cir.1984) (discussing the legislative history of § 4(f)(2)); Hearings Before the Subcommittee on Labor of the Senate Committee on Labor and Public Welfare, 90th Cong., 1st Sess. 30 (1967) (statement of Sen. Smathers, co-sponsor of the ADEA).
The assumption behind § 4(f)(2) is that without an explicit privilege to use age in the design of welfare and pension plans, the higher costs of fringe benefits for older persons would be a legitimate reason not to employ them. Secretary of Labor Wirtz, the Johnson Administration’s chief spokesman on the ADEA, made that point explicitly in both the Senate and the House hearings. See Senate Hearings at 49 (the higher costs of training an older employee for a job would be a “legitimate factor” for an employer to consider); Age Discrimination in Employment: Hearings Before the General Subcommittee on Labor of the House Committee on Education and Labor, 90th Cong., 1st Sess. 14 (1967) (an “unavoidable” differential effect on an employer’s payroll is a legitimate factor to be considered under the legislation).
The floor debate was inconclusive. Several members of Congress expressed concern that employers were taking the higher cost of older labor into account, but in the context of remarks that the employers did not appreciate that older workers still did good work. What were these members getting at: that it is forbidden to look at an employee’s salary, or that it is forbidden to judge an employee by his age rather *1220than by his ability to perform the work? The latter theme predominates.
The structure of the Act accords with its history. Section 4(a) parallels Title VII in some respects but is different in others. One striking difference is § 4(f)(1), which says that “reasonable factors other than age” may be the basis of decision — implying strongly that the employer may use a ground of decision that is not age, even if it varies with age. What else could be the purpose of this language? Surely it does not mean simply that “only age discrimination is age discrimination.” “The prohibition and the exception appear identical. The sentence is incomprehensible unless the prohibition forbids disparate treatment and the exception authorizes disparate impact.” Douglas Laycock, Continuing Violations, Disparate Impact in Compensation, and Other Title VII Issues, 49 L. & Contemp.Prob. 53, 55 (Aut.1986) (referring to the identical structure of the Equal Pay Act of 1963). In Washington County v. Gunther, 452 U.S. 161, 170-71, 101 S.Ct. 2242, 2248-49, 68 L.Ed.2d 751 (1981), the Court concluded that the “factor other than sex” language in the Equal Pay Act has independent significance. See also Los Angeles v. Manhart, 435 U.S. 702, 710-11 n. 20, 713 n. 24, 98 S.Ct. 1370, 1376-77 n. 20, 1377-78 n. 24, 55 L.Ed.2d 657 (1978), holding the “factor other than sex” exception to the Equal Pay Act precludes reliance on disparate impact analysis. Should not the parallel structure of the ADEA, enacted four years later, yield the same result?
There are other differences between Title VII and the ADEA. For example, § 4(f)(2) allows age to be used explicitly. Then there is § 4(f)(3), stating that an employer may discharge anyone for “cause”— another clause missing from Title VII. “Cause”, like “qualified”, is a continuous rather than dichotomous variable; not being productive enough to cover your wage is “cause”.
The language, structure, and history of the ADEA have led thoughtful people to conclude — with the district court in our case — that employers may consider wages in light of job performance, even that disparate impact analysis is inapplicable in ADEA cases. E.g., Albert Calille, Three Developing Issues of the Federal Age Discrimination in Employment Act of 1967, 54 Detroit J. Urban L. 431, 444 (1977); Donald R. Stacy, A Case against Extending the Adverse Impact Doctrine to ADEA, 10 Employee Relations L.J. 437 (1985); Note, The Cost Defense Under the Age Discrimination in Employment Act, 1982 Duke L.J. 580. Not to mention the views of Schlei and Grossman, the authors of the leading text on the law of employment discrimination, see Employment Discrimination Law 506 (2d ed. 1983), which directly support the district court’s decision. See also EEOC v. Wyoming, 460 U.S. 226, 233, 103 S.Ct. 1054, 1058, 75 L.Ed.2d 18 (1983) (describing § 4(f)(1) as permitting employers “to use neutral criteria not directly dependant [sic] on age”) (emphasis added); Markham v. Geller, 451 U.S. 945, 948, 101 S.Ct. 2028, 2030, 68 L.Ed.2d 332 (1981) (Rehnquist, J., dissenting from the denial of certiorari).
All of this does not deny the force of the position, expressed in Chrysler and similar cases, including Leftwich v. Harris-Stowe State College, 702 F.2d 686, 691 (8th Cir. 1983), that the ADEA forbids use of wage as a euphemism for age. In the words of EEOC v. Wyoming, when wage is “directly dependent on” age, the use of one is no better than the use of the other. Some colleges (and law firms) use lock-step compensation systems. The wage is a function of age and age only. For such an employer the statements “we are firing all professors with salaries above $35,000” and “we are firing all professors older than 65” are identical. Courts should treat them as identical. That is not remotely what Transit Mix did, however: it shut a poorly performing plant and fired its manager.
A growing literature on education, training, employment, and other aspects of human capital suggests that there may be times when employers will pay wages that do not represent the employees’ marginal products. For example, while receiving firm-specific training the employee may receive a wage exceeding his product; this is how the firm finances the training (for *1221which the employee will not pay, because it has no use outside the firm). Later the firm will recoup its investment by paying less than the marginal product. See Becker, Human Capital 26-37, 216-23. Other firms that give their employees access to trade secrets or put them in positions of trust may try to cement the employees’ loyalty (or honesty) with “golden handcuffs” — wages in excess of the employees’ marginal product, a form of special compensation the employee forfeits if he leaves the firm. E.g., Gary S. Becker & George J. Stigler, Law Enforcement, Malfeasance, and Compensation of Enforcers, 3 J.Legal Studies 1 (1974). Still other firms may pay employees slightly less than their marginal product early in their careers, knowing that as each employee’s productivity declines at the end of his career, the firm will be paying more than marginal product (thus paying the employee his due over the life cycle). This gives employees strong reasons to stick with their firms and be more productive throughout their careers, which in turn yields society the benefit of everyone’s abilities. Edward P. Lazear, Agency, Earnings Profiles, Productivity and Hours Restrictions, 71 Am.Econ.Rev. 606 (1981); Robert Hutchens, Delayed Payment Contracts and a Firm’s Propensity to Hire Older Workers, 4 J.Labor Econ. 439 (1986). But cf. Peter Kuhn, Wages, Effort, and Incentive Compatibility in Life-Cycle Employment Contracts, 4 J.Labor Econ. 28 (1986).
Whenever the age-wage profile of a class of employees includes a period of compensation at more than marginal product, the firm may be inclined to behave opportunistically — to fire the employee as soon as his current productivity no longer covers his current wage. A firm’s desire to attract new employees will curtail this opportunism, to the extent new hires learn of the firm’s reputation (or depend on a union to police the firm’s behavior). When the firm encounters economic trouble or for some other reason plans to shrink, it need not worry about scaring away bright new employees; it is out of that market. The distressed or shrinking firm may try to dispose of higher paid, older employees, cheating them out of the high compensation at the end of their careers. A disparate impact approach under the ADEA might help to curtail this opportunism. Whether it would do so well enough in light of the substantial error costs the inquiry would entail, I need not consider, for this approach would not assist Metz even if it were the law. Metz does not contend that Transit Mix was changing the structure of its compensation so as to exploit its older employees. And my colleagues apparently would allow Transit Mix to do so, if it wanted — it could reduce Metz’s salary, they say.
The court’s invitation to employers to cut wages the next time they are in a situation like the one Transit Mix encountered not only fails to protect older employees against the principal danger they face but also creates an anomaly. If it would have been legitimate to reduce Metz’s wage, why can he collect damages in this case? Presumably Transit Mix would be entitled to reduce Metz’s wage to what he could command in the market from another employer. Metz in fact took another job two months after Transit Mix put Burzloff in charge of the Knox plant. His new job, with the Starke County Highway Commission, pays about $12,500 per year. Metz wants the difference between $26,000 (his salary at Transit Mix) and $12,500; but if Metz was worth only $12,500 in the market and Transit Mix could have cut his salary to that level, he should collect nothing.
We need not worry about damages, however. Metz had to show discriminatory intent and causation; he showed neither. He showed, at most, a decision by Transit Mix to disregard his age. That is the opposite of age discrimination. My colleagues avoid this only by merging disparate treatment and disparate impact analysis — which they ought not, because the premises of these approaches are different, because Feeney and Davis hold that they are different, because Manhart holds that “factor other than [age]” language forecloses disparate impact analysis, because this circuit already has held in Tice and Dorsch that employers may consider wages, and be*1222cause the factual premises of the disparate impact approach do not obtain. My colleagues do not even cite these cases. If we are to buck both the language of the statute and the holdings of the Supreme Court, we ought to do so only when the facts are on our side.
Dale and Sherkow did not arise under the ADEA, but like the majority I assume that the principles established by other antidiscrimination statutes apply to cases under the ADEA. On the role of causation, see also, e.g., Lewis v. University of Pittsburgh, 725 F.2d 910, 915-17 (3d Cir.1983), and Toney v. Block, 705 F.2d 1364 (D.C.Cir.1983) (Scalia, J.). There is a potentially difficult problem in allocating the risk of non-persuasion about causation. Compare Toney with Hopkins v. Price Waterhouse, 825 F.2d 458, 469-72 (D.C.Cir.1987), with id. at 464 (Williams, J., dissenting). But the parties have not briefed *1212this question, and so I do not consider it further.