755 F.Supp. 1206 (1990)

McNEIL-P.P.C., INC., Plaintiff,
v.
BRISTOL-MYERS SQUIBB COMPANY, INC., Defendant.

No. 90 Civ. 5669(JMC).

United States District Court, S.D. New York.

December 17, 1990.

*1207 Patterson, Belknap, Webb & Tyler by Gregory L. Diskant, New York City, for plaintiff.

Skadden, Arps, Slate, Meagher & Flom by Kenneth A. Plevan, New York City, for defendant.

MEMORANDUM AND ORDER

CANNELLA, District Judge:

Plaintiff's request for injunctive relief under the Lanham Act is granted. 15 U.S.C. § 1125(a) (1988).

BACKGROUND

Defendant Bristol-Myers Squibb Company, Inc. ["Bristol-Myers"] is one of the leading sellers of over-the-counter ["OTC"] internal analgesics. For approximately thirty years, the company has been marketing Excedrin, an analgesic containing 250 milligrams ["mg."] aspirin, 250 mg. acetaminophen ["APAP"] and 65 mg. of caffeine per tablet. In May 1990, Bristol-Myers introduced Aspirin Free Excedrin ["AF Excedrin"], an OTC analgesic containing 1000 mg. APAP and 130 mg. caffeine in a two tablet dose.

McNeil-P.P.C., Inc. ["McNeil"] manufactures Tylenol and is the market leader in OTC analgesics. In 1989, McNeil's adult Tylenol products had a 22.9% share of the market, while its closest competitor, Advil, had only a 10.2% share of the market. Extra-Strength Tylenol ["ES Tylenol"] contains 1000 mg. APAP in a two tablet dose. The only difference between AF Excedrin *1208 and ES Tylenol is that AF Excedrin contains caffeine.

On August 30, 1990, Bristol-Myers launched an aggressive television advertising campaign for AF Excedrin. The advertising directly compares AF Excedrin with ES Tylenol and asserts that AF Excedrin works better than ES Tylenol. Before Bristol-Myers began its television advertising campaign, it distributed promotional materials to drug retailers which also asserted AF Excedrin's superiority over ES Tylenol. Bristol-Myers plans to spend approximately $10 million on its commercial advertising for AF Excedrin. A storyboard of one commercial currently appearing on a national television network is reprinted below:

*1209

*1210 McNeil commenced the instant action on September 4, 1990, alleging that Bristol-Myers' advertising for AF Excedrin is false. McNeil amended its complaint to add an allegation that the advertising conveys a false message to consumers as to the magnitude of difference between AF Excedrin and ES Tylenol. McNeil contends that Bristol-Myers' advertisements and promotional materials for AF Excedrin violate section 43(a) of the Lanham Act, 15 U.S.C. § 1125(a) (1988) and sections 349 and 350 of the New York General Business Law, N.Y.Gen.Bus.Law §§ 349, 350 (McKinney 1988).^[1] McNeil seeks a preliminary and permanent injunction (1) enjoining Bristol-Myers from claiming in its advertisements that (A) AF Excedrin relieves headache or other pain better than ES Tylenol or (B) scientific tests establish that AF Excedrin relieves headache or other pain better than ES Tylenol and (2) directing Bristol-Myers to recall advertisements or promotional materials which claim that AF Excedrin is superior to ES Tylenol.

Pursuant to Rule 65(a)(2) of the Federal Rules of Civil Procedure, the Court consolidated a hearing on McNeil's motion for a preliminary injunction with a trial on the merits.^[2] The following opinion constitutes the Court's findings of fact and conclusions of law on plaintiff's motion for injunctive relief.

DISCUSSION

I. Standards Governing Injunctive Relief Under Section 43(a) of the Lanham Act

Section 43(a) of the Lanham Act provides in pertinent part:

Any person who shall affix, apply, or annex, or use in connection with any goods or services, ... any false description or representation ... and shall cause such goods or services to enter into commerce ... shall be liable to a civil action ... by any person who believes that he is or is likely to be damaged by the use of any such false description or representation.

15 U.S.C. § 1125(a) (1988). To establish a violation of section 43(a) the plaintiff has the burden of proving by a preponderance of the evidence that an advertisement is literally false, or that it has a tendency to mislead or deceive. See Coca-Cola Co. v. Tropicana Prods., Inc., 690 F.2d 312, 314-15, 317-18 (2d Cir.1982); Johnson & Johnson v. Carter-Wallace, Inc., 631 F.2d 186, 192 (2d Cir.1980); American Home Prods. Corp. v. Johnson & Johnson, 577 F.2d 160, 165-66 (2d Cir.1978). A merchant's description of its product may be false or misleading in either the description of the product itself or in its comparison to a product manufactured by a competitor. See McNeilab, Inc. v. American Home Prods. Corp., 848 F.2d 34, 38 (2d Cir.1988).

When an advertising claim is alleged to be false, the court may grant relief based on its own findings without reference to the consumer's reaction to the advertisement. See Coca-Cola Co., 690 F.2d at 317; American Home Prods. Corp., 577 F.2d at 165. Plaintiff is not entitled to relief merely by showing that the clinical tests or other evidence relied on by the defendant to support its superiority claim are unpersuasive. See Procter & Gamble Co. v. Chesebrough-Pond's Inc., 747 F.2d 114, 119 (2d Cir.1984). Rather, *1211 plaintiff must adduce evidence that the advertisement is false. See id. A claim of test-proven superiority may be deemed false if it is shown that the clinical research purportedly supporting the representation was not sufficiently reliable to permit the reasonable conclusion that the research established the claim made. See id. Similarly, "representations found to be unsupported by accepted research or which are contradicted by prevailing authority or research, may be deemed false on their face and actionable under section 43(a) of the Lanham Act." Alpo Petfoods, Inc. v. Ralston Purina Co., 720 F.Supp. 194, 213 (D.D.C.1989), aff'd in part, rev'd in part on other grounds, 913 F.2d 958 (D.C.Cir. 1990).

It is well settled that section 43(a) of the Lanham Act encompasses more than literal falsehoods. See, e.g., American Home Prods. Corp., 577 F.2d at 165. To determine whether a statement that is literally true is misleading, the court's reaction is not determinative; rather, the court must consider whether the message that is conveyed to the public is beyond its literal meaning and misleading or deceptive. See Avis Rent A Car System, Inc. v. Hertz Corp., 782 F.2d 381, 386 (2d Cir.1986); American Home Prods. Corp., 577 F.2d at 165-66. In making its decision, the court first considers the literal meaning of the advertisement and then as the finder of fact decides whether the evidence shows that the public is likely to be misled. See American Home Prods. Corp. v. Johnson & Johnson, 654 F.Supp. 568, 590 (S.D.N.Y. 1987); McNeilab, Inc. v. American Home Prods. Corp., 501 F.Supp. 517, 525 (S.D.N.Y.1980). The extent to which consumers are deceived need not be established to support the finding that an advertisement tends to mislead; all that is required is a "qualitative showing [to] establish that a not insubstantial number of consumers received a false or misleading impression from it." McNeilab, Inc., 501 F.Supp. at 528. Surveys may be used to demonstrate the meaning an advertisement has to the target audience, but whether the survey is of probative value depends on its fundamental fairness and objectivity. See American Home Prods. Corp., 654 F.Supp. at 590.

To obtain injunctive relief under the Lanham Act, a party must demonstrate a "likelihood of deception or confusion on the part of the buying public" as a result of the product's false or misleading description. See Burndy Corp. v. Teledyne Indus. Inc., 748 F.2d 767, 772 (2d Cir.1984). It is well settled that such a likelihood of confusion sufficiently establishes both the requisite irreparable harm and likelihood of success on the merits typically required for the issuance of injunctive relief. See, e.g., Hasbro, Inc. v. Lanard Toys, Ltd., 858 F.2d 70, 73 (2d Cir.1988); Home Box Office, Inc. v. Showtime/Movie Channel, Inc., 832 F.2d 1311, 1314 (2d Cir.1987); Standard & Poor's Corp. v. Commodity Exchange, Inc., 683 F.2d 704, 708 (2d Cir. 1982). Irreparable harm is generally presumed for Lanham Act violations because a false comparison to a specific product reduces the consumers' incentive to purchase that product. Thus, a competitor in the relevant market may obtain permanent injunctive relief by showing a likelihood of competitive injury resulting from the false advertising. See Johnson & Johnson, 631 F.2d at 190-91.

McNeil's entitlement to injunctive relief depends on whether McNeil is able to show by a preponderance of the evidence that AF Excedrin does not relieve headache pain better than ES Tylenol. More specifically, since both AF Excedrin and ES Tylenol contain the same amount of APAP, McNeil must demonstrate that the presence of caffeine in AF Excedrin fails to make AF Excedrin superior to ES Tylenol in relieving headache pain.

II. The Food and Drug Administration's Review of Caffeine's Effectiveness as an Analgesic Adjuvant

The Food and Drug Administration [the "FDA"] reviews OTC drugs for safety and effectiveness. The FDA recognizes that APAP is an effective analgesic. In addition, it is well settled by the FDA that caffeine acting alone is not effective in *1212 relieving headache pain. See 42 Fed.Reg. 35482 (1977). However, the FDA has not reached a conclusive determination regarding caffeine's effectiveness as an analgesic adjuvant.^[3] The FDA's review of OTC drugs is conducted in four stages. In the first stage, the FDA appoints an advisory panel of independent scientists who analyze existing data and make recommendations to the FDA in the form of monographs, which set forth the drug's safety and effectiveness. See 21 C.F.R. § 330.10(a)(1)-(5) (1990). Next, the FDA reviews the monographs and makes them available for public comment. See id. § 330.10(a)(6). In the third stage the FDA publishes a tentative final monograph and affords the public an opportunity to object to its findings. See id. § 330.10(a)(7)-(8). In the final stage, the FDA issues a final monograph establishing the FDA's conclusive and legally binding determination regarding the conditions under which the drug is considered safe and effective. See id. § 330.10(a)(9).

The monographs categorize the drugs into three groups: Category I, which includes drugs recognized as safe and effective; Category II, which includes drugs recognized as unsafe and ineffective; and Category III, which includes drugs for which the available data are insufficient to warrant classification in either of the other categories. See id. § 330.10(a)(5)(iii).

The FDA began reviewing caffeine's effectiveness as an analgesic adjuvant in 1977. In 1988, the FDA issued a tentative final monograph placing caffeine's effectiveness as an analgesic adjuvant in Category III. See 53 Fed.Reg. 46204, 46248 (1988). Thus, to date the FDA has not made a substantiated and conclusive determination regarding caffeine's efficacy as an analgesic adjuvant.

In view of the inconclusive nature of the FDA's findings to date, the parties rely on their interpretations of test data to substantiate their claims. To support its claim regarding the falsity of Bristol-Myers' superiority contention McNeil relies on statistical analyses of two headache studies performed by Bristol-Myers in 1988 and 1989 [the "AF Excedrin studies"]. To support its second claim regarding Bristol-Myers' deception in the magnitude of the claimed superiority McNeil relies on a survey that it commissioned to test consumer reactions to the AF Excedrin commercial.

Bristol-Myers argues that its advertising claims are true, relying principally on the AF Excedrin headache studies it performed. In addition, Bristol-Myers relies on headache studies it performed in 1986 and 1987 [the "Excedrin studies"], a study conducted by Whitehall Laboratories, the manufacturer of Anacin [the "Whitehall study"], and a 1983 analysis of thirty clinical studies ["the meta-analysis study"] to establish caffeine's effectiveness as an analgesic adjuvant. In response to McNeil's second claim, Bristol-Myers contends that the AF Excedrin commercials make no quantitative claim regarding the magnitude of difference between AF Excedrin and ES Tylenol.

III. Studies Demonstrating Caffeine's Effectiveness as an Analgesic Adjuvant

The AF Excedrin studies are the only studies that directly compare AF Excedrin with ES Tylenol. However, Bristol-Myers argues that the Excedrin studies, the Whitehall study and the meta-analysis support its claim that AF Excedrin is superior to ES Tylenol in relieving headache pain. McNeil argues that these studies are not relevant in determining the truth or falsity of Bristol-Myers' advertising claim.

A. The Excedrin Studies

The Excedrin studies consisted of four headache studies comparing Excedrin, containing 500 mg. APAP, 500 mg. aspirin and 130 mg. caffeine per two tablet dose, with ES Tylenol. McNeil argues that the Excedrin studies cannot be used to support Bristol-Myers' advertising claim because Excedrin contains both APAP and aspirin *1213 while AF Excedrin contains only APAP. While aspirin and APAP are equivalent pain relievers, McNeil contends that because the two analgesics operate in different ways, there is no scientific basis for concluding that caffeine will interact with APAP in the same way it would interact with aspirin. McNeil's contention is consistent with the view espoused by the FDA. In 1983, William E. Gilbertson, the director of the OTC drug evaluation at the FDA, wrote to Bristol-Myers setting forth the type of study the FDA would consider in evaluating the effectiveness of caffeine as an analgesic adjuvant. One requirement was that "[t]he enhancement by caffeine of the analgesic activity of acetaminophen and aspirin must be established separately for each analgesic ingredient." Plaintiff's Exh. 19, at 1 (Letter to G. Blewitt, vice-president of Bristol-Myers, from W. Gilbertson, Director of the FDA's Division of OTC Evaluation, dated Aug. 5, 1983). Defendant's clinical expert, Professor Beaver, agreed that the Excedrin studies did not follow the FDA's requirement because they compared a product containing aspirin, APAP and caffeine with a product containing only APAP. See Trial Transcript ["Tr."], at 481. Indeed, Professor Beaver agreed that if the two AF Excedrin headache studies showed no evidence of superiority he could not say that it had been proven that AF Excedrin was superior to ES Tylenol.^[4] See id. at 472. Accordingly, the Court finds that the Excedrin studies are not relevant to establishing the truthfulness of Bristol-Myers' advertising claim.

B. The Whitehall Study

In 1989, Whitehall Laboratories, the manufacturer of Anacin, submitted a study to the FDA establishing the superiority of two tablets of Anacin, containing 800 mg. aspirin and 64 mg. caffeine, to 800 mg. aspirin. As with the Excedrin studies, the Whitehall study does not follow the FDA criterion because it evaluates caffeine's effectiveness as an adjuvant with aspirin and not APAP. Thus, the Court finds that the Whitehall study cannot be relied upon by Bristol-Myers to show AF Excedrin's superiority to ES Tylenol.

C. The Meta-Analysis

In 1984, the Journal of the American Medical Association published a meta-analysis in which clinicians analyzed thirty clinical studies and concluded that caffeine was an effective analgesic adjuvant. The FDA requires that the studies relied upon to establish caffeine's effectiveness as an analgesic adjuvant use the doses commonly used by the consumer. Professor Beaver agreed that the meta-analysis used data in which the doses of both caffeine and the analgesic were outside of the OTC range. See Tr. at 482. Thus, the Court finds that these studies are not persuasive in establishing the truthfulness of Bristol-Myers' advertising claim.

IV. The Two AF Excedrin Headache Studies

To meet its burden of proof McNeil relies upon the AF Excedrin studies conducted by Bristol-Myers. From February 1988 through January 1989, Bristol-Myers conducted two headache studies comparing the efficacy of AF Excedrin and ES Tylenol at the two tablet dose.^[5] At this dose each product contains 1000 mg. of APAP, but AF Excedrin also contains 130 mg. of caffeine.^[6] The design of the two headache *1214 studies was based on an article published by Gary G. Koch, entitled "A Two-Period Crossover Design for the Comparison of Two Active Treatments and Placebo" [the "Koch article"]. See Plaintiff's Exh. 102.

Bristol-Myers argues that the data illustrates a statistically significant difference between the two products in favor of AF Excedrin.^[7] McNeil argues that the AF Excedrin headache studies show no statistical difference between the two products and, therefore, Bristol-Myers' advertising claim is false. McNeil's argument is two-fold. First, McNeil contends that Bristol-Myers' conclusions from the test data are not justified because Bristol-Myers failed to consider the existence of a carryover effect. McNeil then contends that upon taking carryover into account, the data shows no statistically significant difference between the two products.

A. The Concern for Carryover Generally

To properly evaluate McNeil's argument requires an understanding of carryover in clinical trials. Carryover is a problem that arises in crossover studies. In a crossover study the patients take one study drug in "period one" and another study drug in "period two." Thus, in a crossover trial each subject receives at least two different treatments. The crossover design is in contrast to a parallel design in which the patients are divided into two groups and each group takes a separate study drug. Period one of a crossover trial is equivalent to a parallel trial. The AF Excedrin headache studies are based on a crossover design. Each subject first took either ES Tylenol, AF Excedrin or a placebo in period one and then took a different product in period two.
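The treatment sequences implied by this design can be enumerated with a short sketch. It assumes only what is stated above: three products, two periods, and a different product in period two. The variable names are illustrative and do not come from the study protocols.

```python
from itertools import permutations

# The three study products named in the opinion.
products = ["AF Excedrin", "ES Tylenol", "placebo"]

# Every ordered pair of distinct products is one (period one, period two) sequence.
sequences = list(permutations(products, 2))

# Sequences involving only the two active drugs.
active_only = [s for s in sequences if "placebo" not in s]

for first, second in sequences:
    print(f"period one: {first:12s} period two: {second}")
```

Under these assumptions there are six sequences in all, two of which involve only the active drugs.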

The principal advantage of a crossover design is that each patient serves "as his own control," receiving both treatments so that his response to both treatments can be compared. See Plaintiff's Exh. 106, at 19 (M. Kenward & B. Jones, Design and Analysis of Crossover Trials (1989) ["M. Kenward & B. Jones"]). In addition, the number of patients needed to detect a difference between the two treatments is less than if a parallel design were used. See id. at 20. However, it is well recognized among the experts in the field that crossover designs contain a potential danger, called a carryover effect. See Plaintiff's Exh. 105, at 1 (article by M. Hills & P. Armitrage, entitled "The Two-Period Crossover Clinical Trial") ["Hills & Armitrage"]; Plaintiff's Exh. 107, at 1 (article by T. Louis et al., entitled "Crossover and Self-Controlled Designs in Clinical Research"). Carryover occurs when the effect of a treatment given in period one is present in period two. As a result, the patient's response in period two is biased. See Plaintiff's Exh. 106, at 4 (M. Kenward & B. Jones). Conversely, carryover does not exist when the results in period one are comparable to the results in period two:

[I]n statistical jargon ... [there is no carryover where] there ... [is] no interaction between the difference in effects of the two treatments and the period of administration, i.e., that the difference in the effects of the two treatments be the same in the first time period as in the second. Another way of stating the assumption [of no carryover], more understandable to the clinician, is that the administration of the first treatment must leave no lasting effect on the patient, so that the patient is virtually a virgin patient, again ready for treatment, regardless of his treatment and response in period one.

*1215 Plaintiff's Exh. 100, at 20 (article by B. Brown, entitled "Statistical Controversies in the Design of Clinical Trials Some Personal Views"). The concept that there should be no difference in effects of the two treatments in both periods can best be explained by considering a basic crossover model:

                          Group A         Group B
           Period 1     Treatment X     Treatment Y
           Period 2     Treatment Y     Treatment X

In period one, group A patients rate the effectiveness of treatment X, while group B patients rate the effectiveness of treatment Y. The difference in effectiveness between the two drugs in period one is then calculated. In period two, group A patients rate the effectiveness of treatment Y and group B patients rate the effectiveness of treatment X. The difference in effectiveness between the two drugs in period two is then calculated. In a crossover study that is not infected with carryover, the difference between the two treatments in period one should be the same as the difference in period two.
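The comparison described in this model can be sketched numerically. The scores below are invented for illustration and are not data from any study in the record; the function name and the dictionary layout are likewise assumptions.

```python
from statistics import mean

# Hypothetical relief scores (higher = more relief); NOT data from the studies.
group_a = {"period1_X": [6, 7, 5, 6], "period2_Y": [4, 5, 4, 5]}
group_b = {"period1_Y": [5, 6, 5, 6], "period2_X": [8, 9, 7, 8]}

def period_differences(a, b):
    """Return (X minus Y in period one, X minus Y in period two)."""
    d1 = mean(a["period1_X"]) - mean(b["period1_Y"])
    d2 = mean(b["period2_X"]) - mean(a["period2_Y"])
    return d1, d2

d1, d2 = period_differences(group_a, group_b)

# With no carryover the two differences should be about equal,
# so a large gap between them is a warning sign, not proof.
interaction = d2 - d1
print(d1, d2, interaction)
```

In this invented example the treatment difference is small in period one and much larger in period two, which is the kind of pattern that would call for a formal test rather than settle the question.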

There are several different possible explanations for the presence of carryover. One reason is pharmacological, that is, carryover results because the physical effects of the first treatment are still present when the subject enters the second treatment period. Here, pharmacological carryover is not an issue due to the short physical effect of the drugs and the seven day washout period between the therapies.

The key dispute between the parties is whether psychological carryover exists, that is, carryover caused by the fact that the attitudes of the subjects as they enter the second treatment period are influenced by their experiences in the first treatment period. For example, if a patient takes a placebo in period one and an active drug in period two to relieve pain, when the patient enters period two he may feel unhappy because he did not get any relief for his pain in period one. This feeling could affect how the patient rates the active drug in period two. If psychological carryover exists, then the results in the second period are distorted due to the experience that the patient had in the first period.

Because the statistical analyses used to determine whether carryover exists often fail to detect carryover when it does in fact exist, medical researchers uniformly agree that a crossover study should not be performed unless there is good evidence that carryover will not appear. See Dr. Laird Tr. at 59; Dr. Kenward Tr. at 141; Plaintiff's Exh. 103, at 3-4 (report on crossover designs by FDA's Biometric and Epidemiological Methodology Advisory Committee). If the crossover model is utilized when strong evidence showing that carryover is not a concern is lacking, elimination of the period two data may be warranted without further analysis, or at a minimum, statistical analysis for carryover is necessary. See Dr. Laird Tr. at 60; Dr. Kenward Tr. at 150.

B. Use of the Crossover Model in the AF Excedrin Headache Studies

Two factors suggest that the crossover model should not have been used to study the efficacy of APAP and caffeine compared to APAP alone. First, the data was based on each person's subjective measurements of the pain intensity of his headache and the amount of pain relief he felt. Professor Kenward testified that a crossover study should not be utilized where subjective measurements are involved because it is particularly difficult to predict or control for carryover in this situation. See Tr. at 143-44. Second, at least one of the four Excedrin headache studies conducted in *1216 1986 and 1987, which were based on a crossover model, showed carryover. See Dr. Laird Tr. at 124-25; Dr. Kenward Tr. at 146-47; Dr. Gillings Tr. at 375-76. Conflicting testimony was given as to whether or not evidence of carryover in only one of four studies was of any significance. See Dr. Gillings Tr. at 325-26; Dr. Kenward Tr. at 148. Although the Court previously recognized that the Excedrin studies have no probative value in the instant dispute, the fact that carryover was present in an earlier crossover trial indicates that the existence of carryover in a subsequent crossover trial is a legitimate concern.

However, there are also two opposing factors suggesting that the use of the crossover model may have been appropriate. First, the FDA's guidelines regarding the study of analgesics permit the use of crossover studies. See Dr. Beaver Tr. at 422; Plaintiff's Exh. 113, at 8-9 (Proposed Revisions of the Guidelines for Clinical Evaluation of Analgesic Drugs); Plaintiff's Exh. 104, at 135 (article by S. Dubey, entitled "Current Thoughts on Crossover Designs"). McNeil contends that the FDA guidelines for antidepressants, which disfavor the use of crossover trials, are applicable because caffeine shares some characteristics of antidepressants. Since the FDA is aware that caffeine is often used as an ingredient in OTC analgesic compounds, the FDA's unqualified approval of crossover designs for studying analgesics shows that Bristol-Myers' use of the crossover model was not presumptively improper.

Second, while McNeil presented much evidence demonstrating that caffeine's stimulating effects may have caused psychological carryover in the AF Excedrin studies, Dr. Gillings testified that his analysis of the prior four Excedrin studies, which used the crossover model, showed that caffeine did not cause any psychological carryover. See Tr. at 291-99. Therefore, Bristol-Myers may have had a basis for believing that the stimulating effects of caffeine would not cause psychological carryover in the AF Excedrin studies.

Given the FDA's informed opinion that crossover models are appropriate for the study of analgesics, the Court finds the use of the crossover model was not fundamentally flawed. Nevertheless, the Court also recognizes that carryover is a problem inherent in crossover trials and at a minimum statistical analyses should be performed to remove doubt that carryover biased any of the data. If statistical analysis shows that carryover exists, the experts substantially agree that only the period one data is reliable.^[8] See Dr. Laird Tr. at 77; Dr. Kenward Tr. at 150; Dr. Gillings Tr. at 362-63. One of the leading articles in the area of crossover trials supports this view. See Plaintiff's Exh. 105, at 19 (Hills & Armitrage). When the period one data of the AF Excedrin headache studies alone is analyzed, it shows no statistically significant difference between AF Excedrin and ES Tylenol.^[9] See Dr. Laird Tr. at 66-71; Dr. Kenward Tr. at 150-51. However, when the period one and two data together are analyzed there is a statistically significant difference between the two products, in favor of AF Excedrin. Thus, in order for McNeil to sustain its burden of showing with reasonable certainty that AF Excedrin is not superior to ES Tylenol, McNeil must demonstrate that statistical analysis establishes the existence of carryover, thereby eliminating consideration of the period two data.

*1217 C. Statistical Analyses for Carryover

Preliminarily, McNeil argues that carryover is indicated by graphic illustration of the data. Professor Laird, whom the Court finds to be a knowledgeable and credible witness, explained that one could quickly see evidence of carryover by plotting on a graph the mean SPID and TOTPAR scores for AF Excedrin and ES Tylenol in each period.^[10] See Tr. at 66. According to Professor Laird, carryover is evidenced by the fact that there is a small difference between the ratings of the products' effectiveness in period one and a large difference in period two. See id. at 66-67. Defendant's statistical expert, Dr. Gillings, gave a slightly different definition of carryover, according to which the illustrations showed no carryover. Dr. Gillings testified that treatment by period interaction means that the treatment differences in the second period are not identical to the treatment differences in the first period. See Tr. at 314-15. He further stated that this is not a carryover problem unless one product wins in one period and then loses in the second period. See id. at 316. The Court finds that his definition of carryover is not supported by the evidence in the medical literature or by plaintiff's expert Professor Kenward, one of the foremost leaders in the study of crossover trials and carryover effects. See Dr. Kenward Tr. at 178-79. Thus, in light of the medical literature and the informed view of one of the most prominent medical researchers in the field, the Court concludes that illustrated differences between the effectiveness of the treatments in both periods indicate that carryover may be present. However, a graphic illustration of the differences in treatments in periods one and two is not sufficient to establish carryover, and actual statistical analyses of the data must be performed.

No definitive statistical test exists to firmly establish carryover. Nevertheless, carryover is a problem known to invalidate the period two results in a crossover study. Thus, a major portion of the evidence adduced by both parties consisted of conflicting expert opinion as to which statistical tests for carryover should be relied upon by the Court.

Bristol-Myers initially conducted a "goodness-of-fit" test which showed no carryover. The goodness-of-fit test does not directly test for carryover and, therefore, the Court places little weight on the result of this test.

McNeil primarily relies on the result of the "sums and differences" test performed by Professors Laird and Kenward, who testified that the sums and differences test was proper because the sums contain the information concerning carryover. See Dr. Laird Tr. at 73; Dr. Kenward Tr. at 156. Koch recognizes the validity of the sums and differences test in his paper upon which the AF Excedrin studies were based. The Koch paper states that "one can use weighted regression methods to obtain substantially better estimates from the combined information in the ... [sums and differences] than is available separately." See Plaintiff's Exh. 102, at 491 (Koch article). Both Professors Laird and Kenward found significant evidence of carryover using this test. See Dr. Laird Tr. at 74-75; Dr. Kenward Tr. at 151.
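The idea that the sums carry the information about carryover can be illustrated with the textbook two-sequence version of the calculation. This is only a sketch under stated assumptions: the scores are invented, the Welch t statistic is a stand-in chosen for simplicity, and nothing here reproduces the weighted regression on six sequences that the experts are described as performing.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Two-sample t statistic (unequal variances) for mean(x) - mean(y)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se

# Hypothetical (period one, period two) scores per subject; NOT study data.
seq_xy = [(6, 4), (7, 5), (5, 4), (6, 5)]  # took X first, then Y
seq_yx = [(5, 8), (6, 9), (5, 7), (6, 8)]  # took Y first, then X

# Each subject's total response and within-subject change.
sums_xy = [p1 + p2 for p1, p2 in seq_xy]
sums_yx = [p1 + p2 for p1, p2 in seq_yx]
diffs_xy = [p1 - p2 for p1, p2 in seq_xy]
diffs_yx = [p1 - p2 for p1, p2 in seq_yx]

# Both sequences receive both treatments in both periods, so the subject
# totals should agree unless one treatment carries over more than the other.
carryover_t = welch_t(sums_xy, sums_yx)

# The differences compare the treatments within subjects; this comparison
# is only trustworthy when the sums show no sign of carryover.
treatment_t = welch_t(diffs_xy, diffs_yx)
```

A large absolute value of `carryover_t` would suggest that the two sequence groups differ in total response, which in this simplified model points to unequal carryover.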

Bristol-Myers attempted to discredit the value of the sums and differences test by introducing a memorandum dated November 3, 1990, in which Koch wrote to Dr. Gillings concerning his endorsement of the sums and differences test in his paper: "the paper suggests that a weighted regression method [i.e. the sums and differences test] ... might potentially provide better estimates.... Although such a weighted regression method is worthy of scholarly interest and research, its general usage cannot be recommended because of issues which require cautious attention." Defendant's Exh. T3, at 1-2. Dr. Gillings, a co-author of the article, further testified that the sums and differences test set forth in the Koch paper was intended only as pure speculation of the type usually made by academics. See Tr. at 329. Neither Laird nor Kenward could explain why Koch *1218 later disavowed the test. See Dr. Laird Tr. at 114; Dr. Kenward Tr. at 153-55. In particular, Professor Kenward testified that he was "extremely puzzled" by the letter because it contradicted Koch's own paper and he disagreed with the logic of Koch's reasons for disavowing the test. See Tr. at 153-55.

The Court finds Professor Kenward to be a particularly knowledgeable and credible witness, as he clearly has the greatest expertise in crossover trials and the problem of carryover. He has written the only book on crossover designs and has been involved in over 100 crossover trials. It is significant that one of the most prominent authorities in the area of crossover trials and carryover was not persuaded by Koch's criticisms of the sums and differences test. Koch's reasons for disavowing the test are at best unclear, especially in light of Professor Kenward's testimony that the sums and differences test has been used in statistical analysis for over sixty years. See Tr. at 154. The Court finds Professor Kenward's continued reliance on the sums and differences test to be reasonable and is similarly unpersuaded by Koch's subsequent disavowal of the test. Thus, the flaws subsequently asserted by Koch in the sums and differences test do not necessitate disregard for the test. Rather, the results of the sums and differences test weigh heavily in McNeil's favor in establishing the existence of carryover.

To rebut the results of the sums and differences test, Bristol-Myers relies on its "four headache analysis." This test was performed by Dr. Gillings at the suggestion of Dr. Koch after he rejected the sums and differences test. Professor Kenward found several flaws with the four headache analysis. First, he testified that it contradicted the design of the AF Excedrin experiment. See Tr. at 155. Second, the test did not properly adjust for carryover as did the sums and differences test. See id. Third, and most significantly, the test analyzed the data without looking at the sums, when in fact the existence of carryover is contained in the sums. See id. at 156. While the four headache analysis may be of some value, in light of Professor Kenward's reasonable criticisms of the test the Court is less persuaded by the results of the four headache analysis than it is by the results of the sums and differences test. In any event, the result of the sums and differences test is not wholly inconsistent with the result of the four headache analysis, as Dr. Gillings acknowledged that there is a 5-15% chance that carryover exists despite the test's finding of no carryover. See Tr. at 332.

McNeil also relies upon a "two-sequence test" for carryover which was performed by Professors Laird and Kenward. The two-sequence test analyzes only the data in the AF Excedrin-ES Tylenol and ES Tylenol-AF Excedrin sequences for carryover.^[11] The two-sequence test demonstrated the existence of carryover upon pooling the data, that is, upon combining the data from both of the AF Excedrin headache studies. Professor Laird initially believed that it was proper to omit the data from the placebo treatment sequences from consideration when testing for carryover. However, she later decided that a reliable test for carryover could be performed without eliminating these sequences. See Tr. at 94-102. Indeed, she endorses the sums and differences test which analyzes the data from all six sequences. See id. No evidence was presented criticizing the validity of the two-sequence test due to the fact that placebo treatment sequences are not considered. Thus, the Court finds that the result of the two-sequence test, upon pooling the data, is relevant in considering the existence of carryover.
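The opinion does not set out the computations behind these tests. As an illustration only, the following sketch shows one common textbook form of a sums-and-differences analysis for a two-sequence crossover: each subject's two period scores are added to look for carryover and subtracted to look for a treatment effect, and the two sequences are then compared. The function names, the pooled-variance t statistic, and the pain-relief scores are all assumptions made for this example; they are not the experts' actual method or data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def sums_and_differences(seq_ab, seq_ba):
    """Compare two crossover sequences.

    seq_ab, seq_ba: lists of (period_one_score, period_two_score), one pair
    per subject. seq_ab took drug A first; seq_ba took drug B first.
    """
    sums_ab = [p1 + p2 for p1, p2 in seq_ab]
    sums_ba = [p1 + p2 for p1, p2 in seq_ba]
    diffs_ab = [p1 - p2 for p1, p2 in seq_ab]
    diffs_ba = [p1 - p2 for p1, p2 in seq_ba]
    return {
        # Subject sums differ between sequences when carryover is unequal.
        "carryover_t": pooled_t(sums_ab, sums_ba),
        # Subject differences carry the treatment comparison.
        "treatment_t": pooled_t(diffs_ab, diffs_ba),
    }


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    a_then_b = [(5, 3), (6, 3), (8, 5)]
    b_then_a = [(3, 5), (4, 5), (5, 8)]
    print(sums_and_differences(a_then_b, b_then_a))
```

In a full analysis each statistic would be referred to a t distribution with the appropriate degrees of freedom. The sketch is meant only to show why the information about carryover sits in the sums rather than in the differences.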

The Court finds Professor Laird's and Professor Kenward's opinions as to the existence of carryover in the AF Excedrin studies to be convincing. Graphic illustration of the data suggests the presence of carryover as that term is defined by a leading expert in the field and the medical literature. The results of the sums and differences test, recognized by two prominent *1219 experts to be a reliable test, confirm the existence of carryover. In addition, the two-sequence test lends further support to the conclusion that carryover exists. Weighing against the suggestion that carryover exists are the goodness-of-fit test, which inadequately tests for carryover, and the four headache analysis, which has been strongly criticized by one of the leading experts in the field. Thus, the Court concludes that the weight of the statistical analyses establishes the presence of carryover in the AF Excedrin headache studies.

The Court finds that the results of McNeil's statistical analyses are sufficient to meet its burden of proof. Statisticians, clinicians and the medical literature recognize that carryover is a major problem inherent in crossover trials, despite the lack of a definitive test to prove the existence of carryover. In the well-considered opinion of experts in the area, carryover was a legitimate concern in evaluating the AF Excedrin studies and reliable statistical analyses established its existence. Consequently, the period two data relied upon by Bristol-Myers to establish the truth of its advertising claim is unpersuasive. Rather, an analysis of the period one data indicates that AF Excedrin is not superior to ES Tylenol in relieving headache pain. Thus, the Court concludes that Bristol-Myers falsely represented the characteristics of its product AF Excedrin by claiming its superiority over ES Tylenol.^[12]

While the evidence clearly supports the issuance of an injunction, the Court notes that it would have been preferable for the parties to conduct a parallel study which avoids the problems of carryover presented in a crossover study. Nevertheless, despite the absence of such a clinical study, McNeil has met its burden relying on the crossover studies performed by Bristol-Myers. The AF Excedrin studies produced unreliable data in period two because of the existence of carryover, thereby permitting reliance upon the period one data which demonstrates the falsity of Bristol-Myers' superiority claim.

CONCLUSION

Plaintiff's request for injunctive relief under the Lanham Act is granted. 15 U.S.C. § 1125(a) (1988). Plaintiff's request for attorney's fees pursuant to 15 U.S.C. § 1117 is denied. The parties shall submit a proposed order within ten (10) days of the filing of this Memorandum and Order.

SO ORDERED.

NOTES

[1] Neither party has addressed whether their rights under sections 349 and 350 of New York's General Business Law vary from the Lanham Act and, therefore, no separate consideration will be given to the state claims.

[2] The parties stipulated that each side was limited to presenting the testimony of three expert witnesses at trial. The experts who testified on behalf of McNeil were Professor Nan Laird, an expert qualified in biostatistics, Professor Michael G. Kenward, an expert qualified in cross-over trials, and Professor William H. Barr, an expert qualified in OTC analgesics. The experts who testified on behalf of Bristol-Myers were Professor Dennis B. Gillings, an expert qualified in biostatistics and in the statistical design of clinical trials, Professor William T. Beaver, an expert qualified in clinical testing of analgesics, and Professor Yoram Wind, an expert qualified in consumer research. In addition, each party submitted numerous affidavits and leading articles written by statisticians, pharmacologists, biostatisticians and others with an expertise in the study of analgesics, which pursuant to the parties' stipulation were made part of the record of the consolidated hearing and trial.

[3] An adjuvant is a substance that has no efficacy of its own but when used with an active drug it enhances the effectiveness of the active drug.

[4] Ironically, Professor Beaver testified on behalf of Bristol-Myers that he uses ES Tylenol to relieve his headache pain. See Tr. at 497.

[5] Bristol-Myers also conducted a study comparing the efficacy of AF Excedrin and ES Tylenol in relieving dental pain. Professor Laird testified that there was no significant difference between the two products compared in the dental pain study. See Tr. at 50. Her testimony was not contradicted. However, Dr. Gillings characterized the study as a model failure, and explained that two out of five studies typically "go awry." Id. at 313. In light of Dr. Gillings' reasonable explanation and the fact that the study did not directly analyze headache pain, the Court finds that the results of the dental study are inconclusive and do not demonstrate the falsity of Bristol-Myers' advertising claim.

[6] The subjects who participated in the study experienced four to ten headaches a month. Each patient was instructed to take an unlabeled dose of either AF Excedrin, ES Tylenol or a placebo to relieve the first two headaches he suffered. After a seven-day "washout" period, the same patient was instructed to take a different unlabeled dose of either AF Excedrin, ES Tylenol or a placebo to treat the next two headaches that he suffered. The study was double-blinded to avoid bias, i.e., neither the patients nor the investigators were told the identity of any of the treatments. Patients were not instructed to alter their caffeine intake before taking the treatment, but were told not to consume caffeine during the four hour period following intake of the treatment dosage. Patients rated the pain intensity of their headaches and the pain relief over a four hour period.

[7] Results are statistically significant when it is ninety-five percent certain that the results are not due to chance.

[8] Instead of looking at only the period one data, Professor Laird testified that the adjustment method suggested in the Koch paper could be used to control the period two data for carryover. See Tr. at 77. Upon performing the adjustment, Professor Laird found no significant differences between the two products. See id. However, defendant's expert Dr. Gillings did not agree that the Koch article recommended the adjustment method. See id. at 329. It is not necessary to the Court's decision to determine the validity of the adjustment method since the majority of the experts and medical literature agree it is proper to analyze only the period one data in the event of carryover.

[9] Although Bristol-Myers' statistical analysis indicated a significant difference between the two products, Bristol-Myers primarily analyzed only the period one data of the first headache. See Dr. Gillings Tr. at 335-36. Due to Bristol-Myers' failure to adequately consider all of the data, the Court does not adopt its analysis of the period one data.

[10] Statistical analysis involves measuring SPID, sum of the pain intensity difference, and TOTPAR, total pain relief, measurements commonly used in analgesic studies.

[11] In the AF Excedrin studies there are six possible treatment sequences: (1) AF Excedrin-ES Tylenol, (2) ES Tylenol-AF Excedrin, (3) AF Excedrin-Placebo, (4) Placebo-AF Excedrin, (5) ES Tylenol-Placebo and (6) Placebo-ES Tylenol.

[12] McNeil also argues that even if a statistically significant difference exists, the advertising claim is nonetheless false because no clinically significant difference exists. Bristol-Myers contends that its advertising does not claim a clinically significant difference between the two products. In light of the Court's determination that no statistically significant difference exists, it follows that a clinically significant difference does not exist.

Alternatively, McNeil contends that if both a statistically significant and clinically significant difference exists between the two products, the advertising claim is nonetheless false and misleading because the public perceives a greater difference than actually exists. In essence, McNeil argues that even if the advertising claim is true it still violates section 43(a) of the Lanham Act because it is deceptive and misleading. Given the Court's decision that the advertising claim is literally false, it is not necessary to address McNeil's second claim.
