PRECEDENTIAL
UNITED STATES COURT OF APPEALS
FOR THE THIRD CIRCUIT
____________
No. 16-2247
____________
IN RE: ZOLOFT (SERTRALINE HYDROCHLORIDE)
PRODUCTS LIABILITY LITIGATION
Jennifer Adams, et al., Plaintiffs appealing dismissal
by order entered April 5, 2016,
Appellants
On Appeal from the United States District Court
for the Eastern District of Pennsylvania
(D.C. Civil Action No. 2-12-md-02342)
District Judge: Honorable Cynthia M. Rufe
Argued on January 25, 2017
Before: CHAGARES, RESTREPO and ROTH, Circuit
Judges
(Opinion filed: June 2, 2017)
David C. Frederick [Argued]
Derek T. Ho
Kellogg Hansen Todd Figel & Frederick
1615 M Street, N.W.
Suite 400
Washington, DC 20036
Dianne M. Nast
NastLaw
1101 Market Street
Suite 2801
Philadelphia, PA 19107
Mark P. Robinson, Jr.
Robinson Calcagnie Robinson Shapiro Davis
19 Corporate Plaza Drive
Newport Beach, CA 92660
Counsel for Appellants
Sheila L. Birnbaum
Mark S. Cheffo [Argued]
Quinn Emanuel Urquhart & Sullivan
51 Madison Avenue
22nd Floor
New York, NY 10010
Robert C. Heim
Judy L. Leone
Dechert
2929 Arch Street
18th Floor, Cira Centre
Philadelphia, PA 19104
Counsel for Appellees
Cory L. Andrews
Washington Legal Foundation
2009 Massachusetts Avenue, N.W.
Washington, DC 20036
Counsel for Amicus Washington Legal
Foundation
Brian D. Boone
Alston & Bird
101 South Tryon Street
Suite 4000
Charlotte, NC 28280
David R. Venderbush
Alston & Bird
90 Park Avenue
15th Floor
New York, NY 10016
Counsel for Amicus Chamber of
Commerce of the United States
Joe G. Hollingsworth
Hollingsworth
1350 I Street, N.W.
Washington, DC 20005
Counsel for Amici American Tort
Reform Association and Pharmaceutical
Research and Manufacturers of America
OPINION
ROTH, Circuit Judge:
This case involves allegations that the anti-depressant
drug Zoloft, manufactured by Pfizer, causes cardiac birth
defects when taken during early pregnancy. In support of
their position, plaintiffs, through a Plaintiffs’ Steering
Committee (PSC), depended upon the testimony of Dr.
Nicholas Jewell. Dr. Jewell used the “Bradford Hill”
criteria 1 to analyze existing literature on the causal connection
between Zoloft and birth defects. The District Court
excluded this testimony and granted summary judgment to
defendants. The PSC now appeals these orders, alleging that
1) the District Court erroneously held that an expert opinion
on general causation must be supported by replicated
observational studies reporting a statistically significant
association between the drug and the adverse effect, and 2) it
was an abuse of discretion to exclude Dr. Jewell’s testimony.
Because we find that the District Court did not establish such
a legal standard and did not abuse its discretion in excluding
Dr. Jewell’s testimony, we will affirm the District Court’s
orders.
I.
This case arises from multi-district litigation involving
315 product liability claims against Pfizer, alleging that
Zoloft, a selective serotonin reuptake inhibitor (SSRI), causes
cardiac birth defects. The PSC introduced a number of
experts in order to establish causation. The testimony of each
of these experts was excluded in whole or in part. In
particular, the court excluded all of the testimony of Dr.
Anick Bérard (an epidemiologist), which relied on the “novel
1
See Section II.B infra.
technique of drawing conclusions by examining ‘trends’
(often statistically non-significant) across selected studies.” 2
The PSC filed a motion for partial reconsideration of the
decision to exclude the testimony of Dr. Bérard, which the
District Court denied. The PSC then moved to admit Dr.
Jewell (a statistician) as a general causation witness. Pfizer
filed a motion to exclude Dr. Jewell, and the District Court
conducted a Daubert 3 hearing.
The District Court considered Dr. Jewell’s application
of various methodologies, reviewing his expert report,
rebuttal reports, party briefs, and oral testimony. The District
Court first examined how Dr. Jewell applied the traditional
methodology of analyzing replicated, significant results.
While Dr. Jewell discussed many groupings of cardiac birth
defects, he focused on the significant findings for all cardiac
defects and septal defects. Dr. Jewell presented two studies
reporting a significant association between Zoloft and all
cardiac defects (Kornum (2010) 4 and Jimenez-Solem
(2012) 5). He also presented five studies reporting a
2
In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig.
(Zoloft I), 26 F. Supp. 3d 449, 465 (E.D. Pa. 2014). Since Dr.
Jewell seems to provide similar testimony, we take into
account the District Court’s rationale in excluding Dr. Bérard.
3
Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993).
4
JA 1059-67. Jette B. Kornum, et al., Use of Selective
Serotonin-Reuptake Inhibitors During Early Pregnancy and
Risk of Congenital Malformations: Updated Analysis, 2 Clin.
Epidemiol. 29 (2010).
5
JA 1040-51. Espen Jimenez-Solem, et al., Exposure to
Selective Serotonin Reuptake Inhibitors and the Risk of
significant association between Zoloft and septal defects
(Kornum (2010), Jimenez-Solem (2012), Louik (2007), 6
Pedersen (2009), 7 and Bérard (2015) 8). After excluding two
studies from its consideration, 9 the District Court expressed
two concerns with the remaining studies: Jimenez-Solem
(2012), Kornum (2010), and Pedersen (2009). First, although the remaining studies produced consistent results,
the District Court did not consider them to be independent
replications because they used overlapping Danish
Congenital Malformations: A Nationwide Cohort Study, 2
British Med. J. Open 1148 (May 2012).
6
JA 5622-34. Carol Louik, et al., First-Trimester Use of
Selective Serotonin-Reuptake Inhibitors and the Risk of Birth
Defects, 356 N. Eng. J. Med. 2675 (June 2007).
7
JA 1030-39. Lars H. Pedersen, et al., Selective Serotonin
Reuptake Inhibitors in Pregnancy and Congenital
Malformations: Population Based Cohort Study, 339 British
Med. J. 3569 (Sept. 2009).
8
JA 5987-99. Anick Bérard, Sertraline Use During
Pregnancy and the Risk of Major Malformations, 212 Am. J.
Obstet. Gynecol. 795 (2015).
9
The District Court noted that, during this litigation, a transcription error was found in Louik (2007), which led to a significant result for septal defects being reclassified as insignificant. JA 65. The New England Journal of Medicine (NEJM) required the authors to revise their discussion in light of this change.
Additionally, multiple people tried to replicate the results in
Bérard (2015)—including Dr. Jewell, a member of the PSC’s
legal team, and Pfizer’s experts—and failed. The District
Court did not allow Dr. Jewell to rely on Bérard (2015) after
Dr. Jewell consequently “expressed a lack of confidence”
about its reliability on cross-examination. JA 64-65.
populations. Second, a larger study, Furu (2015), 10 included
almost all the data from Jimenez-Solem (2012), Kornum
(2010), and Pedersen (2009) and did not replicate the findings
of those studies. Dr. Jewell did not explain the reasons why
this attempted replication produced different results or why
the new study did not contradict his opinion.
The court then examined Dr. Jewell’s reliance on
insignificant results, noting that it was very similar to Dr.
Bérard’s methodology. The court noted that Dr. Jewell did
not provide any evidence that the epidemiology or
teratology11 communities value statistical significance 12 any
10
JA 4395-4404. Kari Furu, et al., Selective Serotonin
Reuptake Inhibitors and Venlafaxine in Early Pregnancy and
Risk of Birth Defects: Population Based Cohort Study and
Sibling Design, 350 British Med. J. 1798 (Mar. 2015). This
study was not available to Dr. Jewell when he prepared his
report, but the District Court noted that Dr. Jewell testified
that he was familiar with it. JA 63, 7297-327.
11
As the District Court noted, “[t]eratology is the scientific
field which deals with the cause and prevention of birth
defects. . . . [Where a drug is alleged to be] a teratogen, it is
common to put forth experts whose opinions are based on
epidemiological evidence.” JA 52.
12
The findings in these studies are often expressed in terms of
“odds ratios.” Odds ratios are merely “a measure of
association.” JA 2446. An odds ratio of 1, in the context of
these studies, generally means that there is no observed
association between taking Zoloft and experiencing a cardiac
birth defect. Since these odds ratios are just estimates, a
confidence interval is used to show the precision of the
estimate. JA 2439-40. If the confidence interval contains the
less than it has traditionally been understood. 13 The court
also expressed concern that Dr. Jewell inconsistently applied
his “technique” of multiplying p-values 14 and his trend
analysis.
The District Court critiqued several other techniques
Dr. Jewell used in analyzing the evidence. First, Dr. Jewell
rejected meta-analyses on which he had previously relied in litigation involving another SSRI, Prozac. The meta-analyses
reported insignificant associations with birth defects for
Zoloft but not for Prozac. Dr. Jewell rationalized his decision
to ignore these meta-analyses because the “heterogeneity” 15
within their Zoloft studies was significant; the District Court
odds ratio of 1, the risk of cardiac birth defects while taking
Zoloft is not considered “significantly” greater than the risk
while not taking Zoloft.
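To make the footnote concrete: a minimal sketch, using purely hypothetical counts rather than data from any study in the record, of how an odds ratio and its 95% confidence interval are computed from a 2x2 exposure-outcome table.

```python
# Minimal sketch: odds ratio and 95% confidence interval from a
# hypothetical 2x2 table (illustrative counts only, not record evidence).
import math

exposed_cases, exposed_noncases = 30, 970        # exposed infants with / without the defect
unexposed_cases, unexposed_noncases = 200, 9800  # unexposed infants with / without the defect

odds_ratio = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Standard error of the log odds ratio (Woolf's method), then a 95% interval.
se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_noncases +
                      1 / unexposed_cases + 1 / unexposed_noncases)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
# An interval that contains 1.0 would mean the association is not statistically
# significant at the conventional 5% level; this illustrative interval excludes 1.0.
```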
13
The District Court instead noted that the NEJM’s treatment
of the Louik (2007) transcription error suggests that the
epidemiology and teratology communities still strongly value
significance. JA 67.
14
A “p-value” indicates the probability, assuming the null hypothesis is true, of observing by chance alone a difference between the observed and the expected value of a parameter at least as large as the one seen. JA
2396. In this context, the null hypothesis is that the odds ratio
is one; rejecting the null hypothesis suggests there is a
significant association between Zoloft and cardiac birth
defects.
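Again purely for illustration, with hypothetical numbers rather than any estimate in the record: a minimal sketch of how a two-sided p-value for the null hypothesis of no association (an odds ratio of one) is obtained from a log-odds-ratio z-statistic.

```python
# Minimal sketch: two-sided p-value for the null hypothesis that the odds
# ratio equals 1 (the log odds ratio and standard error are hypothetical).
import math

log_or, se_log_or = 0.416, 0.199            # illustrative estimate and its standard error
z = log_or / se_log_or                      # distance from the null value, in standard errors
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability of a standard normal

print(f"z = {z:.2f}, p = {p_value:.3f}")
# A p-value below 0.05 is conventionally labeled "significant"; a small
# p-value alone does not establish a causal connection.
```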
15
The District Court quoted Dr. Jewell in defining
heterogeneity as “the measure of the variation among the
effect sizes reported in [various] studies [and] . . . where
heterogeneity is significant, the source of variation should be
investigated and discussed.” JA 70.
accepted this explanation but questioned why Dr. Jewell
“fails to statistically calculate the heterogeneity” across other
studies instead of relying on trends. 16 Second, Dr. Jewell
reanalyzed two studies, Jimenez-Solem (2012) and
Huybrechts (2014), 17 both of which had originally concluded
that there was no significant effect attributable to Zoloft. 18
The District Court questioned his rationale for conducting,
and tactics for implementing, this reanalysis. Finally, Dr.
Jewell conducted a meta-analysis with Huybrechts (2014) and
Jimenez-Solem (2012). The District Court questioned why he
used only those particular studies. 19
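Because heterogeneity (see note 15) figures in several of these criticisms, a minimal sketch of Cochran's Q and I-squared, two standard measures of heterogeneity across study effect sizes, may help; the inputs are hypothetical, and nothing in the record indicates this is the precise calculation Dr. Jewell had in mind.

```python
# Minimal sketch: Cochran's Q and I^2, standard measures of heterogeneity
# across study effect sizes (hypothetical log odds ratios and standard errors).
log_ors = [0.41, 0.10, 0.65, -0.05]
std_errs = [0.20, 0.15, 0.30, 0.25]

weights = [1 / se ** 2 for se in std_errs]                        # inverse-variance weights
pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
Q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
df = len(log_ors) - 1
I2 = max(0.0, (Q - df) / Q) * 100                                 # percent of variation beyond chance

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%")
# A Q value large relative to its degrees of freedom, or a high I^2, signals
# significant heterogeneity: the studies may not share a single common effect.
```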
Based on this analysis, the District Court found that
Dr. Jewell, tasked with explaining his opinion about Zoloft’s
effect on birth defects and reconciling contrary studies,
16
JA 72.
17
JA 4256-67. Krista F. Huybrechts, et al., Antidepressant Use in Pregnancy and the Risk of Cardiac Defects, 370 N.
Eng. J. Med. 2397 (2014).
18
Jimenez-Solem (2012) found that both current Zoloft users
and SSRI users who “paused” their use during pregnancy had
elevated risks of birth defects; this study concluded that the
increased risk resulted from a confounding factor. JA 1044,
1047-48. Huybrechts (2014) found the increase in the risk of
cardiac birth defects from taking Zoloft to be insignificant.
JA 4257-67.
19
Additionally, the District Court found that Dr. Jewell may
have relied on a Periodic Safety Update Report, which
contains literature reviews, and email correspondence
summarizing a literature review. The District Court excluded
this testimony because this is not the type of information
statisticians generally rely on. This exclusion is not contested
here.
“failed to consistently apply the scientific methods he
articulates, has deviated from or downplayed certain well-
established principles of his field, and has inconsistently
applied methods and standards to the data so as to support his
a priori opinion.” 20 For this reason, on December 2, 2015,
the District Court entered an order, excluding Dr. Jewell’s
testimony, and on April 5, 2016, the court granted Pfizer’s
motion for summary judgment. The PSC appeals the
exclusion of Dr. Jewell and the grant of summary judgment. 21
20
JA 82.
21
The PSC concedes that if the exclusion of Dr. Jewell was
proper, it is unable to establish general causation and
summary judgment was properly granted. Oral Argument
Recording at 13:30-13:59,
http://www2.ca3.uscourts.gov/oralargument/audio/16-2247In%20Re%20Zoloft.mp3.
II. 22
In general, courts serve as gatekeepers for expert
witness testimony. “A witness who is qualified as an expert
by knowledge, skill, experience, training, or education may
testify in the form of an opinion or otherwise if,” inter alia,
“the testimony is the product of reliable principles and
methods[] and . . . the expert has reliably applied the
principles and methods to the facts of the case.” 23 In
determining the reliability of novel scientific methodology,
courts can consider multiple factors, including the testability
of the hypothesis, whether it has been peer reviewed or
published, the error rate, whether standards controlling the
technique’s operation exist, and whether the methodology is
22
The District Court had jurisdiction over this claim under 28
U.S.C. § 1332 and 28 U.S.C. § 1407(a). We have jurisdiction
under 28 U.S.C. § 1291. We review questions of law de
novo, and questions of fact for clear error. Ragen Corp. v.
Kearney & Trecker Corp., 912 F.2d 619, 626 (3d Cir. 1990)
(citations omitted). We review the decision to exclude expert
testimony for abuse of discretion. In re Paoli R.R. Yard PCB
Litig. (In re Paoli), 35 F.3d 717, 749 (3d Cir. 1994).
However, when the exclusion of such evidence results in a
summary judgment, we perform a “hard look” analysis to
determine if a district court has abused its discretion. Id. at
750. An abuse of discretion occurs when a court’s decision
“rests upon a clearly erroneous finding of fact, an errant
conclusion of law or an improper application of law to fact”
or “when no reasonable person would adopt the district
court's view.” Oddi v. Ford Motor Co., 234 F.3d 136, 146
(3d Cir. 2000) (internal quotation marks and citation omitted).
23
Fed. R. Evid. 702.
generally accepted. 24 Both an expert’s methodology and the
application of that methodology must be reviewed for
reliability. 25 A court should not, however, usurp the role of
the fact-finder; instead, an expert should only be excluded if
“the flaw is large enough that the expert lacks the ‘good
grounds’ for his or her conclusions.” 26
Central to this case is the question of whether
statistical significance is necessary to prove causality. We
decline to state a bright-line rule. Instead, we reiterate that
plaintiffs ultimately must prove a causal connection between
Zoloft and birth defects. A causal connection may exist
despite the lack of significant findings, due to issues such as
random misclassification or insufficient power. 27 Conversely,
a causal connection may not exist despite the presence of
significant findings. If a causal connection does not actually
exist, significant findings can still occur due to, inter alia,
inability to control for a confounding effect or detection bias.
A standard based on replication of statistically significant
24
In re Paoli, 35 F.3d at 742.
25
Id. at 745 (“However, after Daubert [v. Merrell Dow
Pharm., Inc., 509 U.S. 579 (1993)], we no longer think that
the distinction between a methodology and its application is
viable.”).
26
In re TMI Litig., 193 F.3d 613, 665 (3d Cir. 1999),
amended, 199 F.3d 158 (3d Cir. 2000) (internal quotation
marks and citation omitted).
27
Power is “the chance that a statistical test will declare an
effect when there is an effect to be declared. This chance
depends on the size of the effect and the size of the sample.
Discerning subtle differences requires large samples; small
samples may fail to detect substantial differences.” JA 2409.
findings obscures the essential issue: whether a causal connection exists. Given this, the proof necessary to establish causation will vary greatly from case to case. This is not to suggest,
however, that statistical significance is irrelevant. Despite the
problems with treating statistical significance as a magic
criterion, it remains an important metric to distinguish
between results supporting a true association and those
resulting from mere chance. Discussions of statistical
significance should thus not understate or overstate its
importance.
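Because statistical power (see note 27) recurs throughout this discussion, a minimal Monte Carlo sketch may be useful; every parameter below (exposure rate, baseline risk, true odds ratio, cohort sizes) is hypothetical and chosen only to show that the same true effect is detected far more often in a larger sample.

```python
# Minimal sketch: simulated statistical power at two cohort sizes, for a
# hypothetical true odds ratio of 2.0 (all parameters are illustrative).
import math
import random

def study_detects_effect(n, exposure_rate=0.10, baseline_risk=0.02, true_or=2.0):
    """Simulate one cohort of n subjects and test whether the odds ratio differs from 1."""
    exposed_odds = true_or * baseline_risk / (1 - baseline_risk)
    exposed_risk = exposed_odds / (1 + exposed_odds)
    a = b = c = d = 0  # exposed case, exposed non-case, unexposed case, unexposed non-case
    for _ in range(n):
        exposed = random.random() < exposure_rate
        case = random.random() < (exposed_risk if exposed else baseline_risk)
        if exposed and case:
            a += 1
        elif exposed:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    if min(a, b, c, d) == 0:
        return False  # too sparse to estimate an odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log((a * d) / (b * c)) / se
    return abs(z) > 1.96  # "significant" at the conventional 5% level

random.seed(0)
for n in (2_000, 50_000):
    power = sum(study_detects_effect(n) for _ in range(200)) / 200
    print(f"n = {n:>6}: estimated power = {power:.2f}")
# The smaller cohort routinely misses an effect the larger cohort detects.
```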
With this in mind, we proceed to the issues at hand.
The PSC raises two issues on appeal: 1) whether the District
Court erroneously concluded that reliability requires
replicated, statistically significant findings, and 2) whether
Dr. Jewell’s testimony was properly excluded.
A.
The PSC argues that the District Court erroneously
held that replicated, statistically significant findings are
necessary to establish reliability. This argument seems to have
been originally raised in the motion for reconsideration of Dr.
Bérard’s exclusion. Explaining its decision to exclude Dr.
Bérard, the District Court cited a previous case, Wade-Greaux
v. Whitehall Labs., Inc., for the proposition that the teratology
community generally requires replicated, significant
epidemiological results before inferring causality. 28 The PSC
28
Zoloft I, 26 F. Supp.3d at 454 n.13 (citing Wade-Greaux v.
Whitehall Labs., Inc., 874 F. Supp. 1441, 1453 (D.V.I. 1994)
aff'd, 46 F.3d 1120 (3d Cir. 1994), for text, see No. 94-7199,
1994 WL 16973481 (3d Cir. Dec. 15, 1994)).
claims that in so doing, the District Court was asserting a
legal standard that required replicated, significant findings for
reliability. 29 Pfizer contends that the District Court merely
made a factual finding about what the teratology community
generally accepts.
Upon review, it is clear that the District Court was not
creating a legal standard, but merely making a factual finding.
The PSC argues that the District Court must have created a
legal standard because it did not cite any sources other than
Wade-Greaux to support its assertion that the teratology
community generally requires replicated, significant
epidemiological findings. However, in its initial exclusion of
Dr. Bérard, the District Court noted that it looked to the
standards adopted by “other epidemiologists, even the very
researchers [Dr. Bérard] cites in her report.” 30 Similarly, in
29
Relatedly, the PSC claims that the District Court created a
legal standard that “it was not reliable for Dr. Jewell to
invoke studies observing non-statistically significant positive
associations.” However, the language cited does not support
this conclusion: The District Court merely asserts that
“experts may use congruent but non-significant data to bolster
inferences drawn from replicated, statistically significant
data. However, in this case . . . three of the studies Dr. Jewell
relies upon to show replication use overlapping data . . . [and]
have not been replicated by later, well-powered studies which
attempt to control for various confounding factors and
biases.” JA 67-68.
30
Zoloft I, 26 F. Supp. 3d at 456 (“There exists a well-
established methodology used by scientists in her field of
epidemiology, and Dr. Bérard herself has utilized it in her
published, peer-reviewed work. The ‘evolution’ in thinking
its order denying general reconsideration of Dr. Bérard’s
exclusion, the District Court clarified that it “made this
factual finding after review of the published literature relied
upon by Dr. Bérard and other experts, as well as its review of
the reports and testimony of both parties” 31 and merely used
this factual finding as part of its FRE 702 analysis. 32 While
the District Court does cite Wade-Greaux, 33 it uses it merely
to show “that other courts have made similar findings
regarding the prevailing standards for scientists in Dr.
Bérard’s field.” 34
about the importance of statistical significance Dr. Bérard
refers to does not appear to have been adopted by other
epidemiologists, even the very researchers she cites in her
report.”).
31
In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig.
(Zoloft II), No. 12-2342, 2015 WL 314149, at *2 (E.D. Pa.
Jan. 23, 2015); see, e.g., JA 3962, 3971-72.
32
While general acceptance by the scientific community is no
longer dispositive in the Rule 702 analysis, it remains a factor
that a court may consider. Daubert, 509 U.S. at 594 (“[A]
known technique which has been able to attract only minimal
support within the community may properly be viewed with
skepticism.”) (internal quotation marks and internal citation
omitted).
33
Wade-Greaux, 874 F. Supp. at 1453 (noting that “[a]bsent
consistent, repeated human epidemiological studies showing a
statistically significant increased risk of particular birth
defects associated with exposure to a specific agent, the
community of teratologists does not conclude that the agent is
a human teratogen.”).
34
Zoloft II, 2015 WL 314149, at *2.
Second, the course of the proceedings makes clear that
the replication of significant results was not dispositive in
establishing whether the testimony of either Dr. Bérard or Dr.
Jewell was reliable. In fact, the District Court expressly
rejected Pfizer’s argument that the existence of a statistically
significant, replicated result is a threshold issue before an
expert can conduct the Bradford Hill analysis. 35 In doing so,
the District Court was clear that it was not requiring a
threshold showing of statistical significance. Similarly, the
District Court did not end its inquiry after analyzing whether
there were replicated, significant results. Instead, the District
Court examined other techniques of general trend analysis,
reanalysis of other studies, and meta-analysis. Even though it
ultimately rejected the application of these techniques as
unreliable, it did not categorically reject alternative
techniques, suggesting that it did not create a legal standard
requiring replicated, significant results.
For these reasons, we find that the District Court did
not require replication of significant results to establish
reliability. Instead, it merely made a factual finding that
teratologists generally require replication of significant
results, and this factual finding did not prevent it from
considering other evidence of reliability. 36
35
Id. (“In so doing, the Court rejected Pfizer's argument that
the Court could exclude Dr. Bérard's opinion without even
reaching her Bradford–Hill analysis, because the Bradford–
Hill criteria should only be applied after an association is well
established”); see also Zoloft I, 26 F. Supp. 3d at 462.
36
The PSC also argues that the District Court did not discuss
one study providing a significant, positive association
between Zoloft and birth defects, Wemakor (2015). The PSC
B.
The second issue on appeal is whether it was an abuse
of discretion for the District Court to exclude Dr. Jewell’s
testimony. Dr. Jewell utilized a combination of two methods:
the “weight of the evidence” analysis and the Bradford Hill
criteria. The “weight of the evidence” analysis involves a
series of logical steps used to “infer[] to the best
explanation[.]” 37 The Bradford Hill criteria are metrics that
epidemiologists use to distinguish a causal connection from a
mere association. These metrics include strength of the
association, consistency, specificity, temporality, coherence,
biological gradient, plausibility, experimental evidence, and
analogy. 38 In his expert report, Dr. Jewell seems to utilize
numerous “techniques” in implementing the weight of the
evidence methodology. Dr. Jewell discusses whether the
claims this is “reversible error because it inaccurately
depicted Dr. Jewell’s opinion as unsupported by replicated,
non-overlapping data.” Pfizer argues that the District Court
did not have to mention each study and that Wemakor is
unreliable, as the authors themselves admit that their findings
are “compatible with confounding by depression as indication
or other associated factors/exposures.” We conclude that this
was not an error because it is clear the District Court
considered Wemakor in the Daubert hearing. Even if the
District Court had failed to consider Wemakor, we would find no error because it did not treat replicated, statistically significant findings as a legal requirement.
37
Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11,
17 (1st Cir. 2011) (internal quotation marks and citation
omitted).
38
JA 5652-56.
conclusions drawn from these techniques satisfy the Bradford
Hill criteria and support the existence of a causal
connection. 39
Pfizer does not seem to contest the reliability of the
Bradford Hill criteria or weight of the evidence analysis
generally; the dispute centers on whether the specific
methodology implemented by Dr. Jewell is reliable. Flexible
methodologies, such as the “weight of the evidence,” can be
implemented in multiple ways; although the methodology is generally reliable, each application is distinct
and should be analyzed for reliability. In In re Paoli R.R.
Yard PCB Litigation, this Circuit noted that while differential
diagnosis—also a flexible methodology—is generally
accepted, “no particular combination of techniques chosen by
a doctor to assess an individual patient is likely to have been
generally accepted.” 40 Accordingly, we subjected the
expert’s specific differential diagnosis process to a Daubert
inquiry. 41 We noted that “to the extent that a doctor utilizes
standard diagnostic techniques in gathering this information,
the more likely we are to find that the doctor’s methodology
is reliable.” 42 While we did not require the expert to run
specific tests or ascertain full information in order for the
differential diagnosis to be reliable, we did require him to
explain why his conclusion remained reliable in the face of
39
Pfizer argues that the PSC did not previously use the “weight
of the evidence” terminology for the method followed by Dr.
Jewell. We assume for the sake of argument that this was the
purported methodology all along.
40
In re Paoli, 35 F.3d 717, 758 (3d Cir. 1994).
41
Id.
42
Id.
alternate causes. 43
This standard, while articulated with respect to
differential diagnoses, applies to the weight of the evidence
analysis. We have briefly encountered the Bradford Hill
criteria/weight of the evidence methodology in Magistrini v.
One Hour Martinizing Dry Cleaning, a nonprecedential
affirmance of the District of New Jersey’s exclusion of an
expert. 44 The expert followed the weight of the evidence
methodology, including epidemiological findings assessed
using the Bradford Hill criteria. The District Court
acknowledged that although the weight of the evidence
methodology was generally reliable, “[t]he particular
combination of evidence considered and weighed here has not
been subjected to peer review.” 45 Similar concerns are
arguably present for the Bradford Hill criteria, which are
43
Id. at 760 (“[T]he district court abused its discretion in
excluding that opinion under Rule 702 unless either (1) Dr.
Sherman or DiGregorio engaged in very few standard
diagnostic techniques by which doctors normally rule out
alternative causes and the doctor offered no good explanation
as to why his or her conclusion remained reliable, or (2) the
defendants pointed to some likely cause of the plaintiff's
illness other than the defendants’ actions and Dr. Sherman or
DiGregorio offered no reasonable explanation as to why he or
she still believed that the defendants' actions were a
substantial factor in bringing about that illness.”).
44
Magistrini v. One Hour Martinizing Dry Cleaning, 68 F.
App’x 356 (3d Cir. 2003).
45
Magistrini v. One Hour Martinizing Dry Cleaning, 180 F.
Supp. 2d 584, 602 (D.N.J. 2002).
neither an exhaustive nor a necessary list. 46 An expert can
theoretically assign the most weight to only a few factors, or
draw conclusions about one factor based on a particular
combination of evidence. The specific way an expert
conducts such an analysis must be reliable; “all of the
relevant evidence must be gathered, and the assessment or
weighing of that evidence must not be arbitrary, but must
itself be based on methods of science.” 47 To ensure that the
Bradford Hill/weight of the evidence approach “is truly a
methodology, rather than a mere conclusion-oriented
selection process . . . there must be a scientific method of
weighting that is used and explained.” 48 For this reason, the
specific techniques by which the weight of the
evidence/Bradford Hill methodology is conducted must
themselves be reliable according to the principles articulated
in Daubert. 49
In short, although both the Bradford Hill
and the weight of the evidence analyses are generally reliable,
46
Milward, 639 F.3d at 17.
47
Magistrini, 180 F. Supp. 2d at 602.
48
Id. at 607.
49
There has been very little circuit authority regarding the
application of the Bradford Hill criteria in the weight of the
evidence analysis. The First Circuit has warned against
“treat[ing] the separate evidentiary components of [the]
analysis atomistically, as though [the] ultimate opinion was
independently supported by each.” Milward, 639 F.3d at 23.
In contrast, the Tenth Circuit briefly discussed the Bradford
Hill criteria, and then separately conducted a Daubert
analysis for each body of evidence. Hollander v. Sandoz
Pharm. Corp., 289 F.3d 1193, 1204-13 (10th Cir. 2002).
the “techniques” used to implement the analysis must be 1)
reliable and 2) reliably applied. In discussing the conclusions
produced by such techniques in light of the Bradford Hill
criteria, an expert must explain 1) how conclusions are drawn
for each Bradford Hill criterion and 2) how the criteria are
weighed relative to one another. Here, we accept that the
Bradford Hill and weight of the evidence analyses are
generally reliable. We also assume that the “techniques” used
to implement the analysis (here, meta-analysis, trend analysis,
and reanalysis) are themselves reliable. However, we find
that Dr. Jewell did not 1) reliably apply the “techniques” to
the body of evidence or 2) adequately explain how this
analysis supports specified Bradford Hill criteria. Because
“any step that renders the analysis unreliable under the
Daubert factors renders the expert’s testimony
inadmissible,” 50 this is sufficient to show that the District
Court did not abuse its discretion in excluding Dr. Jewell’s
testimony.
1.
It was not an abuse of discretion for the District Court
to find Dr. Jewell’s application of trend analysis, reanalysis,
and meta-analysis to the body of evidence to be unreliable.
Here, we assume the techniques listed are generally reliable
and rest on the fact that they were unreliably applied. As
stated in In re Paoli, the use of standard techniques bolsters the
inference of reliability; 51 nonstandard techniques need to be
well-explained. Additionally, if an expert applies certain
techniques to a subset of the body of evidence and other
50
In re Paoli, 35 F.3d at 745.
51
Id. at 758.
techniques to another subset without explanation, this raises
an inference of unreliable application of methodology. 52
First, we find no abuse of discretion in the District
Court’s determination that Dr. Jewell unreliably analyzed the
trend in insignificant results. Dr. Jewell applied this
technique by qualitatively discussing the probative value of
multiple positive, insignificant results. In justifying this
approach, he relied on a quantitative method by which one
can calculate the likelihood of seeing multiple positive but
insignificant results if there were actually no true effect. 53
However, after alluding to this presumably reliable
mathematical calculation technique for analyzing trends in
even insignificant results, Dr. Jewell did not actually
implement it; instead he qualitatively discussed the general
trend in the data. Given the opportunity to conduct such a quantitative analysis, his failure to do so, without explanation, suggests that he did not reliably apply his stated methodology. 54
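As an aside, the quantitative method alluded to here (identified by the PSC on appeal as Fisher's combined probability test, see note 53) can be sketched in a few lines; the p-values below are hypothetical, not figures drawn from the studies in the record.

```python
# Minimal sketch: Fisher's combined probability test on hypothetical p-values.
import math

p_values = [0.10, 0.15, 0.20, 0.08, 0.12]

# Fisher's statistic: X^2 = -2 * sum(ln p_i), distributed chi-squared with
# 2k degrees of freedom if every study truly reflects no effect.
x2 = -2 * sum(math.log(p) for p in p_values)
k = len(p_values)

# The chi-squared survival function has a closed form for even degrees of freedom (2k).
combined_p = math.exp(-x2 / 2) * sum((x2 / 2) ** j / math.factorial(j) for j in range(k))

print(f"X^2 = {x2:.2f} on {2 * k} df, combined p = {combined_p:.3f}")
# A small combined p-value suggests that a run of individually non-significant
# results is unlikely if there were truly no effect; the method assumes the
# studies are independent, which overlapping populations would undermine.
```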
Even assuming the reliability of Dr. Jewell’s version of
52
See Magistrini, 180 F. Supp. 2d at 607 (noting that a
scientific method of weighting must be explained to prevent a
“conclusion-oriented selection process.”).
53
Dr. Jewell used this as an illustrative example in his report
and at the Daubert hearing, but on appeal the PSC identifies this
technique as Fisher’s combined probability test. Insofar as
this is part of a meta-analysis or is sensitive to the same
heterogeneity issues articulated by Dr. Jewell, we reiterate
our concerns below.
54
JA 69 (“[T]he Court finds Dr. Jewell’s failure to apply the
methodology he outlined to the studies he reviewed
problematic.”).
trend analysis, Dr. Jewell identified trends and interpreted
insignificant results differently based on the outcome of the
study. The District Court concluded that Dr. Jewell
“selectively emphasize[d] observed consistency . . . only
when the consistent studies support his opinion.” 55 Dr. Jewell
emphasized the insignificance of results reporting odds ratios
below 1 but not the insignificance of those reporting odds
ratios above 1. He also paid attention to the upper bounds of
the confidence intervals associated with odds ratios below 1,
but not to the lower bounds.
Second, we interpret the District Court’s discussion of
heterogeneity as raising the concern that Dr. Jewell
selectively used meta-analyses. He did this in two ways:
First, without explanation, Dr. Jewell performed a meta-
analysis on two studies but not on any of the other studies.
The District Court questioned why Dr. Jewell did not conduct
a meta-analysis on the remaining studies instead of using the
qualitative general trend analysis. While Dr. Jewell was not
required to do specific tests, the lack of explanation made his
inconsistent application of meta-analysis to certain studies
unreliable. 56 Second, when he did perform a meta-analysis,
Dr. Jewell only included two studies utilizing “exposed” and
“paused” groups even though each had a different definition
55
JA 69.
56
Dr. Jewell admitted that he did not “attempt to do a meta-
analysis where [he] defined an a priori – an a priori
inclusion/exclusion set of criteria, generated a return set of
studies, assessed heterogeneity and then considered whether
by further adjustment or accommodation, [he] could come up
with a meaningful set of statistics.” He cryptically claimed
that he “determined you couldn’t.” JA 4898.
of “paused,” without an adequate explanation for why these
studies can be lumped together. He also inexplicably
excluded another study (Kornum (2010)) utilizing similar
methodology. Again, while there may have been legitimate
reasons for these inconsistencies, his failure to explain them makes his testimony unreliable.
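As background only, a minimal sketch of a standard fixed-effect, inverse-variance meta-analysis pooling two studies reported as odds ratios with confidence intervals; the inputs are hypothetical, and the District Court's concern was with which studies were selected and why, not with this arithmetic.

```python
# Minimal sketch: fixed-effect, inverse-variance pooling of two hypothetical
# study results reported as odds ratios with 95% confidence intervals.
import math

studies = {
    "study A": (1.40, 0.90, 2.18),  # (odds ratio, 95% CI lower, 95% CI upper)
    "study B": (2.01, 1.10, 3.67),
}

weights, weighted_logs = [], []
for or_value, lo, hi in studies.values():
    log_or = math.log(or_value)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # back out the SE from the interval
    w = 1 / se ** 2
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci = (math.exp(pooled_log - 1.96 * pooled_se), math.exp(pooled_log + 1.96 * pooled_se))

print(f"pooled OR = {math.exp(pooled_log):.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
# Which studies enter the pool, and under what inclusion criteria, drives the
# result; that selection step is what has to be explained.
```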
Finally, Dr. Jewell reanalyzed two studies to control
for confounding by indication. The need for conducting this
reanalysis on Huybrechts (2014) was unclear. Dr. Jewell said
that he wanted to control for indication by comparing the
outcomes for “paused” Zoloft users to “exposed” Zoloft
users; however, the study already controlled for indication. If
Dr. Jewell wanted to correct for misclassification, the original
study already controlled for that as well through extensive
sensitivity analyses. 57 Given that the study originally
concluded that Zoloft was not associated with a statistically
significant increase in the likelihood of birth defects, this
reanalysis seems conclusion-driven.
Ultimately, the fact that Dr. Jewell applied these
techniques inconsistently, without explanation, to different
subsets of the body of evidence raises real issues of
reliability. Conclusions drawn from such unreliable
application are themselves questionable.
57
It is true that these sensitivity analyses had less power
because they involved looking at a subset of the population,
making them less likely to find a significant difference;
however, we could not find that Dr. Jewell has raised this
point as a reason for reanalysis.
2.
Using the techniques discussed above, Dr. Jewell went
on to evaluate the Bradford Hill criteria. While Dr. Jewell did
discuss the applicable Bradford Hill criteria and how he
weighed the factors together, he did not explain how he drew
conclusions for certain criteria, namely the strength of
association and consistency.
Dr. Jewell concluded that the strength of association
weighs in favor of causality. In doing so, he focused on
studies reporting odds ratios between two and three (Colvin
(2011), 58 Jimenez-Solem (2012), Malm (2011), 59 Pedersen
(2009), and Louik (2007)). He reasoned that such a large association is unlikely to be attributable to confounding alone. 60 He later bolstered this argument by estimating the percentage of the effect generally attributable to confounding by indication, a figure he derived from the percentage decrease in odds ratios after controlling for indication across a few studies. When pressed by counsel at the Daubert
hearing, Dr. Jewell admitted that this was not a scientifically
58
JA 6011-28. Lyn Colvin, et al., Dispensing Patterns and
Pregnancy Outcomes for Women Dispensed Selective
Serotonin Reuptake Inhibitors in Pregnancy, 91 Birth Defects
Res. A Clin. Mol. Teratol. 142 (2011).
59
JA 7697-7707. Heli Malm, et al., Selective Serotonin
Reuptake Inhibitors and Risk for Major Congenital
Anomalies, 118 Obstetrics & Gynecology 111 (2011).
60
Dr. Jewell also notes that the absence of a link between depression and cardiac defects undercuts the confounding-by-indication argument. JA 7468-69.
rigorous adjustment. 61 Such reliance on ad hoc adjustments
supports the District Court’s decision to exclude Dr. Jewell’s
testimony.
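For illustration only, the arithmetic behind the kind of back-of-the-envelope adjustment described above; the numbers are hypothetical, and the record does not spell out the exact formula Dr. Jewell used.

```python
# Minimal sketch: percent decrease in an odds ratio after controlling for
# indication, taken from a study reporting both crude and adjusted estimates
# (hypothetical numbers).
crude_or, adjusted_or = 2.10, 1.85
percent_decrease = (crude_or - adjusted_or) / crude_or * 100
print(f"odds ratio fell by {percent_decrease:.1f}% after adjustment")
# Averaging such percentages across a few studies and applying the average to
# otherwise unadjusted studies is the ad hoc step the District Court found wanting.
```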
Similarly, while Dr. Jewell found that the causal effect
of Zoloft on cardiac birth defects is consistent, it is not clear
how he drew this conclusion. As noted above, Dr. Jewell
classified insignificant odds ratios above one as supporting a
“consistent” causality result, downplaying the possibility that
they support no association between Zoloft use and cardiac
birth defects. While an insignificant result may be consistent
with a causal effect, Dr. Jewell’s discussion is too far-
reaching, sometimes understating the importance of statistical
significance. For example, Furu (2015)—a study that
incorporated almost all the data in Pedersen (2009), Jimenez-Solem (2012), and Kornum (2010)—included a larger sample but, unlike those three studies, reported no
significant association between Zoloft and cardiac birth
defects. Insignificant results can occur merely because a
study lacks power to produce a significant result, and, all else
being equal, a larger sample size increases the power of a
test. 62 Unless there are other significant differences, we
61
JA 7470-71 (“I said, I didn't put that in my report. I put in
that if you wanted as a statistician, if somebody came to me
now as you're sort of hinting at and said [Colvin] didn’t adjust
for confounding, well, that could make a big impact, I agree,
it could, just if I knew nothing else. . . . [A] statistician knows
from doing simulations and computation that we alluded to
yesterday how much of an impact could you take -- get from
adjusting for confounding even though in this particular
population we [aren’t] able to do it. It’s not a definitive
result.”)
62
Insofar as Dr. Jewell finds Furu to be less powerful than the
would expect Furu to be better able to capture a true effect
than the preceding three studies. While an insignificant result
from a low-powered study does not necessarily undermine a
statistically significant result from a higher-powered study,
the opposite argument (i.e., that an insignificant finding from
a presumably better-powered study is evidence of consistency
with significant findings from lower-powered studies)
requires further explanation. 63 While there may be a reason
that such a result could be consistent with the past significant
effects, Dr. Jewell did not meaningfully discuss why this may
be. 64 Without adequate explanation, this argument
understates the importance of statistical significance. Like
the expert in Magistrini, Dr. Jewell should have “sufficiently
discredit[ed] other studies that found no association or a
negative association with much more precise confidence
intervals, [or] sufficiently explain[ed] why he did not accord
weight to those studies.” 65 Claiming a consistent result
without meaningfully addressing these alternate explanations,
as noted in In re Paoli, undermines reliability. 66
previous studies based on factors other than sample size, he
has not articulated this argument.
63
For example, Dr. Jewell could have argued that, despite
having a larger sample, Furu (2015) was not better powered
for other reasons or utilized flawed methodology.
64
In fact, upon appeal, the PSC argues that Furu (2015) is
consistent with Dr. Jewell’s causal result merely because it
reports odds ratios above one (1.05 and 1.13).
65
Magistrini, 180 F. Supp. 2d at 607 (emphasis added).
66
In re Paoli, 35 F.3d at 760 (noting the importance of
explaining why a conclusion remains reliable in the face of
alternate explanations).
For these reasons, the District Court determined that
Dr. Jewell did not consistently assess the evidence supporting
each criterion or explain his method for doing so. Thus, it
was not an abuse of discretion to find that Dr. Jewell’s
application of the Bradford Hill criteria was unreliable.
This is not to suggest that all of the District Court’s
criticisms were necessarily justified. For example, the fact
that in his reanalysis Dr. Jewell drew a different conclusion
from a study than its authors did is not necessarily a problem.
Similarly, his imposition of a different assumption about the
“exposed” group in Huybrechts (2014) did not require expert
knowledge about psychology; he was merely testing the
robustness of the results to Huybrechts’ original assumption.
Finally, the District Court credited the claim that overlapping samples did not provide replicated results, despite Dr. Jewell’s assertion that they provided some informational value. 67 These inquiries are more appropriately
left to the jury.
On the whole, however, the District Court did not
improperly usurp the jury’s role in assessing Dr. Jewell’s
credibility. There is sufficient reason to find Dr. Jewell’s
testimony was unreliable. Indeed, “any step that renders the
analysis unreliable under the Daubert factors renders the
expert’s testimony inadmissible.” 68 The fact that Dr. Jewell
unreliably applied the techniques underlying the weight of the
evidence analysis and the factors of the Bradford Hill analysis
satisfies this standard for inadmissibility.
67
JA 7164 (noting that overlapping analysis still “provides a
modicum of replication”).
68
In re Paoli, 35 F.3d at 745.
III.
This case involves complicated facts, statistical
methodology, and competing claims of appropriate standards
for assessing causality from observational epidemiological
studies. Ultimately, however, the issue is quite clear. As gatekeepers, courts must ensure that the testimony given to the jury is reliable and will be more informative than
confusing. Dr. Jewell’s application of his purported methods
does not satisfy this standard. By applying different
techniques to subsets of the data and inconsistently discussing
statistical significance, Dr. Jewell does not reliably analyze
the weight of the evidence. Relying on those selectively drawn conclusions to address certain Bradford Hill factors compounds the unreliability. While the District Court may have flagged a
few issues that are not necessarily indicative of an unreliable
application of methods, there is certainly sufficient evidence
on the record to suggest that the court did not abuse its
discretion in excluding Dr. Jewell as an expert on the basis of
the unreliability of his methods. For these reasons, we will
affirm the orders of the District Court, excluding the
testimony of Dr. Jewell and granting summary judgment in
favor of Pfizer.