State v. Sharpe

Notice: This opinion is subject to correction before publication in the PACIFIC REPORTER. Readers are requested to bring errors to the attention of the Clerk of the Appellate Courts, 303 K Street, Anchorage, Alaska 99501, phone (907) 264-0608, fax (907) 264-0878, email corrections@akcourts.us.

THE SUPREME COURT OF THE STATE OF ALASKA

STATE OF ALASKA, Petitioner and Cross-Respondent, v. JYZYK J. SHARPE, Respondent and Cross-Petitioner.
Supreme Court Nos. S-16191/16193/16214/16449 (Consolidated)
Court of Appeals No. A-12452
Superior Court No. 3PA-14-00877 CR

STATE OF ALASKA, Petitioner and Cross-Respondent, v. THOMAS HENRY ALEXANDER, Respondent and Cross-Petitioner.
Court of Appeals Nos. A-11423/11433
Superior Court No. 3AN-09-11088 CR

JEFFERY K. HOLT, Appellant, v. STATE OF ALASKA, Appellee.
Court of Appeals No. A-12219
Superior Court No. 3HO-11-00515 CR

OPINION
No. 7326 – January 4, 2019

Petition for Hearing in File Nos. S-16191/16214 from the Court of Appeals of the State of Alaska, on appeal from the Superior Court of the State of Alaska, Third Judicial District, Palmer, Eric Smith, Judge.

Petition for Hearing in File Nos. S-16193/16214 from the Court of Appeals of the State of Alaska, on appeal from the Superior Court of the State of Alaska, Third Judicial District, Anchorage, Gregory Miller, Judge.

Certified Question in File No. S-16449 from the Court of Appeals of the State of Alaska, on appeal from the Superior Court of the State of Alaska, Third Judicial District, Homer, Charles T. Huguelet, Judge.

Appearances: Diane L. Wendtland, Assistant Attorney General, Office of Criminal Appeals, Anchorage, and Jahna Lindemuth, Attorney General, Juneau, for Petitioner and Cross-Respondent and Appellee State of Alaska. Sharon Barr, Assistant Public Defender, and Quinlan Steiner, Public Defender, Anchorage, for Respondents and Cross-Petitioners Sharpe and Alexander.
Brooke Berens, Assistant Public Advocate, and Richard Allen, Public Advocate, Anchorage, for Appellant Holt. Gordon L. Vaughan, Vaughan & DeMuro, Colorado Springs, Colorado, and Gavin Kentch, Law Office of Gavin Kentch, LLC, Anchorage, for Amicus Curiae American Polygraph Association.

Before: Stowers, Chief Justice, Winfree, Maassen, Bolger, and Carney, Justices.

STOWERS, Chief Justice.

I. INTRODUCTION

In each of the three underlying criminal cases in this consolidated appeal, the defendant sought to introduce expert testimony by a polygraph examiner that the defendant was truthful when he made exculpatory statements relating to the charges against him during a polygraph examination conducted using the “comparison question technique” (CQT). In two of the cases, the superior courts found that testimony based on a CQT polygraph examination satisfied the requirements for scientific evidence under Daubert v. Merrell Dow Pharmaceuticals, Inc.[1] and State v. Coon.[2] In the third case, the superior court reached the opposite conclusion and found the evidence inadmissible.

We are now asked to revisit the appellate standard of review for rulings on the admissibility of scientific evidence and to determine the admissibility of CQT polygraph evidence. We conclude that appellate review of Daubert/Coon determinations should be conducted under a hybrid standard: the superior court’s preliminary factual determinations are reviewed for clear error; based on those findings and the evidence available, whether a particular scientific theory or technique has been shown to be “scientifically valid” under Daubert and Coon is a question of law to which we apply our independent judgment; and where proposed scientific evidence passes muster under that standard, the superior court’s case-specific determinations and further evidentiary rulings are reviewed for abuse of discretion. Applying this standard here, we conclude that CQT polygraph evidence has not been shown to be sufficiently reliable to satisfy the Daubert/Coon standard.

[1] 509 U.S. 579 (1993).
[2] 974 P.2d 386 (Alaska 1999).

II. BACKGROUND

A. State v. Alexander

Thomas Alexander was charged with multiple counts of sexual abuse of a minor. Before trial, Alexander hired David Raskin, Ph.D., a polygraph examiner, to administer a CQT polygraph examination. Based on the polygraph results, Dr. Raskin concluded that Alexander answered truthfully when he denied committing the acts with which he was charged.

At Alexander’s request, Superior Court Judge Gregory Miller held an evidentiary hearing to address the admissibility of the polygraph results. For the purpose of that hearing, Alexander’s case was consolidated with an unrelated criminal case pending before Superior Court Judge pro tem Daniel Schally because the two cases involved similar polygraph testimony by the same polygraph examiner, Dr. Raskin.[3] The two judges held a joint evidentiary hearing over the course of two days, spanning more than ten hours of testimony. Dr. Raskin testified for the defense in support of admitting testimony about the polygraph results, while William Iacono, Ph.D., a research psychologist at the University of Minnesota, testified for the State in opposition. Both sides also submitted copious evidence in the form of declarations by the two experts, scientific studies, treatises, and other materials.

The judges issued a joint order for both cases concluding that CQT polygraph testing satisfies the Daubert/Coon requirements for scientific validity. The judges also concluded that the proposed testimony was not otherwise excluded by the Alaska Rules of Evidence relating to relevance, unfair prejudice, credibility bolstering, expert testimony, or hearsay.
Their order held that the polygraph evidence would be admissible, but on the condition that the defendants first testified at their respective trials and subjected themselves to cross-examination. Their ruling was also premised on each defendant agreeing to sit for a second polygraph test administered by the State, which the judges reasoned would mitigate concerns relating to possible bias by a “friendly” examiner[4] and provide additional “guarantees of trustworthiness.”[5]

[3] The other defendant later pleaded guilty to the charged offense and is not a party on appeal.
[4] The “friendly examiner” bias hypothesis was explored at the evidentiary hearing. The hypothesis posits that when a polygraph examiner is hired by the defense and the test is administered to the defendant without giving the prosecution notice or an opportunity to observe, various factors might work together to bias the examination in ways favorable to the defendant “passing” the test. The validity of this hypothesis and the extent to which a “friendly” examiner might affect the results of a polygraph examination are disputed. See PAUL C. GIANNELLI ET AL., 1 SCIENTIFIC EVIDENCE § 8.03[f], at 460 (5th ed. 2012).
[5] It appears the superior court believed that Alexander had already been subjected to a polygraph examination administered by the Department of Corrections. It was later clarified that no such test had taken place, but Alexander did agree to sit for a State-administered exam. The parties appear to have proceeded with the understanding that doing so was a prerequisite for admitting the polygraph evidence.

B. State v. Sharpe

In a case unrelated to Alexander’s, Jyzyk Sharpe was charged with murder and manslaughter in connection with the death of his girlfriend’s two-year-old son. Sharpe also hired Dr. Raskin to administer a polygraph examination, after which Dr. Raskin concluded that Sharpe answered truthfully when he denied the charges against him.

Before trial, the State moved to preclude Sharpe’s polygraph evidence and Dr. Raskin’s testimony. As in Alexander’s case, the State argued that polygraph examinations are not supported by valid science and that additional accuracy problems are presented in the case of a “friendly” polygraph examiner. For those reasons, the State argued that the polygraph testimony should be excluded under Alaska Evidence Rule 403 because its probative value would be outweighed by risks of unfair prejudice, confusion, delay, and wasted time. The State also argued that the proposed testimony included inadmissible hearsay, that the testimony was inadmissible as expert testimony under Daubert/Coon and under the Alaska Rules of Evidence, and that the testimony was inadmissible character evidence under Evidence Rule 608.

No new Daubert/Coon hearing was held; instead, Superior Court Judge Eric Smith relied on the record and evidence presented in Alexander’s Daubert/Coon evidentiary hearing. The superior court held that the testimony would be admissible pursuant to the same reasoning as in that case. However, the court added a further limitation: the polygraph examiners (Dr. Raskin and the State’s examiner) could testify only to whether Sharpe “believed what he was saying” and not to whether he was “telling the truth”; the court reasoned that the latter would impermissibly imply that a polygraph test can reveal whether a statement is objectively accurate.

During a second polygraph test, administered for the State by former FBI agent Kendall Shull, Sharpe prematurely terminated the examination when Shull asked Sharpe if he was using countermeasures[6] against the polygraph test. The State asked the court to reconsider the admissibility of Dr. Raskin’s testimony based on Sharpe’s lack of cooperation with the second examination. The court ultimately reaffirmed its original decision, ruling that Dr. Raskin’s testimony was admissible but that the State could present evidence of Sharpe’s lack of cooperation in rebuttal.

[6] The term “countermeasures” refers to conscious efforts by an examinee to manipulate the results of a polygraph examination by altering the physiological indicators measured by the polygraph. Classes of countermeasures include using drugs or alcohol to suppress responses to questions; physical techniques such as breath control, biting one’s tongue, or contracting various muscles to create artificial responses; or mental techniques such as disassociation or counting backward to either suppress or create responses. See generally GIANNELLI ET AL., supra note 4 § 8.03[d], at 458-59; NAT’L RESEARCH COUNCIL, THE POLYGRAPH AND LIE DETECTION 4-5, 139-48 (2003), https://doi.org/10.17226/10420.

C. State v. Holt

Jeffery Holt was charged with five counts of first-degree sexual assault. Before trial, Holt hired Dr. Raskin to administer a polygraph examination, after which Dr. Raskin concluded that Holt was being truthful when he denied the charges on the grounds that the alleged victim consented to sexual activity.

In lieu of a Daubert/Coon hearing, both parties suggested, and the court agreed, that it could determine the admissibility of Dr. Raskin’s testimony by reviewing the record of the hearing and subsequent order in Alexander’s case. The parties also submitted additional scholarly articles on polygraph testing, an audio recording of Holt’s polygraph examination, the raw data from that examination, and the prosecutor’s recorded interview of Dr. Raskin about the procedure used in that examination. Superior Court Judge Charles Huguelet reviewed the evidence from Alexander’s case, heard oral argument, and then concluded that polygraph evidence is not sufficiently reliable to be admitted.
The court further concluded that Dr. Raskin’s testimony would in any case be inadmissible under the evidence rules governing character evidence, bolstering, and prior consistent statements, as well as under the Rule 403 balancing test. After a jury trial, Holt was convicted of one count of first-degree sexual assault and four counts of second-degree sexual assault; he was sentenced to 28 years of imprisonment with 8 years suspended.

D. Proceedings In The Court Of Appeals

In Alexander’s case, the State filed a petition for review to the court of appeals challenging the conclusion that the proposed polygraph testimony was admissible; Alexander filed a cross-petition challenging the conditions that he agree to testify and agree to submit to a State-administered polygraph exam.[7] In its decision, the court of appeals observed that, in accordance with our opinion in Coon, determinations regarding the validity of scientific evidence are reviewed on appeal only for abuse of discretion.[8] The court expressed concern about applying such a deferential standard and suggested that this court should revisit Coon and adopt a more probing standard of review.[9] The court explained:

As it happened, [Judges Miller and Schally] reached the same conclusion regarding the scientific validity of polygraph examinations. But, as illustrated by the competing testimony offered by Dr. Raskin and Dr. Iacono, this is clearly a matter on which reasonable people can differ — and on which they do differ. Thus, the two judges in this case might easily have reached differing conclusions regarding the scientific validity of polygraph examinations, even though they heard exactly the same evidence. And if the two judges had reached different conclusions, we apparently would have been required to affirm both of the conflicting decisions under the “abuse of discretion” standard of review.

. . . .

This essentially means that the scientific validity of polygraph evidence will never be judicially resolved at an appellate level: it will remain an open question, and it will need to be litigated anew each time the issue is raised.[10]

Ultimately, applying the abuse of discretion standard of review, the court of appeals affirmed the order admitting Dr. Raskin’s testimony.[11] The court also upheld the conditions on admissibility imposed by the superior court.[12]

In Sharpe’s case, the State again filed a petition for review challenging the ruling admitting Dr. Raskin’s testimony; the court of appeals denied the petition based on its ruling in Alexander. The State filed petitions for hearing to this court in both cases; Alexander and Sharpe filed a joint cross-petition challenging the requirement that they agree to testify before their respective polygraph evidence could be admitted.[13] We granted all three petitions and consolidated the cases for briefing.

Holt appealed his convictions and his sentence to the court of appeals. One of Holt’s grounds for appeal was Judge Huguelet’s order excluding Dr. Raskin’s testimony. The court of appeals reasoned that the polygraph issue in Holt’s case was the same as the one in State v. Alexander, and that the trial court’s decision “present[ed] the very problem that [the court] noted when [it] decided Alexander: the problem that reasonable judges who heard exactly the same evidence concerning polygraph testing could rationally reach differing conclusions as to whether polygraph evidence meets the Daubert test for admission.” Because we had already granted review of Alexander’s and Sharpe’s cases, the court of appeals severed Holt’s polygraph question and certified it to this court, again asking us to revisit the applicable standard of review.[14] We accepted certification and consolidated Holt’s case with Sharpe’s and Alexander’s.

[7] State v. Alexander, 364 P.3d 458, 460 (Alaska App. 2015).
[8] Id. at 466.
[9] Id. at 466, 468.
[10] Id. (emphasis in original).
[11] Id. at 471.
[12] Id.
[13] Sharpe and Alexander are no longer challenging the requirement that they submit to a State-administered polygraph exam if requested to do so.
[14] We are not presented with the other issues and arguments raised in Holt’s initial appeal to the court of appeals, and we do not address them.

III. STANDARD OF REVIEW

Broadly speaking, we review the admission or exclusion of evidence for abuse of discretion.[15] But whether the trial court applied the correct legal rule is a question of law subject to de novo review.[16] Similarly, “[w]hen the admissibility of evidence ‘turns on . . . the correct scope or interpretation of a rule of evidence, we apply our independent judgment.’ ”[17] Findings of fact underlying a judgment of the superior court are reviewed for clear error, which we will find “if a review of the entire record leaves us with a definite and firm conviction that a mistake has been made.”[18]

In State v. Coon we addressed the applicable standards of review for a decision admitting or excluding scientific evidence and concluded that a “determination of reliability under Daubert” is “best left to the discretion of the trial court.”[19] However, whether to revisit the standard outlined in Coon is one of the issues raised on appeal and one which the court of appeals has explicitly urged us to reconsider. When deciding whether to overrule a prior decision, we will do so only when “clearly convinced that the rule was originally erroneous or is no longer sound because of changed conditions, and that more good than harm would result from a departure from precedent.”[20] A previous decision may be considered “originally erroneous” if it “proves to be unworkable in practice.”[21]

[15] Timothy W. v. Julia M., 403 P.3d 1095, 1100 (Alaska 2017) (citing State v. Carpenter, 171 P.3d 41, 63 (Alaska 2007)).
[16] Id. (citing Carpenter, 171 P.3d at 63).
[17] Sanders v. State, 364 P.3d 412, 419-20 (Alaska 2015) (cleaned up) (quoting Barton v. N. Slope Borough Sch. Dist., 268 P.3d 346, 350 (Alaska 2012)).
[18] Kiva O. v. State, Dep’t of Health & Soc. Servs., Office of Children’s Servs., 408 P.3d 1181, 1186 (Alaska 2018) (quoting Bigley v. Alaska Psychiatric Inst., 208 P.3d 168, 178 (Alaska 2009)). We have not previously stated explicitly what standard of review applies to findings of fact preliminary to evidentiary rulings. However, under Alaska Evidence Rule 104(b), “[w]hen the relevancy of evidence depends upon the fulfillment of a condition of fact, the court shall admit it upon, or subject to, the introduction of evidence sufficient to support a finding of the fulfillment of the condition.” Thus, the relevant question on appeal is whether there is sufficient evidence in the record to support the necessary factual finding, i.e., whether that finding is clearly erroneous.
[19] 974 P.2d 386, 399 (Alaska 1999).
[20] Young v. State, 374 P.3d 395, 413 (Alaska 2016) (quoting Pratt & Whitney Canada, Inc. v. Sheehan, 852 P.2d 1173, 1176 (Alaska 1993)).
[21] Thomas v. Anchorage Equal Rights Comm’n, 102 P.3d 937, 943 (Alaska 2004) (quoting Pratt & Whitney Canada, Inc., 852 P.2d at 1176).

IV. DISCUSSION

A. The Daubert/Coon Standard

Under Alaska Evidence Rule 702(a), a qualified expert witness may testify to “scientific, technical, or other specialized knowledge” if that knowledge “will assist the trier of fact to understand the evidence or to determine a fact in issue.” In Daubert v. Merrell Dow Pharmaceuticals, Inc.,[22] the United States Supreme Court set forth new requirements for admitting scientific evidence under the equivalent Federal Rule of Evidence. Prior to Daubert the prevailing standard had been established in Frye v. United States, under which an “expert opinion based on a scientific technique is inadmissible unless the technique is ‘generally accepted’ as reliable in the relevant scientific community.”[23] Daubert concluded that the Frye test was superseded by the adoption of the Federal Rules of Evidence.[24]

The new standard laid out in Daubert is two-pronged. First, the court must determine whether the proffered testimony is based on “scientific knowledge,” meaning that it is “derived by the scientific method” and “supported by appropriate validation”[25]; in short, that it is “scientifically valid.”[26] Second, because Evidence Rule 702 requires that the testimony “assist the trier of fact to understand or determine a fact in issue,” the court must determine “whether the reasoning or methodology underlying the testimony . . . properly can be applied to the facts in issue.”[27]

The Daubert Court also outlined a number of key considerations relevant to the determination of scientific validity, although it noted that these considerations were not “a definitive checklist or test.”[28] The first question is whether the scientific theory or technique in question can be and has been empirically tested.[29] The second is whether the theory or technique “has been subjected to peer review and publication.”[30] But the Supreme Court cautioned that publication, including in a peer-reviewed journal, “does not necessarily correlate with reliability”; rather, the Court reasoned that publication and peer review are relevant because “submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in the methodology will be detected.”[31] The third consideration that the Court found relevant is “the known or potential rate of error, and the existence and maintenance of standards controlling the technique’s operation.”[32] And finally, although Daubert rejected general acceptance in the scientific community as an absolute prerequisite to admissibility, the Supreme Court recognized that “[w]idespread acceptance can be an important factor in ruling particular evidence admissible, and ‘a known technique which has been able to attract only minimal support within the community,’ may properly be viewed with skepticism.”[33]

In 1999 we adopted Daubert as the applicable admissibility standard for scientific expert testimony under the Alaska Rules of Evidence in State v. Coon.[34]

[22] 509 U.S. 579 (1993).
[23] Id. at 584 (citing Frye v. United States, 293 F. 1013, 1014 (D.C. App. 1923)).
[24] Id. at 587.
[25] Id. at 590.
[26] Id. at 593.
[27] Id. at 592-93.
[28] Id. at 593.
[29] Id.
[30] Id.
[31] Id.
[32] Id. at 594 (internal citations omitted).
[33] Id. (quoting United States v. Downing, 753 F.2d 1224, 1238 (3d Cir. 1985)).
[34] 974 P.2d 386, 393-94 (Alaska 1999).

B. Polygraph Testing And The Comparison Question Technique

This opinion concerns the admissibility of expert testimony regarding the results of a polygraph examination, informally known as a “lie detector test.” However, it does not concern the entire field of polygraph testing; rather, it involves the technique known as the “comparison question test” or “control question test” (CQT).[35] The following is a summary of the undisputed aspects of CQT polygraph testing.

[35] The technique was originally known as the “control question” technique; “comparison question” is now the preferred term because the technique does not use a “control” as that term is understood in the scientific community. See GIANNELLI ET AL., supra note 4 § 8.02[a], at 437. For simplicity, we refer to the technique primarily by the shorthand “CQT.”

In all polygraph examinations, whether the CQT or some other approach is used, the examinee is connected to a polygraph, an instrument that measures multiple physiological phenomena: pulse rate, blood pressure, respiration rate, and galvanic skin response in the hands and fingers.[36] It is generally accepted that the polygraph is a highly sensitive instrument capable of measuring these physiological phenomena.[37]

[36] NAT’L RESEARCH COUNCIL, supra note 6, at 12-13; John Synnott et al., A Review of the Polygraph: History, Methodology and Current Status, 1 CRIME PSYCH. REV. 59, 62-65 (2015). Galvanic skin response, also known as electrodermal response, refers to the electrical conductivity of the skin, which is affected by activity in the skin’s sweat glands. See NAT’L RESEARCH COUNCIL, supra note 6, at 81, 155.
[37] See GIANNELLI ET AL., supra note 4 § 8.02[c], at 439.

The CQT exams Dr. Raskin administered in these cases are a form of specific-incident polygraph testing, as opposed to a polygraph examination for screening or background check purposes.[38] Screening tests ask about a broad range of conduct, such as whether the examinee has ever committed a crime or used illegal drugs, but specific-incident tests, like the ones Dr. Raskin administered, focus on a particular crime, event, or other occurrence under investigation.[39]

[38] See NAT’L RESEARCH COUNCIL, supra note 6, at 1 (“Polygraph testing is used for three main purposes: event-specific investigations (e.g., after a crime); employee screening, and preemployment screening. The different uses involve the search for different kinds of information and have different implications.”).
[39] Id. at 23-24.

The CQT examiner asks three types of questions: “neutral” or “irrelevant” questions (“Is your name Thomas?”), broad “control” or “comparison” questions (“During the first 35 years of your life, did you ever engage in a sexual act of which you should be ashamed?”), and specific “relevant” questions (“Did you ever touch G.B.’s breast?”).[40] Each comparison question will ask about a broad category of past conduct, similar to but excluding the specific occurrence being investigated, and each question will be specifically designed to be ambiguous, broad, and vague but to elicit a “No” answer.[41] Because the comparison questions are broadly worded and address sensitive topics, the examinee is assumed to be deceptive or at least unsure of his answer.[42]

[40] See GIANNELLI ET AL., supra note 4 § 8.02[e], at 442-43; NAT’L RESEARCH COUNCIL, supra note 6, at 254-55; David C. Raskin & Charles R. Honts, The Comparison Question Test, in HANDBOOK OF POLYGRAPH TESTING 1, 5-27 (Murray Kleiner ed., 2001).
[41] Raskin & Honts, supra note 40, at 15. If the examinee answers a comparison question affirmatively, indicating that some past event matches the described conduct, the examiner will elicit an explanation of that event before repeating the question in a way that excludes the admitted conduct (“Other than what you told me, . . . did you ever . . . .”). Id. at 16. In a variant of the CQT known as the “directed lie test,” the examinee is simply instructed to lie to the comparison question and informed that the results will be inconclusive if there is not a strong enough response. Id. at 23; see also GIANNELLI ET AL., supra note 4 § 8.02[e], at 444; Synnott et al., supra note 36, at 67-68.
[42] See Raskin & Honts, supra note 40, at 15.

The underlying rationale of the CQT is that deceptive subjects will feel more threatened by the relevant questions and will view the comparison questions as less important; thus, deceptive subjects will have a stronger physiological reaction to the relevant questions.[43] In contrast, truthful subjects are expected to feel more threatened by the comparison questions and to have a stronger physiological reaction to them than to the truthfully answered relevant questions.[44] There are two reasons for this expectation: first, the sensitive topic of the comparison questions is assumed to generate a response; second, the examiner will have explained prior to the exam that the examinee’s reactions to the comparison questions are important to the ultimate test result.[45] Thus, the CQT is based on the premise that the relative magnitudes of the examinee’s reactions to the relevant and comparison questions are indicative of his truthfulness or lack thereof when answering the relevant questions.[46]

[43] GIANNELLI ET AL., supra note 4 § 8.02[e], at 441; NAT’L RESEARCH COUNCIL, supra note 6, at 14-15, 70-71, 255.
[44] GIANNELLI ET AL., supra note 4 § 8.02[e], at 441; NAT’L RESEARCH COUNCIL, supra note 6, at 14-15, 70-71, 255.
[45] Raskin & Honts, supra note 40, at 15-16.
[46] GIANNELLI ET AL., supra note 4 § 8.02[e], at 441; NAT’L RESEARCH COUNCIL, supra note 6, at 14-15, 70, 255; Raskin & Honts, supra note 40, at 7, 18-21.

The examiner asks the examinee a list of prepared questions multiple times.[47] For each relevant question, the examiner will compare the subject’s reaction to that question with his reaction to an adjacent comparison question.[48] Each measured parameter is given a numerical score for each question pair, for example from -3 to +3, with a positive number indicating a stronger reaction to the comparison question and a negative number indicating a stronger reaction to the relevant question.[49] The examiner totals the numerical scores:[50] a high positive overall score is interpreted as indicating a truthful result; a high negative score is interpreted as indicating deception; a score close to zero, whether positive or negative, is considered inconclusive.[51] As will be explained in further detail below, the main scientific criticisms of CQT polygraph testing relate to the validity and testability of the assumptions underlying the technique.

[47] Raskin & Honts, supra note 40, at 17-18.
[48] Id. at 7, 19.
[49] GIANNELLI ET AL., supra note 4 § 8.02[f], at 445-46; Raskin & Honts, supra note 40, at 19.
[50] Depending on the circumstances and the need for particularized test results, the scores may be totaled either for the test as a whole or for each relevant question individually. Raskin & Honts, supra note 40, at 20.
[51] GIANNELLI ET AL., supra note 4 § 8.02[f], at 446; Raskin & Honts, supra note 40, at 20.

C. The Appellate Standard Of Review For Scientific Evidence Rulings

The first question we must address is what standard of review the appellate court should apply to appeals from a Daubert/Coon determination made by the trial court. Our current standard, which the court of appeals urges us to reconsider, is the one laid out in State v. Coon: abuse of discretion.[52]

In Coon the superior court held an evidentiary hearing to determine whether proffered expert testimony on spectrographic voice identification would be admissible under Frye’s general-acceptance standard; the superior court then admitted the testimony.[53] After an initial appeal, we remanded the case with directions to the superior court to enter findings of fact and conclusions of law relating to Evidence Rule 703, as well as detailed findings of fact and conclusions of law under both the Frye and Daubert standards; the superior court on remand determined the testimony was admissible under both standards.[54] On the second appeal we expressly adopted the Daubert standard,[55] and we then considered the superior court’s ruling admitting the evidence under this newly adopted standard.[56] The superior court’s conclusion was based on a number of preliminary findings: it found that the technique of spectrographic voice identification “had been empirically tested,” that it “had been subjected to peer review and publication,” that “when properly performed . . . voice spectrography has a known error rate of less than one percent,” that “when voice spectrography is properly performed by a qualified person, it has attained widespread acceptance within the relevant scientific community,” that “the reasoning and methodology underlying [the expert’s] testimony were scientifically valid,” and that the expert in that case “had properly performed the voice spectrographic analysis.”[57]

We examined each of those preliminary findings in turn and concluded for each finding that the superior court “did not err” in making it.[58] We then reviewed for abuse of discretion the superior court’s definition of the “relevant scientific community” and its ultimate determination, in light of its preliminary findings, that the evidence presented satisfied the Daubert standard.[59] We noted that “the majority of the federal circuits have chosen to apply the abuse of discretion standard when reviewing district court decisions under Daubert,” and that “the Supreme Court [had] recently approved the abuse of discretion standard in General Electric Co. v. Joiner.”[60]

[52] 974 P.2d 386 (Alaska 1999).
[53] Id. at 388.
[54] Id. at 389.
[55] Id. at 389-98.
[56] Id. at 398-403.
[57] Id. at 400.
[58] Id. at 401-02 (“[T]he trial court did not err in finding on remand that this technique has been subjected to empirical testing. . . . [T]he trial court did not err in finding on remand that the technique had been subjected to peer review and publication . . . . The trial court did not err in finding on remand that the known error rate . . . was sufficiently low to make this evidence reliable. . . . [W]e do not find that the trial court clearly erred in making its general acceptance finding . . . .”).
[59] See id. (“[W]e conclude that the trial court did not abuse its discretion in determining the relevant scientific community[,] . . . in ruling that the evidence satisfied Daubert[,] . . . [or] in finding the voice spectrographic evidence admissible . . . .”).
[60] Id. at 399 (citing cases from the Courts of Appeal for the First, Second, Fourth, Fifth, Sixth, Eighth, Ninth, Tenth, and D.C. Circuits, and citing General Electric Co. v. Joiner, 522 U.S. 136 (1997)).

Justice Fabe dissented from the court’s opinion. She argued that applying “an abuse of discretion standard of review to the validity of scientific techniques will most likely lead to inconsistent treatment of similarly situated claims.”[61] This non-uniformity, she suggested, “must be reconciled at the appellate level. Otherwise, inconsistent jury verdicts, widely disparate compensation for similar injuries, and erroneous criminal verdicts will continue to erode public confidence in our justice system.”[62] Justice Fabe explained that “[t]he reliability of scientific evidence does not change from one case to the next; a scientific method is either reliable or unreliable.”[63] For that reason, her dissent advocated reviewing “the question of the validity of scientific information” de novo, while reviewing for abuse of discretion “a trial judge’s assessment of the competency of a particular expert witness to render an opinion.”[64]

[61] Id. at 404 (Fabe, J., dissenting).
[62] Id. (Fabe, J., dissenting) (quoting Jay P. Kesan, An Autopsy of Scientific Evidence in a Post-Daubert World, 84 GEO. L.J. 1985, 2037 (1996)).
[63] Id. at 404-05 (Fabe, J., dissenting).
[64] Id. at 405 (Fabe, J., dissenting).

Prior to our decision in Coon, a number of commentators had criticized the federal courts’ abuse of discretion standard and proposed a hybrid standard similar to the one described in Justice Fabe’s dissent.[65] For example, Professor David Faigman argued in a 1997 law review article that assessing the relevance and reliability of scientific evidence “involves several layers of scientific work” and that different standards of review should apply to each.[66] According to Faigman, “[w]hen the scientific evidence transcends the particular case, the appellate court should apply a ‘hard-look’ or de novo review to the basis for the expert opinion,”[67] but “[w]hen the scientific evidence involves facts specific to the particular case, the appellate court should defer to the trier of fact below.”[68]

[65] See, e.g., Confronting the New Challenges of Scientific Evidence, 108 HARV. L. REV. 1509, 1528 (1995); David L. Faigman, Appellate Review of Scientific Evidence Under Daubert and Joiner, 48 HASTINGS L.J. 969, 976 (1997); David L. Faigman et al., Check Your Crystal Ball at the Courthouse Door, Please: Exploring the Past, Understanding the Present, and Worrying About the Future of Scientific Evidence, 15 CARDOZO L. REV. 1799, 1822 (1994); Michael H. Gottesman, From Barefoot to Daubert to Joiner: Triple Play or Double Error?, 40 ARIZ. L. REV. 753, 776-80 (1998); Jay P. Kesan, An Autopsy of Scientific Evidence in a Post-Daubert World, 84 GEO. L.J. 1985, 2038 (1996).
[66] Faigman, Appellate Review, supra note 65, at 976.
[67] Id.
[68] Id.

Although all federal circuits have adopted Joiner’s[69] abuse of discretion standard for appellate review,[70] a number of state courts have ruled to the contrary and

[69] General Elec. Co. v. Joiner, 522 U.S. 136 (1997).
[70] See Hughes v. Kia Motors Corp., 766 F.3d 1317, 1331 (11th Cir. 2014); Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 320 (3d Cir. 2003); Dura Auto. Sys. of Indiana, Inc. v. CTS Corp., 285 F.3d 609, 617 (7th Cir. 2002); Raskin v. Wyatt Co., 125 F.3d 55, 65-66 (2d Cir. 1997); United States v. Kayne, 90 F.3d 7, 11 (1st Cir. 1996); Duffee ex rel. Thornton v. Murray Ohio Mfg. Co., 91 F.3d 1410, 1411 (10th Cir. 1996); Benedi v. McNeil-P.P.C., 66 F.3d 1378, 1384 (4th Cir. 1995); Pedraza v. Jones, 71 F.3d 194, 197 (5th Cir. 1995); American & Foreign Ins. Co. v.
General Elec. Co., 45 F.3d 135, 137 (6th Cir. 1995); Hose v. Chicago N.W. Transp. Co., 70 F.3d 968, 972 (8th Cir. 1995); United States v. Chischilly, 30 F.3d 1144, 1152 (9th Cir. 1994); Joy v. Bell (continued...) -20- 7326 adopted a stricter standard of review. For example, the New Mexico Supreme Court held in Lee v. Martinez that the validity of a particular scientific theory is a form of “legislative fact” not specific to the circumstances of any particular case, and it therefore applies de novo review to such questions.71 Other states that have adopted a hybrid or de novo standard of review for Daubert determinations include Oklahoma,72Washington,73 Kentucky,74 New Hampshire,75 West Virginia,76 and 70 (...continued) Helicopter Textron, Inc., 999 F.2d 549, 567 (D.C. Cir. 1993). 71 96 P.3d 291, 296 (N.M. 2004). 72 Taylor v. State, 889 P.2d 319, 331-32 (Okla. Crim. App. 1995) (“[A] trial judge’s decision to admit novel scientific evidence” is subject to “an independent, thorough review . . . not limited by deference to the trial judge’s discretion”). 73 State v. Cauthron, 846 P.2d 502, 505 (Wash. 1993) (“We review the trial court’s decision to admit or exclude novel scientific evidence de novo.”), overruled in part on other grounds by State v. Buckner, 941 P.2d 667 (Wash. 1997). 74 Miller v. Eldridge, 146 S.W.3d 909, 915 (Ky. 2004) (explaining that “findings of fact, i.e. reliability or non-reliability” are reviewed for clear error and “discretionary decisions, i.e. whether the evidence will assist [the] trier of fact and the ultimate decision as to admissibility” are reviewed for abuse of discretion). 75 State v. Dahood, 814 A.2d 159, 161 (N.H. 2002) (“Generally, we review the trial court’s rulings on evidentiary matters, including those regarding the reliability of novel scientific evidence, with considerable deference . . . . 
When the reliability or general acceptance of novel scientific evidence is not likely to vary according to the circumstances of a particular case, however, we review that evidence independently.”). 76 State v. Beard, 461 S.E.2d 486, 492 n.5 (W. Va. 1995) (explaining that West Virginia appellate courts review de novo whether “the reasoning or methodology underlying the testimony is scientifically valid,” but that whether the scientific evidence “will assist the trier of fact to understand the evidence or to determine a fact in issue” is reviewed under the abuse of discretion standard). -21- 7326 Oregon.77 In states that continue to apply the Frye standard of general acceptance, most apply de novo review on appeal.78 The primary concern raised by jurisdictions applying abuse of discretion review, as well as by commentators and Justice Fabe’s dissent in Coon, is the potential for inconsistent rulings in similarly situated cases. Our opinion in Coon dismissed this concern, finding it unlikely “that the inconsistency will be of such magnitude as to ‘compromise the integrity of the judiciary in the eyes of the public.’ ”79 In light of the posture of the cases now before us, we may have been too optimistic. If two defendants offer similar scientific testimony and — after separate evidentiary hearings — one judge deems the testimony to be scientifically valid while another does not, that could be the result of differences between the particular cases and differences in the evidence presented at the hearings. But when the judge in the latter case relied on the evidentiary hearing from the first, and reached the opposite conclusion based on identical evidence, it is clear that the difference in outcome cannot be attributed to a difference in the amount or quality of the evidence. That is essentially what happened in these cases: the scientific evidence 77 State v. Lyons, 924 P.2d 802, 805 (Or. 
1996) (“Notwithstanding the usual deference to trial court discretion, we review [a] ruling on the admissibility of scientific evidence de novo.” (emphasis in original) (internal citation omitted)). 78 See, e.g., Goeb v. Tharaldson, 615 N.W.2d 800, 814 (Minn. 2000) (explaining that under Minnesota’s Frye-Mack standard, “the trial judge defers to the scientific community’s assessment of a given technique, and the appellate court reviews de novo the legal determination of whether the scientific methodology has obtained general acceptance in the scientific community”); Brim v. State, 695 So. 2d 268, 274 (Fla. 1997) (explaining that “[a]ppellate review of a Frye determination will be treated as a matter of law” and be reviewed de novo). 79 State v. Coon, 974 P.2d 386, 399 (Alaska 1999) (quoting Coon, 974 P.2d at 404 (Fabe, J., dissenting)). -22- 7326 Alexander and Sharpe presented was deemed valid and admissible by the judges in their cases; essentially identical evidence based on the same scientific principles was deemed unreliable as a matter of law and inadmissible in Holt’s case, even though the trial judge relied on the very testimony presented at Alexander’s Daubert hearing.80 This raises at least the appearance of arbitrariness, i.e., the appearance that the outcome of a Daubert determination in our courts depends more on which judge was assigned to the case than on the objective application of law to the evidence presented. Regardless of how accurate this appearance might be, it certainly has the potential to raise serious questions in the eyes of the public about the integrity of our judicial system, particularly when such inconsistencies occur in the context of serious criminal proceedings. 
We explained in Coon that “the premise that the scientific validity of a technique is a legal issue which does not turn on case-sensitive facts” fails to “adequately take account of the reality of the judicial process and the variable state of science.”81 We quoted with approval the New Mexico Supreme Court’s reasoning that the idea that appellate courts are best suited to rule on the validity of a scientific theory or technique assumes “that the record on appeal contains all of the relevant, most recent data concerning the scientific method” and that “there is always a reservoir of scientific literature that an appellate court might independently reference in a de novo review.”82 We also expressed concern about making determinative rulings at all, again noting the New Mexico Supreme Court’s reasoning that “the state of science is not constant; it progresses daily.”83 We explained that “[t]he principal reason for adopting the Daubert standard is to give the courts greater flexibility in determining the admissibility of expert testimony, so as to keep pace with science as it evolves,” and concluded that abuse of discretion review “best comports with these aims.”84

80 An evidentiary hearing in which the judge considers the admissibility of expert testimony is also known as a Daubert hearing, and will be hereafter referred to as such.
81 Coon, 974 P.2d at 399.
82 Id. (quoting State v. Alberico, 861 P.2d 192, 205 (N.M. 1993)).

We do not take these concerns lightly: the record on appeal is limited to the testimony and exhibits in the superior court’s case file,85 so there is a non-negligible risk that reviewing the validity of scientific evidence de novo could lead us or the court of appeals to decide a case involving the admissibility of scientific evidence based on incomplete information. But the superior court is also limited to the testimony and evidence presented at the hearing.
And appellate courts will often have more time than trial courts to mitigate that risk through careful study of secondary sources such as scientific treatises and surveys of academic literature in the relevant field. Overturning a prior appellate decision requires showing that the decision was either “originally erroneous or is no longer sound because of changed conditions.”86 If an appellate court has made a Daubert determination and then new scientific research becomes available, or if a litigant identifies research that the appellate court overlooked, the trial court would be justified in holding an evidentiary hearing to make a complete record and rule in the alternative. The appellate court would then have the ability to reconsider admissibility under Daubert and Coon. In either case, presenting this new or overlooked evidence is no more of a burden on litigants than the burden they would otherwise have to present relevant evidence at an original Daubert hearing.

83 Id. (quoting Alberico, 861 P.2d at 205).
84 Id.
85 Alaska R. App. P. 210(a).
86 Young v. State, 374 P.3d 395, 413 (Alaska 2016) (quoting Pratt & Whitney Canada, Inc. v. Sheehan, 852 P.2d 1173, 1176 (Alaska 1993)).

In short, Coon’s fears that de novo review of Daubert determinations would result in the law of scientific evidence becoming set or stagnant and unchanging appear somewhat exaggerated. However, for the reasons discussed above, de novo review will not necessarily allow appellate courts to decide once and for all time whether a particular technique is scientifically valid, as the court of appeals seems to hope. Nonetheless, adopting a less deferential standard of review on appeal would allow trial courts and parties to avoid repeatedly relitigating the validity of scientific evidence, saving the court and parties the time, effort, and cost of a Daubert hearing — at least absent new or previously overlooked research and evidence.
It would also ensure that the admissibility of scientific evidence is consistent throughout the courts of this state.

For these reasons, we agree with the court of appeals — and with the dissent in Coon — that a more probing standard of review is warranted in an appeal from a Daubert determination.87 As explained above, our decision in Coon reviewed the preliminary findings underlying the superior court’s application of the Daubert standard — whether the technique had been tested, whether it had been subject to publication and peer review, etc. — for clear error, but reviewed the court’s ultimate determination of reliability for abuse of discretion.88 Going forward, we will instead apply our independent judgment to the question whether — based on the evidence presented and the scientific literature available — the technique or theory underlying the proposed expert testimony is sufficiently reliable to satisfy Daubert and Coon.89 In sum, we will limit our independent review to the broad question whether the underlying scientific theory or technique is “scientifically valid” under the first prong of the Daubert analysis.90

87 This approach is consistent with our standard of review in a number of other contexts. For example, we have explained in the context of reviewing a denial of a motion to suppress evidence that although “[t]he trial court’s findings of fact will not be disturbed unless they are clearly erroneous,” the question “[w]hether the trial court’s findings support its legal conclusions is a question we answer with our independent judgment.” State v. Wagar, 79 P.3d 644, 650 (Alaska 2003) (quoting State v. Joubert, 20 P.3d 1115, 1118 (Alaska 2001)).
88 Coon, 974 P.2d at 400-02.

D. Admissibility

1. Alaska’s case law on polygraph testing

Although we have not previously addressed the admissibility of polygraph evidence under Daubert and Coon, a discussion of our pre-Daubert case law on the subject provides useful context and perspective.
89 Whether the evidence being offered is ultimately admissible will also depend on case-specific factors, including whether the evidence is helpful to the trier of fact, whether the relevant scientific theory or technique “properly can be applied to the facts in issue,” and whether the proposed expert testimony satisfies or runs afoul of other evidentiary rules. Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 592-95 (1993); see also Alaska R. Evid. 702. These questions generally fall within the discretion of the trial court, and we will review them accordingly.
90 Daubert, 509 U.S. at 592-95.

In 1970 we concluded in Pulakis v. State that polygraph evidence offered in a criminal trial is generally inadmissible.91 Pulakis was convicted of larceny after a jury trial.92 At trial the prosecution introduced testimony from a police polygraph examiner that Pulakis underwent two polygraph examinations and that, in the examiner’s opinion, “the examinations revealed that deceptive answers were given to four crucial questions.”93 Pulakis challenged his conviction on appeal, arguing that admitting the polygraph testimony was plain error.94 Citing Frye, as well as language from some of our previous opinions, we observed that “[t]he general rule is that the results of polygraph tests are not admissible in evidence.”95 We explained that “judicial antipathy” to polygraph evidence had not diminished significantly since Frye was decided in 1923, and that court decisions considering the issue “reflect a high degree of sensitivity to the numerous potential sources of error in the ascertainment of deception through polygraph examinations.”96 We concluded that the “central problem regarding admissibility is not that polygraph evidence has been proved unreliable, but that polygraph proponents have not yet developed persuasive data demonstrating its reliability.”97 We therefore held that, although we were “not prepared to say whether polygraph examiners’ opinions are reliable[,] . . . the results of polygraph examinations should not be received in evidence over objection.”98 However, we ultimately upheld Pulakis’s conviction because he had waived objection to the evidence at trial and we did not “find polygraph tests so demonstrably unreliable as to require a finding of plain error.”99

91 476 P.2d 474, 478-79 (Alaska 1970).
92 Id. at 474-75.
93 Id. at 477.
94 Id. at 476.
95 Id. at 477 (quoting Gafford v. State, 440 P.2d 405, 410 (Alaska 1968)).
96 Id. at 478.
97 Id. at 479.
98 Id.
99 Id. at 479-80.

After we decided Pulakis, several cases in the court of appeals dealt not with the admissibility of polygraph evidence directly, but rather with the admissibility of references in other testimony to a party’s willingness to submit to a polygraph test.100 The court of appeals noted that “[d]espite its unreliability, polygraph evidence might be perceived by the jury as a complete answer to questions of credibility” and “could also lull the jury into a false sense of security and result in the jury failing to carefully scrutinize conflicting witness testimony.”101 Similarly, the court of appeals was concerned that “a jury may conclude that a witness’s willingness to take a polygraph test is circumstantial evidence that the witness is telling the truth,” and therefore concluded that even references to polygraph tests should be either inadmissible or subject to significant limiting instructions.102

100 See, e.g., Willis v. State, 57 P.3d 688 (Alaska App. 2002); Leonard v. State, 655 P.2d 766 (Alaska App. 1982).
101 Leonard, 655 P.2d at 770; see also Willis, 57 P.3d at 692.
102 Willis, 57 P.3d at 692; see also Leonard, 655 P.2d at 771.

The court of appeals first considered the admissibility of polygraph test results in Haakanson v. State.103 In that case the court was asked to reconsider Pulakis and find polygraph testimony admissible in light of alleged changes in polygraph technology and increased “acceptance among polygraph examiners of the polygraph’s reliability to show truthfulness.”104 The court of appeals applied Frye’s general acceptance standard: it concluded that for purposes of that analysis, the relevant question could not be limited to the acceptance of polygraph testing among polygraph examiners; rather, the court decided that under our decision in Contreras v. State, the “relevant scientific community” includes the “professions which have studied and/or utilized [the technique] for clinical, therapeutic, research and investigative applications” and specifically excludes “those whose involvement with [the technique] is strictly limited to that of practitioner.”105 Applying that standard, the court of appeals concluded that there was “considerable controversy over the reliability of polygraphs as a scientific process,” and that “Haakanson ha[d] not established that there [was] a consensus among the experts regarding the reliability of the polygraph technique.”106 The court of appeals also expressed “concern[] about the disproportionate impact polygraph evidence may have on a jury.”107 Citing its previous concerns about polygraph testimony being “perceived by the jury as a complete answer to questions of credibility” and its potential to “lull the jury into a false sense of security,” the court of appeals held that “[a]ny evidence which has such great potential to mislead or prejudice the jury should be excluded unless its probative value clearly outweighs the prejudice.”108 The court of appeals found the “probative value of polygraph evidence [to be] insubstantial because the polygraph has not been proven reliable”; thus, the polygraph evidence in that case was inadmissible.109

103 760 P.2d 1030 (Alaska App. 1988).
104 Id. at 1031-32.
105 Id. at 1034 (quoting Contreras v. State, 718 P.2d 129, 135 (Alaska 1986)).
106 Id. at 1035.
107 Id.
108 Id. (quoting Leonard v. State, 655 P.2d 766, 770 (Alaska App. 1982)).
109 Id.

2. Polygraph evidence under Daubert in other states

Other jurisdictions that apply the Daubert test have also rejected evidence based on the CQT method. For example, in State v. Porter the Connecticut Supreme Court adopted Daubert as the relevant standard for scientific evidence and upheld its traditional per se ban on admitting polygraph evidence.110 Jurisdictions that have adopted Daubert and maintain a per se exclusion of polygraph evidence include Idaho,111 West Virginia,112 Hawaii,113 Vermont,114 the District of Columbia,115 and the Court of Appeals for the Fourth Circuit.116 In United States v. Scheffer the Supreme Court held that a per se rule excluding polygraph evidence does not infringe on the constitutional rights of an accused to present evidence in his defense;117 implied in the Court’s reasoning is the corollary conclusion that such a rule is also not inconsistent with Daubert.118 According to one treatise on scientific evidence, a majority of states still followed this “traditional rule” of excluding polygraph evidence as of 2012, when Alexander’s evidentiary hearing took place.119 The superior court in Alexander’s case surveyed polygraph admissibility in “all 50 states and the federal circuits” at the time of the hearing and found that “30 jurisdictions still have a per se ban, 17 admit polygraph results based upon stipulation, and 12 leave the decision to the trial court’s discretion on a case-by-case basis.”

110 State v. Porter, 698 A.2d 739, 742 (Conn. 1997).
111 State v. Perry, 81 P.3d 1230, 1235-36 (Idaho 2003) (concluding that polygraph evidence is “useful to bolster [the examinee’s] credibility but do[es] not provide the trier of fact with any additional information” and that it is inadmissible “because it does not assist the trier of fact to understand the evidence or to determine a fact in issue”).
112 State v. Beard, 461 S.E.2d 486, 492-93 (W. Va. 1995) (“Despite Appellant’s noteworthy efforts at trying to elevate the image of polygraph results, we remain convinced that the reliability of such examinations is still suspect and not generally accepted within the relevant scientific community. Therefore, any speculation that our position . . . regarding polygraph admissibility is in question due to the Daubert/Wilt rulings is put to rest today.” (emphasis in original) (footnote omitted)).
113 State v. Okumura, 894 P.2d 80, 94 (Haw. 1995) (reaffirming Hawaii’s per se exclusion of polygraph evidence), abrogated on other grounds by State v. Cabagbag, 277 P.3d 1027, 1038-39 (Haw. 2012).
114 Rathe Salvage, Inc. v. R. Brown & Sons, Inc., 46 A.3d 891, 897-901 (Vt. 2012) (affirming denial of Daubert hearing on polygraph reliability on grounds that even assuming polygraph evidence satisfies Daubert it is still inadmissible under Rule 403).
115 See Rowland v. United States, 840 A.2d 664, 673-74 (D.C. 2004) (citing Proctor v. United States, 728 A.2d 1246, 1249 (D.C. 1999) and Peyton v. United States, 709 A.2d 65, 65 (D.C. 1998)) (excluding polygraph testimony). The D.C. Court of Appeals only recently adopted Daubert, see Motorola Inc. v. Murray, 147 A.3d 751, 756-57 (D.C. 2016), and it does not appear to have since heard a case involving polygraph testimony.
116 See United States v. Prince-Oyibo, 320 F.3d 494, 501 (4th Cir. 2003). In addition, the Sixth Circuit has held that, although it “has never adopted a per se prohibition on the introduction of polygraph evidence,” it “generally disfavor[s] admitting the results of polygraph evidence” because “the results of a polygraph are inherently unreliable.” United States v. Thomas, 167 F.3d 299, 308 (6th Cir. 1999). Furthermore, the Sixth Circuit has “repeatedly held that ‘unilaterally obtained polygraph evidence is almost never admissible under Evidence Rule 403.’ ” Id. at 309 (quoting United States v. Sherlin, 67 F.3d 1208, 1216 (6th Cir. 1995), and citing Wolfel v. Holbrook, 823 F.2d 970, 973-75 (6th Cir. 1987); Barnier v. Szentmiklosi, 810 F.2d 594, 597 (6th Cir. 1987)).
117 523 U.S. 303, 317 (1998).
118 See id. at 309-12.
119 See GIANNELLI ET AL., supra note 4 § 8.04[b], at 465 & n.173.

Of the jurisdictions that allow polygraph evidence based on the judge’s discretion, New Mexico is a notable example. Unlike the Alaska Evidence Rules, the New Mexico Rules of Evidence (NMRE) specifically address polygraph examinations. Under NMRE 11-707, the opinion of a polygraph examiner “as to the truthfulness of a person’s answers in a polygraph examination may be admitted” if a number of specific criteria regarding the examiner’s qualifications and the test procedure are met.120 In Lee v. Martinez the New Mexico Supreme Court held that when the expert’s qualification and the examination meet this rule’s standards, “polygraph examination results are sufficiently reliable to be admitted” under the Daubert standard and NMRE 11-702 — New Mexico’s equivalent to Alaska Evidence Rule 702.121 However, the court also concluded that NMRE 11-707 only makes polygraph evidence admissible subject to the discretion of the trial judge’s balancing of probative value against unfair prejudice.122

3. The Daubert factors, applied

Both the Supreme Court in Daubert and our court in Coon explained that the listed factors should not be seen as a determinative checklist, but that the standard is a flexible one.123 Because the Daubert factors are a good starting point, and the superior court started with them in Alexander, these factors will be discussed in turn here.

i. Empirical testing

The first relevant question is whether CQT polygraphy can be, and has been, empirically tested. The superior court in Alexander found that “the hypotheses underlying the polygraph can be and ha[ve] been tested repeatedly, including tests by both Drs. Raskin and Iacono.” In light of the record before us and the scientific literature available, this finding is at least partly erroneous.

120 N.M. R. Evid. 11-707 (2018).
121 96 P.3d 291, 293-94 (N.M. 2004).
122 Id. at 294.
123 Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594-95 (1993) (“The inquiry envisioned by Rule 702 is, we emphasize, a flexible one . . . . The focus, of course, must be solely on principles and methodology, not on the conclusions that they generate.”); State v. Coon, 974 P.2d 386, 395 (Alaska 1999) (“The factors identified in Daubert provide a useful approach . . . . Other factors may apply in a given case.”).

It is true that Dr. Raskin and Dr. Iacono both testified about a number of studies — conducted by them and others — that have tested the practical application of CQT polygraphy. But one central criticism that Dr. Iacono’s testimony raised was the lack of studies testing the psychological hypotheses that serve as the underlying premise of polygraph testing. For a CQT polygraph test to yield reliable inferences about deception,124 it must be the case that (1) deception on relevant and comparison questions produce different psychological states; (2) these psychological states produce measurably different physiological responses; (3) these physiological responses include the ones that the polygraph instrument measures; (4) these physiological responses are unlikely to arise from causes other than deception; (5) the scoring system captures the physiological differences relevant to deception; and (6) examiners accurately assign conclusions of deception or honesty to certain score values when they interpret scores.125 Many of these assumptions and hypotheses appear not to have been tested; even more important, some may not be readily testable.

124 This is the concept of criterion validity, or the degree to which an empirical measure actually “matches a phenomenon that the test is intended to capture.” NAT’L RESEARCH COUNCIL, supra note 6, at 31.
125 See id. at 67.

In particular, CQT polygraph examinations are based on the theory that while a truthful person will respond more strongly to the comparison questions, a deceptive person will have a stronger reaction to the relevant questions. Dr. Iacono criticized this as an unfounded assumption, arguing for example that a truthful person might react strongly to the relevant questions due to the implications of a false accusation, while a guilty person outside of laboratory studies might have a reduced reaction to the relevant questions due to the phenomenon of habituation.126 On those grounds, Dr. Iacono concluded that “the CQT has . . . a weak theoretical foundation.” He testified that this underlying theory has not been properly tested, in part because laboratory studies cannot duplicate all of the considerations that might be relevant in the field — like habituation or a truthful examinee reacting to the relevant questions out of fear of being falsely accused — and in part because field studies have difficulties establishing the “ground truth” of whether an examined person was actually lying. Determining ground truth presents practical problems that are difficult, perhaps even impossible, to overcome, meaning that true accuracy rates may not be empirically verifiable. Dr. Iacono testified that many field studies focus on criminal cases and use confessions to determine ground truth, but noted that this is problematic because whether or not a defendant passes or fails a polygraph exam affects how likely he is to subsequently confess.127

126 The term “habituation” refers to a “decline in responsiveness to a stimulus due to repeated exposure.” Habituation, AMERICAN HERITAGE DICTIONARY (5th ed. 2014). In the context of a polygraph test administered to a criminal defendant, this phenomenon could influence the test results because the relevant questions on the test are directed at the same conduct the defendant has already been accused of and charged with: “[I]f the individual has discussed the crime at length or on numerous occasions, they may have become habituated to talking about the case and no arousal is detected.” Erin M. Oksol & William T. O’Donohue, A Critical Analysis of the Polygraph, in HANDBOOK OF FORENSIC PSYCHOLOGY 601, 621 (William O’Donohue & Eric Levensky eds., 2003); see also Lee v. Martinez, 96 P.3d 291, 318 (N.M. 2004).
127 Confessions may also be unreliable measures of ground truth for other reasons. The Innocence Project reports that of the more than 360 DNA exoneration cases in the United States, roughly 28% involved a false confession in the initial conviction. DNA Exonerations in the United States, INNOCENCE PROJECT (2017), https://www.innocenceproject.org/dna-exonerations-in-the-united-states/ (last visited Oct. 16, 2018). It is not possible to infer the overall rate of false confessions from this data, but it is enough to raise questions about how accurately confessions establish ground truth.

Several studies and surveys of polygraph research have reached similar conclusions. For example, a 2003 review of the scientific evidence on polygraphy by the National Research Council concluded that “[p]olygraph research has not developed and tested theories of the underlying factors that produce the observed responses.”128 Similarly, a more recent survey of academic literature concluded that “[i]t appears unlikely that the proponents of the CQT will be able to reconcile the theoretical flaws of their technique in the foreseeable future.”129 Although there have been numerous studies testing the practical applications of the comparison question technique, our review of the record and the available academic literature reveals no studies actually testing the underlying psychological theories. Ultimately, given the fact that certain assumptions of polygraph testing not only are untested, but may be functionally untestable, we conclude that this factor weighs decidedly against admitting polygraph testimony as scientific evidence.

128 NAT’L RESEARCH COUNCIL, supra note 6, at 2.
129 Synnott et al., supra note 36, at 76.

ii. Peer review

The superior court in Alexander found that CQT polygraphy has been the subject of various publications, many of which were peer reviewed. This finding is amply supported by the record, and the State does not suggest otherwise. However, as the Supreme Court explained in Daubert, the mere fact of publication in a peer-reviewed journal is not itself probative of a technique’s validity; rather, peer review and “submission to the scrutiny of the scientific community” is relevant because “it increases the likelihood that substantive flaws in the methodology will be detected.”130 As discussed above, the published studies on CQT testing have been subject to substantial scrutiny, and a vigorous debate has arisen about substantive flaws in the theoretical underpinnings of the technique. Notwithstanding this debate, which has been ongoing for decades,131 the practice of CQT polygraph testing does not appear to have developed in any significant way. Most of the studies cited by Dr. Raskin in support of the technique are from the 1980s and 1990s, with some dated as far back as the late 1970s; and although the superior court’s Daubert hearing was conducted in 2012, Dr. Raskin did not cite to any studies published more recently than 2003.132 Thus, although studies regarding CQT polygraphy have been published in peer-reviewed journals, it does not appear that this has resulted in the kind of refinement and development that makes publication and peer review relevant to a Daubert analysis. For this reason, although the superior court in Alexander did not clearly err in finding that polygraph testing has been the subject of publication and peer review, we give this finding little weight.

130 Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594-95 (1993).
131 See United States v. Scheffer, 523 U.S. 303, 309-10 (1998) (citing sources debating the validity of CQT polygraphy dating to the late 1980s).
132 Again, 2003 was the year the National Research Council concluded that polygraph research had not developed or tested the psychological theories assumed to underlie the physical responses the polygraph measures. NAT’L RESEARCH COUNCIL, supra note 6, at ii, 2.

iii. Acceptable error rate

The superior court in Alexander found that the error rate of CQT polygraph testing is “sufficiently reliable” to be acceptable. The court reasoned that the studies cited by Dr. Raskin showed an accuracy rate of 89% to 98%, while those cited by Dr. Iacono had accuracy rates from 51% to 98%, with an average of 71%. Dr. Raskin estimated that the overall accuracy rate of CQT polygraph testing was around 90%. The court recognized a number of concerns that might affect the accuracy rate of polygraph exams in practice, including the “friendly examiner” hypothesis and the possibility of examinees using countermeasures to “beat” the test. But the court concluded that these concerns “are already built in to the error rate” and are relevant to the weight the jury should assign to the testimony, not to admissibility.

As a preliminary matter, the superior court appears to have misunderstood Dr. Iacono’s testimony. As discussed above, Dr. Iacono criticized each study he discussed, testifying that the accuracy rates reported in those studies were either invalid or not applicable to practical applications of the CQT technique in the field; he concluded that “it’s not possible to accurately estimate the error rate of the controlled question test when it’s used in real life applications.” The court’s conclusion that the various concerns discussed are “already built in to the error rate” has no support in the record: while individual studies may have tested specific variables such as countermeasures, neither expert cited any laboratory study that controlled for all of them.

Dr. Iacono also testified that field studies on polygraph testing are unreliable and often “contain a bias of potentially serious magnitude toward overestimating the accuracy” of the test. A typical study, according to Dr. Iacono, would look at cases where the defendant took a polygraph test and later confessed; in such cases, the polygraph chart would be blindly rescored and then compared to the confession. But Dr.
Iacono testified that failing a polygraph test often pressures a defendant into confessing, while passing the test substantially decreases the chance of a confession. As such, he explained, field studies are subject to a substantial selection bias: a case is most likely to end up in the study only if the defendant failed a polygraph test and subsequently confessed. When the study then rescores the polygraph chart, Dr. Iacono testified that it is not surprising the results exceed 90% accuracy. -37- 7326 In addition to potential flaws in the perceived accuracy rates of CQT tests, the empirical basis for polygraph examinations suffers from another fault: the lack of a reliable “base rate.”133 In the three cases currently before this court, each defendant was said to have passed his polygraph test; the relevant question for the factfinder is whether, given this fact, the defendant was likely truthful or whether the test was a false negative. To determine this likelihood, more information is required; specifically, information about the base rate of deceptive and truthful subjects. The lack of a reliable base rate estimate was the underlying reason for the Connecticut Supreme Court upholding its traditional per se ban on admitting polygraph evidence in State v. Porter.134 Noting “wide disagreement” about the accuracy rates for “a well run polygraph exam,” the court decided that, even if the estimates of polygraph proponents were accepted, the technique would still be “of questionable validity.”135 The court cited a field study by Dr. 
Raskin indicating a sensitivity of 87% and a specificity of 59%:136 “In other words, 13 percent of those who are in fact deceptive will be labeled 133 The “base rate” refers to the probability “of the target condition in the population or in the sample at hand — for security screening, this might refer to the proportion of spies or terrorists or potential spies or terrorists among those being screened.” NAT’L RESEARCH COUNCIL, supra note 6, at 46. A sample population of criminal suspects, for example, may have a higher base rate of deceivers than other sample populations. Id. at 47. 134 698 A.2d 739, 766-69 (Conn. 1997). 135 Id. at 764, 766. 136 “There are two distinct aspects to accuracy. One is sensitivity. A perfectly sensitive indicator of deception is one that shows positive whenever deception is in fact present: it is a test that gives a positive result for all the positive (deceptive) cases; that is, it produces no false negative results. The greater the proportion of deceptive examinees that appear as deceptive in the test, the more sensitive the test. Thus, a test (continued...) -38- 7326 as truthful . . . 
[and] 41 percent of subjects who are, in fact, truthful will be labeled as deceptive.”137 The court further reasoned that, even if a test is accurate, its probative value as scientific evidence depends on its “predictive value” — the likelihood “that a person really is lying given that the polygraph labels the subject as deceptive” and the likelihood “that a subject really is truthful given that the polygraph labels the subject as not deceptive.”138 This predictive value, the court explained, depends not only on the accuracy of the test but also “on the ‘base rate’ of deceptiveness among the people tested by the polygraph.”139 Because the Porter court found a “complete absence of reliable data on base rates,” it concluded that it had no possible way of assessing the test’s probative value.140 With that in mind, the court concluded that even if polygraph 136 (...continued) that shows negative when an examinee who is being deceptive uses certain countermeasures is not sensitive to deception. The other aspect of accuracy is specificity. An indicator that is perfectly specific to deception is one that always shows negative when deception is absent (is positive only when deception is present). It produces no false positive results. The greater the proportion of truthful examinees who appear truthful on the test, the more specific the test. Thus, a test that shows positive when a truthful examinee is highly anxious because of a fear of being falsely accused is not specific to deception because it also indicates fear.” NAT’L RESEARCH COUNCIL, supra note 6, at 38. 137 Porter, 698 A.2d at 766. 138 Id. 139 Id. at 766-67 (footnote omitted). 140 Id. at 768. As the Porter court described, “[t]he base rate is important because it can greatly accentuate the impact of the false positive and false negative rates arising from any given specificity and sensitivity values.” Id. at 767 n.53. 
For example, “[i]f one assumes base rates progressively higher than 50 percent, then, by definition, the number of deceptive examinees increases and the number of honest examinees (continued...) -39- 7326 evidence satisfies the Daubert standard, which it assumed without deciding, the probative value of such evidence is very low and substantially outweighed by its prejudicial effects.141 As in Porter, the record before us is devoid of reliable data about the base rate of deceptiveness among polygraph examinees outside of lab tests; we also have not found such data in academic literature. Absent some reliable estimate of this base rate there is no way to estimate the reliability of polygraph results, and thus no way to determine whether any particular accuracy rate is acceptable. We conclude that the superior court clearly erred in finding the error rate of CQT polygraph testing to be “sufficiently reliable.” Accordingly, this factor weighs against admitting polygraph evidence. iv. Standards for operation Under Daubert the court should consider “the existence and maintenance of standards controlling the technique’s operation.”142 The superior court in Alexander found “that although there is no single published protocol that all polygraphers must follow, that nonetheless there are published protocols and training criteria” that are sufficiently utilized so as to be considered standard. Additionally, the court found there was no indication that “Dr. Raskin did not properly administer the two exams.” Standards do control some aspects of polygraph testing and many states 140 (...continued) decreases.” Id. Thus, “even holding specificity and sensitivity rates constant, as the base rate increases the number of false negatives (the labeling of deceptive subjects as truthful) also rises and the number of false positives (the labeling of truthful subjects as deceptive) falls.” Id. 141 Id. at 768-69 142 Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594-95 (1993). 
-40- 7326 also have statutes governing polygraph test administration, examinees’ privacy rights, and licensing of examiners.143 To describe the standards for administration of polygraphs, Dr. Raskin pointed to New Mexico Evidence Rule 11-707 as providing “clear standards for tests to be offered as evidence” and described the rule as “a superior model for national standards.” He also referenced standards adopted by national polygraph organizations and standards imposed by government agencies. Rule 11-707 provides that a polygraph examiner’s opinion testimony is admissible if the examiner is qualified, the scoring method used is “generally accepted as reliable by polygraph experts,” the examiner was informed of relevant information regarding the examinee prior to the exam, two or more relevant questions were asked, three or more charts were taken, and the exam was recorded.144 However, what constitutes a “generally accepted” scoring method is not further defined. A “relevant question” is simply defined as “a clear and concise question which refers to specific objective facts directly related to the purpose of the examination and does not allow rationalization in the answer.”145 Even if we were to conclude that these standards are sufficient to “control[] the technique’s operation,”146 Rule 11-707 is not a national standard. As both the court in Alexander and Dr. Raskin acknowledged, there is no one “controlling” industry standard and there may be great differences in “generally accepted principles.” 143 See, e.g., La. Stat. Ann. §§ 37:2831-2854 (2018); Me. Rev. Stat. tit. 32, §§ 7351-7390 (2018); Nev. Rev. Stat. Ann. §§ 648.183-.199 (West 2017); Or. Rev. Stat. Ann. §§ 703.010-.310 (West 2018); Vt. Stat. Ann. tit. 26, §§ 2901-2910 (2018). 144 N.M. R. Evid. 11-707(C). 145 Id. 11-707(A)(4). 146 Daubert, 509 U.S. at 594. -41- 7326 It is clear that some aspects of the test lack standards, or at least consistent standards. 
Specifically, the formulation and ordering of questions,147 the conducting of the pretest interview,148 the choice of scoring system,149 and the evaluation of the examinee’s demeanor150 leave much to the examiner’s discretion. While the superior court’s finding regarding CQT protocols was not clearly erroneous, we conclude that the lack of clear controlling standards for CQT administration weighs against its admissibility.

147 See Synnott et al., supra note 36, at 68 (“The number of total questions asked, the order in which . . . questions are placed and whether any or all questions are repeated . . . [depend] on the situation, examiner’s preference and the school the examiner subscribes to.”).

148 Id. at 67 (“[D]epending on the situation, examiner’s personal preferences and the ‘polygraph school’ the examiner subscribes to, . . . [much of] the pre-test interview can vary greatly . . . . [and it] can last anywhere between 30 min and 2 h . . . .”).

149 Id. at 68 (describing examiner discretion to set cut-off points for numerical scoring systems and outlining several types of computerized scoring systems).

150 See NAT’L RESEARCH COUNCIL, supra note 6, at 16 (“[T]he polygraph examiner is likely to form impressions of the examinee’s truthfulness, based on the examinee’s demeanor . . . . These impressions are likely to affect the conduct and interpretation of the examination and might, therefore, influence the outcome and the validity of the polygraph examination.”).

v. General acceptance

The superior court found that the record is “inconclusive as to whether there is general acceptance within the relevant scientific community.” The State argues that CQT polygraphy has not gained general acceptance, while the defendants appear to argue primarily that “inconclusiveness on this factor goes to the weight and not the admissibility of the evidence.”

Both Dr. Raskin and Dr. Iacono testified about a variety of surveys regarding the acceptance of polygraphy. Dr. Iacono also testified about a number of scientific publications that conclude polygraph examinations are unreliable. Based on a review of this evidence and literature, it appears that the parts of the scientific community who regularly utilize polygraphy have — perhaps unsurprisingly — widely accepted the technique, while the broader scientific community views the technique more skeptically.151 In light of this record and the scientific literature, the superior court’s finding that it is “inconclusive” whether polygraphy is generally accepted is not clearly erroneous. But as the Supreme Court noted in Daubert, “ ‘a known technique which has been able to attract only minimal support within the community’ may properly be viewed with skepticism.”152 The Supreme Court’s comment appears particularly apt in this case. Given the decades-long debate over the validity of polygraph evidence, the apparent lack of development in the technique as a response to that debate, and the apparently lackluster support for the technique outside the community of practicing polygraph examiners, we conclude that this factor also weighs against admitting polygraph evidence.

151 We note that under Contreras v. State, 718 P.2d 129, 135 (Alaska 1986), the “relevant scientific community” for a general acceptance analysis excludes “those whose involvement with [the technique] is strictly limited to that of practitioner.” This would not exclude those who, like Dr. Raskin, both conduct research into polygraph testing and administer polygraph examinations. But it would exclude those who do only the latter.

152 Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594 (1993) (quoting United States v. Downing, 753 F.2d 1224, 1238 (3d Cir. 1985)).

vi. Other relevant factors

As noted above, both Daubert and Coon recognize that factors other than those discussed above may be relevant in some cases.
For example, Coon briefly mentions the possibility of “ ‘independent’ research funded by tobacco companies” carrying with it “the danger of a hidden litigation motive.”153 This is a relevant consideration in this case. Dr. Raskin, who testified at the Daubert hearing in favor of admitting polygraph evidence, is himself a practicing polygraph examiner and has financial ties to one manufacturer of polygraphs, earning royalties from the sale of polygraph machines he invented. Many of the studies cited as approving polygraph testing as scientifically valid were performed by Dr. Raskin or by other practicing examiners, and a number of the studies were published in polygraph industry publications. While we do not entirely discount this research and have examined it on its merits, we recognize that the polygraph industry has an obvious financial interest in confirming polygraph testing as valid and promoting its use and admissibility in court.

153 State v. Coon, 974 P.2d 386, 395 (Alaska 1999).

vii. Conclusion

In light of each of the factors discussed above, we conclude that on the evidence before us, CQT polygraph testing has not been shown to satisfy the standard for scientific evidence set forth in Daubert and Coon. We reiterate what we said in Pulakis: “polygraph proponents have not yet developed persuasive data demonstrating its reliability.”154 Absent such data, we are unconvinced that the opinion of polygraph examiners amounts to “scientific, technical, or other specialized knowledge” that “will assist the trier of fact to understand the evidence or to determine a fact in issue,” as required under Evidence Rule 702. Our opinion here does not mean that CQT polygraph testing will never be sufficiently reliable to pass muster as scientific evidence, but absent substantial evidence demonstrating that CQT polygraph testing produces reliable results based on sound, verifiable science, the results of CQT polygraph examinations cannot be admitted in evidence over objection.

154 Pulakis v. State, 476 P.2d 474, 479 (Alaska 1970).

V. CONCLUSION

We REVERSE the judgment of the court of appeals affirming the superior court’s order admitting Alexander’s polygraph evidence. We REVERSE the superior court’s order admitting Sharpe’s polygraph evidence. We AFFIRM the superior court’s order excluding Holt’s polygraph evidence. We REMAND Alexander’s and Sharpe’s cases to the superior court for further proceedings consistent with this opinion relating to their respective criminal charges. We also REMAND Holt’s case to the court of appeals for further proceedings as appropriate on Holt’s remaining points of appeal. We do not retain jurisdiction.