This interlocutory appeal, in which the Commonwealth is the appellant, arises from three murder proceedings in the Superior Court: the defendant Vao Sok is charged with the kidnapping, rape, and murder of Anmorian Or; the defendant Henry Juan Williams is charged with the murder and armed assault of Zachariah Johnson; and the defendants Herdius Evans and James Ware are jointly charged with the armed robbery and murder of Allan Lawrence Hill (and with other crimes). The Commonwealth sought to introduce in evidence against each defendant deoxyribonucleic acid (DNA) test results derived from the polymerase chain reaction (PCR) method. Citing Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), the defendant Evans filed a “motion in limine to exclude evidence of DNA testing or in the alternative to conduct a Daubert hearing,” which the other defendants joined. At the request of the Regional Administrative Justice, a judge in the Superior Court conducted an evidentiary hearing on the defendants’ consolidated claims. The judge subsequently entered a memorandum and order in which he ruled that the results of the DQa PCR-based DNA testing could be admitted in evidence against each defendant, but that the results of the Polymarker (PM) PCR-based DNA testing and the D1S80 PCR-based DNA testing should be excluded in each case.2 The Commonwealth filed a petition for relief under G. L. c. 211, § 3, challenging the judge’s decision with respect to the PM and D1S80 test results. The petition was considered by a single justice, who reserved and reported the cases along with three questions noted below.3 Since the entire matter is before us, we can resolve the issues without the necessity of framing responses to the reported *789questions. We conclude that the PCR-based tests at issue in this case meet the test of scientific reliability under Commonwealth v. Lanigan, 419 Mass. 15, 24, 26 (1994), and that there must be further proceedings in the Superior Court to decide whether the PM and D1S80 test results in these cases should be admitted in evidence.
1. Background. (a) The genetic and molecular basis of DNA typing is described in the appendix to our decision in Commonwealth v. Curnin, 409 Mass. 218, 227-231 (1991). There we pointed out that forensic DNA testing is directed at the examination and comparison of the characteristics of several “highly polymorphic alleles.” The Curnin case described the meaning of the scientific term “highly polymorphic alleles” as follows:
“A single DNA molecule contains approximately three billion rungs, or base pairs. Certain types of human genes . . . can occur in alternate forms (that is, with differing sequences of base pairs), each of which is capable of occupying a gene’s position on the DNA ladder. These alternate forms of genes are called ‘alleles,’ and are highly variable from one person to another. Alleles of a particular gene contain a different number of base pairs, and therefore are of different lengths.
“Most of the sequences of base pairs in all human DNA molecules are identical. However, roughly three million base pairs are alleles that vary in sequence among humans. The areas on the DNA ladder in which the DNA sequence varies are called ‘polymorphic sites.’ Some such sites are more polymorphic than others. Forensic DNA testing makes use of sites which are ‘highly polymorphic.’ ” Curnin, supra at 228.4
*790The variations or polymorphisms between alleles at a particular location may occur either in the particular sequence of the base pairs at a particular locus or in the length of a DNA fragment between two defined endpoints. It is the ability to detect and compare the alleles at a particular locus in one sample of DNA with the alleles at that same locus in another sample of DNA that forms the basis for the use of the technology in a forensic setting.
There are different methods used in DNA forensic testing including Restriction Fragment Length Polymorphism (RFLP) and PCR amplification and allele identification. RFLP analysis requires a larger segment of DNA than does PCR analysis and involves a time-consuming testing process. RFLP targets loci on DNA molecules that are known to have different lengths because of variations in the number of times that a sequence of base pairs is repeated. These loci are referred to as variable number tandem repeats (VNTRs). RFLP uses the seven-stage process described in Commonwealth v. Curnin, supra at 228-230, to examine and compare the length of various alleles and then engages in a series of calculations to determine how often those combinations of alleles occur in a given population. The RFLP process was not used in the cases before us.
PCR-based testing is an alternative method of analyzing forensic DNA samples that compares polymorphic DNA sequences through “allele-specific probe analysis,” a very different process from RFLP, which looks at the length of the DNA sequence. The goal of a PCR-based approach is to determine whether certain alleles are actually present or absent in a sample. Two DNA samples taken from the same individual will contain the same alleles. Samples from different individuals are apt to contain different alleles. Thus, finding the same alleles in two different samples supports the conclusion that the samples have a common source. As with RFLP, if a match is identified, calculations are performed to determine how often such a match is likely to occur in the population. Where an individual’s sample matches a provided sample of DNA, the individual cannot definitely be excluded as the possible source *791of the provided sample of DNA, but where key alleles do not match, the individual is excluded as the source.
PCR-based analysis involves the making of millions of copies of particular short segments of DNA in an amplification process, similar to the mechanism by which DNA normally replicates itself.5 After the segments are replicated, different genetic marker typing tests are performed, depending on the particular polymorphic locus being probed. The tests in the cases before us involved three polymorphic loci: the DQA1; the polymarker loci6; and the D1S80 (each locus was not examined for each defendant). Different typing kits were used to amplify and detect the genetic markers.7 After the genetic markers have been identified, the DNA profile is compared to another profile from a known source. If an appropriate correlation appears, a statistical analysis is performed, based on population databases, and the probability of a random match for a particular sequence is estimated from the frequency with which that sequence appears in the relevant databases.
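The statistical step described above can be illustrated with a minimal sketch. It assumes the conventional product rule with Hardy-Weinberg genotype proportions, which is a common approach but is not stated in this opinion to be the method used by CBR. The locus names, allele labels, and frequencies are hypothetical and are not drawn from the record or from any real population database.

```python
def genotype_frequency(allele_a, allele_b, allele_freqs):
    """Expected genotype frequency at one locus.

    Assumes Hardy-Weinberg proportions: p*p for a homozygote,
    2*p*q for a heterozygote.
    """
    p = allele_freqs[allele_a]
    q = allele_freqs[allele_b]
    if allele_a == allele_b:
        return p * p
    return 2 * p * q


def profile_frequency(profile, database):
    """Multiply per-locus genotype frequencies (product rule).

    Assumes the loci are statistically independent of one another.
    """
    result = 1.0
    for locus, (allele_a, allele_b) in profile.items():
        result *= genotype_frequency(allele_a, allele_b, database[locus])
    return result


# Hypothetical allele frequencies for two made-up loci.
database = {
    "LOCUS_A": {"1": 0.2, "2": 0.3, "3": 0.5},
    "LOCUS_B": {"A": 0.6, "B": 0.4},
}

# Hypothetical profile: heterozygous at LOCUS_A, homozygous at LOCUS_B.
profile = {"LOCUS_A": ("1", "2"), "LOCUS_B": ("B", "B")}

estimate = profile_frequency(profile, database)
print(f"Estimated profile frequency: {estimate:.4f}")
```

In this sketch the per-locus frequencies are 2 x 0.2 x 0.3 and 0.4 x 0.4, so adding a locus can only lower the estimated chance of a coincidental match.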
PCR-based testing is extremely valuable for forensic science. It permits DNA profiling of samples containing much smaller quantities of DNA — such as saliva on a cigarette butt — than can be tested by the RFLP method, and test results are available *792promptly, often within twenty-four hours. “[M]ost PCR tests permit exact identification of each allele at a particular locus, eliminating the measurement imprecision of RFLP” and its accompanying statistical analysis. United States v. Lowe, 954 F. Supp. 401, 409 (D. Mass. 1996). “Indeed, defendants in criminal cases have been known to be as interested in securing [the] use [of PCR analysis] in this context as prosecutors.”8 Id., and cases cited.
(b) All the DNA testing in the cases before us was done at CBR Laboratories in Boston (CBR) in 1994 and 1995. CBR is a wholly owned subsidiary of the Center for Blood Research, a non-profit research institution affiliated with Harvard Medical School. The testing was performed for the State police or Boston police department that had investigated the crimes by Dr. David H. Bing, the director of CBR, and two of his assistants. PCR amplification and typing were performed, as deemed appropriate in each case, by means of three available scientific DNA genetic marker kits: the Amplitype HLA DQa Amplification and Typing Kit9; the AmpliType PM PCR Amplification and Typing Kit; and the AmpliFLP D1S80 PCR Amplification Kit. These kits are marketed by Perkin-Elmer Corporation, Inc., under licenses from Roche Molecular Systems, which holds the patents on the kits. In his memorandum, the judge described in considerable detail the steps taken in the use of the kits, and explicitly set forth the testing process in each case, the results of the tests, and Dr. Bing’s conclusions, as reported to the investigating authorities.
We need not summarize the contents of the reports beyond noting the following. The Vao Sok and Williams cases involved test results from the DQA1, PM, and D1S80 loci, and the Evans and Ware cases involved results from the DQA1 and PM loci. Each report stated conclusions as to whether the particular defendant could or could not be excluded as the donor of sample DNA that had been extracted from the different items provided to CBR by the police. Where relevant, the reports also contained *793an estimation of population frequencies. It is important to point out here that the evidentiary value of PCR-based tests lies in their combination with one another. The likelihood of a coincidental match between different samples of DNA is relatively high if only DQA1 testing is done. When that testing is combined with PM testing, however, the joint power of discrimination of the two tests is over 99.9%; that is, over 99.9% of the time, two randomly-chosen individuals will have a different combination of DQa and PM types. When D1S80 testing is used in conjunction with DQA1 and PM testing, the power of discrimination increases to 99.99%. Thus, testing at multiple genetic loci increases the likelihood of detecting sampling errors or contamination, and the PM and D1S80 tests provide an important tool in cross-validation of DQA1 testing to ensure accuracy in the exclusion of an individual based on a comparison of different DNA samples.
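The way in which combining tests raises the joint power of discrimination can be shown with a short sketch. It assumes that the tests are statistically independent, so that two individuals go undistinguished only if every test fails to distinguish them. The per-test values below are hypothetical and chosen only for illustration; they are not figures from the record.

```python
from math import prod


def combined_power_of_discrimination(per_test_pd):
    """Joint power of discrimination for independent tests.

    Computed as 1 minus the probability that every test fails
    to distinguish two randomly chosen individuals.
    """
    for pd in per_test_pd:
        if not 0.0 <= pd <= 1.0:
            raise ValueError("each power of discrimination must lie in [0, 1]")
    return 1.0 - prod(1.0 - pd for pd in per_test_pd)


# Hypothetical per-test powers of discrimination (illustrative only).
dqa1 = 0.93
pm = 0.99
d1s80 = 0.95

two_tests = combined_power_of_discrimination([dqa1, pm])
three_tests = combined_power_of_discrimination([dqa1, pm, d1s80])

print(f"DQA1 + PM:         {two_tests:.4%}")
print(f"DQA1 + PM + D1S80: {three_tests:.4%}")
```

With these assumed inputs, the two-test figure exceeds 99.9% and the three-test figure exceeds 99.99%, which matches the pattern described above.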
2. The judge’s decision. We turn now to the judge’s decision. A considerable portion of his memorandum is devoted to the following matters: a description of the status of forensic application of DNA technology at the time of the evidentiary hearing, which concluded on October 20, 1995; a summary of five decisions of this court involving DNA testing; a description of the PCR method, its use of allele-specific probe analysis, and the processes in the three kits identified above; the background of CBR and Dr. Bing; the professional background of the other witnesses who testified10; the test results and conclusions reached by Dr. Bing in the cases before us and in the McNickles case (which is not before us, see note 2, supra); and a summary of appellate decisions from seventeen other States discussing the PCR method and, but for one decision, testing at DQA1.
The judge saw the problem before him as calling, at the outset, for a determination of the “reliability of [the PCR-based] methodology and [the DQA1, PM, and D1S80 test kits] in general, and specifically their reliability when employed in a forensic setting.” The judge recognized that resolution of the *794problem required application of the principles set forth in Commonwealth v. Lanigan, 419 Mass. 15, 25-26 (1994), with the overarching issue being the scientific “reliability of the process underlying the expert testimony.” Id. at 27.
The judge described twenty “Problems with the PCR Based Analysis in General” that might inhere in any PCR analysis. We set out these concerns in the margin.11 With respect to these concerns, the judge stated that, “[a]lthough there were substantial differences of opinion as to the degree of seriousness of the various problems among the experts who testified, in most instances the experts on both sides seemed to agree that the problems do exist, at least potentially.”
*795We summarize the judge’s conclusions (expressing his reasoning with respect to the merits and his rulings of law) as follows:
(1) The PCR method is a scientifically valid and reliable method of amplifying samples of DNA, and both allele-specific probe analysis and fragment length polymorphism analysis are scientifically valid methods of comparing DNA samples if the primers and probes have been accurately synthesized and prescribed procedures are followed.
(2) The three kits used by CBR are “undoubtedly . . . designed and intended to produce reliable results.” The kits will provide such results only if they are used in accordance with the protocols established by their manufacturer and are processed by knowledgeable and well-trained analysts with an adequate background in forensic DNA analysis.
(3) There is a considerable amount of subjective judgment involved in the interpretation of the results of PCR-based tests, and experience is necessary to interpret the data.
(4) At the time of the hearing, there was no provision for mandatory accreditation of laboratories doing DNA testing or their personnel.12 There is a recognized need for a workable quality assurance program that should be supported by blind proficiency testing.13 The lack of such controls, however, should not result in a moratorium on the forensic use of DNA identification methodology, and forensic scientists should not be precluded from using new techniques, provided the reliability of the techniques is established.
(5) With the exception of the testing done in the McNickles case, the testing done at CBR by Dr. Bing and his assistants *796appears to have been done in accordance with established protocols. No serious deviation from protocols was raised as to the testing in any of the cases except the McNickles case.
(6) The DQA1 technique and the Amplitype HLA DQa Amplification and Typing Kit are reliable, and the DQa results obtained in each of the cases may be admitted in evidence. The judge based this ruling on scientific articles he had examined, the testimony at the hearing, and the “rather overwhelming support that has been given to the [DQa] technique by the appellate courts of other jurisdictions.”
(7) The Commonwealth did not sustain its burden of proof with regard to the reliability of the PM and D1S80 test kits, and the results of tests utilizing those kits are therefore excluded.
3. Standards of review. The judge’s conclusions are to be examined under the standard governing an offer of scientific evidence and that applicable to review of a trial court decision on the admissibility of such evidence. With respect to scientific evidence, we have recently stated the governing principles in Commonwealth v. Sands, 424 Mass. 184, 185-186 (1997), as follows:
“In Commonwealth v. Fatalo, 346 Mass. 266, 269 (1963), we adopted the ‘general acceptance’ test of Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923), which required that courts consider ‘whether the community of scientists involved generally accepts the theory or process’ underlying the evidence to be introduced. Subsequently, in Commonwealth v. Lanigan, 419 Mass. 15, 26 (1994), we also adopted, in part, the reasoning of Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and held that ‘a proponent of scientific opinion evidence may demonstrate the reliability or validity of the underlying scientific theory or process by some other means, that is, without establishing general acceptance,’ because the touchstone of admissibility is reliability, and not necessarily general acceptance within the scientific community. Lanigan, supra at 24, 26, and cases cited. We also stated that, ‘[w]e suspect that general acceptance in the relevant scientific community will continue to be the significant, and often the only, issue.’ Id. at 26. Thus, a party seeking to introduce scientific evidence may lay a foundation either by showing that the underlying scientific theory is *797generally accepted within the relevant scientific community, or by showing that the theory is reliable or valid through other means. See id.” (Footnote omitted.)
In considering the issue of scientific validity, our review is de novo because a trial judge’s conclusion will have applicability beyond the facts of the case before him. “ ‘The question of general acceptance of a scientific technique, while referring to only one of the criteria for admissibility of expert testimony, in another sense transcends that particular inquiry, for, in attempting to establish such general acceptance for purposes of the case at hand, the proponent will also be asking the court to establish the law of the jurisdiction for future cases.’ Jones v. United States, 548 A.2d 35, 40 (D.C. App. 1988). Application of less than a de novo standard of review to an issue which transcends individual cases invariably leads to inconsistent treatment of similarly situated claims.” Brim v. State, 695 So. 2d 268, 274 (Fla. 1997), citing People v. Miller, 173 Ill. 2d 167, 203 (1996) (McMorrow, J., specially concurring), cert. denied, 117 S. Ct. 1338 (1997). The question of the validity of a particular scientific methodology is thus entitled to the same standard of review as a conclusion of law. See In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 749 (3d Cir. 1994), cert. denied sub nom. General Elec. Co. v. Ingram, 513 U.S. 1190 (1995) (“evaluating the reliability of scientific methodologies and data does not generally involve assessing the truthfulness of the expert witnesses and thus is often not significantly more difficult on a cold record”).14
Once the issue of scientific validity is resolved, the question *798of admissibility turns to traditional evidentiary inquiries. Here we are primarily concerned with whether the particular test kits were reliable when utilized by CBR. The burden is on the proponent of test results to show that the testifying expert properly performed a scientifically valid methodology in arriving at his opinion. See Commonwealth v. Beausoleil, 397 Mass. 206, 220-221 (1986) (trial judge must make threshold inquiry concerning reliability of testing procedures before determining admissibility); Commonwealth v. Whynaught, 377 Mass. 14, 19 (1979) (determination of accuracy of testing left to trial judge’s discretion). See also United States v. Martinez, 3 F.3d 1191, 1198 (8th Cir. 1993) (error in application of valid methodology can provide basis for challenging reliability of principle itself). This entails a fact-based inquiry, which is appropriately resolved by the trial judge on the facts before him, and his decision is accorded considerable deference. See Commonwealth v. Ianello, 401 Mass. 197, 200 (1987); Commonwealth v. Pikul, 400 Mass. 550, 553 (1987), and cases cited; Griffin v. General Motors Corp., 380 Mass. 362, 365-366 (1980). Similarly, while the truthfulness of an expert witness is not a concern when addressing the validity of scientific methodology, where a particular laboratory is attempting to justify its application of that methodology, the credibility of a witness before the trial judge may have bearing on his determination that the results of tests as performed are not sufficiently reliable to be put before the jury. See Ward v. Commonwealth, 407 Mass. 434, 438-439 & n.2 (1990). This, too, is a fact-based inquiry and is entitled to deference. See Commonwealth v. Pikul, supra.
4. Application of the standards of review. The conclusions in the judge’s decision set forth above are directed in large part to *799the issue of the scientific validity of PCR-based testing. In examining these conclusions de novo, we keep in mind that forensic DNA typing “is a rapidly developing field, and new understanding may be expected as more studies and tests are conducted.” Lanigan, supra at 27. Thus, consideration of the scientific validity of the DNA tests in issue is not frozen at the date almost two years ago when the hearing was held. Rather, our determination must be made in light of current scientific knowledge.
DNA profiling evidence from RFLP analysis has been shown to be a scientifically valid methodology. See Commonwealth v. Lanigan, supra; Commonwealth v. Daggett, 416 Mass. 347, 350 n.1 (1993). Similarly, DNA profiling evidence by PCR-based analysis, both in general and specifically at the DQA1 locus, has been demonstrated to be a valid technique, as noted both in scientific research and in a large number of appellate cases.15 Thus, although not directly in issue in this appeal, the judge’s conclusions validating both the RFLP and the PCR-based methodologies are clearly correct.
We next confront the issue of the validity of PCR-based testing at the PM and D1S80 loci. We do not consider the problems identified in the judge’s memorandum, see note 11, supra, as *800presenting any insurmountable obstacles to the admissibility of PCR-based testing at these loci as a valid scientific methodology. The judge characterized these problems as “general” in nature and related to PCR methodology as an over-all theory, but even though some of the problems described by the judge could affect testing at the DQA1 locus, he ruled that testing at this locus was reliable. In addition, several of the problems are, the judge said, either infrequent or rare. Further, the judge concluded that the PCR-based tests at all three loci are designed to be reliable, so long as they are used in accordance with established protocols.
The judge’s ruling with respect to the PM and D1S80 test kits was based, at least in part, on his determination that the underlying methodology for these tests had not yet reached a stage of acceptance sufficient to support a ruling of scientific validity. In so deciding, the judge considered the following: (a) two Commonwealth expert witnesses were naturally, and perhaps understandably, biased in favor of the admissibility of the PCR-based tests16; (b) other Commonwealth experts were not nearly as enthusiastic in their endorsement of the PM and D1S80 techniques as they were of the DQa technique17; (c) the PM technique has a weakness not present in the DQa technique because the PM test must use a compromise temperature to test the multiple genetic markers; and (d) at the time of the evidentiary hearing, the PM technique had not been reviewed by any appellate court, and the D1S80 technique had received only cursory review by the Court of Appeals of Georgia in Redding v. State, 219 Ga. App. 182 (1995).
We conclude, based on events occurring since the judge *801concluded the hearing, that testing at the PM and D1S80 loci has been scientifically validated. An authoritative work in this field by the National Research Council, The Evaluation of Forensic DNA Evidence (1996) (1996 NRC Report), has found that PM testing is “beginning to be widely used” and has been validated with tests for “robustness” (reliability) by various studies. Id. at 72. The Report goes on to find that the “value [of the D1S80 technique] for forensic analysis has been validated in a number of tests.” Id. at 72, 117. The Report concludes that the “technology [for DNA profiling] and the methods for estimating frequencies and related statistics have progressed to the point where the admissibility of properly collected and analyzed DNA data should not be in doubt.” Id. at 73. In the Lanigan decision, supra, we treated the preceding NRC Report published in 1992 as persuasive, and we do the same with respect to the current study.
In addition to the 1996 NRC Report, other validation studies have been issued that demonstrate the reliability of PM and D1S80 testing. See, e.g., Gross, HLA DQA1 and Polymarker Validations for Forensic Casework: Standard Specimens, Reproducibility, and Mixed Specimens, 41 J. Forensic Sci. 1022 (1996) (concluding that DQa and PM tests are suitable for forensic casework); Walkinshaw, DNA Profiling in Two Alaskan Native Populations Using HLA-DQA1, PM, and D1S80 Loci, 41 J. Forensic Sci. 478 (1996) (validating statistical independence of DQa, PM, and D1S80 sites); Word, Summary of Validation Studies from Twenty-Six Laboratories in the United States and Canada on the Use of the AmpliType PM PCR Amplification and Typing Kit, 42 J. Forensic Sci. 39 (1997) (concluding that PM test meets quality assurance guidelines and is generally accepted in the scientific community for forensic use).
We have not been referred to any published scientific study or article that questions the reliability of PCR-based DNA analysis or its general acceptance in the relevant community of forensic scientists. It appears that the Federal Bureau of Investigation (FBI) now uses PM and D1S80 testing, considers it reliable, and has persuaded a judge of the United States District Court for the District of Massachusetts to admit PM and D1S80 test results in evidence. See United States v. Lowe, *802954 F. Supp. 401, 418 (D. Mass. 1997).18 Indeed, Dr. Budowle, the expert from the FBI who testified in this case and who had some hesitancy about PM and D1S80 testing in their early stages, was involved in studies that have now validated PM and D1S80 testing for forensic use. See 1996 NRC Report, supra at 72.
Further, the courts that have considered PM and D1S80 testing to date have found both to be scientifically reliable when properly done.19 Such cases probably represent the first wave of what undoubtedly will be many more decisions accepting PM and D1S80 testing for consideration by triers of fact.20 Thus we are satisfied that PCR-based testing at the PM and D1S80 loci is a scientifically valid means of comparing DNA profiles and *803that such testing is admissible as evidence when the tests are properly conducted.21
We now reach the most troubling aspect of the judge’s decision. As we have discussed, the judge’s conclusions appear to rest to a great extent on the issue of the scientific validity of PCR-based PM and D1S80 testing, a point we have now resolved in favor of the Commonwealth. There is another part of the judge’s decision, however, where he appears to find that the results of the PM and D1S80 tests, as they were performed by CBR, were not sufficiently trustworthy to be admitted. The judge took a view of CBR, and based on the evidence before him, he found that (i) CBR’s experience with the PM and D1S80 testing kits was substantially less than their experience with *804testing at the DQa locus22; (ii) CBR was in the process of completely revamping its protocols for test interpretations, implying, but not explicitly finding, that at least for those tests with which the laboratory had scant experience, results might have been improperly tabulated; (iii) no written policies or protocols existed at CBR as to how such interpretations were to be made and no witness was able to articulate any such policies or protocols “to [the judge’s] satisfaction”; (iv) at the time that the PM and D1S80 tests were run on the defendants’ DNA samples, neither the Amplitype PM PCR Amplification and Typing Kit nor the AmpliFLP D1S80 PCR Amplification Kit contained user’s guides, as contrasted with the kit used to test at DQA1. These findings suggest that, because the judge did not have confidence in the PM and D1S80 testing done at CBR in these cases, he decided, as a matter of fact, that the results should not be admitted.
The findings concerning the reliability of the test kits, however, were not referred to by the judge in his final conclusions. The Commonwealth points out that, in those final conclusions, the judge stated that the test kits were designed to produce reliable results if properly used, that CBR performed the tests in accordance with established protocols, and that no serious deviation from the protocols was raised as to the testing in any of the cases except the McNickles case. In the McNickles case, the judge was careful to detail the deficiencies in the actual testing. Further, the judge’s reference to CBR’s interpretation of test results in two of his findings has no impact on the question of admissibility, but rather is a matter of the weight to be given to the results, if they were otherwise properly determined.23 The judge’s ultimate conclusion, referred to above, was that the *805Commonwealth had not met its burden of proof “with regard to the reliability of the [PM] and D1S80 kits.” This conclusion follows other conclusions and discussion by the judge directed more at the validity of the PM and D1S80 techniques than at CBR’s actual performance of the tests. The nagging impression left by the judge’s conclusions, when considered in context, is that he may have excluded the PM and D1S80 results because the techniques were simply “too new” to be trusted. There is no way of knowing what the judge might have done had he had access to the plethora of material before this court in support of the scientific validity of the PM and D1S80 techniques.
As might be expected, the internal inconsistency in the judge’s decision has confused the parties, with the Commonwealth, on the one hand, arguing that the judge’s decision is based solely on scientific validity, and the defendants contending, on the other hand, that the decision is one rooted in fact and credibility. In fairness to both sides, we think it necessary for the concerns outlined by the judge to be reexamined in further proceedings in the Superior Court.24 The questions will be whether, notwithstanding the lack of user guides for the PM and D1S80 test kits and CBR’s lack of experience in these types of testing at the time it conducted the tests, CBR in fact properly utilized the kits to achieve reliable results,25 and whether the methods used by CBR to interpret the results, in *806the opinion of those who perform and interpret these tests, were sufficient to allow admission of CBR’s interpretations in evidence, subject to questions as to their weight. As the proponent of the evidence, the Commonwealth will have the initial burden of furnishing the additional evidence it thinks is required to resolve the ambiguity, with the defendants, of course, having the opportunity to challenge the Commonwealth’s evidence through cross-examination and other evidence of their own.26 After the further proceedings, the judge will make appropriate further findings and rulings.27
5. Additional considerations. With respect to any of the test results that may actually be introduced at trial, the Commonwealth asks us to decide whether the so-called ceiling principle must be factored into the calculations of allele frequencies in PCR-based testing. See Commonwealth v. Lanigan, 419 Mass. 15, 26-27 (1994) (discussing the ceiling principle and allowing it to be used in calculating the statistical significance of RFLP test results). The question did not come up directly in these cases, but it is in issue in Commonwealth v. Rosier, post 807, 815-817 (1997), and reference may be had to that decision for guidance.
6. Disposition. The case is remanded to the county court for the entry of a judgment consistent with this opinion.
So ordered.
The judge also ordered the exclusion of the results of PCR-based DNA testing performed in the case of another defendant, Robert McNickles, who was charged with rape, murder, and other crimes. The McNickles case is not before us. The Commonwealth accepted the judge’s order and, apparently, has had further PCR-based DNA testing in that case performed by a different laboratory.
“Whether DNA evidence obtained through PCR genetic testing, including through the DQa, Polymarker, and D1S80 tests is admissible under Commonwealth v. Lanigan, 419 Mass. 15 (1994).
“Whether the basis for the judge’s exclusion of the Polymarker and D1S80 tests was based on a finding that these tests were not reliable in principle or was based on a finding that these particular tests in this particular lab were *789unreliable because of the way in which these tests were performed and problems at the lab in question.
“If the tests were excluded because of problems in the testing at the lab in question and the Full Court decides they are reliable in principle, whether the tests should be admitted into evidence and the question as to the weight given the tests due to problems at the lab should be left to the jury.”
Some additional definitions of terms used in this opinion are provided to assist in the understanding of the technology involved. The position that a gene or other DNA fragment occupies on the DNA molecule is called a locus. Each person has two genes, each with possibly a multiple of alleles, at each locus. An individual’s genotype is his or her gene makeup. The genotype for a group of loci that have been analyzed is the individual’s DNA profile. A genetic marker is “an easily detected gene or chromosome region used for identification.” See National Research Council, The Evaluation of Forensic DNA Evidence 13, 217 (1996) (1996 NRC Report).
This amplification process consists of three steps as follows:
“First, each double-stranded segment is separated into two strands by heating. Second, these single-stranded segments are hybridized with primers, short DNA segments (20-30 nucleotides in length) that complement and define the target sequence to be amplified. Third, in the presence of the enzyme DNA polymerase, and the four nucleotide building blocks (A, C, G, and T), each primer serves as the starting point for the replication of the target sequence. A copy of the complement of each of the separated strands is made, so that there are two double-stranded DNA segments. This three-step cycle is repeated, usually 20-35 times. The two strands produce four copies; the four, eight copies; and so on until the number of copies of the original DNA is enormous.”
1996 NRC Report, supra at 69-70.
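The doubling described in the quoted passage can be illustrated with a short calculation. The following is a minimal sketch only, assuming an idealized reaction in which every cycle exactly doubles the number of double-stranded target segments; actual amplification efficiency is lower. The function name `pcr_copies` is hypothetical and does not come from the 1996 NRC Report.

```python
def pcr_copies(initial_segments: int, cycles: int) -> int:
    """Idealized count of double-stranded target segments after PCR.

    Assumption (not from the source): each cycle doubles every segment
    with perfect efficiency, so the count grows as a power of two.
    """
    if initial_segments < 0 or cycles < 0:
        raise ValueError("initial_segments and cycles must be non-negative")
    return initial_segments * 2 ** cycles


if __name__ == "__main__":
    # The quoted passage gives 20-35 cycles as the usual range.
    for n in (1, 2, 3, 20, 35):
        print(f"{n:>2} cycles: {pcr_copies(1, n):,} segments")
```

Under this idealized assumption, a single starting segment yields roughly one million segments after 20 cycles and tens of billions after 35, which is the sense in which the number of copies becomes "enormous."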
The polymarker loci consist of five different loci: LDLR (low-density-lipoprotein receptor); GYPA (glycophorin A); HBGG (hemoglobin gamma globin); D7S8 (an anonymous genetic marker on chromosome seven); and GC (group-specific component).
The DQA1 and polymarker loci have discrete alleles which are detected by sequence-specific probes, while the D1S80 has discrete allelic products, which can be separated by electrophoresis on sequencing gels (a process using semisolid mediums to separate molecules by their rate of movement in an electric field). The resulting allelic products are then visualized directly.
PCR-based DNA analysis has been used to clear persons suspected of a crime and to exonerate defendants who have been wrongfully convicted. See generally E. Connors, T. Lundregan, N. Miller, & T. McEwen, Convicted by Juries, Exonerated by Science: Case Studies in the Use of DNA Evidence to Establish Innocence After Trial (U.S. Dep’t of Justice 1996) (study of twenty-eight cases in which PCR-based DNA evidence exonerated defendants after conviction).
DQa is the product of the DQA gene, found at the DQA1 locus.
In addition to Dr. Bing and his two assistants, six other expert witnesses testified. Dr. Bing and the six other expert witnesses all hold doctorates and, between them, are experienced in the areas of molecular biology, microbiology, biochemistry, DNA technology, DNA testing, forensic uses of PCR testing and quality control issues, molecular genetics, and human population genetics. The judge also examined 135 exhibits and took a view of CBR.
(1) Contamination of forensic samples with other human DNA, which causes the contaminant to be amplified and could lead to a false result.
(2) Lack of power to discriminate among alleles.
(3) Typing too much or too little DNA.
(4) Problems with the differential extraction procedure used to break open cells and release the DNA.
(5) Problems with the thermal cycler (the machine used to amplify the DNA).
(6) Problems with the primers (segments of DNA used as signals for the DNA polymerase to attach to the DNA and begin forming complementary strands).
(7) Failure to detect all alleles.
(8) Insufficient control testing within each test kit, used to test whether all alleles that should be detected are detected.
(9) Allelic dropout, where a sequence of DNA that is originally present in the sample is not formed during the copying process, resulting in a mistyping.
(10) Differential or preferential amplification, when one allele is amplified to a greater degree than another allele.
(11) The presence of inhibitors (molecules that hinder accurate typing).
(12) Degradation of the DNA so that no conclusive testing can be done.
(13) Running a yield gel to determine whether the primers have attached to the DQA1 locus.
(14) Cross-hybridization, when a probe attaches to an allele other than the allele it was designed to detect.
(15) Misincorporation and base pair mismatches.
(16) Pseudo-genes, which involve mistyping when two genes are detected.
(17) Problems with the interpretation of “dots” on the test strips.
(18) Mixtures of the concentration of DNA from two individuals.
(19) Problems with frequency calculations.
(20) Lack of regulation and accreditation for forensic laboratories performing PCR testing.
The judge observed that, at the time of the hearing, only the Federal Bureau of Investigation (FBI) laboratory was required to follow the so-called TWGDAM Guidelines. These are guidelines for assuring quality control, developed by the Technical Working Group on DNA Analysis and Methods (hence TWGDAM), a group of forensic DNA analysts from government and private laboratories. The guidelines have been endorsed by the American Society of Crime Laboratory Directors-Laboratory Accreditation Board, which, along with other organizations, provides standards for accreditation.
TWGDAM recommends that a laboratory have one fully blind proficiency test a year. This is a test of evidence samples for DNA and is called “blind” because the laboratory personnel are not told the test is being conducted. The test looks for errors in routine and other operations. “However, the logistics of constructing fully blind proficiency tests are formidable. The ‘evidence’ samples have to be submitted through an investigative agency so as to mimic a real case, and unless that is done very convincingly, a laboratory might well suspect that it is being tested.” 1996 NRC Report, supra at 24.
In concluding that the question of scientific validity is subject to de novo review, we depart from the standard of review utilized by the Federal appellate courts after the United States Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). The Federal courts generally apply an “abuse of discretion” or “manifestly erroneous” standard to the review of a District Court’s ruling on the admissibility of expert testimony. See, e.g., United States v. Dorsey, 45 F.3d 809, 812 (4th Cir.), cert. denied, 515 U.S. 1168 (1995); Habecker v. Clark Equip. Co., 36 F.3d 278, 284, 290 (3d Cir. 1994), cert. denied, 514 U.S. 1003 (1995). But see In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 750 (3d Cir. 1994), cert. denied sub nom. General Elec. Co. v. Ingram, 513 U.S. 1190 (1995) (appellate court will give “more stringent review” or “hard look” to District Court exclusionary ruling with respect to scientific opinion testimony where that ruling results in summary or directed judgment). In Joiner v. General Elec. Co., 78 F.3d 524, 529, 535 (11th Cir. 1996), cert. granted, 117 S. Ct. 1243 (1997), the United States Court of Appeals for the Eleventh Circuit held that a “particularly stringent standard of review” was applicable to a trial court’s exclusion of expert testimony. The Supreme Court granted certiorari to answer the question, “What is standard of appellate review for trial court decisions excluding expert testimony under Daubert[, supra].” 65 U.S.L.W. 3619 (Mar. 18, 1997).
By contrast, many State appellate courts have continued to analyze the admissibility of scientific evidence under Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923), and apply a de novo standard of review. See, e.g., State v. Bogan, 183 Ariz. 506, 509 (Ct. App. 1995). Although we have, in concept, adopted Daubert, we nonetheless shall continue to review decisions concerning scientific validity under a de novo standard, because the validity of a scientific methodology does not vary according to the circumstances of a particular case and because a lower court’s conclusion will have applicability beyond the facts in the case before it.
See United States v. Hicks, 103 F.3d 837, 846-847 (9th Cir. 1996), cert. denied, 117 S. Ct. 1483 (1997); Seritt v. State, 647 So. 2d 1, 4-5 (Ala. Crim. App. 1994); Harmon v. State, 908 P.2d 434, 442 (Alaska Ct. App. 1995); State v. Bogan, 183 Ariz. 506, 509 (Ct. App. 1995); People v. Morganti, 43 Cal. App. 4th 643, 656 (1996); State v. Hill, 257 Kan. 774, 786 (1995); State v. Spencer, 663 So. 2d 271, 275 (La. Ct. App. 1995); People v. Lee, 212 Mich. App. 228, 281-282 (1995); State v. Hoff, 904 S.W.2d 56, 59 (Mo. Ct. App. 1995); State v. Moore, 268 Mont. 20, 34-36 (1994), overruled on other grounds, State v. Gollehon, 274 Mont. 116 (1995); State v. Dishon, 297 N.J. Super. 254, 277 (App. Div. 1997); State v. Lyons, 324 Or. 256, 263-269 (1996); State v. Moeller, 548 N.W.2d 465, 483 (S.D. 1996); Campbell v. State, 910 S.W.2d 475, 479 (Tex. Crim. App. 1995), cert. denied, 517 U.S. 1140 (1996); Spencer v. Commonwealth, 240 Va. 78, 91, cert. denied, 498 U.S. 908 (1990); State v. Gentry, 125 Wash. 2d 570, 581, cert. denied, 516 U.S. 843 (1995). See also Coleman v. Thompson, 798 F. Supp. 1209, 1213-1214, 1217 (W.D. Va.), cert. denied, 504 U.S. 992 (1992) (discussing PCR-based DNA test results, without reaching the question of their admissibility). But see Murray v. State, 692 So. 2d 157, 163 (Fla. 1997) (reversing ruling that evidence of PCR-based DNA testing was admissible, citing inadequate analysis of methodology, both testing and statistical calculations); State v. Carter, 246 Neb. 953, 984 (1994) (error to adm
These two experts were Dr. Bing, the scientist in charge of the testing at CBR, and another expert, who was employed by the manufacturer of the test kits in question, Roche Molecular Systems, Inc.
On this point, the judge referred to testimony by Dr. Bruce Budowle, the director of the FBI research laboratory in Quantico, Virginia. Dr. Budowle indicated that, at the time of the hearing, the PM technique had been in use for about a year and, because of its limited amount of information, was not going to be used in paternity or diagnostics. He also stated, in response to a question asking for his opinion of the reliability of the PM testing system, that he thought “it can be used in a reliable fashion to develop valid results for forensic applications.” With regard to D1S80, Dr. Budowle indicated that it had just recently begun to be used for case work in the FBI and that the FBI does not use the D1S80 kit that was used by CBR. The judge also referred to testimony by another expert that none of the kits used here is used in his laboratory.
This decision exhaustively reviews the PCR method and the reliability of PM and D1S80 testing and concludes as follows:
“Based on the favorable description by the National Research Council’s Commission on Forensic DNA Science, the peer-reviewed studies, the expert testimony at the Daubert hearing, and the lack of any scientific evidence disputing the reliability of the PCR methodology at any of the three loci [viz., DQA1, PM, and D1S80], the Court finds that the PCR methodology passes Daubert muster with respect to the DNA profiling at the Polymarker and D1S80 loci. The relative lack of experience with the D1S80 loci testing system (as contrasted with the other loci) may affect the weight of the evidence, but the government has demonstrated that the methodology is reliable.”
United States v. Lowe, 954 F. Supp. 401, 418 (D. Mass. 1997).
See United States v. Beasley, 102 F.3d 1440, 1447 (8th Cir. 1996), cert. denied, 117 S. Ct. 1856 (1997) (validating DQA1 and PM testing); United States v. Shea, 957 F. Supp. 331, 338 (D.N.H. 1997) (validating DQA1, PM, and D1S80 testing; analysis done by FBI laboratory); United States v. Lowe, supra (DQA1, PM, and D1S80); Brodine v. State, 936 P.2d 545 (Alaska Ct. App. 1997) (validating PM, DQA1, and D1S80 testing); Redding v. State, 219 Ga. App. 182, 185 (1995) (validating DQA1 and D1S80 testing); People v. Pope, 284 Ill. App. 3d 695 (1996) (validating DQA1 and PM testing); State v. Pooler, 696 So. 2d 22, 52-53 (La. Ct. App. 1997) (no error in allowing jury to view DQA1, PM, and D1S80 test results, on charts, in jury room); People v. Morales, 227 A.D.2d 648, 649-650 (N.Y. 1996) (validating DQA1 and PM testing); Keen v. Commonwealth, 24 Va. App. 795, 801-806 (1997) (testimony on DQA1 and PM tests admissible).
In Commonwealth v. Rosier, post 807 (1997), we have approved yet another type of PCR-based testing involving short tandem repeats of a few nucleotide units (STRs). This process permits testing at a large number of sites, and “[a]s more STRs are developed and validated, this system is coming into wide use.” 1996 NRC Report, supra at 71. This is yet another indication of the rapid pace at which PCR-based technology is developing and the scientific recognition that new systems are receiving as they are validated for forensic use.
In so concluding, we are not troubled by two concerns that appear also to have influenced the judge.
(a) The absence of mandatory accreditation and blind proficiency testing in the field of DNA analysis generally should not pose an automatic bar to the use of otherwise reliable DNA evidence. The TWGDAM guidelines require that DNA laboratories engage in open rather than blind proficiency testing, which is difficult to do. See 1996 NRC Report, supra at 79. See also notes 12 and 13, supra. The National Institute of Justice, pursuant to the DNA Identification Act of 1994, 42 U.S.C. §§ 14131-14134, has undertaken a study to review current testing programs and determine whether blind proficiency tests are feasible in alternative forms. See 1996 NRC Report, supra at 79-80.
(b) Although the judge’s conclusion that results of PM testing may be weakened because the process uses a compromise temperature to test the multiple markers derives from testimony of one of the defense experts, there appears to be no scientific literature or empirical study that reaches a similar conclusion, and the 1996 NRC Report, which makes a comprehensive evaluation of PCR-based DNA analysis, makes no mention of any difficulty with the temperature factor for PM testing. This suggests that the NRC does not view the use of a compromise temperature as a legitimate issue in PM testing.
We also note one other relevant consideration. There are common threads among the three tests. DQA1 testing, which the judge found reliable, and PM testing, which he found unreliable, share virtually identical methodologies and protocols but use different primers and probes because each targets different genetic sites. People v. Morales, supra at 650. D1S80 testing, at its second stage, uses gel electrophoresis, see note 7, supra. Electrophoresis is the same procedure used in RFLP testing, the scientific validity of which is well established. Thus, it may not be an oversimplification to say that the methodologies used in PM and D1S80 testing have support with respect to reliability because they use techniques that have been validated. This point implicitly underlies the acceptance of PM and D1S80 testing, along with DQA1 testing, in the 1996 NRC Report.
At the time of the evidentiary hearing in these cases, CBR had conducted over 2,750 PCR amplifications for DQa testing in forensic cases. By contrast, CBR had only begun to perform PM testing in 1993 and D1S80 testing in the fall of 1994. The tests conducted in the cases before the judge were conducted in November and December, 1994 (PM testing of defendant Vao Sok); February, 1995 (D1S80 testing of defendant Vao Sok); February, 1995 (PM testing of defendants Evans and Ware); November and August, 1995 (PM testing of defendant Williams); and September, 1995 (D1S80 testing of defendant Williams). At the time of the hearing, the laboratory had used the D1S80 kit in only six cases, including the Vao Sok and Williams cases involved in the evidentiary hearings.
The judge, of course, was well aware of the distinction between issues of admissibility and weight. In another part of his decision, he declined to exclude the test results based on the defendants’ assertion that the samples provided to CBR for testing had been contaminated to the point where they could not be properly treated. The judge stated that this assertion raised an issue as to weight of the evidence, which could be understood and resolved by the juries that will hear the cases.
The reexamination will have to be conducted by another judge, as the judge whose decision we are considering has since retired.
We note that, in one of the scientific studies validating the PM test, the laboratories used the AmpliType PM PCR Amplification and Typing Kit and “interpreted the results based on guidelines detailed in the package insert.” Word, Summary of Validation Studies from Twenty-Six Forensic Laboratories in the United States and Canada on the Use of the AmpliType PM PCR Amplification and Typing Kit, 42 J. Forensic Sci. 39, 40 (1997) (Summary of Validation Studies). In those studies, “[d]ata were reviewed, analyzed, and interpreted according to procedures and protocols adapted by each laboratory for the PM kit.” Id. Dr. Bing is listed as one of the authors of this study, but it is not clear whether his laboratory participated in the analysis used to validate the PM test kit. Further, since the kit in question contained a user’s guide, we conclude that it could not have been the same kit used in the testing in the cases before us. If the Commonwealth can show that the results and protocol of the PM testing performed by CBR in the cases in question conformed to that of the testing validated in the Summary of Validation Studies, supra, this may allow these test results to be admissible against those defendants whose DNA samples were subjected to PM analysis.
The Commonwealth, of course, may also decide not to pursue this further inquiry and choose either to abandon introduction of the PM and D1S80 test results, or, if feasible, to seek new testing by CBR or another laboratory that will furnish a fresh set of results, such as appears to have been done in the McNickles case.
If the test results are deemed admissible, issues as to weight (such as the assertion of contamination and the probative force of the interpretations) will be for the juries. See Commonwealth v. Phoenix, 409 Mass. 408, 422 (1991); Commonwealth v. Gomes, 403 Mass. 258, 273 (1988).