In Re Proportionality Review Project (II)

The opinion of the Court was delivered by

PORITZ, C.J.

This matter completes the second phase in a two-part review of the Court’s proportionality review procedures undertaken pursuant to our Order in State v. Loftin, 157 N.J. 253, 454-55, 724 A.2d 129 (Loftin II), cert. denied, 528 U.S. 897, 120 S.Ct. 229, 145 L.Ed.2d 193 (1999).

In Loftin II, the Court found that the proportionality review methodologies we had been using were seriously flawed and that the time had come for a “careful reconsideration” of our approach to the proportionality phase of death penalty review. Id. at 286, 724 A.2d 129. We established a process for reconsideration that included the designation of a Special Master and the submission of a report to the Court covering “four discrete areas of concern: the size of the universe of comparison cases; particular issues in respect of individual proportionality review; questions relating to the statistical models used in both individual and systemic proportionality review; and the status of proportionality review as a separate proceeding in death penalty appeals.” Ibid. The first phase of this project was completed on April 28, 1999, when the Honorable David S. Baime, a Presiding Judge of the Appellate Division appointed Special Master for the Supreme Court, submitted an initial report wherein he made findings and recommendations regarding the size of the universe, individual proportionality review and the models used for that purpose, and the feasibility of consolidating direct death penalty appeals with proportionality review. See David S. Baime, Report to the New Jersey Supreme Court: Proportionality Review Project (Apr. 28, 1999) (Baime Report I); see also In re Proportionality Review Project, 161 N.J. 71, 735 A.2d 528 (1999) (Proportionality Review I) (adopting Baime Report I with modifications). Upon our consideration of Baime Report I, this Court issued determinations regarding those issues in August 1999, thus establishing baseline procedures for individual proportionality review.

On December 1, 1999, Special Master Baime issued Report to the New Jersey Supreme Court: Systemic Proportionality Review Project (Dec. 1, 1999) (Baime Report II). Baime Report II, as its name suggests, “deals with questions pertaining to systemic proportionality review,” that is, “whether ethnic, racial or gender bias exists in the administration of our capital sentencing laws.” Id. at 1. After reviewing briefs submitted by the Attorney General, the Public Defender and amici curiae, Association of Criminal Defense Lawyers of New Jersey and New Jersey State Conference of NAACP Branches, and on hearing oral argument, we adopt Baime Report II with modifications as outlined in this opinion.

I

Eight years ago, we stated that “we can never dispense with the obligation to assure that the burden of the past does not create a genuine risk that defendants will be sentenced to death either because of their race or the race of the victim.” State v. Marshall, 130 N.J. 109, 219, 613 A.2d 1059 (1992) (Marshall II), cert. denied, 507 U.S. 929, 113 S.Ct. 1306, 122 L.Ed.2d 694 (1993). We turned to “[p]roportionality review therefore [as] a means through which to monitor ... and thereby to prevent any impermissible discrimination in [the] impos[ition of] the death penalty.” State v. Ramseur, 106 N.J. 123, 327, 524 A.2d 188 (1987); see also Loftin II, supra, 157 N.J. at 315, 724 A.2d 129 (“This Court is committed to a course of review that is capable of discerning possible racial discrimination in our capital sentencing system.”).

Yet, the development of a sound methodology for the purpose of systemic proportionality review has proved an elusive goal. Loftin II, supra, 157 N.J. at 305-16, 724 A.2d 129. The first statistical models conceived by Professor David C. Baldus, the Special Master appointed to oversee the development of a proportionality review system for the Court, were not designed to test for racial discrimination, but rather were focused on whether defendants with roughly equivalent levels of culpability were treated similarly. See id. at 310, 724 A.2d 129. In his report, Baldus informed the Court that

race variables [were included] in the culpability models to ensure that variables for legitimate case characteristics were not carrying any possible race effects. It was in the course of this work that we observed the race effects.... Because discrimination was not the primary mandate in this project, we consider these results to be strictly preliminary. More work will be required to determine if they persist under closer scrutiny and alternative analyses, to determine, for example, whether they are statistical artifacts or flukes, and to assess their legal and practical significance.
[David C. Baldus, Death Penalty Proportionality Review Project: Final Report to the New Jersey Supreme Court 100-01 (Sept. 24, 1991) (Baldus Report).]

Despite this disclaimer, defendants have alleged racial discrimination in the administration of the death penalty premised largely on the statistical models created by Professor Baldus.

Last term, after having conducted four proportionality reviews between 1992 and 1995, we evaluated the reliability of the multiple regression models Baldus had devised.1 Loftin II, supra, 157 N.J. at 308-16, 724 A.2d 129; see also State v. DiFrisco, 142 N.J. 148, 662 A.2d 442 (1995) (DiFrisco III) (using Baldus-created models for proportionality review), cert. denied, 516 U.S. 1129, 116 S.Ct. 949, 133 L.Ed.2d 873 (1996); State v. Martini, 139 N.J. 3, 651 A.2d 949 (1994) (Martini II) (same), cert. denied, 516 U.S. 875, 116 S.Ct. 203, 133 L.Ed.2d 137 (1995); State v. Bey, 137 N.J. 334, 645 A.2d 685 (1994) (Bey IV) (same), cert. denied, 513 U.S. 1164, 115 S.Ct. 1131, 130 L.Ed.2d 1093 (1995); Marshall II, supra (same). We acknowledged that the models contained fundamental problems that precluded reliance on their results and that those problems were due both to intrinsic and extrinsic factors, i.e., the structure or design of the models themselves and too few cases. Loftin II, supra, 157 N.J. at 310-15, 724 A.2d 129. We had asked retired Judge Richard Cohen, with the assistance of a world-renowned statistician, Dr. John W. Tukey,2 to assess those factors and others, and to report back to the Court within a short period of time. Id. at 302-03, 724 A.2d 129. See generally Richard S. Cohen, Report to the Supreme Court of New Jersey (Jan. 27, 1997) (Cohen Report). Because the Public Defender claimed that the models demonstrated an impermissible race effect, understanding the nature and impact of the models’ shortcomings was critical.

Judge Cohen informed us that the multiple regressions were inherently unreliable because they were not parsimonious, a requisite for a reliable regression. Loftin II, supra, 157 N.J. at 310-11, 724 A.2d 129. “A statistical model with a relatively small number of well-crafted parameters is known as a parsimonious model.” Id. at 311, 724 A.2d 129. The lack of parsimony in the Baldus models created a risk of overfitting, possibly resulting in the false attribution of effect to a variable. Id. at 311 n. 13, 724 A.2d 129. In other words, considering the relatively small number of cases in the database, particularly death-sentenced cases, the regression models contained an excessive number of variables and, thus, statistical results suggesting racial discrimination may have reflected a methodological flaw rather than reality. Id. at 311, 724 A.2d 129. In addition to the problems related to the small size of the database, various data-coding decisions appeared to affect the model results in unintended and inappropriate ways. Id. at 311-12, 724 A.2d 129. Those concerns about the validity of the statistical models were substantiated by the disparate ratings of capitally-prosecuted defendants derived from the regressions and from culpability rankings produced by a survey of judges conducted for comparison purposes. Id. at 310, 724 A.2d 129. In short, the exploratory work of Judge Cohen, and the advice of Dr. Tukey, demonstrated the need for a more thorough consideration of parsimonious models, further exploration of other methodologies, and a good hard look at some of our earlier decisions about data-coding classifications.

Loftin II provides a detailed review of these matters. Id. at 302-16, 724 A.2d 129. Suffice it to say here that the need for further work led to a detailed charge to our present Special Master, Judge Baime, that included consideration of both individual and systemic proportionality review. Id. at 453-57, 724 A.2d 129. In addition to a review of data-coding choices and the projected size of the database over time, we specifically ordered that

(6) [t]he Special Master shall attempt to develop parsimonious statistical models for more reliable regression studies of race effect and shall consider whether the process of purging, i.e., the removal of the indirect effects of race from variables that appear to be unrelated to race, produces results that are useful; [and that]
(7) [t]he Special Master shall consider Special Master Cohen’s recommendation, submitted in State v. Loftin, supra, that the Court appoint a panel of judges to perform periodic assessments of penalty-trial outcomes, along with the composition and mandate of such an independent judicial panel, as independent verification of the culpability ratings derived from the models.
[Id. at 456, 724 A.2d 129.]

In Baime Report II, Judge Baime addressed those questions and others. With assistance from Professors David Weisburd3 and Joseph Naus,4 he considered new approaches to the study of systemic discrimination and made recommendations to the Court. We consider Judge Baime’s systemic recommendations in this opinion.

Before we begin our discussion of the Special Master’s report, however, some further reflection on systemic review is in order. The study of system-wide discrimination requires the use of statistical techniques in complex socio-political settings. The process is far more complicated than counting the number of defendants by race and the number of death penalties meted out, although it certainly includes such elementary comparative analyses. A myriad of discretionary decisions are made at every level in the system, and sorting out their relationship to the race of either defendants or victims is complex and difficult. We have learned that statistical modeling for that purpose is largely untested and that its usefulness is uncertain. The improvements and additions we approve today will need further review down the road. We make these choices because we know of no other means by which the relationship, if any, between race and the death penalty system in New Jersey may be reviewed. The importance of understanding whether racial discrimination infects our system of capital punishment requires that we make this effort.

II

Judge Baime recommends a process for monitoring the possible presence of racial discrimination in the administration of the death penalty. Since the Baldus models were not developed with this goal in mind, his proposal would “constitute[ ] our first systematic effort to develop a statistical framework devised for the specific purpose of analyzing the possibility of racial discrimination in New Jersey’s capital punishment scheme.” Baime Report II at 6. He explains in broad terms:

[T]he monitoring system I propose rests on the assumption that there is no single method that is sufficiently reliable to provide convincing evidence of a race effect in death penalty sentencing. I recommend a multifaceted approach consisting of bivariate analysis, regression studies, case exploratory analysis, and a precedent-seeking type review of the cases.
[Id. at 36-37.]

We analyze each component of the proposed system in turn.

A.

Judge Baime first describes a series of bivariate analyses as part of the “multifaceted” approach he suggests. In a bivariate analysis, there is only one independent variable. Here, because we are testing for the presence of racial discrimination, race is that single independent variable. See Romero v. City of Pomona, 665 F.Supp. 853, 859 (C.D.Ca.1987). These analyses are designed then to test whether there are statistically significant differences based on the race of the defendant or the race of the victim in respect of the rate at which defendants eligible for the death penalty are sentenced to death, the rate at which those defendants are prosecuted capitally, and the rate at which juries sentence capitally-prosecuted defendants to death. See Baime Report II at 49-55.

We adopt Judge Baime’s recommendation to include a series of bivariate analyses as part of our systemic review.5 This approach allows us to observe racial distributions in capital sentencing based on the raw data and without consideration of any variables other than race and sentence. The comparison is simple and easily understood. Nevertheless, we recognize, as does Judge Baime, that the utility of bivariate analyses is limited. Bivariate analyses do not take other factors, such as each defendant’s deathworthiness, into account, but rather consider only the unadjusted relationship between race and outcome, i.e., advancement to penalty trial or imposition of the death penalty. This inability to control for other factors could cause misleading results, including “false positives” or the attribution of race effects in instances where no race effects exist. Conversely, bivariate analyses could produce false negatives, in which case the analyses would not show race effects despite the presence of racial discrimination.

Because of its inability to control for nonracial factors, a series of bivariate analyses cannot be the only methodology we use to examine racial discrimination in capital sentencing. See David Weisburd and Joseph Naus, Report to Special Master Baime Concerning Systemic Proportionality Review 24 (Nov. 24, 1999) (Weisburd and Naus Report) (attached as Technical Appendix to Baime Report II) (“It is important to keep in mind at the outset, that it is not possible to develop a reliable monitoring system of race effects using only bivariate methods. Any bivariate relationship between race and death sentencing is likely to be confounded by other factors that also influence death penalty sentencing.”). Despite its inherent limitations, we include bivariate analyses in the system we approve today because of their simplicity and because, as part of a multi-faceted approach, they may help to shed some light on any relationship between race and sentencing outcome.
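The bivariate comparison described above can be illustrated with a short sketch. It assumes a Pearson chi-square test on a 2x2 table of race group by outcome; neither this opinion nor the passages quoted from Baime Report II say which test statistic is used, so the choice of test, the function name `chi_square_2x2`, and the counts below are all hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows are two race groups; columns are outcome counts:
    [a, b] = group 1 (outcome yes, outcome no), [c, d] = group 2.
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, invented for illustration only:
# group 1: 30 death sentences, 70 other outcomes; group 2: 20 and 80.
stat, p = chi_square_2x2(30, 70, 20, 80)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

Because the table holds nothing but race and outcome, a small p-value here would say nothing about whether culpability or any other factor explains the difference, which is the limitation noted above.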

B.

Creating reliable multiple regression models has been the biggest challenge in systemic proportionality review. With the assistance of Professors Weisburd and Naus, Judge Baime proposes a methodology for the development of parsimonious multiple regressions that would be the second component in the monitoring system he recommends to the Court.

As discussed earlier, see ante at 211-12, 757 A.2d at 170-71, aside from coding and other like issues, the regression models developed by Baldus suffer from a fundamental defect — a small number of cases and far too many variables “to achieve [even] a minimal degree of statistical reliability....” Loftin II, supra, 157 N.J. at 311, 724 A.2d 129. Intuitively, we expect that many different variables influence the likelihood of receiving a capital sentence and so we want to include all of them in our model. We have learned, however, that too many factors and too few cases can result in a finding of race effect where it does not exist, a problem we noted in connection with bivariate analysis, and also can prevent accurate measurement of the true effects of each factor. Cohen Report at 27-28. Because our database contains only fifty-three death-sentenced cases, see Baime Report II at 53-55, multiple regressions using the imposition of a death sentence as the dependent variable must have a limited number of independent variables for there to be a reasonably parsimonious model. Loftin II, supra, 157 N.J. at 311, 724 A.2d 129. According to Judge Cohen and Dr. Tukey, given the number of cases progressing through our system, multiple regressions measuring the factors that are “designed to test racial bias should employ between five and ten parameters or variables....” Ibid.; see John W. Tukey, Report to the Special Master 5 (Jan. 27, 1997).

Lack of parsimony was a principal reason why we abandoned the index-of-outcomes test previously used in individual proportionality review to measure defendants’ culpability levels. See Proportionality Review I, supra, 161 N.J. at 91-96, 735 A.2d 528. The inability to design parsimonious regression models for individual proportionality review, however, does not necessarily prevent the development of parsimonious regression models for systemic proportionality review. See Baime Report II at 37-39. Judge Baime explains the difference between the two as follows:

The basic premise upon which our model rests is that in assessing a race effect, as contrasted with defining culpability levels for individual proportionality review, we do not have to account for all factors that influence death penalty sentencing. Rather, we need only to include in our model those factors that are related to the outcome variable (either advancement to a penalty trial or a death sentence) and the race variable examined. This is so because our effort is not to develop a reliable estimate of culpability level on the outcome measure, but only to control for potential confounding of the race variable. We thus seek to isolate and ultimately control for possible confounders.... Thus, where race is distributed equally, or in statistical terms where all else is equivalent, there is no need to take into account that variable. But where there is variability in a parameter, i.e., where race is unevenly distributed, that variable should be considered for its inclusion in the regression model. As noted by Professors Weisburd and Naus, “[the] difference between the goal of gaining a reliable prediction of the outcome measure and that of controlling for confounding ... provides an opportunity to develop more parsimonious models than have so far been used in [assessing] death penalty sentencing.”
[Id. at 37-38 (quoting Weisburd and Naus Report at 26) (second and third alterations in original).]

The difficulty, then, lies in choosing a smaller set of variables that is consistent with those criteria. Professors Weisburd and Naus have concluded that a “statistical approach” makes the most sense, and Judge Baime agrees. Id. at 39 (quoting Weisburd and Naus Report at 27) (internal quotations omitted). Accordingly, Judge Baime proposes a multi-step methodology to be “applied” at three points in the system — to juries’ sentencing decisions, to the entire death-eligible universe, and to death-eligible cases that advance to penalty trial. Id. at 40. We summarize:

(1) Define the base set of variables, which could be limited to statutory factors or could also include variables selected in a survey conducted for that purpose.
(2) Explore the bivariate relationship between the race variable and each variable in the base set.
(3) Exclude variables that do not have a statistically significant relationship with a race variable using a threshold of .05 for the death-eligible universe and .10 for the penalty-trial universe.
(4) “Estimate the regression model including only those variables that have reached the thresholds ... plus the relevant race variables.” Id. at 41.6 If that regression is not parsimonious, raise the threshold for statistical significance.

See id. at 40-42. Judge Baime “believe[s] that this approach can produce a more parsimonious set of regression models for monitoring race effects,” although he also cautions that the omission of variables that are important indicators of outcome would “not only render[] the model incapable of explaining the outcome sought, but [would] distort[ ] the effects of the variables which have been included.” Id. at 42.
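Steps (2) through (4) can be sketched in code up to the point at which the regression itself would be estimated. The sketch rests on several assumptions that are not taken from Baime Report II: variables are coded 0/1, association with race is tested with a 1-df chi-square test, "raise the threshold" is read as making the significance cutoff stricter by halving it, and the cap `max_vars` stands in for a judgment about parsimony. The function names and data are hypothetical.

```python
import math

def chi2_p(a, b, c, d):
    """p-value of a 1-df Pearson chi-square test on the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a degenerate table carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2))

def screen_variables(cases, race_key, base_set, alpha, max_vars):
    """Keep base-set variables whose association with race has p < alpha.

    cases is a list of dicts with 0/1 values. If more than max_vars
    variables survive, alpha is halved (an assumed tightening rule)
    until the surviving set is small enough. Returns (kept, alpha).
    """
    p_values = {}
    for var in base_set:
        a = sum(1 for row in cases if row[race_key] and row[var])
        b = sum(1 for row in cases if row[race_key] and not row[var])
        c = sum(1 for row in cases if not row[race_key] and row[var])
        d = sum(1 for row in cases if not row[race_key] and not row[var])
        p_values[var] = chi2_p(a, b, c, d)
    while True:
        kept = [v for v in base_set if p_values[v] < alpha]
        if len(kept) <= max_vars:
            return kept, alpha
        alpha /= 2

# Hypothetical data: factor_a is unevenly distributed by race, factor_b is not.
cases = []
for i in range(40):
    cases.append({
        "race_group": 1 if i < 20 else 0,
        "factor_a": 1 if (i < 16 or 20 <= i < 24) else 0,
        "factor_b": 1 if i % 2 == 0 else 0,
    })

# .05 is the threshold the text gives for the death-eligible universe.
kept, alpha_used = screen_variables(
    cases, "race_group", ["factor_a", "factor_b"], alpha=0.05, max_vars=10
)
print(kept, alpha_used)
```

The surviving variables, together with the race variables, would then be entered into the regression; that estimation step is left out of the sketch.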

We accept the advice of the Special Master and his consultants that, when properly designed, a series of multiple regressions can be a valuable technique for measuring whether race affects the death-penalty system in this state. Moreover, multiple regression techniques have an advantage not shared by bivariate techniques: multiple regressions control for other factors and therefore isolate the effects of race on capital sentencing. Put another way, unlike bivariate analyses, in assessing race effects multiple regressions account for differences in defendants’ culpability, thereby permitting a more complex understanding of the relationships between the variables examined.

Other issues related to the selection of variables for the models remain. Professor Naus suggests that the base set of variables should be limited to statutory aggravating and mitigating factors plus race-of-victim and race-of-defendant variables. Id. at 20. Although that might be “[t]he most obvious and simplest solution,” ibid., Judge Baime cautions that excluding relevant nonstatutory variables could “impair the regression model’s utility” and “mask a race effect that exists or attribute race bias when there is none.” Id. at 21. Professor Weisburd recommends including nonstatutory variables within the base set by surveying judges, prosecutors, defense attorneys, and scholars to identify those factors that may influence capital-sentencing decisions. Id. at 22. The Public Defender advocates including former jurors in this survey.

Our concerns about the exclusion of pertinent variables impel us to conclude that the base set of variables must include nonstatutory, as well as statutory, factors. See Marshall II, supra, 130 N.J. at 157, 613 A.2d 1059 (“If we did not consider the circumstances of the case beyond the c(4) aggravating factors in a search for disproportionality, we would be ignoring the reality of the situation....”). This decision comports with our previous recognition that nonstatutory factors may determine whether a defendant is prosecuted capitally or sentenced to death. See, e.g., State v. Chew (Chew II), 159 N.J. 183, 210-14, 731 A.2d 1070 (1999) (considering nonstatutory factors to compare defendants’ culpability in precedent-seeking approach to individual proportionality review), cert. denied, ___ U.S. ___, 120 S.Ct. 593, 145 L.Ed.2d 493 (1999). We make the choice to include non-statutory factors because we believe that the exclusion of relevant variables would undermine the reliability of the models. Although we consider it unnecessary to survey prosecutors, defense attorneys, scholars, and former jurors in order to identify the variables that will be used, we do approve asking judges with experience trying capital cases to select the nonstatutory factors that are most relevant to death-sentencing decisions.

Even if parsimonious multiple regressions do not omit relevant variables, the technique presents other problems that must be confronted. Where there is inconsistent coding of variables or interdependence between cases in the database (i.e., the outcome in one case affects the outcome of another case), the reliability of the results is adversely affected. Regression analysis assumes that these defects are not present.

Professors Weisburd and Naus make a number of recommendations for improving the reliability of the regressions. Among other things, they advise that compliance with the independence-of-cases assumption requires no more than one case for each defendant in the universe. Baime Report II at 17-18. To ensure consistency in the coding of variables, they recommend that the Administrative Office of the Courts (AOC) code statutory-factor variables in all cases, including those in the penalty-trial universe. Id. at 29. They also suggest examining whether changes in the use and interpretations of the c(4)(c) (torture or depravity) aggravating factor and the c(5)(h) (catch-all) mitigating factor may have fundamentally altered the meaning of those variables such that their coding is no longer valid. Id. at 24. Lastly, they would do away with Baldus’s factor-analysis methodology — “providing weights for defining new variables” — and they would code race variables more precisely and more consistently to distinguish between whites, blacks, Latinos, Asians, and others. Id. at 22-23.

We begin with the consultants’ last suggestion in respect of the coding of race. Previously, the race-of-defendant variable was coded either black or nonblack, and the race-of-victim variable was coded white or nonwhite. Judges Cohen and Baime and their statistical advisors have universally criticized this coding practice. We agree with their concerns and order that both defendants and victims be coded as white, black, Hispanic, Asian, or other. That coding change would increase accuracy, be more complete, and recognize the racial and ethnic diversity that exists in New Jersey.

We accept the principle that “adoption of a method for defining a single case for each defendant for regression analysis” is appropriate. Id. at 18. Thus, for example, Joseph Harris committed five death-eligible murders in two separate incidents. Only one of those murders will be included in the database even though to do so cuts against our instincts in respect of a defendant who murdered more than one victim in separate incidents. We rely, in making this decision, on the experts who tell us the models require that there be “no systematic relationship between measured characteristics of the cases and unmeasured characteristics that influence death penalty sentencing.” Ibid.
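A minimal sketch of reducing a database to one case per defendant follows. The text does not say which case is retained for a defendant with several, so the selection rule is left as a parameter; the default used here (keep the first case listed) is only a placeholder, and the field names and records are hypothetical.

```python
def one_case_per_defendant(cases, choose=None):
    """Return one case per defendant, in order of each defendant's first appearance.

    choose(group) picks the retained case from a defendant's list of cases.
    The default keeps the first one listed, which is a placeholder rule and
    not a method drawn from the report.
    """
    if choose is None:
        choose = lambda group: group[0]
    groups = {}
    for case in cases:
        groups.setdefault(case["defendant_id"], []).append(case)
    return [choose(group) for group in groups.values()]

# Hypothetical records, invented for illustration only.
records = [
    {"defendant_id": "D1", "case_id": 101},
    {"defendant_id": "D2", "case_id": 102},
    {"defendant_id": "D1", "case_id": 103},
]
reduced = one_case_per_defendant(records)
print([r["case_id"] for r in reduced])
```

Passing a different `choose` function, such as one that keeps the most recent proceeding, changes the rule without changing the rest of the procedure.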

We recognize that there is inconsistency in the coding of variables. The use and interpretation of the c(4)(c) (torture or depravity) aggravating factor and c(5)(h) (catch-all) mitigating factor have changed over time. Also, having AOC staff members code the presence or absence of aggravating and mitigating factors obviously differs from having a jury decide whether aggravating and mitigating factors are present. We have given those issues some thought and have decided, for now, to tolerate certain inconsistencies inherent in the system. In part, we are reluctant to substitute AOC staff members’ judgments based on a cold record for the findings of a jury that sat through and participated in a capital trial.7

We expect that the new parsimonious regressions will be more reliable than the models developed by our first Special Master. Nonetheless, we are mindful still of the relatively small number of cases in the database. The Court has remarked before on the reluctance of New Jersey juries to impose the ultimate penalty except in the most egregious cases. See Marshall II, supra, 130 N.J. at 192-94, 613 A.2d 1059. Over the last five years, jurors have sentenced only ten defendants to death. It is ironic indeed that the result of juror reluctance to impose the death penalty is a database too small for reliable statistical analysis of race effects. That small database, and the risk of violating one or more of the assumptions upon which multiple regressions are based, requires us to act cautiously when we consider the use of multiple regressions in systemic review.

C.

Judge Baime recommends using case-sorting techniques as a third component of the proposed monitoring system. Under the sorting approach, the data is broken into various combinations of statutory and nonstatutory factors that have a statistically significant impact on either death-sentencing or penalty-trial-advancement rates. Baime Report II at 45. Judge Baime described “the steps taken by Professors Weisburd and Naus in analyzing our data base:”

(1) Divide the death-eligible universe into three groups: (1) cases involving defendants each with one case per defendant, (2) cases involving defendants with multiple cases in the death-eligible universe, and with several victims (simultaneous killings and killings on separate occasions), and (3) cases involving defendants each with one victim, but each defendant having more than one case in the data base (reversed convictions and death sentences resulting in retrials or pleas).
(2) Define the relevant set of variables. Here, as in the regression approach, we must begin with a specific set of measures that will be considered for sorting purposes.
(3) Introduce a combination of aggravating and mitigating factors. Only those factors having a statistically significant association with the outcome measured (advancement to penalty trial or death sentence) and, when placed in combination, have a sufficient number of cases to permit analysis are to be used.
(4) Select combinations of the levels of the factors, i.e., all present, all absent, only factor “A” [8] is present, only factor “B” is present. Determine which levels have different sentencing rates. Where neither factor is present, introduce another aggravating or mitigating factor that splits the data in a statistically significant way relating to sentencing.
(5) Analyze the data by stages. The data should be analyzed in terms of fractions of death-eligible cases advancing to penalty trial, fractions of death-eligible cases in which the death-penalty is imposed, and fractions of penalty trials resulting in imposition of a death sentence.
(6) Analyze the combinations by race of the defendant or victim.
(7) Identify categories with strong racial disparities. Conduct a precedent-seeking review of the cases in those categories by a panel of experienced judges to determine whether the defendants’ culpability levels explain the outcome measured (progression to penalty trial or imposition of death sentence).
[Id. at 46-47.]
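Steps (4), (6), and (7) of the sorting approach can be sketched as a grouping exercise: sort cases by combination of factor levels, compute the outcome rate for each race within each combination, and flag combinations with large gaps for review by a panel of judges. The report's own criteria for a "strong" disparity are not stated in the passage quoted above, so the `min_cases` and `min_gap` cutoffs, like the function names and data, are hypothetical.

```python
from collections import defaultdict

def rates_by_cell(cases, factors, race_key, outcome_key):
    """Outcome rate for each (factor combination, race) cell.

    Returns {factor_levels: {race: (n_outcome, n_cases, rate)}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for case in cases:
        levels = tuple(case[f] for f in factors)
        cell = counts[levels][case[race_key]]
        cell[0] += case[outcome_key]
        cell[1] += 1
    return {
        levels: {race: (k, n, k / n) for race, (k, n) in by_race.items()}
        for levels, by_race in counts.items()
    }

def flag_disparities(table, min_cases, min_gap):
    """List factor combinations whose rates differ across races by at least min_gap.

    Only cells with at least min_cases cases are compared. Both cutoffs are
    assumed values, not criteria taken from the report.
    """
    flagged = []
    for levels, by_race in table.items():
        usable = [rate for (_, n, rate) in by_race.values() if n >= min_cases]
        if len(usable) >= 2 and max(usable) - min(usable) >= min_gap:
            flagged.append(levels)
    return flagged

def make(a, b, race, outcomes):
    """Build hypothetical cases with factor levels A=a, B=b."""
    return [{"A": a, "B": b, "race": race, "death": o} for o in outcomes]

# Hypothetical data, invented for illustration only.
cases = (
    make(1, 0, "group_1", [1, 1, 1, 0]) + make(1, 0, "group_2", [1, 0, 0, 0])
    + make(0, 1, "group_1", [1, 1, 0, 0]) + make(0, 1, "group_2", [1, 1, 0, 0])
)
table = rates_by_cell(cases, ["A", "B"], "race", "death")
flagged = flag_disparities(table, min_cases=3, min_gap=0.3)
print(flagged)
```

A flagged combination is only a starting point. Under step (7), the cases in that category would still be reviewed to determine whether culpability levels explain the outcome.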

Judge Baime also suggests that the sorting approach could be simplified through use of the salient-factors categories used in individual proportionality review, i.e., that cases be classified by their most important aggravating attribute. Id. at 44-45. For the purposes of systemic proportionality review, each salient-factors-test category could be tested for the presence of a race effect. Id. at 45.

We adopt Judge Baime’s recommendation to include the sorting approach when the AOC and the Special Master provide the next set of models to the Public Defender and the Attorney General. Exploratory case analysis may assist in classifying cases and determining whether there are racial disparities in any of the case categories. We recognize that the sorting approach can examine only two variables at a time and does not control for other variables that may influence death-sentencing decisions. Consequently, like simple bivariate analyses, the results may be confounded by unexamined factors. Also, sorting can be unwieldy because there are so many possible combinations of statutory and nonstatutory factors, although we understand that only those combinations that have a statistically significant correlation with the imposition of a death sentence will be analyzed. See id. at 45.

Because the Public Defender has raised valid questions about the sorting methodology, particularly in connection with the combinations of factors, we will proceed step-by-step. Exploratory case analysis is a new approach to systemic review. We leave its administration to the Special Master who, we expect, will draw on the expertise of his statistical consultants. We know that decisions must be made as the technique is refined. We will consider those decisions in our first proportionality review after the new system is put in place. In that review we will evaluate whether the sorting approach is potentially useful and whether it should or should not be continued as part of the monitoring system.

D.

The Public Defender also argues that the monitoring system should include judges’ culpability ratings of defendants for comparison purposes. Judge Cohen had recommended the appointment of a panel of judges to periodically assess penalty-trial outcomes as a check against model outcomes. See Cohen Report at 45. Judge Baime nonetheless mentions using experienced judges in only two contexts: to assist in choosing a base set of variables for parsimonious models and to “determine whether the culpability levels explain the outcome measured (progression to penalty trial or imposition of death sentence)” when exploratory case analysis reveals “categories with strong racial disparities.” Baime Report II at 47. We accept Judge Baime’s recommendations and limit the use of judge surveys to those contexts.

III

Bivariate analyses (including the sorting approach) and multiple regression analyses suffer from certain inherent weaknesses. As a consequence, we cannot rely on the results of any single methodology. If one approach appears to indicate a race effect, we cannot be confident that the effect derives from racial discrimination rather than from one or more statistical flaws. These considerations reinforce our earlier conclusion in Loftin II that a defendant must “relentlessly document[ ] the risk of racial disparity in the imposition of the death penalty” in order to establish disproportionality. 157 N.J. at 315, 724 A.2d 129 (internal quotations omitted). In this context, application of that standard requires that we find converging outcomes produced by the application of a variety of techniques. Judge Baime’s proposed multifaceted approach provides the means to carry out this task.

The dissent misperceives our intent by confusing the standard we apply with “defendant’s overall burden of proof.” Ante at 232, 757 A.2d at 183 (Long, J., dissenting). But the standard is keyed directly to the statistical method and has no meaning other than in that singular context. It does “not derive” from statistics, ibid., but, rather, represents our effort to deal with the limitations in the methods available to us in a way that makes sense.

Our goal in conducting systemic proportionality review is to be assured that there is no significant likelihood that race improperly affects the administration of capital punishment in New Jersey. In reaching that goal, we are cognizant of the nature of the evidence before us and of the need to tailor our response to the reliability of the “ ‘science [we] invoke[ ].’ ” Loftin II, supra, 157 N.J. at 314, 724 A.2d 129 (quoting Cohen Report, supra, at 12). In Loftin II, we linked the relentless documentation of the risk of discrimination standard to the inherent weaknesses in the models, and we do so here. The reliability of the statistical methodology is not without doubt, and we cannot ignore that fact by accepting as dispositive admittedly flawed evidence. Judge Baime states that “an isolated finding produced by a single mode of analysis would not be enough,” and that “[a] consistent finding using multiple methods, each subject to its own limitations, is more reliable than using only one method.” Baime Report II at 37. He is, we believe, carefully explaining the kind of showing that is necessary in this unique setting. If that showing is made, we would know that there is a significant likelihood there are race effects in the death penalty system.

IV

We anticipate that the Special Master for proportionality review will annually update the database and the modes of analysis in the monitoring system, and that he will present a summary review of the models to the Public Defender and the Attorney General for their comments. We further anticipate that defendants will raise systemic-disproportionality claims in their appeals to the Court.9 If, after a rigorous review of the data, we conclude in one case that the defendant has not met his burden of establishing systemic disproportionality, we will rely on that conclusion in subsequent cases and until the next yearly adjustment to the data. We note that this approach is similar to our previous practice wherein we rejected systemic-disproportionality claims on stare decisis grounds. See State v. Martini, 160 N.J. 248, 275, 734 A.2d 257 (1999) (Martini V); State v. Harvey, 159 N.J. 277, 319, 731 A.2d 1121 (1999) (Harvey III), cert. denied, — U.S. —, 120 S.Ct. 811, 145 L.Ed.2d 683 (2000); Chew II, supra, 159 N.J. at 222-23, 731 A.2d 1070; State v. Cooper, 159 N.J. 55, 116, 731 A.2d 1000 (1999) (Cooper II), cert. denied, — U.S. —, 120 S.Ct. 809, 145 L.Ed.2d 681 (2000); DiFrisco III, supra, 142 N.J. at 210, 662 A.2d 442; Martini II, supra, 139 N.J. at 80, 651 A.2d 949.

Our implementation of this monitoring system does not imply either the presence or the absence of a race effect in capital sentencing. No reliable demonstration has been made to date that racial discrimination improperly affects the administration of the death penalty in this state. We nonetheless approve the system recommended by Judge Baime because “[w]hether in the exercise of statutory proportionality review or our constitutional duty to assure the equal protection and due process of law, we cannot escape the responsibility to review any effects of race in capital sentencing.” Marshall II, supra, 130 N.J. at 214, 613 A.2d 1059.

V

Finally, there are three proportionality review cases before the Court this term in which defendants have alleged systemic racial discrimination. See State v. Harris, 165 N.J. 303, 757 A.2d 221 (2000); State v. Feaster, 165 N.J. 388, 757 A.2d 266 (2000); State v. Morton, 165 N.J. 235, 757 A.2d 184 (2000). Judge Baime has examined the available data under a hybrid of the methodologies used in previous proportionality reviews and under the approaches he recommends for use in future reviews. See Baime Report II at 48-66. He has concluded:

We find no reliable statistical evidence that the race of the defendant influences death sentencing either at the penalty trial stage or in the larger death-eligible sample of cases. Nor does the statistical evidence support the thesis that the race of the defendant affects which cases progress to a penalty trial. Further, the statistical evidence suggests that the race of [the] victim does not affect death-sentencing rates — killers of white victims are no more likely to receive the death penalty than killers of non-white victims. Finally application of our monitoring system discloses no consistent statistical evidence indicating that the race of the victim affects which cases progress to a penalty trial. However, some of the evidence in that respect is conflicting, and the issue should be revisited when the database increases.
[Id. at 66.]

We agree that the risk of racial discrimination has not been relentlessly documented by the models. We are concerned, however, that a bivariate analysis suggests that killers of white victims are more likely to be capitally prosecuted than killers of black victims. Because other methodologies do not corroborate those results, and because a bivariate analysis by definition does not account for other factors that may influence death-sentencing decisions, we cannot find “that race has operated as an impermissible factor in the imposition of the death penalty.” Loftin II, supra, 157 N.J. at 346, 724 A.2d 129.

VI

We direct the AOC and the Special Master to implement the monitoring system we approve in this opinion.

"Multiple-regression analysis is a statistical tool used to describe the relationship between one or more independent variables (e.g., prior murder) and a dependent variable (e.g., the death penalty)." Loftin II, supra, 157 N.J. at 295 n. 8, 724 A.2d 129.

Dr. Tukey, who passed away July 25, 2000, was a Professor Emeritus of Statistics at Princeton University. He was a 1973 recipient of the National Medal of Science.

Dr. David Weisburd is the Director of and a Professor Min HaMinyan (Full Professor) at the Institute of Criminology at The Hebrew University. Since 1993, he has been a scientific advisor for the Death Penalty Proportionality Review Project.

Dr. Joseph Naus is a Professor of Statistics at Rutgers University. Between 1981 and 1993, Naus served at various times as Chairman of the Rutgers University Department of Statistics.

The implementation of these proposals requires that we continue to examine the larger universe of death-eligible cases and capitally prosecuted cases, as well as the more limited universe of death-sentenced cases. See Proportionality Review I, supra, 161 N.J. at 84, 735 A.2d 528 (rejecting proportionality universe of death-sentenced cases only).

By way of example, if there is a statistically significant relationship between the race of defendant variable and the prior-murder aggravating-factor variable, the prior-murder variable would be included in the model.

We also approve removing cases from the universe that no longer meet statutory requirements for capital murder and eliminating the Baldus factor-analysis methodology.

In one combination, for example, factor "A" might be c(4)(c) (torture or depravity), and factor "B" might be c(5)(d) (diminished capacity).

Judge Baime suggests that "some moratorium [on systemic proportionality would be appropriate] until the data base expands to a point that makes inquiry worthwhile." Id. at 33. For the present, we choose to add new cases on a yearly basis and to conduct reviews thereafter.