757 A.2d 168 (2000) 165 N.J. 206

In re PROPORTIONALITY REVIEW PROJECT (II).

Supreme Court of New Jersey.

Argued February 15, 2000. Decided August 2, 2000.

*169 Claudia Van Wyk, Deputy Public Defender, II, and Mordecai D. Garelick, Assistant Deputy Public Defender, argued the cause on behalf of the Office of the Public Defender (Ivelisse Torres, Public Defender, attorney).

Catherine A. Foddai, Deputy Attorney General, argued the cause on behalf of the Attorney General of New Jersey (John J. Farmer, Jr., Attorney General, attorney).

Lawrence S. Lustberg, Newark, submitted a brief on behalf of amici curiae Association of Criminal Defense Lawyers of New Jersey and New Jersey State Conference of NAACP Branches (Gibbons, Del Deo, Dolan, Griffinger & Vecchione, attorneys).

The opinion of the Court was delivered by PORITZ, C.J.

This matter completes the second phase in a two-part review of the Court's proportionality review procedures undertaken pursuant to our Order in State v. Loftin, 157 N.J. 253, 454-55, 724 A.2d 129 (Loftin II), cert. denied, ___ U.S. ___, 120 S.Ct. 229, 145 L.Ed.2d 193 (1999).

In Loftin II, the Court found that the proportionality review methodologies we had been using were seriously flawed and that the time had come for a "careful reconsideration" of our approach to the proportionality phase of death penalty review. Id. at 286, 724 A.2d 129. We established a process for reconsideration that included the designation of a Special Master and the submission of a report to the Court covering "four discrete areas of concern: the size of the universe of comparison cases; particular issues in respect of individual proportionality review; questions relating to the statistical models used in both individual and systemic proportionality review; and the status of proportionality review as a separate proceeding in death penalty appeals." Ibid. The first phase of this project was completed on April 28, 1999, when the Honorable David S. Baime, a Presiding Judge of the Appellate Division appointed Special Master for the Supreme Court, submitted an initial report wherein he made findings and recommendations regarding the size of the universe, individual proportionality review and the models used for that purpose, and the feasibility of consolidating direct death penalty appeals with proportionality review. See David S. Baime, Report to the New Jersey Supreme Court: Proportionality Review Project (Apr. 28, 1999) (Baime Report I); see also In re Proportionality Review Project, 161 N.J. 71, 735 A.2d 528 (1999) (Proportionality Review I) (adopting Baime Report I with modifications). Upon our consideration of Baime Report I, this Court issued determinations regarding those issues in August 1999, thus establishing baseline procedures for individual proportionality review.

On December 1, 1999, Special Master Baime issued Report to the New Jersey Supreme Court: Systemic Proportionality Review Project (Dec. 1, 1999) (Baime Report II). Baime Report II, as its name suggests, "deals with questions pertaining to systemic proportionality review," that is, "whether ethnic, racial or gender bias exists in the administration of our capital sentencing laws." Id. at 1. After reviewing briefs submitted by the Attorney General, the Public Defender and amici curiae, Association of Criminal Defense Lawyers of New Jersey and New Jersey State Conference of NAACP Branches, and on hearing oral argument, we adopt Baime Report II with modifications as outlined in this opinion.

I

Eight years ago, we stated that "we can never dispense with the obligation to assure that the burden of the past does not create a genuine risk that defendants will be sentenced to death either because of their race or the race of the victim." State *170 v. Marshall, 130 N.J. 109, 219, 613 A.2d 1059 (1992) (Marshall II), cert. denied, 507 U.S. 929, 113 S.Ct. 1306, 122 L.Ed.2d 694 (1993). We turned to "[p]roportionality review therefore [as] a means through which to monitor ... and thereby to prevent any impermissible discrimination in [the] imposi[tion of] the death penalty." State v. Ramseur, 106 N.J. 123, 327, 524 A.2d 188 (1987); see also Loftin II, supra, 157 N.J. at 315, 724 A.2d 129 ("This Court is committed to a course of review that is capable of discerning possible racial discrimination in our capital sentencing system.").

Yet, the development of a sound methodology for the purpose of systemic proportionality review has proved an elusive goal. Loftin II, supra, 157 N.J. at 305-16, 724 A.2d 129. The first statistical models conceived by Professor David C. Baldus, the Special Master appointed to oversee the development of a proportionality review system for the Court, were not designed to test for racial discrimination, but rather were focused on whether defendants with roughly equivalent levels of culpability were treated similarly. See id. at 310, 724 A.2d 129. In his report, Baldus informed the Court that

race variables [were included] in the culpability models to ensure that variables for legitimate case characteristics were not carrying any possible race effects. It was in the course of this work that we observed the race effects.... Because discrimination was not the primary mandate in this project, we consider these results to be strictly preliminary. More work will be required to determine if they persist under closer scrutiny and alternative analyses, to determine, for example, whether they are statistical artifacts or flukes, and to assess their legal and practical significance.

[David C. Baldus, Death Penalty Proportionality Review Project: Final Report to the New Jersey Supreme Court 100-01 (Sept. 24, 1991) (Baldus Report).]

Despite this disclaimer, defendants have alleged racial discrimination in the administration of the death penalty premised largely on the statistical models created by Professor Baldus.

Last term, after having conducted four proportionality reviews between 1992 and 1995, we evaluated the reliability of the multiple regression models Baldus had devised.^[1] Loftin II, supra, 157 N.J. at 308-16, 724 A.2d 129; see also State v. DiFrisco, 142 N.J. 148, 662 A.2d 442 (1995) (DiFrisco III) (using Baldus-created models for proportionality review), cert. denied, 516 U.S. 1129, 116 S.Ct. 949, 133 L.Ed.2d 873 (1996); State v. Martini, 139 N.J. 3, 651 A.2d 949 (1994) (Martini II) (same), cert. denied, 516 U.S. 875, 116 S.Ct. 203, 133 L.Ed.2d 137 (1995); State v. Bey, 137 N.J. 334, 645 A.2d 685 (1994) (Bey IV) (same), cert. denied, 513 U.S. 1164, 115 S.Ct. 1131, 130 L.Ed.2d 1093 (1995); Marshall II, supra (same). We acknowledged that the models contained fundamental problems that precluded reliance on their results and that those problems were due to both intrinsic and extrinsic factors, i.e., the structure or design of the models themselves and too few cases. Loftin II, supra, 157 N.J. at 310-15, 724 A.2d 129. We had asked retired Judge Richard Cohen, with the assistance of a world-renowned statistician, Dr. John W. Tukey,^[2] to assess those factors and others, and to report back to the Court within a short period of time. Id. at 302-03, 724 A.2d 129. See generally Richard S. Cohen, Report to the Supreme Court of New Jersey (Jan. 27, 1997) (Cohen Report). Because *171 the Public Defender claimed that the models demonstrated an impermissible race effect, understanding the nature and impact of the models' shortcomings was critical.

Judge Cohen informed us that the multiple regressions were inherently unreliable because they were not parsimonious, a requisite for a reliable regression. Loftin II, supra, 157 N.J. at 310-11, 724 A.2d 129. "A statistical model with a relatively small number of well-crafted parameters is known as a parsimonious model." Id. at 311, 724 A.2d 129. The lack of parsimony in the Baldus models created a risk of overfitting, possibly resulting in the false attribution of effect to a variable. Id. at 311 n. 13, 724 A.2d 129. In other words, considering the relatively small number of cases in the database, particularly death-sentenced cases, the regression models contained an excessive number of variables and, thus, statistical results suggesting racial discrimination may have reflected a methodological flaw rather than reality. Id. at 311, 724 A.2d 129. In addition to the problems related to the small size of the database, various data-coding decisions appeared to affect the model results in unintended and inappropriate ways. Id. at 311-12, 724 A.2d 129. Those concerns about the validity of the statistical models were substantiated by the disparate ratings of capitally-prosecuted defendants derived from the regressions and from culpability rankings produced by a survey of judges conducted for comparison purposes. Id. at 310, 724 A.2d 129. In short, the exploratory work of Judge Cohen, and the advice of Dr. Tukey, demonstrated the need for a more thorough consideration of parsimonious models, further exploration of other methodologies, and a good hard look at some of our earlier decisions about data-coding classifications.

Loftin II provides a detailed review of these matters. Id. at 302-16, 724 A.2d 129. Suffice it to say here that the need for further work led to a detailed charge to our present Special Master, Judge Baime, that included consideration of both individual and systemic proportionality review. Id. at 453-57, 724 A.2d 129. In addition to a review of data-coding choices and the projected size of the database over time, we specifically ordered that

(6) [t]he Special Master shall attempt to develop parsimonious statistical models for more reliable regression studies of race effect and shall consider whether the process of purging, i.e., the removal of the indirect effects of race from variables that appear to be unrelated to race, produces results that are useful; [and that]

(7) [t]he Special Master shall consider Special Master Cohen's recommendation, submitted in State v. Loftin, supra, that the Court appoint a panel of judges to perform periodic assessments of penalty-trial outcomes, along with the composition and mandate of such an independent judicial panel, as independent verification of the culpability ratings derived from the models.

[Id. at 456, 724 A.2d 129.]

In Baime Report II, Judge Baime addressed those questions and others. With assistance from Professors David Weisburd^[3] and Joseph Naus,^[4] he considered new approaches to the study of systemic discrimination and made recommendations to the Court. We consider Judge Baime's systemic recommendations in this opinion.

Before we begin our discussion of the Special Master's report, however, some further reflection on systemic review is in order. The study of system-wide discrimination *172 requires the use of statistical techniques in complex socio-political settings. The process is far more complicated than counting the number of defendants by race and the number of death penalties meted out, although it certainly includes such elementary comparative analyses. A myriad of discretionary decisions are made at every level in the system, and sorting out their relationship to the race of either defendants or victims is complex and difficult. We have learned that statistical modeling for that purpose is largely untested and that its usefulness is uncertain. The improvements and additions we approve today will need further review down the road. We make these choices because we know of no other means by which the relationship, if any, between race and the death penalty system in New Jersey may be reviewed. The importance of understanding whether racial discrimination infects our system of capital punishment requires that we make this effort.

II

Judge Baime recommends a process for monitoring the possible presence of racial discrimination in the administration of the death penalty. Since the Baldus models were not developed with this goal in mind, his proposal would "constitute[ ] our first systematic effort to develop a statistical framework devised for the specific purpose of analyzing the possibility of racial discrimination in New Jersey's capital punishment scheme." Baime Report II at 6. He explains in broad terms:

[T]he monitoring system I propose rests on the assumption that there is no single method that is sufficiently reliable to provide convincing evidence of a race effect in death penalty sentencing. I recommend a multifaceted approach consisting of bivariate analysis, regression studies, case exploratory analysis, and a precedent-seeking type review of the cases.

[Id. at 36-37.]

We analyze each component of the proposed system in turn.

A.

Judge Baime first describes a series of bivariate analyses as part of the "multifaceted" approach he suggests. In a bivariate analysis, there is only one independent variable. Here, because we are testing for the presence of racial discrimination, race is that single independent variable. See Romero v. City of Pomona, 665 F.Supp. 853, 859 (C.D.Cal.1987). These analyses are thus designed to test whether there are statistically significant differences based on the race of the defendant or the race of the victim in respect of the rate at which defendants eligible for the death penalty are sentenced to death, the rate at which those defendants are prosecuted capitally, and the rate at which juries sentence capitally-prosecuted defendants to death. See Baime Report II at 49-55.

We adopt Judge Baime's recommendation to include a series of bivariate analyses as part of our systemic review.^[5] This approach allows us to observe racial distributions in capital sentencing based on the raw data and without consideration of any variables other than race and sentence. The comparison is simple and easily understood. Nevertheless, we recognize, as does Judge Baime, that the utility of bivariate analyses is limited. Bivariate analyses do not take other factors, such as each defendant's deathworthiness, into account, but rather consider only the unadjusted relationship between race and outcome, i.e., advancement to penalty trial or imposition of the death penalty. This inability to control for other factors could cause misleading results, including "false positives" *173 or the attribution of race effects in instances where no race effects exist. Conversely, bivariate analyses could produce false negatives, in which case the analyses would not show race effects despite the presence of racial discrimination.

Because of its inability to control for nonracial factors, a series of bivariate analyses cannot be the only methodology we use to examine racial discrimination in capital sentencing. See David Weisburd and Joseph Naus, Report to Special Master Baime Concerning Systemic Proportionality Review 24 (Nov. 24, 1999) (Weisburd and Naus Report) (attached as Technical Appendix to Baime Report II) ("It is important to keep in mind at the outset, that it is not possible to develop a reliable monitoring system of race effects using only bivariate methods. Any bivariate relationship between race and death sentencing is likely to be confounded by other factors that also influence death penalty sentencing."). Despite its inherent limitations, we include bivariate analyses in the system we approve today because of their simplicity and because, as part of a multi-faceted approach, they may help to shed some light on any relationship between race and sentencing outcome.
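For illustration only, a single bivariate comparison can be sketched in Python as a two-proportion z-test on death-sentencing rates in two race-of-victim groups (for example, cases with white victims versus cases with black victims). Neither the Court nor Baime Report II prescribes this particular test; the function name `two_proportion_z_test` and the counts below are hypothetical, and the normal approximation it relies on is itself an assumption that grows shaky with counts as small as New Jersey's.

```python
import math


def two_proportion_z_test(deaths_a, n_a, deaths_b, n_b):
    """Compare the death-sentencing rates of two groups of death-eligible cases.

    Returns (rate_a, rate_b, z, two_sided_p) using the pooled normal
    approximation. With very small counts an exact test would be safer.
    """
    rate_a = deaths_a / n_a
    rate_b = deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every case, or no case, received death: no measurable difference.
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_a - rate_b) / se
    # Two-sided p-value for a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p


# Hypothetical counts for illustration only; not drawn from the New Jersey data.
rate_a, rate_b, z, p = two_proportion_z_test(20, 200, 15, 250)
print(f"rates {rate_a:.3f} vs {rate_b:.3f}, z = {z:.2f}, p = {p:.3f}")
```

A small p-value from such a test would show only an unadjusted disparity; it could not by itself distinguish a race effect from differences in culpability between the two groups.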

B.

Creating reliable multiple regression models has been the biggest challenge in systemic proportionality review. With the assistance of Professors Weisburd and Naus, Judge Baime proposes a methodology for the development of parsimonious multiple regressions that would be the second component in the monitoring system he recommends to the Court.

As discussed earlier, see ante at 211-12, 757 A.2d at 170-71, aside from coding and other like issues, the regression models developed by Baldus suffer from a fundamental defect: a small number of cases and far too many variables "to achieve [even] a minimal degree of statistical reliability...." Loftin II, supra, 157 N.J. at 311, 724 A.2d 129. Intuitively, we expect that many different variables influence the likelihood of receiving a capital sentence, and so we want to include all of them in our model. We have learned, however, that too many factors and too few cases can result in a finding of race effect where it does not exist, a problem we noted in connection with bivariate analysis, and also can prevent accurate measurement of the true effects of each factor. Cohen Report at 27-28. Because our database contains only fifty-three death-sentenced cases, see Baime Report II at 53-55, multiple regressions using the imposition of a death sentence as the dependent variable must have a limited number of independent variables for there to be a reasonably parsimonious model. Loftin II, supra, 157 N.J. at 311, 724 A.2d 129. According to Judge Cohen and Dr. Tukey, given the number of cases progressing through our system, multiple regressions measuring the factors that are "designed to test racial bias should employ between five and ten parameters or variables...." Ibid.; see John W. Tukey, Report to the Special Master 5 (Jan. 27, 1997).
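The arithmetic behind the five-to-ten-parameter guideline can be made explicit by dividing the fifty-three death-sentenced cases by the number of parameters. This ratio framing is an illustrative assumption, not a formula stated in the Cohen or Tukey reports:

```latex
\frac{53 \text{ death-sentenced cases}}{10 \text{ parameters}} = 5.3
\qquad
\frac{53 \text{ death-sentenced cases}}{5 \text{ parameters}} = 10.6
```

Even at ten parameters, each parameter would rest on only about five death sentences, which suggests why adding variables quickly erodes reliability.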

Lack of parsimony was a principal reason why we abandoned the index-of-outcomes test previously used in individual proportionality review to measure defendants' culpability levels. See Proportionality Review I, supra, 161 N.J. at 91-96, 735 A.2d 528. The inability to design parsimonious regression models for individual proportionality review, however, does not necessarily prevent the development of parsimonious regression models for systemic proportionality review. See Baime Report II at 37-39. Judge Baime explains the difference between the two as follows:

The basic premise upon which our model rests is that in assessing a race effect, as contrasted with defining culpability levels for individual proportionality review, we do not have to account for all factors that influence death penalty sentencing. Rather, we need only to include in our model those factors that are related to the outcome variable (either *174 advancement to a penalty trial or a death sentence) and the race variable examined. This is so because our effort is not to develop a reliable estimate of culpability level on the outcome measure, but only to control for potential confounding of the race variable. We thus seek to isolate and ultimately control for possible confounders.... Thus, where race is distributed equally, or in statistical terms where all else is equivalent, there is no need to take into account that variable. But where there is variability in a parameter, i.e., where race is unevenly distributed, that variable should be considered for its inclusion in the regression model. As noted by Professors Weisburd and Naus, "[the] difference between the goal of gaining a reliable prediction of the outcome measure and that of controlling for confounding... provides an opportunity to develop more parsimonious models that have so far been used in [assessing] death penalty sentencing."

[Id. at 37-38 (quoting Weisburd and Naus Report at 26) (second and third alterations in original).]

The difficulty, then, lies in choosing a smaller set of variables that is consistent with those criteria. Professors Weisburd and Naus have concluded that a "statistical approach" makes the most sense, and Judge Baime agrees. Id. at 39 (quoting Weisburd and Naus Report at 27) (internal quotations omitted). Accordingly, Judge Baime proposes a multi-step methodology to be "applied" at three points in the system: to juries' sentencing decisions, to the entire death-eligible universe, and to death-eligible cases that advance to penalty trial. Id. at 40. We summarize:

(1) Define the base set of variables, which could be limited to statutory factors or could also include variables selected in a survey conducted for that purpose.

(2) Explore the bivariate relationship between the race variable and each variable in the base set.

(3) Exclude variables that do not have a statistically significant relationship with a race variable using a threshold of .05 for the death-eligible universe and .10 for the penalty-trial universe.

(4) "Estimate the regression model including only those variables that have reached the thresholds ... plus the relevant race variables." Id. at 41.^[6] If that regression is not parsimonious, raise the threshold for statistical significance.

See id. at 40-42. Judge Baime "believe[s] that this approach can produce a more parsimonious set of regression models for monitoring race effects," although he also cautions that the omission of variables that are important indicators of outcome would "not only render[ ] the model incapable of explaining the outcome sought, but [would] distort[ ] the effects of the variables which have been included." Id. at 42.
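Steps (2) through (4) can be sketched as a screening routine, assuming each candidate variable and the race variable are coded 0/1 (a single race indicator, such as one category of a multi-category race coding). The chi-square test, the function names, the `max_params` cap, and the rule of halving the significance cutoff when too many variables survive are all hypothetical choices; "raise the threshold" is read here as demanding stronger evidence, which is an interpretation. The sketch stops short of estimating the regression itself.

```python
import math


def chi_square_2x2_p(table):
    """p-value of a Pearson chi-square test (1 df, no continuity correction)
    on a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        return 1.0  # a constant row or column carries no association
    chi2 = n * (a * d - b * c) ** 2 / (margins[0] * margins[1] * margins[2] * margins[3])
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def screen_variables(cases, candidates, race_key, threshold, max_params):
    """Keep the 0/1 candidate variables whose association with a 0/1 race
    indicator is significant at `threshold` (e.g. .05 or .10); if the model
    would exceed `max_params`, tighten the threshold (hypothetical rule)."""
    if max_params < 1:
        raise ValueError("the model needs room for the race variable")
    p_values = {}
    for var in candidates:
        table = [[0, 0], [0, 0]]
        for case in cases:
            table[case[race_key]][case[var]] += 1  # rows: race, columns: variable
        p_values[var] = chi_square_2x2_p(table)
    while True:
        kept = [v for v in candidates if p_values[v] < threshold]
        if len(kept) + 1 <= max_params:  # +1 for the race variable itself
            return kept, threshold
        threshold /= 2  # demand stronger evidence and try again


# Hypothetical data: factor_x tracks race exactly, factor_y is unrelated to it.
cases = [{"race": r, "factor_x": r, "factor_y": i % 2} for r in (0, 1) for i in range(20)]
print(screen_variables(cases, ["factor_x", "factor_y"], "race", threshold=0.05, max_params=10))
# → (['factor_x'], 0.05)
```

Only the surviving variables, plus the race variable, would then enter the regression; the sketch does not address Judge Baime's caution that dropping strong predictors of outcome can distort the remaining estimates.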

We accept the advice of the Special Master and his consultants that, when properly designed, a series of multiple regressions can be a valuable technique for measuring whether race affects the death-penalty system in this state. Moreover, multiple regression techniques have an advantage not shared by bivariate techniques: multiple regressions control for other factors and therefore isolate the effects of race on capital sentencing. Put another way, unlike bivariate analyses, in assessing race effects multiple regressions account for differences in defendants' culpability, thereby permitting a more complex understanding of the relationships between the variables examined.

Other issues related to the selection of variables for the models remain. Professor Naus suggests that the base set of variables should be limited to statutory aggravating and mitigating factors plus *175 race-of-victim and race-of-defendant variables. Id. at 20. Although that might be "[t]he most obvious and simplest solution," ibid., Judge Baime cautions that excluding relevant nonstatutory variables could "impair the regression model's utility" and "mask a race effect that exists or attribute race bias when there is none." Id. at 21. Professor Weisburd recommends including nonstatutory variables within the base set by surveying judges, prosecutors, defense attorneys, and scholars to identify those factors that may influence capital-sentencing decisions. Id. at 22. The Public Defender advocates including former jurors in this survey.

Our concerns about the exclusion of pertinent variables impel us to conclude that the base set of variables must include nonstatutory, as well as statutory, factors. See Marshall II, supra, 130 N.J. at 157, 613 A.2d 1059 ("If we did not consider the circumstances of the case beyond the c(4) aggravating factors in a search for disproportionality, we would be ignoring the reality of the situation...."). This decision comports with our previous recognition that nonstatutory factors may determine whether a defendant is prosecuted capitally or sentenced to death. See, e.g., State v. Chew (Chew II), 159 N.J. 183, 210-14, 731 A.2d 1070 (1999) (considering nonstatutory factors to compare defendants' culpability in precedent-seeking approach to individual proportionality review), cert. denied, ___ U.S. ___, 120 S.Ct. 593, 145 L.Ed.2d 493 (1999). We make the choice to include nonstatutory factors because we believe that the exclusion of relevant variables would undermine the reliability of the models. Although we consider it unnecessary to survey prosecutors, defense attorneys, scholars, and former jurors in order to identify the variables that will be used, we do approve asking judges with experience trying capital cases to select the nonstatutory factors that are most relevant to death-sentencing decisions.

Even if parsimonious multiple regressions do not omit relevant variables, the technique presents other problems that must be confronted. Where there is inconsistent coding of variables or interdependence between cases in the database (i.e., the outcome in one case affects the outcome of another case), the reliability of the results is adversely affected. Regression analysis assumes that these defects are not present.

Professors Weisburd and Naus make a number of recommendations for improving the reliability of the regressions. Among other things, they advise that compliance with the independence-of-cases assumption requires no more than one case for each defendant in the universe. Baime Report II at 17-18. To ensure consistency in the coding of variables, they recommend that the Administrative Office of the Courts (AOC) code statutory-factor variables in all cases, including those in the penalty-trial universe. Id. at 29. They also suggest examining whether changes in the use and interpretation of the c(4)(c) (torture or depravity) aggravating factor and the c(5)(h) (catch-all) mitigating factor may have fundamentally altered the meaning of those variables such that their coding is no longer valid. Id. at 24. Lastly, they would do away with Baldus's factor-analysis methodology ("providing weights for defining new variables"), and they would code race variables more precisely and more consistently to distinguish between whites, blacks, Latinos, Asians, and others. Id. at 22-23.

We begin with the consultants' last suggestion in respect of the coding of race. Previously, the race-of-defendant variable was coded either black or nonblack, and the race-of-victim variable was coded white or nonwhite. Judges Cohen and Baime and their statistical advisors have universally criticized this coding practice. We agree with their concerns and order that both defendants and victims be coded as white, black, Hispanic, Asian, or other. That coding change would increase accuracy, be more complete, and recognize the *176 racial and ethnic diversity that exists in New Jersey.

We accept the principle that "adoption of a method for defining a single case for each defendant for regression analysis" is appropriate. Id. at 18. Thus, for example, Joseph Harris committed five death-eligible murders in two separate incidents. Only one of those murders will be included in the database even though to do so cuts against our instincts in respect of a defendant who murdered more than one victim in separate incidents. We rely, in making this decision, on the experts who tell us the models require that there be "no systematic relationship between measured characteristics of the cases and unmeasured characteristics that influence death penalty sentencing." Ibid.

We recognize that there is inconsistency in the coding of variables. The use and interpretation of the c(4)(c) (torture or depravity) aggravating factor and c(5)(h) (catch-all) mitigating factor have changed over time. Also, having AOC staff members code the presence or absence of aggravating and mitigating factors obviously differs from having a jury decide whether aggravating and mitigating factors are present. We have given those issues some thought and have decided, for now, to tolerate certain inconsistencies inherent in the system. In part, we are reluctant to substitute AOC staff members' judgments based on a cold record for the findings of a jury that sat through and participated in a capital trial.^[7]

We expect that the new parsimonious regressions will be more reliable than the models developed by our first Special Master. Nonetheless, we are mindful still of the relatively small number of cases in the database. The Court has remarked before on the reluctance of New Jersey juries to impose the ultimate penalty except in the most egregious cases. See Marshall II, supra, 130 N.J. at 192-94, 613 A.2d 1059. Over the last five years, jurors have sentenced only ten defendants to death. It is ironic indeed that the result of juror reluctance to impose the death penalty is a database too small for reliable statistical analysis of race effects. That small database, and the risk of violating one or more of the assumptions upon which multiple regressions are based, requires us to act cautiously when we consider the use of multiple regressions in systemic review.

C.

Judge Baime recommends using case-sorting techniques as a third component of the proposed monitoring system. Under the sorting approach, the data is broken into various combinations of statutory and nonstatutory factors that have a statistically significant impact on either death-sentencing or penalty-trial-advancement rates. Baime Report II at 45. Judge Baime described "the steps taken by Professors Weisburd and Naus in analyzing our data base:"

(1) Divide the death-eligible universe into three groups: (1) cases involving defendants each with one case per defendant, (2) cases involving defendants with multiple cases in the death-eligible universe, and with several victims (simultaneous killings and killings on separate occasions), and (3) cases involving defendants each with one victim, but each defendant having more than one case in the data base (reversed convictions and death sentences resulting in retrials or pleas).

(2) Define the relevant set of variables. Here, as in the regression approach, we must begin with a specific set of measures that will be considered for sorting purposes.

(3) Introduce a combination of aggravating and mitigating factors. Only those factors having a statistically significant association with the outcome measured (advancement to penalty trial or death *177 sentence) and, when placed in combination, have a sufficient number of cases to permit analysis are to be used.

(4) Select combinations of the levels of the factors, i.e., all present, all absent, only factor "A" [^[8]] is present, only factor "B" is present. Determine which levels have different sentencing rates. Where neither factor is present, introduce another aggravating or mitigating factor that splits the data in a statistically significant way relating to sentencing.

(5) Analyze the data by stages. The data should be analyzed in terms of fractions of death-eligible cases advancing to penalty trial, fractions of death-eligible cases in which the death-penalty is imposed, and fractions of penalty trials resulting in imposition of a death sentence.

(6) Analyze the combinations by race of the defendant or victim.

(7) Identify categories with strong racial disparities. Conduct a precedent-seeking review of the cases in those categories by a panel of experienced judges to determine whether the defendants' culpability levels explain the outcome measured (progression to penalty trial or imposition of death sentence).

[Id. at 46-47.]
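The cross-classification at the heart of steps (4) through (7) can be sketched as follows, assuming each case is a record with 0/1 indicators for two factors, a race label, and a 0/1 outcome (advancement to penalty trial or death sentence). The function names and the `min_gap` cutoff are hypothetical; the report contemplates significance and minimum-cell-size screens (step (3)) that this sketch omits, using a raw rate gap in their place.

```python
from collections import defaultdict


def sort_cases(cases, factor_a, factor_b, race_key, outcome_key):
    """Cross-classify cases by the four presence/absence levels of two
    factors and by race; return the outcome rate in each cell."""
    tally = defaultdict(lambda: [0, 0])  # (a, b, race) -> [outcomes, cases]
    for case in cases:
        cell = tally[(case[factor_a], case[factor_b], case[race_key])]
        cell[0] += case[outcome_key]  # outcome coded 1 (e.g., death sentence) or 0
        cell[1] += 1
    return {key: hits / n for key, (hits, n) in tally.items()}


def flag_disparities(rates, races, min_gap):
    """List the factor levels whose outcome rates differ across races by at
    least `min_gap`; these would be candidates for precedent-seeking review."""
    flagged = []
    for level in sorted({(a, b) for (a, b, _) in rates}):
        level_rates = [rates[level + (r,)] for r in races if level + (r,) in rates]
        if len(level_rates) > 1 and max(level_rates) - min(level_rates) >= min_gap:
            flagged.append(level)
    return flagged


# Hypothetical cases for illustration only.
cases = [
    {"factor_a": 1, "factor_b": 0, "race": "black", "death": 1},
    {"factor_a": 1, "factor_b": 0, "race": "black", "death": 1},
    {"factor_a": 1, "factor_b": 0, "race": "white", "death": 0},
    {"factor_a": 1, "factor_b": 0, "race": "white", "death": 1},
    {"factor_a": 0, "factor_b": 0, "race": "black", "death": 0},
    {"factor_a": 0, "factor_b": 0, "race": "white", "death": 0},
]
rates = sort_cases(cases, "factor_a", "factor_b", "race", "death")
print(flag_disparities(rates, ["white", "black"], min_gap=0.3))
# → [(1, 0)]
```

A flagged category shows only that rates differ within that cell; because the sort holds just two factors fixed, other influences remain uncontrolled, which is why step (7) sends such categories to a panel of judges rather than treating the disparity as proof of a race effect.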

Judge Baime also suggests that the sorting approach could be simplified through use of the salient-factors categories used in individual proportionality review, i.e., that cases be classified by their most important aggravating attribute. Id. at 44-45. For the purposes of systemic proportionality review, each salient-factors-test category could be tested for the presence of a race effect. Id. at 45.

We adopt Judge Baime's recommendation to include the sorting approach when the AOC and the Special Master provide the next set of models to the Public Defender and the Attorney General. Exploratory case analysis may assist in classifying cases and determining whether there are racial disparities in any of the case categories. We recognize that the sorting approach can examine only two variables at a time and does not control for other variables that may influence death-sentencing decisions. Consequently, as with simple bivariate analyses, its results may be confounded by unexamined factors. Also, sorting can be unwieldy because there are so many possible combinations of statutory and nonstatutory factors, although we understand that only those combinations that have a statistically significant correlation with the imposition of a death sentence will be analyzed. See id. at 45.

Because the Public Defender has raised valid questions about the sorting methodology, particularly in connection with the combinations of factors, we will proceed step-by-step. Exploratory case analysis is a new approach to systemic review. We leave its administration to the Special Master who, we expect, will draw on the expertise of his statistical consultants. We know that decisions must be made as the technique is refined. We will consider those decisions in our first proportionality review after the new system is put in place. In that review we will evaluate whether the sorting approach is potentially useful and whether it should or should not be continued as part of the monitoring system.

D.

The Public Defender also argues that the monitoring system should include judges' culpability ratings of defendants for comparison purposes. Judge Cohen had recommended the appointment of a panel of judges to periodically assess penalty-trial outcomes as a check against model outcomes. See Cohen Report at 45. Judge Baime nonetheless mentions using experienced judges in only two contexts: to assist in choosing a base set of variables for parsimonious models and to "determine *178 whether the culpability levels explain the outcome measured (progression to penalty trial or imposition of death sentence)" when exploratory case analysis reveals "categories with strong racial disparities." Baime Report II at 47. We accept Judge Baime's recommendations and limit the use of judge surveys to those contexts.

III

Bivariate analyses (including the sorting approach) and multiple-regression analyses suffer from certain inherent weaknesses. As a consequence, we cannot rely on the results of any single methodology. If one approach appears to indicate a race effect, we cannot be confident that the effect derives from racial discrimination rather than from one or more statistical flaws. These considerations reinforce our earlier conclusion in Loftin II that a defendant must "relentlessly document[ ] the risk of racial disparity in the imposition of the death penalty" in order to establish disproportionality. 157 N.J. at 315, 724 A.2d 129 (internal quotations omitted). In this context, application of that standard requires that we find converging outcomes produced by a variety of techniques. Judge Baime's proposed multifaceted approach provides the means to carry out this task.

The dissent misperceives our intent by confusing the standard we apply with "defendant's overall burden of proof." Post at 232, 757 A.2d at 183 (Long, J., dissenting). But the standard is keyed directly to the statistical method and has no meaning outside that singular context. It does "not derive" from statistics, ibid., but rather represents our effort to deal sensibly with the limitations of the methods available to us.

Our goal in conducting systemic proportionality review is to be assured that there is no significant likelihood that race improperly affects the administration of capital punishment in New Jersey. In reaching that goal, we are cognizant of the nature of the evidence before us and of the need to tailor our response to the reliability of the " `science [we] invoke[ ].' " Loftin II, supra, 157 N.J. at 314, 724 A.2d 129 (quoting Cohen Report, supra, at 12). In Loftin II, we linked the relentless documentation of the risk of discrimination standard to the inherent weaknesses in the models, and we do so here. The reliability of the statistical methodology is not without doubt, and we cannot ignore that fact by accepting as dispositive admittedly flawed evidence. Judge Baime states that "an isolated finding produced by a single mode of analysis would not be enough," and that "[a] consistent finding using multiple methods, each subject to its own limitations, is more reliable than using only one method." Baime Report II at 37. He is, we believe, carefully explaining the kind of showing that is necessary in this unique setting. If that showing is made, we would know that there is a significant likelihood there are race effects in the death penalty system.

IV

We anticipate that the Special Master for proportionality review will annually update the database and the modes of analysis in the monitoring system, and that he will present a summary review of the models to the Public Defender and the Attorney General for their comments. We further anticipate that defendants will raise systemic-disproportionality claims in their appeals to the Court.^[9] If, after a rigorous review of the data, we conclude in one case that the defendant has not met his burden of establishing systemic disproportionality, we will rely on that conclusion in subsequent cases until the next *179 yearly adjustment to the data. We note that this approach is similar to our previous practice wherein we rejected systemic-disproportionality claims on stare decisis grounds. See State v. Martini, 160 N.J. 248, 275, 734 A.2d 257 (1999) (Martini V); State v. Harvey, 159 N.J. 277, 319, 731 A.2d 1121 (1999) (Harvey III), cert. denied, ___ U.S. ___, 120 S.Ct. 811, 145 L.Ed.2d 683 (2000); Chew II, supra, 159 N.J. at 222-23, 731 A.2d 1070; State v. Cooper, 159 N.J. 55, 116, 731 A.2d 1000 (1999) (Cooper II), cert. denied, ___ U.S. ___, 120 S.Ct. 809, 145 L.Ed.2d 681 (2000); DiFrisco III, supra, 142 N.J. at 210, 662 A.2d 442; Martini II, supra, 139 N.J. at 80, 651 A.2d 949.

Our implementation of this monitoring system does not imply either the presence or the absence of a race effect in capital sentencing. No reliable demonstration has been made to date that racial discrimination improperly affects the administration of the death penalty in this state. We nonetheless approve the system recommended by Judge Baime because "[w]hether in the exercise of statutory proportionality review or our constitutional duty to assure the equal protection and due process of law, we cannot escape the responsibility to review any effects of race in capital sentencing." Marshall II, supra, 130 N.J. at 214, 613 A.2d 1059.

V

Finally, there are three proportionality review cases before the Court this term in which defendants have alleged systemic racial discrimination. See State v. Harris, 165 N.J. 303, 757 A.2d 221 (2000); State v. Feaster, 165 N.J. 388, 757 A.2d 266 (2000); State v. Morton, 165 N.J. 235, 757 A.2d 184 (2000). Judge Baime has examined the available data under a hybrid of the methodologies used in previous proportionality reviews and under the approaches he recommends for use in future reviews. See Baime Report II at 48-66. He has concluded:

We find no reliable statistical evidence that the race of the defendant influences death sentencing either at the penalty trial stage or in the larger death-eligible sample of cases. Nor does the statistical evidence support the thesis that the race of the defendant affects which cases progress to a penalty trial. Further, the statistical evidence suggests that the race of [the] victim does not affect death-sentencing rates: killers of white victims are no more likely to receive the death penalty than killers of non-white victims. Finally application of our monitoring system discloses no consistent statistical evidence indicating that the race of the victim affects which cases progress to a penalty trial. However, some of the evidence in that respect is conflicting, and the issue should be revisited when the database increases.

[Id. at 66.]

We agree that the risk of racial discrimination has not been relentlessly documented by the models. We are concerned, however, that a bivariate analysis suggests that killers of white victims are more likely to be capitally prosecuted than killers of black victims. Because other methodologies do not corroborate those results, and because a bivariate analysis by definition does not account for other factors that may influence death-sentencing decisions, we cannot find "that race has operated as an impermissible factor in the imposition of the death penalty." Loftin II, supra, 157 N.J. at 346, 724 A.2d 129.

VI

We direct the AOC and the Special Master to implement the monitoring system we approve in this opinion.

LONG, J., concurring in part and dissenting in part.

No one can fault this Court for its commitment to systemic proportionality review. *180 We have not stinted in our efforts and have incorporated newly developed and more reliable models in our quest when prior models failed to do the job. This, our latest initiative, is certainly a step in the right direction and, with one exception, I concur in it. That exception is the requirement that a defendant present "relentless documentation" that race influenced his death sentence. Because I believe that the "relentless documentation" standard is without basis in law or reason, and that its application subverts the entire process, I dissent from its incorporation into the project.

I.

When we first took up the issue of racial discrimination and the death penalty, we focused on the "risk that defendants will be sentenced to death either because of their race or the race of the victim." State v. Marshall, 130 N.J. 109, 219, 613 A.2d 1059 (1992) (Marshall II). Because we were committed to determining whether such a risk was present, we became involved in what can only be denominated as a swamp of statistical data. Somehow as we attempted to wend our way through that swamp, our focus shifted from the possibility of racial discrimination to its likelihood. That we were waylaid is not a surprise. "What began as an optimistic experiment in social science, has become yet another example of the confusion and frustration that often surrounds social science in the courts." David Weisburd, Good for What Purpose?: Social Science, Race and Proportionality Review in New Jersey, Hebrew University of Jerusalem (visited July 21, 2000) (http://mishpatim.mscc.huji.ac.il/newsite/CrimeGroup/weisburd/workpap.htm) (printed in Social Science, Social Policy and the Law, Russell Sage, R. Kagan, P. Ewick and A. Sarat eds. (1999)). Confusion was to be expected. What was not expected was our slow but steady movement from the notion of risk to the notion of certainty in terms of the quantum of evidence necessary to prove race effect. That shift was unfortunate, because our duty to act arises from merely knowing that the death penalty may be meted out according to race; it is not contingent on having before us overwhelming evidence that such an affront to our Constitution has occurred.

II.

The Court first alluded to "relentless documentation" in the earliest opinion concerning proportionality review. Marshall II, supra, 130 N.J. at 213, 613 A.2d 1059. In that case, we examined the Special Master's report that presented statistics on the rate at which Caucasian and African-American defendants are sentenced to death and the rate at which cases with Caucasian and African-American victims proceed to a penalty trial. Id. at 210, 613 A.2d 1059 (citing David C. Baldus, Death Penalty Proportionality Review Project: Final Report to the New Jersey Supreme Court (Sept. 24, 1991) (Baldus Report)). The Special Master found that in cases with mid-range culpability, African-American defendants may be at greater risk for receiving a death sentence than similarly situated Caucasian and Hispanic defendants. Baldus Report at 101. The Special Master also identified a "statistically significant" discrepancy in which cases proceed to a penalty trial based on the race of the victim. Id. at 103. The primary focus of the Baldus Report, however, was not discrimination. Marshall II, supra, 130 N.J. at 211, 613 A.2d 1059. Moreover, its findings were strictly preliminary. Ibid. The reliability of the findings was also compromised by the instability of certain statistical results and the limited database of cases. Id. at 213, 613 A.2d 1059.

In our analysis of the Baldus Report, we contrasted the Special Master's findings with the extensive evidence of race-based disparities in capital sentencing that had been presented in McCleskey v. Kemp, 481 U.S. 279, 328, 107 S.Ct. 1756, 1786, 95 L.Ed.2d 262, 301 (1987). Marshall II, supra, *181 130 N.J. at 210, 213, 613 A.2d 1059. After expressing dismay at the racial bias demonstrated by the statistical data in McCleskey, we turned to the less-developed data in the Baldus Report and concluded that "we do not yet confront a record in which `[t]he statistical evidence... relentlessly documents the risk that [Marshall's] sentence was influenced by racial considerations.' " Id. at 213, 613 A.2d 1059 (citing McCleskey, supra, 481 U.S. at 328, 107 S.Ct. at 1786, 95 L.Ed.2d at 301) (Brennan, J., dissenting). In other words, Marshall's case was distinguishable from one in which the defendant's proof of racial bias was so definitive that our Constitution would compel us, without hesitation, to invalidate a death sentence. Ibid.

It is important to note what we did and did not say in Marshall II. It is certainly true that Marshall's evidence was weaker than McCleskey's evidence, but that alone is not determinative of whether Marshall raised a meritorious racial discrimination claim. Saying that we would invalidate a death sentence if a defendant presented evidence like McCleskey's is not the same as saying that only a record as extraordinary as McCleskey's would warrant a reversal. I read Marshall II as indicating only that, whatever the quantum of evidence necessary to prove a racial discrimination claim, it was surely met by Warren McCleskey. It did not suggest that only a case as frighteningly stark as McCleskey's would pass muster. McCleskey documented racial discrimination so clearly that even the opinion rejecting his claim acknowledged that the statistical data on racial disparities was so alarming as to threaten "the principles that underlie our entire criminal justice system." McCleskey, supra, 481 U.S. at 314-15, 107 S.Ct. at 1779, 95 L.Ed.2d at 293.

Our reference to "relentless documentation" in Marshall II only implied that Marshall's record was not as fully developed and did not present statistics as obviously problematic as those in McCleskey's record. See State v. Bey, 137 N.J. 334, 389, 645 A.2d 685 (1994) (Bey IV) ("Unlike the data in McCleskey, the Marshall data did not demonstrate that race played a constitutionally-significant role in death sentencing."). It did not suggest that every defendant must present evidence as glaring as McCleskey presented in order to have his sentence declared disproportionate.

Indeed, in the cases following Marshall II, we did not refer to a "relentless documentation" standard. See State v. DiFrisco, 142 N.J. 148, 210, 662 A.2d 442 (1995) (DiFrisco III); State v. Martini, 139 N.J. 3, 80, 651 A.2d 949 (1994) (Martini II); Bey IV, supra, 137 N.J. at 393-94, 645 A.2d 685. In Bey IV, for example, we rejected the defendant's racial discrimination claim because there were too few cases to support a reliable statistical evaluation of racial disparities in capital sentencing. 137 N.J. at 393, 396, 645 A.2d 685. That opinion made no mention of "relentless documentation" and, in fact, referred to the guiding standard from Marshall II as merely being whether a "substantial discriminatory effect" had been shown. Id. at 390, 645 A.2d 685 (citing Marshall II, supra, 130 N.J. at 211, 613 A.2d 1059).

Then last term, out of the blue, we adopted the "relentless documentation" standard. Loftin II, supra, 157 N.J. at 314-15, 724 A.2d 129. See also State v. Martini, 160 N.J. 248, 275, 734 A.2d 257 (1999) (Martini III) (citing as controlling precedent Court's rejection of racial discrimination claim in Loftin II for lack of relentless documentation); State v. Harvey, 159 N.J. 277, 319, 731 A.2d 1121 (1999) (Harvey III) (same); State v. Chew, 159 N.J. 183, 221-23, 731 A.2d 1070 (Chew II), cert. denied, ___ U.S. ___, 120 S.Ct. 593, 145 L.Ed.2d 493 (1999) (same); State v. Cooper, 159 N.J. 55, 116, 731 A.2d 1000 (1999) (Cooper II) (same). Loftin II was the first time that we stated that a capital defendant asserting a racial discrimination claim bears the burden of "relentlessly *182 document[ing] the risk of racial disparity in the imposition of the death penalty." 157 N.J. at 315, 724 A.2d 129. The enunciation of that rule was without explanation and without support in either federal or other state caselaw. New Jersey is the only state to use that standard.

I believe the "relentless documentation" standard was wrongly adopted and that its use is incongruous both with our capital jurisprudence and with the dissent from which the language was taken. See McCleskey, supra, 481 U.S. at 320-45, 107 S.Ct. at 1782-94, 95 L.Ed.2d at 296-312 (Brennan, J., dissenting).

Requiring a defendant not only to provide evidence of a substantial discriminatory effect in the application of the death penalty, but to "relentlessly document" that effect, is inconsistent with our "commitment to equality in the administration of justice." Bey IV, supra, 137 N.J. at 389, 645 A.2d 685. Just as the State has made the eradication of the "cancer of discrimination" one of its highest priorities, Dixon v. Rutgers, The State Univ., 110 N.J. 432, 451, 541 A.2d 1046 (1988), we have devoted significant efforts to identifying and eliminating racial disparity in capital sentencing. Ante at 209, 757 A.2d at 170; Loftin II, supra, 157 N.J. at 315, 724 A.2d 129; State v. Ramseur, 106 N.J. 123, 327, 524 A.2d 188 (1987). I find it incomprehensible to declare our commitment "to a course of review that is capable of discerning possible racial discrimination in our capital sentencing system," and then ask defendants to prove racial discrimination by an insurmountable standard. Loftin II, supra, 157 N.J. at 298, 724 A.2d 129 (emphasis added).

Declaring "relentless documentation" to be the legal standard also confounds the opinion from which it was borrowed. Justice Brennan used the "relentless documentation" language in McCleskey "to emphasize how conclusively McCleskey has... demonstrated precisely the type of risk of irrationality in sentencing that we have consistently condemned in our Eighth Amendment jurisprudence." 481 U.S. at 320-21, 107 S.Ct. at 1782, 95 L.Ed.2d at 297 (Brennan, J., dissenting). His point was that McCleskey not only met his burden of proof, but greatly surpassed it. Id. at 328, 107 S.Ct. at 1786, 95 L.Ed.2d at 302 (Brennan, J., dissenting). It was not Justice Brennan's aim to suggest that the quantum of proof provided by McCleskey should become the bar for every other aggrieved party to overcome. If the Court truly wishes to align itself with Justice Brennan's position, see Marshall II, supra, 130 N.J. at 215, 613 A.2d 1059, it should strive to meet the ideal outlined in his dissent: that of imposing strong safeguards against the risk of racially discriminatory capital sentencing. It does a disservice to Justice Brennan to cite his language, yet thwart its spirit.

III.

Our consideration of whether racial bias plays a role in deciding who lives and dies in our execution chamber "is at its core an exercise in human moral judgment, not a mechanical statistical analysis." McCleskey, supra, 481 U.S. at 335, 107 S.Ct. at 1789, 95 L.Ed.2d at 306 (Brennan, J., dissenting). Our evaluation of a capital defendant's racial discrimination claim must consider both statistical evidence as well as "our understanding of history and human experience." Id. at 404, 107 S.Ct. 1756 (Handler, J., dissenting) (citing McCleskey, supra, 481 U.S. at 328, 107 S.Ct. at 1786, 95 L.Ed.2d at 302 (Brennan, J., dissenting)).

In today's opinion, the Court has linked the relentless documentation standard to statistics although it is plainly not a term that arises out of the statisticians' idiom. To be sure, in his report to the Special Master, Dr. John W. Tukey stated that a "relentless" showing of racial discrimination could not reasonably be made based on a single isolated regression analysis. John W. Tukey, Report to the Special Master 12 (Jan. 27, 1997) (Tukey Report). However, as Judge Baime properly observed, *183 Dr. Tukey obviously adopted the word "relentless" and its concomitant standard from Justice Brennan's dissent in McCleskey. Report to the New Jersey Supreme Court: Systemic Proportionality Review Project 14 (Dec. 1, 1999) (Baime Report II). In so doing, Dr. Tukey apparently did not realize the import of Justice Brennan's dissent. One thing is clear: we did not derive the notion of relentless documentation from the statisticians; they got it from us.

I certainly agree with Dr. Tukey and Judge Baime that a " `bouquet of analyses' constitutes a `reasonable strategy' in determining whether a race effect exists." Ibid. (quoting Tukey Report at 12). Clearly, we should strive to document a claim of racial discrimination by as many methods as practicable. However, that is quite distinct from raising a defendant's overall burden of proof, encompassing statistical and non-statistical evidence alike, to the relentless documentation level.

Although the Court today has reaffirmed that its focus is on the risk of race effect, that affirmation rings hollow in light of the burden it has cast on defendants. It is a burden of certainty and is inconsistent with our recognition of "the complete finality of the death sentence," Turner v. Murray, 476 U.S. 28, 35, 106 S.Ct. 1683, 1688, 90 L.Ed.2d 27, 36 (1986), and our unique "commitment to equality in the administration of justice." Bey IV, supra, 137 N.J. at 389, 645 A.2d 685. Clearly, not "every nuance of [a capital sentencing] decision [can] be statistically captured, nor can any individual judgment be plumbed with absolute certainty." McCleskey, supra, 481 U.S. at 335, 107 S.Ct. at 1789, 95 L.Ed.2d at 306 (Brennan, J., dissenting). As Justice Brennan wrote: "[T]he fact that we must always act without the illumination of complete knowledge cannot induce paralysis when we confront what is literally an issue of life and death." Ibid. Because we cannot know everything does not mean that we cannot know something; because we cannot do everything does not mean that we cannot do something.

I, therefore, urge the Court to abandon "relentless documentation" and to adopt a less burdensome quantum of proof as the standard a defendant must meet in proving a "substantial discriminatory effect." With that change, there is at least a possibility that proportionality review may serve as a safeguard against the random, arbitrary, and discriminatory application of the death penalty.

IV.

Intertwined with the reservations I harbor regarding the standard a defendant must meet in a systemic proportionality case is the Court's rejection of the proportionality challenges of Ambrose Harris, Robert Morton, and Richard Feaster, all white-victim cases. Ante at 226, 757 A.2d at 179. Judge Baime has determined that, "race of victim may be statistically significant with respect to advancement to penalty trial," Baime Report II at 64, and that "[a]lthough the statistical evidence tends to trend against the thesis that white victim cases more often advance to a penalty trial than black victim cases, our finding is neither dispositive nor conclusive in that respect." Id. at 4. Judge Baime concluded that "application of our monitoring system discloses no consistent statistical evidence indicating that the race of the victim affects which cases progress to a penalty trial. However, some of the evidence in that respect is conflicting and the issue should be revisited when the database increases." Id. at 66. In other words, there is some question about race effect in white-victim cases, although not enough to meet the relentless documentation standard.

The Court is comfortable continuing to "tinker with the machinery of death," Callins v. Collins, 510 U.S. 1141, 1145, 114 S.Ct. 1127, 1130, 127 L.Ed.2d 435, 438 (1994) (Blackmun, J., dissenting), when we do not yet fully understand the role of racial bias in the operation of our death penalty. I *184 am not. Executions should not be approved while we wait for statistics to be compiled to the point of relentlessness.

For adoption of recommendations, as modified: Chief Justice PORITZ and Justices O'HERN, STEIN, COLEMAN, VERNIERO (except for Section V, in which Justice Verniero does not participate) and LaVECCHIA (6).

Concurring in part, dissenting in part: Justice LONG (1).

NOTES

[1] "Multiple-regression analysis is a statistical tool used to describe the relationship between one or more independent variables (e.g., prior murder) and a dependent variable (e.g., the death penalty)." Loftin II, supra, 157 N.J. at 295 n. 8, 724 A.2d 129.

[2] Dr. Tukey, who passed away July 25, 2000, was a Professor Emeritus of Statistics at Princeton University. He was a 1973 recipient of the National Medal of Science.

[3] Dr. David Weisburd is the Director of and a Professor Min HaMinyan (Full Professor) at the Institute of Criminology at The Hebrew University. Since 1993, he has been a scientific advisor for the Death Penalty Proportionality Review Project.

[4] Dr. Joseph Naus is a Professor of Statistics at Rutgers University. Between 1981 and 1993, Naus served at various times as Chairman of the Rutgers University Department of Statistics.

[5] The implementation of these proposals requires that we continue to examine the larger universe of death-eligible cases and capitally prosecuted cases, as well as the more limited universe of death-sentenced cases. See Proportionality Review I, supra, 161 N.J. at 84, 735 A.2d 528 (rejecting proportionality universe of death-sentenced cases only).

[6] By way of example, if there is a statistically significant relationship between the race of defendant variable and the prior-murder aggravating-factor variable, the prior-murder variable would be included in the model.

[7] We also approve removing cases from the universe that no longer meet statutory requirements for capital murder and eliminating the Baldus factor-analysis methodology.

[8] In one combination, for example, factor "A" might be c(4)(c) (torture or depravity), and factor "B" might be c(5)(d) (diminished capacity).

[9] Judge Baime suggests that "some moratorium [on systemic proportionality would be appropriate] until the data base expands to a point that makes inquiry worthwhile." Id. at 33. For the present, we chose to add new cases on a yearly basis and to conduct reviews thereafter.
