Wilson v. State

JOHNSON, J., filed a concurring opinion.1

Numbers flummox many of us, and as a result, numerical evidence can become confusing and misleading. This is particularly true if the evidence that inserts numbers into the legal equation is new and marginally understood. We are now at that point with evaluation of DNA evidence. The experts who come to court to present DNA evidence frequently come up with probabilities 2 of such great magnitude that they *487 are patently unsupportable to those who understand numbers and very impressive to those who do not. The following discussion assumes that the only evidence linking the defendant to the offense is DNA.

The first probability mistake that experts make is to treat all variables3 as independent. A variable is independent if, when the numerical value of the variable changes, no other variables necessarily change also. In a rectangle, height and width are independent variables; changing the height does not necessarily change the width. A dependent variable is one that necessarily changes in response to a change in another variable. The area of a rectangle is the product of multiplying height times width and is a dependent variable; if the height or width changes, the area necessarily changes.

The probability of two independent variables occurring at the same time is the product of the probabilities for each: if each variable occurs one time in ten, the probability of both variables occurring at the same time is 1/10 x 1/10, or 1/100 = one in one hundred. Probabilities decrease rapidly with the number of variables. With only six independent variables that have individual probabilities of one in ten, the probability of all occurring at once is one in a million. The probability decreases even more rapidly for variables that occur less often than one in ten times; for variables that occur once in a hundred times, the probability of one in a million requires only three variables.
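The multiplication rule just described can be checked with a short Python sketch. The function name `joint_probability` is an illustrative assumption and is not drawn from the record; the rule it applies holds only if the variables are truly independent.

```python
from fractions import Fraction

def joint_probability(individual, count):
    """Probability that `count` independent variables, each occurring with
    probability `individual`, all occur at the same time."""
    return Fraction(individual) ** count

# Two variables at one in ten each: one in one hundred.
print(joint_probability(Fraction(1, 10), 2))   # 1/100
# Six variables at one in ten each: one in a million.
print(joint_probability(Fraction(1, 10), 6))   # 1/1000000
# Three variables at one in a hundred each: also one in a million.
print(joint_probability(Fraction(1, 100), 3))  # 1/1000000
```

Exact fractions are used instead of floating-point numbers so that the results print in the same "one in N" form used in the text.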

A problem arises when dependent variables are treated like independent ones. In a California case from 1964,4 The People v. Collins, an older woman returning from the grocery was accosted from behind and did not see her attacker, who took her purse. She did see a young woman running from the scene and described her as weighing about 145 pounds, wearing “something dark,” and having blonde hair that was lighter than Janet Collins’s hair was at the time of trial. A man who had been nearby reported seeing a young white woman with a blonde ponytail running from the direction of the robbery, but did not see the offense occur. He described the woman as slightly over 5 feet tall, of ordinary build, wearing a dark blonde ponytail and dark clothing. He also reported that the young woman got into a yellow or partly yellow car driven by a black man who had a beard and moustache.

The defendants, an interracial couple, were arrested and charged because they “sort of” matched the physical descriptions, owned a car that was at least partly yellow, were newly married, jobless, and broke. They denied involvement and provided an alibi. At trial, the state presented a mathematics instructor from a state college as their expert witness on the probability that the defendants were guilty. The witness refused to assign probabilities to the various factors chosen by the prosecutor, so the prosecutor proposed “probabilities” of his own; one in three young women were blonde, one in ten wore a ponytail, one in ten cars was at least partly yellow, one in ten black men had a beard, one in four men had a moustache, and one in a thousand couples was interracial.5 The prosecutor then *488 multiplied his own “probabilities” together and calculated that the profile would match one in 12 million couples.6 The Collinses were convicted of the robbery based on the claimed probability that no other couple in California matched the reported description of the perpetrators.
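The prosecutor's arithmetic can be reproduced in a few lines of Python. This is only a sketch of the multiplication itself; the dictionary name and labels are illustrative assumptions, and the six figures are the prosecutor's unsupported guesses as recited above, not established frequencies.

```python
from fractions import Fraction
from math import prod

# The prosecutor's own "probabilities," as recited in the opinion.
prosecutor_guesses = {
    "young woman is blonde": Fraction(1, 3),
    "young woman wears a ponytail": Fraction(1, 10),
    "car is at least partly yellow": Fraction(1, 10),
    "black man has a beard": Fraction(1, 10),
    "man has a moustache": Fraction(1, 4),
    "couple is interracial": Fraction(1, 1000),
}

# Multiplying all six together treats every factor as independent.
product = prod(prosecutor_guesses.values())
print(product)  # 1/12000000
```

The product matches the one in 12 million figure, which shows that the number rests entirely on the six guesses and on the assumption that none of them overlaps with another.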

In reversing the conviction in 1968, the California Supreme Court noted that the evidence was presented “[w]ithout presenting any statistical evidence whatsoever in support of the probabilities for the factors selected.” Collins, 438 P.2d at 36 n.9. The Court also noted that all of the selected factors were treated as independent and factually true and that there was no adjustment for dependent variables or the possibility of mistake. Id. at 39. Most men with beards also have moustaches, so a correction for the overlap was necessary. Id. at 39 n.15. The Supreme Court also noted that the witness had failed to consider other plausible possibilities, for example, the young woman was a light-skinned African-American with bleached hair.7

Some of the bad guesses increased probability, others decreased it, but the expressed probability itself was not reliable. “Mathematics, a veritable sorcerer in our computerized society, while assisting the trier of fact in the search for truth, must not cast a spell over him.” Id. at 33. The Collins Court also noted that, even if one accepted the prosecutor’s guesses, appropriate calculations indicated that there was a substantial probability that more than one other couple matched the selected factors. Id. at 43.
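One way to see why a second matching couple is plausible is the sketch below. It rests on assumptions that are not in the text above: each couple is treated as matching independently with the prosecutor's one in 12 million figure, and the number of couples who could have committed the offense, `n`, is set to 12 million purely for illustration. The function name is likewise hypothetical.

```python
def prob_more_than_one_given_one(p, n):
    """Given that at least one of n couples matches, return the probability
    that more than one matches. Assumes each couple matches independently
    with probability p (a simple binomial model)."""
    none = (1 - p) ** n                       # no couple matches
    exactly_one = n * p * (1 - p) ** (n - 1)  # exactly one couple matches
    at_least_one = 1 - none
    return (at_least_one - exactly_one) / at_least_one

p = 1 / 12_000_000  # the prosecutor's figure, accepted for argument's sake
n = 12_000_000      # ASSUMPTION: illustrative number of candidate couples

result = prob_more_than_one_given_one(p, n)
print(f"{result:.2f}")  # about 0.42
```

Under these assumed inputs, the chance that the matching couple is not unique is roughly four in ten; a different assumed `n` would change the figure.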

Multiplying the probabilities of all variables together, without regard to dependence, leads to a probability that is too small, often greatly too small. For example, variable A has a probability of occurring one time in one thousand, and always occurs with variable B. B always occurs with A. A and B are dependent variables, and the probability that A and B will occur together is still one in one thousand, because they never occur separately. If the probabilities of a random match to A and B are improperly multiplied together, the probability of both A and B occurring together is 1/1000 x 1/1000 = 1/1,000,000, or one in a million, and is one thousand times too small. The numbers soon get out of hand. One expert testified that a given profile occurred one time in 2.578 sextillion (2.578 followed by 21 zeroes),8 a number larger than the number of known stars in the universe (estimated at one sextillion).9 The population of Earth is *489 about 6.5 billion, so anything in the sextillion range is more than one trillion times larger than the population of Earth. It is no wonder that, faced with numbers too large to conceive, some juries simply dismiss DNA evidence as not helpful, not persuasive, or not credible. The other side of the coin is a jury that accepts any claim about probabilities “because it’s DNA.” They have all seen “CSI: Crime Scene Investigation” or “NCIS” and “know” that DNA is infallible.
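The A and B example in the preceding paragraph can be worked through with conditional probability. The sketch below is illustrative only; the variable names are assumptions, and the figures are the hypothetical ones used in the example.

```python
from fractions import Fraction

p_a = Fraction(1, 1000)      # A occurs one time in one thousand
p_b = Fraction(1, 1000)      # B occurs exactly when A occurs, so the same rate
p_b_given_a = Fraction(1)    # B always occurs with A

# Correct for dependent variables: P(A and B) = P(A) x P(B given A).
correct = p_a * p_b_given_a

# Improper shortcut: multiply as if A and B were independent.
naive = p_a * p_b

print(correct)          # 1/1000
print(naive)            # 1/1000000
print(correct / naive)  # 1000
```

The last line is the size of the error: the improper product understates the true probability by a factor of one thousand.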

The reality of the human genome is that some genes are recessive and are therefore dependent on other genes for expression. For example, blue eyes occur only if both parents pass on the gene for blue eyes. If one parent passes a gene for brown eyes, the probability is high that the child’s eyes will not be blue. Many genes are intertwined to some degree; blue eyes often accompany blonde hair, but some Irish have strikingly blue eyes and black or red hair. It is very difficult to determine the probability of a given characteristic because we do not have a map of how each gene affects every other gene. It may be that one in ten people have blue eyes and one in twenty is (really) blonde, but because we know that the probability of blue eyes increases if the person is blonde, simply multiplying 1/10 x 1/20 will not tell us the true probability of having both blue eyes and blonde hair; the calculated value will be too low. If you are Japanese, there is close to a one hundred percent probability that you will have dark hair and brown eyes. When the probability of a person of Japanese ancestry having both dark hair and brown eyes is calculated, we must take that into account. All of these characteristics are controlled by DNA, and the same rules that apply to any probability calculation also apply to calculating the probabilities of a DNA match; if areas A and B of each DNA sample match, but A and B always occur together, A and B must be treated as one area of matching, not two.

In this case, the claim at trial was that only 1 in 2,083 persons of Hispanic descent would match appellant’s DNA profile. How was that number calculated? We do not know. An even more basic question is: what makes one “Hispanic”? Appellant’s surname is Wilson, a name not ordinarily thought to be Hispanic. May we assume that appellant’s father was not Hispanic? Part Hispanic? Part African-American? Part Western European? Eastern European? Asian? Were the differing probabilities of a non-Hispanic gene pool taken into account in calculating probabilities? How are probabilities for racial groups calculated in general? How do we calculate reasonably accurate probabilities for people like that famous self-described “Cablinasian,” Tiger Woods?10 We do not know.

Secondly, a statement that the DNA profile of the defendant occurs in only one in one million members of a given racial group means just that: if the reference group is one million individuals, one person will match; if you have two million individuals in the reference group, two individuals will match, and so on. Unfortunately, most people translate that statement, that only one person in a million matches the profile, to mean that there is one chance in a million that the defendant is not guilty. Statistically, however, a city with ten million members of the reference group will include ten individuals who match the profile, and thus, there is only a one in ten chance that the defendant is guilty.

*490 In this case, the offense occurred in Dallas County. Assuming a county population of one million and an Hispanic population of thirty percent, 300,000 Hispanics live in Dallas County. DNA is very reliable as to gender, so if we assume equal occurrence of gender, there are 150,000 Hispanic males in Dallas County. Dividing 150,000 by 2083, we find that, statistically, 72 men in Dallas County fit the profile. But do we really know that the perpetrator lived in Dallas County? Dallas is part of the Metroplex, which has a population of more than 3 million, and we are a mobile society. In the Metroplex, there are 216 statistical men who fit the profile. Could he be visiting from Houston (216 men) or Chicago (360 men)? Assuming that Mexico City is close to 100% Hispanic, 5,281 men in that city alone match the profile. How many male residents of Mexico City (or Guadalajara, Ciudad Juarez, Acapulco, etc.) were in Dallas County at the time of the offense? Even if we restrict the possibilities to Dallas County, a stated probability that only one Hispanic in 2,083 matches this profile does not mean that there is one chance in 2,083 that he is not guilty; it means that the probability is one in 72 that he is guilty.
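The Dallas County arithmetic can be retraced in a brief Python sketch. The function and variable names are illustrative assumptions; the population figures are the assumed round numbers used in the paragraph above, and the calculation presumes, as this discussion does throughout, that DNA is the only evidence, so that every matching man is an equally likely source.

```python
def expected_matches(reference_population, one_in):
    """Statistically expected number of people in the reference population
    who match a profile that occurs once in every `one_in` individuals."""
    return reference_population / one_in

county_population = 1_000_000                     # assumed in the text
hispanic_population = county_population * 30 // 100  # thirty percent
hispanic_males = hispanic_population // 2         # equal occurrence of gender

matches = expected_matches(hispanic_males, 2083)
print(hispanic_males)   # 150000
print(round(matches))   # 72

# The earlier illustration: a one-in-a-million profile in a reference
# group of ten million yields ten expected matches.
print(expected_matches(10_000_000, 1_000_000))  # 10.0
```

With about 72 men expected to match and nothing else to distinguish among them, the chance that any particular one of them is the source is about one in 72, not 2,082 in 2,083.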

Finally, trial attorneys need to understand how to validate (or repudiate) DNA evidence. They must begin with the reported match. Prosecutors may leap from a lab report saying that the samples match to an immediate conclusion that the defendant is guilty, thus the origin of the term “prosecutor’s fallacy.”11 But is it really a match? How many areas of the DNA strands coincide?12 How big is the specified error range? Just as for fingerprints, the more areas that match, the more likely that this is truly a match.13 If there *491 appears to be a match, advocates then need to discover how often the laboratory that did the DNA testing produces a false positive.14 Part of the problem for the State of California in the O.J. Simpson trial was the revelation that the state’s testing laboratory had a false positive rate of 1 in 200, that is, one match in 200 was not, in fact, a match, thus opening the door for the defense to argue that the sample really did not match Simpson’s DNA.15

Once it is established by the state that the two samples do, in fact, match within an appropriate margin of error, the next question is whether the defendant is the source. The random probabilities that are routinely used are valid only for unrelated persons. The closer the relative, the greater the number of areas on the DNA strand that will match. Identical twins have identical DNA. Parents will share DNA similarities with their children, and siblings will have many commonalities. Double first cousins will also have many commonalities; first cousins will have fewer commonalities, yet still a significant number. The only living male in a given family has a high probability of being the source, but a family with only sons over several generations will present a greater challenge. If the state shows that the defendant is the source, there is one more hurdle; can the defendant be placed at the crime scene in the appropriate time frame?

DNA is durable; it does not evaporate or dissipate, and the time at which it was deposited on a surface cannot be directly determined. If the DNA sample was retrieved from a place where the defendant lives, works, or visits frequently, it is probably not probative, as one would expect to find the defendant’s DNA in those places. Sex crimes aside, if the sample is from a place where the defendant should not have been, the DNA, by itself, can confirm only that he was there at some time and cannot, by itself, prove conclusively that he was there at the time of the crime. By the same token, DNA cannot prove that the defendant was not there at the time of the crime.

DNA analysis is a powerful tool in determining guilt or innocence, and usually there is other evidence that links the defendant to the offense, but we must remember that DNA analysis is performed by humans and is not foolproof, nor are the conclusions drawn from the analysis always correct. Only if all the prerequisites for reliability — true match, correct source, and presence at the crime scene in the applicable time frame — are satisfied can society have confidence that the DNA evidence is, in and of itself, strong enough to support a conviction.

I concur in the judgment of the Court.

1. In response to a motion for rehearing, the Court has withdrawn its original opinion and substituted a corrected opinion. This concurring opinion accompanied the original opinion of the Court and is reissued in order to accompany the corrected opinion.

2. In this context, a probability is the likelihood that the DNA of a randomly selected person will match the DNA of a known sample.

3. In a defined system, a variable is a characteristic that has a numerical value that changes.

4. 68 Cal.2d 319, 66 Cal.Rptr. 497, 438 P.2d 33 (Cal.1968) (reversed and remanded for a new trial).

5. In 1964, the probability of a young woman having a ponytail was probably higher than *488 one in ten, but even in California, the incidence of natural blondes was likely to be less than one in three. Except for taxis, yellow cars are uncommon; probably, many fewer than one in ten non-taxi cars were yellow. In 1964, the number of interracial couples was probably far fewer than one in a thousand.

6. He then performed the “prosecutor’s fallacy.” Post, infra.

7. Other possibilities include: the young woman wore a blonde wig; she was running, not because she had robbed the older woman, but because she was late; she and the driver were merely car-pooling and were not a couple; the perpetrators were not from California.

8. See, e.g., Ex parte Russeau, No. WR-61,389-01 (Tex.Crim.App. writ application pending) (trial court’s findings of fact and conclusions of law, p. 38) (“1 in 115.6 quintillion for Caucasians, 1 in 10.28 quintillion for blacks, and 1 in 1.578 sextillion for Hispanics.”); Benford v. State, 2005 WL 240611, 2005 Tex.App. LEXIS 840, No. 03-02-00686-CR (Tex.App.-Austin, delivered February 3, 2005, unpublished, pet. ref’d) (“1 in 8.77 trillion [in the Caucasian population], 1 in 7.46 trillion for African-Americans, and 1 in 4.52 trillion for Hispanics”).

9. http://imagine.gsfc.nasa.gov/docs/ask_astro/answers/970115.html. “We believe that there are on the order of 10^21 stars in our universe.”

10. Mr. Woods’s father has Caucasian, African-American, Asian, and First People ancestors. His mother is Thai.

11. “The attorney and social psychologist William Thompson and his student Edward Schumann seem to have coined the term ‘prosecutor’s fallacy.’” Gerd Gigerenzer, Calculated Risks 154 (2002).

12. DNA analysis scans fragments in 13 areas of high variability. High probability of a true match is assumed if four to five areas match. See www.ornl.gov/sci/techresources/Human_Genome/home.shtml, a site funded by the United States Department of Energy Office of Science. Given that only the individual whose DNA it is will match all 13 tested areas, using an explanation of the probability of a random match that is expressed in terms of how many areas match and the closeness of the matches instead of using statistics may be more persuasive to the finder of facts. Many people feel as Benjamin Disraeli did when he said, “There are three kinds of lies: lies, damn lies, and statistics.”

13. The form of the DNA molecule is a double helix, similar in structure to a loosely coiled ladder with two long, continuous “legs” and many “rungs.” Nuclear DNA has four nitrogenous bases, referred to as A, C, G, and T. Each base matches with only one other base, but because the sequence is directional, the order of the bases matters: A + T and G + C are different from T + A and C + G. The bases are arranged in varying sequences from rung to rung and the sequencing order encodes genetic information. By way of simplistic explanation, in a laboratory analysis DNA strands are broken into fragments, which are separated by electrophoresis, placed on a nylon membrane, and x-rayed. The result is an autoradiogram, a series of bands that resemble the bars of a UPC, but are less well delineated. Because the bands are not sharply defined, a match may be found when the bands in two samples align within a stated margin of error. As the allowed margin of error becomes larger, the statistical reliability of the match becomes smaller. The nuclear DNA strands that are analyzed are “non-coding” sections, that is, nuclear DNA that has no known function in the production of protein. Non-coding nuclear DNA has less selection pressure than coding nuclear DNA and therefore shows higher variability among individuals. Higher variability makes it easier to exclude or include individuals. Mitochondrial DNA goes even further toward identifying individuals because it is separate from nuclear DNA and different from it in that mitochondrial DNA comes solely from the mother and is therefore a clone of her mitochondrial DNA rather than a *491 blending of the nuclear DNA from both parents. One of its uses is to differentiate between persons who have different mothers, such as paternal half siblings, or cousins, or fathers and sons. The Y chromosome, present only in males, is passed directly from father to son and may offer similar information for males.

14. A false positive is a test result that indicates that a factor is present, but in fact it is not. E.g., a blood test indicates that the person is HIV positive, but in fact the person is not infected.

15. Gerd Gigerenzer, Calculated Risks 167 (2002).