Apparently, after Sangero received from me original high-quality research
demonstrating the use of simulations as a safety tool in criminal law, he chose
to suggest, in a book crediting him alone, the use of a certain simulation
(“using strong computers”) as such a safety tool. But in the four sentences he
used to make this suggestion, he made highly embarrassing mistakes that indicate
he did not really understand the nature of simulations or the studies of other
researchers in this field.
In the past 25 years, I have been dealing with computer simulations in various
fields almost every day. Among other things, I published two papers containing
theoretical calculations and original software simulations predicting the
number of full matches and the number of partial matches between genetic
profiles in databases and in the entire population of Israel.
While working on the book about safety in criminal law, which was intended to
be published by Oxford University Press, I provided Sangero with many pages of
new original research, which included, among other things: the observation that
simulations are common in various fields of study; references to a source
demonstrating that the FDA considers simulations to be an important safety
instrument; and a description of the partial matches at many loci between
genetic profiles found in the Arizona DNA database, which includes only 65,493
genetic profiles. I provided citations to the original papers of Mueller and
Weir, which I knew in depth, and described each of their analyses. In addition,
in order to learn the number of matches found in the US national database, I
conducted, specifically for the book, a simulation of a DNA database with
10,477,600 profiles, the size of the NDIS (the US national database) at the
time. I also performed theoretical calculations relating to the number of
matches in the entire US population. I linked the study to the disputed
conviction of John Puckett. In this manner, I demonstrated how simulation can
be a prominent safety tool in criminal law (see the Annexes at the end of this
post).
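To illustrate why the size of the database matters so much, here is a minimal
Python sketch. It is an illustration only, not the simulation or the
calculations described above. It assumes that every pair of unrelated profiles
has the same probability of matching, and the value ILLUSTRATIVE_PROB is a
made-up number chosen for the example, not an estimate from any real data. The
function name expected_matching_pairs is likewise hypothetical.

```python
from math import comb


def expected_matching_pairs(n_profiles: int, pair_match_prob: float) -> float:
    """Expected number of matching pairs among n_profiles profiles.

    Assumes every pair of profiles matches with the same probability.
    By linearity of expectation, the result is (number of pairs) * probability.
    """
    return comb(n_profiles, 2) * pair_match_prob


# Database sizes taken from the text above.
ARIZONA_SIZE = 65_493
NDIS_SIZE = 10_477_600

# Hypothetical per-pair match probability, for illustration only.
ILLUSTRATIVE_PROB = 1e-9

for name, size in [("Arizona", ARIZONA_SIZE), ("NDIS", NDIS_SIZE)]:
    pairs = comb(size, 2)
    expected = expected_matching_pairs(size, ILLUSTRATIVE_PROB)
    print(f"{name}: {pairs:,} pairs, expected matching pairs = {expected:,.2f}")
```

The number of pairs grows roughly with the square of the database size. The
NDIS-sized database has about 160 times as many profiles as the Arizona
database, so it contains on the order of 25,000 times as many pairs that could
match.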
The notion of using simulation as a safety instrument in criminal law, not only
with respect to DNA evidence, is one of the most profound, new and original
ideas I contributed to the draft of the safety book with Sangero. It is entirely
my idea, and Sangero has no part in it. It is based on my knowledge of the
widespread use of simulations, on my extensive experience in developing them,
and on my familiarity with the legal issues. I mentioned simulations as an
important safety tool in criminal law in email correspondence and in drafts of
previous chapters exchanged with Sangero. Clearly, he should not have suggested
using simulations as a safety tool in criminal law in publications bearing his
name only. After all, this was not his idea.
Indeed, Sangero did not include my simulations and calculations in the book.
But I was amazed to read, on pages 114-115, four sentences in which he suggests
(“I contend”) performing certain simulations as an important safety tool in
criminal law. The original four sentences appear in Annex A at the end of the
post:
“A few researchers, including Bruce Weir and Laurence Mueller,
have used simulations with databases in their research. But the databases
available to these researchers are relatively small. I contend that
conducting expanded simulations on the broadest national database (NDIS) would
be an important safety tool for the criminal justice system. Indeed, people should not be judged and
sentenced to jail on the basis of theory and calculations (of the RMP) alone,
when we can verify (using strong computers) the exact RMP for each number of
loci in a profile”.
However, we can easily see that this passage includes very embarrassing
mistakes:
A. The sentence “...when we can verify
(using strong computers) the exact RMP for each number of loci in a profile”
is clearly wrong and self-contradictory. Simulations like the one performed by
Mueller do not concern real genetic profiles, but synthetic ones, produced
from existing theoretical models, allele frequencies, and a random number
generator. Therefore, “the exact RMP for each number of loci in a profile” (of
true genetic profiles) cannot be verified based on simulated synthetic
profiles created from random numbers. This is a figment of Sangero’s
imagination, as he probably does not really understand what a simulation of a
genetic profile database is.
B. For computing “the exact RMP” (to the extent
such a concept may be defined) from labeled profiles in large databases, there
is no need for a computer as strong as the one required for simulations. All
you have to do is calculate the allele frequencies
in a relevant population. This is a simple enough calculation that any modest
computer can complete in milliseconds. The claim that exact RMP calculation
requires a strong computer is another figment of Sangero’s imagination.
C. Weir did not perform any simulation, neither with a
small database nor with a large one. He made a theoretical calculation
intended to predict the average number of full and partial matches in a
database. He also compared his theoretical calculation with the number of
partial matches he found in his database. Sangero claimed that Weir conducted a
simulation. This is another figment of his imagination, which indicates that he
did not understand the difference between a simulation and a theoretical
calculation, nor did he understand what Weir did.
D. Laurence Mueller did not conduct a simulation
with a small (or large) database, simply because Mueller’s simulation
itself generated the data: synthetic, randomly generated profiles, not real
ones. Mueller simulated a database the size of the Arizona database based on a
genetic model and tables of allele frequencies
in the population, and counted the partial matches there. He attempted to
compare this with the partial matches found in the true Arizona database.
Therefore, this is another figment of Sangero’s imagination.
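The distinction drawn in the points above can be made concrete with a minimal
Python sketch. It is not Mueller's code, Weir's calculation, or my own
simulation; it is a toy illustration under stated assumptions. The three loci
and their allele frequencies in ALLELE_FREQS are invented, all function names
are hypothetical, and the genotype probabilities use the standard product rule,
which assumes Hardy-Weinberg and linkage equilibrium.

```python
import random
from itertools import combinations

# Hypothetical allele frequency tables for three made-up loci (not real data).
ALLELE_FREQS = {
    "LOCUS_1": {"a": 0.5, "b": 0.3, "c": 0.2},
    "LOCUS_2": {"a": 0.6, "b": 0.4},
    "LOCUS_3": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
}


def random_genotype(freqs, rng):
    """Draw two alleles independently according to the frequency table."""
    pair = rng.choices(list(freqs), weights=list(freqs.values()), k=2)
    return tuple(sorted(pair))


def simulate_database(n_profiles, rng):
    """Generate synthetic profiles; no real person's profile is involved."""
    return [
        {locus: random_genotype(freqs, rng) for locus, freqs in ALLELE_FREQS.items()}
        for _ in range(n_profiles)
    ]


def count_pairs_by_matching_loci(database):
    """counts[k] = number of profile pairs that match at exactly k loci."""
    counts = [0] * (len(ALLELE_FREQS) + 1)
    for p1, p2 in combinations(database, 2):
        k = sum(p1[locus] == p2[locus] for locus in ALLELE_FREQS)
        counts[k] += 1
    return counts


def product_rule_rmp(profile):
    """RMP of one profile computed directly from allele frequencies.

    Uses p*p for a homozygote and 2*p*q for a heterozygote at each locus,
    multiplied across loci. No simulation is needed for this step.
    """
    rmp = 1.0
    for locus, (a, b) in profile.items():
        p, q = ALLELE_FREQS[locus][a], ALLELE_FREQS[locus][b]
        rmp *= p * p if a == b else 2 * p * q
    return rmp


rng = random.Random(0)  # fixed seed so the run is repeatable
database = simulate_database(200, rng)
print("pairs by number of matching loci:", count_pairs_by_matching_loci(database))
print("product-rule RMP of first profile:", product_rule_rmp(database[0]))
```

Every profile in the simulated database is drawn from the assumed frequency
tables, so counting matches there tests the model against itself. The RMP of a
given profile, by contrast, comes straight from the allele frequencies in a few
multiplications, which is why it does not call for a strong computer.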
Needless to say, on page 161 of David H. Kaye’s paper, cited by Sangero (in
footnote 68) to support his arguments, there is no trace of such figments. In
general, Kaye’s entire paper does not mention any simulation conducted by Bruce
Weir. This is, as noted above, simply because Weir did not perform any
simulation. Simulation is mentioned in Kaye’s paper with respect to Mueller’s
simulation, not in the context of “exact” RMP calculation. Of course, the paper
does not mention that Mueller conducted his simulation on a small database.
This leads us to the following serious conclusions:
A. Sangero was exposed to my extensive study of
simulations and matches in databases and failed to mention it anywhere in his
book.
B. He chose to write about a technical matter
which he does not understand, thereby generating embarrassing mistakes. Beyond
the embarrassment, I find that publishing erroneous academic content shows a
lack of academic integrity.
C. Sangero cited a
source that does not support the content he claims to have found there.
D. In the same few sentences, Sangero demonstrated
ignorance and misunderstanding of technical matters related to the papers he
cited.
How, then, did Sangero form the erroneous statement: “I contend that conducting
expanded simulations on the broadest national database (NDIS) would be an
important safety tool for the criminal justice system”? Obviously, the
content of this statement is not the conclusion of any of the papers on which
he supposedly relies. We can assume that Sangero was highly influenced by the
study I gave him, specifically by the fact that I performed a simulation on a
database as big as the NDIS. Apparently, he did everything he could to be the
one suggesting (alone) the use of simulations as a safety tool in criminal law,
although he does not understand the nature of simulations.
Annex A: Pages 114-115 in Sangero’s book, the subject of the above.
Annex B: General information about simulations that I provided to Sangero.
Annex C: Distribution of genetic profiles in my simulation.
Annex D: Results of a simulation of a police investigation in an NDIS-sized
database, based on one profile of six loci (the number of loci in John
Puckett’s case).
Annex E: Zoom-in on the Annex D chart.
Annex F: Results of theoretical calculations and
reference to John Puckett’s conviction.