Thursday, May 24, 2018

How Many Embarrassing Mistakes Can Sangero Make in Four Simple Sentences?



Apparently, after Sangero received from me original, high-quality research demonstrating the use of simulations as a safety tool in criminal law, he chose to suggest, in a book crediting him alone, the use of a certain simulation (“using strong computers”) as such a safety tool. But in the four sentences he used to make this suggestion, he made highly embarrassing mistakes which indicate that he did not really understand the nature of simulations or the studies of other researchers in this field.

For the past 25 years, I have worked with computer simulations in various fields almost every day. Among other things, I published two papers containing theoretical calculations and original software simulations predicting the number of full matches and the number of partial matches between genetic profiles in databases and in Israel's entire population.

While working on the book about safety in criminal law, which was intended to be published by Oxford University Press, I provided Sangero with many pages of new original research, which included, among other things: the observation that simulations are common in various fields of study; references to a source demonstrating that the FDA considers simulations an important safety instrument; a description of the partial matches at many loci between genetic profiles found in the Arizona DNA database, which includes only 65,493 genetic profiles; and citations to the original papers of Mueller and Weir, which I knew in depth, with a description of each of their analyses. In addition, in order to estimate the number of matches expected in the US national database, I conducted, specifically for the book, a simulation of a DNA database with 10,477,600 profiles, the size of the NDIS (the US national database) at the time. I also performed theoretical calculations of the number of matches in the entire US population. I linked this study to the disputed conviction of John Puckett. In this manner, I demonstrated how simulation can be a prominent safety tool in criminal law (see the Annexes at the end of this post).

The notion of using simulation as a safety instrument in criminal law, not only with respect to DNA evidence, is one of the most profound, new and original ideas I contributed to the draft of the safety book with Sangero. It is entirely my idea, and Sangero had no part in it. It is based on my extensive knowledge of simulations, which are used very widely, and my experience in developing them, as well as on my familiarity with the legal issues. I mentioned simulations as an important safety tool in criminal law in email correspondence with Sangero and in drafts of earlier chapters we exchanged. Clearly, he should not have suggested using simulations as a safety tool in criminal law in publications bearing his name only. After all, this was not his idea.

Indeed, Sangero did not include my simulations and calculations in the book. But I was amazed to read, on pages 114-115, four sentences in which he suggests (“I contend”) performing certain simulations as an important safety tool in criminal law. The original four sentences appear in Annex A at the end of the post:

“A few researchers, including Bruce Weir and Laurence Mueller, have used simulations with databases in their research. But the databases available to these researchers are relatively small. I contend that conducting expanded simulations on the broadest national database (NDIS) would be an important safety tool for the criminal justice system.  Indeed, people should not be judged and sentenced to jail on the basis of theory and calculations (of the RMP) alone, when we can verify (using strong computers) the exact RMP for each number of loci in a profile”.
However, it is easy to see that this passage contains very embarrassing mistakes:

A. The sentence “...when we can verify (using strong computers) the exact RMP for each number of loci in a profile” is clearly wrong and self-contradictory. Simulations like the one performed by Mueller do not involve real genetic profiles but synthetic ones, produced from existing theoretical models, allele frequencies, and a random number generator. Therefore, “the exact RMP for each number of loci in a profile” (of true genetic profiles) cannot be verified from simulated synthetic profiles created from random numbers. This is a figment of Sangero’s imagination, as he probably does not really understand what a simulation of a genetic profile database is.

B. Computing “the exact RMP” (to the extent such a concept can be defined) from labeled profiles in large databases does not require a computer as strong as the one required for simulations. All you have to do is calculate the allele frequencies in the relevant population. This is a simple enough calculation that any modest computer can complete it in milliseconds. The claim that an exact RMP calculation requires a strong computer is another figment of Sangero’s imagination.

C. Weir did not perform a simulation, with either a small or a large database; he did not perform any simulation at all. He made a theoretical calculation intended to predict the average number of full and partial matches in a database. He also compared his theoretical calculation with the number of partial matches he found in his database. Sangero claimed that Weir conducted a simulation. This is another figment of his imagination, which indicates that he did not understand the difference between a simulation and a theoretical calculation, nor did he understand what Weir did.

D. Laurence Mueller did not conduct a simulation on a small (or large) database, simply because Mueller’s simulation itself generated the synthetic data (randomly generated, not real, profiles). Mueller simulated a database the size of the Arizona database based on a genetic model and tables of allele frequencies in the population, and counted the partial matches in it. He then attempted to compare this with the partial matches found in the real Arizona database. Therefore, this is another figment of Sangero’s imagination.
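To make the first point (synthetic versus real profiles) concrete, here is a minimal Python sketch of how a synthetic profile database is typically generated. The allele frequency table, the locus names and the function names are hypothetical, and the model assumes Hardy-Weinberg and linkage equilibrium; it is an illustration, not Mueller's actual code. Every profile it produces is a sequence of random draws, so nothing computed from it can be “the exact RMP” of any real person's profile.

```python
import random

# Hypothetical allele frequencies for three loci (illustrative numbers only,
# not taken from any real population database).
ALLELE_FREQS = {
    "L1": {"a": 0.5, "b": 0.3, "c": 0.2},
    "L2": {"a": 0.6, "b": 0.4},
    "L3": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
}

def random_genotype(freqs, rng):
    """Draw two alleles independently (Hardy-Weinberg assumption)."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    pair = rng.choices(alleles, weights=weights, k=2)
    return tuple(sorted(pair))  # unordered genotype, e.g. ("a", "b")

def simulate_profiles(n, allele_freqs, seed=0):
    """Generate n synthetic profiles; each locus is drawn independently
    of the others (linkage equilibrium assumption)."""
    rng = random.Random(seed)
    return [
        {locus: random_genotype(freqs, rng) for locus, freqs in allele_freqs.items()}
        for _ in range(n)
    ]

if __name__ == "__main__":
    for profile in simulate_profiles(3, ALLELE_FREQS):
        print(profile)
```

Changing the seed changes every generated profile, which is exactly why the output of such a simulation describes a model, not real people.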
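Regarding the second point, a rough sketch shows how light an RMP computation is: allele frequencies are estimated by counting, and the basic product rule (p² for a homozygote, 2pq for a heterozygote) is multiplied across loci. The function names and the toy database are hypothetical, and the sketch deliberately omits refinements such as the population-substructure (θ) correction.

```python
from collections import Counter
from math import prod

def allele_frequencies(profiles, locus):
    """Estimate allele frequencies at one locus by simple counting."""
    counts = Counter(allele for p in profiles for allele in p[locus])
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}

def genotype_frequency(genotype, freqs):
    """Product rule under Hardy-Weinberg: p^2 for a homozygote, 2pq otherwise."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2 * freqs[a] * freqs[b]

def random_match_probability(profile, freq_tables):
    """Multiply genotype frequencies across loci (linkage equilibrium assumed)."""
    return prod(genotype_frequency(g, freq_tables[locus]) for locus, g in profile.items())

if __name__ == "__main__":
    # Toy database of two hypothetical profiles (illustrative only).
    db = [
        {"L1": ("a", "b"), "L2": ("a", "a")},
        {"L1": ("a", "a"), "L2": ("a", "b")},
    ]
    tables = {locus: allele_frequencies(db, locus) for locus in ("L1", "L2")}
    print(random_match_probability({"L1": ("a", "b"), "L2": ("b", "b")}, tables))  # 0.0234375
```

Even on millions of profiles, this is a single counting pass plus a handful of multiplications per profile.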
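Regarding the third point, a theoretical calculation of expected matches needs no random numbers at all. The sketch below is a simplified illustration, not Weir's own formulas: assuming Hardy-Weinberg and linkage equilibrium, the probability that two unrelated people share a genotype at a locus is the sum, over all genotypes g at that locus, of f(g) squared. The expected number of pairs matching at exactly k loci is then the number of pairs, N(N-1)/2, times the probability of exactly k matching loci out of independent loci with different match probabilities. The function names and allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement

def genotype_freqs(allele_freqs):
    """All genotype frequencies at one locus under Hardy-Weinberg."""
    out = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        out[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return out

def locus_match_prob(allele_freqs):
    """Probability that two unrelated people share the same genotype at a locus."""
    return sum(f * f for f in genotype_freqs(allele_freqs).values())

def exactly_k_distribution(match_probs):
    """P(a random pair matches at exactly k loci), k = 0..L, computed by
    dynamic programming over independent loci (a Poisson-binomial distribution)."""
    dist = [1.0]
    for p in match_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # this locus does not match
            new[k + 1] += q * p     # this locus matches
        dist = new
    return dist

def expected_pairs_by_k(n_profiles, locus_allele_freqs):
    """Number of pairs, and expected pairs matching at exactly k loci."""
    n_pairs = n_profiles * (n_profiles - 1) // 2
    probs = [locus_match_prob(f) for f in locus_allele_freqs]
    return n_pairs, [n_pairs * q for q in exactly_k_distribution(probs)]

if __name__ == "__main__":
    # Six loci, as in the Puckett case; the frequencies are made up for illustration.
    freqs = [{"a": 0.2, "b": 0.3, "c": 0.5}] * 6
    n_pairs, expected = expected_pairs_by_k(10_477_600, freqs)
    print(n_pairs)       # 54890045641200 pairs in an NDIS-sized database
    print(expected[6])   # expected pairs matching at all six loci
```

The result is an average predicted from a model, which can then be compared with the matches actually observed in a real database, as the post describes Weir doing.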
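Regarding the fourth point, the counting step of such a simulation can be sketched as follows: compare every pair of profiles and tally how many loci they share. This is a hypothetical brute-force illustration, not Mueller's code. With N profiles it performs N(N-1)/2 comparisons, about 2.1 billion for an Arizona-sized set of 65,493 profiles, so a plain double loop like this is only practical for small examples. In a simulation study the input would be the synthetic output of a random generator; here three hand-written toy profiles stand in for it.

```python
from collections import Counter
from itertools import combinations

def shared_loci(p, q):
    """Number of loci at which two profiles carry the same (unordered) genotype."""
    return sum(1 for locus in p if locus in q and sorted(p[locus]) == sorted(q[locus]))

def count_pairs_by_shared_loci(profiles):
    """Tally every pair of profiles by the number of loci they share (brute force)."""
    return Counter(shared_loci(p, q) for p, q in combinations(profiles, 2))

if __name__ == "__main__":
    toy = [
        {"L1": ("a", "b"), "L2": ("c", "c"), "L3": ("a", "d")},
        {"L1": ("b", "a"), "L2": ("c", "c"), "L3": ("b", "d")},
        {"L1": ("a", "a"), "L2": ("c", "d"), "L3": ("a", "d")},
    ]
    print(count_pairs_by_shared_loci(toy))  # one pair each sharing 2, 1 and 0 loci
```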

Needless to say, on page 161 of David H. Kaye’s paper, which Sangero cites (in footnote 68) to support his arguments, there is no trace of such figments. In fact, Kaye’s entire paper does not mention any simulation conducted by Bruce Weir, simply because, as noted above, Weir did not perform any simulation. Kaye’s paper mentions simulation in connection with Mueller’s simulation, not in the context of an “exact” RMP calculation. Of course, the paper does not say that Mueller conducted his simulation on a small database.

This leads us to the following serious conclusions:
A. Sangero was exposed to my extensive study of simulations and matches in databases, yet does not mention it anywhere in his book.
B. He chose to write about a technical matter he does not understand, and thereby made embarrassing mistakes. Beyond the embarrassment, I believe that publishing erroneous academic content reflects a lack of academic integrity.
C. Sangero cited a source that does not support the content he claims to have found there.
D. In the same few sentences, Sangero demonstrated ignorance and misunderstanding of technical matters related to the papers he cited.

How, then, did Sangero arrive at the erroneous statement “I contend that conducting expanded simulations on the broadest national database (NDIS) would be an important safety tool for the criminal justice system”? Obviously, this statement is not the conclusion of any of the papers on which he supposedly relies. We can assume that Sangero was strongly influenced by the study I gave him, and in particular by the fact that I performed a simulation on a database as large as the NDIS. Apparently, he did everything he could to be the one suggesting (alone) the use of simulations as a safety tool in criminal law, although he does not understand the nature of simulations.

Annex A: Pages 114-115 of Sangero’s book, discussed above.


Annex B: General information about simulations that I provided to Sangero.



Annex C: Distribution of genetic profiles in my simulation.




Annex D: Results of a simulation of a police investigation in an NDIS-sized database, based on one profile with six loci (the number of loci in John Puckett’s case).



Annex E: Zoom-in on the Annex D chart. 



Annex F: Results of theoretical calculations and reference to John Puckett’s conviction. 




