Dear Ken,
I’ve long been an admirer of your work (and of you personally), and still am. Your confident declarations about the lack of evidence for Hans Niemann’s having cheated against Magnus Carlsen in the Sinquefield Cup or even, as far as you can tell, in any games since 2020, have played an important role in my assuming Niemann’s innocence not just as a legal and moral matter, but as a matter of fact.
But I do have some questions, not all of which are original to me. It’s quite possible that you’ve taken all of these concerns into account already; if so, fantastic. If not, maybe (where relevant) they can help your model improve in the future. At any rate, here goes:
Have you had people try to cheat, to see if they could elude detection? In light of how much information you’ve shared over the years about how your model works, a clever cheater could generate a lot of ideas about how to slip through the cracks. It would make sense for you to play a match with someone, or find volunteers to do so, where the players agree to use engines while trying to avoid detection from your model.
In your evaluation that Niemann played a good but not particularly special game at the Sinquefield Cup against Carlsen, did that include the opening through …Be6 or not? He claimed to have looked at the variation that very morning, and that’s certainly possible. But assuming for the sake of argument that he didn’t, how many standard deviations from the norm would his play have been in that case?
How many engines do you check? It’s obvious that constant cheating from any engine would be caught, but would it be appreciably harder for someone to be caught if they used one of the relatively weaker engines periodically, as opposed to the latest and greatest versions of Stockfish, Leela, or Komodo?
Carlsen has lost many games in his career, sometimes, though rarely, to players who aren’t in his peer group. (Or what passes for his peer group - most of the time, he’s in a class by himself.) But he has never reacted in this way before. Do you have any thoughts about what might be different in this case? Is it just a reaction, or possible overreaction, to Niemann’s comments after the Sinquefield Cup win and his earlier win in the first rapid game from their Miami match? Was there something that Niemann played that had a computer “style” or “feel”, even if the move or moves weren’t the top choice of the engines?
If you are privy to the goings-on in Chess.com’s recent ban of Niemann, what was your model’s verdict on the more recent games (if there are any) where they claim he cheated?
Did your model come up with the correct verdict in the events/games where Niemann was caught when he was 16 and when he was 12?
You’re a very smart person trying to do something good. Unfortunately, we all know that there are other very smart people who are on the other side of the ethical fence. While I assume your model does an excellent job of catching dumb-to-moderately smart cheaters, I wonder if your openness about its workings has given the smarter and more dedicated cheaters enough info to find the loopholes. (I also recognize the self-defeating nature of this line of reasoning combined with my questions above. But I assume you can answer them and then fix the whole collection of vulnerabilities.)
I hope you’re well, and congratulations to your Buffalo Bills, who are the early favorites to win the Super Bowl this year.
Greetings, Dennis! Here are brief answers:
1. Yes, a large tournament staff once tried this as a trial, but the results were pell-mell and gave too little data. A few games were lost before turn 9, forgetting I only start at turn 9 anyway.
2. Although the novelty per ChessBase Cloud + Mega was 10.Qxd4, I dropped turns 1-14 until MC's long think at turn 15. Redoing from turn 10 made little difference.
3. No one has been able to make me a version of Leela that runs *in batch mode* on Linux/UNIX like Stockfish and Komodo/Dragon can. My requests for such compiles of Fritz and Houdini have been turned down. My model is not predicated on the "which engine?" question anyway.
4. The one speculation I have ventured in public (in Chris Chabris's item on Facebook and some interviews) is that insofar as the one unusual element was what Niemann called the "ridiculous miracle" of his having previewed the variation with 13...Be6! shortly before the game (the cheating past having been well supposed), more causative weight should be given to it. Apart from my speculation, I think Tyler Cowen has the right take at https://marginalrevolution.com/marginalrevolution/2022/09/chessdrama-splat.html . As for "feel", I cannot judge---except to say that my model finds the game relatively clear-cut, in accord with what several GMs have opined.
5 and 6 are private.
On your last paragraph, part of the reason it applies is that I have deliberately kept things simple. The original name "Fidelity" for my site https://cse.buffalo.edu/~regan/chess/fidelity/, besides being a pun on FIDE and (playing in good) faith and being a synonym for concordance, referenced my original intent to employ distributional distance measures, of which /fidelity/ is one. Indeed, quantum fidelity was the basis of the statistical test by which Google claimed to have achieved "quantum supremacy" in October 2019, as I covered in the article https://rjlipton.wpcomstaging.com/2019/10/27/quantum-supremacy-at-last/ . But I quickly switched to a simple "Bernoulli trials" model, abiding a necessary post-hoc adjustment for chess moves not being fully independent, because although maybe not as sharp it is robust and not cranky. I also purposely avoid any use of minimization over the small data of a player's own moves, again trading sharpness for avoiding greater issues of "what are the error bars on your error bars?" Nor do I use sequential information or analyze differences between engines at a level that would fuel a dedicated test---two things I have encouraged online platforms with their greater human and computer resources to do. The point is, these ideas are out there for someone able to bring them into play (plus something I mentioned on the podcast with Altucher is my own venture into sequencing and a form of minimization-on-small-data I have less objection to), so your putative smart cheater would have more to consider.
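For readers who want a concrete picture of what a "Bernoulli trials" test with a post-hoc adjustment could look like, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual model. The function name `match_z_score`, the per-move match probabilities, and the `variance_inflation` factor are all hypothetical. The sketch treats each move as a trial that either matches the engine's choice or not, compares the observed number of matches with the expected number, and widens the variance by a chosen factor to stand in for moves not being fully independent.

```python
import math

def match_z_score(match_probs, matched, variance_inflation=1.0):
    """Z-score of observed engine matches under a Bernoulli-trials model.

    match_probs: assumed probability, per move, that a player of the given
        strength would choose the engine's move (hypothetical inputs).
    matched: booleans, True where the played move matched the engine.
    variance_inflation: hypothetical factor (>= 1) that widens the variance
        to allow for dependence between moves.
    """
    if len(match_probs) != len(matched):
        raise ValueError("match_probs and matched must have the same length")
    expected = sum(match_probs)
    # Sum of independent Bernoulli variances, then inflated for dependence.
    variance = sum(p * (1.0 - p) for p in match_probs) * variance_inflation
    if variance <= 0:
        raise ValueError("variance must be positive")
    observed = sum(1 for m in matched if m)
    return (observed - expected) / math.sqrt(variance)

# Made-up example: 40 analyzed moves, each with a 50% assumed match
# probability, of which 25 matched the engine.
probs = [0.5] * 40
matched = [True] * 25 + [False] * 15

z_raw = match_z_score(probs, matched)
z_adj = match_z_score(probs, matched, variance_inflation=1.15)
print(f"unadjusted z = {z_raw:.2f}, adjusted z = {z_adj:.2f}")
```

The adjusted score is always closer to zero than the unadjusted one, which is the point of the adjustment: correlated moves carry less independent evidence than their count suggests. The numbers here are invented for illustration and say nothing about any real game.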