Introduction

Deep searches with far look-ahead continue to fascinate the whole computer-chess community because they promise to achieve ever stronger play for all decent programs. Thompson pioneered the scientific investigation of the general relation between search depth and playing strength in chess programs by his famous self-play experiments with the then reigning World Computer Chess Champion BELLE in the early 1980s [57,58,206]. His experiments led to the surprising result that the playing strength of BELLE increased almost linearly with search depth by roughly 200 rating points per ply for full-width searches of fixed depths ranging from 3-9 plies. Several other researchers later confirmed Thompson's findings by self-play experiments with their own chess programs HITECH, PHOENIX, THE TURK, and ZUGZWANG [28,121,158]. In Figure 1 of their article [121] Junghanns et al. showed that in all these cases the winning percentages of the program versions searching one ply deeper than their direct siblings remained range-bound between 70%-80% with no clearly visible, let alone statistically significant, average downward trends at the end of the 9-ply data curves. We thoroughly discuss the above mentioned self-play research and further related works in Section 1.5.3.

In 1985 Newborn [163] introduced a technique different from self-play in order to study the general relation of search depth and playing strength in chess programs. The rationale of Newborn's novel approach sprang from the assumption that new best moves as discovered by chess programs at higher search depths ought to represent better choices than the best moves preferred at shallower depths. To this end, Newborn tracked the according behaviour of BELLE for searches to fixed depths of 11 plies on a set of 447 test positions from real games. The close correlation of his data with Thompson's earlier self-play results made Newborn propose an interesting hypothesis concerning the playing strength of chess programs that we elaborate on in Section 1.5.4.

Unfortunately, Newborn's new experimental methodology of observing the behaviour of deep searches did not gain much popularity. Nobody followed his initial example until Junghanns et al. let PHOENIX and THE TURK search roughly 1,000 positions from self-play games to fixed depths of 9 plies while recording all the new best moves beside other information [121]. In 1997 Hyatt and Newborn himself then conducted another behavioural experiment with Hyatt's chess program CRAFTY searching 347 new test positions to a fixed depth of 14 plies each [114]. This experiment revealed the astonishing fact that the rate of new best moves as chosen by CRAFTY at high search depths of 9-14 plies remained quite steady around 15%-17% on average and hardly decreased anymore. In order to check whether this exceptional behaviour may actually hold for modern chess programs in general or if it was only an artifact of the specific CRAFTY implementation as used by Hyatt and Newborn in 1997, we repeated their experiment with our own chess program DARKTHOUGHT [96].

DARKTHOUGHT is a fast yet sophisticated alpha-beta searcher using PVS/NEGASCOUT [51,174] with state-of-the-art enhancements like extended and normal futility pruning [95,182,216], internal iterative deepening [7,184], dynamic move ordering (history+killer heuristic) [3,76,180,183,191], recursive null-move pruning [20,62,77], selective extensions [7,17], interior-node recognizers [94], and an extended transposition table [161,191]. DARKTHOUGHT routinely reaches 200K nodes per second (nps) and search depths of 11-13 plies in normal middlegame positions at tournament time-controls on a 500MHz DEC Alpha-21164a PC164 workstation. The search speed often doubles in endgame positions and peaks at 750K nps on the aforementioned hardware. Although CRAFTY and DARKTHOUGHT share many basic design principles, they also feature substantial differences regarding such fundamental issues as node expansion, position evaluation, and search strategy. In particular, DARKTHOUGHT seems to be much more selective than CRAFTY in the full-width part of its search (e.g. extended futility pruning). Besides, it relies on lazy alpha-bounding for the top-level search at the root node [96] and thus delays the full resolution of new best moves as long as possible (preferrably to the next iteration). Last but not least, DARKTHOUGHT does not employ fractional extensions which are among the specialities of CRAFTY. Overall we deem it fair to say that both programs sufficiently resemble each other and at the same time differ enough from one another in order to make comparative evaluations of their search depths meaningful.

Hyatt and Newborn relied on more than 20 volunteers to execute the 14-ply searches for the 347 test positions on heterogeneous PC-class machines at speeds of 25K-100K nps. Despite this highly distributed setting, their whole experiment still took about three weeks to complete. We succeeded in repeating the whole experiment in only four and a half days with DARKTHOUGHT running at an average speed of 250K nps each on 8x 500MHz DEC Alpha-21164a PC164 workstations. During the searches we let DARKTHOUGHT record more information about the new best moves at every iteration than what Hyatt and Newborn originally studied for their experiment in 1997. With the help of an automated Perl script we also derived the equivalent additional data for CRAFTY from Hyatt and Newborn's publicly available result file. We present a detailed discussion of our new experimental results for DARKTHOUGHT and CRAFTY in Section 1.5.6.

Created by Ernst A. Heinz, Thu Dec 16 23:28:11 EST 1999