``Best Change'' Rates for All Test Positions

Table 1.4: ``Best Change'' Rates of Belle, Crafty, and DarkThought.

Search Depth	Belle 1985	(Standard Error)	Crafty 1997	(Standard Error)	DarkThought 1998	(Standard Error)
2	- -	- -	38.78%	(2.63%)	35.28%	(2.58%)
3	- -	- -	36.73%	(2.60%)	39.65%	(2.64%)
4	33.10%	(2.23%)	30.61%	(2.49%)	31.78%	(2.51%)
5	33.10%	(2.23%)	30.32%	(2.48%)	29.45%	(2.46%)
6	27.70%	(2.12%)	27.41%	(2.41%)	24.49%	(2.32%)
7	29.50%	(2.16%)	24.49%	(2.32%)	21.28%	(2.21%)
8	26.00%	(2.07%)	22.45%	(2.25%)	25.07%	(2.34%)
9	22.60%	(1.98%)	^18.37%^	(2.09%)	21.57%	(2.22%)
10	^17.70%^	(1.81%)	17.20%	(2.04%)	24.20%	(2.31%)
11	18.10%	(1.82%)	16.62%	(2.01%)	^17.49%^	(2.05%)
12	- -	- -	16.91%	(2.02%)	15.45%	(1.95%)
13	- -	- -	14.58%	(1.91%)	16.62%	(2.01%)
14	- -	- -	15.45%	(1.95%)	_13.70%_	(1.86%)

Table 1.4 summarizes the ``Best Change'' rates BC(i) and their estimated standard deviations (standard errors) $s(i) = \sqrt{BC(i) * (1 - BC(i)) / 343}$ as observed in our experiment for all 343 corrected test positions at search depths of 2-14 plies. The percentages of DARKTHOUGHT closely resemble the corresponding numbers of CRAFTY from 1997 for the same set of positions and search depths, as well as the numbers of BELLE from 1985 for a different set of 447 test positions and search depths of 4-11 plies [163]. For the convenience of the reader and in order to make our subsequent discussions more transparent, we also include the numbers of BELLE and CRAFTY in Table 1.4, showing them side by side with our own new data of DARKTHOUGHT.
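As a quick check, the standard-error formula can be evaluated directly. The following is a minimal sketch, not code from the experiment; the function name `standard_error` is hypothetical. The formula in the text is written for the 343 positions of CRAFTY and DARKTHOUGHT. Using n = 447 for BELLE is an assumption, although it does reproduce the tabulated BELLE errors.

```python
import math


def standard_error(rate: float, n: int) -> float:
    """Standard error of an observed proportion `rate` over `n` test positions."""
    return math.sqrt(rate * (1.0 - rate) / n)


# DARKTHOUGHT, iteration #14: BC = 13.70% over 343 positions.
se_darkthought_14 = standard_error(0.1370, 343)

# BELLE, iteration #4: BC = 33.10%. The sample size n = 447 is an assumption
# taken from the size of BELLE's test set; it is not stated with the formula.
se_belle_4 = standard_error(0.3310, 447)

print(f"{se_darkthought_14:.2%}  {se_belle_4:.2%}")
```

Under these assumptions the two values round to 1.86% and 2.23%, matching the corresponding entries of Table 1.4.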

The table illustrates that BELLE, CRAFTY, and DARKTHOUGHT feature very similar ``Best Change'' behaviours on average. This is quite surprising considering the substantial differences between the three programs regarding such fundamental issues as node expansion, position evaluation, and search strategy. The experimental results of DARKTHOUGHT support the pioneering findings of Hyatt and Newborn, in particular at high search depths of 12-14 plies. For these search depths the ``Best Change'' rates of both CRAFTY and DARKTHOUGHT stayed range-bound around 16%. As a tentative conclusion we conjecture that the three columns of Table 1.4 taken together provide convincing empirical evidence that the very gradual decreases of the ``Best Change'' rates at high search depths are not merely artifacts of specific implementations but rather represent a general phenomenon of chess programs which rely on depth-first alpha-beta search with iterative deepening. Despite the overall similarities, however, two numbers of DARKTHOUGHT caught our attention because they differ notably from those of BELLE and CRAFTY.

Drop below 20%.

The ``Best Change'' rates of both BELLE and CRAFTY dropped to 17%-18% at least one iteration earlier than that of DARKTHOUGHT (see the numbers enclosed in ^...^ in Table 1.4), which stayed well above 20% up to and including iteration #10. We attribute the less stable behaviour of DARKTHOUGHT to the increased selectivity of its search as compared with the two other programs. While the standard errors still leave some room for doubting the statistical significance of the drops below 20%, Appendix 1.5.10 dispels the corresponding concerns by deriving 80%-confident and 90%-confident bounds on the ``Best Change'' probabilities of BELLE, CRAFTY, and DARKTHOUGHT.
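The iterations at which each program first falls below 20% can be read off Table 1.4 mechanically. The following sketch copies the rates from the table; the dictionary names and the helper `first_depth_below` are hypothetical and serve only as illustration.

```python
# "Best Change" rates in percent, copied from Table 1.4 (search depth -> rate).
BELLE = {4: 33.10, 5: 33.10, 6: 27.70, 7: 29.50,
         8: 26.00, 9: 22.60, 10: 17.70, 11: 18.10}
CRAFTY = {2: 38.78, 3: 36.73, 4: 30.61, 5: 30.32, 6: 27.41, 7: 24.49,
          8: 22.45, 9: 18.37, 10: 17.20, 11: 16.62, 12: 16.91,
          13: 14.58, 14: 15.45}
DARKTHOUGHT = {2: 35.28, 3: 39.65, 4: 31.78, 5: 29.45, 6: 24.49, 7: 21.28,
               8: 25.07, 9: 21.57, 10: 24.20, 11: 17.49, 12: 15.45,
               13: 16.62, 14: 13.70}


def first_depth_below(rates, threshold=20.0):
    """Return the smallest search depth whose rate is below `threshold`, or None."""
    for depth in sorted(rates):
        if rates[depth] < threshold:
            return depth
    return None


for name, rates in (("BELLE", BELLE), ("CRAFTY", CRAFTY),
                    ("DARKTHOUGHT", DARKTHOUGHT)):
    print(name, first_depth_below(rates))
```

For the tabulated data this yields iteration #10 for BELLE, #9 for CRAFTY, and #11 for DARKTHOUGHT, in line with the marked entries.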

Iteration #14.

The ``Best Change'' rates of CRAFTY remained surprisingly constant at roughly 15%-17% from iteration #9 onwards. DARKTHOUGHT only behaved like this from iteration #11 to iteration #13 and then recorded another drop of its ``Best Change'' rate to 13.7% for the final iteration #14 (see the number enclosed in _..._ in Table 1.4). This constitutes the first experimental result reported so far which hints at the validity of the intuitive notion that the average ``Best Change'' rates should taper off even further at search depths beyond 14 plies. The experimental results of CRAFTY do not really support this notion because the ``Best Change'' rate of CRAFTY does not decrease but rather increases again for iteration #14. Unfortunately, it remains unclear whether the special behaviour of DARKTHOUGHT signals a consistent trend towards lower ``Best Change'' rates at search depths beyond 14 plies or whether it is just a fluctuation at the end of our data curve. The statistical calculations of Appendix 1.5.10 do not suffice to single out the outlying data point because the 80%-confident and the 90%-confident upper bounds on the ``Best Change'' probability of DARKTHOUGHT in iteration #14 equal 15.34% and 16.26% respectively. Hence, new experiments with search depths of at least 16 plies are needed to resolve this interesting question.
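To see why the iteration #14 value cannot be separated from its predecessors, one can approximate a one-sided upper bound with a plain normal approximation. This is only an illustrative sketch and not the derivation of Appendix 1.5.10, whose method is not reproduced here. The function name `upper_bound` is hypothetical. The simple approximation gives bounds close to, but slightly below, the 15.34% and 16.26% quoted above.

```python
import math
from statistics import NormalDist


def upper_bound(rate: float, n: int, confidence: float) -> float:
    """One-sided upper confidence bound on a proportion (normal approximation).

    Assumption: this plain normal approximation stands in for the appendix's
    own derivation, which may differ.
    """
    se = math.sqrt(rate * (1.0 - rate) / n)   # standard error of the rate
    z = NormalDist().inv_cdf(confidence)      # one-sided normal quantile
    return rate + z * se


# DARKTHOUGHT, iteration #14: BC = 13.70% over 343 positions.
for confidence in (0.80, 0.90):
    print(f"{confidence:.0%}: {upper_bound(0.1370, 343, confidence):.2%}")
```

Under this approximation the bounds come out at roughly 15.3% and 16.1%. Both reach into the 15.45%-17.49% range that DARKTHOUGHT recorded for iterations #11 to #13, which illustrates why the data point alone does not establish a trend.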

Created by Ernst A. Heinz, Thu Dec 16 23:28:11 EST 1999