Experimental Results for All Test Positions


 
Table 1.5: Results of DarkThought for All 343 Corrected Test Positions.
Search Depth   Best Change (#)   Fresh Best (#)   (I - 2) Best (#)   (I - 3) Best (#)
2 35.28% (121) 100.00% (121)    0.00% (0)   0.00% (0)
3 39.65% (136) 85.29% (116) 14.71% (20) 0.00% (0)
4 31.78% (109) 55.05% (60) 31.19% (34) 13.76% (15)
5 29.45% (101) 56.44% (57) 24.75% (25) 10.89% (11)
6 24.49% (84) 65.48% (55) 19.05% (16) 5.95% (5)
7 21.28% (73) 49.32% (36) 28.77% (21) 10.96% (8)
8 25.07% (86) 50.00% (43) 24.42% (21) 4.65% (4)
9 21.57% (74) 40.54% (30) 28.38% (21) 13.51% (10)
10 24.20% (83) 37.35% (31) 34.94% (29) 8.43% (7)
11 17.49% (60) 31.67% (19) 36.67% (22) 10.00% (6)
12 15.45% (53) 45.28% (24) 20.76% (11) 9.43% (5)
13 16.62% (57) 42.11% (24) 28.07% (16) 10.53% (6)
14 13.70% (47) 34.04% (16) 25.53% (12) 12.77% (6)
 


 
Table 1.6: Results of Crafty for All 343 Test Positions.
Search Depth   Best Change (#)   Fresh Best (#)   (I - 2) Best (#)   (I - 3) Best (#)
2 38.78% (133) 100.00% (133)    0.00% (0)   0.00% (0)
3 36.73% (126) 71.43% (90) 28.57% (36) 0.00% (0)
4 30.61% (105) 65.71% (69) 25.71% (27) 8.57% (9)
5 30.32% (104) 59.62% (62) 30.77% (32) 6.73% (7)
6 27.41% (94) 56.38% (53) 24.47% (23) 8.51% (8)
7 24.49% (84) 47.62% (40) 30.95% (26) 7.14% (6)
8 22.45% (77) 37.66% (29) 31.17% (24) 11.69% (9)
9 18.37% (63) 30.16% (19) 38.10% (24) 4.76% (3)
10 17.20% (59) 40.68% (24) 32.20% (19) 5.08% (3)
11 16.62% (57) 52.63% (30) 24.56% (14) 5.26% (3)
12 16.91% (58) 41.38% (24) 24.14% (14) 10.34% (6)
13 14.58% (50) 32.00% (16) 22.00% (11) 14.00% (7)
14 15.45% (53) 39.62% (21) 32.08% (17) 5.66% (3)
 

Table 1.5 summarizes the complete experimental results of DARKTHOUGHT for all 343 corrected test positions. Table 1.6 shows the corresponding numbers for CRAFTY, as automatically computed by our Perl script from Hyatt and Newborn's publicly available result file. The three rightmost columns of the tables list the novel statistics derived from our additionally gathered data. Their percentages are relative to the absolute numbers in the ``Best Change'' column. As expected, our novel statistics reveal some very interesting general features of the new best moves at every iteration.
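As a minimal sketch of how the percentage columns relate to the counts, the following Python fragment recomputes one row of Table 1.5 (DARKTHOUGHT at search depth 3). The helper name `rate` is purely illustrative and is not taken from the Perl script mentioned above.

```python
TOTAL_POSITIONS = 343  # number of test positions in both tables


def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# DarkThought, search depth 3: counts copied from Table 1.5.
best_change, fresh_best, i_minus_2, i_minus_3 = 136, 116, 20, 0

# "Best Change" is relative to all test positions.
print("Best Change :", rate(best_change, TOTAL_POSITIONS))
# The three rightmost columns are relative to the "Best Change" count.
print("Fresh Best  :", rate(fresh_best, best_change))
print("(I - 2) Best:", rate(i_minus_2, best_change))
print("(I - 3) Best:", rate(i_minus_3, best_change))
```

The printed values match the corresponding table row (39.65, 85.29, 14.71, and 0.0).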

Fresh Best.
Contrary to what we and probably many others suspected, the rates of fresh best moves (relative to all new best moves) of both CRAFTY and DARKTHOUGHT did not decrease steadily from one iteration to the next, not even at high search depths of 9-14 plies. Instead, the ``Fresh Best'' rates fluctuated without a clear trend between roughly 30% and 50% from iteration #7 onwards. This finding lends support to the validity of Newborn's hypothesis about the playing strength of chess programs (see Section 1.5.4) because

\begin{displaymath}
\frac{\#\,\mbox{``Best Change''}(i)}{\#\,\mbox{``Best Change''}(i-1)}
~\approx~
\frac{\#\,\mbox{``Fresh Best''}(i)}{\#\,\mbox{``Fresh Best''}(i-1)}
~~\mbox{holds for}~ i \geq 8.
\end{displaymath}

The surprising approximation means that the discovery of fresh best moves remains substantial even at high search depths of up to 14 plies and decreases as gradually on average as the discovery of any new best moves (see Section 1.5.6). Given the strong empirical evidence from the experimental results of both CRAFTY and DARKTHOUGHT, we expect the approximation to be valid for modern chess programs in general.
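The approximation can be inspected directly from the tabulated counts. The Python sketch below prints, for iterations 8-14, the ratio of successive ``Best Change'' counts next to the ratio of successive ``Fresh Best'' counts. The counts are copied from Table 1.5 and Table 1.6; the names `COUNTS` and `successive_ratios` are our own illustrative choices. The individual ratios scatter noticeably, so the sketch only supports reading the approximation as an on-average statement.

```python
# Counts for search depths 7..14, copied from Tables 1.5 and 1.6.
COUNTS = {
    "DarkThought": {
        "best_change": [73, 86, 74, 83, 60, 53, 57, 47],
        "fresh_best":  [36, 43, 30, 31, 19, 24, 24, 16],
    },
    "Crafty": {
        "best_change": [84, 77, 63, 59, 57, 58, 50, 53],
        "fresh_best":  [40, 29, 19, 24, 30, 24, 16, 21],
    },
}


def successive_ratios(counts):
    """Return count(i) / count(i-1) for each pair of consecutive depths."""
    return [cur / prev for prev, cur in zip(counts, counts[1:])]


for program, columns in COUNTS.items():
    bc_ratios = successive_ratios(columns["best_change"])
    fb_ratios = successive_ratios(columns["fresh_best"])
    # The first ratio compares depth 8 with depth 7, and so on up to depth 14.
    for depth, r_bc, r_fb in zip(range(8, 15), bc_ratios, fb_ratios):
        print(f"{program:12s} i={depth:2d}  "
              f"Best Change ratio {r_bc:.2f}  Fresh Best ratio {r_fb:.2f}")
```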

(I - 2) Best.
The numbers in this column show that CRAFTY and DARKTHOUGHT suffered from unstable odd/even behaviour in roughly 25%-35% of all ``Best Change'' searches regardless of their nominal depth. We deem it quite remarkable that the rates of odd/even instability (relative to all new best moves) fluctuated in such a narrow range starting with iteration #3. Overall, the average probabilities of odd/even instability during any ``Best Change'' search amounted to 26.5% for CRAFTY and 24.4% for DARKTHOUGHT. The experimental results of both programs therefore strongly suggest that modern chess programs in general exhibit odd/even instabilities in about 25% of all ``Best Change'' searches. Last but not least, we would like to mention that the sum of ``Fresh Best'' moves and ``(I - 2) Best'' moves amounted to about 65%-75% of all new best moves for both CRAFTY and DARKTHOUGHT from iteration #8 onwards (sole exception: CRAFTY at iteration #13).
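The two averages quoted above can be reproduced from the tables under one assumption that the text does not spell out: that each average is the unweighted mean of the per-iteration ``(I - 2) Best'' rates over search depths 2-14, including the 0% rate at depth 2. The Python sketch below makes that assumption explicit; the names `BEST_CHANGE`, `I_MINUS_2`, and `mean_rate` are illustrative only.

```python
# Counts for search depths 2..14, copied from Tables 1.5 and 1.6.
BEST_CHANGE = {
    "DarkThought": [121, 136, 109, 101, 84, 73, 86, 74, 83, 60, 53, 57, 47],
    "Crafty":      [133, 126, 105, 104, 94, 84, 77, 63, 59, 57, 58, 50, 53],
}
I_MINUS_2 = {
    "DarkThought": [0, 20, 34, 25, 16, 21, 21, 21, 29, 22, 11, 16, 12],
    "Crafty":      [0, 36, 27, 32, 23, 26, 24, 24, 19, 14, 14, 11, 17],
}


def mean_rate(parts, wholes):
    """Unweighted mean of the per-depth percentages part/whole.

    Assumption: every search depth contributes equally, regardless of
    how many "Best Change" searches occurred at that depth.
    """
    rates = [100.0 * p / w for p, w in zip(parts, wholes)]
    return sum(rates) / len(rates)


for program in BEST_CHANGE:
    avg = mean_rate(I_MINUS_2[program], BEST_CHANGE[program])
    print(f"{program}: {avg:.1f}%")
```

Under this assumption the sketch yields 24.4% for DARKTHOUGHT and 26.5% for CRAFTY, in agreement with the figures in the text.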

(I - 3) Best.
There are hardly any noteworthy facts to report for the last columns of Table 1.5 and Table 1.6. The rates of ``(I - 3) Best'' moves (relative to all new best moves) fluctuated more strongly than both the ``Fresh Best'' rates and the ``(I - 2) Best'' rates. We were surprised that the ``(I - 3) Best'' rates averaged about 10% for both CRAFTY and DARKTHOUGHT, which was well above our expectations.



Created by Ernst A. Heinz, Thu Dec 16 23:28:11 EST 1999