Experimental Results for All Test Positions
Table 1.5: Results of DarkThought for All 343 Corrected Test Positions.

Search Depth | Best Change (#) | Fresh Best (#) | (I - 2) Best (#) | (I - 3) Best (#)
-------------+-----------------+----------------+------------------+-----------------
2            | 35.28% (121)    | 100.00% (121)  | 0.00% (0)        | 0.00% (0)
3            | 39.65% (136)    | 85.29% (116)   | 14.71% (20)      | 0.00% (0)
4            | 31.78% (109)    | 55.05% (60)    | 31.19% (34)      | 13.76% (15)
5            | 29.45% (101)    | 56.44% (57)    | 24.75% (25)      | 10.89% (11)
6            | 24.49% (84)     | 65.48% (55)    | 19.05% (16)      | 5.95% (5)
7            | 21.28% (73)     | 49.32% (36)    | 28.77% (21)      | 10.96% (8)
8            | 25.07% (86)     | 50.00% (43)    | 24.42% (21)      | 4.65% (4)
9            | 21.57% (74)     | 40.54% (30)    | 28.38% (21)      | 13.51% (10)
10           | 24.20% (83)     | 37.35% (31)    | 34.94% (29)      | 8.43% (7)
11           | 17.49% (60)     | 31.67% (19)    | 36.67% (22)      | 10.00% (6)
12           | 15.45% (53)     | 45.28% (24)    | 20.76% (11)      | 9.43% (5)
13           | 16.62% (57)     | 42.11% (24)    | 28.07% (16)      | 10.53% (6)
14           | 13.70% (47)     | 34.04% (16)    | 25.53% (12)      | 12.77% (6)

Table 1.6: Results of Crafty for All 343 Test Positions.

Search Depth | Best Change (#) | Fresh Best (#) | (I - 2) Best (#) | (I - 3) Best (#)
-------------+-----------------+----------------+------------------+-----------------
2            | 38.78% (133)    | 100.00% (133)  | 0.00% (0)        | 0.00% (0)
3            | 36.73% (126)    | 71.43% (90)    | 28.57% (36)      | 0.00% (0)
4            | 30.61% (105)    | 65.71% (69)    | 25.71% (27)      | 8.57% (9)
5            | 30.32% (104)    | 59.62% (62)    | 30.77% (32)      | 6.73% (7)
6            | 27.41% (94)     | 56.38% (53)    | 24.47% (23)      | 8.51% (8)
7            | 24.49% (84)     | 47.62% (40)    | 30.95% (26)      | 7.14% (6)
8            | 22.45% (77)     | 37.66% (29)    | 31.17% (24)      | 11.69% (9)
9            | 18.37% (63)     | 30.16% (19)    | 38.10% (24)      | 4.76% (3)
10           | 17.20% (59)     | 40.68% (24)    | 32.20% (19)      | 5.08% (3)
11           | 16.62% (57)     | 52.63% (30)    | 24.56% (14)      | 5.26% (3)
12           | 16.91% (58)     | 41.38% (24)    | 24.14% (14)      | 10.34% (6)
13           | 14.58% (50)     | 32.00% (16)    | 22.00% (11)      | 14.00% (7)
14           | 15.45% (53)     | 39.62% (21)    | 32.08% (17)      | 5.66% (3)

Table 1.5 summarizes the complete experimental results of
DARKTHOUGHT for all 343 corrected test positions.
Table 1.6 shows the corresponding numbers for CRAFTY,
computed automatically by our Perl script from Hyatt and Newborn's
publicly available result file. The ``Best Change'' percentages are
relative to all 343 test positions. The three rightmost columns of the
tables list the novel statistics from our additionally gathered data.
Their percentages are relative to the absolute numbers in the
``Best Change'' column. As expected, our novel statistics reveal some
very interesting general features of the new best moves at every
iteration.
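
To make the column semantics concrete, here is a minimal Python sketch of how one position's best-move history could be tallied. It is not the Perl script mentioned above. It assumes that a ``Fresh Best'' move is a new best move that was never best at any earlier iteration, and that an ``(I - 2) Best'' or ``(I - 3) Best'' move is a new best move that was already best two or three iterations earlier. These definitions, the function names, and the sample move list are illustrative assumptions. The helper rate() follows the percentage convention stated above.

```python
def classify_best_changes(best_moves):
    """Classify every change of best move in one position's search history.

    best_moves[k] is the best move reported after iteration k + 1.
    Returns a list of (iteration, category) pairs, one per "Best Change".
    The category definitions are assumptions made for illustration.
    """
    changes = []
    for idx in range(1, len(best_moves)):
        move = best_moves[idx]
        if move == best_moves[idx - 1]:
            continue  # best move unchanged: no "Best Change" here
        if move not in best_moves[:idx]:
            category = "fresh"   # never best at any earlier iteration
        elif idx >= 2 and best_moves[idx - 2] == move:
            category = "i-2"     # flips back to the best move of iteration I - 2
        elif idx >= 3 and best_moves[idx - 3] == move:
            category = "i-3"     # returns to the best move of iteration I - 3
        else:
            category = "older"   # was best at iteration I - 4 or earlier
        changes.append((idx + 1, category))
    return changes


def rate(count, total):
    """Percentage of count relative to total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Hypothetical best-move history for iterations 1 to 5.
history = ["e4", "d4", "e4", "Nf3", "d4"]
print(classify_best_changes(history))

# Counts taken from the DARKTHOUGHT table:
print(rate(121, 343))  # "Best Change" at depth 2, relative to 343 positions
print(rate(116, 136))  # "Fresh Best" at depth 3, relative to 136 changes
```

Under these assumptions every change at iteration 2 is necessarily fresh and every change at iteration 3 is either fresh or an (I - 2) move, which matches the 100.00% and 0.00% entries in the first rows of both tables.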
- Fresh Best.
- In contrast to what we and probably many others suspected, the rates of
fresh best moves (relative to all new best moves) of both CRAFTY and
DARKTHOUGHT did not decrease steadily from one iteration to the
next, not even at high search depths of 9-14 plies. Instead,
the ``Fresh Best'' rates fluctuated without a clear trend between
roughly 30% and 50% from iteration #7 onwards. This finding lends
support to the validity of Newborn's hypothesis about the playing
strength of chess programs (see Section 1.5.4) because
The surprising approximation means that the discovery of fresh best
moves remains substantial even at high search depths of up to 14 plies
and decreases as gradually on average as the discovery of any new best
moves (see Section 1.5.6). Given the strong empirical evidence
from the experimental results of both CRAFTY and DARKTHOUGHT, we
expect the approximation to be valid for modern chess programs in
general.
- (I - 2) Best.
- The numbers in this column show that CRAFTY and
DARKTHOUGHT suffered from unstable odd/even behaviour in roughly
25%-35% of all ``Best Change'' searches regardless of their
nominal depth. We deem it quite remarkable that the rates of odd/even
instability (relative to all new best moves) fluctuated within such a
narrow range starting with iteration #3. Overall, the average
probabilities of odd/even instability during any ``Best Change'' search
amounted to 26.5% for CRAFTY and 24.4% for DARKTHOUGHT. The
experimental results of both programs therefore strongly suggest that
modern chess programs in general exhibit odd/even instability in about
25% of all ``Best Change'' searches. Last but not least, we would like
to mention that the sum of ``Fresh Best'' moves and ``(I - 2) Best''
moves equalled about 65%-75% of all new best moves for both CRAFTY and
DARKTHOUGHT from iteration #8 onwards (sole exception: CRAFTY at
iteration #13).
- (I - 3) Best.
- There are hardly any noteworthy facts to report for the last columns of
Table 1.5 and Table 1.6. The rates of
``(I - 3) Best'' moves (relative to all new best moves) fluctuated more
strongly than both the ``Fresh Best'' rates and the ``(I - 2) Best''
rates. We were surprised that the ``(I - 3) Best'' rates averaged about
10% for both CRAFTY and DARKTHOUGHT, which was well above our
expectations.
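
The average odd/even instability figures quoted under ``(I - 2) Best'' can be checked against the tables. The short Python sketch below assumes that these averages are unweighted means of the per-iteration ``(I - 2) Best'' percentages over iterations 2 to 14. The text does not say how they were computed, so this reading is an assumption, but it does reproduce the quoted 24.4% and 26.5%. The names I2_RATES and mean_rate are illustrative.

```python
# "(I - 2) Best" percentages for search depths 2..14, copied from
# Table 1.5 (DARKTHOUGHT) and Table 1.6 (CRAFTY).
I2_RATES = {
    "DarkThought": [0.00, 14.71, 31.19, 24.75, 19.05, 28.77, 24.42,
                    28.38, 34.94, 36.67, 20.76, 28.07, 25.53],
    "Crafty": [0.00, 28.57, 25.71, 30.77, 24.47, 30.95, 31.17,
               38.10, 32.20, 24.56, 24.14, 22.00, 32.08],
}


def mean_rate(rates):
    """Unweighted arithmetic mean of a list of per-iteration percentages."""
    return sum(rates) / len(rates)


for program, rates in I2_RATES.items():
    # Prints 24.4 for DarkThought and 26.5 for Crafty.
    print(program, round(mean_rate(rates), 1))
```

Because the mean is unweighted, each iteration counts equally regardless of how many ``Best Change'' searches it contains. Weighting by the absolute counts would give somewhat different figures.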
Created by Ernst A. Heinz, Thu Dec 16 23:28:11 EST 1999