Wednesday, January 19, 2011

Features of the UCI chess data sets

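These feature descriptions are only to be found in Figure 2 of CONSCIOUSNESS AS AN ENGINEERING ISSUE, PART 2 by Donald Michie (University of Edinburgh). In the descriptions, BK and BR are the Black king and rook, WK and WP are the White king and pawn, and B is Black.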
  1. bkblk: the BK is not in the way
  2. bknwy: the BK is not in the BR's way
  3. bkon8: the BK is on rank 8 in a position to aid the BR
  4. bkona: the BK is on file A in a position to aid the BR
  5. bkspr: the BK can support the BR
  6. bkxbq: the BK is not attacked in some way by the promoted WP
  7. bkxcr: the BK can attack the critical square (b7)
  8. bkxwp: the BK can attack the WP
  9. blxwp: B attacks the WP (BR in direction x = -1 only)
  10. bxqsq: one or more Black pieces control the queening square
  11. cntxt: the WK is on an edge and not on a8
  12. dsopp: the kings are in normal opposition
  13. dwipd: the WK distance to intersect point is too great
  14. hdchk: there is a good delay because there is a hidden check
  15. katri: the BK controls the intersect point
  16. mulch: B can renew the check to good advantage
  17. qxmsq: the mating square is attacked in some way by the promoted WP
  18. r2ar8: the BR does not have safe access to file A or rank 8
  19. reskd: the WK can be reskewered via a delayed skewer
  20. reskr: the BR alone can renew the skewer threat
  21. rimmx: the BR can be captured safely
  22. rkxwp: the BR bears on the WP (direction x = -1 only)
  23. rxmsq: the BR attacks a mating square safely
  24. simpl: a very simple pattern applies
  25. skach: the WK can be skewered after one or more checks
  26. skewr: there is a potential skewer as opposed to fork
  27. skrxp: the BR can achieve a skewer or the BK attacks the WP
  28. spcop: there is a special opposition pattern present
  29. stlmt: the WK is in stalemate
  30. thrsk: there is a skewer threat lurking
  31. wkcti: the WK cannot control the intersect point
  32. wkna8: the WK is on square a8
  33. wknck: the WK is in check
  34. wkovl: the WK is overloaded
  35. wkpos: the WK is in a potential skewer position
  36. wtoeg: the WK is one away from the relevant edge

References:

  1. UCI archive

  2. PDF of the Michie article, found on Google

Monday, January 17, 2011

Visualizing clusters of high-dimensional data

I wanted a visualization showing that distance-based methods should have a hard time separating the classes in some data sets (say, the UCI libras data set) but not in others (say, the UCI iris data set). S.R. suggested:

  1. Use multidimensional scaling (MDS) to plot the points in 2 dimensions.
  2. Draw lines between points and their nearest neighbors in the original space.

If the lines are an overlapping mess, the points don't cluster well.


Steps to do this in MATLAB:


  1. M = dlmread('my.data', '\t'); % read tab-delimited data into memory

  2. M = transpose(M); % dist treats each column as one example and each row as one variable (feature), so I have to transpose my data

  3. D = dist(M); % compute all pairwise Euclidean distances into a square N-by-N matrix

  4. G = mdscale(D, 2); % find N points in 2 dimensions whose distances approximate D


G is now an N-by-2 matrix (N = number of samples), one row of 2-D coordinates per sample. The rest of the process involves plotting the lines that connect each point to its nearest neighbor. You can do that in MATLAB, but I don't know a particularly fast way.
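One possible way to draw the neighbor lines, offered as a minimal sketch and not a benchmarked solution: find each point's nearest neighbor from D, then pack all the segments into one NaN-separated polyline so that a single plot call draws them. The function name plot_nn_links is hypothetical (it is not part of MATLAB), and it assumes D and G come from the steps above.

```matlab
function nn = plot_nn_links(D, G)
% PLOT_NN_LINKS  (hypothetical helper) Draw a line from every point to its
% nearest neighbor, where "nearest" is measured in the original space.
%   D  - N-by-N matrix of pairwise distances in the original space
%   G  - N-by-2 matrix of 2-D coordinates (e.g. the output of mdscale)
%   nn - N-by-1 vector; nn(i) is the index of the nearest neighbor of point i

    N = size(D, 1);

    % Put Inf on the diagonal so a point is never its own nearest neighbor.
    Dn = D;
    Dn(1:N+1:end) = Inf;
    [~, nn] = min(Dn, [], 2);

    % Each column is one segment: start point, end point, NaN.
    % plot breaks the line at every NaN, so one call draws all N segments.
    X = [G(:, 1)'; G(nn, 1)'; nan(1, N)];
    Y = [G(:, 2)'; G(nn, 2)'; nan(1, N)];

    plot(X(:), Y(:), '-', 'Color', [0.6 0.6 0.6]);
    hold on;
    plot(G(:, 1), G(:, 2), 'k.');   % the embedded points themselves
    hold off;
    axis equal;
end
```

Saved as plot_nn_links.m, it would be called as nn = plot_nn_links(D, G). Drawing everything with one plot call avoids creating a separate line object per segment, which is usually where a naive loop spends its time.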
