Saturday, April 10, 2010


So it turns out that pruning does not produce the U-shaped leave-one-out error curve I expected:
For reference, a 3D plot of the pruned data with k=3:
The densities of computer-generated and human papers do look much more comparable, though. The error appears to be uniformly at least slightly higher than without pruning, which is expected. I pruned by removing only points that were classified correctly (leave-one-out) and whose removal did not cause any previously removed point to be misclassified. There are certainly other (probably better) pruning algorithms, but I would expect at least comparable results from them.
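To make the pruning rule concrete, here is a rough sketch of how it could be written. This is not the code I actually ran: the names knn_predict and prune, the squared Euclidean distance, the (features, label) tuple layout, and the single greedy pass in input order are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (features, label) pairs. Squared Euclidean
    distance is an assumption; any metric could be substituted.
    """
    nearest = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def prune(points, k=3):
    """Greedy pruning pass (a hypothetical reading of the rule above).

    A point is dropped only if (1) the remaining kept points classify it
    correctly, i.e. leave-one-out, and (2) every previously removed point
    is still classified correctly once it is gone.
    """
    kept = list(points)
    removed = []
    for candidate in list(points):
        rest = [p for p in kept if p is not candidate]
        if len(rest) < k:
            break  # too few points left to take a k-way vote
        if knn_predict(rest, candidate[0], k) != candidate[1]:
            continue  # misclassified without itself, so it must stay
        if all(knn_predict(rest, x, k) == y for x, y in removed):
            kept = rest
            removed.append(candidate)
    return kept, removed
```

Because the pass is greedy, the result depends on the order in which points are visited; shuffling the input first would be one way to check how sensitive the pruned set is to that order.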

I will try using a validation set instead of leave-one-out cross-validation just for kicks, but it's looking increasingly like k=3 is the way to go.
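For the comparison I have in mind, the two error estimates could be computed roughly as below. Again this is only a sketch under assumptions: the function names, the 30% holdout fraction, the fixed shuffle seed, and the squared Euclidean distance are placeholders rather than settings I have actually used.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest (features, label) pairs."""
    nearest = sorted(
        train,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_error(points, k):
    """Leave-one-out error: classify each point using all the others."""
    errors = 0
    for i, (x, y) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors += knn_predict(rest, x, k) != y
    return errors / len(points)


def validation_error(points, k, holdout=0.3, seed=0):
    """Held-out error: train on one part, score on a disjoint part.

    `holdout` and `seed` are illustrative defaults, not tuned values.
    """
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * holdout))
    val, train = shuffled[:n_val], shuffled[n_val:]
    errors = sum(knn_predict(train, x, k) != y for x, y in val)
    return errors / n_val
```

Running both over a range of k values, for example `[(k, loo_error(data, k), validation_error(data, k)) for k in (1, 3, 5, 7)]` with `data` being the list of labeled points, would show whether the two estimates agree on the best k.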
