Algorithmic Detection of Computer Generated Papers: Third Feature

Friday, March 26, 2010

Third Feature

The long-awaited third feature has finally arrived. It measures how often keywords from a paper's reference section occur in its body, title, and abstract. Implementing the feature itself was fairly trivial, but 3D plotting with Matplotlib turned out to be somewhat tricky.
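A feature of this kind could be sketched roughly as follows. This is only an illustration, not the code used for the project: the function names, the stopword list, and the rule that a "keyword" is any non-stopword of four or more letters are all assumptions, since the post does not say how keywords are chosen or how occurrences are scored.

```python
import re

# Hypothetical stopword list; how keywords were actually selected is not stated.
STOPWORDS = {"with", "from", "that", "this", "proceedings", "journal", "conference"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def reference_keywords(references):
    """Assumption: every non-stopword of 4+ letters in the references is a keyword."""
    return {w for w in tokenize(references) if len(w) >= 4 and w not in STOPWORDS}


def keyword_overlap(references, text):
    """Fraction of reference keywords that occur at least once in `text`."""
    keywords = reference_keywords(references)
    if not keywords:
        return 0.0
    words = set(tokenize(text))
    return len(keywords & words) / len(keywords)


# Made-up example strings, only to show the call.
references = "Smith, J. Neural networks for text classification. Journal of Learning, 2008."
body = "We train neural networks to perform text classification on abstracts."
print(keyword_overlap(references, body))
```

The same function could be applied separately to the title and the abstract, or to all three concatenated; which of those the feature should use is a design choice this sketch leaves open.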

You can see that the dots are tiny. Unfortunately, a bug in the latest version of Matplotlib prevents adjusting their size. I may eventually grab Matplotlib from the project's Subversion repository, where the bug is already fixed.

Regardless, all 100 papers are correctly classified with this new feature (still using k-nearest neighbor, now in 3 dimensions). My next step will be to get a lot more data and evaluate various classifiers, since the k=11 I'm using right now is quite arbitrary (as Professor Magdon pointed out at the CS poster session).
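One way to check how much the choice of k matters is leave-one-out evaluation over several values of k. The sketch below is a minimal, pure-Python illustration under stated assumptions: the ten 3D feature vectors are made up, the labels "generated" and "human" are placeholder names, and Euclidean distance with a simple majority vote is assumed. It only tries odd k, because with two classes an even k can produce tied votes.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Classify each point using all the other points and return the hit rate."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Made-up 3D feature vectors (not the real paper data).
data = [
    ((0.10, 0.20, 0.10), "generated"),
    ((0.20, 0.10, 0.15), "generated"),
    ((0.15, 0.15, 0.20), "generated"),
    ((0.05, 0.10, 0.10), "generated"),
    ((0.20, 0.20, 0.05), "generated"),
    ((0.80, 0.90, 0.85), "human"),
    ((0.90, 0.80, 0.80), "human"),
    ((0.85, 0.85, 0.90), "human"),
    ((0.95, 0.90, 0.80), "human"),
    ((0.80, 0.80, 0.95), "human"),
]

# Odd k only, to avoid tied votes between the two classes.
for k in range(1, 8, 2):
    print(k, leave_one_out_accuracy(data, k))
```

With the real 100 papers, the same loop could run over a wider range of odd k, and the resulting accuracies could be plotted against k.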

2 comments:

mskmoorthy, March 26, 2010 at 11:15 AM
One possibility is to plot how the classification changes as k changes from 2 to 20. (just a thought).
Allen, April 7, 2010 at 3:49 PM
Yes, that should work nicely. k-nearest-neighbor with an even k leads to ties in voting, though, so I've skipped even numbers.