• dsr_ 5 hours ago
    Notably lacking comparisons of speed, accuracy, and cost against KNN-hyperspace, Bayesian updaters like SpamAssassin's Bayes classifier or SpamBayes, and traditional rule-based methods.
    • PaulHoule 3 hours ago
      Yeah, I collect papers where run-of-the-mill people do run-of-the-mill classification problems and the standard of quality is not what I wish it was. This paper avoids the common antipattern of wasting effort on Word2Vec and five other things that never work.

      They are using Enron, which is a very strange email spool to work with because it's almost entirely spam-free. The problem in Enron is to find a tiny amount of criminal activity in a vast volume of innocent communications, whereas the problem in a normal email spool today is to find a tiny amount of meaningful email in a great flood of spam and attempted criminal activity.