Analyzing the Results
Now you’re ready to write some code to analyze the results generated by test-classifier. Recall that test-classifier returns the list returned by test-from-corpus, in which each element is a plist representing the result of classifying one file. This plist contains the name of the file, the actual type of the file, the classification, and the score returned by classify. The first bit of analytical code you should write is a function that returns a symbol indicating whether a given result was correct, a false positive, a false negative, a missed ham, or a missed spam. You can use **DESTRUCTURING-BIND** to pull out the :type and :classification elements of an individual result list (using **&allow-other-keys** to tell **DESTRUCTURING-BIND** to ignore any other key/value pairs it sees) and then use nested **ECASE** forms to translate the different pairings into a single symbol.
(defun result-type (result)
  (destructuring-bind (&key type classification &allow-other-keys) result
    (ecase type
      (ham
       (ecase classification
         (ham 'correct)
         (spam 'false-positive)
         (unsure 'missed-ham)))
      (spam
       (ecase classification
         (ham 'false-negative)
         (spam 'correct)
         (unsure 'missed-spam))))))
You can test out this function at the REPL.
SPAM> (result-type '(:FILE #p"foo" :type ham :classification ham :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type spam :classification spam :score 0))
CORRECT
SPAM> (result-type '(:FILE #p"foo" :type ham :classification spam :score 0))
FALSE-POSITIVE
SPAM> (result-type '(:FILE #p"foo" :type spam :classification ham :score 0))
FALSE-NEGATIVE
SPAM> (result-type '(:FILE #p"foo" :type ham :classification unsure :score 0))
MISSED-HAM
SPAM> (result-type '(:FILE #p"foo" :type spam :classification unsure :score 0))
MISSED-SPAM
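To see what **&allow-other-keys** buys you, here is a minimal sketch that runs the same destructuring with and without it. The names lenient-pair, strict-pair, and *sample-result* are hypothetical and exist only for this illustration; the sketch also assumes an implementation that checks keyword arguments at its default safety level, as the common ones do.

```lisp
;; Sample data shaped like one element of the list test-classifier returns.
(defparameter *sample-result*
  '(:file "foo" :type ham :classification spam :score 0.9))

(defun lenient-pair (result)
  "Extract :type and :classification, ignoring any other keys in RESULT."
  (destructuring-bind (&key type classification &allow-other-keys) result
    (list type classification)))

(defun strict-pair (result)
  "Same extraction without &allow-other-keys; returns :rejected on an error."
  (handler-case
      (destructuring-bind (&key type classification) result
        (list type classification))
    ;; In safe code, unrecognized keys such as :file and :score
    ;; cause DESTRUCTURING-BIND to signal an error.
    (error () :rejected)))
```

Calling lenient-pair on *sample-result* returns the two values of interest, while strict-pair refuses the same plist because of the extra :file and :score pairs.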
Having this function makes it easy to slice and dice the results of test-classifier in a variety of ways. For instance, you can start by defining predicate functions for each type of result.
(defun false-positive-p (result)
  (eql (result-type result) 'false-positive))

(defun false-negative-p (result)
  (eql (result-type result) 'false-negative))

(defun missed-ham-p (result)
  (eql (result-type result) 'missed-ham))

(defun missed-spam-p (result)
  (eql (result-type result) 'missed-spam))

(defun correct-p (result)
  (eql (result-type result) 'correct))
With those functions, you can easily use the list and sequence manipulation functions I discussed in Chapter 11 to extract and count particular kinds of results.
SPAM> (count-if #'false-positive-p *results*)
6
SPAM> (remove-if-not #'false-positive-p *results*)
((:FILE #p"ham/5349" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999983107355541d0)
 (:FILE #p"ham/2746" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.6286468956619795d0)
 (:FILE #p"ham/3427" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9833753501352983d0)
 (:FILE #p"ham/7785" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9542788587998488d0)
 (:FILE #p"ham/1728" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.684339162891261d0)
 (:FILE #p"ham/10581" :TYPE HAM :CLASSIFICATION SPAM :SCORE 0.9999924537959615d0))
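Because each result is a plist, the same sequence functions combine with **GETF** to pull out just the parts you care about. The following is a sketch only: files-by-score is a hypothetical helper, not part of the classifier, and it assumes each result carries :file and :score entries like the ones shown above.

```lisp
(defun files-by-score (predicate results)
  "Return the :file of each result satisfying PREDICATE, highest :score first."
  (let ((matches (remove-if-not predicate results)))
    ;; SORT is destructive and REMOVE-IF-NOT may share structure with
    ;; RESULTS, so sort a copy to leave the caller's list intact.
    (mapcar (lambda (result) (getf result :file))
            (sort (copy-list matches) #'>
                  :key (lambda (result) (getf result :score))))))
```

With the predicates defined earlier, (files-by-score #'false-positive-p *results*) would list the misclassified hams starting with the ones the classifier was most confident about.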
You can also use the symbols returned by result-type as keys into a hash table or an alist. For instance, you can write a function to print a summary of the counts and percentages of each type of result using an alist that maps each type plus the extra symbol total to a count.
(defun analyze-results (results)
  (let* ((keys '(total correct false-positive
                 false-negative missed-ham missed-spam))
         (counts (loop for x in keys collect (cons x 0))))
    (dolist (item results)
      (incf (cdr (assoc 'total counts)))
      (incf (cdr (assoc (result-type item) counts))))
    (loop with total = (cdr (assoc 'total counts))
          for (label . count) in counts
          do (format t "~&~@(~a~):~20t~5d~,5t: ~6,2f%~%"
                     label count (* 100 (/ count total))))))
This function will give output like this when passed a list of results generated by test-classifier:
SPAM> (analyze-results *results*)
Total:               3761 : 100.00%
Correct:             3689 :  98.09%
False-positive:         4 :   0.11%
False-negative:         9 :   0.24%
Missed-ham:            19 :   0.51%
Missed-spam:           40 :   1.06%
NIL
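The hash table alternative mentioned earlier can be sketched in a few lines. The function tally-symbols is a hypothetical name used only here; it takes a list of symbols, such as the ones result-type returns, and counts how often each occurs.

```lisp
(defun tally-symbols (symbols)
  "Return a hash table mapping each symbol in SYMBOLS to its occurrence count."
  (let ((counts (make-hash-table :test #'eq)))
    ;; The third argument to GETHASH supplies 0 for keys not yet seen,
    ;; so no up-front list of keys is needed.
    (dolist (symbol symbols counts)
      (incf (gethash symbol counts 0)))))
```

You could build the table with (tally-symbols (mapcar #'result-type *results*)). Unlike the alist in analyze-results, a result type that never occurs has no entry at all, so pass a default of 0 to **GETHASH** when reading counts back out.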
And as a last bit of analysis you might want to look at why an individual message was classified the way it was. The following functions will show you:
(defun explain-classification (file)
  (let* ((text (start-of-file file *max-chars*))
         (features (extract-features text))
         (score (score features))
         (classification (classification score)))
    (show-summary file text classification score)
    (dolist (feature (sorted-interesting features))
      (show-feature feature))))

(defun show-summary (file text classification score)
  (format t "~&~a" file)
  (format t "~2%~a~2%" text)
  (format t "Classified as ~a with score of ~,5f~%" classification score))

(defun show-feature (feature)
  (with-slots (word ham-count spam-count) feature
    (format
     t "~&~2t~a~30thams: ~5d; spams: ~5d;~,10tprob: ~,f~%"
     word ham-count spam-count (bayesian-spam-probability feature))))

(defun sorted-interesting (features)
  (sort (remove-if #'untrained-p features) #'< :key #'bayesian-spam-probability))