Loading…
This event has ended. View the official site or create your own event → Check it out
This event has ended. Create your own
View analytic
Thursday, July 29 • 10:00am - 11:00am
Keeping the Good Stuff In: Confidential Information Firewalling with the CRM114 Spam Filter & Text Classifier

Sign up or log in to save this to your schedule and see who's attending!

In this whitepaper we consider the problem of outbound-filtering of emails to prevent accidental leakage of confidential information, We examine how to do this with the GPLed open-source spam filter CRM114 and test the accuracy of this filter against a 10,000+ document corpus of hand-classified emails (both confidential and non-confidential) in Japanese. We look into what moving parts are involved in these filters, and how they can be set up. The results show that a hybrid of multiple CRM114 filters outperforms a human-crafted regular-expression filter by nearly 100x in recall, by detecting > 99.9% of confidential documents, and with a simultaneous false alarm rate of less than 5.3%. As the programmers creating the machine-learning programs don't know how to read or write Japanese, this problem is an almost ideal case of the Searle “Chinese Room” problem.


Thursday July 29, 2010 10:00am - 11:00am
Day 2 - Where the Data Lives

Attendees (18)