spam / junk mail architecture

You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.


spam / junk mail filtering
Seth Spitzer

the mozilla implementation is based on Paul Graham's "A Plan for Spam".
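as a rough illustration, here is a minimal Python sketch of the scoring described in Graham's essay. each token's spam probability comes from its counts in the good and bad tables: good counts are doubled, tokens seen fewer than 5 times are ignored, and results are clamped to [0.01, 0.99]. unseen tokens get 0.4, and the 15 tokens furthest from 0.5 are combined with Bayes' rule. this follows the essay, not Mozilla's C++ code, and the function and parameter names are hypothetical.

```python
import math

# hypothetical names; a sketch of the essay's formula, not Mozilla's code


def token_probability(token, good, bad, ngood, nbad):
    """Spam probability of one token, or None if it was seen too rarely.

    good/bad map tokens to occurrence counts; ngood/nbad are the numbers
    of good and bad messages that have been trained on.
    """
    g = 2 * good.get(token, 0)  # doubled to bias against false positives
    b = bad.get(token, 0)
    if g + b < 5:
        return None
    bad_freq = min(1.0, b / nbad) if nbad else 0.0
    good_freq = min(1.0, g / ngood) if ngood else 0.0
    if good_freq + bad_freq == 0.0:
        return None
    # clamp so that no single token can decide the outcome on its own
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))


def spam_probability(tokens, good, bad, ngood, nbad,
                     unknown=0.4, interesting=15):
    """Combine the most "interesting" token probabilities with Bayes' rule."""
    probs = []
    for token in set(tokens):  # each distinct token counts once
        p = token_probability(token, good, bad, ngood, nbad)
        probs.append(unknown if p is None else p)  # rare/unseen tokens: 0.4
    # the most interesting tokens are the ones furthest from a neutral 0.5
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:interesting]
    spam = math.prod(top)
    ham = math.prod(1.0 - p for p in top)
    return spam / (spam + ham)
```

Graham treats a combined probability above 0.9 as spam; whether Mozilla uses the same cutoff is not stated in these notes.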

training data is global across all mail accounts within a profile

training data is stored in a binary format, in a file named "training.dat".

initially, the training.dat file is empty (there was discussion of shipping with a default file)

on spam detection, the user can choose to move spam to a special "Junk" folder

the user can configure junk mail to be automatically purged from the "Junk" folder

to analyze a message for spam, we need the entire message, not just the headers.
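a simplified sketch of why the whole message matters: the hypothetical tokenize_message below splits a raw message at the first blank line (where the headers end) and tokenizes both halves, so words that appear only in the body still reach the classifier. the token rule (letters, digits, apostrophes, dashes and dollar signs; all-digit tokens dropped) is taken from Graham's essay, not necessarily Mozilla's tokenizer, and MIME decoding is ignored for brevity.

```python
import re

# hypothetical helper; token characters follow Graham's essay
TOKEN_RE = re.compile(r"[A-Za-z0-9'$-]+")


def tokenize_message(raw):
    """Tokenize the headers and the body of a raw message (no MIME decoding)."""
    # the headers end at the first empty line; everything after is the body
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    tokens = []
    for part in (head, body):
        for tok in TOKEN_RE.findall(part):
            if not tok.isdigit():  # all-digit tokens are ignored
                tokens.append(tok)
    return tokens
```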

spam detection happens after filters are run

whitelisting happens after filters, but before spam detection
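the ordering can be sketched as below. this is a minimal Python sketch under stated assumptions: Message, process_incoming, the filter callables, the whitelist (modeled as a set of sender addresses) and the is_spam predicate are all hypothetical stand-ins, not Mozilla interfaces. move_to_junk models the user's option to move spam into the "Junk" folder.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set

# hypothetical types and names; only the order of the stages comes from the notes


@dataclass
class Message:
    sender: str
    folder: str = "Inbox"
    junk_status: str = "unknown"  # "unknown", "junk" or "not junk"


def process_incoming(msg: Message,
                     filters: Iterable[Callable[[Message], None]],
                     whitelist: Set[str],
                     is_spam: Callable[[Message], bool],
                     move_to_junk: bool = True) -> Message:
    # 1. the user's message filters run first and may, e.g., move the message
    for apply_filter in filters:
        apply_filter(msg)
    # 2. whitelisting comes next: whitelisted senders skip spam detection
    if msg.sender in whitelist:
        return msg
    # 3. spam detection runs last
    if is_spam(msg):
        msg.junk_status = "junk"
        if move_to_junk:  # optional: move spam to the special "Junk" folder
            msg.folder = "Junk"
    return msg
```

because the whitelist check sits between the two stages, a filter can still act on a whitelisted message, but the classifier never sees it.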

the purge code is implemented as a search of the "Junk" folder, looking for "old" messages that have junk status.
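a sketch of the purge as a folder search with two terms, assuming each message carries a date and a junk status. the age limit max_age_days is a hypothetical parameter standing in for whatever the user configures as "old"; JunkFolderEntry and find_purgeable are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# hypothetical names; a sketch of "search the Junk folder", not Mozilla's code


@dataclass
class JunkFolderEntry:
    subject: str
    date: datetime
    junk_status: str  # "junk", "not junk" or "unknown"


def find_purgeable(junk_folder: List[JunkFolderEntry], now: datetime,
                   max_age_days: int) -> List[JunkFolderEntry]:
    """Search the "Junk" folder for old messages that still have junk status."""
    cutoff = now - timedelta(days=max_age_days)
    # both terms must match: old enough AND still marked as junk, so a
    # message the user re-marked as "not junk" is never purged
    return [m for m in junk_folder
            if m.date < cutoff and m.junk_status == "junk"]
```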

when does purging happen?

elaborate on the MIME changes that were made for spam filtering

currently, spam filtering does not work for news, but it would be possible to add support for it. (there is a bug filed for this.)

initial state   user action        table changes
-------------   ----------------   ------------------------------------------
unknown*        mark as junk       add tokens to bad
unknown*        mark as not junk   add tokens to good
not junk        mark as junk       remove tokens from good, add tokens to bad
not junk        mark as not junk   no op
junk            mark as junk       no op
junk            mark as not junk   remove tokens from bad, add tokens to good

* unknown: the user can't see this state; such a message looks like "not junk".
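the table can be read as a small state machine. the sketch below assumes the good and bad tables are plain token-count maps (how training.dat lays them out on disk is not described here), and a single TrainingData instance would be shared by every account in the profile, since training data is global. TrainingData, mark and _remove are hypothetical names. the key distinction the table encodes is that "unknown" looks like "not junk" to the user but was never trained on, so marking it only adds tokens, while reversing an explicit verdict must first remove the tokens it added.

```python
from collections import Counter
from typing import Iterable

# hypothetical names; one instance per profile, shared by all accounts


class TrainingData:
    """Token counts for good and bad mail."""

    def __init__(self) -> None:
        self.good: Counter = Counter()
        self.bad: Counter = Counter()


def _remove(table: Counter, tokens: Iterable[str]) -> None:
    tokens = list(tokens)
    table.subtract(tokens)
    for token in set(tokens):
        if table[token] <= 0:  # forget tokens whose count dropped to zero
            del table[token]


def mark(training: TrainingData, tokens: Iterable[str],
         old_status: str, new_status: str) -> str:
    """Apply one row of the table; old_status may also be "unknown"."""
    if new_status not in ("junk", "not junk"):
        raise ValueError("the user can only mark as junk or not junk")
    tokens = list(tokens)
    if old_status == new_status:
        return new_status  # junk -> junk and not junk -> not junk: no op
    # undo the earlier verdict, if any ("unknown" was never trained on)
    if old_status == "junk":
        _remove(training.bad, tokens)
    elif old_status == "not junk":
        _remove(training.good, tokens)
    # then train on the user's new verdict
    if new_status == "junk":
        training.bad.update(tokens)
    else:
        training.good.update(tokens)
    return new_status
```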
