FAQ - Current Spam thresholds and guidelines

Archive-name: usenet/spam-faq
Posting-Frequency: weekly
Last-modified: 1997/03/25
URL: http://www.uiuc.edu/ph/www/tskirvin/faqs/spam.html
Maintainer: tskirvin@uiuc.edu (Tim Skirvin)
Original-author: clewis@ferret.ocunix.on.ca (Chris Lewis)

            Current Spam thresholds and guidelines
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This article is intended to describe the current consensus spam
thresholds and ensure that the definitions of these terms are available.
It is believed that most, if not all, spam cancellers use these terms and
definitions in their work; however, many other people use the terms
inappropriately, which leads to confusion in discussions.  This is an
informal FAQ aimed at clarity and understanding, not anal-retentive
correctness.

Excessive Multi-Posting (EMP) has the same meaning as the term "spam"
usually carries, but it is more accurate and self-explanatory.  EMP
means, essentially, "too many separate copies of a substantively
identical article."

"Substantively identical" means that the material in each article
is sufficiently similar to convey the same message.  The signature
is included in the determination.  These are examples of substantively
identical articles:

        - byte-for-byte identical messages
        - otherwise identical postings minimally customized for each group
          they appear in
        - articles advertising the same service
        - articles that consist solely of the same signature
        - articles which consist of inclusions of other users' postings,
          but are otherwise identical

Cross-posting means that a single message appears in more than one
group.  Most newsreaders allow you to specify more than one group in a
posting.

Excessive Crossposting (ECP), also known as "Velveeta", refers to the
case where a "lot" of postings have been made, each to more than one group.

Some people think cross-posting is "bad".  In and of itself, it's
good behaviour: it allows you to reach more groups with less impact
on the net, especially if you set the Followup-To: header to one
group.  It is "bad" when it's done to provoke flamewars (like cross-posting
how to cook a cat between alt.tasteless and rec.pet.cats), but this
is not the topic of this FAQ.

This author considers the term "spam" to mean excessive postings of
the EMP and/or ECP variety.  That is, "spam" is a generic term for several
different things.  The term was originally supposed to mean EMPs only, but
most people use "spam" to mean "any excessive posting".

The term "jello" means a large/combined EMP/ECP.  This author doesn't
believe this to be a useful term.  Indeed, this author doesn't really
believe any of these terms are useful - always call them "spam".

A spam, EMP, or ECP, then, refers to a posting that has been posted to
many places.  There is a consensus that beyond a certain point such
posting is abuse and is subject to advisory cancellation.

A formula has been invented by Seth Breidbart which attempts to
quantify the degree of "badness" of a spam (whether EMP or ECP) as a
single number.  The Breidbart Index (BI) is defined as the sum, over all
copies of a posting, of the square root of n, where n is the number of
newsgroups that copy was posted to.

Example: If two copies of a posting are made, one to 9 groups and one
to 16, the BI is sqrt(9) + sqrt(16) = 3 + 4 = 7.
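A minimal Python sketch of the BI calculation, assuming each copy of the
posting is represented only by the number of groups it was posted to.  The
function name `breidbart_index` is hypothetical, chosen for illustration.

```python
import math

def breidbart_index(group_counts):
    """BI: sum of sqrt(n) over all copies, where n is each copy's group count."""
    return sum(math.sqrt(n) for n in group_counts)

# Two copies: one posted to 9 groups, one posted to 16 groups.
print(breidbart_index([9, 16]))  # 7.0
```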

The BI2 (Breidbart Index, version 2) is an experimental metric, which
may eventually replace the BI.  It is calculated by computing the sum
of the square roots of n, plus the sum of n, and dividing by two.  For
example, one posting to 9 groups and one to 16 gives

        (sqrt(9) + sqrt(16) + 9 + 16) / 2
        = (3 + 4 + 9 + 16) / 2 = 32 / 2 = 16
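The BI2 formula can be sketched the same way, again assuming each copy is
represented by its group count; `breidbart_index_2` is a hypothetical name.

```python
import math

def breidbart_index_2(group_counts):
    """BI2 (experimental): (sum of sqrt(n) + sum of n) / 2 over all copies."""
    root_sum = sum(math.sqrt(n) for n in group_counts)  # the BI part
    plain_sum = sum(group_counts)                      # linear term
    return (root_sum + plain_sum) / 2

# One posting to 9 groups and one to 16 groups.
print(breidbart_index_2([9, 16]))  # 16.0
```

The linear term is what makes the BI2 grow much faster than the BI as the
number of groups per copy rises.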

The BI2 is more "aggressive" than the BI, intended to cut off the "higher
end".  BI allows about 125 newsgroups maximum.  BI2 allows a maximum of 35.

A slightly less aggressive index is the SBI (Skirvin-Breidbart Index); it
is calculated much the same as the BI2, but its linear term sums the
number of groups in the Followup-To: header (if present) rather than the
number in the Newsgroups: header.  For example, one posting to 9 groups
and one to 16 with followups set to 4 gives

        (sqrt(9) + sqrt(16) + 9 + 4) / 2
        = (3 + 4 + 9 + 4) / 2 = 20 / 2 = 10
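A sketch of the SBI, assuming each copy is described by a pair: the number
of groups in its Newsgroups: header and the number in its Followup-To:
header (or `None` when there is none).  As in the worked example, the square
roots still use the Newsgroups: count; only the linear term switches to the
Followup-To: count.  The names `Copy` and `skirvin_breidbart_index` are
hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# (groups in Newsgroups:, groups in Followup-To: or None if absent)
Copy = Tuple[int, Optional[int]]

def skirvin_breidbart_index(copies: Sequence[Copy]) -> float:
    """SBI: like BI2, but the linear term uses the Followup-To: count if set."""
    root_sum = sum(math.sqrt(groups) for groups, _ in copies)
    linear_sum = sum(groups if followups is None else followups
                     for groups, followups in copies)
    return (root_sum + linear_sum) / 2

# One copy to 9 groups (no Followup-To:), one to 16 with followups set to 4.
print(skirvin_breidbart_index([(9, None), (16, 4)]))  # 10.0
```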

Except in nl.*, the BI2 and SBI are not used to determine whether a spam
is cancellable.

The thresholds for spam cancels are based _only_ on one or more of the
following measures:

        1) The BI is 20 or greater over a 45-day period.
        2) The posting is a continuation of a previous EMP/ECP within a
           45-day sliding window.  That is, if the articles posted within
           the past 45 days exceed a BI threshold of 20, the new posting is
           removed, unless the originator has made a clear and obvious
           effort to cease spamming (which includes an undertaking to do
           so, posted in news.admin.net-abuse.usenet).  This includes
           "make money fast" schemes which passed the EMP/ECP thresholds
           several years ago.

This author recommends one posting, cross-posted to no more than 10
groups, no more often than once every two weeks (a BI of about 3 per
posting).
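The 45-day sliding-window rule can be sketched in Python as follows.  This
assumes the copies of a substantively identical article are already known
and are given as hypothetical `(timestamp, group_count)` pairs; deciding
what counts as "substantively identical", and the exception for posters who
have undertaken to stop, are judgment calls this sketch does not model.  The
function names are illustrative only.

```python
import math
from datetime import datetime, timedelta

WINDOW = timedelta(days=45)  # sliding window from the FAQ
THRESHOLD = 20.0             # BI cancellation threshold from the FAQ

def bi_in_window(postings, now):
    """Sum sqrt(n) over copies posted within the 45 days up to `now`."""
    return sum(math.sqrt(groups)
               for posted, groups in postings
               if now - WINDOW <= posted <= now)

def exceeds_threshold(postings, now):
    """True if the BI over the window ending at `now` is 20 or greater."""
    return bi_in_window(postings, now) >= THRESHOLD

start = datetime(1997, 3, 1)

# Recommended pattern: 10 groups, once every two weeks.
biweekly = [(start + timedelta(days=14 * i), 10) for i in range(10)]
print(exceeds_threshold(biweekly, start + timedelta(days=126)))  # False

# 25 separate copies, one group each, posted an hour apart.
flood = [(start + timedelta(hours=i), 1) for i in range(25)]
print(exceeds_threshold(flood, start + timedelta(days=1)))  # True
```

Under this sketch the recommended pattern stays well below 20: at most four
postings fall in any 45-day window, for a BI of about 4 x 3.16 = 12.6.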

A single posting cannot by itself reach the cancellation threshold: to
reach a BI of 20, it would have to be cross-posted to 400 groups, which
isn't possible due to limitations in Usenet software.
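The 400-group figure follows from solving sqrt(n) >= 20 for a single copy.
A brute-force sketch that finds the smallest group count at which a single
posting reaches 20 under a given index; run on the BI2, the same search
gives 35, in line with the BI2 figure quoted earlier.  All function names
here are hypothetical.

```python
import math

def min_groups_to_reach(index, threshold=20.0):
    """Smallest group count n at which a single posting's index reaches threshold."""
    n = 1
    while index(n) < threshold:
        n += 1
    return n

def bi_single(n):
    """BI of one copy posted to n groups."""
    return math.sqrt(n)

def bi2_single(n):
    """BI2 of one copy posted to n groups."""
    return (math.sqrt(n) + n) / 2

print(min_groups_to_reach(bi_single))   # 400
print(min_groups_to_reach(bi2_single))  # 35
```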

These thresholds are applied to all hierarchies: not only the Big 8, but
also alt, bitnet, bionet, biz, regional hierarchies, etc.  Many hierarchies
have more restrictive rules which are decided upon and enforced by their
users and administrators.

These cancels have nothing whatsoever to do with the contents of the
message.  It doesn't matter if it's an advertisement, it doesn't matter
if it's abusive, it doesn't matter whether it's on-topic in the groups
it was posted in, it doesn't matter whether the posting is for a "good
cause" or not.

Spam cancels are non-content based.  They're not based on _what_ was said,
they're based only on how many times it was said.

Administrators wishing to ignore spam cancels can "alias out" the
site "cyberspam", and the cancels will not affect their systems.  This
is normally done at the upstream feed site, but patches are available
for INN that let a site reject spam cancels itself.  Ask in
news.admin.net-abuse.usenet if you need this patch.
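As an illustration of the idea behind aliasing out, and assuming (as that
approach implies) that spam cancels carry "cyberspam" as an element of their
Path: header, a filter only needs to check the "!"-separated Path: elements.
This is a hypothetical Python sketch, not the INN patch or INN's own
configuration; the function names and example host names are made up.

```python
def path_elements(path_header):
    """Split a Path: header value such as 'a!b!c' into its site names."""
    return [site.strip() for site in path_header.split("!") if site.strip()]

def should_reject(path_header, aliased_out=("cyberspam",)):
    """Reject an article if any aliased-out site appears in its Path:."""
    return any(site in aliased_out for site in path_elements(path_header))

print(should_reject("feed.example.net!cyberspam!not-for-mail"))          # True
print(should_reject("feed.example.net!news.example.org!not-for-mail"))  # False
```

Matching whole Path: elements, rather than substrings, avoids rejecting
articles that merely passed through a host whose name contains "cyberspam".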

Further literature on posting etiquette can be found in:

    - the newsgroup news.announce.newusers,

    - "What is Usenet", by Salzenberg, Spafford and Moraes.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/what-is/part1

    - "What is Usenet?  A second opinion.", by Vielmetti.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/what-is/part2

    - "FAQ: Advertising on Usenet: How To Do It, How Not To Do It",
      by Furr.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/advertising/how-to/part1

    - "A Primer on How to Work With the Usenet Community", by Von Rospach,
      Spafford, et al.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/primer/part1

    - "Rules for posting to Usenet", by Horton, Spafford & Moraes.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/posting-rules/part1

    - "Emily Postnews Answers Your Questions on Netiquette", by Templeton
      et al.
      ftp://rtfm.mit.edu/pub/usenet-by-group/news.answers/usenet/emily-postnews/part1

    - Numerous books and publications on Usenet, such as O'Reilly's
      "Usenet Handbook", the "Whole Internet Guide and Catalog" (Krol),
      etc.

The above FAQs are also mirrored at various sites, including ftp.sunet.se,
mirror.aol.com, ftp.uu.net, ftp.uni-paderborn.de, nctuccca.edu.tw,
hwarang.postech.ac.kr, ftp.hk.super.net, etc.

A mailing list has been set up to assist those wishing to post commercial
advertisements on Usenet in a responsible fashion.  Email your questions to
commerce@acpub.duke.edu.