Latest updates: see the notes for code checkins [1178] and [1184], and the movie on MyBloop

Link-spamming on FeedMeLinks

The Problem

Because FeedMeLinks has a decent Google PageRank and a deliberately low barrier to entry, "link spammers" have discovered the site and are posting hundreds of almost identical URLs in order to promote other online properties. This disturbs users and dilutes the value of the site as an online resource of good links.

What are their methods?

Link-spammers appear to use three main tactics:

  • manually posting and tagging links
  • importing gamed bookmarks files (often exported from delicious)
  • using shotgunning tools like 0n1yw1re.c0m to spray links across many social-bookmarking sites at once

Their accounts also tend to share some telltale traits:

  • They look alike, choosing (auto-generated?) names like username87 or HumpBackName
  • They spring from a single source, often creating multiple accounts from a single IP
  • They are tag-happy, some creating almost as many tags as they do links, or even MORE tags than links
  • They lie about their email, registering accounts with bogus, non-working email addresses
  • They repeat themselves, importing series of links to domains that are very similar in the "bottom half", e.g. lots of crap-whatever.info or foo.blogspot.com things
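
The "repeat themselves" trait could be measured roughly as follows. This is only a sketch: the function names are made up for illustration, and "bottom half" is assumed here to mean the last two dot-separated labels of the hostname (pass labels=1 to group by top-level domain instead). It does not handle multi-part suffixes such as co.uk.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_tail(url, labels=2):
    """Return the last `labels` dot-separated parts of the URL's hostname.

    Hypothetical helper; URLs without a parseable host yield "".
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.lower().split(".")[-labels:])


def dominant_tail(urls, labels=2):
    """Return (tail, share) for the most common domain tail in `urls`.

    `share` is the fraction of URLs ending in that tail, so a value
    close to 1.0 means the batch points almost entirely at one place.
    """
    tails = Counter(domain_tail(u, labels) for u in urls)
    if not tails:
        return ("", 0.0)
    tail, count = tails.most_common(1)[0]
    return (tail, count / len(urls))
```

A batch of imported links where one tail accounts for most of the URLs would be a candidate for review; what share counts as "most" is left open here.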

Steps already taken to combat this problem

  • confirm the email address of new accounts by requiring a verification code. This means spammers can't easily create multiple accounts in order to circumvent the "no linking to duplicate links per user" rule
  • convert web forms from GET to POST to make automating linking or registration more difficult for spammers
  • privatize new links (upon add) that already exist, i.e. that were already linked by another user (per-user duplicates are already blocked). This means no more safety in numbers for spammers who work in teams to create multiple accounts from different IP addresses, or who create multiple accounts from the same IP more than a week apart.
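
The privatize-on-duplicate rule might look something like the sketch below. It is not the code from the checkins; the in-memory dict, the add_link name and the return strings are all assumptions for illustration, and URLs are compared as exact strings with no normalization.

```python
def add_link(links, user, url):
    """Record that `user` linked `url` and report how it was stored.

    `links` maps url -> list of (user, is_private) entries.
    Hypothetical rules, following the list above:
      - the same user may not link the same URL twice ("rejected")
      - a URL already linked by someone else is stored as "private"
      - otherwise the link is stored as "public"
    """
    entries = links.setdefault(url, [])
    if any(owner == user for owner, _ in entries):
        return "rejected"  # per-user duplicate
    is_private = bool(entries)  # someone else got there first
    entries.append((user, is_private))
    return "private" if is_private else "public"
```

With this rule, only the first account to post a given URL gets a public link, no matter how many accounts or IP addresses the later copies come from.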

Next steps to take

  • server-side validate all new usernames
  • run a nightly batch query and snuff any user with at least 40% as many tags as links
  • apply the same heuristic to bookmarks imports as well, to stop "day spammers" who post during the day and would otherwise stay up until the nightly cron job deletes them
  • upon linking: flag users who post URLs with lots of hyphens as possible link-spammers (?, see Kailash Nadh's Fighting Spam Blogs: A Hypothesis)
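
The tag-ratio and hyphen heuristics above could be sketched like this. The 40% figure comes from the list; everything else is an assumption made for illustration: the function names, the min_links floor (so that a new user with one link and one tag is not flagged), the hyphen threshold of 3, and the choice to count hyphens in the hostname only, since ordinary article URLs often have hyphens in the path.

```python
from urllib.parse import urlsplit

TAG_RATIO_THRESHOLD = 0.4  # "40% as many tags as links", from the list above
MIN_LINKS = 10             # assumption: don't judge accounts with few links
HYPHEN_THRESHOLD = 3       # assumption: what counts as "lots of hyphens"


def is_tag_happy(tag_count, link_count,
                 threshold=TAG_RATIO_THRESHOLD, min_links=MIN_LINKS):
    """True if the user has at least `threshold` times as many tags as links."""
    if link_count < min_links:
        return False
    return tag_count >= threshold * link_count


def hyphen_count(url):
    """Count hyphens in the hostname part of `url` (path is ignored)."""
    host = urlsplit(url).hostname or ""
    return host.count("-")


def looks_hyphen_heavy(url, threshold=HYPHEN_THRESHOLD):
    """True if the hostname contains at least `threshold` hyphens."""
    return hyphen_count(url) >= threshold
```

The same is_tag_happy check could run both in the nightly batch and at import time, which would cover the "day spammers" case. Both thresholds would need tuning against real accounts before anything is deleted automatically.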