5-Layer Spam Detection System
What Makes a Domain "Toxic"?
A toxic domain is one that has engaged in manipulative SEO practices, hosted spam content, or violated Google's spam policies (part of Google Search Essentials, formerly the Webmaster Guidelines). Using a toxic domain can result in:
- Manual actions (penalties): Google may apply penalties that persist even after domain ownership changes
- Algorithmic suppression: Domains with poor link profiles may never rank, even with clean content
- Indexing issues: Deindexed domains may take months or years to be re-crawled and indexed
- Reputational damage: If the domain is flagged as spam by browsers or security tools, visitors will see warnings
Critical rule: Toxicity is permanent in most cases. Google has a long institutional memory. A domain that was penalized in 2019 may still carry baggage in 2026, even if its ownership has changed.
Layer 1: Moz Spam Score Interpretation
Moz's Spam Score is a machine learning model that predicts the likelihood a domain has been penalized or engaged in spammy link building. It's based on 27 signals correlated with penalized sites.
How to Check Spam Score
- Use Moz Link Explorer (paid) or MozBar browser extension (free with account)
- Enter the domain name
- Review the Spam Score (0–100%)
- Click on "Spam Flags" to see which signals triggered the score
Score Interpretation
- 0–10%: Low risk — safe to proceed
- 11–30%: Moderate risk — requires manual verification (SERP checks, Archive.org review)
- 31–60%: High risk — only safe for 301 redirects, not money sites
- 61–100%: Severe risk — skip the domain entirely
Important: Spam Score is a prediction, not a diagnosis. A 30% score doesn't mean the domain is penalized — it means it shares characteristics with penalized sites. Always combine this with manual checks.
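The score bands above translate directly into a lookup. A minimal sketch, where the function name `spam_score_tier` and the tier labels are illustrative assumptions and not part of any Moz tool:

```python
def spam_score_tier(score_percent: float) -> str:
    """Map a Moz Spam Score (0-100) to the risk bands used in this guide."""
    if not 0 <= score_percent <= 100:
        raise ValueError("Spam Score must be between 0 and 100")
    if score_percent <= 10:
        return "low"        # safe to proceed
    if score_percent <= 30:
        return "moderate"   # needs manual verification
    if score_percent <= 60:
        return "high"       # 301 redirects only
    return "severe"         # skip the domain
```

The tier is only a starting point: a "moderate" result still needs the manual SERP and Archive.org checks before any decision.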
Common Spam Flags and What They Mean
High Link Volume Flags
- Large proportion of branded links
- Low number of linking domains
- Large number of external outbound links
- Ratio of external links to content
Content Quality Flags
- Thin content (low word count)
- High proportion of ads
- Small proportion of branded anchor text
- Domain name length (very short or very long)
Technical Flags
- TLD correlation (certain TLDs like .tk, .ga are spam-prone)
- Presence of contact info (lack of contact = spam signal)
- Double-digit domain extensions
- No favicon present
Link Quality Flags
- Poison anchor text (CJK/Cyrillic)
- Large proportion of links from low DA sites
- Exact-match anchor text over-optimization
- Link farm referrers
Layer 2: Manual SERP Checks
Algorithmic tools like Spam Score can miss context. Manual SERP checks reveal how Google actually treats the domain.
Step-by-Step SERP Check Process
- Brand name search: Search for the domain's brand name (without .com). Does it rank #1?
- Site: operator search: Search site:example.com to see indexed pages
- Historical content search: Find 3–5 unique phrases from Archive.org snapshots and search them in quotes
- Exact URL search: Search specific page URLs — do they appear in results?
- Safe search filter test: Enable Google's Safe Search — is the domain filtered out?
Pass/Fail Criteria
Pass Signals
- Ranks #1 for brand name
- Site: operator returns indexed pages
- Historical content appears in SERP
- No Safe Search filtering
- Domain ranks for topical keywords
Fail Signals
- Doesn't rank for own brand name
- Site: operator returns 0 results
- Historical content not indexed
- Filtered by Safe Search
- All results are redirects to other domains
Red flag combination: If a domain doesn't rank for its brand name AND has a Spam Score over 30%, it's almost certainly penalized. Skip it.
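Because these checks are done by hand, it helps to record the results in a fixed structure and apply the pass/fail rules consistently. The sketch below assumes a hypothetical `SerpCheck` record that you fill in yourself; nothing here queries Google.

```python
from dataclasses import dataclass


@dataclass
class SerpCheck:
    """Results of the manual checks, recorded by hand (hypothetical structure)."""
    ranks_first_for_brand: bool
    indexed_pages: int                 # count returned by the site: operator
    historical_content_indexed: bool
    filtered_by_safe_search: bool
    all_results_redirect: bool


def fail_signals(check: SerpCheck) -> list[str]:
    """Return the fail signals from the list above that apply."""
    signals = []
    if not check.ranks_first_for_brand:
        signals.append("does not rank for own brand name")
    if check.indexed_pages == 0:
        signals.append("site: operator returns 0 results")
    if not check.historical_content_indexed:
        signals.append("historical content not indexed")
    if check.filtered_by_safe_search:
        signals.append("filtered by Safe Search")
    if check.all_results_redirect:
        signals.append("all results redirect to other domains")
    return signals


def red_flag_combination(check: SerpCheck, spam_score_percent: float) -> bool:
    """No brand ranking AND Spam Score over 30%: treat as penalized and skip."""
    return not check.ranks_first_for_brand and spam_score_percent > 30
```

For example, `red_flag_combination(SerpCheck(False, 12, True, False, False), 45)` returns `True`, so that domain would be skipped even though some of its pages are indexed.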
Layer 3: Link Neighborhood Analysis
"Link neighborhood" refers to the other sites that link to and from the domain. Toxic neighborhoods are a major red flag.
How to Analyze Link Neighborhoods
- Use Ahrefs or Majestic to pull the domain's top 50 referring domains
- Visit 10–15 random referring domains
- Check each referring domain's link profile (how many total outbound links do they have?)
- Look for thematic coherence — do the sites relate to the domain's topic?
- Check for link farm patterns (sites that link to hundreds of unrelated domains)
Link Farm Red Flags
- Massive outbound link counts: Referring domains that link to 500+ unrelated sites
- No original content: Sites that only exist to host links (thin blog posts with 10+ unrelated links)
- Sitewide links: Every page on the referring domain links to the target domain (footer/sidebar spam)
- CJK/Cyrillic link farms: Referring domains entirely in foreign languages unrelated to the target domain's niche
- Parked domain referrers: Links from domains that show only ads or "coming soon" pages
Healthy benchmark: In a clean domain, at least 70% of referring domains have DA/DR over 10, topically relevant content, and fewer than 100 outbound links per page.
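The healthy benchmark can be checked mechanically once the sampled referring domains are written down. This is a sketch under stated assumptions: the `Referrer` record and its field names are hypothetical, and the values are ones you collect manually or export from a backlink tool.

```python
from dataclasses import dataclass


@dataclass
class Referrer:
    """One referring domain from the sample (hypothetical structure)."""
    domain: str
    authority: int                 # DA or DR, whichever tool you use
    topically_relevant: bool       # your own judgment after visiting the site
    outbound_links_per_page: int


def is_healthy(ref: Referrer) -> bool:
    """Apply the three benchmark conditions to a single referring domain."""
    return (
        ref.authority > 10
        and ref.topically_relevant
        and ref.outbound_links_per_page < 100
    )


def healthy_share(referrers: list[Referrer]) -> float:
    """Fraction of referring domains that meet all three conditions."""
    if not referrers:
        return 0.0
    return sum(is_healthy(r) for r in referrers) / len(referrers)


def meets_benchmark(referrers: list[Referrer]) -> bool:
    """True when at least 70% of referring domains are healthy."""
    return healthy_share(referrers) >= 0.70
```

A sample of 10 to 15 domains is small, so treat a result close to the 70% line as a prompt to sample more referrers, not as a verdict.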
How to Spot a Private Blog Network (PBN)
PBNs are networks of sites built solely to manipulate rankings. They leave footprints:
- Same hosting IP ranges for multiple referring domains
- Same Google Analytics/AdSense IDs across multiple sites
- Same WHOIS privacy service for all referring domains
- Thin content (300–500 words) with 2–3 outbound links per post
- No social media presence or external mentions
- Diverse topics on the same site (tech blog with sudden finance posts linking to unrelated sites)
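The shared-hosting footprint is the easiest one to check in bulk. The sketch below groups referring domains by IPv4 /24 subnet using Python's standard `ipaddress` module. The input mapping of domain to IP address is assumed to come from your own lookups, and the `min_domains` cutoff of 3 is an arbitrary illustrative value, not an established threshold.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(domain_ips: dict[str, str], min_domains: int = 3) -> dict[str, list[str]]:
    """Group referring domains by IPv4 /24 subnet and keep crowded subnets.

    domain_ips maps each referring domain to its IPv4 address.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for domain, ip in domain_ips.items():
        # strict=False lets us pass a host address and get its enclosing network
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        groups[str(subnet)].append(domain)
    return {
        subnet: sorted(domains)
        for subnet, domains in groups.items()
        if len(domains) >= min_domains
    }
```

Shared hosting and CDNs put many unrelated sites on the same range, so a crowded subnet is a reason to look for the other footprints, not proof of a PBN on its own.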
Layer 4: Doorway Page Detection
Doorway pages are low-quality pages created solely to rank for specific keywords and funnel traffic to a target site. Google explicitly penalizes this practice.
What Doorway Pages Look Like
- City-based duplicates: "Plumber in [City]" pages with identical content for 50+ cities
- Keyword-stuffed templates: Pages that repeat the same structure with only keyword variations
- Auto-generated content: Scraped content or spun articles with no original value
- Redirect chains: Pages that immediately redirect to another domain
- Thin affiliate pages: Product listings pulled from Amazon/eBay with no added content
How to Check for Doorway Pages
- Use Archive.org to view historical snapshots
- Look for URL patterns like /city-name-keyword/ or /state-keyword/
- Check if multiple pages have nearly identical content
- Look for mass-generated subdomains (city1.example.com, city2.example.com)
Instant disqualifier: If a domain has 100+ city/state-based pages with templated content, it was built as a doorway site. Skip it, regardless of metrics.
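Given a list of URLs collected from the domain's archived snapshots, the templated city-page pattern can be surfaced with a crude grouping heuristic. This sketch assumes slugs of the form /city-name-keyword/ where the keyword is the final hyphen-separated token; the function names are illustrative, and the default threshold of 100 mirrors the disqualifier above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def slug_groups(urls: list[str]) -> dict[str, set[str]]:
    """Group single-segment slugs by their trailing keyword.

    "/new-york-plumber/" is stored under "plumber" with prefix "new-york".
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if len(segments) != 1:
            continue  # only top-level slugs are considered in this sketch
        prefix, sep, keyword = segments[0].rpartition("-")
        if sep and prefix:
            groups[keyword].add(prefix)
    return groups


def looks_like_doorway_site(urls: list[str], threshold: int = 100) -> bool:
    """True when one keyword is combined with `threshold` or more distinct prefixes."""
    return any(len(prefixes) >= threshold for prefixes in slug_groups(urls).values())
```

The heuristic only looks at URL shape. Confirming that the pages share templated content still requires opening a few of them in Archive.org.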
Layer 5: Cloaking History Detection
Cloaking is when a site shows different content to Google than to users. It's a severe violation and often results in permanent penalties.
How to Detect Cloaking
- Compare Archive.org snapshots to Google Cache (use cache:example.com where it is still available; Google retired the operator in 2024)
- Look for discrepancies: if Archive shows a legitimate blog but Google Cache shows pharma spam, the site was cloaked
- Check for JavaScript redirects that trigger based on user agent (Googlebot vs. regular users)
- Look for historical use of iframe injections (Archive snapshots show iframes loading spam content)
Cloaking Red Flags
- Google Cache shows completely different content than Archive.org
- Historical snapshots show hidden divs or iframes loading external content
- User-agent-based redirects in JavaScript or .htaccess (visible in source code snapshots)
- Reports of "hacked" periods where content suddenly changed
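When you have two text captures of the same page (for instance, the Archive.org version and the version a search engine saw), the comparison can be quantified instead of eyeballed. A minimal sketch using word-level matching from Python's standard `difflib`; the 0.5 threshold is an arbitrary assumption to tune, and both inputs are plain text you have already extracted.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def content_similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity between two captures, from 0.0 to 1.0."""
    matcher = SequenceMatcher(None, words(text_a), words(text_b), autojunk=False)
    return matcher.ratio()


def possible_cloaking(archive_text: str, crawler_text: str, threshold: float = 0.5) -> bool:
    """Flag the page when the two captures share little of their wording."""
    return content_similarity(archive_text, crawler_text) < threshold
```

Legitimate pages also change between captures, so compare snapshots taken close together in time and treat a low score as a cue for manual inspection.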
Specific Spam Patterns to Recognize
Pharma Spam
One of the most common spam types on expired domains. Characteristics:
- Content about Viagra, Cialis, weight loss pills, ED medication
- URLs like /cheap-viagra/ or /buy-cialis-online/
- Anchor text like "buy meds online," "cheap pharmacy," "order pills"
- Often injected via hacks (sudden appearance of pharma content on unrelated sites)
Casino/Gambling Spam
Another high-risk category. Indicators:
- Content about poker, slots, sports betting, online casinos
- URLs like /poker-online/ or /best-casino-bonus/
- Anchor text like "play poker," "casino bonus," "bet online"
- Often falls under Google's YMYL (Your Money or Your Life) category, which is held to stricter quality standards
Adult Content Spam
Permanent reputation damage. Signs:
- Pornographic images or dating site redirects
- URLs like /hot-girls/ or /adult-dating/
- Anchor text with explicit terms
- Often flagged by Safe Search filters
Zero tolerance: If you find any pharma, casino, or adult content in the domain's history, skip it immediately. No exceptions.
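A keyword scan over archived URLs and anchor texts can catch the three categories quickly. The sketch below is illustrative only: the term lists are a small sample drawn from the examples above, not a complete vocabulary, and whole-word matching is used so that "specialist" does not trigger on "cialis".

```python
import re

# Small illustrative term lists; extend them for real use.
SPAM_PATTERNS = {
    "pharma": re.compile(r"\b(viagra|cialis|pharmacy|pills|meds)\b", re.IGNORECASE),
    "casino": re.compile(r"\b(poker|casino|slots|bet|betting)\b", re.IGNORECASE),
    "adult": re.compile(r"\b(porn|adult[- ]dating|hot[- ]girls)\b", re.IGNORECASE),
}


def spam_categories(strings: list[str]) -> set[str]:
    """Return the spam categories matched by any URL or anchor text."""
    found: set[str] = set()
    for text in strings:
        for category, pattern in SPAM_PATTERNS.items():
            if pattern.search(text):
                found.add(category)
    return found
```

Under the zero-tolerance rule, any non-empty result is enough to stop and inspect the matching snapshots by hand.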
Tools for Spam Detection
Use these tools to systematically check for spam:
- Moz Link Explorer / MozBar: Spam Score, spam flags ($99/month or free extension)
- Ahrefs Site Explorer: Backlink analysis, referring domain quality ($99/month)
- Majestic Site Explorer: Trust Flow / Citation Flow ratio, topical trust flow ($49.99/month)
- Archive.org (Wayback Machine): Historical content review (free)
- Google Search: SERP checks, site: operator, cache: operator (free)
- Google Safe Browsing Checker: transparencyreport.google.com/safe-browsing (free)
- Screaming Frog SEO Spider: Crawl historical snapshots for hidden content (free for 500 URLs)
Pass/Fail Thresholds Summary
Use these thresholds to make final decisions:
Automatic Pass
- Moz Spam Score: 0–10%
- Ranks #1 for brand name
- Indexed pages in Google
- Clean Archive.org history
- Topically relevant referring domains
- No CJK/Cyrillic anchors
Manual Review Required
- Moz Spam Score: 11–30%
- Some indexed pages but not #1 for brand
- Short spam period (under 3 months)
- Mixed anchor text (some CJK but under 10%)
- Link farms but under 20% of referrers
High Risk (301 Only)
- Moz Spam Score: 31–60%
- No brand ranking but some indexed pages
- Parked periods over 1 year
- 10–25% CJK/Cyrillic anchors
- Link farms over 20% of referrers
Automatic Fail
- Moz Spam Score: 61–100%
- Pharma/casino/adult history
- Doorway pages (100+ city pages)
- Cloaking evidence
- Zero indexed pages + high historical page count
- Over 25% CJK/Cyrillic anchors
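The anchor-text thresholds in this summary can be computed from an exported anchor list. A sketch under stated assumptions: the code point ranges below are a partial, illustrative set of CJK and Cyrillic blocks, and treating every share from 10% up to 25% as high risk is this sketch's own choice.

```python
# Code point ranges treated as CJK or Cyrillic (an illustrative, incomplete set).
SCRIPT_RANGES = [
    (0x0400, 0x04FF),  # Cyrillic
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
]


def has_cjk_or_cyrillic(text: str) -> bool:
    """True if any character falls inside one of the ranges above."""
    return any(low <= ord(ch) <= high for ch in text for low, high in SCRIPT_RANGES)


def flagged_anchor_share(anchors: list[str]) -> float:
    """Fraction of anchors containing at least one CJK or Cyrillic character."""
    if not anchors:
        return 0.0
    return sum(has_cjk_or_cyrillic(a) for a in anchors) / len(anchors)


def anchor_tier(anchors: list[str]) -> str:
    """Map the flagged share to the decision tiers in the summary."""
    share = flagged_anchor_share(anchors)
    if share == 0:
        return "automatic pass"
    if share < 0.10:
        return "manual review"
    if share <= 0.25:
        return "high risk (301 only)"
    return "automatic fail"
```

This applies only to domains whose own niche is not in one of these languages. For a Russian or Japanese site, such anchors are expected and the check should be skipped.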
Common Mistakes in Spam Checking
- Trusting metrics over history: A DA 50 domain with pharma spam is worthless
- Skipping Archive.org: Never rely solely on algorithmic spam scores
- Ignoring Safe Search filtering: If Google filters the domain, users with Safe Search enabled will never see it
- Rationalizing red flags: "It's only 6 months of spam" is still a deal-breaker
- Not checking link neighborhoods: Link farm referrers indicate the domain was part of a scheme
Next Steps
Now that you can identify spam, learn the other components of domain vetting:
- Backlink Analysis — Understand DA, DR, TF, CF metrics
- History & Wayback Machine — Step-by-step Archive.org tutorial
- The Vetting Blueprint — Complete checklist integrating all vetting layers