PRIVATE BETA · Redirect + Canonical Intelligence · PageRank Analysis

The AI-native crawler that sees what search bots see

AI Spider audits your site the way AI models crawl it - extractability, citability, JS dependency, chunk quality, and retrieval readiness. Not just rankings. Actual AI discoverability.

Screaming Frog parity · AI Retrievability Scoring · Real-time SSE updates · No LLM API key required · Private Beta
COMPLETE - 94 URLs · 36s · Avg Score 83
Address                   Status  AI Score  Words  Response  Issues  Depth
linkgroup.co              200     90        1,244  41ms      5       0
/services                 200     84        1,062  44ms      7       1
/expertise/cloud-devops   200     90        1,537  2.1s      9       2
/contact                  200     85        756    802ms     8       1
+90 more URLs…
363 URLs discovered in a single audit (HTML + resources)
69 Data columns exported to CSV
22+ Issue types detected automatically
6 AI scoring dimensions per page

Not just a crawler. A retrieval auditor.

AI Spider goes beyond status codes and titles. It evaluates every page the way an AI language model would encounter it - and scores it accordingly.

🤖

AI Retrievability Score

Six-dimensional scoring per page: extractability, citability, structure, AI crawlability, chunk quality, and link authority. Aggregated into a single 0-100 score.
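As a rough illustration of how six per-page dimensions might roll up into one 0-100 number, here is a minimal sketch. The dimension names come from the description above; the equal weights, the clamping, and the function name `aggregateScore` are assumptions, not AI Spider's actual formula.

```typescript
// Dimension names follow the feature description; everything else is illustrative.
type Dimension =
  | "extractability"
  | "citability"
  | "structure"
  | "aiCrawlability"
  | "chunkQuality"
  | "linkAuthority";

type DimensionScores = Record<Dimension, number>; // each expected in 0-100

// Assumption: equal weights. A real scorer may weight dimensions differently.
const EQUAL_WEIGHTS: DimensionScores = {
  extractability: 1,
  citability: 1,
  structure: 1,
  aiCrawlability: 1,
  chunkQuality: 1,
  linkAuthority: 1,
};

// Weighted mean of the six dimensions, clamped to 0-100 and rounded.
function aggregateScore(
  scores: DimensionScores,
  weights: DimensionScores = EQUAL_WEIGHTS
): number {
  const dims = Object.keys(weights) as Dimension[];
  let weighted = 0;
  let total = 0;
  for (const d of dims) {
    const clamped = Math.min(100, Math.max(0, scores[d]));
    weighted += clamped * weights[d];
    total += weights[d];
  }
  return total === 0 ? 0 : Math.round(weighted / total);
}
```

Passing a different `weights` object is the obvious knob if, say, extractability should count more than link authority.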

🕷

Full Resource Discovery

Crawls HTML pages AND discovers images, JavaScript, CSS, PDFs, and fonts via HEAD requests. Full site inventory like Screaming Frog - not just HTML.
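One way a crawler could sort discovered resources into the categories above is by the Content-Type header returned from a HEAD request. The sketch below only shows that classification step; the function name, the category labels, and the exact MIME list are assumptions.

```typescript
type ResourceKind = "html" | "image" | "javascript" | "css" | "pdf" | "font" | "other";

// Maps a Content-Type header value (possibly with parameters such as
// "; charset=utf-8") to a coarse resource category.
function classifyContentType(header: string | undefined): ResourceKind {
  const mime = (header ?? "").toLowerCase().replace(/;.*$/, "").trim();
  if (mime === "text/html" || mime === "application/xhtml+xml") return "html";
  if (mime.startsWith("image/")) return "image";
  if (mime === "text/javascript" || mime === "application/javascript") return "javascript";
  if (mime === "text/css") return "css";
  if (mime === "application/pdf") return "pdf";
  if (mime.startsWith("font/") || mime === "application/font-woff") return "font";
  return "other";
}
```

A HEAD request returns headers without a body, which is why it suits inventorying large assets such as PDFs and fonts.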

📡

Real-Time SSE Updates

Watch pages appear in the URL table live during crawl. Scores, signals, and issue counts update the moment each page is processed.

Server-Sent Events
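For readers unfamiliar with Server-Sent Events, each update is a small text frame on a long-lived HTTP response. The sketch below formats one such frame; the event name `page` and the payload fields are hypothetical, not AI Spider's actual event schema.

```typescript
// Builds one frame in the text/event-stream wire format:
// an "event:" line, a "data:" line, then a blank line that ends the frame.
// JSON.stringify without an indent argument emits no raw newlines,
// so a single "data:" line is enough.
function formatSseEvent(event: string, payload: unknown): string {
  const json = JSON.stringify(payload);
  return `event: ${event}\ndata: ${json}\n\n`;
}

// Hypothetical per-page update, as it might be pushed while a crawl runs.
const exampleFrame = formatSseEvent("page", { url: "/contact", score: 85 });
```

On the server, a frame like this would be written to a response whose Content-Type is `text/event-stream`; in the browser, an `EventSource` listener for the same event name would receive it.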
🔗

Redirect Intelligence

Detects redirect chains (3+ hops), redirect loops, mixed 301/302 chains, and redirects pointing to noindex pages. Full chain visualisation.
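The checks listed above can be sketched as a walk over a map of recorded redirects. This is a minimal illustration under assumed types: `RedirectHop`, `ChainReport`, and `analyseRedirects` are hypothetical names, and only the 3+ hop threshold is taken from the text.

```typescript
interface RedirectHop {
  status: number;   // e.g. 301 or 302
  location: string; // URL the redirect points to
}

interface ChainReport {
  hops: string[];     // URLs visited, starting URL first
  statuses: number[]; // one status code per redirect hop
  isLoop: boolean;
  isLong: boolean;    // 3+ hops
  isMixed: boolean;   // both 301 and 302 appear in the chain
}

// Follows redirects from `start` until a non-redirecting URL, a loop,
// or the safety cap `maxHops` is reached.
function analyseRedirects(
  start: string,
  redirects: Map<string, RedirectHop>,
  maxHops = 20
): ChainReport {
  const hops = [start];
  const statuses: number[] = [];
  const seen = new Set<string>([start]);
  let current = start;
  let isLoop = false;
  while (statuses.length < maxHops) {
    const hop = redirects.get(current);
    if (!hop) break; // final destination reached
    statuses.push(hop.status);
    hops.push(hop.location);
    if (seen.has(hop.location)) {
      isLoop = true; // we have been here before
      break;
    }
    seen.add(hop.location);
    current = hop.location;
  }
  return {
    hops,
    statuses,
    isLoop,
    isLong: statuses.length >= 3,
    isMixed: statuses.indexOf(301) !== -1 && statuses.indexOf(302) !== -1,
  };
}
```

Checking whether the final URL is noindex would be a lookup against crawled page data once the walk ends.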

📎

Canonical Intelligence

Detects canonical chains, canonical loops, and canonicals that point to 404, noindex, or redirected pages. Goes beyond "has a canonical" to verify that the canonical target is actually valid.
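Validating a canonical means following it to its destination and checking what is found there. A minimal sketch, assuming the crawl has already recorded each page's status, noindex flag, and declared canonical; the `PageInfo` shape and the issue labels are hypothetical.

```typescript
interface PageInfo {
  status: number;     // HTTP status the crawler saw for this URL
  noindex: boolean;
  canonical?: string; // canonical URL declared by the page, if any
}

type CanonicalIssue =
  | "ok"
  | "missing_target"  // canonical points to a URL that was never crawled
  | "target_not_200"  // 404, redirect, or any other non-200 status
  | "target_noindex"
  | "chain"           // A -> B -> C
  | "loop";           // A -> B -> A

function checkCanonical(url: string, pages: Map<string, PageInfo>): CanonicalIssue {
  const page = pages.get(url);
  // No canonical, or a self-referencing one, needs no further checks.
  if (!page || !page.canonical || page.canonical === url) return "ok";
  const seen = new Set<string>([url]);
  let target = page.canonical;
  let hops = 1;
  for (;;) {
    if (seen.has(target)) return "loop";
    seen.add(target);
    const info = pages.get(target);
    if (!info) return "missing_target";
    if (info.status !== 200) return "target_not_200";
    if (info.noindex) return "target_noindex";
    if (!info.canonical || info.canonical === target) {
      return hops > 1 ? "chain" : "ok";
    }
    target = info.canonical;
    hops++;
  }
}
```

The `seen` set guarantees termination: every pass either returns or adds a new URL from a finite crawl.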

📊

Architecture Reporting

Directory-level aggregations: avg AI score per folder, avg crawl depth, issue rate, page count. Understand your site structure at a glance.
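A directory-level rollup of this kind is essentially a group-by on the first path segment. The sketch below assumes a simple page record and treats "issue rate" as the share of pages with at least one issue; both the record shape and that definition are assumptions.

```typescript
interface CrawledPage {
  path: string;   // e.g. "/expertise/cloud-devops"
  aiScore: number;
  depth: number;
  issues: number; // issue count for the page
}

interface FolderStats {
  folder: string;
  pages: number;
  avgScore: number;
  avgDepth: number;
  issueRate: number; // assumption: fraction of pages with one or more issues
}

// "/expertise/cloud-devops" -> "/expertise/"; single-segment paths group under "/".
function folderOf(path: string): string {
  const segments = path.split("/").filter((s) => s.length > 0);
  return segments.length > 1 ? `/${segments[0]}/` : "/";
}

function aggregateByFolder(pages: CrawledPage[]): FolderStats[] {
  const groups = new Map<string, CrawledPage[]>();
  for (const p of pages) {
    const key = folderOf(p.path);
    const bucket = groups.get(key);
    if (bucket) bucket.push(p);
    else groups.set(key, [p]);
  }
  const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;
  return Array.from(groups.entries()).map(([folder, group]) => ({
    folder,
    pages: group.length,
    avgScore: mean(group.map((p) => p.aiScore)),
    avgDepth: mean(group.map((p) => p.depth)),
    issueRate: group.filter((p) => p.issues > 0).length / group.length,
  }));
}
```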

🧩

PageRank + Link Intelligence

Internal PageRank via power iteration. Orphan page detection, buried authority pages, dead-end pages, excessive outlink flagging.
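Power iteration repeatedly redistributes each page's rank across its outlinks until the values settle. The following is a textbook-style sketch rather than AI Spider's implementation; the damping factor of 0.85, the fixed 50 iterations, and the even spreading of rank from dead-end pages are assumptions.

```typescript
// `links` maps each page to the internal pages it links to.
function pageRank(
  links: Map<string, string[]>,
  damping = 0.85,
  iterations = 50
): Map<string, number> {
  // Collect every node, including pages that only appear as link targets.
  const nodes = new Set<string>();
  links.forEach((targets, source) => {
    nodes.add(source);
    targets.forEach((t) => nodes.add(t));
  });
  const ids = Array.from(nodes);
  const n = ids.length;

  let rank = new Map<string, number>();
  ids.forEach((id) => rank.set(id, 1 / n));

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    ids.forEach((id) => next.set(id, (1 - damping) / n));
    let danglingMass = 0;
    for (const id of ids) {
      const out = links.get(id) ?? [];
      const r = rank.get(id) ?? 0;
      if (out.length === 0) {
        danglingMass += r; // dead-end page: no outlinks to pass rank along
        continue;
      }
      const share = (damping * r) / out.length;
      for (const t of out) next.set(t, (next.get(t) ?? 0) + share);
    }
    // Spread rank held by dead-end pages evenly so the total stays at 1.
    const spread = (damping * danglingMass) / n;
    ids.forEach((id) => next.set(id, (next.get(id) ?? 0) + spread));
    rank = next;
  }
  return rank;
}
```

In this model an orphan page (zero internal inlinks) receives only the baseline `(1 - damping) / n`, and a "buried authority page" would be one with high rank but large crawl depth.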

🧪

Semantic Similarity

TF-IDF cosine similarity across all pages. Detects content cannibalization (>70% overlap) and topic overlap (50-70%). Two separate signals from two engines.
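To make the two thresholds concrete, here is a compact TF-IDF and cosine similarity sketch. The tokenizer, the smoothed IDF variant, and the function names are assumptions; only the >70% and 50-70% bands come from the description above.

```typescript
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns one sparse TF-IDF vector (term -> weight) per document.
function tfidfVectors(docs: string[]): Map<string, number>[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>(); // document frequency per term
  for (const tokens of tokenized) {
    for (const term of Array.from(new Set(tokens))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  return tokenized.map((tokens) => {
    const vec = new Map<string, number>();
    for (const term of tokens) vec.set(term, (vec.get(term) ?? 0) + 1); // raw term count
    vec.forEach((tf, term) => {
      // Smoothed IDF, one common variant among several.
      const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
      vec.set(term, tf * idf);
    });
    return vec;
  });
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Bands taken from the text: above 0.7 is cannibalization, 0.5-0.7 is topic overlap.
function classifyOverlap(similarity: number): "cannibalization" | "topic_overlap" | "distinct" {
  if (similarity > 0.7) return "cannibalization";
  if (similarity >= 0.5) return "topic_overlap";
  return "distinct";
}
```

Comparing every pair of pages is quadratic in page count, which is manageable at the scale of the 94-page audit shown here.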

💡

Actionable Recommendations

Site-wide and per-page recommendations backed by real data. Orphan pages, buried content, readability, missing citability signals - all prioritised by impact.


Everything runs automatically after the crawl

No manual triggers. No separate tools. The moment a crawl completes, AI Spider runs 8 post-crawl analysis engines in sequence.

  • Redirect Chain + Loop Analysis

    Walks every redirect chain, flags loops and long chains with full hop visualisation

  • 📎 Canonical Intelligence

    Validates every canonical tag destination - not just presence

  • 🔁 Near-Duplicate Detection

    MinHash Jaccard similarity across all page text. Exact + near-exact duplicates

  • 📈 Internal PageRank

    Power-iteration PageRank on the internal link graph. Scores persisted to pages table

  • 🧠 Semantic Similarity

    TF-IDF cosine similarity. Finds content cannibalization and topic overlap

  • 🌐 Hreflang Validation

    BCP-47 validation, x-default presence, reciprocal link checking
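The three hreflang checks named in the last item can be sketched as follows. The regular expression is a simplified shape check (language, optional script, optional region), not a full BCP-47 parser and not a registry lookup, and the `HreflangTag` shape and issue messages are hypothetical.

```typescript
interface HreflangTag {
  lang: string; // e.g. "en", "en-GB", "zh-Hant-TW", "x-default"
  href: string;
}

// Simplified pattern: it accepts well-formed shapes such as "en-GB" but cannot
// tell whether a subtag is actually registered.
const HREFLANG_SHAPE = /^(x-default|[a-z]{2,3}(-[a-z]{4})?(-([a-z]{2}|[0-9]{3}))?)$/i;

// `site` maps each page URL to the hreflang annotations found on that page.
function validateHreflang(url: string, site: Map<string, HreflangTag[]>): string[] {
  const issues: string[] = [];
  const tags = site.get(url) ?? [];
  if (tags.length === 0) return issues;

  for (const tag of tags) {
    if (!HREFLANG_SHAPE.test(tag.lang)) issues.push(`invalid code: ${tag.lang}`);
  }
  if (!tags.some((t) => t.lang.toLowerCase() === "x-default")) {
    issues.push("missing x-default");
  }
  // Reciprocity: every alternate must link back to this page.
  const targets = Array.from(new Set(tags.map((t) => t.href)));
  for (const href of targets) {
    if (href === url) continue;
    const back = site.get(href) ?? [];
    if (!back.some((t) => t.href === url)) issues.push(`no return link from ${href}`);
  }
  return issues;
}
```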

Post-Crawl Summary - linkgroup.co Complete
Pages Crawled: 94
Resource URLs: 269
Avg AI Score: 83
Redirect Chains: 3
Redirect Loops: 0
Canonical Issues: 7
Orphan Pages: 4
Near Duplicates: 2
Semantic Overlaps: 6 pairs
Broken Links: 0
AI Bots Blocked: 0
JS-Dependent Pages: 4
Total Issues: 105

22+ issue types across 7 categories

Every issue is categorised by severity and linked to the affected page. Issues feed the recommendations engine directly.

GPTBot / ClaudeBot / PerplexityBot blocked
JS-dependent content (AI can't read)
Redirect loop detected
Canonical points to 404 page
Missing H1 tag
Very thin content (<100 words)
Long redirect chain (3+ hops)
Canonical chain (A→B→C)
High boilerplate ratio (>85%)
No author signal detected
Missing meta description
Exact duplicate content
Orphan page - 0 internal inlinks
Content difficult to read (Flesch <40)
llms.txt not present
Buried authority page (deep + high PR)
URL contains uppercase characters
Near-duplicate content detected
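Several of the issue types above reduce to simple threshold rules over per-page signals. A minimal sketch, using the thresholds quoted in the list (<100 words, >85% boilerplate, 0 inlinks); the `PageSignals` shape, the issue codes, and the severity assigned to each rule are assumptions.

```typescript
type Severity = "critical" | "warning" | "info";

interface Issue {
  code: string;
  severity: Severity;
  path: string;
}

interface PageSignals {
  path: string;             // URL path, e.g. "/services"
  wordCount: number;
  boilerplateRatio: number; // 0-1
  hasH1: boolean;
  hasMetaDescription: boolean;
  inlinks: number;          // internal inlink count
}

// Severities below are illustrative, not AI Spider's actual mapping.
function detectIssues(p: PageSignals): Issue[] {
  const issues: Issue[] = [];
  const add = (code: string, severity: Severity): void => {
    issues.push({ code, severity, path: p.path });
  };
  if (!p.hasH1) add("missing_h1", "warning");
  if (p.wordCount < 100) add("very_thin_content", "warning");
  if (p.boilerplateRatio > 0.85) add("high_boilerplate", "warning");
  if (!p.hasMetaDescription) add("missing_meta_description", "info");
  if (p.inlinks === 0) add("orphan_page", "critical");
  if (/[A-Z]/.test(p.path)) add("uppercase_url", "info");
  return issues;
}
```

Keeping each rule as a single line makes it easy to see which threshold produced which issue when the list feeds a recommendations step.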

Production-grade. Local-first.

Built on a modern full-stack architecture. Runs entirely on your machine - no cloud, no data leaving your infrastructure.

🖥 Frontend

  • Vite 7 + React 19 + TypeScript 5.9
  • React Router v7 (file-based routing)
  • Tailwind CSS v4 + CSS variables
  • react-window virtualised URL table
  • Recharts for visualisations

⚙️ Backend

  • Node.js + Express REST API
  • SQLite via better-sqlite3 (session DB)
  • Raw SQL migrations (28 migrations)
  • Server-Sent Events for live updates
  • Axios with retry + redirect capture

🧠 Processing - Zero LLM Dependency

  • No API key required - fully algorithmic
  • MinHash Jaccard (duplicate detection)
  • TF-IDF cosine similarity (semantic)
  • Playwright for JS-rendered pages
  • Cheerio for HTML parsing
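To show what "MinHash Jaccard" means without any external service, here is a small self-contained sketch. The 3-word shingles, the 128-value signature, and the seeded FNV-1a hash family are all assumptions chosen for brevity; a production implementation would likely use a stronger hash family.

```typescript
// Splits text into overlapping k-word shingles.
function wordShingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const out = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// 32-bit FNV-1a with the seed mixed into the offset basis,
// giving a cheap family of hash functions (one per seed).
function seededFnv1a(input: string, seed: number): number {
  let h = (0x811c9dc5 ^ Math.imul(seed + 1, 0x9e3779b1)) >>> 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Signature = minimum hash value per hash function across all shingles.
function minhashSignature(items: Set<string>, numHashes = 128): number[] {
  const sig = new Array<number>(numHashes).fill(0xffffffff);
  items.forEach((item) => {
    for (let i = 0; i < numHashes; i++) {
      const h = seededFnv1a(item, i);
      sig[i] = Math.min(sig[i] ?? h, h);
    }
  });
  return sig;
}

// The fraction of matching signature positions estimates the Jaccard similarity.
function estimateJaccard(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) return 0;
  let same = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) same++;
  }
  return same / a.length;
}
```

Signatures have a fixed size regardless of page length, so comparing two pages costs the same whether they hold 100 words or 10,000.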

📦 Data + Export

  • 69-column CSV export
  • JSON + Markdown + llms.txt export
  • Screenshot capture + storage
  • Session-based DB per crawl
  • Sitemap generation

AI Spider vs Screaming Frog

Screaming Frog is the gold standard for technical SEO crawling. AI Spider matches it on the fundamentals and goes further on AI-specific signals.

Feature                                    Screaming Frog     AI Spider
HTML page crawling
Resource URL discovery (images, JS, CSS)
AI Retrievability Scoring
GPTBot / ClaudeBot access analysis
Content chunking for AI retrieval
llms.txt detection
Semantic similarity (TF-IDF)
Site-wide recommendations engine
Redirect chain detection
Canonical intelligence
Internal PageRank
Hreflang validation
Structured data extraction
Real-time crawl updates                                       ✓ (SSE live)
MCP server integration                     ✓ (v24)            Planned
Runs locally (no cloud)                    ✓ (desktop app)    ✓ (local server)
Pricing                                    €245/year          Free during beta

Start auditing AI retrievability today

View on GitHub →