PRIVATE BETA · Full Intelligence Platform · 17 Analysis Views

The AI-native crawler that sees what search bots see

AI Spider audits your site the way AI models crawl it, scoring extractability, citability, JS dependency, chunk quality, and retrieval readiness. It also provides redirect intelligence, canonical analysis, PageRank, near-duplicate detection, and on-demand AI analysis for N-Grams, Citation Readiness, Section Strength, and Authority. Not just rankings. Actual AI discoverability.


Screaming Frog parity · AI Retrievability Scoring · Page Intelligence Panel · Architecture Intelligence · No LLM API key required

COMPLETE - 94 URLs · 36s · Avg Score 83

Address	Status	AI Score	Words	Response	Issues	Depth
akshaydahiya.site	200	90	1,244	41ms	5	0
/services	200	84	1,062	44ms	7	1
/expertise/cloud-devops	200	90	1,537	2.1s	9	2
/contact	200	85	756	802ms	8	1

+90 more URLs…

Core Capabilities

Not just a crawler. A full intelligence platform.

AI Spider goes beyond status codes and titles. It evaluates every page the way an AI language model would encounter it - and scores it accordingly.

AI Retrievability Score

Six-dimensional scoring per page: extractability, citability, structure, AI crawlability, chunk quality, and link authority. Aggregated into a single 0-100 score.
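As a rough illustration of how six per-page dimensions could collapse into one 0-100 number, here is a minimal sketch using a weighted mean. The dimension keys mirror the list above, but the equal weights, the clamping, and the function name `aggregateScore` are assumptions for illustration, not AI Spider's documented formula.

```typescript
// Dimension names follow the description above; each value is assumed to be 0-100.
type Dimension =
  | "extractability"
  | "citability"
  | "structure"
  | "aiCrawlability"
  | "chunkQuality"
  | "linkAuthority";

type DimensionScores = Record<Dimension, number>;

// Assumption: equal weights. The real weighting is not documented here.
const EQUAL_WEIGHTS: DimensionScores = {
  extractability: 1,
  citability: 1,
  structure: 1,
  aiCrawlability: 1,
  chunkQuality: 1,
  linkAuthority: 1,
};

// Weighted mean of the clamped dimension scores, rounded to an integer.
function aggregateScore(
  scores: DimensionScores,
  weights: DimensionScores = EQUAL_WEIGHTS,
): number {
  const dims = Object.keys(weights) as Dimension[];
  let weighted = 0;
  let totalWeight = 0;
  for (const d of dims) {
    const clamped = Math.min(100, Math.max(0, scores[d]));
    weighted += clamped * weights[d];
    totalWeight += weights[d];
  }
  return totalWeight === 0 ? 0 : Math.round(weighted / totalWeight);
}
```

Raising the weight of one dimension (for example `aiCrawlability`) would make a blocked page drag the total down faster; that is a tuning choice rather than something the page states.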

Full Resource Discovery

Crawls HTML pages AND discovers images, JavaScript, CSS, PDFs, and fonts via HEAD requests. Full site inventory like Screaming Frog - not just HTML.

Real-Time SSE Updates

Watch pages appear in the URL table live during crawl. Scores, signals, and issue counts update the moment each page is processed.

Server-Sent Events
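For readers unfamiliar with Server-Sent Events, the sketch below shows what one live update could look like on the wire. The `event:` / `data:` framing is the standard SSE text format; the event name `page` and the payload fields are hypothetical, chosen to match the columns of the URL table.

```typescript
// Hypothetical payload for one processed page (field names are assumptions).
interface PageProcessedEvent {
  url: string;
  status: number;
  aiScore: number;
  issues: number;
}

// Serialises one SSE frame. JSON.stringify without indentation never emits
// raw newlines, so a single "data:" line is enough; the blank line ends the frame.
function formatSseEvent(eventName: string, payload: unknown): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const update: PageProcessedEvent = {
  url: "/services",
  status: 200,
  aiScore: 84,
  issues: 7,
};
const frame = formatSseEvent("page", update);
```

On the client, a browser `EventSource` subscribed to the stream could listen for the `page` event and append a row to the table as each frame arrives.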

Redirect Intelligence

Detects redirect chains (3+ hops), redirect loops, mixed 301/302 chains, and redirects pointing to noindex pages. Full chain visualisation.
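The chain checks described above can be sketched as a walk over an already-captured redirect map. Everything named here (`RedirectMap`, `walkRedirects`, the report fields) is hypothetical; the 3-hop threshold and the mixed 301/302 check come from the description, and the noindex-target check is left out for brevity.

```typescript
type RedirectHop = { to: string; status: 301 | 302 | 307 | 308 };
type RedirectMap = Map<string, RedirectHop>; // source URL -> first hop

interface ChainReport {
  hops: string[]; // every URL visited, starting URL included
  isLoop: boolean;
  isLongChain: boolean; // 3+ hops by default
  isMixed: boolean; // both 301 and 302 appear in one chain
}

function walkRedirects(
  start: string,
  redirects: RedirectMap,
  longChainHops = 3,
): ChainReport {
  const hops: string[] = [start];
  const seen = new Set<string>([start]);
  const statuses = new Set<number>();
  let current = start;
  let isLoop = false;

  while (redirects.has(current)) {
    const hop = redirects.get(current)!;
    statuses.add(hop.status);
    hops.push(hop.to);
    if (seen.has(hop.to)) {
      isLoop = true; // we have been here before: stop walking
      break;
    }
    seen.add(hop.to);
    current = hop.to;
  }

  return {
    hops,
    isLoop,
    isLongChain: hops.length - 1 >= longChainHops,
    isMixed: statuses.has(301) && statuses.has(302),
  };
}
```

The `hops` array is what a chain visualisation would render, one node per entry.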

Canonical Intelligence

Detects canonical chains, canonical loops, and canonicals pointing to 404, noindex, or redirected pages. Goes beyond "has a canonical" to verify the canonical is actually valid.

Architecture Reporting

Directory-level aggregations: avg AI score per folder, avg crawl depth, issue rate, page count. Understand your site structure at a glance.
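A directory-level rollup like the one described above might be computed as follows. This is a sketch under assumptions: pages are grouped by their first path segment only, "issue rate" is taken to mean the share of pages with at least one issue, and all type and function names are hypothetical.

```typescript
interface CrawledPage {
  path: string; // e.g. "/expertise/cloud-devops"
  aiScore: number;
  depth: number;
  issues: number;
}

interface FolderStats {
  pageCount: number;
  avgAiScore: number;
  avgDepth: number;
  issueRate: number; // assumption: fraction of pages with >= 1 issue
}

// Pages directly under the root (e.g. "/services") are grouped under "/".
function topLevelFolder(path: string): string {
  const segments = path.split("/").filter((s) => s.length > 0);
  return segments.length > 1 ? `/${segments[0]}/` : "/";
}

function aggregateByFolder(pages: CrawledPage[]): Map<string, FolderStats> {
  const groups = new Map<string, CrawledPage[]>();
  for (const page of pages) {
    const folder = topLevelFolder(page.path);
    const bucket = groups.get(folder) ?? [];
    bucket.push(page);
    groups.set(folder, bucket);
  }

  const stats = new Map<string, FolderStats>();
  groups.forEach((bucket, folder) => {
    const n = bucket.length;
    const sum = (pick: (p: CrawledPage) => number) =>
      bucket.reduce((acc, p) => acc + pick(p), 0);
    stats.set(folder, {
      pageCount: n,
      avgAiScore: sum((p) => p.aiScore) / n,
      avgDepth: sum((p) => p.depth) / n,
      issueRate: bucket.filter((p) => p.issues > 0).length / n,
    });
  });
  return stats;
}
```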

PageRank + HITS Authority

Internal PageRank via power iteration plus HITS hub and authority scores. Orphan detection, buried authority pages, leakage analysis, and dead-end flagging.

HITS Algorithm
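To make "PageRank via power iteration" concrete, here is a minimal sketch over an internal link graph. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from dead-end pages are common textbook choices and are assumptions here, not confirmed AI Spider settings.

```typescript
type LinkGraph = Map<string, string[]>; // page -> internal outlinks

function pageRank(
  graph: LinkGraph,
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const nodes = Array.from(graph.keys());
  const n = nodes.length;
  let rank = new Map<string, number>();
  for (const node of nodes) rank.set(node, 1 / n);

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, (1 - damping) / n);

    let deadEndMass = 0;
    for (const node of nodes) {
      // Ignore links to URLs that are not part of the crawled graph.
      const outlinks = (graph.get(node) ?? []).filter((t) => graph.has(t));
      const r = rank.get(node) ?? 0;
      if (outlinks.length === 0) {
        deadEndMass += r; // dead-end page: no outlinks to pass rank through
        continue;
      }
      const share = (damping * r) / outlinks.length;
      for (const t of outlinks) next.set(t, (next.get(t) ?? 0) + share);
    }

    // Spread dead-end rank evenly so the scores keep summing to 1.
    for (const node of nodes) {
      next.set(node, (next.get(node) ?? 0) + (damping * deadEndMass) / n);
    }
    rank = next;
  }
  return rank;
}
```

In this model an orphan page (no inlinks) settles at the minimum score, and a "buried authority" page would be one with a high score but a large crawl depth.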

Near-Duplicate Detection

MinHash Jaccard similarity across all pages post-crawl. Exact duplicates via SHA-256 hash, near-duplicates at 80%+ content overlap. Fully algorithmic. No LLM required.

MinHash · 128 hash functions
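The near-duplicate step can be illustrated with a compact MinHash sketch. The 128-slot signature matches the figure above; the 3-word shingles, the FNV-1a base hash, and the seeded bit-mixing used to derive 128 hash functions are illustrative assumptions, and the SHA-256 exact-duplicate check is not shown.

```typescript
const NUM_HASHES = 128; // matches the "128 hash functions" noted above

// 32-bit FNV-1a string hash.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// 32-bit finaliser (MurmurHash3 style) used to derive seeded hash functions.
function mix32(x: number): number {
  let h = x | 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Assumption: overlapping 3-word shingles over lowercased text.
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  if (words.length < k) {
    if (words.length > 0) out.add(words.join(" "));
    return out;
  }
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// Keeps, for each of the 128 hash functions, the smallest hash over all shingles.
function minhashSignature(text: string): number[] {
  const sig = new Array<number>(NUM_HASHES).fill(0xffffffff);
  shingles(text).forEach((s) => {
    const base = fnv1a(s);
    for (let i = 0; i < NUM_HASHES; i++) {
      const seed = Math.imul(i + 1, 0x9e3779b1);
      const h = mix32(base ^ seed);
      if (h < sig[i]) sig[i] = h;
    }
  });
  return sig;
}

// The fraction of matching slots estimates the Jaccard similarity.
function estimateJaccard(a: number[], b: number[]): number {
  let same = 0;
  for (let i = 0; i < a.length; i++) if (a[i] === b[i]) same++;
  return same / a.length;
}
```

Under the 80% rule stated above, two pages would be flagged as near-duplicates when `estimateJaccard` returns 0.8 or more.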

Actionable Recommendations

22+ site-wide and per-page recommendations backed by real data - structured data gaps, AI crawler blocks, citation readiness, section strength, authority leakage, and more.

/pages/recommendations

Page Intelligence Panel

Click any URL in any tab to open a full-detail drawer: AI scores, bot access, content signals, inlinks/outlinks with anchor analysis, and per-page fixes.

Universal page drawer

Citation Readiness

Scores every page on how citable it is - statistics, dates, source attributions, quotes, tables, and lists. Identifies pages AI systems are likely to cite vs ignore.

Post-crawl analysis
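A deterministic scan for the six citable signals listed above could look like the sketch below. The regular expressions and the "share of signal types present" score are assumptions chosen for brevity; a production scanner would parse the DOM rather than match raw HTML.

```typescript
// Hypothetical patterns, one per signal type named in the description above.
const CITATION_SIGNALS: Record<string, RegExp> = {
  statistics: /\b\d+(?:\.\d+)?\s?%/,
  dates: /\b(?:19|20)\d{2}\b/,
  sources: /\b(?:according to|source:)/i,
  quotes: /<blockquote\b|"[^"]{20,}"/i,
  tables: /<table\b/i,
  lists: /<(?:ul|ol)\b/i,
};

interface CitationReport {
  found: string[]; // signal types detected on the page
  score: number; // 0-100: share of signal types present (assumed formula)
}

function citationReadiness(html: string): CitationReport {
  const names = Object.keys(CITATION_SIGNALS);
  const found = names.filter((name) => CITATION_SIGNALS[name].test(html));
  return { found, score: Math.round((found.length / names.length) * 100) };
}
```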

Section Strength Analysis

Scores each content section (H2 block) on word count, lists, tables, numbers, and links. Identifies weak sections that undermine AI extractability.

Per-section scoring
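Per-section scoring can be sketched by splitting a page at each H2 and checking the same features named above. The point allocation (up to 40 for word count, capped at 200 words, plus 15 each for lists, tables, numbers, and links) is an invented rubric for illustration, as are the function names.

```typescript
interface SectionScore {
  heading: string;
  words: number;
  score: number; // 0-100 under the assumed rubric
}

function stripTags(html: string): string {
  return html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
}

function scoreSections(html: string): SectionScore[] {
  // Content before the first H2 is ignored in this sketch.
  const parts = html.split(/<h2[^>]*>/i).slice(1);
  return parts.map((part) => {
    const closeIdx = part.search(/<\/h2>/i);
    const heading = stripTags(closeIdx >= 0 ? part.slice(0, closeIdx) : "");
    const body = closeIdx >= 0 ? part.slice(closeIdx + "</h2>".length) : part;
    const text = stripTags(body);
    const words = text.length === 0 ? 0 : text.split(/\s+/).length;

    let score = Math.min(40, Math.round((words / 200) * 40));
    if (/<(?:ul|ol)\b/i.test(body)) score += 15; // has a list
    if (/<table\b/i.test(body)) score += 15; // has a table
    if (/\d/.test(text)) score += 15; // contains numbers
    if (/<a\b[^>]*href/i.test(body)) score += 15; // contains links
    return { heading, words, score };
  });
}
```

Sections that land near the bottom of the range are the "weak sections" the description refers to: short, unstructured blocks with no data for a retrieval system to extract.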

N-Gram Phrase Intelligence

Extracts the most frequent 2- and 3-word phrases across all crawled content. Reveals your site's key topics and terminology as AI systems actually see them.

Post-crawl NLP
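Counting 2- and 3-word phrases is simple enough to show in full. This sketch assumes lowercase tokenisation on letters and digits, no stop-word filtering, and phrases that may span sentence boundaries; `topNgrams` is a hypothetical name.

```typescript
// Returns the most frequent n-word phrases across all texts as [phrase, count] pairs.
function topNgrams(
  texts: string[],
  n: 2 | 3,
  limit = 10,
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const text of texts) {
    const words = text.toLowerCase().match(/[a-z0-9]+(?:'[a-z]+)?/g) ?? [];
    for (let i = 0; i + n <= words.length; i++) {
      const gram = words.slice(i, i + n).join(" ");
      counts.set(gram, (counts.get(gram) ?? 0) + 1);
    }
  }
  // Sort by count descending, then alphabetically for a stable order.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}
```

Calling `topNgrams(pageTexts, 2)` and `topNgrams(pageTexts, 3)` on the selected pages would give the two phrase lists, where `pageTexts` stands for the extracted text of each page.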

Resource Explorer

Full inventory of JS, CSS, images, fonts, and media. MIME-type aware classification, broken resource detection, and found-on-page counts.

MIME-aware

Architecture Intelligence

Directory tree with avg AI score, issue rate, and depth per folder. Orphan pages, weakly-linked pages, buried authority detection, excessive outlinks, and dead-end page analysis.

4 sub-views

Post-Crawl Analysis

Everything runs automatically after the crawl

Core analysis runs automatically after every crawl. On-demand tabs (N-Grams, Citation, Section Strength, Authority) run on the URLs you select, so you stay in control of what gets analysed.

Redirect Chain + Loop Analysis
Walks every redirect chain, flags loops and long chains with full hop visualisation
Canonical Intelligence
Validates every canonical tag destination - not just presence
Near-Duplicate Detection
MinHash Jaccard similarity across all page text. Exact + near-exact duplicates
Internal PageRank + HITS
Power-iteration PageRank and HITS hub/authority scores on the internal link graph
N-Gram Phrase Intelligence (on-demand)
Extracts 2- and 3-word phrases from selected pages to reveal key topics as AI systems see them.
Citation Readiness (on-demand)
Deterministic scan for statistics, dates, sources, quotes, tables, and lists. Run on selected URLs from the Citation tab.
Authority / HITS (on-demand)
Hub and authority scores via the HITS algorithm on selected pages. Identifies buried authority, dead ends, and leaking pages.
Section Strength (on-demand)
Per-chunk scoring of word count, structure, and data richness for every H2 section. Run on selected URLs.
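The HITS step in the list above can be sketched as two alternating updates on the internal link graph: a page's authority is the sum of the hub scores linking to it, and its hub score is the sum of the authorities it links to. The L2 normalisation and the fixed iteration count are standard textbook choices and are assumptions here; the names are hypothetical.

```typescript
type Graph = Map<string, string[]>; // page -> internal outlinks

interface HitsScores {
  hub: Map<string, number>;
  authority: Map<string, number>;
}

// Scales the scores so their squares sum to 1 (no-op if all are zero).
function normalise(scores: Map<string, number>): void {
  let sumSq = 0;
  scores.forEach((v) => { sumSq += v * v; });
  const norm = Math.sqrt(sumSq);
  if (norm === 0) return;
  scores.forEach((v, k) => scores.set(k, v / norm));
}

function hits(graph: Graph, iterations = 50): HitsScores {
  const nodes = Array.from(graph.keys());
  const hub = new Map<string, number>();
  const authority = new Map<string, number>();
  for (const node of nodes) {
    hub.set(node, 1);
    authority.set(node, 1);
  }

  for (let i = 0; i < iterations; i++) {
    // Authority update: sum of hub scores of pages linking in.
    for (const node of nodes) authority.set(node, 0);
    for (const source of nodes) {
      for (const target of graph.get(source) ?? []) {
        if (!graph.has(target)) continue; // skip URLs outside the graph
        authority.set(target, (authority.get(target) ?? 0) + (hub.get(source) ?? 0));
      }
    }
    normalise(authority);

    // Hub update: sum of authority scores of pages linked to.
    for (const source of nodes) {
      let total = 0;
      for (const target of graph.get(source) ?? []) {
        if (graph.has(target)) total += authority.get(target) ?? 0;
      }
      hub.set(source, total);
    }
    normalise(hub);
  }
  return { hub, authority };
}
```

A page with a hub score of zero links to nothing of value, which is one way a dead-end page would surface in this analysis.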

Post-Crawl Summary - akshaydahiya.site Complete

Pages Crawled	94
Resource URLs	269
Avg AI Score	83
Avg Citation Score	71
Avg Section Strength	48
Avg Authority Score	62
Redirect Chains	3
Redirect Loops	0
Canonical Issues	7
Orphan Pages	4
Near Duplicates	2
AI Bots Blocked	0
JS-Dependent Pages	4
Total Issues	105

Issue Detection

22+ issue types across 11 categories

Every issue is categorised by severity and linked to the affected page. Issues feed the recommendations engine directly.

GPTBot / ClaudeBot / PerplexityBot blocked

JS-dependent content (AI can't read)

Redirect loop detected

Canonical points to 404 page

Missing H1 tag

Very thin content (<100 words)

Long redirect chain (3+ hops)

Canonical chain (A→B→C)

High boilerplate ratio (>85%)

No author signal detected

Missing meta description

Exact duplicate content

Orphan page - 0 internal inlinks

Content difficult to read (Flesch <40)

Missing structured data (Schema.org)

Images missing alt text

llms.txt not present

Buried authority page (deep + high PR)

hreflang missing return tags

Page missing from XML sitemap

URL contains uppercase characters

Near-duplicate content detected

Technical Architecture

Production-grade. Local-first.

Built on a modern full-stack architecture. Runs entirely on your machine - no cloud, no data leaving your infrastructure.

Frontend

Vite 7 + React 19 + TypeScript 5.9
React Router v7 (file-based routing)
Tailwind CSS v4 + CSS variables
react-window virtualised URL table
Recharts for visualisations

Backend

Node.js + Express REST API
SQLite via better-sqlite3 (session DB)
Raw SQL migrations (fresh per session)
Server-Sent Events for live updates
Axios with retry + redirect capture

Processing - Zero LLM Dependency

No API key required - fully algorithmic
MinHash Jaccard (duplicate detection)
HITS power iteration (authority)
Flesch-Kincaid readability scoring

Data + Export

70-column CSV export
HTML report + Markdown ZIP + llms.txt export
Session-based DB per crawl
Sitemap generation

COMING SOON

Advanced Intelligence Features

LLM-powered analysis for deep retrieval intelligence

The next frontier of AI Spider. These features go beyond deterministic signals - using large language models to understand how AI systems reason about, retrieve, and reconstruct your content.

Content Intelligence

Information Gain: Identifies the unique insights your content contributes compared to competing pages - original facts, claims, and perspectives that add value beyond commonly repeated information.
Coverage Score: Measures how comprehensively a page addresses a topic - whether important subtopics, entities, questions, and supporting information are present for a complete answer experience.
Coverage vs Confidence: Balances breadth of coverage with how trustworthy and well-supported the information appears - evidence quality, authority signals, and supporting claims.

Entity & Knowledge Analysis

Entity Expansion: Discovers related entities, concepts, brands, products, and topics connected to your content - uncovering opportunities to strengthen topical authority across a website.
Multi-Hop Relationship Analysis: Maps connections between entities, concepts, and facts across content - identifying how information links together and revealing opportunities to strengthen topical depth.
Retrieval Drift Analysis: Tracks how information evolves as retrieval systems gather and expand knowledge around a topic - identifying gaps between original query intent and information ultimately surfaced.

Query & Retrieval Intelligence

Query Expansion Intelligence: Reveals related searches, synonyms, concepts, and topic variations associated with a target query - identifying content opportunities across a wider range of user intents.
Retrieval Compression: Analyses how effectively a page can be distilled into key facts, claims, and evidence. Highlights content that is easy for modern AI systems to process and reference.
Complex Query Reformulation: Breaks down sophisticated user questions into underlying concepts, entities, and information needs - identifying the content required to answer complex searches more effectively.

Comparison

AI Spider vs Screaming Frog

Screaming Frog is the gold standard for technical SEO crawling. AI Spider matches it on the fundamentals and goes further on AI-specific signals.

Feature	Screaming Frog	AI Spider
HTML page crawling	✓	✓
Resource URL discovery (images, JS, CSS)	✓	✓
AI Retrievability Scoring	✗	✓
GPTBot / ClaudeBot access analysis	✗	✓
Content chunking for AI retrieval	✗	✓
Citation Readiness scoring	✗	✓
Section Strength analysis	✗	✓
HITS hub + authority propagation	✗	✓
N-Gram phrase intelligence	✗	✓
llms.txt detection	✗	✓
Near-duplicate detection (MinHash)	✗	✓
22+ recommendation engine	✗	✓
Redirect chain detection	✓	✓
Canonical intelligence	✓	✓
Internal PageRank	✓	✓
Structured data extraction	✓	✓
Real-time crawl updates	✗	✓ (SSE live)
Architecture Intelligence (4 sub-views)	✗	✓
Page Intelligence Panel	✗	✓
Internal Link Explorer	✓	✓
Resource Explorer (MIME-aware)	✓	✓
MCP server integration	✓ (v24)	Planned
Runs locally (no cloud)	✓ (desktop app)	✓ (local server)
Pricing	€245/year	Free during beta

Start auditing AI retrievability today