How Rampify Works

Rampify brings SEO intelligence into your IDE by combining a hosted crawler, structured data extraction, and editor-native tools built on the Model Context Protocol (MCP). This page explains the architecture that powers it.

Overview#

Rampify's architecture flows from ground truth to natural language interface through four distinct layers:

  1. Reality — Your actual site and Google Search Console data
  2. Rampify API — HTTP crawler and structured data storage
  3. MCP Server — Stateless intelligence layer exposing structured tools
  4. LLM — Natural language interface powered by deterministic data

Each layer handles what it does best, ensuring your AI assistant gets deterministic, real-time SEO context grounded in how your actual website behaves — not guesses, heuristics, or hallucinations.

The Layered Architecture#

Rampify uses a layered architecture that puts each technology in its proper place:

Reality (Your Site + Google): your actual site and GSC data. Ground truth.
        ↑ accesses
Rampify API (Data Layer): crawling, GSC integration, rules. Deterministic (authoritative sources).
        ↑ fetches
MCP Server (Intelligence Layer): get_page_seo(), crawl_site(). Deterministic (structured data access).
        ↑ queries
LLM (Natural Language Interface): "What SEO issues should I fix?" Non-deterministic (conversation, explanation).

Layer 1: Reality (Ground Truth)

The foundation of Rampify's architecture is your actual site and Google Search Console data. This is the source of truth: what actually exists, how Google sees it, and what performance data shows.

Instead of relying on AI to guess about your site's SEO health, Rampify starts with deterministic facts: HTTP responses, HTML structure, GSC indexing status, and search performance metrics. Everything flows from this ground truth.

Layer 2: Rampify API (Data Collection & Storage)

Rampify uses a hosted, HTTP-only crawler designed specifically for SEO diagnostics. It's built on axios for HTTP requests and cheerio for server-side HTML parsing (jQuery-like traversal), with deterministic sitemap-first crawling and navigation fallback when needed.
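For a rough sense of what that extraction looks like, here's a minimal sketch of HTTP-only parsing with axios and cheerio. The function name and field handling are simplified illustrations, not Rampify's actual implementation:

```typescript
import axios from "axios";
import * as cheerio from "cheerio";

// Minimal sketch: fetch a page over HTTP and pull core SEO fields from the
// initial HTML response, the same surface most SEO signals come from.
async function extractSeoFields(url: string) {
  const response = await axios.get(url, {
    timeout: 10_000,             // per-URL timeout
    maxRedirects: 5,             // follow redirects up to 5 hops
    validateStatus: () => true,  // keep non-2xx responses for diagnostics
  });

  const $ = cheerio.load(response.data);

  return {
    url,
    statusCode: response.status,
    title: $("head > title").text() || null,
    description: $('meta[name="description"]').attr("content") ?? null,
    canonical: $('link[rel="canonical"]').attr("href") ?? null,
    robotsMeta: $('meta[name="robots"]').attr("content") ?? null,
    ogTitle: $('meta[property="og:title"]').attr("content") ?? null,
    internalLinks: $('a[href^="/"]').length,
  };
}
```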

Why HTTP-only Instead of Headless Browser?

Google's crawler (Googlebot) primarily operates as an HTTP crawler, not a full JavaScript runtime. While Google can execute JavaScript, most SEO signals come from the initial HTML response. An HTTP-only approach is:

  • Faster - No browser overhead, Puppeteer crashes, or memory leaks
  • More reliable - Deterministic behavior, easier to debug and reproduce
  • Closer to reality - Matches how Googlebot actually sees most sites
  • Better for SEO - Server-side rendered content is what gets indexed first

This approach works best for sites with server-side rendering (Next.js, Remix, Astro) or static generation. For heavily client-rendered SPAs, you'll want server-side rendering for SEO anyway.

How & When Crawling Happens#

The crawler follows a sitemap-first approach—it looks for your sitemap.xml (or discovers it through robots.txt), prioritizes those URLs, and only falls back to navigation crawling if no sitemap is found. Each URL gets a 10-second timeout, redirects are followed up to 5 hops (tracking the full chain), and canonical URLs are identified to avoid duplicates.
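Here's a simplified sketch of what sitemap-first discovery with a robots.txt fallback can look like. Function names and parsing are illustrative only, not Rampify's actual crawler code:

```typescript
import axios from "axios";

// Illustrative: try sitemap.xml, then a Sitemap: directive in robots.txt,
// and signal a fallback to navigation crawling if neither yields URLs.
async function discoverUrls(origin: string): Promise<{ urls: string[]; source: string }> {
  const fromSitemap = await fetchSitemapUrls(`${origin}/sitemap.xml`);
  if (fromSitemap.length > 0) return { urls: fromSitemap, source: "sitemap" };

  const robots = await axios.get(`${origin}/robots.txt`, { validateStatus: () => true });
  const match = typeof robots.data === "string" ? robots.data.match(/^sitemap:\s*(\S+)/im) : null;
  if (match) {
    const fromRobots = await fetchSitemapUrls(match[1]);
    if (fromRobots.length > 0) return { urls: fromRobots, source: "robots.txt" };
  }

  return { urls: [origin], source: "navigation-fallback" };
}

async function fetchSitemapUrls(sitemapUrl: string): Promise<string[]> {
  const res = await axios.get(sitemapUrl, { validateStatus: () => true });
  if (res.status !== 200 || typeof res.data !== "string") return [];
  // Naive <loc> extraction; a real parser would also handle sitemap indexes.
  return [...res.data.matchAll(/<loc>(.*?)<\/loc>/g)].map((m) => m[1]);
}
```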

Crawls run on-demand (free tier) or automatically every 24 hours (paid plans). Each crawl includes change detection, automatically flagging new pages that appeared since the last run. This ensures your IDE always has fresh SEO intelligence.

What Gets Collected#

Extracted SEO Data

Page metadata (title, description), canonical URL, robots directives, status codes & redirect chains, Open Graph tags, JSON-LD/Microdata/RDFa structured data, internal links (with link graph relationships), and content depth signals (word count, heading structure).

We do not scrape full content bodies to avoid overreach and keep page evaluation fast.
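As an illustration, a per-page crawl record might look roughly like this. Field names are assumptions for the sake of the example, not Rampify's actual storage schema:

```typescript
// Illustrative shape of a per-page crawl record.
interface PageCrawlRecord {
  url: string;
  statusCode: number;
  redirectChain: string[];          // full chain, up to 5 hops
  metadata: {
    title: string | null;
    description: string | null;
    canonical: string | null;
    robotsMeta: string | null;
  };
  openGraph: Record<string, string>;
  structuredData: {
    jsonLd: object[];
    microdata: object[];
    rdfa: object[];
  };
  internalLinks: { href: string; anchorText: string }[];
  contentSignals: {
    wordCount: number;
    headings: { level: number; text: string }[];
  };
}
```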

Your Website Metadata as a Queryable API#

Here's what makes Rampify different: we store metadata about your site in a structured database that you can query through the Rampify API. This isn't just another crawler report you download once; it's persistent, queryable infrastructure for your site's SEO state.

SEO has hundreds of nitpicky details to track: canonical tags, redirect chains, schema markup, internal link counts, indexing status, meta descriptions, Open Graph tags. The only way to get this right is to retrieve, store, and query this data deterministically. AI cannot hold all this information in context or reliably follow instructions across conversations. It needs ground truth to retrieve from.

This gives you something unique: an API for your own website's metadata. You can ask questions like "Which pages are missing descriptions?" and get back structured data from your actual site: not AI guesses, not stale reports, but current facts stored in a queryable format. Cross-URL comparisons, issue analysis, internal link mapping, and indexability evaluation are all backed by deterministic storage, not LLM memory.
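For example, answering "Which pages are missing descriptions?" becomes a simple filter over stored records rather than an LLM recall exercise. The record shape below is illustrative only:

```typescript
// Illustrative query over stored crawl records; the Rampify API's actual
// endpoints and response format may differ.
interface StoredPage {
  url: string;
  metadata: { title: string | null; description: string | null };
}

function pagesMissingDescriptions(pages: StoredPage[]): string[] {
  return pages.filter((page) => !page.metadata.description).map((page) => page.url);
}

const report = pagesMissingDescriptions([
  { url: "/blog/seo-tips", metadata: { title: "10 SEO Tips for Developers", description: "Learn how to optimize your site..." } },
  { url: "/pricing", metadata: { title: "Pricing", description: null } },
]);
// report === ["/pricing"]
```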

The result is a separation of concerns that developers understand: persistent storage handles ground truth, APIs provide structured access, and AI provides the natural language interface on top.

Layer 3: MCP Server (Intelligence Layer)

Rampify integrates with IDEs through the Model Context Protocol (MCP), a standard that lets tools expose structured actions your AI assistant can call.

This layer is intentionally stateless. The MCP server acts as a bridge between your IDE, Rampify's API, and your live site.
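As a minimal sketch of what exposing such a tool can look like with the TypeScript MCP SDK (tool wiring only; fetchPageSeo is a hypothetical placeholder, not Rampify's real API client):

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Stateless by design: every call fetches fresh data from the hosted API
// instead of reading a local cache.
const server = new McpServer({ name: "rampify", version: "0.1.0" });

server.tool(
  "get_page_seo",
  { url: z.string().describe("Page URL or path to inspect") },
  async ({ url }) => {
    const data = await fetchPageSeo(url); // hypothetical API call, no local state
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  }
);

async function fetchPageSeo(url: string): Promise<unknown> {
  // Placeholder: in practice this would call the Rampify API over HTTPS.
  return { url, status_code: 200, issues: [] };
}

const transport = new StdioServerTransport();
await server.connect(transport);
```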

Why Stateless Architecture?

A stateless design means SEO data is always fresh and consistent:

  • Always current - Every query fetches latest crawl data, no stale cache
  • Deterministic - Same query always returns same result for current site state
  • Auditable - You can trace every recommendation back to source data
  • Reliable - No cache invalidation bugs, no sync issues between machines
  • Lightweight - No local database, no storage management in your IDE

The tradeoff: slightly higher latency per query. But for SEO data that changes slowly (hours/days, not seconds), fresh data beats cached speed.

MCP Tools Available#

Rampify exposes tools across three categories:

  • Analysis tools - Get SEO data for pages and sites (get_page_seo, get_issues, get_gsc_insights)
  • Generation tools - Create metadata and schema grounded in real data (generate_meta, generate_schema)
  • Crawl tools - Trigger fresh analysis on-demand (crawl_site)

For complete tool documentation, parameters, and examples, see the MCP Tools Reference.

Alternative Integration: AGENTS.md#

In parallel to the MCP server, Rampify can also sync a local AGENTS.md file in your project. This Git-friendly markdown file contains all detected issues, affected pages, structured recommendations, and cross-page patterns: essentially a persistent report that lives in your repository.

Both the MCP server and AGENTS.md file are optional. Use one, both, or neither depending on your workflow. The MCP server excels at real-time queries during development, while AGENTS.md gives you a versionable snapshot you can commit, diff, and track over time. Together, they streamline SEO intelligence into your existing development workflow.
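An illustrative excerpt of what a synced file might contain (headings, structure, and wording here are assumptions, not the file's actual format):

```markdown
<!-- Illustrative excerpt only; the real file's structure may differ. -->
# SEO Issues (synced by Rampify)

## Missing meta descriptions (2 pages)
- /pricing
- /blog/old-post

Recommendation: add a unique description of roughly 150-160 characters to each page.

## Redirect chains (1 page)
- /docs/setup -> /docs/install -> /docs/getting-started (2 hops)

Recommendation: link directly to the final URL.
```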

Layer 4: LLM (Natural Language Interface)

The top layer is your AI assistant using natural language to interact with all the deterministic data below. Rampify is not an LLM wrapper—you bring your own model (Claude, GPT-4, etc.) and your own subscription. Rampify simply delivers SEO context to your model of choice through the MCP protocol.

The MCP layer enables a clear separation of concerns between what AI should and shouldn't handle:

AI Should NOT Do

  1. Measure indexing status (requires GSC API)
  2. Detect technical SEO issues (requires crawling)
  3. Track performance trends (requires historical data)
  4. Make factual claims without sources

AI Should Do

  1. Natural language interface ("What's wrong with my SEO?")
  2. Contextual recommendations based on real data
  3. Generate code/markup grounded in page content
  4. Explain SEO concepts and priorities

When you ask your AI assistant "How's the SEO on this page?", it calls get_page_seo() to retrieve deterministic data, then synthesizes that into a natural language explanation. The AI doesn't guess—it reports what the crawler actually found.

Example: What get_page_seo() Returns

Here's what actual data looks like (simplified):

{
  "url": "/blog/seo-tips",
  "status_code": 200,
  "metadata": {
    "title": "10 SEO Tips for Developers",
    "description": "Learn how to optimize your site...",
    "canonical": "/blog/seo-tips"
  },
  "indexability": {
    "robots_meta": "index, follow",
    "in_sitemap": true,
    "internal_links_count": 3
  },
  "schema": {
    "detected": true,
    "types": ["Article", "BreadcrumbList"],
    "valid": true
  },
  "issues": []
}

Your AI uses this structured data to generate responses like: "This page looks good - indexed, in the sitemap, valid Article and BreadcrumbList schema, 3 internal links, and no issues detected."

Summary#

Rampify's architecture separates deterministic data collection from AI-powered analysis. The HTTP crawler extracts structured SEO fields from your live site, stores them in a queryable format, and exposes that data through MCP tools your AI assistant can call.

When you ask "How's my SEO?", your AI doesn't guess, it queries get_page_seo() for actual crawl data, then synthesizes that into natural language. When you generate metadata, it's grounded in real page content and existing SEO signals. When you check indexability, it pulls from Google Search Console.

The result: SEO workflows that happen in your editor, powered by deterministic data instead of LLM assumptions.


Ready to See How It Works in Practice?

Install the Rampify MCP server and start getting real SEO intelligence in your IDE. Works with Cursor, VS Code, Claude Code, and any MCP-compatible editor.

Install MCP Server