Search Engine Indexing

Publishing a page and being findable are two different things. Between them sits a pipeline you don't control (discovery, crawling, indexing), and pages drop out of it silently. The index that pipeline feeds is the foundation of being found everywhere it matters now: the classic results page and the AI answer. Understanding it is the difference between "we shipped it" and "people can find it."

From published to findable

Every search surface (Google, Bing, and increasingly the AI answer engines) runs some version of the same pipeline:

Discovery: the engine learns your URL exists, from your sitemap, a link, or a previous crawl.
Crawling: it fetches the page and reads what's actually served.
Indexing: it decides the page is worth storing and retrieving.
Ranking / citation: the page competes to appear in results or get cited in an answer.

Each stage is a gate. A page that fails discovery is invisible no matter how good it is; a page that's crawled but judged thin never reaches the index; an indexed page with no authority never ranks. When something "isn't showing up in Google," the first question is always "which gate did it fail?", and that's exactly what coverage states tell you.

What coverage states actually mean

Google reports a verdict per URL. Four of them cover almost everything you'll see:

Coverage state	What it means	What to do
Submitted and indexed	In the index, eligible to appear in results. The goal state, and Rampify's definition of an indexed page.	Nothing. This is what done looks like.
Discovered – currently not indexed	Google knows the URL exists but hasn't bothered to crawl it. Usually a priority signal: the page looks low-value from the outside.	Strengthen internal links to it, make sure it's in your sitemap, and give it time, or improve the content before requesting indexing.
Crawled – currently not indexed	Google fetched the page and chose not to store it. Usually a quality signal: thin, duplicative, or boilerplate content.	Improve the page before resubmitting. Repeated submission without changes teaches Google to ignore you.
Excluded	Kept out deliberately: robots.txt, a noindex tag, a canonical pointing elsewhere, or judged duplicate.	Confirm it's intentional. Excluded-by-design is healthy; excluded-by-accident is a silent traffic leak.
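
The table reads naturally as a lookup from coverage state to the gate that failed. A minimal sketch, assuming the four state strings arrive exactly as written above; `Diagnosis`, `COVERAGE_PLAYBOOK`, and `diagnose` are hypothetical names for illustration, not part of Search Console or Rampify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    gate: str    # pipeline stage the URL is stuck at ("none" if healthy)
    action: str  # next step, paraphrased from the table above


# Hypothetical lookup built from the table; keys are the states as written there.
COVERAGE_PLAYBOOK = {
    "Submitted and indexed": Diagnosis(
        "none", "Nothing. This is what done looks like."),
    "Discovered – currently not indexed": Diagnosis(
        "crawling", "Strengthen internal links, confirm the sitemap entry, give it time."),
    "Crawled – currently not indexed": Diagnosis(
        "indexing", "Improve the page before resubmitting."),
    "Excluded": Diagnosis(
        "exclusion", "Confirm the exclusion is intentional."),
}


def diagnose(coverage_state: str) -> Diagnosis:
    """Map a coverage state to the failed gate; unknown states need a manual look."""
    return COVERAGE_PLAYBOOK.get(
        coverage_state.strip(),
        Diagnosis("unknown", "Inspect the URL manually."),
    )
```

Falling back to an "unknown" diagnosis rather than raising is a deliberate choice here: the four states cover almost everything, not everything.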

Ranking: the algorithm after the index

Reaching the index isn't the finish line: it's admission to the competition. When someone searches, the engine doesn't scan the web; it queries its index and runs an algorithmic selection over everything stored there. The signals are familiar even where the weights are secret: relevance (does the page actually answer the query), authority (do other credible pages and entities point at it), freshness (is it current, where currency matters).

The failure mode at this gate is indexed but invisible: the page is in the index, eligible to appear, and sits at position 40 where nobody will ever see it. That's not an indexing problem, and indexing fixes won't move it; it's a competition problem. The levers are different: tighter content-to-query match, internal links that concentrate authority on the pages that matter, and picking queries you can plausibly win. The most movable ground is the within-reach queries (page-two rankings and high-impression-low-CTR results), which Google Search surfaces directly.
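
As a rough illustration of what "within reach" could mean in practice, the sketch below filters query rows for page-two rankings and high-impression, low-CTR results. The `QueryRow` shape and the thresholds (positions 11 to 20, at least 200 impressions, CTR at or below 2%) are assumptions chosen for the example, not values from Google or Rampify; tune them to your own traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRow:
    query: str
    impressions: int
    clicks: int
    position: float  # average position; 1.0 is the top result

    @property
    def ctr(self) -> float:
        # Guard against rows with zero impressions.
        return self.clicks / self.impressions if self.impressions else 0.0


def within_reach(rows, min_impressions=200, max_ctr=0.02):
    """Return page-two rankings plus high-impression, low-CTR rows, busiest first."""
    picked = []
    for row in rows:
        page_two = 10 < row.position <= 20
        under_clicked = row.impressions >= min_impressions and row.ctr <= max_ctr
        if page_two or under_clicked:
            picked.append(row)
    return sorted(picked, key=lambda r: r.impressions, reverse=True)
```

A row at position 40 with few impressions is deliberately left out: that is the "indexed but invisible" case, where the better move is usually to pick a different query.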

One index, two retrieval surfaces

The reason indexing is worth understanding, rather than delegating to a checklist, is that the crawl-and-index foundation now feeds two different kinds of retrieval:

	Classic search	AI answers
What retrieval looks like	Ranks indexed pages into a results page	Selects a handful of sources to ground and cite one answer
Where it draws from	The engine's index	Training corpora plus live retrieval from search and per-crawler indexes
Selection dynamic	Position 1 through N; visibility degrades gradually	Cited or absent; a few sources carry the answer
What you optimize	Ranking for queries within reach	Being the clearest, most credible answer to cite

These aren't separate disciplines; they're two facets of the same thing. Both start from a crawler that found your URL, fetched it, read real content, and stored it. The hygiene work (a sitemap, clean internal links, server-rendered content, a deliberate robots policy) pays on both surfaces at once. Where they genuinely differ:

AI crawlers keep their own indexes. OAI-SearchBot, PerplexityBot, ClaudeBot and friends fetch and store independently of Google, and your robots.txt is a per-crawler decision: being indexed by Google says nothing about being readable to the bots that feed answers. AI Visibility covers that side's three gates (discoverable, allowed, readable), the same pipeline told for AI crawlers.
Most AI crawlers don't run JavaScript, not even as a deferred second pass the way Google does. A client-rendered page is a handicap in classic search and a wall in AI answers.
Answers cite; results rank. A results page degrades gracefully: position 8 still earns some clicks. An answer is winner-take-most: a few citations carry it, and everything else might as well not exist. Whether you're among them is your AI Answer Presence.
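
Because robots.txt is a per-crawler decision, it is worth checking each user agent separately. A minimal sketch using Python's standard `urllib.robotparser`; the crawler names come from the list above, while `crawler_access` and the sample policy are hypothetical. Real crawlers may interpret edge cases differently from Python's parser, so treat the output as a first check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as listed above, plus Googlebot for comparison.
CRAWLERS = ["Googlebot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def crawler_access(robots_txt: str, url: str, crawlers=CRAWLERS) -> dict:
    """Report, per user agent, whether this robots.txt text allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in crawlers}


# Illustrative policy (made up): everyone is allowed except one AI crawler.
SAMPLE_ROBOTS = """\
User-agent: *
Allow: /

User-agent: ClaudeBot
Disallow: /
"""

if __name__ == "__main__":
    report = crawler_access(SAMPLE_ROBOTS, "https://example.com/docs/indexing")
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

This only covers the "allowed" gate. A crawler that is allowed but receives an empty client-rendered shell still has nothing to read.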

Same foundation, different scoreboard

Getting indexed is the shared prerequisite; what changes per surface is how winners are picked. Fix the foundation once, then optimize each scoreboard on its own terms: rankings on one, citations on the other.

Why the raw data goes unused

Search Console holds all of this, and most developers check it once and never go back. The failure modes are consistent:

Data without direction. A thousand queries, 47 not-indexed pages, no "fix this next." Everything is reported; nothing is prioritized.
Scattered context. Query data in Performance, coverage in Pages, per-URL detail in URL Inspection. Understanding one page means mentally assembling three tabs.
Outside your workflow. The data lives in a browser tab; your work lives in an editor and a repository. Every check costs a context switch.
Expertise required. "Crawl anomaly detected" and "average position 7.8" are only actionable if you already know SEO. The interface assumes you do.

The result isn't that the data is wrong; it's that fixable issues accumulate unread.

Check it yourself

Don't trust site: queries; they're a rough sketch, not the index. For a real answer about one URL, use Search Console's URL Inspection and read the coverage state and the last crawl date. If the verdict surprises you (a page you care about sitting in "Discovered – currently not indexed" for weeks), that's the pipeline telling you where it stalled.
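
The same two fields can be read programmatically from the Search Console URL Inspection API. The sketch below makes no network call; it only summarizes a response-shaped dict. It assumes the response nests `coverageState` and `lastCrawlTime` under `inspectionResult.indexStatusResult`, so verify the field names against the current API reference before relying on them. `summarize_inspection` is a hypothetical helper.

```python
from datetime import datetime, timezone


def summarize_inspection(response: dict, now: datetime) -> dict:
    """Pull coverage state and crawl age out of a URL Inspection-shaped response."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    last_crawl = status.get("lastCrawlTime")  # timestamp string; absent if never crawled
    days_since_crawl = None
    if last_crawl:
        # Normalize a trailing "Z" so fromisoformat accepts it on older Pythons.
        crawled_at = datetime.fromisoformat(last_crawl.replace("Z", "+00:00"))
        days_since_crawl = (now - crawled_at).days
    return {
        "coverage_state": status.get("coverageState", "unknown"),
        "days_since_crawl": days_since_crawl,
    }


# Made-up response for illustration only.
SAMPLE_RESPONSE = {
    "inspectionResult": {
        "indexStatusResult": {
            "coverageState": "Crawled - currently not indexed",
            "lastCrawlTime": "2024-05-01T08:30:00Z",
        }
    }
}

if __name__ == "__main__":
    print(summarize_inspection(SAMPLE_RESPONSE, datetime.now(timezone.utc)))
```

A missing `lastCrawlTime` is reported as `None` rather than zero, which keeps "never crawled" distinguishable from "crawled today."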

How Rampify surfaces it

Rampify pulls your Search Console data on a schedule and folds the verdicts back into the pages they belong to, so the pipeline is readable instead of reconstructed:

Per-URL verdicts sit in the Website table next to each page's crawl health: coverage state, canonical choice, last crawl, and 28-day performance in one row.
Indexed pages over time is charted on Home, reconstructed from Google's own URL inspection log. "Indexed" always means Submitted and indexed: one definition everywhere, so numbers never disagree between pages.
Query and page performance reads cleanly in Google Search, including the within-reach queries (page-two rankings, high-impression low-CTR) where small fixes move real traffic.
The AI side of the foundation is checked on the same crawl: the Website page's AI visibility card reports render mode and which AI crawlers your robots.txt admits.
Your AI gets the same data over MCP: get_search_performance for the site-wide picture, get_page_intelligence for one page's full story, so "how is this page doing?" is answerable without leaving your editor.

What to do about pages that go missing

The remediation follows the gate that failed. Not discovered: fix your sitemap and internal links, because orphan pages stay invisible. Discovered but not crawled: raise the page's apparent value (link to it from pages that matter) and give Google time. Crawled but not indexed: the content itself needs work; make it less thin or less duplicative before asking again. Excluded: audit your robots.txt, noindex tags, and canonicals, and make sure every exclusion is a decision, not an accident. Indexed but never seen: that's the ranking gate, not an indexing problem; compete on the queries within reach.

And if the whole site under-indexes despite good content: check what crawlers actually receive. Client-side rendering is the classic invisible failure, fine in a browser and empty to a bot, and doubly costly because AI crawlers won't render it at all. Rampify flags it on Home when it detects it.
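
One way to check what crawlers actually receive is to count the visible words in the HTML as served, before any JavaScript runs. This is a rough heuristic sketched with Python's standard `html.parser`, not how Rampify or any crawler detects render mode; the helper names and the 50-word threshold are assumptions for illustration.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that sits outside script, style, noscript, and template elements."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def served_word_count(html: str) -> int:
    """Count words a non-rendering crawler could read from the raw HTML."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


def looks_client_rendered(html: str, min_words: int = 50) -> bool:
    """Flag pages whose served HTML carries almost no readable text."""
    return served_word_count(html) < min_words
```

Feed it the body of a plain HTTP fetch of the page (no headless browser). If the count is near zero while the page looks full in a browser, the content is arriving via JavaScript.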