A systematic audit of your site's crawlability, indexing, and technical
health from Google's perspective. Covers robots.txt, sitemaps, canonical
URLs, redirect chains, Core Web Vitals, Search Console diagnostics, and
recent platform changes (CrUX coverage for LCP/INP and indexing API
considerations).
## When to use
- Launching a new site or redesigning an existing one
- Diagnosing sudden drops in Google organic traffic
- Migrating domains, changing URL structures, or consolidating subdomains
- Setting up or auditing robots.txt and sitemap files for a Next.js or
  single-page app
- Verifying that server-rendered content is accessible to Googlebot
- Preparing for a Core Web Vitals assessment (field + lab data)
## When NOT to use
- Content quality and keyword strategy — use **keyword-research** and
  **content-seo-strategy**
- Schema markup specifics — use **schema-markup**
- Social media metadata (OG tags for sharing) — that is on-page SEO
- Paid search (Google Ads) campaigns
| Concept | Description |
|---------|-------------|
| Crawl budget | The number of pages Googlebot will crawl per visit; mainly relevant for very large, frequently updated sites (see Google's crawl budget guide). |
| Index coverage | Which pages are indexed vs. excluded, and why (via Search Console). |
| Canonical URL | The single authoritative URL for a page when duplicates exist. |
| robots.txt | A file at the site root that controls crawler access to paths. |
| sitemap.xml | An XML file listing indexable URLs with metadata; use a sitemap index for >50k URLs. |
| Redirect chain | Multiple sequential redirects that waste crawl budget and slow discovery. |
| Core Web Vitals | LCP, CLS, INP — Google's page experience metrics; field data is collected via CrUX (see web.dev). |
| Rendering mode | SSR, SSG, ISR, or CSR — affects whether Googlebot sees content on first crawl. For SPAs, also consider Navigation API support. |

Notes and authoritative sources: Google's crawl budget guidance
(developers.google.com/crawling/docs/crawl-budget) explains when crawl
budget is actually worth worrying about; web.dev documents the Baseline
availability of the LCP and INP measurement APIs (Dec 2025), which
affects where these Core Web Vitals can be measured in the field.
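Google publishes fixed thresholds for the three metrics on web.dev (LCP: good ≤ 2.5 s, poor > 4 s; INP: good ≤ 200 ms, poor > 500 ms; CLS: good ≤ 0.1, poor > 0.25). A minimal sketch encoding them as a lookup (the helper name `rateMetric` is illustrative, not a library API):

```typescript
// Rate a Core Web Vitals field value against Google's published
// "good" / "needs improvement" / "poor" thresholds (web.dev).
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// [upper bound for "good", lower bound for "poor"] per metric.
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

function rateMetric(metric: Metric, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= poor) return "needs-improvement";
  return "poor";
}
```

Remember the assessment uses the 75th percentile of field data, not a single measurement.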
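To illustrate why redirect chains matter, here is a minimal sketch that follows an in-memory redirect map (hypothetical data, not a live crawler) and reports the hops before a URL resolves; more than one hop means a chain worth collapsing:

```typescript
// Follow a map of redirects (source path → target path) and return the
// full hop sequence for a starting URL, guarding against redirect loops.
function redirectChain(
  redirects: Map<string, string>,
  start: string,
): string[] {
  const chain = [start];
  const seen = new Set([start]);
  let current = start;
  while (redirects.has(current)) {
    current = redirects.get(current)!;
    chain.push(current);
    if (seen.has(current)) break; // loop detected; stop here
    seen.add(current);
  }
  return chain;
}

// /old → /interim → /new is a two-hop chain; the fix is to point
// /old straight at /new.
const hops = redirectChain(
  new Map([
    ["/old", "/interim"],
    ["/interim", "/new"],
  ]),
  "/old",
);
// hops → ["/old", "/interim", "/new"]
```

In practice you would feed this from a crawl export or server redirect config rather than a hand-built map.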
## Workflow
### Step 1: Audit robots.txt
Check your `public/robots.txt` or generate it dynamically with Next.js.
Make sure you do not accidentally block indexable pages (including API
routes you rely on for prerendering). Example dynamic robots file for a
Next.js app:
```typescript
// app/robots.ts
import type { MetadataRoute } from "next";

// Assumes the canonical origin is exposed via an env var at build time.
const baseUrl = process.env.NEXT_PUBLIC_SITE_URL ?? "https://yourbrand.com";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/admin/", "/_next/"],
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
  };
}
```

Checklist items for robots.txt:
- Is the file at /robots.txt on the production domain?
- Does it accidentally disallow important paths (APIs used for SSR,
  sitemaps, or public content)?
- Does your staging/preview environment serve X-Robots-Tag: noindex and
  block preview URLs?

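The "does it disallow important paths" check can be approximated offline. This sketch does only simple prefix matching against the disallow list from Step 1; real robots.txt matching also supports `*` and `$` wildcards and longest-match precedence between Allow and Disallow, which it deliberately omits:

```typescript
// Approximate robots.txt check: is a path blocked by any Disallow
// prefix? Ignores wildcards and Allow overrides for simplicity.
function isDisallowed(path: string, disallow: string[]): boolean {
  return disallow.some((rule) => rule !== "" && path.startsWith(rule));
}

const disallow = ["/api/", "/admin/", "/_next/"];
isDisallowed("/api/preview", disallow); // true — blocked
isDisallowed("/blog/post-1", disallow); // false — crawlable
```

For production audits, prefer the robots.txt report in Search Console, which uses Google's actual matcher.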
### Step 2: Generate and validate sitemaps

Sitemaps tell Google what to crawl and when. Ensure your sitemaps:
- Are reachable at the production domain (https://yourbrand.com/sitemap.xml)
- Use a sitemap index if >50,000 URLs or >50 MB (uncompressed)
- Use lastmod correctly for frequently updated content
- Match canonical URLs and do not include noindex pages
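The 50,000-URL limit is why large sites need a sitemap index. A minimal sketch of the chunking arithmetic (`sitemapChunks` is an illustrative helper, not a framework API; 50,000 is the sitemaps.org per-file limit):

```typescript
// Split a URL list into sitemap-sized chunks; each chunk becomes one
// file referenced by a <sitemap> entry in the sitemap index
// (sitemaps.org limit: 50,000 URLs or 50 MB uncompressed per file).
function sitemapChunks(urls: string[], chunkSize = 50_000): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// e.g. 120,000 URLs → 3 sitemap files plus one index referencing them.
```

In Next.js, `generateSitemaps` in `app/sitemap.ts` implements this pattern natively.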
Example dynamic sitemap generator (Next.js):
```typescript
// app/sitemap.ts