
How to track citations.

This is a how-to for building a minimum-viable AI citation tracker yourself. It's meant for engineers who want to understand the shape of the problem before deciding whether to build or buy. AIRRNK does all of this and more, but the DIY version is a useful reference.

Time: 4–8 hours to build, plus ongoing maintenance · Difficulty: Advanced

01
Build a query panel
Pick 20–50 buyer queries that your ideal customers might ask an AI. These are your test probes. Bad queries: your brand name (always yields a result, useless as signal). Good queries: 'what's the best X for Y under $Z'.
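One way to keep the panel consistent is to generate it from templates and filter out anything that mentions the brand. The sketch below is illustrative only: the brand `acme`, the two templates, and the function name `build_panel` are assumptions, not part of any tool described here.

```python
from itertools import product

BRAND = "acme"  # hypothetical brand name, replace with your own

# Hypothetical templates modelled on "what's the best X for Y under $Z".
TEMPLATES = [
    "what's the best {category} for {audience} under ${budget}",
    "which {category} should a {audience} use",
]

def build_panel(categories, audiences, budgets, brand=BRAND):
    """Expand templates into queries, dropping any that name the brand."""
    queries = set()  # a set removes duplicates from templates that ignore budget
    for template, cat, aud, bud in product(TEMPLATES, categories, audiences, budgets):
        query = template.format(category=cat, audience=aud, budget=bud)
        if brand.lower() not in query.lower():
            queries.add(query)
    return sorted(queries)

if __name__ == "__main__":
    for q in build_panel(["crm"], ["startup"], [50, 100]):
        print(q)
```

Generated queries still deserve a manual pass; templates tend to produce phrasing no real buyer would type.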
02
Call the APIs in a clean session
Query four platforms: OpenAI (with browsing enabled via a web-search tool), Anthropic (Claude 4.7 with the web search tool), Perplexity (Sonar API), and Google (Gemini with grounding). Run each query at temperature 0 with no memory and no prior context: one query, one response.
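The "one query, one response" rule can be sketched as a runner that never shares state between calls. This is a minimal sketch, not a vendor integration: each platform is represented by a hypothetical callable that you would implement with the vendor's own SDK (setting temperature 0 and enabling its search tool there). The names `ProbeResult`, `Client`, and `run_panel` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List

@dataclass(frozen=True)
class ProbeResult:
    platform: str
    query: str
    raw_response: str
    fetched_at: str  # ISO 8601 timestamp in UTC

# A client takes one query string and returns the raw response text.
# Hypothetical interface: wrap each vendor SDK behind this signature.
Client = Callable[[str], str]

def run_panel(queries: List[str], clients: Dict[str, Client]) -> List[ProbeResult]:
    """Send every query to every platform as an isolated, single-turn call."""
    results = []
    for platform, ask in clients.items():
        for query in queries:
            # No conversation history is passed in: each call starts clean.
            raw = ask(query)
            stamp = datetime.now(timezone.utc).isoformat()
            results.append(ProbeResult(platform, query, raw, stamp))
    return results

if __name__ == "__main__":
    fake = {"demo": lambda q: f"stub answer for: {q}"}
    for result in run_panel(["best crm for startups"], fake):
        print(result.platform, result.raw_response)
```

Keeping the vendor calls behind a plain callable also makes the runner testable without spending API credits.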
03
Parse the response
Each response has prose + structured citation blocks. Extract URLs from citation blocks. Extract 15+ token snippets from the prose. Store both, plus the full raw response for audit.
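A rough sketch of the extraction step follows. It assumes you have already normalized each vendor's response into a dict with a `text` string and a `citations` list of `{"url": ...}` entries; that shape is hypothetical, since each platform returns citations in its own format. Tokens are approximated by whitespace-separated words.

```python
import re

MIN_TOKENS = 15  # snippet length floor from the step above

def extract_urls(response):
    """Return cited URLs in order of appearance, without duplicates."""
    seen, urls = set(), []
    for citation in response.get("citations", []):
        url = citation.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

def extract_snippets(text, min_tokens=MIN_TOKENS):
    """Split prose into sentences and keep those with enough tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) >= min_tokens]

if __name__ == "__main__":
    response = {
        "text": "It is cheap. Acme is frequently recommended for small teams "
                "because its pricing stays flat as the customer list grows.",
        "citations": [{"url": "https://example.com/pricing"}],
    }
    print(extract_urls(response))
    print(extract_snippets(response["text"]))
```

Store the untouched raw response alongside these derived fields so the parse can be re-run when a platform changes its format.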
04
Match against your site
For each URL, check if it's in your site's URL space (domain match). For each snippet, compute an embedding and compare against a pre-computed embedding index of your pages. Threshold around 0.88 cosine similarity for paraphrase detection.
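Both checks can be sketched with the standard library. The example assumes embeddings are already computed elsewhere and arrive as plain lists of floats; the names `is_own_url`, `cosine`, and `best_match` are illustrative, and 0.88 is the starting threshold suggested above, not a tuned value.

```python
import math
from urllib.parse import urlparse

def is_own_url(url, site_domain):
    """True if the URL's host is the site domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    site_domain = site_domain.lower()
    return host == site_domain or host.endswith("." + site_domain)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(snippet_vec, page_index, threshold=0.88):
    """Return (url, score) for the closest page at or above the threshold, else None."""
    best_url, best_score = None, threshold
    for url, page_vec in page_index.items():
        score = cosine(snippet_vec, page_vec)
        if score >= best_score:
            best_url, best_score = url, score
    return (best_url, best_score) if best_url else None

if __name__ == "__main__":
    index = {"/pricing": [1.0, 0.1], "/about": [0.0, 1.0]}  # toy 2-d vectors
    print(is_own_url("https://blog.example.com/post", "example.com"))
    print(best_match([1.0, 0.0], index))
```

A linear scan is fine for a few hundred pages; a larger site would likely want a vector index instead.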
05
Schedule it
Run every 6 hours. Store results in a time-series database. Variance is high — don't chase single-day swings; use 7-day rolling windows.
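The 7-day window can be sketched as a pooled rate over daily counts. This assumes results have already been aggregated per day into `(cited, total_probes)` pairs; the function name `rolling_rate` and that input shape are assumptions for illustration.

```python
from datetime import date, timedelta

def rolling_rate(daily, window_days=7):
    """Map each day to the citation rate pooled over the trailing window.

    daily: dict of date -> (cited_count, total_probes)
    """
    rates = {}
    for day in sorted(daily):
        start = day - timedelta(days=window_days - 1)
        cited = total = 0
        for d, (c, t) in daily.items():
            if start <= d <= day:
                cited += c
                total += t
        rates[day] = cited / total if total else 0.0
    return rates

if __name__ == "__main__":
    daily = {date(2026, 1, d): (1, 10) for d in range(1, 9)}
    daily[date(2026, 1, 8)] = (10, 10)  # a single-day spike
    for day, rate in rolling_rate(daily).items():
        print(day, round(rate, 3))
```

Pooling counts rather than averaging daily rates keeps days with few probes from dominating the window.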
06
Build the deduper and competitor tracker
The hardest part. Near-duplicate paragraphs (model regenerations) need collapsing. Competitor citations need detection (maintain a competitor URL list, run the same matcher). This is where most DIY implementations fall over.
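The text above does not prescribe a deduplication method. As one possible starting point, the sketch below collapses near-duplicate paragraphs using word shingles and Jaccard similarity, and flags competitor citations with a domain check. The 0.8 threshold, the shingle size, and all function names are assumptions to tune against your own data.

```python
from urllib.parse import urlparse

def shingles(text, k=3):
    """Set of overlapping k-word tuples from lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def collapse(paragraphs, threshold=0.8):
    """Keep the first paragraph of each near-duplicate group."""
    kept = []  # list of (paragraph, shingle_set)
    for paragraph in paragraphs:
        current = shingles(paragraph)
        if all(jaccard(current, seen) < threshold for _, seen in kept):
            kept.append((paragraph, current))
    return [paragraph for paragraph, _ in kept]

def competitor_hits(cited_urls, competitor_domains):
    """Return cited URLs whose host is a competitor domain or subdomain."""
    hits = []
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in competitor_domains):
            hits.append(url)
    return hits

if __name__ == "__main__":
    base = "the best crm for small teams is acme because it is cheap"
    print(collapse([base, base + " overall", "a completely different paragraph here"]))
    print(competitor_hits(["https://www.rival.io/x"], ["rival.io"]))
```

Shingle overlap catches light rewording but misses heavier paraphrase, which is one reason this step tends to need the most iteration.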

What to expect

A DIY tracker will cost you roughly $80–150/month in API calls for a single site, assuming 50 queries × 4 platforms × 4 runs/day. Maintenance runs 2–4 hours a week as API contracts drift. Our honest take: build it if you want to understand the shape; otherwise pay us $49/month and point the engineering time at something that compounds.
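The call volume behind that estimate is easy to check. The short calculation below uses only the figures quoted above and assumes a 30-day month; the per-call cost it prints is implied by the quoted budget range, not taken from any vendor price list.

```python
QUERIES = 50
PLATFORMS = 4
RUNS_PER_DAY = 4       # every 6 hours
DAYS_PER_MONTH = 30    # assumption for the estimate

calls_per_day = QUERIES * PLATFORMS * RUNS_PER_DAY
calls_per_month = calls_per_day * DAYS_PER_MONTH

def implied_cost_per_call(monthly_budget_usd):
    """Average spend per API call implied by a monthly budget."""
    return monthly_budget_usd / calls_per_month

if __name__ == "__main__":
    print(calls_per_day, "calls/day,", calls_per_month, "calls/month")
    for budget in (80, 150):
        print(f"${budget}/month implies ${implied_cost_per_call(budget):.5f} per call")
```

If your measured per-call cost lands well outside that implied range, revisit the panel size or run frequency before the model choice.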

Signals · sourced

72.4% of cited pages include ≥2 question-based H2s (Cited-page pattern audit, 2026)

+30–40% citation lift when GEO schema is correctly applied (Aggarwal et al. · Princeton)

42% of B2B buyer research now starts inside an LLM (Forrester Research, 2026)

Written by

The AIRank Editorial Team

Research & editorial, AIRank

The AIRank editorial team runs the 47-point scanner, the Observer pings, and the GEO research programme every week. Writing is reviewed by the core engineers who build the Injector, Blaster, and Surgeon agents.


Sources & further reading

Aggarwal et al. · GEO: Generative Engine Optimization (30–40% citation lift) · Princeton / arXiv
Google Search Central · Guidelines for AI-generated content · Google
Schema.org · Article / BlogPosting structured-data reference · Schema.org
