What predicts AI citations? A three-phase study.
We ran three rounds of analysis on 61 websites across 17 industries and 540+ citation checks. Phase 1 discovered which GEO pillars predict citations. Phase 2 rebalanced the algorithm based on those findings. Phase 3 built and validated a focused Citation Readiness Score that actually predicts whether AI will cite your content.
Abstract
This three-phase study examines what content signals predict AI citations across ChatGPT, Perplexity AI, and Claude — and uses those findings to build a more accurate scoring tool. Phase 1 scanned 61 websites across 17 industries and identified three pillars that strongly predict citations: Answerability, Citation Quality, and Definitions. Phase 2 rebalanced our scoring algorithm to weight these pillars more heavily. Phase 3 validated a new Citation Readiness Score built from these three pillars; high CR sites are cited substantially more often than low CR sites. The study also revealed that informational content gets cited at dramatically higher rates than product pages, giving smaller companies a clear path to competing with larger brands through educational content.
Methodology
Sample selection
We selected websites across 17 industry verticals, intentionally mixing large brands (e.g., Salesforce, Airbnb), mid-size companies (e.g., Linear, Whoop), and authoritative content publishers (e.g., Investopedia, Wirecutter). 61 sites were successfully scanned and included in the final analysis. Sites were chosen to represent a diversity of GEO optimization levels, content strategies, and market positions.
GEO scoring
Each website's homepage was scanned using the GeoSource.ai full-tier GEO scoring engine, which evaluates content across 12 pillars grouped into three tiers. Core pillars: Definition Clarity, Structured Knowledge, Topic Authority, Machine-Readable Formatting, and Answerability. Advanced pillars: E-E-A-T Signals, Citation Quality, and AI Accessibility. Expert pillars: Content Freshness, Readability, Question Coverage, and Multimedia Optimization. Composite scores range from 0 to ~150 depending on content depth.
Citation testing
For each domain, we crafted a natural-language query that a real user might ask an AI assistant about that company's space. Each query was sent to three AI platforms — ChatGPT (OpenAI), Perplexity AI, and Claude (Anthropic) — and the responses were analyzed for whether the domain appeared as a citation or recommendation. A site was considered "cited" for a platform if the AI's response mentioned the domain or brand name.
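The "cited" rule above can be sketched as a simple text match. This is a minimal illustration rather than the study's actual checker: the function name `is_cited`, the case-insensitive word-boundary matching, and the sample responses are all assumptions made for the example.

```python
import re


def is_cited(response_text: str, domain: str, brand: str) -> bool:
    """Return True if the response mentions the domain or the brand name.

    Assumption: matching is case-insensitive and respects word boundaries,
    so the brand "Linear" does not match inside the word "nonlinear".
    """
    for term in (domain, brand):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, response_text, flags=re.IGNORECASE):
            return True
    return False


# Hypothetical response texts, written for illustration only.
sample = "For flight comparisons, many travelers start with Kayak."
print(is_cited(sample, "kayak.com", "Kayak"))
print(is_cited("Use a nonlinear planning tool.", "linear.app", "Linear"))
```

A plain substring test would be simpler but would over-count short brand names that appear inside other words, which is why the sketch uses boundary checks.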
Three-phase design
Phase 1 (Discovery) used the original equally weighted scoring algorithm to scan all sites and run citation checks. The findings identified which pillars most strongly predict citations, and revealed that the overall GEO score was not a good predictor. Phase 2 (Algorithm Update) rebalanced the pillar weights based on Phase 1 data, increasing weight for the pillars that predicted citation and reducing weight for those that did not. Phase 3 (Validation) built a focused Citation Readiness Score using only the three proven pillars (Answerability, Citation Quality, and Definitions) and ran a fresh round of 180+ citation checks to validate it against real-world AI behavior.
Limitations
This study has several important limitations. Sample size (n=61) limits statistical power. AI responses are non-deterministic — the same query may produce different citations at different times. We tested only homepage URLs, which may not represent a site's best-optimized content. GEO scores reflect a single point-in-time scan. Brand recognition and training data prevalence are confounding variables that this study design cannot fully isolate from content optimization effects. Each phase's citation checks were run at different times, introducing potential temporal variation in AI responses.
What we discovered about AI citations
This study revealed that AI citations are driven by multiple factors working together — content quality, industry context, query type, and brand awareness. The most valuable insight: we identified exactly which content signals you can control that make the biggest difference, and built those into the Citation Readiness Score.
The four factors that drive AI citations
Three specific pillars — Answerability, Citation Quality, and Definitions — have the strongest measurable impact on citation likelihood. Sites scoring high on these three are cited substantially more often than low scorers. This is what GeoSource optimizes and what you can directly improve.
Informational content (guides, how-tos, educational pages) gets cited at dramatically higher rates than product pages. This is where smaller companies gain an edge — you don't need to be the biggest brand to have the best answer to "how to choose a CRM" or "what causes back pain."
Healthcare (66.7%), travel (83.3%), and finance (75%) naturally get cited more often because AI platforms confidently answer factual questions in these domains. SaaS (13.3%) and ecommerce (6.7%) face lower baseline rates — making content quality even more important as a differentiator.
Well-known brands carry an advantage from AI training data. TripAdvisor gets cited even with a low GEO score. But brand recognition isn't static — it's built through consistent presence on review sites, forums, and industry publications. GEO optimization and brand building work together.
Discovery: Which pillars predict AI citations?
We scanned 61 sites across 17 industries with all 12 pillars. The goal: identify which pillars most strongly predict AI citations, so we can weight them accordingly and give users the most actionable score possible.
Finding 1: Industry is the strongest predictor of citation
Industry was the most powerful predictor of citation likelihood in our dataset — more so than GEO score, any individual pillar, or content quality. Informational industries where AI platforms have strong knowledge and users ask factual questions showed dramatically higher citation rates. Transaction-oriented industries where queries tend toward subjective preferences showed much lower rates. The spread between the top and bottom industries is enormous: Automotive/Real Estate at 100% vs Ecommerce at 6.7%.
Figure 1. Citation rates by industry vertical. Industries are sorted by citation rate. Green bars indicate rates above 75%, blue above 50%, amber above 33%, and red below 33%.
| Industry | Sites (n) | Avg GEO Score | Citation Rate |
|---|---|---|---|
| Automotive | 1 | 92 | 100% |
| Real Estate | 2 | 84.5 | 100% |
| Travel | 4 | 56.8 | 83.3% |
| Finance | 4 | 107.2 | 75% |
| Healthcare | 5 | 72 | 66.7% |
| News & Media | 3 | 94.3 | 55.6% |
| Education | 4 | 59 | 50% |
| Marketing | 3 | 97.7 | 44.4% |
| Legal | 3 | 88 | 33.3% |
| Fitness | 3 | 74.3 | 33.3% |
| Cybersecurity | 3 | 78.7 | 33.3% |
| B2B | 4 | 88 | 33.3% |
| Gaming | 3 | 79 | 22.2% |
| Food & Bev | 2 | 70.5 | 16.7% |
| SaaS | 10 | 77.9 | 13.3% |
| Ecommerce | 5 | 83.2 | 6.7% |
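One way to read the industry rates above is as pooled checks: every (site, platform) check in a vertical counts once, and the rate is the share of those checks that produced a citation. The sketch below assumes that pooling rule; the function name, record layout, domains, and outcomes are hypothetical and are not the study's data.

```python
from collections import defaultdict


def industry_citation_rates(checks):
    """Pool all (site, platform) checks per industry and return the % cited."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for industry, _domain, _platform, was_cited in checks:
        total[industry] += 1
        cited[industry] += int(was_cited)
    return {ind: round(100 * cited[ind] / total[ind], 1) for ind in total}


# Hypothetical records: (industry, domain, platform, cited).
checks = [
    ("Travel", "a.example", "ChatGPT", True),
    ("Travel", "a.example", "Perplexity", True),
    ("Travel", "a.example", "Claude", True),
    ("Travel", "b.example", "ChatGPT", True),
    ("Travel", "b.example", "Perplexity", False),
    ("Travel", "b.example", "Claude", True),
    ("SaaS", "c.example", "ChatGPT", False),
    ("SaaS", "c.example", "Perplexity", False),
    ("SaaS", "c.example", "Claude", True),
]
print(industry_citation_rates(checks))
```

With only a handful of sites per vertical, a single check moves the rate by several points, so small-n rows such as Automotive (n=1) deserve caution.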
Finding 2: Which of the 12 GEO pillars predict citations?
We evaluated all 12 GEO pillars to identify which ones most strongly predict AI citation likelihood. For each pillar, we split sites into "high" (score ≥50%) and "low" (<50%) groups and compared citation rates. Three pillars emerged as clear winners: Answerability, Citation Quality, and Definitions. The remaining pillars showed weak, neutral, or even negative correlations.
| GEO Pillar | Direction | Interpretation |
|---|---|---|
| Answerability | Positive | Strongest positive predictor — direct, declarative content gets cited more often |
| Citation Quality | Positive | Sites that cite external sources earn more AI citations themselves |
| Definitions | Positive | Explicit "X is Y" definitions lift citation rates |
| Authority | Positive | Topic depth and internal linking provide a moderate positive edge |
| Freshness | Positive | Content recency provides a meaningful edge |
| Machine Readable | Positive | Modest positive signal — helpful but not a differentiator |
| Structure | Negative | Necessary but not sufficient — most sites already score well |
| Readability | Negative | Neutral-to-slightly-negative at this threshold — quality may be binary |
| E-E-A-T | Negative | Counter-intuitive negative — explored further in our E-E-A-T follow-up study |
| Question Coverage | Negative | Small sample — most sites lack FAQ content |
| Multimedia | Negative | Heavy multimedia may indicate less text for AI to parse |
Table 2. Citation lift by pillar. The top three pillars (Answerability, Citation Quality, Definitions) show consistent, meaningful positive lift. The bottom five pillars show no positive signal.
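The high/low split behind the table can be sketched as follows. This is an illustration under stated assumptions: pillar scores are expressed as percentages from 0 to 100, each group's rate is the mean of its per-site citation rates, and the function names, record layout, and sample sites are hypothetical.

```python
def mean_citation_rate(sites):
    """Mean citation rate (0-100) across site records, or None if empty."""
    if not sites:
        return None
    return sum(s["citation_rate"] for s in sites) / len(sites)


def pillar_lift(sites, pillar, threshold=50.0):
    """Compare citation rates for sites scoring high vs low on one pillar.

    Returns (high_rate, low_rate, direction). The direction is
    "Undetermined" when either group is empty.
    """
    high = [s for s in sites if s["pillars"][pillar] >= threshold]
    low = [s for s in sites if s["pillars"][pillar] < threshold]
    high_rate = mean_citation_rate(high)
    low_rate = mean_citation_rate(low)
    if high_rate is None or low_rate is None:
        direction = "Undetermined"
    elif high_rate > low_rate:
        direction = "Positive"
    elif high_rate < low_rate:
        direction = "Negative"
    else:
        direction = "Neutral"
    return high_rate, low_rate, direction


# Hypothetical site records, invented for illustration.
sites = [
    {"domain": "a.example", "pillars": {"answerability": 80}, "citation_rate": 100.0},
    {"domain": "b.example", "pillars": {"answerability": 65}, "citation_rate": 66.7},
    {"domain": "c.example", "pillars": {"answerability": 30}, "citation_rate": 33.3},
    {"domain": "d.example", "pillars": {"answerability": 10}, "citation_rate": 0.0},
]
print(pillar_lift(sites, "answerability"))
```

A single 50% threshold is easy to explain but discards information within each group; a rank correlation over the raw scores would be a natural cross-check.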
Follow-up 1: The negative E-E-A-T finding above was worth isolating from possible content-type confounds. We ran a controlled 2×2 study to test it (the E-E-A-T & content type follow-up).
Follow-up 2: Single-turn citation isn't the same as commercial outcome. We ran a 4-stage shopping-journey study across 40 ecommerce brands and updated the algorithm with a Recommendation Readiness Score (the ecommerce recommendation-survival study).
Cross-study synthesis: Six findings that held across all of our research, with practical implications for what to optimize.
Finding 3: Platform citation distribution
All three AI platforms cited sites at similar rates. ChatGPT cited 27 domains (44.3%), Claude cited 24 (39.3%), and Perplexity cited 23 (37.7%). This convergence suggests that the content signals AI platforms use for citation selection are similar — optimizing for one platform effectively optimizes for all of them.
Practical implication: You do not need to optimize separately for each AI platform. Content that is clear, well-cited, and directly answerable performs consistently across all three.
Algorithm Update: Rebalancing pillar weights
Phase 1 revealed which pillars matter and which don't. Phase 2 rebalanced the GEO scoring algorithm to reflect reality — increasing weight for the three proven predictors and reducing weight on pillars that showed no correlation with citations.
Finding 4: Algorithm rebalancing results
Based on Phase 1 findings, we rebalanced the GEO scoring algorithm — increasing weight on the pillars that demonstrably predict citations and reducing weight on those that don't. The rebalanced algorithm was then used to re-scan all sites and re-run citation checks. The change preserved the same set of pillars; only the weights moved, and the directional changes are summarized below.
| Pillar | Weight change (first → second algorithm version) |
|---|---|
| Answerability | Increased |
| Citation Quality | Increased |
| Readability | Increased |
| Definitions | Unchanged |
| Structure | Decreased |
| Machine Readable | Decreased |
| Authority | Unchanged |
Table 3. Direction of each pillar's weight change between the first and second versions of the GEO scoring algorithm. Specific weight values are not published.
What Phase 2 told us
The rebalanced algorithm improved individual pillar correlations. But we wanted to go further — rather than averaging 12 pillars where only 3 are strong predictors, we asked: what if we built a focused score from just the three pillars that matter most? That question led directly to Phase 3 and the Citation Readiness Score.
Validation: The Citation Readiness Score
Phase 2 improved individual pillar accuracy. Phase 3 went further — building a focused Citation Readiness Score from the three proven predictive pillars and testing it with a fresh round of 180+ citation checks. The results validated the approach.
Finding 5: The Citation Readiness Score — a focused predictor
Building on what Phases 1 and 2 taught us, we created a Citation Readiness (CR) Score using the three empirically proven predictive pillars: Answerability (40% weight), Citation Quality (35%), and Definitions (25%). The CR Score gives users a focused, actionable metric for the content signals they can directly improve.
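The stated weights suggest a weighted average of the three pillar scores. The sketch below assumes exactly that, with each pillar normalized to a 0-100 scale; the published weights are from the study, but the formula, the function name `citation_readiness`, and the sample scores are assumptions, since the exact calculation is not published.

```python
# Weights as stated in the study; everything else here is illustrative.
CR_WEIGHTS = {
    "answerability": 0.40,
    "citation_quality": 0.35,
    "definitions": 0.25,
}


def citation_readiness(pillar_scores):
    """Weighted average of the three pillars, each assumed to be 0-100."""
    missing = set(CR_WEIGHTS) - set(pillar_scores)
    if missing:
        raise KeyError(f"missing pillar scores: {sorted(missing)}")
    return sum(CR_WEIGHTS[p] * pillar_scores[p] for p in CR_WEIGHTS)


# Hypothetical pillar scores for one page.
score = citation_readiness(
    {"answerability": 80, "citation_quality": 60, "definitions": 40}
)
print(round(score, 1))
```

Because Answerability carries the largest weight, a 10-point gain there moves this hypothetical score by 4 points, versus 2.5 points for the same gain in Definitions.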
Citation rate by CR Score grade
Breaking the CR Score into four grade bands reveals a clear gradient — the higher the CR grade, the more likely a site is to be cited by AI platforms. The one exception is "Very Low" scoring slightly above "Low," which we attribute to small sample noise and brand override effects.
Figure 3. Citation rate by CR Score grade. High-grade sites (66.7%) are cited more than twice as often as Low-grade sites (30.0%). This is the gradient that the overall GEO score fails to produce.
| CR Score Grade | Citation Rate |
|---|---|
| High | 66.7% |
| Moderate | 50.9% |
| Low | 30% |
| Very Low | 38.1% |
Finding 6: Content type determines citation ceiling
Phase 1 revealed that industry matters more than any pillar. Phase 3 confirmed that this is fundamentally a content type problem. Informational content ("how to find a doctor," "what is compound interest") gets cited at dramatically higher rates than transactional content ("best CRM software," "best running shoes"). This finding led us to build content type detection into every scan.
Factual, how-to, and educational queries. Travel, healthcare, finance. AI confidently cites authoritative sources for factual answers.
Review and comparison queries. News, education, marketing. AI cites some sources but hedges on subjective elements.
"Best X" product queries. SaaS, ecommerce, gaming. AI avoids recommending specific products, preferring to list options without strong endorsement.
The practical implication: If you're a SaaS company with only product pages, your citation ceiling is roughly 13%. Creating educational, informational content — guides, definitions, how-to articles — can lift your ceiling to 50%+ because you're shifting the query type from transactional to informational.
Finding 7: How brand awareness and content quality work together
Brand recognition gives well-known sites a built-in advantage — AI has encountered them extensively in training data. But brand isn't destiny. Smaller companies with strong informational content regularly earn citations in our data, especially for specific, educational queries where expertise matters more than name recognition. The most effective AI visibility strategy combines GEO-optimized content with ongoing brand building through review sites, forums, and industry publications.
Established brands — cited across all platforms
| Domain | GEO Score | Industry | Why cited despite score? |
|---|---|---|---|
| tripadvisor.com | 18 | Travel | Category-defining brand, massive training data presence |
| kayak.com | 56 | Travel | Category leader, universally recognized in travel search |
| webmd.com | 72 | Healthcare | Dominant health information brand, decades of authority |
High score, 0% cited — brand isn't strong enough
| Domain | GEO Score | Grade | Industry | Why not cited? |
|---|---|---|---|---|
| peloton.com | 71 | D | Fitness | Product/transactional query in competitive category |
| airtable.com | 70 | D | SaaS | SaaS niche — AI cites larger competitors |
| postman.com | 70 | D | SaaS | Developer tool — AI prefers broader platform recommendations |
Site size and citation rates
Large sites were cited nearly twice as often as medium sites (49.2% vs 25.0%), despite having lower average GEO scores (79.2 vs 84.6). This confirms that brand size and training data presence are significant confounding variables. For mid-size and smaller companies, this makes content optimization even more critical — they cannot rely on brand recognition alone.
Most cited sites (100% citation rate)
These 10 sites were cited by all three AI platforms. They span a wide range of GEO scores — from TripAdvisor at 18 to Bankrate at 118 — reinforcing that citation is multifactorial. Note TripAdvisor's score of 18: this is a homepage with minimal text content, yet it gets cited 100% of the time because of overwhelming brand recognition and training data presence.
| Domain | GEO Score | Industry | Cited |
|---|---|---|---|
| bankrate.com | 118 | Finance | 3/3 |
| nerdwallet.com | 104 | Finance | 3/3 |
| legalzoom.com | 105 | Legal | 3/3 |
| coursera.org | 99 | Education | 3/3 |
| tripadvisor.com | 18 | Travel | 3/3 |
| kayak.com | 56 | Travel | 3/3 |
| webmd.com | 72 | Healthcare | 3/3 |
| onemedical.com | 80 | Healthcare | 3/3 |
| twilio.com | 87 | B2B | 3/3 |
| techcrunch.com | 87 | News & Media | 3/3 |
Sites with room to improve
These sites were not cited in our test queries. Most are product-focused pages that could improve by creating educational, informational content alongside their product pages. The opportunity: target informational queries where content quality differentiates, rather than relying solely on product pages where brand recognition dominates.
| Domain | GEO Score | Grade | Industry | Cited |
|---|---|---|---|---|
| khanacademy.org | 31 | F | Education | 0/3 |
| whoop.com | 56 | F | Fitness | 0/3 |
| hellofresh.com | 60 | F | Food & Bev | 0/3 |
| steampowered.com | 62 | F | Gaming | 0/3 |
| datadog.com | 63 | F | B2B | 0/3 |
| linear.app | 67 | F | SaaS | 0/3 |
| avvo.com | 69 | D | Legal | 0/3 |
| postman.com | 70 | D | SaaS | 0/3 |
| airtable.com | 70 | D | SaaS | 0/3 |
| peloton.com | 71 | D | Fitness | 0/3 |
What we built from this research
We didn't publish this study and move on. The findings changed the product. If the data shows that the overall GEO score is a weak predictor but specific pillars and content types matter enormously, the tool should reflect that reality.
Citation Readiness Score
A focused metric calculated from only the three empirically proven predictive pillars: Answerability, Citation Quality, and Definitions. High CR sites are cited substantially more often than low CR sites, and, unlike the overall GEO score, the CR Score gap points in the correct direction. It is now displayed prominently alongside your overall GEO score.
Content Type Detection
Every scan now detects whether your content is informational, transactional, or educational, and tells you what that means for your citation ceiling. In our data, informational content in travel and healthcare was cited 83.3% and 66.7% of the time, respectively, while transactional SaaS pages were cited 13.3% of the time. You'll get recommendations specific to your content type.
Industry Benchmarks
Your scan results now include expected citation rates based on our study data across 17 industries. Instead of an abstract score, you'll see how your site compares to others in your specific industry, because a 75% citation rate would be exceptional in SaaS (13.3% average in our data) but only matches the average in finance.
Key definitions
The following terms are used throughout this study. Clear definitions support accurate interpretation of findings.
Generative Engine Optimization (GEO)
GEO is the practice of optimizing web content so that AI search engines can understand, trust, and cite it. It encompasses content structure, definitions, authority signals, machine readability, and answerability.
Citation Readiness Score (CR Score)
A focused metric built from three empirically-proven predictive pillars: Answerability (40%), Citation Quality (35%), and Definitions (25%). Unlike the overall GEO score, the CR Score shows a meaningful positive correlation with actual AI citation rates (+4.6 gap, correct direction).
AI Citation
An AI citation occurs when an AI search platform (ChatGPT, Perplexity, Claude) references a website or brand in its generated response. Citations appear as inline mentions, source links, or direct recommendations.
GEO Score
A composite numerical score (0-150 with 12 pillars) measuring how well a web page is optimized for AI comprehension. Our research found this is a weak overall predictor of citations (-3.1 gap), which is why we developed the CR Score as a focused alternative.
Citation Rate
The percentage of AI platforms that cited a given domain when asked a relevant query. A site cited by all 3 platforms has a 100% citation rate. A site cited by 1 of 3 has a 33.3% rate.
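The definition above reduces to a small calculation per site. The sketch below is illustrative only; the function name and the dictionary layout are assumptions, and a platform missing from the input is treated as "not cited".

```python
PLATFORMS = ("ChatGPT", "Perplexity", "Claude")


def site_citation_rate(cited_by):
    """Percentage of the three platforms that cited the site.

    cited_by maps a platform name to True/False.
    """
    hits = sum(bool(cited_by.get(p, False)) for p in PLATFORMS)
    return round(100 * hits / len(PLATFORMS), 1)


# Hypothetical outcomes for two sites.
print(site_citation_rate({"ChatGPT": True, "Perplexity": True, "Claude": True}))
print(site_citation_rate({"ChatGPT": True, "Perplexity": False, "Claude": False}))
```

With three platforms, a single site can only land on 0%, 33.3%, 66.7%, or 100%, which is why the per-site tables show so few distinct values.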
Lift
The direction of the difference in citation rate between sites that score high on a pillar (≥50%) and sites that score low (<50%). Positive lift means high-scoring sites are cited more often than low-scoring sites.