Does adding images and video improve AI citation?

Not on its own. Our ecommerce recommendation study found pages with heavy multimedia were recommended less often by AI assistants, not more. Multimedia is neutral for informational content and a drag on commercial content unless every visual is mirrored in clear text.

Should I remove images and video from my site?

No. If multimedia serves the reader, keep it. The point is to stop adding multimedia expecting an AI citation lift, and to make sure every visual has a text equivalent the model can actually parse.

What kind of multimedia still helps for GEO?

Accessibility-driven multimedia. Alt text, captions, transcripts, and written summaries convert visual content into text an AI assistant can quote. Decorative multimedia without text equivalents is empty calories.

Multimedia and GEO: When It Helps, When It Hurts

The conventional advice says load up on images and video for AI visibility. Our recommendation-survival research suggests the opposite for commercial content, and a roughly neutral effect for informational content.

Updated June 2026

For years the advice on multimedia and Generative Engine Optimization has been: more is better. Add images. Embed video. Pad the page with diagrams. AI assistants will reward you with citations.

That guidance has not survived contact with our data. In our ecommerce recommendation-survival study, pages with heavy multimedia were recommended less often by AI assistants in the shopping journey, not more. We rewrote this guide to match what the research actually showed. See all our studies for the broader picture.

Key findings

For informational content, multimedia is neutral. Use it when it helps the reader, not for AI visibility.
For commercial content, heavy multimedia is associated with fewer AI recommendations, not more.
AI assistants extract from text. Multimedia that lacks text equivalents gives the model nothing to quote.
Accessibility-driven multimedia (alt text, captions, transcripts) is the version that actually helps.

The conventional advice doesn't hold

The standard pitch is that multimedia signals topical depth, engagement, and effort — all of which AI assistants supposedly reward. It is a clean story. It is also wrong for the part of the funnel that matters most to ecommerce: the recommendation.

When we measured which product pages AI assistants actually recommended in real shopping queries, multimedia load was inversely associated with recommendation strength. The pages that survived to the final answer were not the ones with the most video and imagery. They were the ones with the cleanest, most scannable, text-dense product information.

This is one of six cross-study patterns we keep seeing: content type sets the citation ceiling, and shopping content rewards different signals than informational content. Treating multimedia as a universal lever is the mistake.

Informational vs. commercial content

Informational content

Definitions, how-tos, condition guides, glossaries, explainers.

Multimedia is roughly neutral. Use it when it genuinely helps a reader: a diagram that clarifies a concept, a demo that shows the step, an annotated screenshot that saves a paragraph of prose.

Do not add multimedia expecting an AI citation lift. The text around it is what does the work.

Commercial content

Product pages, category pages, comparison pages, shopping content.

Heavy multimedia is associated with fewer recommendations. AI assistants appear to prefer clean, scannable, text-dense pages when answering "which one should I buy?"

Lead with text. Specs, materials, sizing, use cases, differentiators — all in writing. Imagery supports the buyer; it should not replace the description.

Why this might be happening

A working hypothesis, offered tentatively: AI assistants extract from text. Multimedia-heavy pages tend to carry less parseable text relative to their size: more pixels, fewer sentences. That leaves the model with less raw material to quote, summarize, or anchor a recommendation against.

A product page with eight hero images and a forty-word description gives the assistant almost nothing to work with. The same product page with crisp specs, clear use cases, and an honest pros-and-cons paragraph gives the assistant something to actually cite.

This lines up with another cross-study finding: brand recognition swamps page quality, but among pages of similar brand strength, text density and structural clarity are what differentiate the ones that get recommended from the ones that get skipped.

Where multimedia still earns its keep

The version of multimedia that holds up is the accessible kind. Alt text, captions, and transcripts convert visual content into text the model can parse — and the same text that helps screen-reader users helps AI assistants. Treat accessibility as the brief, not as an afterthought.

Descriptive alt text that explains what the image shows and why it matters
Full transcripts for videos and podcasts
Captions and chapter markers on demos and walkthroughs
Text summaries paired with infographics and charts

See AI Accessibility and GEO for the longer treatment.
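
As a rough illustration of the checklist above, the sketch below scans a page's HTML for images with no alt text and videos with no caption or chapter track. It is not the tooling used in our studies. The class and function names are hypothetical, and it assumes captions are attached with the standard HTML track element, so a transcript published as ordinary page text will not be detected. It also flags images with an intentionally empty alt attribute, which is a legitimate choice for purely decorative images.

```python
from html.parser import HTMLParser


class MediaTextAudit(HTMLParser):
    """Collects <img> tags lacking alt text and <video> tags lacking a <track>."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = []    # src values of images with empty or absent alt
        self.videos_missing_track = 0   # count of <video> elements with no <track> child
        self._video_has_track = None    # None while we are outside any <video>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            alt = (attrs.get("alt") or "").strip()
            if not alt:
                self.images_missing_alt.append(attrs.get("src") or "(no src)")
        elif tag == "video":
            self._video_has_track = False
        elif tag == "track" and self._video_has_track is not None:
            self._video_has_track = True

    def handle_endtag(self, tag):
        if tag == "video" and self._video_has_track is not None:
            if not self._video_has_track:
                self.videos_missing_track += 1
            self._video_has_track = None


def audit_media(html: str) -> MediaTextAudit:
    """Parse an HTML string and return the populated audit object."""
    parser = MediaTextAudit()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only.
    sample = (
        '<img src="shoe-side.jpg" alt="Blue trail shoe, side view">'
        '<img src="shoe-sole.jpg">'
        '<video src="setup.mp4"></video>'
    )
    report = audit_media(sample)
    print("Images missing alt text:", report.images_missing_alt)
    print("Videos without a track:", report.videos_missing_track)
```

Anything the script reports is a candidate for review, not an automatic failure: a human still has to judge whether the alt text explains why the image matters.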

Decorative multimedia is empty calories

The patterns below look polished to a human visitor but give the assistant nothing it can use. On commercial pages they may actively suppress recommendation strength.

Hero videos that autoplay over the answer the reader came for
Image galleries with no descriptive text around them
Embedded explainer videos with no transcript or written summary
Carousels and lightboxes that hide product details behind clicks
Replacing written specs with screenshots of specs

What to do instead

1. Don't strip multimedia that serves the user

If a video genuinely shows a buyer what they need to see, keep it. The point is to stop adding multimedia expecting an AI citation lift, not to gut your existing experience.

2. Audit text density on commercial pages

Open your top product pages. Strip the imagery in your head. Is there enough text left for an AI assistant to write a recommendation? If not, the page needs more written product information.
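
For a first pass across many pages, the same thought experiment can be approximated in code. The sketch below counts visible words and media elements in an HTML string. The names are hypothetical, the set of tags treated as media is an assumption, and the research above does not define a passing threshold, so the output is only useful for ranking your own pages against each other.

```python
from html.parser import HTMLParser

MEDIA_TAGS = {"img", "video", "iframe"}                   # assumption: what counts as media
SKIP_TAGS = {"script", "style", "noscript", "template"}   # content a reader never sees as text


class TextDensityCounter(HTMLParser):
    """Counts visible words and media elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.media = 0
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in MEDIA_TAGS:
            self.media += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words += len(data.split())


def text_density(html: str) -> dict:
    """Return word count, media count, and words per media element (None if no media)."""
    counter = TextDensityCounter()
    counter.feed(html)
    counter.close()
    ratio = counter.words / counter.media if counter.media else None
    return {"words": counter.words, "media": counter.media, "words_per_media": ratio}


if __name__ == "__main__":
    # Hypothetical product page fragment, for illustration only.
    page = (
        "<h1>Trail Shoe</h1>"
        '<img src="a.jpg"><img src="b.jpg">'
        "<p>Waterproof upper, 280 g, sizes 38 to 47.</p>"
    )
    print(text_density(page))
```

A low word count next to a high media count flags a page for a manual read. It does not prove the page is weak, since a short description can still be precise.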

3. Mirror every visual in text

If the image shows a spec, write the spec. If the video walks through setup, write the steps. Treat visuals as a second channel, not the only channel.

4. Invest in alt text and transcripts

This is the form of multimedia work that genuinely compounds. It helps real users, and it gives AI assistants something to extract.

5. Optimize the validated pillars first

Answerability, citation quality, and clear definitions move the needle in our studies. Multimedia is, at best, a wrapper around those.

The bigger picture

Multimedia is a wrapper around your text. It is not a substitute for it.

AI assistants quote what they can read. Pages that give the model strong, scannable text win recommendations. Pages that hide their substance behind imagery and video lose them — especially in the commercial moment when a buyer asks "which one should I get?"

Add multimedia for the reader. Add text for the AI assistant. Make sure those two audiences are looking at the same content.
