Multimedia and GEO: When It Helps, When It Hurts
The conventional advice says to load up on images and video for AI visibility. Our recommendation-survival research suggests the opposite for commercial content, and a roughly neutral effect for informational content.
Introduction
For years the advice on multimedia and Generative Engine Optimization has been: more is better. Add images. Embed video. Pad the page with diagrams. AI assistants will reward you with citations.
That guidance has not survived contact with our data. In our ecommerce recommendation-survival study, pages with heavy multimedia were recommended less often by AI assistants in the shopping journey, not more. We rewrote this guide to match what the research actually showed. See all our studies for the broader picture.
Key findings
- For informational content, multimedia is roughly neutral. Use it when it helps the reader, not for AI visibility.
- For commercial content, heavy multimedia is associated with fewer AI recommendations, not more.
- AI assistants extract from text. Multimedia that lacks text equivalents gives the model nothing to quote.
- Accessibility-driven multimedia (alt text, captions, transcripts) is the version that actually helps.
The conventional advice doesn't hold
The standard pitch is that multimedia signals topical depth, engagement, and effort — all of which AI assistants supposedly reward. It is a clean story. It is also wrong for the part of the funnel that matters most to ecommerce: the recommendation.
When we measured which product pages AI assistants actually recommended in real shopping queries, multimedia load was inversely associated with recommendation strength. The pages that survived to the final answer were not the ones with the most video and imagery. They were the ones with the cleanest, most scannable, text-dense product information.
This is one of six cross-study patterns we keep seeing: content type sets the citation ceiling, and shopping content rewards different signals than informational content. Treating multimedia as a universal lever is the mistake.
Informational vs. commercial content
Informational content
Definitions, how-tos, condition guides, glossaries, explainers.
Multimedia is roughly neutral. Use it when it genuinely helps a reader: a diagram that clarifies a concept, a demo that shows the step, an annotated screenshot that saves a paragraph of prose.
Do not add multimedia expecting an AI citation lift. The text around it is what does the work.
Commercial content
Product pages, category pages, comparison pages, shopping content.
Heavy multimedia is associated with weaker AI recommendations in our data. AI assistants appear to prefer clean, scannable, text-dense pages when answering "which one should I buy?"
Lead with text. Specs, materials, sizing, use cases, differentiators — all in writing. Imagery supports the buyer; it should not replace the description.
Why this might be happening
A working hypothesis, offered tentatively: AI assistants extract from text. Multimedia-heavy pages tend to carry less parseable text relative to their overall size: more pixels, fewer sentences. That leaves the model with less raw material to quote, summarize, or anchor a recommendation against.
A product page with eight hero images and a forty-word description gives the assistant almost nothing to work with. The same product page with crisp specs, clear use cases, and an honest pros-and-cons paragraph gives the assistant something to actually cite.
This lines up with another cross-study finding: brand recognition swamps page quality, but among pages of similar brand strength, text density and structural clarity are what differentiate the ones that get recommended from the ones that get skipped.
Where multimedia still earns its keep
The version of multimedia that holds up is the accessible kind. Alt text, captions, and transcripts convert visual content into text the model can parse — and the same text that helps screen-reader users helps AI assistants. Treat accessibility as the brief, not as an afterthought.
- Descriptive alt text that explains what the image shows and why it matters
- Full transcripts for videos and podcasts
- Captions and chapter markers on demos and walkthroughs
- Text summaries paired with infographics and charts
See AI Accessibility and GEO for the longer treatment.
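As a starting point for the alt-text item above, here is a minimal sketch of an audit that lists images with missing or blank alt attributes. It uses only Python's standard-library HTML parser. The function name `audit_alt_text` and the sample markup are illustrative assumptions, not part of our studies. Note that an empty `alt=""` is a legitimate choice for purely decorative images, so treat the output as a review list rather than a list of errors.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect <img> tags whose alt attribute is missing or blank."""

    def __init__(self):
        super().__init__()
        self.total_images = 0
        self.missing_alt = []  # src values of images with no usable alt text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total_images += 1
        attr_map = dict(attrs)
        # A valueless or absent alt attribute comes back as None.
        alt = attr_map.get("alt") or ""
        if not alt.strip():
            self.missing_alt.append(attr_map.get("src", "(no src)"))


def audit_alt_text(html):
    """Return (total image count, list of src values lacking alt text)."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.total_images, parser.missing_alt


# Hypothetical product-page fragment, for illustration only.
sample = """
<img src="boot-side.jpg" alt="Side view of the boot showing the waterproof seam">
<img src="boot-hero.jpg">
<img src="boot-sole.jpg" alt="">
"""
total, missing = audit_alt_text(sample)
print(f"{len(missing)} of {total} images need an alt-text review: {missing}")
```

The script only checks whether alt text exists. Whether it explains what the image shows and why it matters still needs a human read.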
Decorative multimedia is empty calories
The patterns below look polished to a human visitor but contribute nothing the assistant can use. On commercial pages they may actively suppress recommendation strength.
- Hero videos that autoplay over the answer the reader came for
- Image galleries with no descriptive text around them
- Embedded explainer videos with no transcript or written summary
- Carousels and lightboxes that hide product details behind clicks
- Replacing written specs with screenshots of specs
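Two of the patterns above can be spotted from markup alone: autoplaying videos, and videos with no caption or subtitle track. The sketch below flags both using Python's standard-library parser. The function name `audit_videos` and the sample markup are hypothetical. It only sees native `<video>` elements, so third-party embeds and off-page transcripts are outside what it can detect.

```python
from html.parser import HTMLParser


class VideoAudit(HTMLParser):
    """Record, per <video>, whether it autoplays and has a text track."""

    def __init__(self):
        super().__init__()
        self.findings = []    # one dict per <video> element
        self._current = None  # the <video> currently open, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "video":
            self._current = {
                "src": attr_map.get("src", "(nested source)"),
                "autoplay": "autoplay" in attr_map,
                "has_text_track": False,
            }
        elif tag == "track" and self._current is not None:
            # In HTML, a <track> with no kind attribute defaults to subtitles.
            if attr_map.get("kind", "subtitles") in ("captions", "subtitles"):
                self._current["has_text_track"] = True

    def handle_endtag(self, tag):
        if tag == "video" and self._current is not None:
            self.findings.append(self._current)
            self._current = None


def audit_videos(html):
    """Return a list of findings, one per <video> element in the markup."""
    parser = VideoAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


# Hypothetical page fragment, for illustration only.
sample = """
<video src="hero.mp4" autoplay muted loop></video>
<video src="setup-demo.mp4" controls>
  <track kind="captions" src="setup-demo.en.vtt" srclang="en">
</video>
"""
for video in audit_videos(sample):
    if video["autoplay"] or not video["has_text_track"]:
        print("Review:", video)
```

A flagged video is not automatically a problem. The point is to confirm that whatever it shows also exists as text on the page.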
What to do instead
1. Don't strip multimedia that serves the user
If a video genuinely shows a buyer what they need to see, keep it. The point is to stop adding multimedia expecting an AI citation lift, not to gut your existing experience.
2. Audit text density on commercial pages
Open your top product pages. Strip the imagery in your head. Is there enough text left for an AI assistant to write a recommendation? If not, the page is relying on visuals to carry information that needs to be written down.
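The same check can be roughed out in code. The sketch below counts visible words and media elements in a page's HTML and reports words per media element. This ratio is an illustrative proxy we are assuming here, not the metric used in our studies, and no threshold is implied. The function name `text_density` and the sample page are hypothetical. Alt text is not counted as visible words.

```python
from html.parser import HTMLParser


class TextDensityAudit(HTMLParser):
    """Count visible words and media elements in an HTML document."""

    MEDIA_TAGS = {"img", "video", "iframe"}
    SKIP_TAGS = {"script", "style"}  # their contents are not visible text

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.media_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.MEDIA_TAGS:
            self.media_count += 1
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.word_count += len(data.split())


def text_density(html):
    """Return (visible words, media elements, words per media element)."""
    parser = TextDensityAudit()
    parser.feed(html)
    parser.close()
    # Divide by at least 1 so media-free pages do not raise an error.
    ratio = parser.word_count / max(parser.media_count, 1)
    return parser.word_count, parser.media_count, ratio


# Hypothetical product page: eight images and a very short description.
sample = (
    "<h1>Trail Boot</h1>"
    + '<img src="boot.jpg">' * 8
    + "<p>Waterproof leather boot for day hikes.</p>"
)
words, media, ratio = text_density(sample)
print(f"{words} words, {media} media elements, {ratio:.1f} words per element")
```

Run it across your top product pages and sort by the ratio. The pages at the bottom are the ones to read first.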
3. Mirror every visual in text
If the image shows a spec, write the spec. If the video walks through setup, write the steps. Treat visuals as a second channel, not the only channel.
4. Invest in alt text and transcripts
This is the form of multimedia work that genuinely compounds. It helps real users, and it gives AI assistants something to extract.
5. Optimize the validated pillars first
Answerability, citation quality, and clear definitions move the needle in our studies. Multimedia is, at best, a wrapper around those.
The bigger picture
Multimedia is a wrapper around your text. It is not a substitute for it.
AI assistants quote what they can read. Pages that give the model strong, scannable text win recommendations. Pages that hide their substance behind imagery and video lose them — especially in the commercial moment when a buyer asks "which one should I get?"
Add multimedia for the reader. Add text for the AI assistant. Make sure those two audiences are looking at the same content.