Gemini Omni Prompt Templates for Multimodal Video

May 20, 2026

Gemini Omni prompts need more than a pretty description. The model family is built for any-input video creation, so a good prompt should name the source material, the visible action, the physical behavior, and the next edit you expect to make.

The Gemini Omni prompt card

Goal: what should this clip prove?
Inputs: text, image, video, audio, or style references.
Subject: who or what must stay recognizable?
Action: one visible movement or transformation.
Camera: one camera behavior.
Physics: gravity, reflection, liquid, light, timing, or material constraints.
Audio or rhythm: optional cue for pacing.
Next edit: what should be changed if this draft is close?
Review: what makes the output usable?
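One way to keep the card consistent across drafts is to store it as a small data structure and render it to text. The sketch below is a minimal illustration, not an official Gemini Omni API: the `PromptCard` class, its field names, and the `render` method are hypothetical names chosen to mirror the card above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PromptCard:
    """Hypothetical container mirroring the prompt card fields above."""
    goal: str
    inputs: List[str]
    subject: str
    action: str
    camera: str
    physics: str
    next_edit: str
    review: str
    audio: Optional[str] = None  # the card marks audio or rhythm as optional

    def render(self) -> str:
        """Return the card as labeled lines, skipping empty optional fields."""
        fields = [
            ("Goal", self.goal),
            ("Inputs", ", ".join(self.inputs)),
            ("Subject", self.subject),
            ("Action", self.action),
            ("Camera", self.camera),
            ("Physics", self.physics),
            ("Audio or rhythm", self.audio),
            ("Next edit", self.next_edit),
            ("Review", self.review),
        ]
        return "\n".join(f"{label}: {value}" for label, value in fields if value)


card = PromptCard(
    goal="Show that the travel camera is durable and portable",
    inputs=["text"],
    subject="compact travel camera",
    action="droplets slide down the lens ring",
    camera="slow push-in",
    physics="water follows gravity on a wet stone surface",
    next_edit="increase contrast without changing the product",
    review="do the first two seconds read as durable and portable?",
)
print(card.render())
```

Keeping one action and one camera behavior per card, as the fields suggest, makes it easier to see which line changed between two drafts.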

Example: prompt-only scene

Create a 6-second vertical product reveal for a compact travel camera. The camera rests on a wet stone surface at sunrise. A slow push-in begins as droplets slide down the lens ring. Keep the camera body crisp, the motion realistic, and the background soft. Review: does the first two seconds make the product feel durable and portable?

Example: reference-led remix

Use @image1 as the product shape and @video1 as the camera rhythm. Preserve the logo position, color, silhouette, and front button layout. Add a soft light sweep and a small rotation. Do not introduce hands or text overlays. If the result is close, the next edit should increase contrast without changing the product.
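A reference-led prompt only works if every reference it names is actually attached. The sketch below checks that before a prompt is submitted. It assumes references follow the `@image1` / `@video1` pattern shown in the example; the `@audio` prefix, the `missing_references` function, and the file names are illustrative assumptions, not documented Gemini Omni behavior.

```python
import re
from typing import Dict, List

# Assumed token shape: "@" + media kind + index, e.g. "@image1" or "@video1".
# The "audio" kind is an assumption added for illustration.
REFERENCE_PATTERN = re.compile(r"@(?:image|video|audio)\d+")


def missing_references(prompt: str, attached: Dict[str, str]) -> List[str]:
    """Return reference tokens used in the prompt that have no attached file."""
    missing: List[str] = []
    for token in REFERENCE_PATTERN.findall(prompt):
        if token not in attached and token not in missing:
            missing.append(token)
    return missing


prompt = "Use @image1 as the product shape and @video1 as the camera rhythm."
attached = {"@image1": "camera_front.png"}  # hypothetical file name
print(missing_references(prompt, attached))
```

Here only `@image1` has a file attached, so the check reports `@video1` as missing.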

Repair weak outputs

Do not add ten adjectives after a weak generation. If identity drifted, strengthen the reference constraint. If the motion feels impossible, simplify the physics. If the scene lacks purpose, rewrite the goal and review line.
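The repair rules above amount to a lookup from symptom to the part of the prompt that should change. A minimal sketch, assuming three hypothetical symptom labels (`identity_drift`, `impossible_motion`, `no_purpose`) that a reviewer assigns by hand:

```python
# Hypothetical symptom labels mapped to the repair each one calls for.
REPAIRS = {
    "identity_drift": "Strengthen the reference constraint.",
    "impossible_motion": "Simplify the physics.",
    "no_purpose": "Rewrite the goal and review line.",
}


def suggest_repair(symptom: str) -> str:
    """Return the single targeted repair for a reviewed symptom."""
    if symptom not in REPAIRS:
        known = ", ".join(sorted(REPAIRS))
        raise ValueError(f"Unknown symptom {symptom!r}; expected one of: {known}")
    return REPAIRS[symptom]


print(suggest_repair("identity_drift"))
```

Each symptom maps to exactly one change, which keeps a revision from turning into a pile of extra adjectives.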

Takeaway

A Gemini Omni prompt is a revision-ready production card: inputs, action, physical logic, and the next edit all belong in the same instruction.

Gemini Omni Team
