Gemini Native (Video)
Google Veo video generation model
Introduction
Veo is a multimodal video generation model from Google Vertex AI. It supports text-to-video (T2V), first-frame constraint, and first-last frame constraint (3.1 series only) for generating coherent videos. Call it through the MixRoute unified video interface: first submit a task to get a `task_id`, then query the task to poll its status and retrieve the result.
Authentication
Bearer token, e.g., `Bearer sk-xxxxxxxxxx`
Supported Models
| Model ID | Description |
|---|---|
| veo-3.0-fast-generate-001 | Text-to-video, first-frame mode, fast version (audio included by default) |
| veo-3.1-fast-generate-preview | Text-to-video, first-frame/first-last frame mode, fast version |
| veo-3.0-generate-preview | Text-to-video, first-frame mode |
| veo-3.1-generate-preview | Text-to-video, first-frame/first-last frame mode |
Call Flow
- Submit Task: `POST /v1/video/generations`, pass `model`, `prompt`, and Veo-specific parameters.
- Poll Status: `GET /v1/video/generations/{task_id}`, until `status` is `succeeded` or `failed`.
- Get Result: When successful, the `url` in the response contains the video data (Veo may return `data:video/mp4;base64,...` or an OSS link).
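The three steps above can be sketched as a small polling loop plus a helper for the inline result. This is a minimal sketch under stated assumptions, not MixRoute's client: `fetch_status` is a hypothetical callable standing in for the `GET /v1/video/generations/{task_id}` request, the response is assumed to be a JSON object with top-level `status` and `url` fields, and the polling interval and attempt limit are arbitrary choices.

```python
import base64
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> Dict:
    """Call fetch_status(task_id) until the task reaches a terminal status.

    fetch_status is a hypothetical stand-in for the GET request; it is
    assumed to return the parsed JSON body as a dict with a "status" key.
    """
    for attempt in range(max_polls):
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


def video_bytes_from_data_uri(url: str) -> bytes:
    """Decode a data:video/mp4;base64,... value into raw MP4 bytes."""
    prefix = "data:video/mp4;base64,"
    if not url.startswith(prefix):
        # Anything else is presumably a link that has to be downloaded instead.
        raise ValueError("not a base64 video data URI; treat it as a download link")
    return base64.b64decode(url[len(prefix):])
```

A caller would check `task["status"]` after `poll_task` returns, since the loop stops on `failed` as well as `succeeded`, and only then inspect `task["url"]`.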
Veo-Specific Parameters
- Video generation prompt describing the scene and actions.
- Video duration (seconds), supported values: 4, 6, 8.
- Aspect ratio, only supports: 16:9, 9:16.
- Resolution: 720p, 1080p.
- First frame reference image (URL or Base64), for image-to-video/first-frame constraint.
- Last frame reference image (supported only by the `veo-3.1` series), works with the first frame for first-last frame constraint.
- Whether to generate synchronized audio. Fast version models ignore this parameter and always include audio.
- Number of videos to generate per request, range 1-4.

Other parameters (`personGeneration`, `addWatermark`, `seed`): see Submit Video Task.
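The value constraints above can be checked on the client before a task is submitted. The sketch below is illustrative only: the Python argument names (`duration_s`, `aspect_ratio`, `resolution`, `has_last_frame`, `count`) are hypothetical and are not the API's field names, and detecting the 3.1 series by the `veo-3.1` model ID prefix is an assumption based on the Supported Models table.

```python
from typing import List

ALLOWED_DURATIONS = {4, 6, 8}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def check_veo_options(
    model: str,
    duration_s: int,
    aspect_ratio: str,
    resolution: str,
    has_last_frame: bool = False,
    count: int = 1,
) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if duration_s not in ALLOWED_DURATIONS:
        problems.append(f"duration must be 4, 6, or 8 seconds, got {duration_s}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect ratio must be 16:9 or 9:16, got {aspect_ratio}")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be 720p or 1080p, got {resolution}")
    # Assumption: 3.1-series models are recognized by their model ID prefix.
    if has_last_frame and not model.startswith("veo-3.1"):
        problems.append(f"last frame image requires a veo-3.1 model, got {model}")
    if not 1 <= count <= 4:
        problems.append(f"number of videos must be between 1 and 4, got {count}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every invalid option in one pass.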
Request Examples
- Submit Veo Task (Text-to-Video)
- Query Task Status
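The two requests listed above might be constructed as follows. This is a hedged sketch using only Python's standard library, not an official example: the base URL `https://api.example.com` is a placeholder, the token is assumed to travel in the standard `Authorization` header, the body carries only `model` and `prompt`, and the function names are hypothetical. Nothing is sent until a request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not the real MixRoute host


def submit_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that submits a text-to-video task."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/video/generations",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_request(api_key: str, task_id: str) -> urllib.request.Request:
    """Build the GET request that queries a task's status."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/video/generations/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

To send either request, open it with `urllib.request.urlopen(req)` and parse the body with `json.load`; where exactly `task_id` sits in the submit response is not specified here, so read it from the actual response before building the query request.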