AI Video & Image Generation API — Kling, Google, ByteDance, OpenAI

Unified API for 116+ AI models. Pay-per-use, no subscriptions. One integration for video, image, audio, and text generation.

Why Glio

Unified Interface — One API for all models. No need to integrate each provider separately.
Multi-provider Routing — Automatic failover and load balancing across providers.
Pay-per-use — Token-based billing. 1 GL = $0.01 USD. No subscriptions, no minimum payment.
Simple Integration — Two endpoints: POST /v1/jobs to create, GET /v1/jobs/{id} to poll.

Models

Video Generation

Kling

Kling 3.0 supports single-shot and multi-shot video generation with element references, start/end frames, and advanced mode controls.

Kling 3.0 — Kling 3.0 with single-shot and multi-shot generation modes. API docs
Kling 3.0 Motion Control — Transfer motion from a reference video to a character image with Kling 3.0. API docs
Kling 2.6 Pro Text-to-Video — High-end cinematic video generation with native audio support. API docs
Kling 2.6 Pro Image-to-Video — High-end cinematic video from source image with native audio support. API docs
Kling 2.6 Motion Control — Transfer motion from a reference video to a character image. API docs
Kling 2.1 Standard Image-to-Video — Cost-effective video from images with the Standard tier. API docs
Kling 2.1 Pro Image-to-Video — High-quality video from images with the Pro tier. API docs
Kling 2.1 Master Text-to-Video — Premium quality video generation with the Master tier. API docs
Kling 2.1 Master Image-to-Video — Premium quality video from images with the Master tier. API docs
Kling 2.5 Turbo Image-to-Video — Fast video generation from source image. Optimized for physics and motion. API docs
Kling 2.5 Turbo Text-to-Video — Fast video generation optimized for physics and motion. No native audio. API docs
Kling AI Avatar Pro — Premium lip sync video generation from image and audio (1080p). API docs
Kling AI Avatar Standard — Lip sync video generation from image and audio (720p). API docs

Google

Google Veo 3.1 is Google DeepMind's state-of-the-art text-to-video model with both Quality and Fast generation modes. Ideal for filmmakers and content creators requiring realistic video from text descriptions. Generates videos at 16:9 or 9:16 aspect ratios with support for video extension chains.

Google Veo 3.1 — Google Veo 3.1 text-to-video generation with quality and fast modes. API docs
Google Veo 3.1 Image-to-Video — Google Veo 3.1 image-to-video generation. Use 1 image for dynamic extension, or 2 images for first/last frame transition. API docs
Google Veo 3.1 Reference-to-Video — Google Veo 3.1 reference-to-video generation. Generate videos based on 1-3 reference images. Fast mode only, 16:9 aspect ratio. API docs

ByteDance

Seedance 2 supports text-only generation, first- and last-frame guidance, and image, video, and audio references, all in a single unified KIE request.

Seedance 2 — ByteDance Seedance 2 video generation with optional start frame, end frame, image references, video references, and audio references. API docs
Seedance 2 Fast — ByteDance Seedance 2 Fast video generation for lower-cost, faster multimodal drafts and iterations. API docs
Seedance 1.5 Pro Text-to-Video — High-quality text-to-video generation with optional audio. API docs
Seedance 1.5 Pro Image-to-Video — High-quality image-to-video generation with optional audio. Supports 1-2 input images. API docs
Seedance 1.0 Pro Text-to-Video — High-quality text-to-video generation with full parameter control. API docs
Seedance 1.0 Pro Image-to-Video — High-quality image-to-video generation with full parameter control. API docs
Seedance 1.0 Pro Fast Image-to-Video — Fast video generation from images with simplified parameters. Limited to 720p/1080p. API docs
Seedance 1.0 Lite Text-to-Video — Budget-friendly text-to-video generation with good quality. API docs
Seedance 1.0 Lite Image-to-Video — Budget-friendly image-to-video with start/end image support. API docs

OpenAI

Sora 2 is OpenAI's breakthrough text-to-video AI capable of creating realistic scenes from text descriptions. Perfect for content creators and marketers needing quick video generation. Generates 10-15 second clips in portrait or landscape orientation with character consistency support.

Sora 2 T2V — OpenAI Sora 2 text-to-video generation. API docs
Sora 2 I2V — OpenAI Sora 2 image-to-video generation. API docs

Alibaba

HappyHorse 1.0 Text-to-Video generates video from text prompts using Kie.ai's HappyHorse family. It supports five aspect ratios, 720p or 1080p resolution, integer duration control from 3 to 15 seconds, and optional seeded generation for reproducible outputs.

HappyHorse 1.0 Text-to-Video — Cinematic text-to-video generation with 3-15s duration and up to 1080p output. API docs
HappyHorse 1.0 Image-to-Video — Animate a source image into 3-15s video with up to 1080p output. API docs
HappyHorse 1.0 Reference-to-Video — Generate video from prompt plus ordered reference images, with up to 1080p output. API docs
HappyHorse 1.0 Video-to-Video Edit — Edit an input video with text instructions, optional references, and up to 1080p output. API docs

xAI

Grok Imagine Text-to-Video generates videos from text prompts using xAI's video generation model with multiple creative modes. Perfect for social media content, animated scenes, and dynamic visual storytelling. Features Fun, Normal, and Spicy generation modes with multiple aspect ratio options.

Grok Imagine Text-to-Video — Generate videos from text prompts using Grok Imagine. API docs
Grok Imagine Image-to-Video — Generate videos from images using Grok Imagine. API docs

Midjourney

Midjourney Video (SD) creates standard definition video animations from source images. Cost-effective solution for social media content and rapid video prototyping. Offers batch sizes of 1-4 videos with adjustable motion intensity and all standard Midjourney controls.

Midjourney Video (SD) — Generate standard definition video from images using Midjourney AI. API docs
Midjourney Video (HD) — Generate high definition video from images using Midjourney AI. API docs

Hailuo

Hailuo 02 Standard Image-to-Video is MiniMax's budget-friendly animation model with duration and resolution control. Ideal for cost-effective video production and social content. Supports 6-10 second clips at 512p or 768p with end frame option.

Hailuo 02 Standard Image-to-Video — Budget-friendly image-to-video with duration and resolution control. API docs
Hailuo 02 Pro Text-to-Video — High-quality text-to-video generation with 1080p output. API docs
Hailuo 02 Pro Image-to-Video — High-quality image-to-video with start/end frame support. API docs
Hailuo 02 Standard Text-to-Video — Budget-friendly text-to-video generation with duration control. API docs
Hailuo 2.3 Standard Image-to-Video — Latest Hailuo 2.3 Standard with 1080p support at budget price. API docs
Hailuo 2.3 Pro Image-to-Video — Latest Hailuo 2.3 Pro with 1080p support and improved quality. API docs

Luma

Luma Ray 2 is Luma Labs' video-to-video AI model that transforms existing footage based on text prompts. Perfect for creative video editing, style transfer, and modifying video content without re-shooting. Accepts MP4/MOV/AVI input videos up to 10 seconds and 500MB, with optional watermark support.

Luma Ray 2 — Modify video using AI based on text prompt. Max 10s input video. API docs

Runway

Runway Gen-3 Alpha is Runway's advanced text-to-video model creating 5-10 second cinematic clips. Designed for filmmakers and video professionals requiring high-quality AI-generated footage. Offers 720p and 1080p HD output with 16:9, 9:16, 1:1, 4:3, and 3:4 aspect ratios plus video extension capability.

Runway Gen-3 — Runway Gen-3 Alpha text-to-video generation with 5-10 second clips. API docs
Runway Gen-3 — Runway Gen-3 Alpha image-to-video generation that animates a source image. API docs

Wan

Wan 2.7 Text-to-Video by Alibaba generates videos from text prompts with optional audio synchronization, five aspect ratios, and integer duration control from 2 to 15 seconds. Supports Chinese and English prompts at 720p or 1080p.

Wan 2.7 Text-to-Video — High-fidelity text-to-video with optional audio sync, 2-15s duration, up to 1080p. API docs
Wan 2.7 Image-to-Video — Animate a starting image, optionally with a last-frame keyframe or video continuation. API docs
Wan 2.6 Text-to-Video — High-quality video generation from text prompts with 5-15 second duration. API docs
Wan 2.6 Image-to-Video — Generate videos from images with 5-15 second duration. API docs
Wan 2.7 Reference-to-Video — Generate video guided by up to 5 reference images and/or videos, with optional voice. API docs
Wan 2.2 Speech-to-Video — Generate lip-synced video from image and audio. API docs
Wan 2.6 Video-to-Video — Transform videos with text prompts, 5-10 second duration. API docs
Wan 2.2 Animate Move — Transfer motion from video to image. API docs
Wan 2.2 Turbo Text-to-Video — Fast video generation from text with acceleration options. API docs
Wan 2.2 Animate Replace — Replace character in video with image. API docs
Wan 2.2 Turbo Image-to-Video — Fast video generation from image with acceleration options. API docs

Topaz

Topaz Video Upscale delivers AI-powered video enhancement, upscaling footage up to 4x resolution with exceptional detail preservation. Ideal for restoring old video content, improving low-resolution clips, and preparing videos for 4K displays. Supports MP4, MOV, and MKV formats with intelligent artifact removal.

Topaz Video Upscale — AI-powered video upscaling up to 4x resolution. API docs

LTXV

LTXV-2 is Lightricks' flagship text-to-video and image-to-video AI model delivering high-quality video generation with integrated audio support. Best for creating professional marketing content, social media videos, and visual storytelling with resolutions up to 4K (2160p). Supports 6-10 second clips at 25 or 50 fps with optional starting frame for precise creative control.

LTXV-2 — High-quality video generation with audio and up to 4K resolution. API docs
LTXV-2 Fast — Fast video generation with audio and up to 4K resolution. API docs

Image Generation

Google

Nano Banana 2 (Gemini 3.1 Flash) is Google's fast text-to-image model with accurate text rendering, character consistency, and optional Google Search grounding for real-time reference awareness. Outputs up to 4K resolution with 15 aspect ratio options including ultra-wide formats.

Nano Banana 2 (Gemini 3.1 Flash) Text-to-Image — Fast image generation with text rendering and Google Search grounding. API docs
Nano Banana 2 (Gemini 3.1 Flash) Image-to-Image — Fast image editing with multi-image input and Google Search grounding. API docs
Nano Banana Pro (Gemini 3 Pro) Text-to-Image — High-quality image generation with sharp structural accuracy and precise text rendering. API docs
Nano Banana Pro (Gemini 3 Pro) Image-to-Image — Image editing with inpainting, outpainting, and style transfer capabilities. API docs
Google Imagen 4 — Google Imagen 4 text-to-image with Standard, Fast, and Ultra quality tiers. API docs

ByteDance

Seedream 5.0 Lite is ByteDance's budget-friendly text-to-image model offering solid quality at lower cost. Supports multiple aspect ratios with 2K to 3K resolution output. Great for rapid prototyping and high-volume image generation where cost efficiency matters.

Seedream 5.0 Lite Text-to-Image — Budget-friendly Seedream 5.0 Lite text-to-image from ByteDance. API docs
Seedream 5.0 Lite Edit — Budget-friendly Seedream 5.0 Lite image editing from ByteDance. API docs
Seedream 4.5 Text-to-Image — Latest Seedream 4.5 with premium image quality from ByteDance. API docs
Seedream 4.5 Edit — Latest Seedream 4.5 image editing with premium quality from ByteDance. API docs
Seedream v4 Text-to-Image — ByteDance Seedream v4 with enhanced image quality and resolution options. API docs
Seedream v4 Edit — ByteDance Seedream v4 image editing with enhanced quality and resolution options. API docs
Seedream v3 Text-to-Image — High-quality image generation from ByteDance Seedream v3. API docs

OpenAI

GPT Image 2 Image-to-Image by OpenAI transforms and edits images using natural language instructions. Accepts up to 16 input images per request with fast generation (~3s) and flexible aspect ratios (auto, 1:1, 9:16, 16:9, 4:3, 3:4) plus 1K, 2K, and 4K resolution options for batch editing workflows.

GPT Image 2 Text-to-Image — Generate images from text prompts using GPT Image 2. API docs
GPT Image 2 Image-to-Image — Transform and edit images with text prompts using GPT Image 2. API docs
GPT Image 1.5 Text-to-Image — Generate photorealistic images from text prompts using GPT Image 1.5. API docs
GPT Image 1.5 Image-to-Image — Transform and edit images with text prompts using GPT Image 1.5. API docs

xAI

Grok Imagine Text-to-Image is xAI's photorealistic image generation model creating high-quality visuals from text prompts. Ideal for generating realistic photographs, product mockups, and lifelike scenes. Supports multiple aspect ratios including 1:1, 2:3, 3:2, 16:9, and 9:16 formats.

Grok Imagine Text-to-Image — Generate high-quality photorealistic images from text prompts using Grok Imagine. API docs
Grok Imagine Image-to-Image — Transform images with text prompts using Grok Imagine. API docs

Midjourney

Midjourney is the industry-leading AI image generator known for stunning artistic and photorealistic outputs. Ideal for concept artists, designers, and creative professionals seeking distinctive visual styles. Supports versions v5.1 through v7 plus Niji 6 for anime, with extensive customization via stylization, weirdness, and variety controls.

Midjourney — High-quality artistic image generation with multiple version support. API docs
Midjourney Image-to-Image — Transform images using Midjourney AI with style and content preservation. API docs
Midjourney Style Reference — Generate images using another image's style as reference. API docs
Midjourney Omni Reference — Place characters and objects from reference image into new scenes. API docs

Qwen

Qwen-Image-2.0 is Alibaba's unified image model featuring realistic generation, structured text rendering, and native 2K output. Excels at text-heavy visuals including infographics, posters, comics, and PPT slides. Supports both English and Chinese text integration with high visual fidelity.

Qwen Image 2 Text-to-Image — Qwen Image 2 generation with native 2K output and text rendering. API docs
Qwen Image 2 Edit — Natural language image editing with Qwen Image 2. API docs
Qwen Text-to-Image — Qwen image generation from text prompts. API docs
Qwen Image-to-Image — Transform images with text prompts. API docs
Qwen Image Edit — Edit images with natural language instructions. API docs

Flux

FLUX.2 Pro is Black Forest Labs' premium text-to-image model delivering high-quality image generation from detailed prompts. Ideal for professional artwork, marketing visuals, and creative projects requiring exceptional fidelity. Supports 1K and 2K resolutions with multiple aspect ratios including square, portrait, and landscape formats.

Flux 2 Pro Text-to-Image — High-quality text-to-image generation with Flux 2 Pro. API docs
Flux 2 Pro Image-to-Image — Transform and edit images with Flux 2 Pro. API docs
Flux 2 Flex Text-to-Image — Flexible text-to-image generation with Flux 2 Flex. API docs
Flux 2 Flex Image-to-Image — Flexible image transformation with Flux 2 Flex. API docs

Topaz

Topaz Image Upscaler uses industry-leading AI enhancement technology to upscale images up to 8x their original resolution. Best for restoring old photos, preparing images for large-format printing, and enhancing low-resolution graphics. Preserves fine details and textures while eliminating compression artifacts and noise.

Topaz Image Upscaler — AI-powered image upscaling with Topaz technology. Upscale images up to 8x resolution. API docs

Ideogram

Ideogram V3 Reframe intelligently expands images to different aspect ratios with AI-generated fill that seamlessly matches the original content. Essential for repurposing images across social media formats and adapting content for different platforms. Supports multiple output sizes with Turbo, Balanced, and Quality rendering speeds.

Ideogram V3 Reframe — Reframe images to different aspect ratios with AI-generated fill. API docs
Ideogram Character — Generate images featuring a character from a reference photo. API docs
Ideogram Character Edit — Edit images with character-consistent inpainting using a mask. API docs
Ideogram Character Remix — Remix an image with a consistent character from a reference photo. API docs

Recraft

Recraft Crisp Upscale enhances image resolution using AI-powered upscaling that preserves sharpness and detail. Perfect for enlarging images for print, improving low-resolution assets, and preparing visuals for high-DPI displays. Accepts PNG, JPG, and WEBP images up to 10MB.

Recraft Crisp Upscale — AI-powered crisp image upscaling by Recraft. API docs
Recraft Remove Background — AI-powered background removal by Recraft. API docs

Audio Generation

Suno

Suno Music is the flagship AI music generation model from Suno capable of creating complete songs with vocals and instrumentals from text prompts. Perfect for content creators, game developers, and musicians needing royalty-free original music. Features custom mode for full creative control, multi-track output, stem separation, WAV/MP4 export, and MIDI conversion with V4, V4.5, and V5 model options.

Suno Music — AI-powered music generation with multi-track output and custom style support. API docs
Suno Add Vocals — Add vocal singing to existing audio files. API docs
Suno Add Instrumental — Add instrumental accompaniment to existing audio. API docs
Suno Lyrics — AI-powered lyrics generation with multiple variations. API docs
Suno Sounds — Generate sound effects, ambient audio, and instrument samples from text. API docs

ElevenLabs

ElevenLabs TTS Multilingual v2 is a premium text-to-speech model from ElevenLabs featuring 21 natural-sounding voices across multiple languages. Perfect for audiobook production, video narration, and accessibility applications. Offers adjustable stability, similarity, style, and speed controls for fine-tuned voice output.

ElevenLabs TTS Multilingual v2 — High-quality multilingual text-to-speech with 21 voices and adjustable voice settings. API docs
ElevenLabs TTS Turbo v2.5 — Fast, low-latency text-to-speech with language enforcement support. API docs
ElevenLabs Sound Effects v2 — AI-generated sound effects from text descriptions with looping and format options. API docs

Text Generation

Google

Gemini 3.1 Pro is Google's latest flagship reasoning model with multimodal vision, tool calling, configurable reasoning effort, and streaming support.

Gemini 3.1 Pro — Google's latest flagship reasoning model with enhanced capabilities. API docs
Gemini 3 Pro — Google flagship reasoning model for complex tasks. API docs
Gemini 3 Flash — Google next-gen fast model with reasoning and vision. API docs
Gemini 2.5 Pro — Google advanced reasoning model with extended thinking. API docs
Gemini 2.5 Flash — Google fast reasoning model with vision and tool support. API docs

OpenAI

GPT-5.5 is OpenAI's most capable model for complex professional work, with reasoning effort control, vision input, tool calling, and long context.

GPT-5.5 — OpenAI frontier reasoning and chat model. API docs
GPT-5.2 — Latest OpenAI reasoning and chat model. API docs

Anthropic

Claude Opus 4.8 is Anthropic's most capable model for coding, debugging, code review, and longer-running agentic workflows, with 128K output and vision support.

Claude Opus 4.8 — Anthropic flagship model for advanced reasoning and agentic workflows. API docs
Claude Opus 4.7 — Anthropic flagship model for advanced reasoning and agentic workflows. API docs
Claude Opus 4.6 — Anthropic flagship model for advanced reasoning and agentic workflows. API docs
Claude Sonnet 4.6 — Fast, capable Anthropic model for everyday coding and reasoning tasks. API docs
Claude Opus 4.5 — Anthropic flagship model for advanced reasoning. API docs

Pricing

Pay-per-use. 1 GL = $0.01 USD. No subscriptions, no monthly fees. Top up any amount.
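
The conversion rate above can be sketched as simple arithmetic. This is an illustrative Python snippet, not part of the Glio API: the helper names `gl_to_usd` and `usd_to_gl` are hypothetical, and per-model prices are not listed here, so only the stated rate of 1 GL = $0.01 USD is used.

```python
from decimal import Decimal

USD_PER_GL = Decimal("0.01")  # rate stated above: 1 GL = $0.01 USD


def gl_to_usd(gl):
    """Convert a whole number of GL to US dollars."""
    return Decimal(gl) * USD_PER_GL


def usd_to_gl(usd):
    """Return how many whole GL a dollar amount buys (fractions are dropped)."""
    return int(Decimal(usd) / USD_PER_GL)


print(gl_to_usd(250))    # 2.50
print(usd_to_gl("10"))   # 1000
```

Passing dollar amounts as strings keeps the arithmetic exact; binary floats such as 0.1 would introduce rounding error.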

View all pricing →

Quick Start

Base URL: https://api.glio.io

Auth: send the header Authorization: Bearer YOUR_API_KEY with every request

Create job: POST /v1/jobs with {"model": "model-alias", "params": {...}}
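
A minimal sketch of the create-job request in Python, using only the standard library. The base URL, path, auth header, and body shape come from the lines above. The model alias `example-model`, the `prompt` parameter, the `Content-Type` header, and the helper name `build_create_job_request` are assumptions for illustration. The response format is not described here, so the sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.glio.io"


def build_create_job_request(api_key, model, params):
    """Build (but do not send) a POST /v1/jobs request."""
    body = json.dumps({"model": model, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/jobs",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: the API expects a JSON body to be declared as such.
            "Content-Type": "application/json",
        },
    )


# "example-model" and "prompt" are placeholders, not real aliases or parameters.
request = build_create_job_request(
    "YOUR_API_KEY", "example-model", {"prompt": "a red kite"}
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit the job; which response field carries the job id should be checked against the API reference.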

Poll status: GET /v1/jobs/{id}
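
Polling can be sketched as a bounded loop. The job response schema is not given on this page, so the `status` field and the terminal values `completed` and `failed` are assumptions, as are the helper name `poll_job` and its defaults. The HTTP call is passed in as a function so the example runs offline.

```python
import time


def poll_job(fetch_job, job_id, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is any callable that performs GET /v1/jobs/{id} and returns
    the decoded JSON body as a dict.
    """
    terminal = {"completed", "failed"}  # assumed values; check the API reference
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")


# Stand-in for a real HTTP call: first poll is still running, second is done.
_responses = iter([{"status": "processing"}, {"status": "completed"}])
result = poll_job(lambda job_id: next(_responses), "job-123", sleep=lambda s: None)
print(result["status"])  # completed
```

Capping the number of attempts keeps a stuck job from blocking the caller forever; video jobs may need a longer interval or a higher cap than the defaults shown.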

API Documentation

/llms.txt — Compact API reference for AI/LLM agents (all models listed)
/openapi.json — Full OpenAPI 3.1 specification
/docs — Human-readable documentation
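
Because /openapi.json is a machine-readable specification, its operations can be enumerated with a few lines of Python. The sketch below is illustrative: `list_operations` is a hypothetical helper, and `sample_spec` is a hand-written fragment containing only the two endpoints named on this page, not the real specification.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec):
    """Return (METHOD, path) pairs from an OpenAPI document given as a dict."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items can also hold non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hand-written fragment for illustration; the real /openapi.json has more detail.
sample_spec = {
    "openapi": "3.1.0",
    "paths": {
        "/v1/jobs": {"post": {}},
        "/v1/jobs/{id}": {"get": {}, "parameters": []},
    },
}
for method, path in list_operations(sample_spec):
    print(method, path)
```

To run this against the real file, load it with `json.load` and pass the resulting dict to `list_operations`.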