Multi-Agent Orchestrator
The iPulse Multi-Agent Video Pipeline represents a shift from static generative models to an active, collaborative intelligence ecosystem. Instead of a user writing prompts manually, fact-checking details, and stitching clips by hand, a team of six AI agents collaborates through structured tasks to build complete, cinematic video productions from simple descriptions.
Meet The Creative Crew
Each agent has an opinionated backstory, a distinct goal, and a specific toolset. Agents use @-mentions to hand work to one another along the workflow chain:
Idea Generator (@Bully)
Acts as the creative director. Brainstorms scroll-stopping hooks, format parameters, and emotional tone. Directs the workflow and determines whether historical research or statistics are required to back up the visual direction.
Research Analyst (@Raffa)
The skeptical thinker. Fact-checks user claims, queries real-time databases, and builds factual outlines. If the brief is purely creative, it steps aside to conserve API quota and signals the screenwriter to start immediately.
Copywriter & Screenwriter (@Monker)
Perfectionist screenwriter. Translates the outline or concept into structured screenplays with distinct scene cues and voiceover lines. Constrains word counts precisely to fit the generated video durations.
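The word-count constraint can be illustrated with a small sketch. It assumes a narration pace of 150 words per minute, which is an illustrative figure and not a documented pipeline setting; the function names `word_budget` and `fit_voiceover` are hypothetical.

```python
import math

# Assumed narration pace; not a documented pipeline value.
WORDS_PER_MINUTE = 150


def word_budget(duration_seconds: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Return the maximum number of voiceover words that fit in a clip."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return math.floor(duration_seconds * wpm / 60)


def fit_voiceover(line: str, duration_seconds: float,
                  wpm: int = WORDS_PER_MINUTE) -> str:
    """Trim a voiceover line to the word budget of its scene."""
    words = line.split()
    return " ".join(words[: word_budget(duration_seconds, wpm)])
```

Under that assumed pace, an 8-second scene allows 20 words. A real screenwriter agent would more likely rewrite an over-long line than truncate it; truncation is used here only to keep the sketch short.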
Art Director (@Intruder)
Visual stylist. Breaks the script into 5-10 second chunks, specifies camera panning directives, lighting configurations, and color palettes, and writes cinematic prompts optimized for Grok Imagine Video. Ensures character seeds match if a reference image is present.
Media Director (@Tupac)
Pragmatic engineer. Executes the video generation API calls. Seeds Scene 1 from the reference image, sequentially runs the Flow extension to append further scenes, logs rendering times, and resolves failures automatically.
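The seed-then-extend loop with automatic recovery might look roughly like the sketch below. Only `generate_first_clip(image_path)` is named in this document; `extend_clip`, `with_retries`, and `render_scenes` are hypothetical names, and both generation functions are passed in as callables so the sketch makes no claim about the real video API.

```python
import time
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], max_attempts: int = 3,
                 delay_seconds: float = 0.0) -> T:
    """Run call(), retrying on any exception up to max_attempts times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def render_scenes(
    image_path: str,
    later_prompts: List[str],
    generate_first_clip: Callable[[str], str],
    extend_clip: Callable[[str, str], str],  # hypothetical: (previous clip, prompt)
    max_attempts: int = 3,
) -> Tuple[List[str], List[float]]:
    """Seed Scene 1 from the image, then extend one scene at a time."""
    clips: List[str] = []
    timings: List[float] = []

    start = time.perf_counter()
    clips.append(with_retries(lambda: generate_first_clip(image_path), max_attempts))
    timings.append(time.perf_counter() - start)

    for prompt in later_prompts:
        previous = clips[-1]
        start = time.perf_counter()
        clips.append(with_retries(
            lambda p=previous, q=prompt: extend_clip(p, q), max_attempts))
        timings.append(time.perf_counter() - start)  # logged render time per scene

    return clips, timings
```

Each scene is generated only after the previous one succeeds, so a clip that fails every attempt stops the run instead of producing a sequence with a gap.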
Editor & QA Specialist (@Sam)
Strict post-production compiler. Audits rendering outputs for continuity errors. Stitches the clips in sequence using MoviePy, overlays the neural TTS audio track on the video, and exports the final MP4.
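The @-mention handoffs used by the crew above could be resolved with something as simple as the following sketch. The handles and roles come from the roster; the `CREW` table, the regular expression, and `mentioned_agents` are illustrative assumptions about how messages might be parsed, not the pipeline's actual message format.

```python
import re
from typing import List

# Handles and roles as listed in the crew roster.
CREW = {
    "Bully": "Idea Generator",
    "Raffa": "Research Analyst",
    "Monker": "Copywriter & Screenwriter",
    "Intruder": "Art Director",
    "Tupac": "Media Director",
    "Sam": "Editor & QA Specialist",
}

MENTION = re.compile(r"@(\w+)")


def mentioned_agents(message: str) -> List[str]:
    """Return known crew handles mentioned in a message, in order, deduplicated."""
    found: List[str] = []
    for handle in MENTION.findall(message):
        if handle in CREW and handle not in found:
            found.append(handle)
    return found
```

For example, a message such as "Outline is ready, over to @Monker" would resolve to the screenwriter, while mentions of unknown handles are ignored.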
Orchestration Workflows
The agent crew automatically shifts roles depending on the user's intent:
Creative Mode
Optimized for cinematic stories and artistic briefs. @Bully outlines a fictional style, @Raffa skips fact-checking, and @Monker focuses on sensory narrative arcs and punchy visual details.
Research-backed Mode
Triggered for educational content, product statistics, or historical recaps. @Raffa runs web searches to lock down verified numbers and trending details, structuring a rigid outline before @Monker begins writing.
Image-to-Video Mode
Locks character and scene consistency using a reference image. @Intruder targets Scene 1 specifically to match the style/actor details, and instructs @Tupac to use `generate_first_clip(image_path)` to seed the temporal animation.
Verbatim Mode
Used when the user provides an exact audio file or script text. @Monker may not alter the voiceover words: the input is used exactly as given, and @Monker directs only the visual styling.
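The four modes above can be read as a routing decision over the user's brief. The sketch below is one way to express that; the `Brief` fields, the mode labels, and the idea that Image-to-Video and Verbatim can stack on top of Creative or Research-backed are all assumptions made for illustration, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Brief:
    """Hypothetical summary of what the user supplied."""
    description: str
    needs_facts: bool = False              # educational, statistical, historical
    reference_image: Optional[str] = None  # path to a reference image, if any
    verbatim_script: Optional[str] = None  # exact script text, if any


def select_modes(brief: Brief) -> List[str]:
    """Pick a base mode, then add the modifiers the brief calls for."""
    modes = ["research-backed" if brief.needs_facts else "creative"]
    if brief.reference_image:
        modes.append("image-to-video")
    if brief.verbatim_script:
        modes.append("verbatim")
    return modes


def active_agents(brief: Brief) -> List[str]:
    """List the agents that do work; @Raffa steps aside for creative briefs."""
    agents = ["Bully", "Raffa", "Monker", "Intruder", "Tupac", "Sam"]
    if not brief.needs_facts:
        agents.remove("Raffa")
    return agents
```

A purely creative brief with a reference image would then resolve to `["creative", "image-to-video"]` with @Raffa left out of the chain.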