Commissioning AI Video: A Guide for Fintech Marketers

Why this matters if you’re commissioning video

Let’s say you’re planning your next brand video.

Maybe it’s a campaign piece. Maybe it’s something designed to stop the scroll on social.

Either way, you want it to feel bigger than your typical product explainer.

The kind of idea that might normally involve:

multiple locations
a cast of characters
a proper shoot

The kind of thing that quickly starts to look like a commercial. And with that, a commercial-level budget.

So you start looking at AI video.

Not because you’re trying to replace production entirely, but because it feels like a way to get closer to that kind of idea without the same overhead.

You need to brief an agency, but you don’t quite know what’s possible yet. And that makes it harder to shape the idea.

Because the creative direction depends on the constraints. And with AI video, those constraints aren’t always obvious.

They’re defined by the technology.

Introducing: the technology.

As with any kind of generative AI, the models sit at the centre of the creative process.

Tools like Runway, Google Veo, and Kling AI. These are what create the footage itself.

You give them a prompt, or an image, and they generate a short video clip. Each one excels at certain types of content.

But on their own, they’re not designed to build a full piece.

They don’t “remember” what came before, and they don’t automatically keep a character consistent across shots or maintain a coherent world unless you actively guide them to.

As such, you need a solid process around the models: one that keeps things consistent and lets you iterate across the life of a project.

What happens when you need to return to a sequence from draft 1, but you’ve lost all the references from it?

Instead of starting at the video clip level, a more robust process starts by generating reference images to act as anchors.

Things like characters, settings, styles, even types of shot and movement.

They give the models something consistent to work from, rather than generating something new each time.

From there, the video is built one shot at a time.

Each shot is generated individually, often several times: you get close, adjust the prompt, and try again.

The same references are fed back in to keep things aligned, but it still requires direction. You’re guiding the output, not controlling it completely.

Platforms like Imagine.Art or Flora AI can help manage this.
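
To make the bookkeeping behind this process concrete, here's a minimal sketch in Python of the kind of record worth keeping, whether a platform does it for you or not. It isn't how any particular tool works. The names (Project, Shot, Attempt), the prompts, and the file paths are all made up for illustration, and nothing here calls a real video model. It simply shows reference images being shared across shots, every attempt being logged with its prompt, and the whole thing being saved so a sequence from draft 1 can be found again later.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Attempt:
    prompt: str             # the exact prompt used for this generation
    clip_file: str          # where the resulting clip was saved (hypothetical path)
    draft: int              # which draft of the project this attempt belongs to
    approved: bool = False  # set to True for the take that makes the edit


@dataclass
class Shot:
    name: str
    references: list[str]   # names of the reference images this shot relies on
    attempts: list[Attempt] = field(default_factory=list)

    def approved_clip(self, draft: int) -> str | None:
        """Return the approved clip for a given draft, or None if there isn't one."""
        for attempt in self.attempts:
            if attempt.draft == draft and attempt.approved:
                return attempt.clip_file
        return None


@dataclass
class Project:
    references: dict[str, str] = field(default_factory=dict)  # name -> image file
    shots: list[Shot] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the whole project so it can be stored alongside the clips."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> Project:
        """Rebuild a project from previously saved JSON."""
        raw = json.loads(text)
        shots = [
            Shot(s["name"], s["references"], [Attempt(**a) for a in s["attempts"]])
            for s in raw["shots"]
        ]
        return cls(references=raw["references"], shots=shots)


# Illustrative usage with invented names and paths.
project = Project(
    references={"founder": "refs/founder.png", "office": "refs/office.png"}
)
opening = Shot("opening", references=["founder", "office"])
opening.attempts.append(
    Attempt("Founder walks into office, slow push-in", "clips/opening_v1.mp4", draft=1)
)
opening.attempts.append(
    Attempt(
        "Founder walks into office, slow push-in, morning light",
        "clips/opening_v2.mp4",
        draft=1,
        approved=True,
    )
)
project.shots.append(opening)

# Save, reload, and recover the approved take from draft 1.
restored = Project.from_json(project.to_json())
print(restored.shots[0].approved_clip(draft=1))  # prints clips/opening_v2.mp4
```

The point isn't the code itself. It's that every clip can be traced back to the prompt and references that produced it, which is what makes returning to an earlier draft possible.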

Where it all comes together

At this point, you’ve got a great set of clips, but they’re still just fragments.

Everything still needs tying together in an edit.

The sequence is built. Timing is refined. Shots are reordered, trimmed, sometimes replaced entirely.

You’ll layer in things like:

Music
Voiceover (AI-generated too, why not?!)
Sound design
Graphics or UI overlays

And then the finishing touches.

Colour, transitions, subtle effects. The kind of details that make all of the material feel cohesive and part of the same production.

Where AI video works well, and where it doesn’t

So now that you know how an AI-generated video is constructed, what are the best types of project to apply this process to?

In some ways, the sky’s the limit!

One of the best things about this process is that your creative ideas don’t have to be limited by a budget.

Think:

Brand campaigns
Abstract storytelling
Social content based around humour

The kind of ideas that might involve multiple locations, stylised environments, or moments that would be difficult to shoot practically.

You could even lean into the AI creation of the video itself, and make that part of the story. Coinbase took the opposite approach to this in their most recent ad!

However, there are still some clear limitations.

Consistency doesn’t happen by default, and pushing prompts or scenarios too far can produce weird results that are tricky to refine without the footage looking unnatural.

You can guide the output, but not dial things in with complete precision.

That's why UI walkthroughs and product explainers - basically any motion design work - are still out of reach with these tools.

So, should you use it?

The more useful question is probably where it fits. AI video isn’t a shortcut to a finished piece.

It’s a different way of getting there.

Less time setting up production
More time shaping outputs
More iteration, less predictability

That shift has an interesting side effect.

Ideas that might have felt too big or risky start to become easier to explore.

And not just because they’re cheaper to produce, but because they’re easier to prototype too - you can get something in front of people earlier and test the direction before committing.

Used well, it doesn’t just change how things get made.

It changes what gets made in the first place.