TOOLS

Higgsfield, Nano Banana 2 and the AI video stack we actually use

Model by model, what each one does in a beauty production and why we chose it.

June 2026 · 7 min read · For Creative Directors, Content Teams, Brand Managers


Every studio working in AI content production has a stack. Most share it vaguely. We prefer to be specific about what we use, which model does what, and why the choices matter for beauty specifically. Higgsfield sits at the centre of our video production. Here is how it actually works in practice.

Why Higgsfield is where our video production lives

Higgsfield is a multi-model AI generation platform with four video models, two image generation paths and a Marketing Studio designed explicitly for ad production. What sets it apart for beauty is identity handling. Seedance 2.0 holds a character's likeness across frames well enough that product interaction (the pour, the application, the skin close-up) reads as credible rather than uncanny. That identity lock is the technical requirement most beauty brands care about, and it is where most competing platforms still stumble.

We use Higgsfield alongside Flux.1 Pro for stills and ComfyUI for batch workflows, and each tool handles a different layer of production. Higgsfield handles the motion and the ad creative. Flux.1 Pro adds quality at the still-image layer, and ComfyUI adds volume.

Nano Banana 2 for editorial stills and packaging context

Nano Banana 2 is the top-tier image model on Higgsfield. For beauty production it earns its place in two specific situations: editorial stills that need 4K resolution and sharp material rendering, and any image that requires readable text inside the frame, such as a pack in context, a shelf shot with product copy visible, or an ingredient callout with a label. Most AI image models render text as decoration. Nano Banana 2 renders it as text. For beauty packaging work, that distinction removes an entire retouching step.

The model rewards specificity. Generic prompts produce generic output. A prompt that specifies the light source direction, the surface material, the lens focal length and the precise composition coordinates produces something a creative director can actually use. We treat Nano Banana 2 prompts the way we treat retouching briefs: every variable named, nothing left to interpretation.
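One way to enforce "every variable named" is to make the brief a structure that refuses to produce a prompt while any field is blank. The Python below is a minimal sketch under that assumption: StillBrief, its field names and the prompt wording are our own illustrative choices, not a Higgsfield or Nano Banana 2 API, and the output is plain text to paste into whichever interface you generate from.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StillBrief:
    """Hypothetical structured brief: every variable named, nothing implied."""

    subject: str           # what is in frame
    light_source: str      # direction and quality of the key light
    surface_material: str  # what the product sits on or against
    lens: str              # focal length
    composition: str       # where the product sits in frame
    in_frame_text: str     # exact copy that must read on pack, or "none"

    def missing(self) -> list[str]:
        # Any blank field is a gap the model would otherwise fill generically.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_prompt(self) -> str:
        gaps = self.missing()
        if gaps:
            raise ValueError(f"brief incomplete: {', '.join(gaps)}")
        parts = [
            self.subject,
            f"Light: {self.light_source}",
            f"Surface: {self.surface_material}",
            f"Lens: {self.lens}",
            f"Composition: {self.composition}",
        ]
        if self.in_frame_text.strip().lower() != "none":
            # Quote the copy so it reads as text to render, not as styling.
            parts.append(f'Text in frame, exactly: "{self.in_frame_text}"')
        return ". ".join(parts) + "."


# Illustrative usage with a made-up product.
brief = StillBrief(
    subject="Amber glass serum bottle",
    light_source="soft key from camera left at 45 degrees",
    surface_material="honed travertine",
    lens="100mm macro",
    composition="bottle centred in the lower third, label to camera",
    in_frame_text="VITAMIN C SERUM 30ml",
)
print(brief.to_prompt())
```

The point of raising an error rather than filling defaults is the same as with a retouching brief: a gap should stop the job, not be guessed at.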

Seedance 2.0 holds the identity. That matters in beauty.

Seedance 2.0 is the model we reach for when a specific person needs to appear consistently across a video sequence and the identity cannot drift. In beauty, that means founder content, talent-led campaigns and any product application video where the skin needs to read as a real person's skin rather than a generated approximation. The model accepts a reference image and holds the likeness through motion, which means a creative director can lock the talent direction in a still and carry it into video without re-establishing the character in each shot.

Prompt craft for Seedance comes down to concrete motion verbs. The model performs well when told "turns slowly toward camera" or "lifts hand to cheek in three seconds", and drifts when given abstract direction like "expressive" or "dynamic". The camera move and the subject motion are specified separately. Duration is set deliberately: we run three-second tests before committing to eight-second finals. The quality gap between a validated prompt and a first attempt is large enough to justify the cost of iterating.
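These Seedance rules can be written as a check that runs before anything is generated: subject motion and camera move kept in separate fields, abstract words rejected, and duration tied to whether the prompt has been validated. This is a hypothetical sketch in Python; MotionShot, ABSTRACT_TERMS and the prompt layout are illustrative assumptions, not Seedance 2.0 parameters, and the banned-word list holds only the two examples named above.

```python
from dataclasses import dataclass

# Abstract direction that tends to cause drift. Illustrative list only;
# extend it from your own failed generations.
ABSTRACT_TERMS = {"expressive", "dynamic"}

TEST_SECONDS = 3   # short test render while the prompt is unproven
FINAL_SECONDS = 8  # full-length final once the prompt is validated


@dataclass(frozen=True)
class MotionShot:
    subject_motion: str  # concrete verb phrase, e.g. "turns slowly toward camera"
    camera_move: str     # kept separate, e.g. "slow push-in"

    def lint(self) -> list[str]:
        """Return any abstract terms found in either motion field."""
        words = f"{self.subject_motion} {self.camera_move}".lower().split()
        return sorted({w.strip(".,;") for w in words} & ABSTRACT_TERMS)

    def prompt(self, validated: bool) -> str:
        problems = self.lint()
        if problems:
            raise ValueError(f"replace abstract direction: {problems}")
        seconds = FINAL_SECONDS if validated else TEST_SECONDS
        return (f"Subject: {self.subject_motion}. "
                f"Camera: {self.camera_move}. Duration: {seconds}s.")


# Illustrative usage: a test render first, the final only after sign-off.
shot = MotionShot("lifts hand to cheek in three seconds", "static close-up")
print(shot.prompt(validated=False))
```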

Kling 3.0 for multi-shot sequences and audio-driven content

Kling 3.0 handles the briefs where Seedance 2.0 is the wrong choice: multi-shot sequences where each shot carries a different camera move, audio-driven content where the visual needs to follow a voiceover or music beat, and motion-transfer work where an existing reference video provides the movement template. For a brand film that runs three distinct visual beats or a product reel cut to a track, Kling's multi-shot logic means the sequence can be described shot by shot in a single prompt rather than stitched from separate generations.

The tradeoff is identity preservation. Kling 3.0 is less strict about holding a specific face across shots than Seedance 2.0. For product-centred content where the hero is a bottle, a texture or an ingredient rather than a person, that tradeoff rarely matters. For talent-led content, Seedance is still the call.
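The choice between Seedance 2.0 and Kling 3.0 reduces to a short decision rule with identity checked first. The sketch below encodes only the criteria stated in this section; VideoBrief and choose_video_model are hypothetical names, and the "either" result for briefs that match no criterion reflects the point that the identity tradeoff rarely matters for product-centred work, not a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoBrief:
    talent_led: bool = False       # a specific person must stay recognisable
    multi_shot: bool = False       # a different camera move per shot
    audio_driven: bool = False     # the visual follows a voiceover or music beat
    motion_transfer: bool = False  # a reference video supplies the movement


def choose_video_model(brief: VideoBrief) -> str:
    # Identity outranks everything else: talent-led work stays on Seedance 2.0
    # even when the brief also has multi-shot or audio requirements.
    if brief.talent_led:
        return "Seedance 2.0"
    if brief.multi_shot or brief.audio_driven or brief.motion_transfer:
        return "Kling 3.0"
    # Product-centred, single-shot brief: the identity tradeoff rarely
    # matters, so the choice is made on other grounds.
    return "either"


print(choose_video_model(VideoBrief(multi_shot=True, audio_driven=True)))
```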

Marketing Studio for turning a product page into an ad

Higgsfield's Marketing Studio takes a product URL, a product image or an ad reference video and produces a formatted ad creative around it, with hooks and settings drawn from a library of proven ad mechanics. For beauty brands, this means a new SKU can have a UGC-style ad, a product review format and an unboxing cut within an hour of going live on a product page, with no talent booking, no shoot and no agency brief.

The workflow has three modes. URL-driven takes the product page and builds the creative around what the model extracts. Product entity mode starts from a hero product image supplied directly. Ad reference mode takes a reference TikTok or Reel and recreates the scenario with the brand's product inserted. We use ad reference mode most often for clients launching into a category where a proven creative format already exists in market: faster to iterate on what works than to discover it from scratch.

Soul Characters for recurring talent

When a brand needs the same person across ten or more generations, whether that is a founder avatar, a recurring campaign face or a creator character, we train a Higgsfield Soul Character. Training takes five to twenty reference images and runs in roughly ten minutes. The resulting Soul ID locks the identity for all subsequent generations: the same face, the same skin and the same features appear consistently across every image and video produced from that character. For brands building a long-term content library around a single talent, Soul training is the most cost-efficient decision in the workflow: one session, reusable indefinitely.

For one-off needs, we use Soul 2 in single-reference mode, which requires no training and produces a high-fidelity result from a single image. The distinction matters for budget planning: trained Soul for campaigns, one-shot Soul 2 for isolated productions.
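The Soul decision is mostly arithmetic on how often a face will recur. A minimal sketch, assuming the thresholds stated above (ten or more generations for training, five to twenty reference images); plan_character is a hypothetical helper for budget planning and does not call Higgsfield.

```python
TRAIN_THRESHOLD = 10        # "ten or more generations" justifies training
MIN_REFS, MAX_REFS = 5, 20  # a trained Soul takes five to twenty references


def plan_character(expected_generations: int, reference_images: int) -> str:
    """Pick a trained Soul Character or single-reference Soul 2."""
    if expected_generations >= TRAIN_THRESHOLD:
        # Recurring talent: one training session, reused for every generation.
        if not MIN_REFS <= reference_images <= MAX_REFS:
            raise ValueError(
                f"select {MIN_REFS}-{MAX_REFS} reference images for training, "
                f"got {reference_images}"
            )
        return "train Soul Character"
    # One-off need: no training, a single image is enough.
    if reference_images < 1:
        raise ValueError("Soul 2 single-reference mode needs one image")
    return "Soul 2 single-reference"


print(plan_character(expected_generations=40, reference_images=12))
```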

How the stack connects in practice

A typical Highlight London AI production cycle for a product launch runs through three phases. The concept and direction phase uses Midjourney and Flux for fast exploration of visual territory and mood reference, with Claude for prompt iteration and brief synthesis. The hero still phase uses Nano Banana 2 for pack-in-context images and Flux.1 Pro for skin-forward editorial. The video and ad phase uses Seedance 2.0 for talent-led content, Kling 3.0 for multi-shot brand film sequences and Marketing Studio for the ad formats. ComfyUI handles batch processing wherever the brief calls for volume: colourway variants, format families, market adaptations.

Each model has a defined role. The prompt library for each is maintained separately, anchored to the brand book, and version-controlled as the models update. That is the discipline that keeps the output coherent across a production that may run to two hundred assets. The stack is not the system. The system is what governs how the stack is used.
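A per-model library that records which model version each prompt was last validated against makes "version-controlled as the models update" concrete: when a model updates, the stale prompts are the ones to re-test. This is a hypothetical Python sketch; PromptLibrary, PromptEntry, the brand book label and the version strings are illustrative, not a real tool or real model versions.

```python
from dataclasses import dataclass, field


@dataclass
class PromptEntry:
    name: str
    text: str
    validated_on: str  # model version this prompt was last validated against


@dataclass
class PromptLibrary:
    """Hypothetical library: one per model, anchored to one brand book version."""

    model: str
    brand_book_version: str
    entries: dict[str, PromptEntry] = field(default_factory=dict)

    def add(self, entry: PromptEntry) -> None:
        # Re-adding a name replaces the old entry, keeping one live prompt per name.
        self.entries[entry.name] = entry

    def stale(self, current_model_version: str) -> list[str]:
        """Names of prompts not yet revalidated on the current model version."""
        return sorted(name for name, e in self.entries.items()
                      if e.validated_on != current_model_version)


# Illustrative usage with made-up version labels.
lib = PromptLibrary(model="Seedance", brand_book_version="brand-book-v3")
lib.add(PromptEntry("hand-to-cheek", "Subject: lifts hand to cheek.", "2.0"))
lib.add(PromptEntry("turn-to-camera", "Subject: turns slowly toward camera.", "2.1"))
print(lib.stale("2.1"))  # ['hand-to-cheek']
```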