
Seedance 2.0 Reference Input Guide

Use 1 to 3 reference images more effectively in Seedance 2.0 to lock identity, product geometry, and scene consistency.

Reference-to-video is not just “image-to-video with more files.” It is the mode you use when stability is the assignment.

In the current Seedance 2.0 workflow, the reference path is built around one to three reference images. That is enough for most use cases, as long as each image has a clear role.

When to choose reference-to-video

Use reference-to-video when your biggest risk is drift:

  • face identity changes
  • the product changes shape
  • hands break during interaction
  • wardrobe or props mutate
  • the scene loses continuity across retries

If you only need to animate a still image, image-to-video is simpler. Use reference-to-video when continuity matters more than simplicity.

Give each reference a job

The cleanest reference workflows usually assign roles like this:

| Reference | Best role |
|-----------|-----------|
| Image 1 | main identity or hero object |
| Image 2 | supporting angle, outfit, or product detail |
| Image 3 | optional color, environment, or secondary continuity cue |

Do not upload three images that all fight each other on pose, lighting, and styling. More references only help when they are aligned.
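One way to keep roles explicit is to label each image before uploading and check the set against the rules above. The sketch below is illustrative only: the `Reference` class, the role labels, and `validate_reference_set` are hypothetical names, not part of any Seedance API, and nothing here inspects the image files themselves.

```python
from dataclasses import dataclass

# Hypothetical role labels derived from the table above; not Seedance API values.
ROLES = ("identity", "support", "continuity")


@dataclass(frozen=True)
class Reference:
    path: str
    role: str


def validate_reference_set(refs):
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if not 1 <= len(refs) <= 3:
        problems.append(f"expected 1 to 3 references, got {len(refs)}")
    roles = [ref.role for ref in refs]
    for role in roles:
        if role not in ROLES:
            problems.append(f"unknown role: {role}")
    if roles.count("identity") != 1:
        problems.append("exactly one reference should be the identity or hero anchor")
    if len(set(roles)) != len(roles):
        problems.append("two references share a role; give each image its own job")
    return problems
```

Running the check before a generation attempt makes it harder to upload three images that all compete for the same job.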

What makes a strong reference set

Your references should agree on the things you want preserved:

  • same person or same product
  • compatible lighting logic
  • compatible styling
  • similar quality level

Your references should not disagree on:

  • age or face shape
  • product proportions
  • costume color
  • camera distance

Conflicting references make the model average them, which is where drift starts.
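A quick manual audit can catch disagreements before they reach the model. The following sketch assumes you describe each reference by hand as a small dictionary; the attribute keys and `find_conflicts` are hypothetical and simply mirror the lists above.

```python
# Attributes the guide says references must agree on. The keys are hypothetical
# labels filled in by hand; this code does not analyze image content.
MUST_MATCH = (
    "subject",
    "age",
    "face_shape",
    "product_proportions",
    "costume_color",
    "camera_distance",
)


def find_conflicts(references):
    """Map each attribute to the set of differing values found across references."""
    conflicts = {}
    for key in MUST_MATCH:
        values = {ref[key] for ref in references if key in ref}
        if len(values) > 1:
            conflicts[key] = values
    return conflicts


refs = [
    {"subject": "creator_a", "costume_color": "white", "camera_distance": "medium"},
    {"subject": "creator_a", "costume_color": "red", "camera_distance": "medium"},
]
print(find_conflicts(refs))  # only costume_color is reported
```

Any attribute that shows up in the result is a candidate for averaging, so replace or drop the conflicting image before generating.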

The best prompt structure for reference mode

In reference mode, the prompt order changes slightly so that the stability rule comes first:

  1. state the stability rule
  2. define the action
  3. define one camera move
  4. define style
  5. define constraints

Example:

@Image1 creator identity remains consistent, holds the skincare bottle near the face, subtle push-in, soft daylight beauty review setup, no face drift, no finger artifacts, no bottle shape change

The key difference is that consistency comes before atmosphere.
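If you assemble prompts programmatically, a small helper can enforce that order. This is a minimal sketch: `build_reference_prompt` and its parameter names are hypothetical, and it only joins strings with commas; it does not call any generation service.

```python
def build_reference_prompt(stability, action, camera, style, constraints):
    """Join prompt parts in the order: stability, action, camera, style, constraints."""
    parts = [stability, action, camera, style, ", ".join(constraints)]
    # Skip empty parts so an omitted style or constraint list leaves no stray comma.
    return ", ".join(part.strip() for part in parts if part and part.strip())


prompt = build_reference_prompt(
    stability="@Image1 creator identity remains consistent",
    action="holds the skincare bottle near the face",
    camera="subtle push-in",
    style="soft daylight beauty review setup",
    constraints=["no face drift", "no finger artifacts", "no bottle shape change"],
)
print(prompt)
```

Keeping the stability rule as the first required argument makes it awkward to write a prompt that leads with atmosphere.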

When reference mode is the better answer

Reference-to-video is usually the better choice when:

  • a creator must remain recognizable
  • a product demo depends on accurate shape
  • hands are touching the hero object
  • multiple retries need to stay on-brand
  • you are building a sequence from several short clips

This is especially relevant for:

  • UGC ads
  • beauty demos
  • packaging shots
  • fashion accessories
  • creator explainers

Common reference mistakes

Using references to solve a prompt problem

If the scene is vague, references will not save it. You still need:

  • one clear action
  • one camera instruction
  • one stable visual hierarchy

Using too many visual ideas in one shot

Reference mode is for protecting continuity, not for asking the model to do everything at once. Keep the shot narrow:

  • one action
  • one hero subject
  • one focal intention

Forgetting to protect hands

If hands are on screen, add explicit hand constraints to the negative prompt. Hand stability does not improve just because references exist.
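A simple guard can make the hand rule hard to forget. The sketch below uses hypothetical constraint phrases and a hypothetical helper, `with_hand_constraints`; adjust the wording to whatever works in your own negative prompts.

```python
# Hypothetical wording; these phrases are examples, not required keywords.
HAND_CONSTRAINTS = ["no finger artifacts", "no extra fingers", "no hand deformation"]


def with_hand_constraints(constraints, hands_on_screen):
    """Return constraints plus hand terms when hands are visible, without duplicates."""
    result = list(constraints)
    if hands_on_screen:
        for term in HAND_CONSTRAINTS:
            if term not in result:
                result.append(term)
    return result
```

For a shot where a creator holds a product, calling `with_hand_constraints(["no face drift"], True)` appends the three hand terms after the existing constraint.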

A simple reference workflow

For creators

  • Image 1: clear face and upper-body anchor
  • Image 2: product hold or outfit support
  • Prompt: one speaking or holding action only

For products

  • Image 1: clean hero packshot
  • Image 2: alternate angle or material detail
  • Prompt: one reveal or one hold, not a full mini-commercial

For characters

  • Image 1: identity anchor
  • Image 2: costume or silhouette support
  • Image 3: optional environment palette if it does not conflict

Failure modes and fixes

| Problem | Likely cause | First fix |
|---------|--------------|-----------|
| Face changes across frames | references conflict or action is too big | reduce pose change and use one clearer anchor |
| Product shape drifts | too much camera motion | simplify the move and name geometry preservation |
| Hands still break | the hand action is too expressive | use a simpler gesture and strengthen hand constraints |
| Output looks stiff | continuity rules are too dominant | keep the lock rule, then add one subtle motion layer |

When to switch back out of reference mode

If you notice that the output is stable but too stiff, that may mean the problem is no longer continuity. At that point:

  • switch to image-to-video if the first frame matters most
  • switch to text-to-video if you want broader creative exploration

Reference mode is strongest when protecting identity is the main objective.
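The mode decision described in this guide can be summarized as a short function. This is a sketch under stated assumptions: `choose_mode`, its two boolean inputs, and the returned strings are hypothetical labels for your own tooling, not Seedance API values.

```python
def choose_mode(continuity_is_main_risk, first_frame_matters_most):
    """Pick a generation mode from the guide's decision rules."""
    if continuity_is_main_risk:
        # Drift in identity, product shape, hands, or props is the biggest risk.
        return "reference-to-video"
    if first_frame_matters_most:
        # A single still needs animating and its framing should be preserved.
        return "image-to-video"
    # Neither applies, so broader creative exploration is acceptable.
    return "text-to-video"
```

Continuity is checked first on purpose: when drift is the main risk, it outweighs the simplicity of the other two modes.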