Seedance 2.0 Reference Input Guide

Use 1 to 3 reference images more effectively in Seedance 2.0 to lock identity, product geometry, and scene consistency.

Reference-to-video is not just “image-to-video with more files.” It is the mode you use when stability is the assignment.

In the current Seedance 2.0 workflow, the reference path is built around one to three reference images. That is enough for most use cases, as long as each image has a clear role.

When to choose reference-to-video

Use reference-to-video when your biggest risk is drift:

face identity changes
the product changes shape
hands break during interaction
wardrobe or props mutate
the scene loses continuity across retries

If you only need to animate a still image, image-to-video is simpler. Use reference-to-video when continuity matters more than simplicity.

Give each reference a job

The cleanest reference workflows usually assign roles like this:

Reference	Best role
Image 1	main identity or hero object
Image 2	supporting angle, outfit, or product detail
Image 3	optional color, environment, or secondary continuity cue

Do not upload three images that all fight each other on pose, lighting, and styling. More references only help when they are aligned.
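One way to make the role assignment explicit before uploading anything is to write it down as data and check it. The sketch below is only an illustration: the `Reference` class, the `build_reference_set` helper, and the file names are hypothetical and not part of any Seedance tooling. It enforces the one-to-three limit described above and rejects sets where two images claim the same job.

```python
from dataclasses import dataclass

MAX_REFERENCES = 3  # the guide describes one to three reference images


@dataclass(frozen=True)
class Reference:
    path: str  # hypothetical local file name
    role: str  # what this image is responsible for preserving


def build_reference_set(refs):
    """Check the count and make sure every image has a distinct, non-empty role."""
    if not 1 <= len(refs) <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} references, got {len(refs)}")
    roles = [ref.role.strip().lower() for ref in refs]
    if any(not role for role in roles):
        raise ValueError("every reference needs a role")
    if len(set(roles)) != len(roles):
        raise ValueError("two references share the same role")
    return list(refs)


reference_set = build_reference_set([
    Reference("creator_front.png", "main identity"),
    Reference("creator_outfit.png", "outfit support"),
])
```

A role check like this does not detect visual conflicts between images; it only forces you to decide what each image is for.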

What makes a strong reference set

Your references should agree on the things you want preserved:

same person or same product
compatible lighting logic
compatible styling
similar quality level

Your references should not disagree on:

age or face shape
product proportions
costume color
camera distance

Conflicting references make the model average them, which is where drift starts.
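If you tag each reference by hand before uploading, the "should not disagree" list can be turned into a quick pre-flight check. This is a minimal sketch under that assumption: the tag names and the `find_conflicts` helper are hypothetical, and the tags are whatever you write down yourself, not something the model reports.

```python
# Attributes the guide says references should not disagree on (tag names are made up).
CHECKED_KEYS = (
    "subject",
    "age",
    "face_shape",
    "product_proportions",
    "costume_color",
    "camera_distance",
)


def find_conflicts(tags_per_image, keys=CHECKED_KEYS):
    """Return {key: sorted distinct values} for every key where the images disagree."""
    conflicts = {}
    for key in keys:
        values = {tags[key] for tags in tags_per_image if key in tags}
        if len(values) > 1:
            conflicts[key] = sorted(values)
    return conflicts


hand_written_tags = [
    {"subject": "creator A", "costume_color": "red", "camera_distance": "close"},
    {"subject": "creator A", "costume_color": "blue", "camera_distance": "close"},
]
conflicts = find_conflicts(hand_written_tags)
```

Here the two images agree on subject and camera distance but disagree on costume color, so only that key is reported.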

The best prompt structure for reference mode

In reference mode, the prompt order changes slightly:

state the stability rule
define the action
define one camera move
define style
define constraints

Example:

@Image1 creator identity remains consistent, holds the skincare bottle near the face, subtle push-in, soft daylight beauty review setup, no face drift, no finger artifacts, no bottle shape change

The key difference is that consistency comes before atmosphere.
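The five-part order can be kept honest with a small helper that assembles the prompt from named parts. This is a sketch, not an official prompt format: `build_reference_prompt` is a hypothetical function, and joining the parts with commas is an assumption based on the example above.

```python
def build_reference_prompt(stability, action, camera, style, constraints):
    """Join prompt parts in the order: stability, action, camera, style, constraints."""
    parts = {
        "stability": stability,
        "action": action,
        "camera": camera,
        "style": style,
        "constraints": ", ".join(constraints),
    }
    empty = [name for name, text in parts.items() if not text.strip()]
    if empty:
        raise ValueError(f"missing prompt parts: {', '.join(empty)}")
    # dicts preserve insertion order, so the stability rule always comes first
    return ", ".join(text.strip() for text in parts.values())


prompt = build_reference_prompt(
    stability="@Image1 creator identity remains consistent",
    action="holds the skincare bottle near the face",
    camera="subtle push-in",
    style="soft daylight beauty review setup",
    constraints=["no face drift", "no finger artifacts", "no bottle shape change"],
)
```

Taking the parts as separate arguments makes it harder to slip in a second camera move or to lead with atmosphere instead of the stability rule.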

When reference mode is the better answer

Reference-to-video is usually the better choice when:

a creator must remain recognizable
a product demo depends on accurate shape
hands are touching the hero object
multiple retries need to stay on-brand
you are building a sequence from several short clips

This is especially relevant for:

UGC ads
beauty demos
packaging shots
fashion accessories
creator explainers

Common reference mistakes

Using references to solve a prompt problem

If the scene is vague, references will not save it. You still need:

one clear action
one camera instruction
one stable visual hierarchy

Using too many visual ideas in one shot

Reference mode is for protecting continuity, not for asking the model to do everything at once. Keep the shot narrow:

one action
one hero subject
one focal intention

For creators

Image 1: clear face and upper-body anchor
Image 2: product hold or outfit support
Prompt: one speaking or holding action only

For products

Image 1: clean hero packshot
Image 2: alternate angle or material detail
Prompt: one reveal or one hold, not a full mini-commercial

For characters

Image 1: identity anchor
Image 2: costume or silhouette support
Image 3: optional environment palette if it does not conflict

Failure modes and fixes

Problem	Likely cause	First fix
Face changes across frames	references conflict or action is too big	reduce pose change and use one clearer anchor
Product shape drifts	too much camera motion	simplify the move and state geometry preservation explicitly in the prompt
Hands still break	the hand action is too expressive	use a simpler gesture and strengthen hand constraints
Output looks stiff	continuity rules are too dominant	keep the lock rule, then add one subtle motion layer
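If you log retries, the table above can be kept next to your notes as a small lookup. The sketch below simply restates the table as data; the `FIRST_FIXES` mapping and the `first_fix` helper are hypothetical names, and the symptom strings must be matched by you, not detected automatically.

```python
# symptom -> (likely cause, first fix), copied from the troubleshooting table
FIRST_FIXES = {
    "face changes across frames": (
        "references conflict or action is too big",
        "reduce pose change and use one clearer anchor",
    ),
    "product shape drifts": (
        "too much camera motion",
        "simplify the move and name geometry preservation",
    ),
    "hands still break": (
        "the hand action is too expressive",
        "use a simpler gesture and strengthen hand constraints",
    ),
    "output looks stiff": (
        "continuity rules are too dominant",
        "keep the lock rule, then add one subtle motion layer",
    ),
}


def first_fix(symptom):
    """Return (likely cause, first fix) for a known symptom, or None if it is not listed."""
    return FIRST_FIXES.get(symptom.strip().lower())


cause, fix = first_fix("Product shape drifts")
```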

When to switch back out of reference mode

If the output is stable but too stiff, continuity may no longer be the problem. At that point:

switch to image-to-video if the first frame matters most
switch to text-to-video if you want broader creative exploration
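The mode choice described in this guide can be summarized as a short decision function. This is a sketch of the guide's reasoning, not a rule built into Seedance: `choose_mode` and its three flags are hypothetical, and both the priority order (drift risk first) and the fallback to image-to-video are assumptions.

```python
def choose_mode(drift_risk, first_frame_matters, wants_exploration):
    """Pick a generation mode from three yes/no questions about the shot."""
    if drift_risk:
        # identity, product geometry, or scene continuity is the main concern
        return "reference-to-video"
    if first_frame_matters:
        return "image-to-video"
    if wants_exploration:
        return "text-to-video"
    # assumption: animating a still is the simplest default
    return "image-to-video"


mode = choose_mode(drift_risk=False, first_frame_matters=False, wants_exploration=True)
```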

Reference mode is strongest when protecting identity is the main objective.