DeepSeek V4 Video Documentation

Seedance 2.0 Image Input Guide

Learn when to use image-to-video, how to prepare first and last frames, and how to protect product or scene fidelity.

Image-to-video is the best Seedance 2.0 mode when the first frame already exists and your real job is adding motion without losing the original composition.

That makes it especially useful for:

  • product hero shots
  • poster-to-motion tests
  • still-image campaign assets
  • packshots and tabletop scenes
  • environment shots with a strong starting frame

When image-to-video is the right choice

Choose image-to-video when at least one of these is true:

  • the frame composition is already approved
  • the product silhouette must stay recognizable
  • you need motion, but not a brand new scene design
  • the first frame carries the most value

If identity lock across the full clip matters more than animating the first frame, switch to the Reference Input Guide instead.

Use the first frame as the anchor

Your uploaded image is not just inspiration. It is the structural anchor of the shot.

The strongest first-frame images usually have:

  • one obvious subject
  • readable silhouette
  • clean separation from the background
  • stable lighting direction
  • minimal clutter around the hero object

For product work, keep the object large enough in frame that labels, materials, and edges are actually visible.

When to add a last frame

In the current workflow, image-to-video can use a first frame and optionally a last frame. That is useful when you already know how the shot should resolve.

Use a last frame when:

  • the ending composition is critical
  • you want a before/after or open/closed state
  • the shot should move from one approved layout to another

Do not add a last frame just because it is available. If the first and last frames are visually too far apart, the in-between motion often breaks.
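To make the first-frame-only versus first-plus-last-frame decision concrete, here is an illustrative sketch of the two job setups. The field names (`mode`, `first_frame`, `last_frame`, `prompt`) are hypothetical placeholders for this guide, not a documented Seedance API:

```python
# Hypothetical job payloads -- field names are illustrative only,
# not the real Seedance 2.0 request schema.

# Default case: a single approved first frame, motion described in the prompt.
job_first_only = {
    "mode": "image-to-video",
    "first_frame": "packshot_closed.png",
    "prompt": "@Image1 perfume bottle on dark marble, slow macro dolly-in",
}

# Last frame added only because the ending layout is also approved
# (an open/closed state change between two known compositions).
job_with_last = {
    "mode": "image-to-video",
    "first_frame": "packshot_closed.png",
    "last_frame": "packshot_open.png",
    "prompt": "@Image1 cap lifts smoothly off the bottle, static camera",
}
```

The point of the sketch: `last_frame` is simply absent in the default case, and when it is present, both frames should be close enough in composition that the in-between motion stays plausible.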

A reliable prompt pattern for image input

Start with the uploaded image, then describe the motion:

@Image1 [subject], [single motion layer], [camera move], [lighting/style], [constraints]

Example:

@Image1 perfume bottle on dark marble, droplets slide down the glass, slow macro dolly-in, luxury studio contrast, no label blur no cap drift no extra objects

This works because the frame is already defined. Your prompt should focus on how the still image comes alive, not on redesigning the whole scene.
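The five-slot pattern above can be sketched as a small helper that assembles the prompt string. This is an illustrative convenience function written for this guide, not part of any official tooling:

```python
# Illustrative helper: assembles a prompt following the pattern
#   @Image1 [subject], [single motion layer], [camera move], [lighting/style], [constraints]
def build_image_prompt(subject, motion, camera, style, constraints):
    """Join the five slots into one prompt string anchored to @Image1.

    Empty slots are dropped so partial prompts stay clean.
    """
    parts = [subject, motion, camera, style, constraints]
    return "@Image1 " + ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_image_prompt(
    subject="perfume bottle on dark marble",
    motion="droplets slide down the glass",
    camera="slow macro dolly-in",
    style="luxury studio contrast",
    constraints="no label blur no cap drift no extra objects",
)
print(prompt)
```

Keeping each slot to a single idea (one subject, one motion layer, one camera move) is what makes the pattern reliable; the helper just enforces the ordering.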

How to prepare stronger source images

For products

  • keep one hero product per image
  • use sharp source files
  • avoid heavy reflections that already hide the label
  • simplify props unless they are essential to the shot

For people

  • use a clean face angle
  • avoid cropped hands if hands will be part of the motion
  • prefer one lighting setup, not mixed light directions

For environments

  • keep horizon lines and architectural edges clean
  • avoid busy frames with many competing moving elements

What kinds of motion work best

Image-to-video usually performs better with:

  • push-ins
  • slow pull-backs
  • restrained orbits
  • controlled tracking
  • subtle environmental motion

It usually performs worse when you ask it to invent:

  • complex choreography
  • large pose changes
  • major perspective jumps
  • multiple subject interactions

Typical failure modes and first fixes

Problem | Usual cause | First fix
Product shape warps | the requested move is too aggressive | slow the move and keep one hero object
Label becomes unreadable | too many reflections or particles | simplify the scene and reinforce label constraints
Motion feels flat | prompt only describes the object, not the shot | add one camera move and one motion cue
Frame-to-frame weirdness | first and last frames conflict too much | remove the last frame or narrow the transition
Background starts melting | the scene has too many secondary elements | simplify props and keep the focus tight

When image-to-video beats text-only generation

Image-to-video is usually the better choice when:

  • the client already approved a packshot
  • the ad needs to match a still campaign
  • product geometry is more important than scene invention
  • you are working from catalog, PDP, or lookbook assets

That is why many e-commerce tests should start with image-to-video, not text-to-video.

Practical iteration rules

When a clip fails, fix in this order:

  1. simplify the motion
  2. simplify the frame
  3. strengthen the negative prompt
  4. only then change the source image

Most teams change the image too early. In practice, the bigger problem is usually that the shot request is trying to do too much.
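The fix order above can be encoded as a simple triage helper, so each failed attempt maps to exactly one next step. This is a sketch written for this guide, not part of any shipped tool:

```python
# The recommended fix order from this guide, cheapest change first.
FIX_ORDER = [
    "simplify the motion",
    "simplify the frame",
    "strengthen the negative prompt",
    "change the source image",
]

def next_fix(failed_attempts):
    """Return the fix to try after the given number of failed attempts.

    Returns None once the standard fixes are exhausted, which usually
    means the shot request itself is trying to do too much.
    """
    if failed_attempts < len(FIX_ORDER):
        return FIX_ORDER[failed_attempts]
    return None

print(next_fix(0))
```

Note that changing the source image is deliberately last: it is the most expensive change, and the earlier steps resolve most failures.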