Talking-head and lip-sync video is the hardest thing to do well in any text-to-video model, and Seedance 2.0 is no exception. Most beginner tutorials overpromise — "just type 'character speaking' and Seedance will do the rest" — and then quietly ship clips where the mouth flaps randomly and the face melts halfway through. This guide is the honest version. It tells you what Seedance can do for faces and talking heads, what it cannot, and how to build a two-tool workflow that produces usable results today.
By the end you will know when to use Seedance alone, when to pair it with a dedicated lip-sync tool, and when to reach for a different solution entirely.
Quick answer
Seedance 2.0 can generate believable talking-head motion — subtle head movement, blinks, generic mouth activity — but it cannot tightly sync mouth shapes to specific words in an audio track. The workflow that actually ships is: generate a silent Seedance clip of a character talking, then overlay tight lip sync with a dedicated lip-sync tool that takes your audio and retargets the mouth shapes. Seedance is the base video; the specialist tool handles the mouth.
This guide is for you if:
- You want a character to deliver a line to camera without paying an actor.
- You have tried "just type it in Seedance" and the mouth flaps randomly.
- You need to make a short talking-head explainer and are picking a stack.
What Seedance 2.0 can and cannot do for faces
Let me be blunt about the limitations up front, because most of the frustration around Seedance talking heads comes from expecting the wrong thing.
- Can do: generic talking mouth motion, natural head movement, eye blinks, soft expression changes, subtle emotional reads.
- Can do with care: a consistent character identity across multiple clips, if you use the same reference image and prompt.
- Cannot do: tight word-level lip sync to a specific audio track. The model does not know what the audio says.
- Cannot do reliably: photoreal human faces with no warping across a full 10 seconds — expect some identity drift near the end.
Once you accept that Seedance produces believable talking motion rather than synced speech, the workflow becomes obvious: use Seedance to generate the base video of the character in motion, and layer the mouth sync separately.
Pick the right starting image
Character work lives or dies on the reference image. Seedance will do a competent job animating a clean portrait and a terrible job animating a complicated one. What makes a good reference image for talking-head work:
- Medium close-up framing — shoulders-up to chest-up. Too tight and the model has nothing to animate around the face; too wide and the face is too small to animate well.
- Face clearly visible and front-facing — slight angle is fine, extreme profile is not.
- Soft even lighting — hard side light creates shadows that shift weirdly when the model animates.
- Mouth closed or slightly parted — neutral mouth starting state animates more naturally than a huge grin.
- Simple background — a solid colour, soft bokeh, or a minimal environment. Busy backgrounds distract the model from the face.
The reference images for characters guide goes deeper on picking portraits that animate well. Read it alongside this post if you are making multiple talking-head clips.
Prompt structure for talking-head clips
The prompt for a talking head looks different from a typical Seedance prompt. You are not asking for a dramatic camera move — you are asking for subtle, natural human motion that sells "this person is speaking".
A working pattern:
- Subject description — who the character is, clearly.
- Framing cue — "medium close-up, chest-up, eye contact with camera".
- Motion cue — "subject speaking naturally to camera, subtle head movement, natural blinks, mouth moving as if talking".
- Lighting cue — "soft even lighting, gentle rim light".
- Mood cue — "calm, professional, warm" — this drives micro-expression.
Notice what is not in that prompt: specific words, dramatic camera moves, high-energy verbs. Those all fight against face stability. For more on the prompting principles behind this, see better Seedance prompts.
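As a sketch, the five cues can be assembled programmatically so every clip uses the same structure. The function and field names here are illustrative, not a Seedance API; the platform just takes the final prompt string.

```python
# Sketch: assemble a talking-head prompt from the five cues above.
# Cue names are illustrative -- Seedance receives a single prompt string.

def build_talking_head_prompt(subject: str, mood: str) -> str:
    cues = [
        subject,                                               # subject description
        "medium close-up, chest-up, eye contact with camera",  # framing cue
        "subject speaking naturally to camera, subtle head movement, "
        "natural blinks, mouth moving as if talking",          # motion cue
        "soft even lighting, gentle rim light",                # lighting cue
        mood,                                                  # mood cue
    ]
    return ", ".join(cues)

prompt = build_talking_head_prompt(
    "a friendly woman in her 30s wearing a navy blazer",  # example subject
    "calm, professional, warm",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it trivial to reuse the identical character block across clips later, which matters for consistency.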
Settings for talking heads
| Setting | Value | Why |
|---|---|---|
| Mode | Image-to-video | Text-to-video identity is fragile for faces |
| Reference image | Clean portrait, medium close-up | Gives the model a stable face to preserve |
| Duration | 5 seconds | Face stability drops after ~6 seconds |
| Resolution | 1080p for finals | Face detail matters at playback size |
| Aspect ratio | 9:16 or 1:1 | Vertical or square keeps face large on phone |
| Motion intensity | 30–45 | Low enough to preserve identity |
Do not push motion intensity above 45 for talking heads. Every step up increases the chance of the face drifting into someone who looks slightly different by the end of the clip. The motion intensity guide explains the tradeoff in more depth.
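The table can be captured as a settings bundle with a guard on the two values that most often cause face drift. The parameter names are hypothetical; adapt them to whatever your platform actually exposes.

```python
# Hypothetical settings bundle for a talking-head generation.
# Key names are illustrative, not a real Seedance API.
TALKING_HEAD_SETTINGS = {
    "mode": "image-to-video",   # text-to-video identity is fragile for faces
    "duration_seconds": 5,      # face stability drops after ~6 seconds
    "resolution": "1080p",      # face detail matters at playback size
    "aspect_ratio": "9:16",     # vertical keeps the face large on phone
    "motion_intensity": 40,     # keep within 30-45 to preserve identity
}

def check_settings(settings: dict) -> None:
    """Raise if a setting is likely to cause identity drift on faces."""
    if settings["motion_intensity"] > 45:
        raise ValueError("motion intensity above 45 risks face drift on talking heads")
    if settings["duration_seconds"] > 6:
        raise ValueError("faces tend to warp past ~6 seconds; use 5s clips and stitch")

check_settings(TALKING_HEAD_SETTINGS)  # passes silently for the defaults above
```

A guard like this is cheap insurance when you are generating many clips in a batch and a stray slider value can waste a whole run.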
The two-tool workflow for real lip sync
Here is the workflow that actually produces usable talking-head content in 2026.
- Write your script. Keep each line short: about 5 seconds of speech per Seedance clip is the sweet spot, and a 10-second line means stitching two clips.
- Record or generate the voiceover. Use a voice actor, your own voice, or a text-to-speech tool.
- Generate a silent Seedance clip of the character "speaking to camera" at the same length as the audio line.
- Run the silent clip through a dedicated lip-sync tool. The tool takes your Seedance video and your audio file, and retargets the mouth shapes to match the words. This is the step that produces real sync.
- Import the synced clip into your editor alongside any other footage, background music, and captions.
- Repeat per line and stitch into the final video.
Skipping step 4 — the dedicated lip-sync tool — is the single biggest reason Seedance-only talking-head attempts look amateur. The mouth needs a specialist; Seedance provides the base identity and natural motion around it.
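Step 1's length check can be roughed out from word count before you record anything. Assuming a typical conversational pace of about 150 words per minute (an assumption to calibrate against your own voiceover, not a Seedance constraint), a 5-second line is roughly a dozen words:

```python
# Rough speech-length estimate. ~150 words per minute is a common
# conversational pace (an assumption; measure your own delivery to calibrate).
WORDS_PER_MINUTE = 150

def estimated_seconds(line: str) -> float:
    return len(line.split()) / WORDS_PER_MINUTE * 60

line = "Here are three things nobody tells you about AI video tools."
secs = estimated_seconds(line)
print(f"{secs:.1f}s")  # 11 words -> about 4.4 seconds
```

Run your whole script through this before generating anything; trimming an over-long line in the script is far cheaper than discovering the mismatch after the Seedance render.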
Keeping the same character across multiple clips
If you are making a multi-clip explainer with the same character, consistency is the hard part. Seedance does not natively "remember" a character across sessions — you have to enforce identity yourself.
- Use the exact same reference image for every clip in the sequence. Do not swap it for a different angle halfway through.
- Keep the subject description identical in every prompt — copy-paste the character block, do not paraphrase.
- Fix the seed if the platform exposes it. Same seed plus same reference plus same prompt gives the most consistent results.
- Keep framing consistent — always medium close-up, always the same rough angle.
- Grade clips in your editor to match colour and exposure if they drift slightly.
For deeper technique, read consistent characters in Seedance — it is the companion to this post for multi-clip character work.
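One way to enforce the copy-paste rule above is to freeze the character block, seed, and reference image in code and vary only the mood per clip. This is a sketch: the request shape is hypothetical, the filename is a placeholder, and seed support depends on the platform.

```python
# Frozen character block: reused verbatim for every clip in the sequence.
CHARACTER_BLOCK = (
    "a friendly woman in her 30s wearing a navy blazer, "
    "medium close-up, chest-up, eye contact with camera, "
    "subject speaking naturally to camera, subtle head movement, "
    "natural blinks, soft even lighting"
)
SEED = 12345                          # fix the seed if the platform exposes one
REFERENCE_IMAGE = "portrait_v1.png"   # placeholder path; same file for every clip

def clip_request(mood: str) -> dict:
    # Only the mood cue varies between clips; everything else is identical.
    return {
        "prompt": f"{CHARACTER_BLOCK}, {mood}",
        "seed": SEED,
        "reference_image": REFERENCE_IMAGE,
    }

clips = [
    clip_request(mood)
    for mood in ("calm and warm", "slightly excited", "thoughtful and serious")
]
```

Because the character block is a constant, you physically cannot paraphrase it between clips, which is exactly the failure mode the bullets above warn about.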
Common talking-head mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Expecting perfect lip sync from Seedance alone | Mouth flaps randomly, feels uncanny | Add a dedicated lip-sync tool in post |
| High motion intensity on faces | Identity drifts mid-clip | Cap at 45 for talking heads |
| Busy reference portrait | Model loses the face to background | Use a clean simple portrait |
| 10-second talking-head clips | Face morphs near the end | Use 5-second clips and stitch |
| Different reference per clip | Character looks inconsistent | One reference image, reuse it |
| Writing actual dialogue into the prompt | Prompt ignores the words | Describe motion, not dialogue content |
Worked example: a 30-second explainer with one character
Here is the full pipeline for a 30-second explainer featuring a single AI character delivering three lines to camera.
- Script three lines, each about 10 seconds of speech. Record voiceover (or generate with a TTS tool) as three separate audio files.
- Pick one clean portrait that fits the character. Medium close-up, soft lighting, simple background, neutral mouth.
- Generate the Seedance clips using the same portrait and the same prompt, except for subtle mood variation ("calm and warm", "slightly excited", "thoughtful and serious"). Motion intensity 40. Since face stability drops past six seconds, generate two 5-second clips per line and stitch each pair into a 10-second base clip.
- Run each stitched base clip through a dedicated lip-sync tool with its matching audio file. You now have three video files with synced mouths.
- Open a video editor (CapCut, Premiere, Resolve). Drop the three synced clips on the timeline with brief crossfades.
- Layer background music at low volume, add captions for accessibility, grade all three clips to match colour.
- Export at 1080p vertical for social or 1080p horizontal for YouTube. Done.
Total time: 1–2 hours for someone who has done this pipeline before. The tricky parts are not the Seedance generations — they are the script, the audio, and the consistency across clips.
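The crossfade maths from step 5 is worth sanity-checking: each crossfade overlaps two adjacent clips, so it shortens the timeline by its own duration. Three 10-second lines with 0.5-second crossfades (an illustrative value) land at 29 seconds, close enough to the 30-second target:

```python
def timeline_seconds(clip_lengths: list[float], crossfade: float = 0.5) -> float:
    # Each crossfade overlaps two adjacent clips, removing one crossfade's
    # worth of duration per join.
    return sum(clip_lengths) - crossfade * (len(clip_lengths) - 1)

print(timeline_seconds([10, 10, 10]))  # -> 29.0
```

If you need to hit an exact runtime, pad the last clip slightly or shorten the crossfades rather than re-rendering.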
When Seedance is not the right tool for talking heads
Be honest about this. If your project needs a single character delivering 60+ seconds of dialogue with perfect sync and photoreal fidelity, Seedance is not the fastest path. A dedicated AI avatar tool will produce tighter sync with less work. Seedance wins when you want a unique stylised character, a non-photoreal look, or a specific cinematic framing that avatar tools cannot produce — and you are willing to add the lip-sync tool in post to make it work.
For audio pairing generally (not just dialogue), Seedance audio prompts covers the broader topic of making clips that work well with a soundtrack.
Related guides on this site
Talking-head work pulls in several other Seedance topics. These are the natural companions.
- How to Prompt Seedance for Audio-Friendly Clips
- How to Keep Characters Consistent in Seedance
- Reference Images for Characters in Seedance
Want to use AI tools more effectively?
My courses cover practical AI workflows, from spreadsheet automation to app development, with real projects and honest tool comparisons.
Browse AI courses