Talking-head and lip-sync video is the hardest thing to do well in any text-to-video model, and Seedance 2.0 is no exception. Most beginner tutorials overpromise — "just type 'character speaking' and Seedance will do the rest" — and then quietly ship clips where the mouth flaps randomly and the face melts halfway through. This guide is the honest version. It tells you what Seedance can do for faces and talking heads, what it cannot, and how to build a two-tool workflow that produces usable results today.
By the end you will know when to use Seedance alone, when to pair it with a dedicated lip-sync tool, and when to reach for a different solution entirely.
Quick answer
Seedance 2.0 can generate believable talking-head motion — subtle head movement, blinks, generic mouth activity — but it cannot tightly sync mouth shapes to specific words in an audio track. The workflow that actually ships is: generate a silent Seedance clip of a character talking, then overlay tight lip sync with a dedicated lip-sync tool that takes your audio and retargets the mouth shapes. Seedance is the base video; the specialist tool handles the mouth.
This guide is for you if:
- You want a character to deliver a line to camera without paying an actor.
- You have tried "just type it in Seedance" and the mouth flaps randomly.
- You need to make a short talking-head explainer and are picking a stack.
What Seedance 2.0 can and cannot do for faces
Let me be blunt about the limitations up front, because most of the frustration around Seedance talking heads comes from expecting the wrong thing.
- Can do: generic talking mouth motion, natural head movement, eye blinks, soft expression changes, subtle emotional reads.
- Can do with care: a consistent character identity across multiple clips, if you use the same reference image and prompt.
- Cannot do: tight word-level lip sync to a specific audio track. The model does not know what the audio says.
- Cannot do reliably: photoreal human faces with no warping across a full 10 seconds — expect some identity drift near the end.
Once you accept that Seedance produces believable talking motion rather than synced speech, the workflow becomes obvious: use Seedance to generate the base video of the character in motion, and layer the mouth sync separately.
Pick the right starting image
Character work lives or dies on the reference image. Seedance will do a competent job animating a clean portrait and a terrible job animating a complicated one. What makes a good reference image for talking-head work:
- Medium close-up framing — shoulders-up to chest-up. Too tight and the model has nothing to animate around the face; too wide and the face is too small to animate well.
- Face clearly visible and front-facing — slight angle is fine, extreme profile is not.
- Soft even lighting — hard side light creates shadows that shift weirdly when the model animates.
- Mouth closed or slightly parted — neutral mouth starting state animates more naturally than a huge grin.
- Simple background — a solid colour, soft bokeh, or a minimal environment. Busy backgrounds distract the model from the face.
The reference images for characters guide goes deeper on picking portraits that animate well. Read it alongside this post if you are making multiple talking-head clips.
Prompt structure for talking-head clips
The prompt for a talking head looks different from a typical Seedance prompt. You are not asking for a dramatic camera move — you are asking for subtle, natural human motion that sells "this person is speaking".
A working pattern:
- Subject description — who the character is, clearly.
- Framing cue — "medium close-up, chest-up, eye contact with camera".
- Motion cue — "subject speaking naturally to camera, subtle head movement, natural blinks, mouth moving as if talking".
- Lighting cue — "soft even lighting, gentle rim light".
- Mood cue — "calm, professional, warm" — this drives micro-expression.
Notice what is not in that prompt: specific words, dramatic camera moves, high-energy verbs. Those all fight against face stability. For more on the prompting principles behind this, see better Seedance prompts.
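As a sketch, the five cues can be assembled programmatically so every clip uses the same structure. The function and field names here are illustrative, not a Seedance API; the platform just takes the final prompt string.

```python
# Sketch: assemble a talking-head prompt from the five cues above.
# Cue names are illustrative -- Seedance receives a single prompt string.

def build_talking_head_prompt(subject: str, mood: str) -> str:
    cues = [
        subject,                                               # subject description
        "medium close-up, chest-up, eye contact with camera",  # framing cue
        "subject speaking naturally to camera, subtle head movement, "
        "natural blinks, mouth moving as if talking",          # motion cue
        "soft even lighting, gentle rim light",                # lighting cue
        mood,                                                  # mood cue
    ]
    return ", ".join(cues)

prompt = build_talking_head_prompt(
    "a friendly woman in her 30s wearing a navy blazer",  # example subject
    "calm, professional, warm",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it trivial to reuse the identical character block across clips later, which matters for consistency.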
Settings for talking heads
| Setting | Value | Why |
|---|---|---|
| Mode | Image-to-video | Text-to-video identity is fragile for faces |
| Reference image | Clean portrait, medium close-up | Gives the model a stable face to preserve |
| Duration | 5 seconds | Face stability drops after ~6 seconds |
| Resolution | 1080p for finals | Face detail matters at playback size |
| Aspect ratio | 9:16 or 1:1 | Vertical or square keeps face large on phone |
| Motion intensity | 30–45 | Low enough to preserve identity |
Do not push motion intensity above 45 for talking heads. Every step up increases the chance of the face drifting into someone who looks slightly different by the end of the clip. The motion intensity guide explains the tradeoff in more depth.
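The table can be captured as a settings bundle with a guard on the two values that most often cause face drift. The parameter names are hypothetical; adapt them to whatever your platform actually exposes.

```python
# Hypothetical settings bundle for a talking-head generation.
# Key names are illustrative, not a real Seedance API.
TALKING_HEAD_SETTINGS = {
    "mode": "image-to-video",   # text-to-video identity is fragile for faces
    "duration_seconds": 5,      # face stability drops after ~6 seconds
    "resolution": "1080p",      # face detail matters at playback size
    "aspect_ratio": "9:16",     # vertical keeps the face large on phone
    "motion_intensity": 40,     # keep within 30-45 to preserve identity
}

def check_settings(settings: dict) -> None:
    """Raise if a setting is likely to cause identity drift on faces."""
    if settings["motion_intensity"] > 45:
        raise ValueError("motion intensity above 45 risks face drift on talking heads")
    if settings["duration_seconds"] > 6:
        raise ValueError("faces tend to warp past ~6 seconds; use 5s clips and stitch")

check_settings(TALKING_HEAD_SETTINGS)  # passes silently for the defaults above
```

A guard like this is cheap insurance when you are generating many clips in a batch and a stray slider value can waste a whole run.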
The two-tool workflow for real lip sync
Here is the workflow that actually produces usable talking-head content in 2026.
- Write your script. Keep each line short: about 5 seconds of speech per Seedance clip is the sweet spot, and a 10-second line means stitching two clips.
- Record or generate the voiceover. Use a voice actor, your own voice, or a text-to-speech tool.
- Generate a silent Seedance clip of the character "speaking to camera" at the same length as the audio line.
- Run the silent clip through a dedicated lip-sync tool. The tool takes your Seedance video and your audio file, and retargets the mouth shapes to match the words. This is the step that produces real sync.
- Import the synced clip into your editor alongside any other footage, background music, and captions.
- Repeat per line and stitch into the final video.
Skipping step 4 — the dedicated lip-sync tool — is the single biggest reason Seedance-only talking-head attempts look amateur. The mouth needs a specialist; Seedance provides the base identity and natural motion around it.
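Step 1's length check can be roughed out from word count before you record anything. Assuming a typical conversational pace of about 150 words per minute (an assumption to calibrate against your own voiceover, not a Seedance constraint), a 5-second line is roughly a dozen words:

```python
# Rough speech-length estimate. ~150 words per minute is a common
# conversational pace (an assumption; measure your own delivery to calibrate).
WORDS_PER_MINUTE = 150

def estimated_seconds(line: str) -> float:
    return len(line.split()) / WORDS_PER_MINUTE * 60

line = "Here are three things nobody tells you about AI video tools."
secs = estimated_seconds(line)
print(f"{secs:.1f}s")  # 11 words -> about 4.4 seconds
```

Run your whole script through this before generating anything; trimming an over-long line in the script is far cheaper than discovering the mismatch after the Seedance render.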
Keeping the same character across multiple clips
If you are making a multi-clip explainer with the same character, consistency is the hard part. Seedance does not natively "remember" a character across sessions — you have to enforce identity yourself.
- Use the exact same reference image for every clip in the sequence. Do not swap it for a different angle halfway through.
- Keep the subject description identical in every prompt — copy-paste the character block, do not paraphrase.
- Fix the seed if the platform exposes it. Same seed plus same reference plus same prompt gives the most consistent results.
- Keep framing consistent — always medium close-up, always the same rough angle.
- Grade clips in your editor to match colour and exposure if they drift slightly.
For deeper technique, read consistent characters in Seedance — it is the companion to this post for multi-clip character work.
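One way to enforce the copy-paste rule above is to freeze the character block, seed, and reference image in code and vary only the mood per clip. This is a sketch: the request shape is hypothetical, the filename is a placeholder, and seed support depends on the platform.

```python
# Frozen character block: reused verbatim for every clip in the sequence.
CHARACTER_BLOCK = (
    "a friendly woman in her 30s wearing a navy blazer, "
    "medium close-up, chest-up, eye contact with camera, "
    "subject speaking naturally to camera, subtle head movement, "
    "natural blinks, soft even lighting"
)
SEED = 12345                          # fix the seed if the platform exposes one
REFERENCE_IMAGE = "portrait_v1.png"   # placeholder path; same file for every clip

def clip_request(mood: str) -> dict:
    # Only the mood cue varies between clips; everything else is identical.
    return {
        "prompt": f"{CHARACTER_BLOCK}, {mood}",
        "seed": SEED,
        "reference_image": REFERENCE_IMAGE,
    }

clips = [
    clip_request(mood)
    for mood in ("calm and warm", "slightly excited", "thoughtful and serious")
]
```

Because the character block is a constant, you physically cannot paraphrase it between clips, which is exactly the failure mode the bullets above warn about.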
Common talking-head mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Expecting perfect lip sync from Seedance alone | Mouth flaps randomly, feels uncanny | Add a dedicated lip-sync tool in post |
| High motion intensity on faces | Identity drifts mid-clip | Cap at 45 for talking heads |
| Busy reference portrait | Model loses the face to background | Use a clean simple portrait |
| 10-second talking-head clips | Face morphs near the end | Use 5-second clips and stitch |
| Different reference per clip | Character looks inconsistent | One reference image, reuse it |
| Writing actual dialogue into the prompt | Prompt ignores the words | Describe motion, not dialogue content |
Worked example: a 30-second explainer with one character
Here is the full pipeline for a 30-second explainer featuring a single AI character delivering three lines to camera.
- Script three lines, each about 10 seconds of speech. Record voiceover (or generate with a TTS tool) as three separate audio files.
- Pick one clean portrait that fits the character. Medium close-up, soft lighting, simple background, neutral mouth.
- Generate the Seedance clips using the same portrait and the same prompt, except for subtle mood variation ("calm and warm", "slightly excited", "thoughtful and serious"). Motion intensity 40. Since face stability drops past six seconds, generate two 5-second clips per line and stitch each pair into a 10-second base clip.
- Run each stitched base clip through a dedicated lip-sync tool with its matching audio file. You now have three video files with synced mouths.
- Open a video editor (CapCut, Premiere, Resolve). Drop the three synced clips on the timeline with brief crossfades.
- Layer background music at low volume, add captions for accessibility, grade all three clips to match colour.
- Export at 1080p vertical for social or 1080p horizontal for YouTube. Done.
Total time: 1–2 hours for someone who has done this pipeline before. The tricky parts are not the Seedance generations — they are the script, the audio, and the consistency across clips.
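The crossfade maths from step 5 is worth sanity-checking: each crossfade overlaps two adjacent clips, so it shortens the timeline by its own duration. Three 10-second lines with 0.5-second crossfades (an illustrative value) land at 29 seconds, close enough to the 30-second target:

```python
def timeline_seconds(clip_lengths: list[float], crossfade: float = 0.5) -> float:
    # Each crossfade overlaps two adjacent clips, removing one crossfade's
    # worth of duration per join.
    return sum(clip_lengths) - crossfade * (len(clip_lengths) - 1)

print(timeline_seconds([10, 10, 10]))  # -> 29.0
```

If you need to hit an exact runtime, pad the last clip slightly or shorten the crossfades rather than re-rendering.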
When Seedance is not the right tool for talking heads
Be honest about this. If your project needs a single character delivering 60+ seconds of dialogue with perfect sync and photoreal fidelity, Seedance is not the fastest path. A dedicated AI avatar tool will produce tighter sync with less work. Seedance wins when you want a unique stylised character, a non-photoreal look, or a specific cinematic framing that avatar tools cannot produce — and you are willing to add the lip-sync tool in post to make it work.
For audio pairing generally (not just dialogue), Seedance audio prompts covers the broader topic of making clips that work well with a soundtrack.
Related guides on this site
Talking-head work pulls in several other Seedance topics. These are the natural companions.
- How to Prompt Seedance for Audio-Friendly Clips
- How to Keep Characters Consistent in Seedance
- Reference Images for Characters in Seedance
Want to use AI tools more effectively?
My courses cover practical AI workflows, from spreadsheet automation to app development, with real projects and honest tool comparisons.
Browse AI courses