Starting from a single topic keyword, the B-roll Science Explainer gathers research, writes a voiceover script, and generates bilingual Chinese-English TTS narration. It then breaks each line down to the subject-verb-object level to match visuals, inserts famous paintings precisely where the script calls for them, removes duplicate assets, and burns in subtitles. In this example, with “Impressionism” as the topic, several rounds of visual refinement produced a 76-second landscape explainer video whose visuals align with the script sentence by sentence.




Provide a science or educational topic keyword, and the Skill searches multiple sources in parallel, extracts the key information, and turns it into a voiceover script. You do not need to prepare any materials.
The Skill automatically selects a voiceover style based on the tone of the content, and supports one-click generation of high-quality Chinese and English TTS audio, with delivery suited to documentary or explainer-creator formats.
Each subtitle line can be broken down to the subject-verb-object level for visual planning, so that when a critic is mentioned you see a critic, and when a Monet work is mentioned you see the corresponding painting. This eliminates mismatches between visuals and script.
The Skill tracks every asset that has already been used and automatically avoids repeating it. It also supports fine-grained pacing instructions such as rapid cuts for parallel phrasing and flash appearances for nouns, making the final edit feel more professional.
Voiceover audio, B-roll visuals, and burned-in subtitles are composited within a single workflow, producing a publish-ready landscape explainer video with no need for extra editing software.
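
As a rough illustration of the phrase-level matching and duplicate avoidance described above, here is a minimal Python sketch. It is not the Skill's actual implementation: the `ShotPlanner` class, its methods, and the asset file names are all hypothetical. It assumes each subtitle line has already been split into phrases, and that candidate assets have already been gathered for each phrase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ShotPlanner:
    """Hypothetical sketch: assign one not-yet-used asset to each phrase."""

    # Candidate assets per phrase, in order of preference (assumed input).
    candidates: dict
    # Every asset already placed in the edit, so it is never shown twice.
    used: set = field(default_factory=set)

    def pick(self, phrase: str) -> Optional[str]:
        """Return the first unused candidate for `phrase`, or None if none remain."""
        for asset in self.candidates.get(phrase, []):
            if asset not in self.used:
                self.used.add(asset)
                return asset
        return None

    def plan_line(self, phrases: list) -> list:
        """Map each phrase of one subtitle line to an asset (or None)."""
        return [(phrase, self.pick(phrase)) for phrase in phrases]


# Illustrative data only; the file names are made up.
planner = ShotPlanner(
    candidates={
        "critic": ["critic_a.jpg", "critic_b.jpg"],
        "Monet": ["impression_sunrise.jpg"],
    }
)
first_line = planner.plan_line(["critic", "Monet"])
second_line = planner.plan_line(["critic", "Monet"])
```

In this sketch, the second mention of the critic gets a different image from the first. The second mention of Monet gets no asset, because the only candidate is already used. A real workflow would presumably fetch more candidates at that point rather than leave the slot empty.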