The AI model is trained on videos featuring human voices. Its performance can be significantly affected by:
- Background music or noise
- Extended periods without talking
Additionally, the output may be unstable if the video contains multiple languages. Because your video contains both Chinese and English, some English subtitles may be missing if you set the video language to Chinese, though this does not always happen.
We recommend uploading an edited audio file with any non-talking segments longer than a minute removed.
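As a rough illustration of that editing step, the sketch below works out which parts of a recording to keep once you know where the talking is. It is not part of our product: the function name `segments_to_keep`, the 60-second threshold parameter `max_gap`, and the example timestamps are all assumptions made for illustration, and you would need to get the speech timestamps (in seconds) from your own audio editor or a voice-activity detection tool.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def segments_to_keep(
    speech: List[Interval], total: float, max_gap: float = 60.0
) -> List[Interval]:
    """Return the time ranges to keep after dropping long non-talking gaps.

    `speech` holds the ranges where someone is talking (hypothetical input,
    e.g. marked by hand in an audio editor). Gaps of `max_gap` seconds or
    less are kept so the audio still sounds natural; longer gaps are cut.
    """
    if not speech:
        return []  # no talking at all, so nothing worth keeping

    speech = sorted(speech)
    kept: List[Interval] = []

    # Keep a short lead-in before the first speech; drop a long one.
    start = 0.0 if speech[0][0] <= max_gap else speech[0][0]
    end = speech[0][1]

    for s, e in speech[1:]:
        if s - end > max_gap:
            # The silence before this speech range is too long: cut it out.
            kept.append((start, end))
            start = s
        end = max(end, e)

    # Keep a short tail after the last speech; drop a long one.
    if total - end <= max_gap:
        end = total
    kept.append((start, end))
    return kept


if __name__ == "__main__":
    # Made-up timestamps for a 400-second recording.
    example = [(5.0, 30.0), (40.0, 100.0), (200.0, 260.0)]
    print(segments_to_keep(example, total=400.0))
```

The returned ranges can then be exported and joined in any audio editor before uploading. Short pauses are deliberately left in, since only gaps longer than a minute are the concern here.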