Today I tested a video tutorial that I had hastily recorded in English. I am not a native English speaker, so to check that the spoken content was correct I used two tools: YouTube’s automatic subtitles and OpenAI’s Whisper. You already know that the automatic subtitles for YouTube videos work. Here’s what you should know about OpenAI’s Whisper:
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web; see OpenAI’s Whisper webpage.
To use OpenAI’s Whisper with a video, I used one of the tools hosted on the Hugging Face website, more precisely the one created by jeffistyping, known as Jeff.
This tool can be found on this webpage.
These tools let you check your video content against a neutral reviewer based on artificial intelligence rather than on a human.
Processing with OpenAI’s Whisper took several attempts and setting changes before I could extract the text from the video.
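For readers who prefer to run Whisper locally instead of through the hosted tool, here is a minimal sketch. It is not what I used for this post: it assumes the open-source `openai-whisper` Python package and `ffmpeg` are installed, and the model name `base` and the helper names `format_segments` and `transcribe` are only illustrative choices.

```python
def format_segments(segments):
    """Render Whisper-style segments as 'start,end, text' lines."""
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        lines.append(f"{seg['start']:.1f},{seg['end']:.1f}, {text}")
    return "\n".join(lines)


def transcribe(path, model_name="base"):
    """Transcribe a media file locally.

    Assumption: the openai-whisper package and ffmpeg are installed.
    """
    import whisper  # imported lazily so format_segments works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language="en")
    # Each segment is a dict that includes "start", "end" and "text".
    return result["segments"]
```

With a hypothetical file name, usage would look like `print(format_segments(transcribe("tutorial.mp4")))`. Larger models are slower but usually more accurate, which is one of the settings worth trying.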
You can see how my pronunciation of “add” was understood as “had”.
The result created by OpenAI’s Whisper is this:
0.0,17.0, Hello from the last video, I try to fix some issue and I create some new feature with game.
17.0,27.0, I add an interface, you can see this avatar interface.
27.0,36.0, I create some object in Blender 3D, like Pine.
36.0,45.0, Pine was downloaded from ScringeFrab and I create was Snow on this spine.
45.0,53.0, Also I created Rock with Snow on this rock.
53.0,56.0, And I create an elevator.
56.0,62.0, And also I try to create an NPC, Polar Beer.
62.0,77.0, Polar Beer is not working very well because Blender 3D don't export FBX file with an avatar.
77.0,85.0, I need to search to find this problem and fix it.
85.0,97.0, The animation on FBX file is packet and can be shown Unity 3D.
97.0,115.0, But when I try to use the animator, the animator needs to know about the avatar in order to move and use the animation.
115.0,120.0, And let's make a demo.
120.0,128.0, How is working?
128.0,139.0, You can see the white, the playworks of Crate, the rock.
139.0,153.0, Snow on the body is not very good, as the X-TRAD, the X-TRAD is dirty, and you fix it.
153.0,171.0, Wait the interface with the avatar and the appears Polar Beer.
171.0,199.0, Let's make a demo.
199.0,225.0, This is the animation type.
225.0,235.0, Let's play the animation.
235.0,243.0, I have to animation.
243.0,251.0, This is the common error.
251.0,261.0, Bye bye.
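Each entry in the output above has the form start time, end time, text, with the times in seconds. As a small sketch of how such a line could be read back into a program, the snippet below splits on the first two commas only, because the text itself can contain commas. The names `Segment` and `parse_segment` are my own and are not part of Whisper or of the Hugging Face tool.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_segment(line):
    """Split one 'start,end, text' line; the text may contain commas."""
    start, end, text = line.split(",", 2)
    return Segment(float(start), float(end), text.strip())


# Two lines copied from the transcript above.
sample = [
    "17.0,27.0, I add an interface, you can see this avatar interface.",
    "53.0,56.0, And I create an elevator.",
]
segments = [parse_segment(line) for line in sample]

# Find the segment that lasts the longest.
longest = max(segments, key=lambda s: s.end - s.start)
print(longest.text)
```

Long segments with very little text, such as the one from 171.0 to 199.0, are mostly silence in the video, so a check like this can help spot them.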
You will see in the two images how OpenAI’s Whisper understands my pronunciation better than the YouTube tool, which is not surprising from a data processing point of view.
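If you have both transcripts as text, the comparison can also be done word by word instead of by eye. The sketch below uses Python’s standard `difflib` module. The second sentence is a made-up example built from the “add” / “had” confusion mentioned earlier, not the real YouTube output, and `word_differences` is just an illustrative helper name.

```python
import difflib


def word_differences(reference, candidate):
    """Return (reference_words, candidate_words) pairs where the texts disagree."""
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(cand_words[j1:j2])))
    return diffs


whisper_text = "I add an interface"  # from the Whisper transcript
other_text = "I had an interface"    # hypothetical output of another tool

for ours, theirs in word_differences(whisper_text, other_text):
    print(f"{ours!r} vs {theirs!r}")
```

This only shows where two transcripts differ; deciding which one is right still means listening to the video.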