
I noticed that transcribing speech in multiple languages with the openai whisper speech-to-text library sometimes accurately recognizes inserts in another language and provides the expected output, for example:

八十多个人 is the same as 八十几个人. So 多 and 几 are interchangeable and they can both mean several.

Yet the same audio input on a different pass (with the same model, or a smaller/bigger model) intermittently results in glitches where the entire sentence is translated rather than transcribed. That is, a fragment is translated into either the first or the second language that appears in the audio. With the example input above, either the entire sentence would be in English (with the Chinese bits translated to English), or the entire sentence would be in Chinese (with the English bits translated to Chinese).

Important: in both cases no input language was specified, and no task type was passed (which implies the default --task transcribe).

The docs for whisper mention translation to English as the only available target language (with the option --task translate in the command line version), but there is no mention of translating to other target languages. Yet the behavior mentioned above indicates that the models are capable of translating to other languages too.

The question is whether there is a known way to configure the models to do just text-to-text translation. Or is the behavior just some sort of glitch that cannot be 'exploited' or configured on a lower level to allow using the models just for text translation between any of the supported languages?

According to a comment in whisper's issue tracker, this might be a possible answer:

From the paper, the dataset that was used did not include any English audio to Polish text samples. The dataset was cleaned by using a different model to match spoken language with text language. If they did not match, the sample was excluded. An exception was made for a portion of the training data to match any spoken language to English text (X->en) translation. So unfortunately there is no direct way, the model wasn't trained on it. For your use case, this can transcribe to English text, but there has to be an outside system to translate from English text to Polish text.

The --language parameter is defined in the cli as:

--language
Language spoken in the audio, specify None to perform language detection (default: None)

Yet, despite the help text above, this can have potentially useful undocumented side effects. The undocumented glitch that was observed is that if you set a source language, e.g. es, but the audio input contains English, then the English part of the input will be translated to Spanish. Parts of the audio input that are not in English will be transcribed, although depending on the language it might not always work, or it might generate garbage translations. So the 'exploit' is that the models can be used to parse English audio and then translate it to a supported language. The behaviour above occurs with the regular transcribe mode (the default).
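The three invocations discussed in this thread (default transcription with language detection, the documented --task translate, and forcing --language while leaving the task at its default) can be sketched as a small helper that only assembles the command line. This is a minimal sketch, not part of whisper itself: the helper name build_whisper_command and the file name mixed_speech.mp3 are made up for illustration, and the third call relies on the undocumented behaviour reported above, so its output is not guaranteed.

```python
import shlex


def build_whisper_command(audio_path, language=None, task=None, model=None):
    """Build the argument list for one `whisper` command-line call.

    Options left as None are omitted, so the CLI falls back to its defaults
    (language detection and --task transcribe, as described in the thread).
    This helper is hypothetical; it only assembles arguments.
    """
    command = ["whisper", audio_path]
    if model is not None:
        command += ["--model", model]
    if language is not None:
        command += ["--language", language]
    if task is not None:
        if task not in ("transcribe", "translate"):
            raise ValueError(f"unsupported task: {task!r}")
        command += ["--task", task]
    return command


# Hypothetical file name, used only for illustration.
AUDIO = "mixed_speech.mp3"

# 1. Default behaviour: detect the language and transcribe.
default_cmd = build_whisper_command(AUDIO)

# 2. Documented translation: any spoken language to English text.
to_english_cmd = build_whisper_command(AUDIO, task="translate")

# 3. Reported, undocumented side effect: force the source language to
#    Spanish while keeping the default task. English speech may come out
#    as Spanish text, but the result can also be garbage.
forced_es_cmd = build_whisper_command(AUDIO, language="es")

for cmd in (default_cmd, to_english_cmd, forced_es_cmd):
    print(shlex.join(cmd))
```

If the whisper command is installed, any of these lists can be passed to subprocess.run(cmd, check=True). Comparing the output of the first and third call on the same audio is one way to check whether the side effect shows up for a given model and language.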
