How to Convert Speech to Text in Python (Detailed Implementation Guide)

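Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library. We do not need to build any machine learning model from scratch: the library provides convenient wrappers for several well-known public speech recognition APIs (such as the Google Cloud Speech API, IBM Speech To Text, and others).

Let's get started by installing the library with the following pip command: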

pip3 install SpeechRecognition pydub

Next, open a new Python file and import the library:
import speech_recognition as sr

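A nice thing about this library is that it supports several recognition engines:
  • CMU Sphinx (offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
  • Snowboy Hotword Detection (offline)
We will use Google Speech Recognition here because it is simple and does not require any API key.

Reading from a File
Make sure the current directory contains an audio file with English speech (if you want to follow along, you can download the same audio file):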
filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the file name. Let's initialize our speech recognizer:
# initialize the recognizer
r = sr.Recognizer()

The following code loads the audio file and converts the speech to text using Google Speech Recognition:
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to finish, because the file is uploaded to Google and the output is fetched back. Here is my result:
I believe you're just talking nonsense
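Before sending a file to the recognizer, it can help to know how long it is, since a single upload suits shorter recordings best. The sketch below uses only Python's standard wave module to measure a WAV file's duration. The helper names wav_duration_seconds and make_test_tone, and the 60-second cutoff, are illustrative assumptions and not part of SpeechRecognition; the tone generator only stands in for a real recording.

```python
import io
import math
import struct
import wave


def wav_duration_seconds(wav_file):
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(wav_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def make_test_tone(seconds=2.0, rate=16000, freq=440.0):
    """Build a mono 16-bit sine tone in memory (a stand-in for a real recording)."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(frames)
    buf.seek(0)
    return buf


# Hypothetical cutoff (an assumption): tune it for your own recordings.
MAX_SINGLE_REQUEST_SECONDS = 60

duration = wav_duration_seconds(make_test_tone(seconds=2.0))
needs_chunking = duration > MAX_SINGLE_REQUEST_SECONDS
print(f"{duration:.1f} s, needs chunking: {needs_chunking}")
```

With a real file, pass its path instead, for example wav_duration_seconds("16-122828-0002.wav").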

Loading a whole file with r.record() and sending it in a single r.recognize_google() call works well for small and medium-sized audio files. In the next section, we will write code for large files.

Reading Large Audio Files
If you want to perform speech recognition on a long audio file, the function below handles that quite well:
# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Splitting the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # adjust this per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at each end of a chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text

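Note: Pydub must be installed with pip for the code above to work (the install command at the start of this guide already includes it).

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks wherever there is silence. The min_silence_len parameter is the minimum length of silence, in milliseconds, required for a split. silence_thresh is the threshold below which anything quieter is treated as silence; I have set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence, in milliseconds, to leave at the beginning and end of each detected chunk. These values will not suit every audio file, so experiment with them for your own large recordings.

After that, we iterate over all the chunks, convert each one to text, and concatenate the results. Here is an example run:
path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription(path))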

Note: the 7601-291468-0006.wav file is available for download if you want to reproduce this run. Output:
audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.
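As an aside on the three splitting parameters described earlier, the sketch below shows how they interact on a toy signal. It is a deliberately simplified, pure-Python illustration and not pydub's actual implementation: the function name split_on_silence_sketch and the input format (one loudness value in dBFS per millisecond) are assumptions made for this example, and unlike a real splitter it does not resolve overlaps between neighbouring padded chunks.

```python
def split_on_silence_sketch(loudness_ms, min_silence_len, silence_thresh, keep_silence):
    """Return (start_ms, end_ms) ranges of non-silent chunks.

    loudness_ms: one loudness value in dBFS per millisecond (toy input format).
    """
    total = len(loudness_ms)

    # 1. Collect silent runs that last at least min_silence_len milliseconds.
    silent_runs = []
    run_start = None
    # The infinite sentinel is never "silent", so it closes a trailing run.
    for t, level in enumerate(loudness_ms + [float("inf")]):
        if level < silence_thresh:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start >= min_silence_len:
                silent_runs.append((run_start, t))
            run_start = None

    # 2. Everything between two long silent runs is one chunk of speech.
    chunks = []
    cursor = 0
    for s_start, s_end in silent_runs:
        if s_start > cursor:
            chunks.append((cursor, s_start))
        cursor = s_end
    if cursor < total:
        chunks.append((cursor, total))

    # 3. Pad each chunk with keep_silence ms, staying inside the audio.
    return [(max(0, a - keep_silence), min(total, b + keep_silence))
            for a, b in chunks]


# Toy signal: speech, a 600 ms pause, speech, a 200 ms pause, speech.
loudness = [-20.0] * 300 + [-60.0] * 600 + [-20.0] * 400 + [-60.0] * 200 + [-20.0] * 100

chunks = split_on_silence_sketch(
    loudness, min_silence_len=500, silence_thresh=-40.0, keep_silence=100
)
print(chunks)  # [(0, 400), (800, 1600)]
```

Only the 600 ms pause is long enough to cause a split; the 200 ms pause stays inside the second chunk. Lowering min_silence_len to 200 would split there as well, which is the kind of trade-off to explore when tuning these values for a real recording.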

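To recap, the function automatically creates a folder for us, stores the chunks of the original audio file in it, and then runs speech recognition on all of those chunks.

Reading from the Microphone
This requires PyAudio to be installed on your machine. The installation process depends on your operating system:
Windows
You can simply pip install it: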
pip3 install pyaudio

Linux
You need to install the dependencies first:
sudo apt-get install python-pyaudio python3-pyaudio
pip3 install pyaudio

macOS
You need to install portaudio first, and then you can pip install PyAudio:
brew install portaudio
pip3 install pyaudio

Now let's use our microphone to convert our speech:
with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

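This listens to your microphone for 5 seconds and then tries to convert that speech to text. It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone, and the duration parameter of record() to stop reading after 5 seconds, before uploading the audio data to Google to get the output text. You can also use the offset parameter of record() to start recording after offset seconds.

In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech, you can use: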
text = r.recognize_google(audio_data, language="es-ES")
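Returning to the duration and offset parameters of record() mentioned above, one way to picture them is as a window over the audio. The sketch below takes such a window from an in-memory WAV using only the standard wave module. It illustrates the idea only and is not how SpeechRecognition implements record(); the helper name read_window is an assumption made for this example.

```python
import io
import wave


def read_window(wav_file, offset=0.0, duration=None):
    """Return (raw_frames, seconds_read) for a window of a WAV file.

    offset:   seconds to skip from the start of the audio.
    duration: seconds to read after the offset (None means "to the end").
    """
    with wave.open(wav_file, "rb") as wf:
        rate = wf.getframerate()
        total = wf.getnframes()
        start = min(int(offset * rate), total)   # never seek past the end
        wf.setpos(start)
        remaining = total - start
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return wf.readframes(count), count / rate


# Build 10 seconds of silent mono 16-bit audio as a stand-in recording.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * (10 * 8000))
buf.seek(0)

# Skip the first 2 seconds, then keep the next 5.
frames, window_seconds = read_window(buf, offset=2, duration=5)
print(window_seconds)  # 5.0
```

If the requested window runs past the end of the file, the helper simply returns whatever audio is left.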

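For the language codes accepted by the language parameter, see the Stack Overflow answer that lists the supported languages.

Conclusion
As you can see, converting speech to text with this library is quite simple. The library is widely used in the wild; check out its official documentation. If you would rather not use Python and want a service that does this for you automatically, I recommend audext, which converts your audio to text online quickly and cost-effectively. If you also want to convert text to speech in Python, see the tutorial on that topic.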
