
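Learn how to use the SpeechRecognition library to perform speech recognition and convert spoken audio to text in Python. Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python with the SpeechRecognition library. We don't need to build any machine learning model from scratch, because the library provides convenient wrappers around various well-known public speech recognition APIs (such as Google Cloud Speech API, IBM Speech To Text, etc.). Let's get started by installing the libraries with pip (pydub is used later to split long audio files):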

pip3 install SpeechRecognition pydub

Now open a new Python file and import the library:
import speech_recognition as sr

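The library supports the following speech recognition engines and APIs:
  • CMU Sphinx (offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech To Text
  • Snowboy Hotword Detection (offline)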
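We'll use Google Speech Recognition here because it's simple and doesn't require an API key. To read speech from a file, make sure there is an audio file containing English speech in the current directory (to follow along exactly, use the file named below):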
filename = "16-122828-0002.wav"
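Before passing the file to the recognizer, you may want to confirm that it is a readable, uncompressed PCM WAV file and check how long it is. Below is a minimal sketch using only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper, stdlib only)."""
    # wave.open raises wave.Error if the file is not a WAV format it understands
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            # duration = number of frames / frames per second
            "duration_seconds": frames / rate,
        }
```

For example, `describe_wav(filename)` returns a dictionary with the channel count, sample width, sample rate and duration in seconds.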

This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you want; just change the filename. Next, initialize the speech recognizer:
# initialize the recognizer
r = sr.Recognizer()

The following code loads the audio file and converts the speech to text using Google Speech Recognition:
# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    print(text)

This takes a few seconds to finish because it uploads the audio to Google and fetches the output. Here is my result:
I believe you're just talking nonsense
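Because recognition here is a network request, it can fail. In SpeechRecognition, `recognize_google()` raises `sr.RequestError` when the request to the API fails and `sr.UnknownValueError` when the speech cannot be understood. Only the first is worth retrying, since resending the same audio will not make it more intelligible. Below is a minimal sketch of a generic retry helper with exponential backoff; `call_with_retries` and its parameters are hypothetical, not part of the library.

```python
import time


def call_with_retries(func, *args, retries=3, base_delay=1.0,
                      retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff (hypothetical helper).

    Only exception types listed in retry_on trigger another attempt; once
    all attempts are used up, the last exception is re-raised. Any other
    exception propagates immediately.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: propagate the last error
            # sleep base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
            time.sleep(base_delay * 2 ** attempt)
```

Assuming `r`, `audio_data` and `sr` from the code above, a call could look like `text = call_with_retries(r.recognize_google, audio_data, retry_on=(sr.RequestError,))`.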

For a longer audio file, the function below splits the audio into chunks at silences using pydub and then applies speech recognition to each chunk:
# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Split the large audio file into chunks
    and apply speech recognition on each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    # split audio sound where silence is 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len = 500,
        # treat audio 14 dB quieter than the file's overall loudness as silence; adjust per requirement
        silence_thresh = sound.dBFS-14,
        # keep 500 ms of silence at both ends of each chunk, adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export audio chunk and save it in
        # the `folder_name` directory.
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text

path = "7601-291468-0006.wav"
print("\nFull text:", get_large_audio_transcription(path))

audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.
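The chunking step is governed by split_on_silence()'s parameters: min_silence_len in milliseconds, and silence_thresh in dBFS (decibels relative to full scale), which the code above sets to 14 dB below the file's overall loudness. To illustrate the idea, here is a simplified, stdlib-only sketch that works on a plain list of mono integer samples. It computes the RMS loudness of each 1 ms window in dBFS (20·log10(rms / max possible amplitude)), treats runs of at least min_silence_len quiet windows as silence, and returns the non-silent ranges in milliseconds. It is not pydub's actual implementation, it omits keep_silence padding, and the function names are hypothetical.

```python
import math


def dbfs(samples, sample_width_bytes=2):
    """RMS loudness of integer PCM samples in dBFS; -inf for pure silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    max_amplitude = 2 ** (8 * sample_width_bytes - 1)  # 32768 for 16-bit audio
    return 20 * math.log10(rms / max_amplitude) if rms > 0 else float("-inf")


def nonsilent_ranges(samples, sample_rate, min_silence_len=500,
                     silence_thresh=-16.0, sample_width_bytes=2):
    """Return (start_ms, end_ms) ranges that are not part of a long silence.

    The signal is cut into 1 ms windows; a window is "quiet" when its dBFS
    is below silence_thresh. Only runs of at least min_silence_len quiet
    windows count as silence, so short pauses stay inside a chunk.
    """
    per_ms = max(1, sample_rate // 1000)
    n_ms = len(samples) // per_ms
    quiet = [
        dbfs(samples[i * per_ms:(i + 1) * per_ms], sample_width_bytes) < silence_thresh
        for i in range(n_ms)
    ]

    ranges, start, ms = [], 0, 0
    while ms < n_ms:
        if quiet[ms]:
            run_end = ms
            while run_end < n_ms and quiet[run_end]:
                run_end += 1
            if run_end - ms >= min_silence_len:
                # a long silence closes the current chunk (if it has any content)
                if ms > start:
                    ranges.append((start, ms))
                start = run_end
            ms = run_end
        else:
            ms += 1
    if n_ms > start:
        ranges.append((start, n_ms))
    return ranges
```

Passing `silence_thresh=dbfs(samples) - 14` mirrors the tutorial's `silence_thresh = sound.dBFS-14` setting: the threshold follows the file's overall RMS loudness instead of a fixed level, so quieter recordings still split sensibly.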

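Reading from the microphone requires the PyAudio package. On Windows, you can install it directly with pip:
pip3 install pyaudio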

On Linux (Debian/Ubuntu), install the system packages first, then install PyAudio with pip:
sudo apt-get install python-pyaudio python3-pyaudio
pip3 install pyaudio

On macOS, you need to install portaudio first, and then you can pip install PyAudio:
brew install portaudio
pip3 install pyaudio

Now let's use the microphone to convert speech to text:
with sr.Microphone() as source:
    # read the audio data from the default microphone
    audio_data = r.record(source, duration=5)
    print("Recognizing...")
    # convert speech to text
    text = r.recognize_google(audio_data)
    print(text)

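This listens to your microphone for 5 seconds and then tries to convert that speech to text. The code is very similar to the previous example, but here we use the Microphone() object to read audio from the default microphone, pass the duration parameter to record() so that it stops reading after 5 seconds, and then upload the audio data to Google to get the output text. You can also use the offset parameter of record() to start recording after offset seconds. In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech, you can use: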
text = r.recognize_google(audio_data, language="es-ES")
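The offset and duration arguments of record() described above select a time window of the audio source: from offset seconds up to offset + duration seconds. As a rough stdlib illustration of that arithmetic (not the library's implementation), the sketch below converts seconds to frame indices using the file's sample rate and copies that window of a WAV file into a new one; the function name `extract_window` and any output path you pass are hypothetical.

```python
import wave


def extract_window(src_path, dst_path, offset=0.0, duration=None):
    """Copy audio from offset to offset + duration seconds into a new WAV file.

    Hypothetical helper; returns the number of frames written. A window that
    runs past the end of the file is truncated at the end.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        start = min(total, int(offset * rate))  # seconds -> frame index
        if duration is None:
            stop = total
        else:
            stop = min(total, start + int(duration * rate))
        src.setpos(start)
        frames = src.readframes(stop - start)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and sample rate
        dst.writeframes(frames)  # the frame count in the header is corrected on write
    return stop - start
```

For a 16 kHz file, `offset=2` and `duration=5` select frames 32000 through 111999, the same window that `r.record(source, offset=2, duration=5)` would read from an AudioFile source.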

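See this Stack Overflow answer for the list of supported languages. As you can see, converting speech to text with this library is very simple. The library is widely used in practice; check out its official documentation. If you'd rather not use Python and want a service that does this for you automatically, I suggest audext, which converts your audio to text online quickly and cost-effectively. Check it out! If you also want to convert text to speech in Python, see the companion tutorial.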
