Of all things in life, knowledge is the most precious. This article covers speech recognition, one of HarmonyOS's AI capabilities, and aims to be of help to you.
It is written to help you avoid a few pitfalls when developing audio recording and speech recognition.
Result
(Screenshot: the simple UI layout and recognition output on the left; the test audio playing in NetEase Cloud Music on the right.)
Development steps: IDE installation, project creation, and so on are skipped here. The app targets SDK API version 6 and uses the JS UI framework.
1. Requesting permissions
AI speech recognition itself does not require any permission, but since we record audio from the microphone, the microphone permission must be requested.
Add the permission in the config.json configuration file:
"reqPermissions": [
    { "name": "ohos.permission.MICROPHONE" }
]
Explicitly request the microphone permission in MainAbility:
@Override
public void onStart(Intent intent) {
    super.onStart(intent);
    requestPermission();
}

// Request runtime permissions
private void requestPermission() {
    String[] permissions = { "ohos.permission.MICROPHONE" };
    List<String> applyPermissions = new ArrayList<>();
    for (String element : permissions) {
        if (verifySelfPermission(element) != 0 && canRequestPermission(element)) {
            applyPermissions.add(element);
        }
    }
    requestPermissionsFromUser(applyPermissions.toArray(new String[0]), 0);
}
2. Create the audio recording utility class
First create an audio recording utility class, AudioCaptureUtils. Recording relies on the AudioCapturer class, and constructing an AudioCapturer in turn requires an AudioStreamInfo and an AudioCapturerInfo, so we declare fields for all three:

private AudioStreamInfo audioStreamInfo;
private AudioCapturer audioCapturer;
private AudioCapturerInfo audioCapturerInfo;
Audio recording for speech recognition is subject to constraints, so when recording we must make sure that:
1. the sampling rate is 16000 Hz;
2. the audio is single-channel (mono);
3. only Mandarin Chinese is supported.
To keep AudioCaptureUtils reusable, its constructor takes the channel mask and sampling rate as parameters and initializes the AudioStreamInfo and AudioCapturerInfo:

// channelMask: channel configuration
// sampleRate: sampling rate
public AudioCaptureUtils(AudioStreamInfo.ChannelMask channelMask, int sampleRate) {
    this.audioStreamInfo = new AudioStreamInfo.Builder()
            .encodingFormat(AudioStreamInfo.EncodingFormat.ENCODING_PCM_16BIT)
            .channelMask(channelMask)
            .sampleRate(sampleRate)
            .build();
    this.audioCapturerInfo = new AudioCapturerInfo.Builder()
            .audioStreamInfo(audioStreamInfo)
            .build();
}
The init method initializes the audioCapturer and configures a sound effect, defaulting to noise suppression:

// packageName: the app's package name
public void init(String packageName) {
    this.init(SoundEffect.SOUND_EFFECT_TYPE_NS, packageName);
}

// soundEffect: sound effect UUID
// packageName: the app's package name
public void init(UUID soundEffect, String packageName) {
    if (audioCapturer == null || audioCapturer.getState() == AudioCapturer.State.STATE_UNINITIALIZED) {
        audioCapturer = new AudioCapturer(this.audioCapturerInfo);
        audioCapturer.addSoundEffect(soundEffect, packageName);
    }
}
After initialization we expose start, stop, and destroy methods to start recording, stop recording, and release resources, each delegating to the corresponding AudioCapturer call:

public void stop() {
    this.audioCapturer.stop();
}

public void destroy() {
    this.audioCapturer.stop();
    this.audioCapturer.release();
}

public Boolean start() {
    if (audioCapturer == null) {
        return false;
    }
    return audioCapturer.start();
}
We also provide a method to read the audio stream and a method to obtain the AudioCapturer instance:

// buffers: the buffer to read data into
// offset: offset within the buffer
// bytesLength: number of bytes to read
public int read(byte[] buffers, int offset, int bytesLength) {
    return audioCapturer.read(buffers, offset, bytesLength);
}

// Return the AudioCapturer instance
public AudioCapturer get() {
    return this.audioCapturer;
}
3. Create the speech recognition utility class
With the recording utility in place, we now create a speech recognition utility class, AsrUtils. Recall the recording constraints on speech recognition listed above.
One undocumented limitation is worth adding: the PCM stream may only be written in chunks of exactly 640 or 1280 bytes, so those are the only two lengths we can use when reading the audio stream.
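Because the engine only accepts 640- or 1280-byte writes, a capture buffer of any other length has to be re-framed before being passed to writePcm. The sketch below shows one way to do that in plain Java, independent of the HarmonyOS SDK; the class and method names are invented for illustration, and the final partial frame is zero-padded.

```java
import java.util.ArrayList;
import java.util.List;

public class PcmChunker {
    // Split a PCM byte stream into frames of exactly frameSize bytes
    // (640 or 1280 for the HarmonyOS ASR engine), zero-padding the tail
    // so every frame handed to the engine has a valid length.
    public static List<byte[]> chunk(byte[] pcm, int frameSize) {
        List<byte[]> frames = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += frameSize) {
            byte[] frame = new byte[frameSize];          // zero-initialized
            int len = Math.min(frameSize, pcm.length - offset);
            System.arraycopy(pcm, offset, frame, 0, len);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        // 3000 bytes -> 1280 + 1280 + 440 (padded to 1280) = 3 frames
        List<byte[]> frames = chunk(new byte[3000], 1280);
        System.out.println(frames.size()); // 3
    }
}
```

Each frame returned here could then be passed to writePcm in turn; zero padding adds a short stretch of silence, which the VAD tolerates.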
Next we define some basic constants:
// Sampling rate, fixed at 16000 Hz
private static final int VIDEO_SAMPLE_RATE = 16000;
// VAD end-of-speech wait time, default 2000 ms
private static final int VAD_END_WAIT_MS = 2000;
// VAD front wait time, default 4800 ms
// Both values affect recognition accuracy; here we keep the system defaults
private static final int VAD_FRONT_WAIT_MS = 4800;
// Input timeout, 20000 ms
private static final int TIMEOUT_DURATION = 20000;
// PCM buffer length, limited to 640 or 1280
private static final int BYTES_LENGTH = 1280;
// Thread pool parameters
private static final int CAPACITY = 6;
private static final int ALIVE_TIME = 3;
private static final int POOL_SIZE = 3;
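To see why BYTES_LENGTH = 1280 is a natural buffer size: 16-bit mono PCM at 16000 Hz yields 32000 bytes per second, so a 1280-byte buffer holds 40 ms of audio and a 640-byte buffer holds 20 ms. A quick plain-Java check (class and method names invented for this sketch):

```java
public class FrameMath {
    // Bytes per second for 16-bit (2-byte) mono PCM at the given sampling rate
    static int bytesPerSecond(int sampleRate) {
        return sampleRate * 2 * 1;
    }

    // Duration in milliseconds covered by one PCM buffer of the given size
    static int frameMillis(int bytes, int sampleRate) {
        return bytes * 1000 / bytesPerSecond(sampleRate);
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond(16000));    // 32000
        System.out.println(frameMillis(1280, 16000)); // 40
        System.out.println(frameMillis(640, 16000));  // 20
    }
}
```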
Because audio must be recorded continuously in the background, we need a dedicated thread. Here we use Java's ThreadPoolExecutor class. Define a thread pool instance and the other related fields:

// Recording thread pool
private ThreadPoolExecutor poolExecutor;
/* Custom state codes
** error: -1
** initial: 0
** init done: 1
** input started: 2
** input ended: 3
** recognition ended: 5
** intermediate result: 9
** final result: 10
*/
public int state = 0;
// Recognition result
public String result;
// Whether recognition is running
// PCM data is only written while this is true
boolean isStarted = false;
// ASR client
private AsrClient asrClient;
// ASR listener
private AsrListener listener;
AsrIntent asrIntent;
// Audio recording utility
private AudioCaptureUtils audioCaptureUtils;
Initialize these fields in the constructor:

public AsrUtils(Context context) {
    // Create a mono recording utility instance with a 16000 Hz sampling rate
    this.audioCaptureUtils = new AudioCaptureUtils(AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, VIDEO_SAMPLE_RATE);
    // Initialize with the noise-suppression sound effect
    this.audioCaptureUtils.init("com.panda_coder.liedetector");
    // Clear the result
    this.result = "";
    // Create a new thread pool for the recording task
    poolExecutor = new ThreadPoolExecutor(
            POOL_SIZE,
            POOL_SIZE,
            ALIVE_TIME,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(CAPACITY),
            new ThreadPoolExecutor.DiscardOldestPolicy());
    if (asrIntent == null) {
        asrIntent = new AsrIntent();
        // Use a PCM stream as the audio source
        // (a file source is also possible)
        asrIntent.setAudioSourceType(AsrIntent.AsrAudioSrcType.ASR_SRC_TYPE_PCM);
        asrIntent.setVadEndWaitMs(VAD_END_WAIT_MS);
        asrIntent.setVadFrontWaitMs(VAD_FRONT_WAIT_MS);
        asrIntent.setTimeoutThresholdMs(TIMEOUT_DURATION);
    }
    if (asrClient == null) {
        // Create the AsrClient
        asrClient = AsrClient.createAsrClient(context).orElse(null);
    }
    if (listener == null) {
        // Create the MyAsrListener
        listener = new MyAsrListener();
    }
    // Initialize the AsrClient
    this.asrClient.init(asrIntent, listener);
}
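The pool parameters deserve a note: with POOL_SIZE = 3 threads, a capacity-6 queue, and DiscardOldestPolicy, a saturated pool silently drops the oldest queued task instead of throwing RejectedExecutionException. This behavior can be observed in plain Java, independent of the HarmonyOS SDK (PoolDemo is a name invented for this sketch):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    // Same configuration as in the article: 3 core/max threads,
    // 3 s keep-alive, queue capacity 6, DiscardOldestPolicy.
    public static ThreadPoolExecutor build() {
        return new ThreadPoolExecutor(
                3, 3, 3, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(6),
                new ThreadPoolExecutor.DiscardOldestPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = build();
        // Submit more tasks than (workers + queue capacity):
        // excess submissions evict the oldest queued task, no exception is thrown.
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("done");
    }
}
```

For a single long-running recording task this policy is harmless; it only matters if start() were called repeatedly without stop().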
Next, build a class MyAsrListener that implements the AsrListener interface:

class MyAsrListener implements AsrListener {
    @Override
    public void onInit(PacMap pacMap) {
        HiLog.info(TAG, "====== init");
        state = 1;
    }

    @Override
    public void onBeginningOfSpeech() {
        state = 2;
    }

    @Override
    public void onRmsChanged(float v) { }

    @Override
    public void onBufferReceived(byte[] bytes) { }

    @Override
    public void onEndOfSpeech() {
        state = 3;
    }

    @Override
    public void onError(int i) {
        state = -1;
        if (i == AsrError.ERROR_SPEECH_TIMEOUT) {
            // Restart listening after a timeout
            asrClient.startListening(asrIntent);
        } else {
            HiLog.info(TAG, "======error code:" + i);
            asrClient.stopListening();
        }
    }

    // Note how the result is fetched here, as opposed to onIntermediateResults:
    // pacMap.getString(AsrResultKey.RESULTS_RECOGNITION)
    @Override
    public void onResults(PacMap pacMap) {
        state = 10;
        // Fetch the final result, e.g.:
        // {"result":[{"confidence":0,"ori_word":"你 好 ","pinyin":"NI3 HAO3 ","word":"你好。"}]}
        String results = pacMap.getString(AsrResultKey.RESULTS_RECOGNITION);
        ZSONObject zsonObject = ZSONObject.stringToZSON(results);
        ZSONObject infoObject;
        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
            String resultWord = infoObject.getString("ori_word").replace(" ", "");
            result += resultWord;
        }
    }

    // Intermediate results arrive here via
    // pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE)
    @Override
    public void onIntermediateResults(PacMap pacMap) {
        state = 9;
        //String result = pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE);
        //if (result == null) {
        //    return;
        //}
        //ZSONObject zsonObject = ZSONObject.stringToZSON(result);
        //ZSONObject infoObject;
        //if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
        //    infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
        //    String resultWord = infoObject.getString("ori_word").replace(" ", "");
        //    HiLog.info(TAG, "=========== 9 " + resultWord);
        //}
    }

    @Override
    public void onEnd() {
        state = 5;
        // If recording is still in progress, start listening again
        if (isStarted) {
            asrClient.startListening(asrIntent);
        }
    }

    @Override
    public void onEvent(int i, PacMap pacMap) { }

    @Override
    public void onAudioStart() {
        state = 2;
    }

    @Override
    public void onAudioEnd() {
        state = 3;
    }
}
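The space-stripping step in onResults can be illustrated in isolation: the engine's "ori_word" field separates recognized characters with spaces, and the listener strips them before appending to the accumulated result. A plain-Java sketch (ResultJoin and append are names invented for this illustration):

```java
public class ResultJoin {
    // Strip the spaces the engine inserts between characters in "ori_word"
    // and append the cleaned word to the accumulated result string.
    public static String append(String accumulated, String oriWord) {
        return accumulated + oriWord.replace(" ", "");
    }

    public static void main(String[] args) {
        String result = "";
        result = append(result, "你 好 ");
        result = append(result, "世 界 ");
        System.out.println(result); // 你好世界
    }
}
```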
The functions that start and stop recognition:

public void start() {
    if (!this.isStarted) {
        this.isStarted = true;
        asrClient.startListening(asrIntent);
        poolExecutor.submit(new AudioCaptureRunnable());
    }
}

public void stop() {
    this.isStarted = false;
    asrClient.stopListening();
    audioCaptureUtils.stop();
}
The audio recording thread:

private class AudioCaptureRunnable implements Runnable {
    @Override
    public void run() {
        byte[] buffers = new byte[BYTES_LENGTH];
        // Start recording
        audioCaptureUtils.start();
        while (isStarted) {
            // Read the recorded PCM data
            int ret = audioCaptureUtils.read(buffers, 0, BYTES_LENGTH);
            if (ret <= 0) {
                HiLog.error(TAG, "======Error read data");
            } else {
                // Write the recorded PCM data to the speech recognition service.
                // If the buffer length is not 640 or 1280, it must be re-framed
                // to one of those lengths first.
                asrClient.writePcm(buffers, BYTES_LENGTH);
            }
        }
    }
}
The recognition result is delivered through the listener callbacks, which append it to result; callers retrieve it with getResult or getResultAndClear:

public String getResult() {
    return result;
}

public String getResultAndClear() {
    // Compare string content, not references
    if (this.result.isEmpty()) {
        return "";
    }
    String results = getResult();
    this.result = "";
    return results;
}
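A caution on getResultAndClear: its empty check must compare string content rather than references. In Java, == on two String values compares object identity, so it only "works" when both sides happen to come from the string constant pool; equals() or isEmpty() is the reliable test. A minimal demonstration:

```java
public class StringCompare {
    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc"); // same content, different object
        System.out.println(a == b);      // false: == compares references
        System.out.println(a.equals(b)); // true: equals() compares content
        System.out.println("".isEmpty());// true: the idiomatic empty check
    }
}
```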
4. Create a simple JS UI and call the Java side through a ServiceAbility
hml code:

<div class="container">
    <div>
        <button class="btn" @touchend="start">开启</button>
        <button class="btn" @touchend="sub">订阅结果</button>
        <button class="btn" @touchend="stop">关闭</button>
    </div>
    <text class="title">语音识别内容:{{text}}</text>
</div>
Style code:

.container {
    flex-direction: column;
    justify-content: flex-start;
    align-items: center;
    width: 100%;
    height: 100%;
    padding: 10%;
}
.title {
    font-size: 20px;
    color: #000000;
    opacity: 0.9;
    text-align: left;
    width: 100%;
    margin: 3% 0;
}
.btn {
    padding: 10px 20px;
    margin: 3px;
    border-radius: 6px;
}
JS logic code:

// Utility class for calling the Java ServiceAbility from JS
import jsCallJavaAbility from '../../common/JsCallJavaAbilityUtils.js';

export default {
    data: {
        text: ""
    },
    // Start event
    start() {
        jsCallJavaAbility.callAbility("ControllerAbility", 100, {}).then(result => {
            console.log(result)
        })
    },
    // Stop event
    stop() {
        jsCallJavaAbility.callAbility("ControllerAbility", 101, {}).then(result => {
            console.log(result)
        })
        jsCallJavaAbility.unSubAbility("ControllerAbility", 201).then(result => {
            if (result.code == 200) {
                console.log("unsubscribed successfully");
            }
        })
    },
    // Subscribe to result events from the Java side
    sub() {
        jsCallJavaAbility.subAbility("ControllerAbility", 200, (data) => {
            let text = data.data.text
            text && (this.text += text)
        }).then(result => {
            if (result.code == 200) {
                console.log("subscribed successfully");
            }
        })
    }
}
ServiceAbility code:

public class ControllerAbility extends Ability {
    AnswerRemote remote = new AnswerRemote();
    AsrUtils asrUtils;
    // Remote objects for subscribed events
    private static HashMap<Integer, IRemoteObject> remoteObjectHandlers = new HashMap<Integer, IRemoteObject>();

    @Override
    public void onStart(Intent intent) {
        HiLog.error(LABEL_LOG, "ControllerAbility::onStart");
        super.onStart(intent);
        // Initialize the speech recognition utility
        asrUtils = new AsrUtils(this);
    }

    @Override
    public void onCommand(Intent intent, boolean restart, int startId) { }

    @Override
    public IRemoteObject onConnect(Intent intent) {
        super.onConnect(intent);
        return remote.asObject();
    }

    class AnswerRemote extends RemoteObject implements IRemoteBroker {
        AnswerRemote() {
            super("");
        }

        @Override
        public boolean onRemoteRequest(int code, MessageParcel data, MessageParcel reply, MessageOption option) {
            Map<String, Object> zsonResult = new HashMap<String, Object>();
            String zsonStr = data.readString();
            ZSONObject zson = ZSONObject.stringToZSON(zsonStr);
            switch (code) {
                case 100:
                    // Code 100 from JS: start speech recognition
                    asrUtils.start();
                    break;
                case 101:
                    // Code 101 from JS: stop speech recognition
                    asrUtils.stop();
                    break;
                case 200:
                    // Code 200 from JS: subscribe to recognition results
                    remoteObjectHandlers.put(200, data.readRemoteObject());
                    // Periodically fetch results and push them to the JS UI
                    getAsrText();
                    break;
                default:
                    reply.writeString("service not defined");
                    return false;
            }
            reply.writeString(ZSONObject.toZSONString(zsonResult));
            return true;
        }

        @Override
        public IRemoteObject asObject() {
            return this;
        }
    }

    public void getAsrText() {
        new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(500);
                    Map<String, Object> zsonResult = new HashMap<String, Object>();
                    zsonResult.put("text", asrUtils.getResultAndClear());
                    reportEvent(200, zsonResult);
                } catch (RemoteException | InterruptedException e) {
                    break;
                }
            }
        }).start();
    }

    private void reportEvent(int remoteHandler, Object backData) throws RemoteException {
        MessageParcel data = MessageParcel.obtain();
        MessageParcel reply = MessageParcel.obtain();
        MessageOption option = new MessageOption();
        data.writeString(ZSONObject.toZSONString(backData));
        IRemoteObject remoteObject = remoteObjectHandlers.get(remoteHandler);
        remoteObject.sendRequest(100, data, reply, option);
        reply.reclaim();
        data.reclaim();
    }
}
This completes the simple speech recognition feature.
Demo video: https://www.bilibili.com/video/BV1E44y177hv/
Full source code: https://gitee.com/panda-coder/harmonyos-apps/tree/master/AsrDemo