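[.NET | Recognizing text in images with PaddleOCR on .NET 6]
I recently came across an open-source OCR project, PaddleOCR. It supports recognition in two ways: through an offline-deployed Hub Serving service, or through a local package.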
Runtime environment: Windows 10
Development tool: Visual Studio 2022
.NET version: .NET 6
Packages to install: PaddleOCR, version 0.0.5, and PaddleOCRUtf8, version 0.0.5.
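At first I used the PaddleOCR package and found that English text and digits were recognized successfully, with high accuracy. When I moved on to Chinese text, however, the output came back garbled, even though the same recognition models were used. Switching to the PaddleOCRUtf8 package resolved the garbled Chinese output, as shown below:
Image to recognize:
[image]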
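The post does not say why the first package garbles Chinese. A common cause of this kind of symptom is that UTF-8 bytes are decoded with a different encoding somewhere along the way. The sketch below only illustrates that general mechanism; it is an assumption, not a description of what either package does internally, and the variable names are made up for the example.

```csharp
using System;
using System.Text;

// Sample text standing in for a recognized string.
byte[] raw = Encoding.UTF8.GetBytes("公民身份");

// Decoding UTF-8 bytes as Latin-1 yields one unrelated character per byte.
string garbled = Encoding.Latin1.GetString(raw);

// Decoding the same bytes as UTF-8 restores the original text.
string restored = Encoding.UTF8.GetString(raw);

Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine($"Decoded as Latin-1: {garbled}");
Console.WriteLine($"Decoded as UTF-8:   {restored}");
```

If garbled output resembles the first line, an encoding mismatch is a reasonable first thing to check.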
The basic recognition code is as follows:
using System.Text;
using System.Text.Json;

namespace JuCheap_Demo_OCR
{
    internal class PaddleOCRService
    {
        // Base path of the running application
        private static readonly string _basePath = AppDomain.CurrentDomain.BaseDirectory;
        // Path of the image to recognize
        private static readonly string _imagePath = Path.Combine(_basePath, "id_card.jpg");
        // Detection, recognition and classification models, plus the character dictionary
        private readonly string _detPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_det_infer");
        private readonly string _recPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_rec_infer");
        private readonly string _clsPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_mobile_v2.0_cls_infer");
        private readonly string _charListFileListPath = Path.Combine(_basePath, "PaddleModel", "chinese_zh_dict.txt");
        // Image content as base64, sent to the Hub Serving service
        private readonly string _fileBase64 = Convert.ToBase64String(File.ReadAllBytes(_imagePath), Base64FormattingOptions.None);

        /// <summary>
        /// Local recognition with the PaddleOCR package.
        /// </summary>
        public async Task RecognizeByPaddleOCR()
        {
            WriteOneLine();
            // Recognize with the local package (English and digits work; Chinese comes back garbled)
            PaddleOCR.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var result = await PaddleOCR.PaddleOCR.Recognize(_imagePath);
            foreach (var box in result.Boxes)
            {
                Console.WriteLine($"PaddleOCR local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Local recognition with the PaddleOCRUtf8 package.
        /// </summary>
        public async Task RecognizeByPaddleOCRUtf8()
        {
            WriteOneLine();
            // Resolves the garbled Chinese output
            PaddleOCRUtf8.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var resultUtf8 = await PaddleOCRUtf8.PaddleOCR.Recognize(_imagePath);
            foreach (var box in resultUtf8.Boxes)
            {
                Console.WriteLine($"PaddleOCRUtf8 local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Recognition through a Hub Serving service set up with Python.
        /// </summary>
        public async Task RecognizeByHubServing()
        {
            WriteOneLine();
            try
            {
                // Recognize through the hub ocr_system endpoint
                using var client = new HttpClient { BaseAddress = new Uri("http://127.0.0.1:8866/") };
                var postData = new
                {
                    images = new string[] { _fileBase64 }
                };
                var content = new StringContent(JsonSerializer.Serialize(postData), Encoding.UTF8, "application/json");
                var response = await client.PostAsync("predict/ocr_system", content);
                var responseContent = await response.Content.ReadAsStringAsync();
                // The generic type argument is missing from the original listing.
                // HubServingResponse is a placeholder name for a DTO that exposes Data
                // (a list of lists whose items have Text and Confidence).
                var responseResult = JsonSerializer.Deserialize<HubServingResponse>(responseContent);
                if (responseResult != null && responseResult.Data != null)
                {
                    foreach (var items in responseResult.Data)
                    {
                        foreach (var box in items)
                        {
                            Console.WriteLine($"Hub Serving result={box.Text}, confidence={box.Confidence}");
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Hub Serving recognition failed: {ex.Message}");
            }
            WriteOneLine();
        }

        // Prints a separator line between the outputs of the different methods
        private void WriteOneLine()
        {
            Console.WriteLine("--------------------------------------------------------------------------------------------------");
        }
    }
}
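The Hub Serving method above reads `responseResult.Data`, `box.Text` and `box.Confidence`, but the listing does not show the response type. A minimal sketch of what such types could look like follows. The type names (`HubServingResponse`, `HubServingBox`, `HubServingJson`) are hypothetical, and the JSON field names (`results`, `text`, `confidence`) are assumptions about the `ocr_system` response that should be checked against the actual service output.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for one recognized text box.
// Assumed JSON fields: "text" and "confidence"; other fields are ignored.
public sealed class HubServingBox
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = "";

    [JsonPropertyName("confidence")]
    public double Confidence { get; set; }
}

// Hypothetical DTO for the whole response.
// Assumed JSON field: "results", with one inner list per submitted image.
public sealed class HubServingResponse
{
    [JsonPropertyName("results")]
    public List<List<HubServingBox>>? Data { get; set; }
}

public static class HubServingJson
{
    // Builds the request body used in the listing: {"images":["<base64>", ...]}.
    public static string BuildRequest(params string[] base64Images) =>
        JsonSerializer.Serialize(new { images = base64Images });

    // Parses a response body into the DTOs above; unmapped JSON fields are skipped.
    public static HubServingResponse? Parse(string json) =>
        JsonSerializer.Deserialize<HubServingResponse>(json);
}
```

With types along these lines, the deserialization step could be written as `var responseResult = HubServingJson.Parse(responseContent);`, and the nested `foreach` loops over `responseResult.Data` would work unchanged.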
Recognition result:
[image]
Source code:
https://gitee.com/jucheap/demo
Run the JuCheap-Demo-OCR project in that repository directly to see the result.
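Summary: recognition with the local packages is somewhat unreliable; for example, 【公民身份证】 was not recognized completely. I recommend setting up a Hub Serving service for recognition instead, since its accuracy is higher.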