Natural Language Processing (Chinese Word Segmentation): Reverse Maximum Matching for Chinese, with Verification Source Code (Python)

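First, prepare a txt file to serve as the dictionary, one word per line.
Read the file line by line and store each string in a collection (the code below uses a set).
Given an input sentence, compare it against the dictionary from the end: take the last n characters, where n is the length of the longest dictionary entry.
If that piece is in the dictionary, keep it as a word and continue with the remaining text; otherwise shorten the piece by one character and try again.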
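To make the steps above concrete, here is a minimal sketch that records every piece compared against the dictionary. The function name `rmm_steps` and the dictionary entries are assumptions made for illustration; they are not taken from the post's dictionary file.

```python
def rmm_steps(text, dictionary):
    """Return (tokens, trace); trace lists every piece compared with the dictionary."""
    max_len = max(len(w) for w in dictionary)
    tokens, trace = [], []
    end = len(text)
    while end > 0:
        # Try the longest possible piece ending at `end`, then shorter ones
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            hit = piece in dictionary
            trace.append((piece, hit))
            if hit:
                tokens.append(piece)
                end -= size
                break
        else:
            # No dictionary entry ends here: skip one character
            end -= 1
    # Tokens were found back to front, so reverse them
    return tokens[::-1], trace


# Hypothetical dictionary entries
sample_dict = {"南京", "南京市", "市长", "长江", "大桥"}
tokens, trace = rmm_steps("南京市长江大桥", sample_dict)
for piece, hit in trace:
    print(piece, "match" if hit else "no match")
print(tokens)  # ['南京市', '长江', '大桥']
```

Because matching starts from the end of the sentence, "长江" is consumed before "市长" can be considered, so the ambiguous entry "市长" is never chosen with this dictionary.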

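class IMM(object):
    def __init__(self, dic_path):
        # Set of dictionary words
        self.dictionary = set()
        # Length of the longest dictionary word, for example 5
        self.maximum = 0
        # Read the dictionary, one word per line
        with open(dic_path, "r", encoding="utf8") as f:
            for line in f:
                # Strip surrounding whitespace and skip blank lines
                line = line.strip()
                if not line:
                    continue
                self.dictionary.add(line)
                if len(line) > self.maximum:
                    self.maximum = len(line)

    def cut(self, text):
        result = []
        index = len(text)
        while index > 0:
            word = None
            # Try the longest piece first, then shorter ones
            for size in range(self.maximum, 0, -1):
                if index - size < 0:
                    continue
                piece = text[(index - size):index]
                if piece in self.dictionary:
                    word = piece
                    result.append(word)
                    index -= size
                    break
            if word is None:
                # No match ends here: skip this character (it is dropped from the output)
                index -= 1
        # Words were collected back to front, so reverse them
        return result[::-1]


if __name__ == "__main__":
    text = "南京市长江大桥"
    tokenizer = IMM("data/imm_dic.txt")
    print(tokenizer.cut(text))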
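The contents of data/imm_dic.txt are not shown in the post, so the script cannot be run as is. The sketch below writes a small dictionary file to a temporary directory and loads it the same way the constructor does. The entries in `SAMPLE_WORDS` and the helper names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

# Hypothetical entries; the original dictionary file is not shown in the post
SAMPLE_WORDS = ["南京市", "南京市长", "长江大桥", "大桥", "江"]


def write_sample_dictionary(directory):
    """Write one word per line and return the file path."""
    path = os.path.join(directory, "imm_dic.txt")
    with open(path, "w", encoding="utf8") as f:
        # The extra blank line at the end checks that blank lines are skipped
        f.write("\n".join(SAMPLE_WORDS) + "\n\n")
    return path


def load_dictionary(path):
    """Return (set of words, length of the longest word)."""
    words = set()
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                words.add(line)
    return words, max((len(w) for w in words), default=0)


with tempfile.TemporaryDirectory() as tmp:
    dic_path = write_sample_dictionary(tmp)
    words, longest = load_dictionary(dic_path)
    print(sorted(words), longest)
```

With these sample entries saved as data/imm_dic.txt, the IMM class above should segment "南京市长江大桥" into ['南京市', '长江大桥'].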