Calling the Stanford Parser API

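Note: the Parser builds parse trees for Chinese sentences that have already been word-segmented.
Parser download: https://nlp.stanford.edu/software/lex-parser.shtml
API (Java)
After adding the JAR files to your project, import the following classes in your Java program: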

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.io.*;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

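Sentences without POS tags
When a word-segmented sentence without part-of-speech tags is passed to the Parser, the Parser tags the words automatically and then builds the parse tree. The code below is adapted from the sample code shipped with the Parser. The sample reads sentences through a ready-made wrapper class (DocumentPreprocessor), and running it on my own data showed that the sample is designed for a single sentence: it does not split the text into sentences by itself. I could not find documentation on this, so I added simple manual sentence splitting on top of the sample. The String variables doc, oh, en, and up hold several kinds of sentence-final punctuation.
The input here is a text file, and the resulting trees are written to a text file as well.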

class ParserDemo {
    public static void main(String[] args) {
        String parserModel = "edu\\stanford\\nlp\\models\\lexparser\\chinesePCFG.ser.gz";
        String testFile = "C:\\Users\\codinglee\\Desktop\\NLP\\Project_coding\\data\\test.txt";
        String outTest = "C:\\Users\\codinglee\\Desktop\\NLP\\Project_coding\\cpp\\cpp\\testTree.txt";
        demoDP(parserModel, testFile, outTest);
    }

    /**
     * demoDP demonstrates turning a file into tokens and then parse
     * trees. Note that the trees are printed by calling pennPrint on
     * the Tree object. It is also possible to pass a PrintWriter to
     * pennPrint if you want to capture the output.
     * This code will work with any supported language.
     */
    public static void demoDP(String parserModel, String filename, String outname) {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor.
        LexicalizedParser lp = LexicalizedParser.loadModel(parserModel);
        TreebankLanguagePack tlp = lp.treebankLanguagePack(); // a PennTreebankLanguagePack for English
        try {
            FileWriter fw = new FileWriter(outname);
            PrintWriter pw = new PrintWriter(fw);
            for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
                String doc = "。";
                String oh = "！";
                String en = "？";
                String up = "''";
                int n = sentence.size(), cur = 0, next = 0, step = 1;
                // NOTE: the source is garbled between the loop condition and the first
                // print statement. The loop header, the punctuation test and the first
                // two print calls are a reconstruction, not the author's exact code.
                for (; cur < n && next < n; ) {
                    String word = sentence.get(next).word();
                    next += 1; // next is now one past the token just examined
                    boolean isEnd = word.equals(doc) || word.equals(oh)
                            || word.equals(en) || word.equals(up);
                    // Parse at sentence-final punctuation, or when the tokens run out.
                    if (isEnd || next == n) {
                        System.out.print(step);
                        System.out.print(" --> ");
                        System.out.print(cur);
                        System.out.print(", ");
                        System.out.print(next);
                        System.out.print(": ");
                        System.out.println(sentence.subList(cur, next));
                        Tree parse = lp.apply(sentence.subList(cur, next));
                        parse.pennPrint(pw);
                        cur = next;
                        step += 1;
                    }
                }
            }
            pw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
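The manual sentence splitting described above can be tried without the Parser on the classpath. The following is only a minimal sketch: the class name SentenceSplitter and its split method are hypothetical, tokens are plain strings instead of HasWord objects, and the rule that a closing quote ('') directly after a terminator stays with the same sentence is my assumption, not something the article states. The punctuation is written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: splits a token list at sentence-final punctuation. */
class SentenceSplitter {
    // Full-width full stop, exclamation mark and question mark
    // (the article's doc, oh and en variables).
    private static final Set<String> TERMINATORS =
            new HashSet<>(Arrays.asList("\u3002", "\uFF01", "\uFF1F"));
    // Closing quote token, as in the article's up variable.
    private static final String CLOSING_QUOTE = "''";

    static List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        int n = tokens.size();
        int cur = 0;
        while (cur < n) {
            int next = cur;
            // Advance to the first terminator at or after cur.
            while (next < n && !TERMINATORS.contains(tokens.get(next))) {
                next++;
            }
            if (next < n) {
                next++; // include the terminator itself
                // Assumption: a closing quote right after it belongs to this sentence.
                if (next < n && CLOSING_QUOTE.equals(tokens.get(next))) {
                    next++;
                }
            }
            // Tokens left over without a terminator form a final sentence.
            sentences.add(new ArrayList<>(tokens.subList(cur, next)));
            cur = next;
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "w1", "w2", "\u3002", "''", "w3", "\uFF1F", "w4");
        for (List<String> sentence : split(tokens)) {
            System.out.println(sentence);
        }
    }
}
```

In the real program each sub-list would be a view over the List<HasWord> that DocumentPreprocessor yields, and would be handed to lp.apply as in the listing above.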

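Sentences with POS tags
When you already have POS-tagged sentences and want the Parser to use your own tags, use TaggedWord to attach a tag to each word. In the code below, "dev.txt" holds the word-segmented sentences, "devAttr.txt" holds the POS tags for the words in "dev.txt", and "devTree.txt" stores the generated parse trees.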

class ParserDemo {
    public static void main(String[] args) {
        String parserModel = "edu\\stanford\\nlp\\models\\lexparser\\chinesePCFG.ser.gz";
        String leeWord = "C:\\Users\\codinglee\\Desktop\\NLP\\Project_coding\\data\\dev.txt";
        String leeAttr = "C:\\Users\\codinglee\\Desktop\\NLP\\Project_coding\\data\\devAttr.txt";
        String leeOut = "C:\\Users\\codinglee\\Desktop\\NLP\\Project_coding\\data\\devTree.txt";
        demoLee(parserModel, leeWord, leeAttr, leeOut);
    }

    public static void demoLee(String parserModel, String leeWord, String leeAttr, String leeOut) {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor.
        LexicalizedParser lp = LexicalizedParser.loadModel(parserModel);
        TreebankLanguagePack tlp = lp.treebankLanguagePack(); // a PennTreebankLanguagePack for English
        try {
            FileWriter fw = new FileWriter(leeOut);
            PrintWriter pw = new PrintWriter(fw);
            FileReader frWord = new FileReader(leeWord);
            BufferedReader brWord = new BufferedReader(frWord);
            FileReader frAttr = new FileReader(leeAttr);
            BufferedReader brAttr = new BufferedReader(frAttr);
            String line = "";
            String[] words = null;
            String[] attrs = null;
            int counter = 1;
            while ((line = brWord.readLine()) != null) {
                if (line.length() > 0) {
                    words = line.split(" ");
                    attrs = brAttr.readLine().split(" ");
                    //while (attrs.length == 0)
                    //attrs = brAttr.readLine().split(" ");
                    System.out.println(counter + "th parser is moving on --->>> " + words.length + " - " + attrs.length);
                    List<TaggedWord> sentence = new ArrayList<TaggedWord>();
                    for (int i=0; i
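The listing above breaks off in the source at the loop that builds the sentence, so the author's remaining code is not available. As a rough illustration of the step the text describes (pairing each word in a line of "dev.txt" with the tag at the same position in "devAttr.txt"), here is a minimal sketch that runs without the Stanford JAR. TaggedToken and TagPairing are hypothetical names; TaggedToken is only a stand-in for the library's TaggedWord, and the length check is my addition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a (word, tag) pair; not the Stanford TaggedWord class. */
final class TaggedToken {
    final String word;
    final String tag;

    TaggedToken(String word, String tag) {
        this.word = word;
        this.tag = tag;
    }

    @Override
    public String toString() {
        return word + "/" + tag;
    }
}

class TagPairing {
    /** Pairs the i-th word of wordLine with the i-th tag of tagLine. */
    static List<TaggedToken> pair(String wordLine, String tagLine) {
        String[] words = wordLine.trim().split("\\s+");
        String[] tags = tagLine.trim().split("\\s+");
        // Assumption: a mismatch means the two files are out of step, so fail early.
        if (words.length != tags.length) {
            throw new IllegalArgumentException(
                    "word/tag count mismatch: " + words.length + " vs " + tags.length);
        }
        List<TaggedToken> sentence = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            sentence.add(new TaggedToken(words[i], tags[i]));
        }
        return sentence;
    }

    public static void main(String[] args) {
        // Romanized placeholder words with Penn Chinese Treebank style tags.
        System.out.println(pair("wo xihuan mao", "PN VV NN"));
    }
}
```

With the library on the classpath, each pair would presumably be created with new TaggedWord(word, tag), and the resulting list passed to lp.apply and printed with pennPrint, as in the first listing.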





		  	

    
    




    
    
    

