Getting Started with NLTK: Common Functions

1.text.concordance(word)
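This function finds every occurrence of word in text and prints each match together with the text around it, so the emphasis is on context. For example: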

    >>> text1.concordance("monstrous")
    Displaying 11 of 11 matches:
    ong the former , one was of a most monstrous size . ... This came towards us ,
    ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
    ll over with a heathenish array of monstrous clubs and spears . Some were thick
    d as you gazed , and wondered what monstrous cannibal and savage could ever hav
    that has survived the flood ; most monstrous and most mountainous ! That Himmal
    they might scout at Moby Dick as a monstrous fable , or still worse and more de
    th of Radney .'" CHAPTER 55 Of the monstrous Pictures of Whales . I shall ere l
    ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
    ere to enter upon those still more monstrous stories of them which are to be fo
    ght have been rummaged out of this monstrous cabinet there is no telling . But
    of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
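
To make the idea of a concordance concrete, here is a minimal pure-Python sketch. It is not NLTK's implementation: the function name `concordance`, the `width` parameter (counted in tokens, not characters) and the bracket formatting are assumptions made for illustration only.

```python
def concordance(tokens, word, width=5):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    target = word.lower()  # this sketch matches case-insensitively
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# A tiny hand-made token list, standing in for a real corpus.
sample = "one was of a most monstrous size and that monstrous bulk of the whale".split()
for line in concordance(sample, "monstrous", width=3):
    print(line)
```

Each printed line shows one match in brackets with its neighbouring tokens, which is the same "keyword in context" view that the output above presents.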

2.text.similar(word)
This function finds words that appear in contexts similar to those of word: similar() searches the text for other words used in the same kind of surrounding structure. It appears to rely only on simple surface cues, such as which words occur immediately before and after and how often those contexts match exactly; it does not take meaning into account. See the example below:

```python
>>> text1.similar("monstrous")
mean part maddens doleful gamesome subtly uncommon careful untoward
exasperate loving passing mouldy christian few true mystifying
imperial modifies contemptible
>>> text2.similar("monstrous")
very heartily so exceedingly remarkably as vast a great amazingly
extremely good sweet
```

This shows that text1 and text2 use the same word, monstrous, in quite different ways.
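
The context-based notion of similarity can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not NLTK's actual scoring: here a "context" is taken to be the pair (previous word, next word), and candidates are ranked by how many such pairs they share with the query word. The helper names `build_contexts` and `similar` are hypothetical.

```python
from collections import Counter, defaultdict


def build_contexts(tokens):
    """Map each word to the set of (previous word, next word) pairs around it."""
    contexts = defaultdict(set)
    lowered = [t.lower() for t in tokens]
    for i in range(1, len(lowered) - 1):
        contexts[lowered[i]].add((lowered[i - 1], lowered[i + 1]))
    return contexts


def similar(tokens, word, top_n=5):
    """Rank other words by the number of contexts they share with `word`."""
    contexts = build_contexts(tokens)
    target = contexts.get(word.lower(), set())
    scores = Counter()
    for other, ctx in contexts.items():
        if other != word.lower():
            shared = len(ctx & target)
            if shared:
                scores[other] = shared
    return [w for w, _ in scores.most_common(top_n)]


# Hand-made sample in which "monstrous" and "very" fill the same slots.
sample = ("a monstrous pretty girl and a very pretty house and "
          "a very lucky man and a monstrous lucky day").split()
print(similar(sample, "monstrous"))
```

In this sample, "very" is the only word that occurs between the same neighbours as "monstrous", so it is the only candidate returned. No meaning is involved at any point, only matching neighbours.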
3.text.common_contexts([word1,word2…])
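This function is related to similar(): it also searches by context.
The difference is that it looks for the contexts shared by all of the words in the list passed as the argument, i.e. the contexts that word1 and word2 have in common. For example: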
    >>> text2.common_contexts(["monstrous", "very"])
    a_pretty is_pretty am_glad be_glad a_lucky
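
Each item in that output has the form left_right, naming the words on either side of the slot that both query words can fill. A rough pure-Python sketch of the idea follows; it is an illustration, not NLTK's code, and it assumes that a context is simply the (previous word, next word) pair. The function name mirrors the method only for readability.

```python
def common_contexts(tokens, words):
    """Return the 'left_right' contexts in which every word in `words` occurs."""
    lowered = [t.lower() for t in tokens]
    shared = None
    for word in words:
        target = word.lower()
        found = {
            (lowered[i - 1], lowered[i + 1])
            for i in range(1, len(lowered) - 1)
            if lowered[i] == target
        }
        # Keep only the contexts seen for every word so far.
        shared = found if shared is None else shared & found
    return sorted(f"{left}_{right}" for left, right in (shared or set()))


# Hand-made sample, standing in for a real corpus.
sample = ("a monstrous pretty girl and a very pretty house and "
          "a very lucky man and a monstrous lucky day").split()
print(common_contexts(sample, ["monstrous", "very"]))
```

The result is the set intersection of the two words' contexts, which is why adding more words to the list can only shrink the output.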

4.text.dispersion_plot([word1, word2,])
This function draws a dispersion plot that shows where each word occurs across the corpus.

    >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

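In the plot, the horizontal axis is the word position within the text and the vertical axis lists the queried words. Each mark shows a position at which that word occurs, so the plot shows how each word is distributed across the text.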
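
The data behind such a plot is just a list of token positions per word. The sketch below computes those positions and renders them as rows of text instead of a real chart; the helper names `word_offsets` and `ascii_dispersion` are hypothetical, matching is case-sensitive by assumption, and no plotting library is used.

```python
def word_offsets(tokens, words):
    """Return, for each queried word, the token positions at which it occurs."""
    offsets = {word: [] for word in words}
    for position, token in enumerate(tokens):
        if token in offsets:  # exact, case-sensitive match in this sketch
            offsets[token].append(position)
    return offsets


def ascii_dispersion(tokens, words):
    """Build one row per word, with '|' at every position where it occurs."""
    offsets = word_offsets(tokens, words)
    label_width = max(len(w) for w in words)
    rows = []
    for word in words:
        marks = ["|" if i in offsets[word] else "." for i in range(len(tokens))]
        rows.append(f"{word:>{label_width}} {''.join(marks)}")
    return rows


# Hand-made sample, standing in for a real corpus.
sample = "freedom and duties of citizens mean freedom for citizens".split()
queries = ["citizens", "freedom", "duties"]
for row in ascii_dispersion(sample, queries):
    print(row)
```

Reading a row from left to right corresponds to reading the text from start to end, which is how the horizontal axis of the real plot should be interpreted.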
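5.text.generate()
This generates random text in the various styles seen above. Although the text is random, it reuses words and phrases that are common in the source text, which gives a feel for the source's style and content.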
    >>> text3.generate()
    In the beginning of his brother is a hairy man , whose top may reach
    unto heaven ; and ye shall sow the land of Egypt there was no bread in
    all that he was taken out of the month , upon the earth . So shall thy
    wages be ? And they made their father ; and Isaac was old , and kissed
    him : and Laban with his cattle in the midst of the hands of Esau thy
    first born , and Phichol the chief butler unto his son Isaac , she
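
One simple way to get text that is random yet reuses the source's phrasing is a random walk over word pairs (a bigram model). The sketch below uses that technique purely as an illustration; it is not claimed to be how generate() is implemented in NLTK, and the names `build_bigram_model`, `generate`, `length` and `seed` are assumptions.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each word to the list of words that follow it in the source."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(tokens, length=20, seed=0):
    """Random-walk the bigram model, starting from the first source token."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    followers = build_bigram_model(tokens)
    word = tokens[0]
    output = [word]
    while len(output) < length:
        choices = followers.get(word)
        if not choices:  # dead end: this word is never followed by anything
            break
        word = rng.choice(choices)
        output.append(word)
    return output


# Hand-made sample, standing in for a real corpus.
sample = ("in the beginning god created the heaven and the earth "
          "and the earth was without form").split()
print(" ".join(generate(sample, length=12)))
```

Every adjacent pair of words in the output also occurs in the source, which is why the result sounds like the source even when the sentence as a whole is nonsense.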
