Java word frequency code: counting word frequency in Java

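Compute the word frequencies of an English document in Java and print them from most to least frequent (please build on the code below). Thanks!
String result = sb.toString();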
String[] Str = result.split("[^A-Za-z0-9]"); // all tokens, including empty strings
// singleSet is assumed to be a Set<String> declared earlier, e.g. new HashSet<String>()
for (String string : Str) {
    if (!"".equals(string)) { // I added this so that empty strings are not counted
        singleSet.add(string);
    }
}
Map<String, Integer> map = new HashMap<String, Integer>();
for (String childString : singleSet) {
    int count = 0;
    for (String fatherString : Str) {
        if (fatherString.equals(childString)) {
            count++;
        }
    }
    map.put(childString, count); // store the count in the HashMap
}
List<Map.Entry<String, Integer>> l = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
Collections.sort(l, new Comparator<Map.Entry<String, Integer>>() {
    public int compare(Map.Entry<String, Integer> e1, Map.Entry<String, Integer> e2) {
        // descending by count; swap e1 and e2 for ascending order
        return e2.getValue().compareTo(e1.getValue());
    }
});
for (Map.Entry<String, Integer> e : l) {
    System.out.println(e.getKey() + " " + e.getValue());
}
The code is for reference only. I hope it helps!
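The nested loops above rescan the whole token array once per distinct word. A common alternative is a single pass that updates a map as it goes. The sketch below is only one way to do that: the class name `WordFrequency`, the method `count`, and the alphabetical tie-break are assumptions made for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequency {
    // Counts tokens in one pass and returns the entries sorted by count, highest first.
    static List<Map.Entry<String, Integer>> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // "+" collapses runs of separators, so fewer empty tokens are produced
        for (String token : text.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            // assumption: ties are broken alphabetically so the output order is stable
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : count("the cat and the hat and the bat")) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Like the code in the question, this is case-sensitive; lower-casing the text first would merge "The" and "the".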
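Java word frequency counting
In Java, a File can represent either a file or a directory (the "folder" you mentioned). So you can pass the folder's path straight to new File(path), call listFiles() to get an array of the files in that folder, and then open an input stream on each File in turn. You can write it like this: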
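public void directory() {
    // the backslash must be escaped; "E:\temp" would contain a tab character
    File dir = new File("E:\\temp");
    // listFiles() returns null if the path is not a readable directory
    File[] files = dir.listFiles();
}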
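To make the "open each File in turn" step concrete, here is a minimal sketch that reads every regular file in a directory into one string, which could then be handed to the counting code. The class name `DirectoryReader`, the method `readAll`, and the choice of UTF-8 are assumptions for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class DirectoryReader {
    // Concatenates the text of every regular file directly inside dirPath.
    static String readAll(String dirPath) throws IOException {
        File dir = new File(dirPath);
        File[] files = dir.listFiles(); // null if dirPath is not a readable directory
        StringBuilder sb = new StringBuilder();
        if (files == null) {
            return sb.toString();
        }
        for (File f : files) {
            if (f.isFile()) { // skip subdirectories
                byte[] bytes = Files.readAllBytes(f.toPath());
                // assumption: the files are UTF-8 encoded text
                sb.append(new String(bytes, StandardCharsets.UTF_8)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "E:\\temp" is the path used in the answer above; pass another directory as the first argument
        String path = args.length > 0 ? args[0] : "E:\\temp";
        System.out.println(readAll(path));
    }
}
```

The order in which listFiles() returns entries is not guaranteed, which does not matter for counting words but would matter if the text had to stay in file order.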
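Design a class in Java that counts the word frequencies of an English article and prints them from highest to lowest frequency. Just modify the code below. This task would be much more efficient if one more class could be added. If everything has to stay inside this framework, the code becomes cumbersome and inefficient.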
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
public class Article {
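// the content of the article
String content;
// the words after splitting
String[] rawWords;
// the distinct words after counting
String[] words;
// the frequency of each word in words
int[] wordFreqs;
// Constructor: sets the article content
// Possible improvement: read the content from a file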
public Article() {
content = "kolya is one of the richest films i've seen in some time . zdenek sverak plays a confirmed old bachelor ( who's likely to remain so ) , who finds his life as a czech cellist increasingly impacted by the five-year old boy that he's taking care of . though it ends rather abruptly-- and i'm whining , 'cause i wanted to spend more time with these characters-- the acting , writing , and production values are as high as , if not higher than , comparable american dramas . this father-and-son delight-- sverak also wrote the script , while his son , jan , directed-- won a golden globe for best foreign language film and , a couple days after i saw it , walked away an oscar . in czech and russian , with english subtitles . ";
}
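// Split the article on separators and store the result in the rawWords array
public void splitWord(){
// Punctuation is not part of any word, so replace every punctuation mark with a space before splitting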
final char SPACE = ' ';
content = content.replace('\'', SPACE).replace(',', SPACE).replace('.', SPACE);
content = content.replace('(', SPACE).replace(')', SPACE).replace('-', SPACE);
rawWords = content.split("\\s+"); // anything separated by whitespace counts as a word; the apostrophe was replaced above, so "i've" is split into two words
}
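// Count the words by iterating over the array
public void countWordFreq() {
// Put every string that occurs into a set of unique values; a map is not used because map lookups are considered too slow here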
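The set-based counting step described in the comment above could be sketched as follows. This is not the author's code: the class name `WordFreqSketch`, the use of `Arrays.binarySearch`, and the `printByFrequency` method are assumptions chosen to match the `words` and `wordFreqs` fields and the `TreeSet` import shown earlier.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class WordFreqSketch {
    String[] words;  // distinct words in sorted order
    int[] wordFreqs; // wordFreqs[i] is the count for words[i]

    void countWordFreq(String[] rawWords) {
        // A TreeSet keeps one copy of each word, in natural (sorted) order
        Set<String> unique = new TreeSet<>();
        for (String w : rawWords) {
            if (!w.isEmpty()) {
                unique.add(w);
            }
        }
        words = unique.toArray(new String[0]);
        wordFreqs = new int[words.length];
        // words is sorted, so each raw word's slot can be found by binary search
        for (String w : rawWords) {
            int i = Arrays.binarySearch(words, w);
            if (i >= 0) {
                wordFreqs[i]++;
            }
        }
    }

    // Prints "word count" lines from highest to lowest frequency.
    void printByFrequency() {
        Integer[] order = new Integer[words.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // sort the indexes rather than the arrays, so words and wordFreqs stay aligned
        Arrays.sort(order, (a, b) -> Integer.compare(wordFreqs[b], wordFreqs[a]));
        for (int i : order) {
            System.out.println(words[i] + " " + wordFreqs[i]);
        }
    }

    public static void main(String[] args) {
        WordFreqSketch s = new WordFreqSketch();
        s.countWordFreq("b a b c b a".split("\\s+"));
        s.printByFrequency();
    }
}
```

Each lookup here costs a binary search, whereas a HashMap lookup is expected constant time, so a map with merge() would usually be both shorter and at least as fast as the parallel arrays.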
