FP-Growth算法的Java实现+具体实现思路+代码 FP-Growth算法的Java实现+具体实现

FP-Growth算法原理其他大佬的讲解
FP-Growth算法详解
FP-Growth算法的Java实现这篇文章重点讲一下实现。如果看了上述给的讲解，可知，需要两次扫描来构建FP树
第一次扫描

第一次扫描，过滤掉所有不满足最小支持度的项；对于满足最小支持度的项，按照全局支持度降序排序。

按照这个需求，可能的难点为如何按照全局支持度对每个事务中的item排序。我的实现思路

扫描原数据集将其保存在二维列表sourceData中
维护一个Table，使其保存每个item的全局支持度TotalSup
在Table中过滤掉低于阈值minSup的项
将Table转换为List，并使其按照TotalSup降序排序
新建一个二维列表freqSource，其保留sourceData中的频繁项，并将每个事务按全局支持度降序排序

代码

/*** 扫描原数据集,生成事务集* @param path 数据集路径* @throws IOException*/private void scanDataSet(String path) throws IOException {if(path.equals("")){path = filePath; }FileReader fr = new FileReader(path); BufferedReader bufferedReader = new BufferedReader(fr); String str; //int maxLength = 0; while ( (str = bufferedReader.readLine())!=null){ArrayList transaction = new ArrayList<>(); String[] tempEntry ; tempEntry = str.split(" "); for(int i =0; i< tempEntry.length; i++){if(!tempEntry[i].equals("")){int itemValue = https://www.it610.com/article/Integer.parseInt(tempEntry[i]); transaction.add(itemValue); if(!similarSingleItemLinkedListHeadsTable.containsKey(itemValue)){similarSingleItemLinkedListHeadsTable.put(itemValue, new SimilarSingleItemLinkedListHead(itemValue,null,1)); }else{//将该项的全局支持度+1similarSingleItemLinkedListHeadsTable.get(itemValue).addSupTotal(); }}}//if(tempEntry.length>maxLength){//maxLength = tempEntry.length; //}sourceDataSet.add(transaction); }//System.out.println(maxLength); deleteNonFreqInSSILLHTAndSort(); deleteNonFreqInSDSAndSort(); bufferedReader.close(); fr.close(); }/*** 去除相似项表(similarSingleItemLinkedListHeadsTable)的非频繁项,并按全局支持度对similarSingleItemLinkedListHeads降序排序*/private void deleteNonFreqInSSILLHTAndSort() {Hashtable copyOfSSILLHT =(Hashtable) similarSingleItemLinkedListHeadsTable.clone(); Set keySet = copyOfSSILLHT.keySet(); //删除非频繁项for(int key: keySet){if(similarSingleItemLinkedListHeadsTable.get(key).getSupTotal()(similarSingleItemLinkedListHeadsTable.values()); similarSingleItemLinkedListHeadList.sort(new Comparator() {@Overridepublic int compare(SimilarSingleItemLinkedListHead o1, SimilarSingleItemLinkedListHead o2) {return o2.getSupTotal() - o1.getSupTotal(); }}); }/*** 去除事务集(sourceDataSet)的非频繁项,并且按全局支持度对每个事务的item进行降序排序* 其结果保存在freqSourceSortedDataSet*/private void deleteNonFreqInSDSAndSort(){freqSourceSortedDataSet = (ArrayList>) sourceDataSet.clone(); for(int i =0; ie == Integer.MIN_VALUE); insertSort(freqSourceSortedDataSet.get(i)); }freqSourceSortedDataSet.removeIf(e->e.size() == 0); }

第二次扫描

第二次扫描，构造FP树。参与扫描的是过滤后的数据，如果某个数据项是第一次遇到，则创建该节点，并在headTable中添加一个指向该节点的指针；否则按路径找到该项对应的节点，修改节点信息

这里比较简单，因为已经有过滤、排序好的数据freqSourceSortedDataSet。我们只需要

遍历freqSourceSortedDataSet的每一个事务trans，遍历trans中的每一个item构建FP树和相似项链表
如果某item第一次遇到，则需要创建该节点并在相似项链表中链接它。
链表不用多说。
这里的FP树的子节点是不定个数的，需要用特殊的数据结构。我这里使用了HashTable

/*** 构建FP树*/private void buildFPTree(){for(ArrayListtrans:freqSourceSortedDataSet){Node curTreeNode = fpTree.root; for(int item :trans){if(!curTreeNode.children.containsKey(item)){Node node = new Node(item,1); curTreeNode.children.put(item,node); node.father = curTreeNode; buildSimilarSingleItemLinkedList(item,curTreeNode); }else{curTreeNode.children.get(item).sup++; }curTreeNode=curTreeNode.children.get(item); }}}/*** 构建相似项链表*/private void buildSimilarSingleItemLinkedList(int item,Node curTreeNode){//找到该item在相似项链表中的位置int index = searchForItemInHeadsList(item,(ArrayList) similarSingleItemLinkedListHeadList); if(similarSingleItemLinkedListHeadList.get(index).next == null){similarSingleItemLinkedListHeadList.get(index).next = curTreeNode.children.get(item); }else{Node visitNode = similarSingleItemLinkedListHeadList.get(index).next; while (visitNode.nextSimilar!=null){visitNode = visitNode.nextSimilar; }if(visitNode != curTreeNode.children.get(item))visitNode.nextSimilar = curTreeNode.children.get(item); }}/*** 在HeadList中搜索某项的位置* @param item 项* @param similarSingleItemLinkedListHeads 头结点链表* @return 位置,-1表示未找到*/private int searchForItemInHeadsList(int item, ArrayList similarSingleItemLinkedListHeads) {for(int i =0; i
挖掘频繁项集
这一部分个人觉得是实现上最困难的部分。但是我在B站或其他地方一涉及到这个地方都讲得很快（B站也没两个视频讲这玩意儿，吐）。还有不同的概念，比如在黑皮书上讲的是前缀路径，在其他地方有条件模式基等概念。接下来的代码均按照前缀路径的说法来实现。
我们来捋一捋思路，挖掘频繁项集需要干什么。

首先需要从后向前遍历相似项链表的列表（这一列表已经在第一次扫描中按全局支持度排过序了）的每一项。

对每一项递归地进行如下步骤：

①记录前缀路径。我使用的方法是用一个HashSet记录前缀路径中出现的所有节点。
②记录该FP树的每一item的支持度。类似于前面的第一次扫描。
③根据记录的支持度，如果item频繁，则该item和当前的后缀为频繁项集。
④再根据record构建该FP树的相似项链表列表，去除掉非频繁项（类似第一次扫描）和当前item构成条件FP树。这里并不需要重新建立一个FP树的结构来构成条件FP树，因为记录前缀路径只需要访问相似项和父项。
⑤对相似项链表列表的剩余项再进行①步骤，直到相似项链表列表中没有项，为终止。

/*** 算法执行函数* @param minSupCnt 最小支持度计数* @param path 文件路径* @param pT 输出结果的项集大小阈值*/public void run(int minSupCnt,String path,int pT) throws IOException {this.printThreshold = pT; this.minSupCnt = minSupCnt; scanDataSet(path); buildFPTree(); for(int i = similarSingleItemLinkedListHeadList.size()-1; i>=0; i--){genFreqItemSet(similarSingleItemLinkedListHeadList.get(i).getItemValue(),fpTree,similarSingleItemLinkedListHeadList,new TreeSet<>()); }//genFreqItemSet(14,fpTree,similarSingleItemLinkedListHeadList,new TreeSet<>()); System.out.println("频繁项集个数:\t"+cntOfFreqSet); }/*** 生成频繁项集* @param last 最后项* @param fPTree 条件FP树* @param fatherSimilarSingleItemLinkedListHeads 父树的相似项头结点链表* @param freqItemSet 频繁项集*/private void genFreqItemSet(int last,FPTree fPTree,ListfatherSimilarSingleItemLinkedListHeads,TreeSetfreqItemSet) {FPTree conditionalFPTree = new FPTree(); ListsimilarSingleItemLinkedListHeads = new ArrayList<>(); TreeSetlocalFreqItemSet = (TreeSet) freqItemSet.clone(); int index ; index = searchForItemInHeadsList(last,(ArrayList) fatherSimilarSingleItemLinkedListHeads); Node firstNode = fatherSimilarSingleItemLinkedListHeads.get(index).next; HashSetrecord = new HashSet<>(); //用于记录前缀路径上出现的节点//记录前缀路径if(firstNode!=null){record.add(firstNode); Node nodeToVisitFather = firstNode; Node nodeToVisitSimilar = firstNode; while (nodeToVisitSimilar!=null){nodeToVisitSimilar.supInCFP = nodeToVisitSimilar.sup; nodeToVisitFather = nodeToVisitSimilar; while (nodeToVisitFather!=null){// 计算supInCFTif(nodeToVisitFather!=nodeToVisitSimilar)nodeToVisitFather.supInCFP += nodeToVisitSimilar.supInCFP; record.add(nodeToVisitFather); nodeToVisitFather = nodeToVisitFather.father; }nodeToVisitSimilar = nodeToVisitSimilar.nextSimilar; }//记录在子树中的支持度Hashtable supRecord = new Hashtable<>(); record.forEach(new Consumer() {@Overridepublic void accept(Node node) {int item = node.item; if(item == -1 ){//根节点return; }if(supRecord.containsKey(item)){supRecord.put(item,supRecord.get(item)+ node.supInCFP); }else{supRecord.put(item,node.supInCFP); }}}); //输出结果if(supRecord.get(last)>=minSupCnt){localFreqItemSet.add(last); if(localFreqItemSet.size()>=printThreshold && !result.contains(localFreqItemSet)){cntOfFreqSet++; //for(int i = localFreqItemSet.size()-1; i>=0; i--){//System.out.print(localFreqItemSet.get(i)+" "); //}localFreqItemSet.forEach(new Consumer() {@Overridepublic void accept(Integer integer) {System.out.print(integer+" "); }}); result.add(localFreqItemSet); System.out.println(""); }}//构建相似项链表record.forEach(new Consumer() {@Overridepublic void accept(Node node) {if(node.item == -1){//根节点Node visitNode = node; buildConditionalFPTree(conditionalFPTree.root, visitNode,record,(ArrayList) similarSingleItemLinkedListHeads,supRecord,last); }}}); //按支持度降序排序similarSingleItemLinkedListHeads.sort(new Comparator() {@Overridepublic int compare(SimilarSingleItemLinkedListHead o1, SimilarSingleItemLinkedListHead o2) {return o2.getSupTotal() - o1.getSupTotal(); }}); if(similarSingleItemLinkedListHeads.size()>=1){//递归搜索频繁项for(int i =similarSingleItemLinkedListHeads.size()-1; i>=0; i--){genFreqItemSet(similarSingleItemLinkedListHeads.get(i).getItemValue(),conditionalFPTree,similarSingleItemLinkedListHeads,localFreqItemSet); // similarSingleItemLinkedListHeads.remove(i); }}}}/*** 递归构建条件FP树* @param rootNode 以该节点为根向下建立条件FP树* @param originalNoderootNode对应在原树中的节点* @param record前缀路径* @param similarSingleItemLinkedListHeads相似项表头链表* @param supRecord 支持度计数的记录* @param last 最后项*/private void buildConditionalFPTree(Node rootNode,Node originalNode,HashSetrecord,ArrayListsimilarSingleItemLinkedListHeads,HashtablesupRecord,int last){if(originalNode.children!=null){for(int key:originalNode.children.keySet()){//遍历originalNode的所有儿子节点,检查其是否在前缀路径中Node tempNode = originalNode.children.get(key); if(record.contains(tempNode)){Node addedNode = new Node(tempNode.item, tempNode.supInCFP); if(last == key){//去除last的所有节点tempNode.supInCFP = 0; continue; }if(supRecord.get(key)>=minSupCnt){//addedNode 拷贝 tempNode除儿子节点外的属性addedNode.supInCFP = tempNode.supInCFP; rootNode.children.put(tempNode.item, addedNode); addedNode.father = rootNode; //构建相似项表int i = searchForItemInHeadsList(tempNode.item,similarSingleItemLinkedListHeads); if(i==-1){similarSingleItemLinkedListHeads.add(new SimilarSingleItemLinkedListHead(key,addedNode, addedNode.supInCFP)); }else{similarSingleItemLinkedListHeads.get(i).setSupTotal(similarSingleItemLinkedListHeads.get(i).getSupTotal()+addedNode.supInCFP); Node visitNode = similarSingleItemLinkedListHeads.get(i).next; while (visitNode.nextSimilar!=null){visitNode = visitNode.nextSimilar; }if(visitNode!=addedNode){visitNode.nextSimilar= addedNode; }}buildConditionalFPTree(addedNode,originalNode.children.get(key),record,similarSingleItemLinkedListHeads,supRecord,last); addedNode.supInCFP = 0; //将supInCFP重置为0; }else{buildConditionalFPTree(rootNode,originalNode.children.get(key),record,similarSingleItemLinkedListHeads,supRecord,last); }}}}}

总结 【FP-Growth算法的Java实现+具体实现思路+代码】这篇文章就到这里，希望能给你带来帮助，也希望你可以多多关注脚本之家的其他精彩内容！

推荐阅读

北京摇号社保补缴算连续吗还是累计北京摇号社保补缴算连续吗?

惠普m30a打印机芯片清零，让你的打印变得更高效

蓖麻毒素|神秘邮件，引发紧张，蓖麻毒素为何可怕？

什么对人生很有用的名人名言吗

捷达vs5车身颜色有哪些捷达vs5图片外观颜色

宝骏530汽车钥匙怎么换电池宝骏530钥匙怎么换电池

做完抽脂手术会有什么样的危害

全民领皮肤真的能领到皮肤吗，为什么那么多人发免费领皮肤的到底是不是真的哇

pdf无损转图片，pdf转图片无水印免费

《黑暗之魂3》锐利宝石获得方法解析攻略

QQ浏览器怎么设置主页？浏览器主页设置方法分享

乳胶凉席热吗

想做个一天能挣个3到5百的小生意,大家有没有什么好的推荐？

两个浮点数或double数的模数

马桶买什么牌子好希箭马桶好不好

大宇壁挂炉价格是多少呢？大宇壁挂炉显示e8

云顶之弈狂野换形龙法7人口成型技巧狂野换形龙法怎么玩

服装|大雪寒潮下的年轻人，谁还没条“秋裤”

虎年新年快乐的祝福语30句

10万左右外观最帅车 10万左右的轿车哪款好

热闹中的孤独

JAVA（抽象类与接口的区别&重载与重写&内存泄漏）

放屁有这三个特征的，请注意啦！这说明你的身体毒素太多

一个人的旅行，三亚

布丽吉特，人生绝对的赢家

慢慢的美丽

尽力

一个小故事，我的思考。

家乡的那条小河

《真与假的困惑》???|《真与假的困惑》??? ——致良知是一种伟大的力量