COMP SCI 3306, COMP SCI 7306 Mining Big Data Semester 1, 2019

Assignment 2: Similar Items, Data Streams, PageRank

Formative, Weight (15%), Learning objectives (1, 2, 3), Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)

Due date: 11:59pm, 27 April, 2019

1 Overview

Assignments should be done in groups consisting of two students. If you have problems finding a group partner, use the forum to search for group partners or contact the lecturer.

2 Assignment

Exercise 1 S-curve (exercise 3.4.1 in Leskovec, Rajaraman and Ullman) (5+5+5 points)

Evaluate the S-curve 1 - (1 - s^r)^b for s = 0.1, 0.2, ..., 0.9, for the following values of r and b:

1. r = 3 and b = 10.
2. r = 6 and b = 20.
3. r = 5 and b = 50.

Exercise 2 Filtering Streams (similar to Exercises of 4.3 in Leskovec, Rajaraman and Ullman) (8 + 8 points)

1. For the situation of our running example of Section 4.3.1 with changed conditions (10 billion bits, 2 billion members of the set S), calculate the false-positive rate when using three hash functions. Do the same for four hash functions.
2. As a function of n, the number of bits, and m, the number of members in the set S, what number of hash functions minimizes the false-positive rate?

Exercise 3 PageRank (22+13 points)

1. Implement the PageRank algorithm as discussed in Sections 5.1 and 5.2 (Leskovec, Rajaraman and Ullman) in JAVA, Python or C++. Your implementation should make use of the improvements regarding efficiency and the methods of dealing with dead ends and spider traps. There are several PageRank implementations available on the web. You have to do your own implementation without using any code from other sources.
2. Run your algorithm on the Google Web Graph 2002 available at http://snap.stanford.edu/data/web-Google.html and provide a file listing the PageRank for each node. Report separately the ordered list of the ten nodes having the largest PageRank.

Your approach should be as efficient as possible in terms of runtime and memory requirements.

Exercise 4 Data streams (7 + 7 points)

Follow scenarios 1 and 2 below and answer the related questions regarding the Flajolet-Martin algorithm. The hash functions are of the form h(x) = (ax + b) mod 32 for some a and b. You should treat the result as a 5-bit binary integer.

1. Scenario 1: Suppose a data stream consists of the integers 3, 1, 4, 6, 5, 9. Determine (a) the maximum tail length for each stream element and (b) the resulting estimate of the number of distinct elements for the hash functions in Questions 1-3 below.
   – Question 1: Hash function: h(x) = (2x + 1) mod 32
   – Question 2: Hash function: h(x) = (3x + 7) mod 32
   – Question 3: Hash function: h(x) = 4x mod 32
2. Scenario 2: Suppose a data stream consists of the integers 4, 5, 6, 7, 10, 15. Determine (a) the maximum tail length for each stream element and (b) the resulting estimate of the number of distinct elements for the hash functions in Questions 4-6 below.
   – Question 4: Hash function: h(x) = (6x + 2) mod 32
   – Question 5: Hash function: h(x) = (2x + 5) mod 32
   – Question 6: Hash function: h(x) = 2x mod 32

Exercise 5 Summary of 3.6 and 3.7 (10 + 10 points) (Postgraduate Students (COMP SCI 7306) only)

For this exercise you have to read Sections 3.6 and 3.7 in Leskovec, Rajaraman, Ullman (second edition, 2014).

1. Summarize the content of 3.6 in your own words (600 words).
2. Summarize the content of 3.7 in your own words (600 words).

3 Procedure for handing in the assignment

Work should be handed in using Canvas. The submission should include:

– pdf file of your solutions for theoretical assignments. The solutions should contain a detailed description of how to obtain the result.
– all source files, all the project files.
– pdf or txt file with descriptions of your implementations to understand your code.
– files containing the results of your algorithms on the benchmark sets.
– pdf or txt file of your computation times of the algorithms on benchmark sets.
– for Exercise 4: input and output files, pdf or txt file with the calculations used to obtain your answers.
– a README.txt file containing instructions to run the code, the names, student numbers, and email addresses of the group members.
– the names, student numbers, and email addresses of the group members.
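As an illustration for Exercise 1, the S-curve 1 - (1 - s^r)^b can be tabulated in a few lines of Python. This is a minimal sketch: the names `s_curve` and `tabulate` are illustrative, and r = 5, b = 20 are example values only, not the settings the exercise asks for.

```python
def s_curve(s: float, r: int, b: int) -> float:
    """Probability that two items with similarity s become a candidate
    pair under banding with b bands of r rows each: 1 - (1 - s^r)^b."""
    return 1.0 - (1.0 - s ** r) ** b


def tabulate(r: int, b: int) -> list[tuple[float, float]]:
    """Evaluate the S-curve at s = 0.1, 0.2, ..., 0.9."""
    return [(k / 10, s_curve(k / 10, r, b)) for k in range(1, 10)]


if __name__ == "__main__":
    # Example values only (assumed for illustration).
    for s, p in tabulate(r=5, b=20):
        print(f"s = {s:.1f}  ->  {p:.4f}")
```

Building the grid from integers (`k / 10`) avoids the drift that repeated addition of 0.1 would introduce.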
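For Exercise 2, the false-positive rate of a Bloom filter with n bits, m set members and k hash functions is commonly approximated by (1 - e^(-km/n))^k. The sketch below assumes that approximation; the function names are illustrative, and the demo uses 8 billion bits and 1 billion members as example inputs rather than the changed conditions stated in the exercise. The search over k is a plain numeric scan, shown only as one way to explore the second question.

```python
import math


def false_positive_rate(n_bits: float, m_members: float, k_hashes: int) -> float:
    """Approximate Bloom-filter false-positive rate: (1 - e^(-k*m/n))^k."""
    return (1.0 - math.exp(-k_hashes * m_members / n_bits)) ** k_hashes


def best_integer_k(n_bits: float, m_members: float, max_k: int = 50) -> int:
    """Scan k = 1..max_k and return the k with the lowest approximate rate."""
    return min(range(1, max_k + 1),
               key=lambda k: false_positive_rate(n_bits, m_members, k))


if __name__ == "__main__":
    n, m = 8e9, 1e9  # example values, assumed for illustration
    for k in (1, 2):
        print(k, round(false_positive_rate(n, m, k), 4))
    print("best integer k:", best_integer_k(n, m))
```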
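The update rule behind Exercise 3 can be illustrated on a toy graph. The sketch below is one possible formulation and is not meant as a reference solution: it applies taxation with a parameter beta and re-inserts all rank lost to taxation and dead ends uniformly across nodes, which is an assumed choice among the dead-end treatments available. The function name, the edge-list representation, and the default beta = 0.8 are all illustrative, and the input is assumed to contain no duplicate edges.

```python
def pagerank(edges, num_nodes, beta=0.8, tol=1e-10, max_iter=100):
    """Power iteration over an edge list of (source, destination) pairs.

    Nodes are assumed to be numbered 0..num_nodes-1.
    """
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        new_rank = [0.0] * num_nodes
        # Each node passes a beta-fraction of its rank to its successors.
        for src, dst in edges:
            new_rank[dst] += beta * rank[src] / out_degree[src]
        # Rank lost to taxation and to dead ends is spread uniformly,
        # so the vector keeps summing to 1.
        leaked = 1.0 - sum(new_rank)
        new_rank = [r + leaked / num_nodes for r in new_rank]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy graph: node 1 is a dead end.
    print(pagerank([(0, 1)], num_nodes=2))
```

Iterating over the edge list rather than a dense matrix keeps memory proportional to the number of edges, which matters for a graph the size of web-Google.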
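For Exercise 4, the tail length of a hash value is the number of trailing zeros in its binary representation, and the Flajolet-Martin estimate is 2^R, where R is the largest tail length seen. The following sketch uses a stream and a hash function, h(x) = (3x + 1) mod 32, that are made up for illustration and are not among the exercise's questions. Treating a hash value of 0 as having tail length 5 (all five bits zero) is an assumption; other conventions exist.

```python
def tail_length(value: int, bits: int = 5) -> int:
    """Number of trailing zero bits in value, viewed as a bits-wide integer."""
    if value == 0:
        return bits  # assumption: an all-zero value counts as `bits` zeros
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count


def fm_estimate(stream, a: int, b: int, modulus: int = 32, bits: int = 5) -> int:
    """Flajolet-Martin estimate 2^R for the hash h(x) = (a*x + b) mod modulus."""
    max_tail = max(tail_length((a * x + b) % modulus, bits) for x in stream)
    return 2 ** max_tail


if __name__ == "__main__":
    stream = [2, 7, 5, 10]  # illustrative stream, not from the exercise
    for x in stream:
        h = (3 * x + 1) % 32
        print(x, format(h, "05b"), tail_length(h))
    print("estimate:", fm_estimate(stream, a=3, b=1))
```

In this toy run the single hash function overestimates the four distinct elements, which shows why estimates from one hash function alone are noisy.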
