The difference between iterating with stream and parallelStream in Java 8

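Java 8 added a new abstraction to its API, the Stream, which lets you process data declaratively. A Stream offers a higher-level way to express operations on Java collections, much as an SQL statement expresses a query against a database. Used well, the Stream API can greatly improve a Java programmer's productivity and leads to efficient, clean, concise code. In this style, the collection of elements to be processed is treated as a stream that flows through a pipeline and can be processed at each stage of that pipeline: filtered, sorted, aggregated, and so on. The elements pass through intermediate operations, and a terminal operation finally produces the result of the preceding processing.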
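To make the pipeline idea concrete, here is a minimal sketch. The class name `PipelineSketch`, the method `evenSquaresSorted`, and the sample numbers are made up for illustration; only the `java.util.stream` calls are standard API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the original article's code.
public class PipelineSketch {

    // Keeps the even numbers, squares them, and returns them in ascending order.
    static List<Integer> evenSquaresSorted(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)       // intermediate operation: filter
                .map(n -> n * n)               // intermediate operation: transform
                .sorted()                      // intermediate operation: sort
                .collect(Collectors.toList()); // terminal operation: produce the result
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 2, 8, 1, 4);
        System.out.println(evenSquaresSorted(numbers)); // prints [4, 16, 64]
    }
}
```

Nothing is computed until the terminal operation `collect` is called; the intermediate operations only describe the pipeline.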
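A look at the API shows that Java 8 collections provide two different methods for obtaining a stream, stream() and parallelStream(). Both give us a stream to process, so how do they differ? Let's start with the following code: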

import java.util.Arrays;
import java.util.List;

// Class name is arbitrary.
public class StreamOrderDemo {
    public static void main(String[] args) {
        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println("Output:");

        // stream method
        numberList.stream().forEach(number -> System.out.print(String.format("%d ", number)));
        System.out.println();

        // parallelStream method with forEach
        numberList.parallelStream().forEach(number -> System.out.print(String.format("%d ", number)));
        System.out.println();

        // parallelStream method with forEachOrdered
        numberList.parallelStream().forEachOrdered(number -> System.out.print(String.format("%d ", number)));
        System.out.println();
    }
}

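Running this code several times shows that when the collection is iterated with parallelStream().forEach, the output order can differ from run to run, whereas with stream(), or with parallelStream() combined with forEachOrdered, the output is the same on every run and matches the order in which the elements are stored in the collection.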
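The ordering behaviour described above can also be checked without reading console output. The sketch below is an illustration only: the class `OrderingSketch` and its two helper methods are hypothetical names. Each helper records the order in which elements reach the action.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of the original article's code.
public class OrderingSketch {

    // forEach on a parallel stream gives no ordering guarantee and may call the
    // action from several threads at once, so a thread-safe list is used.
    static List<Integer> viaForEach(List<Integer> source) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return seen;
    }

    // forEachOrdered hands elements to the action one at a time, in encounter
    // order, so a plain ArrayList is sufficient here.
    static List<Integer> viaForEachOrdered(List<Integer> source) {
        List<Integer> seen = new ArrayList<>();
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println("forEach:        " + viaForEach(numbers));        // order may vary
        System.out.println("forEachOrdered: " + viaForEachOrdered(numbers)); // always 1..9
    }
}
```

Both helpers see every element exactly once; only the order of arrival differs.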
So why do the results differ? Could it be that parallelStream runs on multiple threads in parallel? To test this guess, we modify the code as follows:
import java.util.Arrays;
import java.util.List;

// Class name is arbitrary.
public class StreamThreadDemo {
    public static void main(String[] args) {
        System.out.println("Output:");
        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        // stream method
        numberList.stream().forEach(number -> System.out.println(String.format(
                "Stream The Current Thread's ID is %d and output number %d ",
                Thread.currentThread().getId(), number)));
        System.out.println();

        // parallelStream method with forEach
        numberList.parallelStream().forEach(number -> System.out.println(String.format(
                "ParallelStream The Current Thread's ID is %d and output number %d ",
                Thread.currentThread().getId(), number)));
        System.out.println();

        // parallelStream method with forEachOrdered
        numberList.parallelStream().forEachOrdered(number -> System.out.println(String.format(
                "ParallelStream forEach Ordered The Current Thread's ID is %d and output number %d ",
                Thread.currentThread().getId(), number)));
        System.out.println();
    }
}

The modified code produces the following output:
Output:
Stream The Current Thread's ID is 1 and output number 1
Stream The Current Thread's ID is 1 and output number 2
Stream The Current Thread's ID is 1 and output number 3
Stream The Current Thread's ID is 1 and output number 4
Stream The Current Thread's ID is 1 and output number 5
Stream The Current Thread's ID is 1 and output number 6
Stream The Current Thread's ID is 1 and output number 7
Stream The Current Thread's ID is 1 and output number 8
Stream The Current Thread's ID is 1 and output number 9

ParallelStream The Current Thread's ID is 1 and output number 6
ParallelStream The Current Thread's ID is 19 and output number 9
ParallelStream The Current Thread's ID is 18 and output number 1
ParallelStream The Current Thread's ID is 15 and output number 2
ParallelStream The Current Thread's ID is 17 and output number 4
ParallelStream The Current Thread's ID is 14 and output number 8
ParallelStream The Current Thread's ID is 13 and output number 3
ParallelStream The Current Thread's ID is 16 and output number 7
ParallelStream The Current Thread's ID is 1 and output number 5

ParallelStream forEach Ordered The Current Thread's ID is 15 and output number 1
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 2
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 3
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 4
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 5
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 6
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 7
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 8
ParallelStream forEach Ordered The Current Thread's ID is 14 and output number 9
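A more compact way to see which threads take part is to gather the thread names into a set instead of printing one line per element. This is only a sketch: `ThreadUsageSketch` and `threadsUsed` are hypothetical names, and the names and number of worker threads you see will depend on your machine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the original article's code.
public class ThreadUsageSketch {

    // Returns the names of all threads that executed the forEach action.
    static Set<String> threadsUsed(List<Integer> source, boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        Stream<Integer> stream = parallel ? source.parallelStream() : source.stream();
        stream.forEach(n -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println("stream:         " + threadsUsed(numbers, false));
        System.out.println("parallelStream: " + threadsUsed(numbers, true));
    }
}
```

The sequential stream reports only the calling thread, while the parallel stream usually reports the calling thread plus a number of common-pool worker threads.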

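The output above shows that iteration via parallelStream() is multithreaded: the elements are handled by several different thread IDs, including thread 1, the calling thread, which takes part in the work as well. parallelStream().forEachOrdered also runs on those threads, but guarantees that the elements are output in order. This confirms the guess that parallelStream is multithreaded. To check whether the threads actually run in parallel, we modify the code once more: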
import java.util.Arrays;
import java.util.List;

// Class name is arbitrary.
public class StreamTimingDemo {

    // Simulates one second of work per element.
    private static void sleepOneSecond() {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println("Output:");
        List<Integer> numberList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        // traditional for loop
        long forBegin = System.currentTimeMillis();
        for (Integer number : numberList) {
            sleepOneSecond();
        }
        System.out.println(String.format("For execute time cost %d ms", System.currentTimeMillis() - forBegin));
        System.out.println();

        // stream method
        long streamBegin = System.currentTimeMillis();
        numberList.stream().forEach(number -> sleepOneSecond());
        System.out.println(String.format("Stream execute time cost %d ms", System.currentTimeMillis() - streamBegin));
        System.out.println();

        // parallelStream method with forEach
        long parallelStreamBegin = System.currentTimeMillis();
        numberList.parallelStream().forEach(number -> sleepOneSecond());
        System.out.println(String.format("ParallelStream execute time cost %d ms", System.currentTimeMillis() - parallelStreamBegin));
        System.out.println();

        // parallelStream method with forEachOrdered
        long parallelStreamForEachOrderBegin = System.currentTimeMillis();
        numberList.parallelStream().forEachOrdered(number -> sleepOneSecond());
        System.out.println(String.format("ParallelStream forEachOrdered execute time cost %d ms", System.currentTimeMillis() - parallelStreamForEachOrderBegin));
        System.out.println();
    }
}

Here a traditional for loop is added for comparison. To bring out the advantage of parallel execution, each iteration sleeps for one second. The output is as follows:
Output:
For execute time cost 9032 ms

Stream execute time cost 9079 ms

ParallelStream execute time cost 2011 ms

ParallelStream forEachOrdered execute time cost 9037 ms
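In the timing test the one-second sleep sits inside the forEachOrdered action itself, and that action is applied to one element at a time. As a sketch of the alternative, the slow work can be moved into an upstream map stage, which a parallel stream is free to run on several threads while forEachOrdered still delivers the results in order. The helper `slowSquare`, the class name, and the 100 ms delay are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of the original article's code.
public class OrderedUpstreamSketch {

    // Hypothetical slow operation standing in for real work.
    static int slowSquare(int n) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return n * n;
    }

    // The map stage may run on several threads; forEachOrdered still passes
    // the mapped values to the action one at a time, in encounter order.
    static List<Integer> squaresInOrder(List<Integer> source) {
        List<Integer> out = new ArrayList<>();
        source.parallelStream()
              .map(OrderedUpstreamSketch::slowSquare)
              .forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        long begin = System.currentTimeMillis();
        List<Integer> result = squaresInOrder(numbers);
        System.out.println(result + " in " + (System.currentTimeMillis() - begin) + " ms");
    }
}
```

The list is always in encounter order. How much faster this is than a sequential run depends on the number of cores available, so no timing is claimed here.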

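The results show that parallelStream().forEach takes the least time, while the other three approaches take almost the same time as one another. We can therefore conclude that the guess was correct: parallelStream().forEach executes our code on multiple threads in parallel. The figure of roughly 2 seconds is consistent with the eight distinct thread IDs seen in the earlier output: nine one-second tasks spread over eight threads means one thread has to handle two of them. parallelStream().forEachOrdered also uses multiple threads, but because of the ordering constraint the action is applied to one element at a time, so its total time is close to that of the single-threaded for loop and stream(). Its advantage over those two is that any intermediate operations ahead of the terminal operation can still make use of multiple CPU cores; when all the work sits inside the action itself, as it does here, there is no gain. If you are interested, you can print the number of CPU cores with the following line, and use jstack to dump the thread stacks and see what each thread is doing.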
System.out.println("The system has " + Runtime.getRuntime().availableProcessors() + " CPU cores");
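The core count matters because parallel streams run, by default, on the JVM-wide common ForkJoinPool, whose default parallelism is the number of available processors minus one (the calling thread also takes part). A minimal sketch that prints both values; the class and method names are hypothetical.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical illustration class; not part of the original article's code.
public class ParallelismSketch {

    // Number of processors available to the JVM.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + cores());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

The default can be overridden at JVM startup with the system property `java.util.concurrent.ForkJoinPool.common.parallelism`.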
