更新丢失（当HashMap遇见CompletableFuture） java

HashMap是面试题中老生常谈的话题，自己面试时也曾经侃侃而谈。万万没想到，老司机在开发过程中也载到这上了。
具体场景是这样的。有一个列表查询的数据，某个字段的状态需要从其他接口中查询获取，为了优化查询速度，用了CompletableFuture.runAsync()的方式并行调用接口查询，查询结果放到一个HashMap中供后面使用。大致简化后的代码如下：

private static void task() { Map contractsMap = Maps.newHashMap(); List futures = Lists.newArrayList(); Lists.newArrayList(123L, 456L).forEach(uid -> { futures.add(CompletableFuture.runAsync(() -> { Boolean effect = true; contractsMap.put(uid, effect); })); }); CompletableFuture.allOf(futures.toArray(new CompletableFuture[]{})).join(); System.out.println(contractsMap); }

【更新丢失（当HashMap遇见CompletableFuture）】这个查询的列表多刷新几次，就会发现每次的状态值都不同。在查询之后打断点后发现，contractsMap有时候size都不一样，有些数据莫名其妙的丢失了。
上面这个简化后的task方法，多运行几次后很容易就能复现出这个现象。

public static void main(String[] args) { for (int i = 0; i < 100; i++) { task(); } }

输出：

{456=false, 123=true} {456=false, 123=true} {456=false, 123=true} {456=false, 123=true} {123=true} {456=false, 123=true} {456=false, 123=true} {123=true} {456=false, 123=true} {456=false, 123=true} {456=false, 123=true}

确认了逻辑没问题后，猛然想起这个HashMap用到了多线程环境中，换成了ConcurrentHashMap之后果然不再出现。
在网上搜索了一番后，很多文章都提到了JDK 8中HashMap可能会出现的数据丢失问题，但基本都在说哈希冲突时的情况。但上面的例子，其实就两个key，哈希值还绝不相同，那么就不可能是哈希冲突时出现的数据丢失。
来看一下HashMap中的putVal方法：

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node[] tab; Node p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); else { Node e; K k; if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) e = p; else if (p instanceof TreeNode) e = ((TreeNode)p).putTreeVal(this, tab, hash, key, value); else { for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) break; p = e; } } if (e != null) { // existing mapping for key V oldValue = https://www.it610.com/article/e.value; if (!onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; if (++size> threshold) resize(); afterNodeInsertion(evict); return null; }

总共三个分支，第一个是初次使用HashMap时发现tab为空，于是调用resize来构造tab；第二个分支是没有哈希冲突时，构造一个节点并直接放到tab的槽位上；第三个复杂的分支是用来处理哈希冲突的情况。
对于上面的例子而言，并不存在哈希冲突的情况，因此发生数据丢失只可能在第一和第二个分支的处理过程。
个人猜测的可能是

第一个线程进到putVal方法时，发现tab为空，于是准备resize方法；
但在构造tab前，切换到第二个线程执行，同样发现tab为空，于是进入到resize方法并构造了tab；然后进入第二个分支将键值对放入tab。
切换回第一个线程，构造了一个tab并赋值，于是将第二步中的tab覆盖；然后进入第二个分支将键值对放入tab。

为了验证猜想，需要明确resize方法和newNode方法的执行相对顺序。但并发条件又不好打断点，于是选择了采用Java Agent的方式，来修改resize和newNode方法，在方法进入前和进入后打印日志。

public class JvmAgentDemo {public static void premain(String args, Instrumentation instrumentation){ instrumentation.addTransformer(new TestTransformer(), true); try{ instrumentation.retransformClasses(Test.class); instrumentation.retransformClasses(HashMap.class); System.out.println("agent load done"); }catch (Exception e){ System.out.println("agent load failed"); } } }public class TestTransformer implements ClassFileTransformer {@Override public byte[] transform(ClassLoader loader, String className, Class classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {System.out.println("Transforming " + className); if ("java/util/HashMap".equals(className)) { try { ClassPool cp = ClassPool.getDefault(); CtClass cc = cp.get("java.util.HashMap"); CtMethod m = cc.getDeclaredMethod("resize"); m.insertBefore("{ System.out.println(Thread.currentThread().getName() + \": resize start\"); }"); m.insertAfter("{ System.out.println(Thread.currentThread().getName() + \": resize end\"); ; }"); CtMethod m2 = cc.getDeclaredMethod("newNode"); m2.insertBefore("{ System.out.println(Thread.currentThread().getName() + \": newNode start\"); }"); m2.insertAfter("{ System.out.println(Thread.currentThread().getName() + \": newNode end\"); }"); return cc.toBytecode(); } catch (Exception e) { e.printStackTrace(); } } return null; } }

将上述Agent打成jar，运行前面task用例时加载这个agent，观察控制台的输出：

ForkJoinPool.commonPool-worker-3: resize start ForkJoinPool.commonPool-worker-3: resize end ForkJoinPool.commonPool-worker-3: newNode start ForkJoinPool.commonPool-worker-2: newNode start ForkJoinPool.commonPool-worker-3: newNode end ForkJoinPool.commonPool-worker-2: newNode end {456=false, 123=true} ForkJoinPool.commonPool-worker-3: resize start ForkJoinPool.commonPool-worker-3: resize end ForkJoinPool.commonPool-worker-3: newNode start ForkJoinPool.commonPool-worker-3: newNode end ForkJoinPool.commonPool-worker-2: resize start ForkJoinPool.commonPool-worker-2: resize end ForkJoinPool.commonPool-worker-2: newNode start ForkJoinPool.commonPool-worker-2: newNode end {456=false, 123=true} ForkJoinPool.commonPool-worker-2: resize start ForkJoinPool.commonPool-worker-3: resize start ForkJoinPool.commonPool-worker-3: resize end ForkJoinPool.commonPool-worker-3: newNode start ForkJoinPool.commonPool-worker-3: newNode end ForkJoinPool.commonPool-worker-2: resize end ForkJoinPool.commonPool-worker-2: newNode start ForkJoinPool.commonPool-worker-2: newNode end {123=true}

截取了三次运行时的日志，第一次和第二次都是正常情况，resize和newNode的执行顺序没什么问题，无非是第二次多了一次resize的，但resize内部在创建新的tab时会将老的tab内容复制过来，因此最终结果也是正常的。
但第三次的执行序列和前面的猜想是一致的，worker-2线程进入resize方法后、返回前，worker-3线程也进入到了resize方法中，从resize返回后进入newNode方法创建了一个节点；之后worker-2线程继续执行，覆盖了worker-3线程写入的数据。
还有一些执行序列也会导致数据丢失，但我没太想明白可能的执行顺序，比如：

ForkJoinPool.commonPool-worker-1: resize start ForkJoinPool.commonPool-worker-2: resize start ForkJoinPool.commonPool-worker-2: resize end ForkJoinPool.commonPool-worker-1: resize end ForkJoinPool.commonPool-worker-1: newNode start ForkJoinPool.commonPool-worker-1: newNode end ForkJoinPool.commonPool-worker-2: newNode start ForkJoinPool.commonPool-worker-2: newNode end {456=false}

以上便是整个分析的过程。虽然大家都知道HashMap是非线程安全的，但开发过程中还是很容易就在多线程环境下使用上了。面经看了很多，但和最终的应用还是有点距离。