详细介绍java中hashmap java

hello，大家好，这里是可傥。好久不见，之前一段时间因为在忙一些事情，然后好久没有更新了，今天继续更新。闲话不多说，今天，我们讲一下java中hashmap的底层原理和实现。

概述 HashMap是基于哈希表实现的,每一个元素是一个键值对允许使用null值和null键,属于非线性安全的无序集合,
多线程环境可以采用并发包下的concurrentHashMap。
HashMap 实现了Serializable接口，因此它支持序列化，实现了Cloneable接口，所以可以被克隆。
数据结构相信大家都听过这么一句话，叫做，jdk1.7之前，hashmap底层是数组+链表的形式组成，jdk1.8之后，hashmap底层是数组+链表+红黑树组成。
但是如果看过java底层，你会发现，官方说明了，其实转化为红黑树的概率，也只是不到一千万分之一。下面为官方说明：

/* factorial(k)). The first values are: * * 0:0.60653066 * 1:0.30326533 * 2:0.07581633 * 3:0.01263606 * 4:0.00157952 * 5:0.00015795 * 6:0.00001316 * 7:0.00000094 * 8:0.00000006 * more: less than 1 in ten million 超过8的概率不到1千万分之1 * */

【详细介绍java中hashmap】超过8才会转化为红黑树(且数组大于64)。那么接下来，我将结合画图以及源码的·方式将数据结构展开：
如下图：

文章图片

图中，jdk1.8之后的数据就是很清晰的表达了底层数据结构。接下来，把hashmap的参数展示出来就可以更直观的说明问题，可傥把源码和注释全部放出来了，有一定阅读能力的可以看英文，或者就直接看可傥的汉字注释，如下代码：

/** * The default initial capacity - MUST be a power of two. * 默认容量，默认为16，必须为2的倍数 */ static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16/** * The maximum capacity, used if a higher value is implicitly specified * by either of the constructors with arguments. * MUST be a power of two <= 1<<30. * 最大容量，为2的30次方 */ static final int MAXIMUM_CAPACITY = 1 << 30; /** * The load factor used when none specified in constructor. * 装载因子//后续会提到为什么是0.75 */ static final float DEFAULT_LOAD_FACTOR = 0.75f; /** * The bin count threshold for using a tree rather than list for a * bin.Bins are converted to trees when adding an element to a * bin with at least this many nodes. The value must be greater * than 2 and should be at least 8 to mesh with assumptions in * tree removal about conversion back to plain bins upon * shrinkage. * 链表转化为红黑树树的阈值为8 */ static final int TREEIFY_THRESHOLD = 8; /** * The bin count threshold for untreeifying a (split) bin during a * resize operation. Should be less than TREEIFY_THRESHOLD, and at * most 6 to mesh with shrinkage detection under removal. * 红黑树重新转化为链表的阈值为6 */ static final int UNTREEIFY_THRESHOLD = 6; /** * The smallest table capacity for which bins may be treeified. * (Otherwise the table is resized if too many nodes in a bin.) * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts * between resizing and treeification thresholds. * 转红黑树时, table的最小长度 */ static final int MIN_TREEIFY_CAPACITY = 64; /** * Basic hash bin node, used for most entries.(See below for * TreeNode subclass, and in LinkedHashMap for its Entry subclass.) * 链表结点，用来存放键值对，继承自Entey */ static class Node implements Map.Entry { final int hash; final K key; V value; Node next; Node(int hash, K key, V value, Node next) { this.hash = hash; this.key = key; this.value = https://www.it610.com/article/value; this.next = next; } //... 省略部分代码 }/** * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn * extends Node) so can be used as extension of either regular or * linked node. * 红黑树结点，用来存放红黑树的键值对 */static final class TreeNode extends LinkedHashMap.Entry { TreeNode parent; // red-black tree links TreeNode left; TreeNode right; TreeNode prev; // needed to unlink next upon deletion boolean red; TreeNode(int hash, K key, V val, Node next) { super(hash, key, val, next); }/** * Returns root of tree containing this node. */ final TreeNode root() { for (TreeNode r = this, p; ; ) { if ((p = r.parent) == null) return r; r = p; } }/** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) * 存储数据的数组 */ transient Node[] table; /** * Holds cached entrySet(). Note that AbstractMap fields are used * for keySet() and values(). * 遍历的映射 */ transient Set> entrySet; /** * The number of key-value mappings contained in this map. * 键值对的数量 */ transient int size; /** * The number of times this HashMap has been structurally modified * Structural modifications are those that change the number of mappings in * the HashMap or otherwise modify its internal structure (e.g., * rehash).This field is used to make iterators on Collection-views of * the HashMap fail-fast.(See ConcurrentModificationException). * 结构性变更的次数 * 记录的是映射数量的修改 */ transient int modCount; /** * The next size value at which to resize (capacity * load factor). * * @serial */ // (The javadoc description is true upon serialization. // Additionally, if the table array has not been allocated, this // field holds the initial array capacity, or zero signifying // DEFAULT_INITIAL_CAPACITY.) //下次扩容操作的大小 int threshold; /** * The load factor for the hash table. * * @serial * 扩容阈值为 size * DEFAULT_LOAD_FACTOR */ final float loadFactor;

结合画图和源代码，参数具体含义已经在注释上说明，接下来，将会具体讲解数据结构。
hashmap在一开始并没有初始化，而是在put(K,V)第一个元素的时候，初始化一个大小为16的数组，然后根据哈希计算存放存入的值在初始化数组的位置，当存入的元素个数超过扩容阈值的时候，数据会进行扩容，之前的数组个数 *2，扩容阈值和size都会变更。当数组大于64，且某个数组下的链表个数超过8的时候，该数组下的数据将会转化为红黑树。而扩容后若该数组下的链表个数小于6的时候，又会重新从红黑树转为链表。这就是hashmap的数据结构。
HashMap元素存储问题那么，元素在存放的时候，是如何确定存放在哪个数组上的呢？咱们结合源码进行分析。

/** * Computes key.hashCode() and spreads (XORs) higher bits of hash * to lower.Because the table uses power-of-two masking, sets of * hashes that vary only in bits above the current mask will * always collide. (Among known examples are sets of Float keys * holding consecutive whole numbers in small tables.)So we * apply a transform that spreads the impact of higher bits * downward. There is a tradeoff between speed, utility, and * quality of bit-spreading. Because many common sets of hashes * are already reasonably distributed (so don't benefit from * spreading), and because we use trees to handle large sets of * collisions in bins, we just XOR some shifted bits in the * cheapest possible way to reduce systematic lossage, as well as * to incorporate impact of the highest bits that would otherwise * never be used in index calculations because of table bounds. * */ static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); }

文章图片

从源码中可得知，通过hashCode()的高16位异或低16位实现获取hash值：(h = k.hashCode()) ^ (h >>> 16)，然后通过与数组容量(n-1)进行与操作获取数组下标。如下图：

文章图片

存放的时候，put中调用了putVal()方法，结合源码进行分析：

文章图片

HashMap的put方法执行过程可以通过下图来理解

文章图片

扩容机制扩容还是结合源代码来讲会更加清晰。

文章图片

初始化为16的数组，然后在元素填充到装载因子 数组容量的时候，就会触发扩容。第一次扩容为 16 0.75 =12 ，每次扩容都为2的倍数，结合hash计算数组位置就可以知道，为啥需要2的倍数。而扩容之后，与操作向左在进行一位计算，所以每次扩容，原数组上的成员都是要么还在原数组，要么就在原数组的下标加上n的位置，第一次为加16的位置，第二次扩容为加32的位置，以此类推。而源码中也可以看到，不需要重新去计算hash值，直接拿原来的hash值就可以确定接下来的位置。
装载因子默认为什么为0.75 加载因子过高，虽然减少了空间开销，提高了空间利用率，但同时也增加了查询时间成本；
加载因子过低，虽然可以减少查询时间成本，但是空间利用率很低，同时提高了rehash操作的次数。
默认0.75是提高空间利用率和减少查询成本的折中，主要是泊松分布，0.75的话碰撞最小。
hashmap先讲到这里，后续可能会将内部的方法都讲一遍，这里是可傥，会分享自己的所学和所得，大家晚
安csdn地址为：https://blog.csdn.net/kaneand...。