A Powerful Tool for Java Performance Data Collection

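A well-running application depends on collecting performance metrics. These metrics make it possible to monitor many aspects of a production system's behavior, helping operators understand how the system is running and track down the causes of problems. Performance monitoring usually has two parts: the first is collecting the metric data, which requires adding code to the application; the second is the back-end monitoring system, which aggregates the data and provides an API. In the application, counters, gauges, and timers record the key performance metrics; a dedicated monitoring system then aggregates these metrics and produces charts for visual analysis.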
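There are already many monitoring systems; common ones include Prometheus, New Relic, Influx, Graphite, Elastic, StatsD, and Datadog, and each has its own way of collecting data. Some must be installed and operated yourself, while others are cloud services. Their back-end implementations differ widely, and so do their data interfaces. For collecting metrics, you usually use the client library that matches the back-end monitoring system. These systems also tend to provide third-party libraries for different languages and platforms, which inevitably leads to vendor lock-in: once data collection code for one monitoring system has been added to an application, switching to another system requires extensive changes to the application. Micrometer solves exactly this problem. Its role is comparable to that of SLF4J in Java logging: just as SLF4J does for logging, Micrometer provides a common, dependable API for collecting performance metrics on the Java platform and so avoids vendor lock-in. With the various meters Micrometer provides, you can collect many kinds of performance metrics and send them to different monitoring systems through a meter registry.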
1. Introduction to Micrometer
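Micrometer provides a common API for collecting performance data on the Java platform. The application only uses this common API to collect metrics, and Micrometer takes care of adapting to the different monitoring systems. This makes switching monitoring systems easy. Micrometer can also push data to several different monitoring systems at once.
Using Micrometer in a Java application is very simple: just add the corresponding dependency to your Maven or Gradle project.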

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.3.1</version>
</dependency>

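Micrometer has two core concepts: the meter (Meter) and the meter registry (MeterRegistry). A meter represents a performance metric to be collected, while a meter registry creates and maintains meters.
2. Core Concepts: MeterRegistry
Micrometer supports many different monitoring systems, and each of them has its own MeterRegistry implementation, such as InfluxMeterRegistry and PrometheusMeterRegistry; every supported monitoring system must implement MeterRegistry. The SimpleMeterRegistry class in the micrometer-core module is an in-memory registry that does not export data to any monitoring system; it is mainly used for local development and testing. The CompositeMeterRegistry implementation combines several registries, which allows data to be published to multiple monitoring systems at the same time. Data produced by meters created through a CompositeMeterRegistry is recorded in every registry it contains. The example below creates a CompositeMeterRegistry and adds two SimpleMeterRegistry objects to it. The second SimpleMeterRegistry is created with its own implementation of the SimpleConfig interface, which supplies a different prefix; note that this prefix is used when looking up configuration properties, not as a prefix for metric names.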

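import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.composite.CompositeMeterRegistry;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

CompositeMeterRegistry registry = new CompositeMeterRegistry();
registry.add(new SimpleMeterRegistry());
registry.add(new SimpleMeterRegistry(new MyConfig(), Clock.SYSTEM));
Counter counter = registry.counter("simple");
counter.increment();

private static class MyConfig implements SimpleConfig {
    @Override
    public String get(final String key) {
        return null; // use the default for every setting
    }

    @Override
    public String prefix() {
        return "my"; // configuration keys are looked up as "my.<key>"
    }
}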
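To check that a meter created through a CompositeMeterRegistry really ends up in every child registry, here is a minimal sketch. It assumes micrometer-core (1.3.x, as in the dependency above) is on the classpath; the class CompositeDemo and the meter name orders.created are hypothetical.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.composite.CompositeMeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CompositeDemo {
    // Increments a counter through the composite, then reads it back from each child registry
    public static double[] countsSeenByChildren(int increments) {
        SimpleMeterRegistry first = new SimpleMeterRegistry();
        SimpleMeterRegistry second = new SimpleMeterRegistry();

        CompositeMeterRegistry composite = new CompositeMeterRegistry();
        composite.add(first);
        composite.add(second);

        Counter counter = composite.counter("orders.created");
        for (int i = 0; i < increments; i++) {
            counter.increment();
        }

        // Each child registry holds its own meter with the same id and the same value
        return new double[] {
            first.get("orders.created").counter().count(),
            second.get("orders.created").counter().count()
        };
    }

    public static void main(String[] args) {
        double[] counts = countsSeenByChildren(2);
        System.out.println(counts[0] + " " + counts[1]); // prints "2.0 2.0"
    }
}
```

The application code only ever sees the composite; which back ends receive the data is decided by which registries were added to it.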

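Micrometer itself provides a static, global meter registry, Metrics.globalRegistry. It is a composite registry, and every meter created with the static methods of the Metrics class is added to it. By default it contains no registries, so the meters it creates record nothing until a registry is added with Metrics.addRegistry(). For most applications this global registry is sufficient, and there is no need to create additional registry objects. Because it is static, however, it can cause problems in some situations, especially in unit tests.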

public static void main(String[] args) {
    Metrics.addRegistry(new SimpleMeterRegistry());
    Counter counter = Metrics.counter("simple");
    counter.increment();
}
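One common way to avoid the unit-testing problems of a static registry is to pass a MeterRegistry into the class that records metrics, so a test can hand it a fresh, isolated registry. A minimal sketch, assuming micrometer-core on the classpath; OrderService and orders.created are hypothetical names used only for illustration:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class OrderService {
    private final Counter ordersCreated;

    // The registry is injected: production code can pass Metrics.globalRegistry,
    // while a test passes a fresh SimpleMeterRegistry that no other test shares
    public OrderService(MeterRegistry registry) {
        this.ordersCreated = registry.counter("orders.created");
    }

    public void createOrder() {
        // ... business logic would go here ...
        ordersCreated.increment();
    }

    // What a unit test would do: use an isolated registry and inspect it afterwards
    public static double countInIsolatedRegistry(int orders) {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        OrderService service = new OrderService(registry);
        for (int i = 0; i < orders; i++) {
            service.createOrder();
        }
        return registry.get("orders.created").counter().count();
    }

    public static void main(String[] args) {
        // Production-style wiring against the static global registry
        new OrderService(Metrics.globalRegistry).createOrder();
        System.out.println(countInIsolatedRegistry(3)); // prints 3.0
    }
}
```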

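3. Core Concepts: Meter
Meters collect different kinds of performance metrics. Micrometer provides several types of meter, and meter objects are created and managed by a meter registry.
Meter names and tags. Every meter has a name. Because different monitoring systems have their own recommended naming rules, Micrometer separates the parts of a meter name with periods, as in a.b.c, and performs whatever conversion each monitoring system needs: every Micrometer registry implementation is responsible for converting these lowercase, dot-separated names into the naming style recommended by its monitoring system. You can override the default conversion by providing your own NamingConvention: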

registry.config().namingConvention(myCustomNamingConvention);

With naming conventions in place, the following timer is named differently in each monitoring system:

registry.timer("http.server.requests");

In Prometheus, it is http_server_requests_duration_seconds
In Atlas, it is httpServerRequests
In InfluxDB, it is http_server_requests
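The built-in conventions can be called directly to see what they do to a name. A small sketch, assuming micrometer-core on the classpath: NamingConvention.snakeCase and NamingConvention.camelCase come with Micrometer, while DASHES is a hypothetical custom convention of the kind you could pass to namingConvention(). The Prometheus registry uses its own convention, which additionally appends suffixes such as _seconds, so it is not shown here.

```java
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.config.NamingConvention;

public class NamingDemo {
    // Hypothetical custom convention: replace the dots with dashes
    static final NamingConvention DASHES = (name, type, baseUnit) -> name.replace('.', '-');

    public static String snake(String name) {
        return NamingConvention.snakeCase.name(name, Meter.Type.TIMER);
    }

    public static String camel(String name) {
        return NamingConvention.camelCase.name(name, Meter.Type.TIMER);
    }

    public static String dashes(String name) {
        return DASHES.name(name, Meter.Type.TIMER);
    }

    public static void main(String[] args) {
        String name = "http.server.requests";
        System.out.println(snake(name));  // http_server_requests
        System.out.println(camel(name));  // httpServerRequests
        System.out.println(dashes(name)); // http-server-requests
    }
}
```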
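Each meter can be given a set of tags when it is created. Tags are name/value pairs, and monitoring systems use them to filter data. In addition to each meter's own tags, a registry can add common tags, which are attached to all data exported by that registry. Calling config() on a MeterRegistry returns its MeterRegistry.Config object, whose commonTags() method sets the common tags. Multiple tags are given as an alternating list of names and values. When creating a meter, specify its tags in the same way, after the name.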

SimpleMeterRegistry registry = new SimpleMeterRegistry();
registry.config().commonTags("tag1", "a", "tag2", "b");
Counter counter = registry.counter("simple", "tag3", "c");
counter.increment();
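A short sketch of how common tags and per-meter tags combine, assuming micrometer-core on the classpath; the tag names and values (application, region, uri) are hypothetical. Common tags are implemented as a meter filter, so they only reach meters registered after they are configured.

```java
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class TagsDemo {
    public static SimpleMeterRegistry build() {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        // Common tags are applied to meters registered after this call, so configure them first
        registry.config().commonTags("application", "order-service", "region", "eu");

        // Same name, different tag values: two separate time series
        registry.counter("http.requests", "uri", "/orders").increment();
        registry.counter("http.requests", "uri", "/users").increment(3);
        return registry;
    }

    public static void main(String[] args) {
        // Each printed id carries its own uri tag plus both common tags
        build().getMeters().forEach(meter -> System.out.println(meter.getId()));
    }
}
```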

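3.1 Counter
A counter (Counter) represents a single value that can only increase. The counter() method of MeterRegistry creates a Counter object. You can also use Counter.builder() to get a builder for a Counter; the builder registers the counter with whichever registry is passed to its register() method. The value of a Counter is a double, and its increment() method can take the amount to add; by default the amount is 1.0. If you already have a method that returns the count, you can create a FunctionCounter directly from it. An obvious benefit of FunctionCounter is that we do not need to be aware of the FunctionCounter instance at all: we only operate on the object it was built from, and Micrometer reads the value through the supplied function. This style of interface design can be seen in many frameworks.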

Counter counter1 = registry.counter("counter1");
counter1.increment(2.0);
Counter counter2 = Counter.builder("counter2")
.description("A simple counter")
.tag("tag1", "a")
.register(registry);
counter2.increment();

registry.more().counter("function1", Collections.emptyList(), new Random().nextDouble());
registry.more().counter("function2", Collections.emptyList(), this, MainController::getValue);

FunctionCounter function3 = FunctionCounter.builder("function3", this, MainController::getValue)
.description("description")
.tags(Collections.emptyList())
.register(registry);

Sample Prometheus output:

# HELP function3_total description
# TYPE function3_total counter
function3_total 0.1702260507535086
# HELP function1_total
# TYPE function1_total counter
function1_total 0.0
# HELP counter1_total
# TYPE counter1_total counter
counter1_total 4.0
# HELP function2_total
# TYPE function2_total counter
function2_total 0.5199302460797031
# HELP counter2_total A simple counter
# TYPE counter2_total counter
counter2_total{tag1="a",} 2.0

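Use cases: a Counter records the total amount or count of something and suits metrics that only grow, such as the number of orders placed, the number of payments, or the total number of HTTP requests. Tags distinguish between scenarios: for orders, different tags can mark different business sources or dates; for HTTP request totals, tags can distinguish different URLs.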
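The sketch below combines both ideas from this section: a tagged counter per business source and a FunctionCounter that reads a value the application already maintains. It assumes micrometer-core on the classpath; orders.placed, payments.completed and the AtomicLong field are hypothetical. Note that the registry only holds a weak reference to the object a FunctionCounter reads, which is why the AtomicLong is kept in a field here.

```java
import io.micrometer.core.instrument.FunctionCounter;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
    public final SimpleMeterRegistry registry = new SimpleMeterRegistry();
    // Hypothetical value maintained elsewhere in the application; kept in a field
    // because the registry holds only a weak reference to it
    public final AtomicLong paymentsCompleted = new AtomicLong();

    public CounterDemo() {
        // Micrometer calls AtomicLong::doubleValue whenever the counter is sampled;
        // the application never touches the FunctionCounter itself
        FunctionCounter.builder("payments.completed", paymentsCompleted, AtomicLong::doubleValue)
            .description("Completed payments")
            .register(registry);
    }

    // One time series per business source, distinguished by the "source" tag
    public void placeOrder(String source) {
        registry.counter("orders.placed", "source", source).increment();
    }

    public double orders(String source) {
        return registry.get("orders.placed").tag("source", source).counter().count();
    }

    public double payments() {
        return registry.get("payments.completed").functionCounter().count();
    }

    public static void main(String[] args) {
        CounterDemo demo = new CounterDemo();
        demo.placeOrder("app");
        demo.placeOrder("app");
        demo.placeOrder("web");
        demo.paymentsCompleted.addAndGet(5);
        System.out.println(demo.orders("app") + " " + demo.orders("web") + " " + demo.payments());
    }
}
```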
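3.2 Gauge
A gauge (Gauge) represents a single value that changes; it is a handle for obtaining the current value. Typical examples are the size of a collection or map, or the number of running threads. Unlike a counter, a gauge's value does not always increase. As with Counter, a Gauge can be created from the registry or with the builder returned by Gauge.builder(). In the example below, the gauge() method records the value of any Number object, gaugeCollectionSize() records the size of a collection, and gaugeMapSize() records the size of a Map. Note that these three methods do not return a Gauge object but the object being observed. This is because once a Gauge has been created, its value cannot be modified by hand; each time it is sampled, the Gauge returns the current value. For that reason, holding on to a Gauge object is of little use except in tests.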

// Gauge whose value we change by hand through the returned AtomicInteger
AtomicInteger value = registry.gauge("gauge1", new AtomicInteger(0));
value.set(1);

// Monitor the size of a collection
List<String> list = registry.gaugeCollectionSize("list.size", Collections.emptyList(), new ArrayList<>());
list.add("a");

// Monitor the size of a map
Map<String, String> map = registry.gaugeMapSize("map.size", Collections.emptyList(), new HashMap<>());
map.put("a", "b");

// MainController::getValue is the author's own method returning a double
Gauge.builder("gauge2", this, MainController::getValue)
    .description("a simple gauge")
    .tag("tag1", "a")
    .register(registry);

Sample Prometheus output:

# HELP gauge2 a simple gauge
# TYPE gauge2 gauge
gauge2{tag1="a",} 0.9622736974723566
# HELP map_size
# TYPE map_size gauge
map_size 1.0
# HELP gauge1
# TYPE gauge1 gauge
gauge1 1.0
# HELP list_size
# TYPE list_size gauge
list_size 6.0

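Use cases: 1. Monitoring fluctuating values with a natural (physical) upper bound, such as physical memory, collections, maps, or numeric values. 2. Monitoring fluctuating values with a logical upper bound, such as a backlog of messages or of tasks (in a thread pool); in essence this is also monitoring a collection or map.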
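A sketch of the backlog case from use case 2, assuming micrometer-core on the classpath; the queue and the name tasks.backlog are hypothetical. Micrometer gauges hold only a weak reference to the object they observe, so the queue is kept in a field: if nothing else referenced it, it could be garbage-collected and the gauge would stop reporting a meaningful value.

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class GaugeDemo {
    private final SimpleMeterRegistry registry = new SimpleMeterRegistry();
    // Held in a field: the gauge only keeps a weak reference to the object it observes
    private final Queue<String> backlog = new ConcurrentLinkedQueue<>();

    public GaugeDemo() {
        Gauge.builder("tasks.backlog", backlog, queue -> queue.size())
            .description("Tasks waiting to be processed")
            .register(registry);
    }

    public void submit(String task) {
        backlog.offer(task);
    }

    public String take() {
        return backlog.poll();
    }

    // Each read evaluates queue.size() at that moment; nobody "sets" the gauge
    public double currentBacklog() {
        return registry.get("tasks.backlog").gauge().value();
    }

    public static void main(String[] args) {
        GaugeDemo demo = new GaugeDemo();
        demo.submit("a");
        demo.submit("b");
        System.out.println(demo.currentBacklog()); // 2.0
        demo.take();
        System.out.println(demo.currentBacklog()); // 1.0
    }
}
```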
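3.3 Timer
A timer (Timer) is usually used to record the execution time of relatively short events, showing how events are distributed over time and how often they occur. A timer records two kinds of data: the number of events and their total duration, so once you use a timer there is no need to create a separate counter. Timers can be created from the registry or with the builder returned by Timer.builder(). Timer offers different ways to record durations. The first is to pass the code to be timed to record() (or recordCallable() for a Callable), which records how long a Runnable or Callable takes to run. The second is Timer.Sample, which holds the timing state: Timer.start() creates a new Timer.Sample and starts timing, and calling the sample's stop() method saves the elapsed time into a Timer. If a task takes a long time, a plain Timer is not a good choice, because a Timer records the time only after the task has finished. A better choice is LongTaskTimer, which reports the time spent so far while the task is still running. It is a transient value: data can only be collected while the task is executing. It is created through the registry's more().longTaskTimer():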

// Timer: time a block of code
Timer timer1 = registry.timer("timer1");
timer1.record(() -> {
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

// dontCareAboutReturnValue() is a placeholder for any method whose result is not needed
Timer timer2 = registry.timer("timer2");
timer2.record(() -> dontCareAboutReturnValue());

// Timer.Sample: start timing here and stop it on another thread
Timer.Sample sample = Timer.start(registry);
new Thread(() -> {
    try {
        Thread.sleep(2000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    sample.stop(registry.timer("timer2"));
}).start();

// LongTaskTimer: visible while the task is still running
LongTaskTimer longTime = registry.more().longTaskTimer("long.time");
longTime.record(() -> {
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
});

Sample Prometheus output:

# HELP timer2_seconds
# TYPE timer2_seconds summary
timer2_seconds_count 4.0
timer2_seconds_sum 8.016410046
# HELP timer2_seconds_max
# TYPE timer2_seconds_max gauge
timer2_seconds_max 2.005070709
# HELP long_timer_seconds
# TYPE long_timer_seconds untyped
long_timer_seconds_active_count 1.0
long_timer_seconds_duration_sum 5.949865982
# HELP timer1_seconds
# TYPE timer1_seconds summary
timer1_seconds_count 4.0
timer1_seconds_sum 12.016879054
# HELP timer1_seconds_max
# TYPE timer1_seconds_max gauge
timer1_seconds_max 3.00473016

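Use cases: 1. Recording the execution time of specific methods for display. 2. Recording the execution time of certain tasks in order to determine the rate of some data source, such as the rate at which messages are consumed from a message queue.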
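Sleeping threads make timer examples slow and nondeterministic. The sketch below instead drives a SimpleMeterRegistry with Micrometer's MockClock, advancing time by hand, to show what a Timer.Sample and a LongTaskTimer actually record. It assumes micrometer-core on the classpath; the meter names messages.consume and report.generate are hypothetical.

```java
import io.micrometer.core.instrument.LongTaskTimer;
import io.micrometer.core.instrument.MockClock;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

public class TimerDemo {
    // MockClock lets us advance time by hand instead of sleeping
    private final MockClock clock = new MockClock();
    private final SimpleMeterRegistry registry = new SimpleMeterRegistry(SimpleConfig.DEFAULT, clock);

    // Timer.Sample: start() and stop() may live in different methods or threads
    public Timer consumeMessage() {
        Timer timer = registry.timer("messages.consume");
        Timer.Sample sample = Timer.start(registry);
        clock.add(Duration.ofMillis(200)); // pretend the work took 200 ms
        sample.stop(timer);
        // A duration measured elsewhere can also be recorded directly
        timer.record(300, TimeUnit.MILLISECONDS);
        return timer;
    }

    // LongTaskTimer: reports a task while it is still running, then forgets it
    public double[] generateReport() {
        LongTaskTimer longTask = registry.more().longTaskTimer("report.generate");
        LongTaskTimer.Sample running = longTask.start();
        clock.add(Duration.ofSeconds(5)); // the task has been running for 5 s
        double activeWhileRunning = longTask.activeTasks();
        double secondsSoFar = longTask.duration(TimeUnit.SECONDS);
        running.stop();
        return new double[] {activeWhileRunning, secondsSoFar, longTask.activeTasks()};
    }

    public static void main(String[] args) {
        TimerDemo demo = new TimerDemo();
        Timer timer = demo.consumeMessage();
        System.out.println(timer.count() + " events, " + timer.totalTime(TimeUnit.MILLISECONDS) + " ms total");
        double[] report = demo.generateReport();
        System.out.println("active=" + report[0] + ", elapsed=" + report[1] + " s, active after stop=" + report[2]);
    }
}
```

The LongTaskTimer values only exist while the task runs, which matches the "transient" behavior described above: once stop() is called, the task no longer shows up.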
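3.4 Summary
A distribution summary (Distribution summary) records the distribution of events; a timer is essentially a kind of distribution summary. The DistributionSummary class can be created from the registry or with the builder provided by DistributionSummary.builder(). A distribution summary assigns each event to a bucket according to the value associated with it. By default, Micrometer's bucket values range from 1 to the maximum long value; minimumExpectedValue and maximumExpectedValue control this range. If the values associated with events are small, you can set scale to a factor that multiplies every recorded value. Closely related to distribution summaries are histograms and percentiles.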
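1. Percentile histograms: Micrometer accumulates values into an underlying histogram and ships a predetermined set of buckets to the monitoring system. The monitoring system's query language is responsible for calculating percentiles from this histogram. Currently only Prometheus and Atlas support histogram-based percentile approximations, via histogram_quantile and :percentile respectively. If you target Prometheus or Atlas, prefer this approach, because you can aggregate the histograms across dimensions (by simply summing the bucket values across a set of dimensions) and derive an aggregable percentile from the result.
2. Client-side percentiles: Micrometer computes a percentile approximation for each meter ID (name plus set of tags) and ships the percentile values to the monitoring system. This is less flexible than a percentile histogram, because percentile approximations cannot be aggregated across tags. Still, it gives some insight into percentile distributions on monitoring systems that do not support server-side percentile calculation based on a histogram.
Most of the time we care less about individual values than about the ranges they fall into. When looking at HTTP response time metrics, for example, we usually watch a few important percentiles such as 50%, 75%, and 90%, that is, within how much time that percentage of requests completed. Micrometer offers two different ways to handle percentiles.
For monitoring systems such as Prometheus that support percentiles themselves, Micrometer sends the collected histogram data directly and lets the monitoring system do the calculation. For systems without percentile support, Micrometer computes the percentiles itself and sends the results to the monitoring system. The DistributionSummary created below publishes the percentiles 0.5, 0.75, 0.8, 0.85, 0.9, and 0.95. The record() method records values, and takeSnapshot() returns a snapshot of the current data.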

DistributionSummary summary = DistributionSummary.builder("api_cost")
.description("vertices distribution summary")
.publishPercentiles(0.5, 0.75, 0.80, 0.85, 0.9, 0.95)
.minimumExpectedValue(1d)
.maximumExpectedValue(10000d)
.register(registry);
summary.record(100);
summary.record(200);
summary.record(300);
summary.record(400);
summary.record(500);
summary.record(600);
summary.record(700);
summary.record(800);
summary.record(900);
summary.record(1000);
System.out.println(summary.takeSnapshot());

Sample Prometheus output:

# HELP api_cost_max vertices distribution summary
# TYPE api_cost_max gauge
api_cost_max 1000.0
# HELP api_cost vertices distribution summary
# TYPE api_cost summary
api_cost{quantile="0.5",} 508.0
api_cost{quantile="0.75",} 828.0
api_cost{quantile="0.8",} 828.0
api_cost{quantile="0.85",} 924.0
api_cost{quantile="0.9",} 924.0
api_cost{quantile="0.95",} 1020.0
api_cost_count 10.0
api_cost_sum 5500.0

Use cases: measuring recorded values that are not tied to a unit of time, such as server payload sizes or cache hit ratios.
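A minimal sketch of both use cases, assuming micrometer-core on the classpath: a payload-size summary with a base unit, and a hit-ratio summary that uses scale() (mentioned in 3.4) to turn ratios between 0 and 1 into percentages. The meter names and recorded values are hypothetical.

```java
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class SummaryDemo {
    // Payload sizes are not time-based, so a summary fits better than a timer
    public static DistributionSummary payloadSizes() {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        DistributionSummary summary = DistributionSummary.builder("http.request.payload")
            .description("Request payload size")
            .baseUnit("bytes")
            .register(registry);
        summary.record(512);
        summary.record(1024);
        summary.record(2048);
        return summary;
    }

    // scale(100) multiplies every recorded value before it is stored
    public static DistributionSummary hitRatios() {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        DistributionSummary summary = DistributionSummary.builder("cache.hit.ratio")
            .scale(100)
            .register(registry);
        summary.record(0.5);  // stored as 50.0
        summary.record(0.25); // stored as 25.0
        return summary;
    }

    public static void main(String[] args) {
        DistributionSummary payload = payloadSizes();
        System.out.println(payload.count() + " requests, total " + payload.totalAmount()
            + " bytes, max " + payload.max() + ", mean " + payload.mean());
        DistributionSummary ratios = hitRatios();
        System.out.println("scaled total " + ratios.totalAmount() + ", max " + ratios.max());
    }
}
```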
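3.5 Histogram
Timers and distribution summaries support collecting data for observing their percentile distribution, using the two main approaches described in 3.4:
1. Percentile histogram: Micrometer accumulates the underlying values into a histogram and ships a predetermined set of buckets to the monitoring system, whose query language (histogram_quantile in Prometheus, :percentile in Atlas) then computes the percentiles. For Prometheus or Atlas this approach is recommended, because histograms can be aggregated across dimensions by summing their bucket values before the percentiles are derived.
2. Client-side percentiles: Micrometer computes a percentile approximation for each meter ID (uniquely identified by its name and tags) and ships the values to the monitoring system. These cannot be aggregated across tags, but they give some insight into percentile distributions on systems that do not support server-side, histogram-based percentile calculation.
Histogram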

Timer timer = Timer.builder("api-cost")
.publishPercentileHistogram()
.sla(Duration.ofMillis(500), Duration.ofMillis(700),Duration.ofMillis(1000))
.minimumExpectedValue(Duration.ofMillis(1))
.maximumExpectedValue(Duration.ofMillis(1000))
.register(Meters.getRegister()); // Meters.getRegister() is the author's own helper returning a MeterRegistry
timer.record(100, TimeUnit.MILLISECONDS);
timer.record(300, TimeUnit.MILLISECONDS);
timer.record(500, TimeUnit.MILLISECONDS);
timer.record(700, TimeUnit.MILLISECONDS);
timer.record(900, TimeUnit.MILLISECONDS);

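1. sla: publishes a cumulative histogram with buckets defined by your SLAs. Used together with publishPercentileHistogram on a monitoring system that supports aggregable percentiles, this setting adds extra buckets to the published histogram; on a system that does not support aggregable percentiles, it causes a histogram with only these buckets to be published.
2. minimumExpectedValue/maximumExpectedValue: control the number of buckets shipped by publishPercentileHistogram, as well as the accuracy and memory footprint of the underlying HdrHistogram structure.
3. publishPercentileHistogram: publishes a histogram suitable for computing aggregable (multi-dimensional) percentile approximations, with histogram_quantile in Prometheus and :percentile in Atlas. The buckets in the resulting histogram are pre-generated by Micrometer using a generator determined from Netflix's experience to yield reasonable error bounds for most real-world timers and distribution summaries. By default the generator produces 276 buckets, but Micrometer ships only those between minimumExpectedValue and maximumExpectedValue. By default Micrometer clamps timers to a range of 1 millisecond to 1 minute, which yields 73 histogram buckets per timer dimension. publishPercentileHistogram has no effect on systems that do not support aggregable percentile approximations.
Sample Prometheus output: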

# HELP api_cost_seconds_max
# TYPE api_cost_seconds_max gauge
api_cost_seconds_max 0.9
# HELP api_cost_seconds
# TYPE api_cost_seconds histogram
api_cost_seconds_bucket{le="0.001",} 0.0
api_cost_seconds_bucket{le="0.001048576",} 0.0
api_cost_seconds_bucket{le="0.001398101",} 0.0
api_cost_seconds_bucket{le="0.001747626",} 0.0
api_cost_seconds_bucket{le="0.002097151",} 0.0
api_cost_seconds_bucket{le="0.002446676",} 0.0
api_cost_seconds_bucket{le="0.002796201",} 0.0
api_cost_seconds_bucket{le="0.003145726",} 0.0
api_cost_seconds_bucket{le="0.003495251",} 0.0
api_cost_seconds_bucket{le="0.003844776",} 0.0
api_cost_seconds_bucket{le="0.004194304",} 0.0
api_cost_seconds_bucket{le="0.005592405",} 0.0
api_cost_seconds_bucket{le="0.006990506",} 0.0
api_cost_seconds_bucket{le="0.008388607",} 0.0
api_cost_seconds_bucket{le="0.009786708",} 0.0
api_cost_seconds_bucket{le="0.011184809",} 0.0
api_cost_seconds_bucket{le="0.01258291",} 0.0
api_cost_seconds_bucket{le="0.013981011",} 0.0
api_cost_seconds_bucket{le="0.015379112",} 0.0
api_cost_seconds_bucket{le="0.016777216",} 0.0
api_cost_seconds_bucket{le="0.022369621",} 0.0
api_cost_seconds_bucket{le="0.027962026",} 0.0
api_cost_seconds_bucket{le="0.033554431",} 0.0
api_cost_seconds_bucket{le="0.039146836",} 0.0
api_cost_seconds_bucket{le="0.044739241",} 0.0
api_cost_seconds_bucket{le="0.050331646",} 0.0
api_cost_seconds_bucket{le="0.055924051",} 0.0
api_cost_seconds_bucket{le="0.061516456",} 0.0
api_cost_seconds_bucket{le="0.067108864",} 0.0
api_cost_seconds_bucket{le="0.089478485",} 0.0
api_cost_seconds_bucket{le="0.111848106",} 1.0
api_cost_seconds_bucket{le="0.134217727",} 1.0
api_cost_seconds_bucket{le="0.156587348",} 1.0
api_cost_seconds_bucket{le="0.178956969",} 1.0
api_cost_seconds_bucket{le="0.20132659",} 1.0
api_cost_seconds_bucket{le="0.223696211",} 1.0
api_cost_seconds_bucket{le="0.246065832",} 1.0
api_cost_seconds_bucket{le="0.268435456",} 1.0
api_cost_seconds_bucket{le="0.357913941",} 2.0
api_cost_seconds_bucket{le="0.447392426",} 2.0
api_cost_seconds_bucket{le="0.5",} 3.0
api_cost_seconds_bucket{le="0.536870911",} 3.0
api_cost_seconds_bucket{le="0.626349396",} 3.0
api_cost_seconds_bucket{le="0.7",} 4.0
api_cost_seconds_bucket{le="0.715827881",} 4.0
api_cost_seconds_bucket{le="0.805306366",} 4.0
api_cost_seconds_bucket{le="0.894784851",} 4.0
api_cost_seconds_bucket{le="0.984263336",} 5.0
api_cost_seconds_bucket{le="1.0",} 5.0
api_cost_seconds_bucket{le="+Inf",} 5.0
api_cost_seconds_count 5.0
api_cost_seconds_sum 2.5
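To see what the monitoring system does with these cumulative buckets, here is a simplified, stdlib-only sketch of the linear interpolation behind Prometheus' histogram_quantile(). It is not Prometheus' actual code: it assumes 0 < q ≤ 1 and skips the edge cases Prometheus handles. It uses a subset of the buckets above; for the median, the neighboring bounds 0.447392426 and 0.5 are also adjacent in the full output, so the estimate (about 0.4737 s, against a true median of 0.5 s for the five recorded values) is the same, while coarser subsets give coarser estimates at other quantiles.

```java
public final class HistogramQuantile {
    private HistogramQuantile() {}

    // upperBounds: ascending "le" values, the last one Double.POSITIVE_INFINITY
    // cumulativeCounts: number of observations <= each bound (the _bucket values)
    public static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (n < 2 || total == 0) {
            return Double.NaN;
        }
        double rank = q * total;

        // First bucket whose cumulative count reaches the rank
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++;
        }

        // Rank falls into the +Inf bucket: return the highest finite bound
        if (Double.isInfinite(upperBounds[i])) {
            return upperBounds[n - 2];
        }

        // Interpolate linearly between the previous bound (0 for the first bucket) and this one
        double lowerBound = (i == 0) ? 0.0 : upperBounds[i - 1];
        double countBelow = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double countInBucket = cumulativeCounts[i] - countBelow;
        return lowerBound + (upperBounds[i] - lowerBound) * ((rank - countBelow) / countInBucket);
    }

    public static void main(String[] args) {
        // A subset of the api_cost_seconds buckets from the output above (in seconds)
        double[] le = {0.268435456, 0.447392426, 0.5, 0.7, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {1, 2, 3, 4, 5, 5};
        System.out.println("p50 ~ " + quantile(0.5, le, counts));
        System.out.println("p90 ~ " + quantile(0.9, le, counts));
    }
}
```

Because the buckets are cumulative counts, histograms from several instances can be summed bucket by bucket before this calculation, which is exactly why the histogram approach stays aggregable.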

Percentiles

Timer timer = Timer.builder("api-cost")
.publishPercentiles(0.6,0.7,0.8,0.9,1)
.register(Meters.getRegister());
timer.record(100, TimeUnit.MILLISECONDS);
timer.record(300, TimeUnit.MILLISECONDS);
timer.record(500, TimeUnit.MILLISECONDS);
timer.record(700, TimeUnit.MILLISECONDS);
timer.record(900, TimeUnit.MILLISECONDS);

publishPercentiles: publishes percentiles computed in your application. These values cannot be aggregated across dimensions.
Sample Prometheus output:

# HELP api_cost_seconds_max
# TYPE api_cost_seconds_max gauge
api_cost_seconds_max 0.9
# HELP api_cost_seconds
# TYPE api_cost_seconds summary
api_cost_seconds{quantile="0.6",} 0.499122176
api_cost_seconds{quantile="0.7",} 0.700448768
api_cost_seconds{quantile="0.8",} 0.700448768
api_cost_seconds{quantile="0.9",} 0.90177536
api_cost_seconds{quantile="1.0",} 0.90177536
api_cost_seconds_count 5.0
api_cost_seconds_sum 2.5

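3.6 Filters
Every percentile sent to the monitoring system produces additional time series, and timers/distribution summaries with these statistics enabled cost more memory and CPU. It is therefore generally recommended not to enable them in core libraries that other programs depend on. Instead, the application can switch them on for selected timers/distribution summaries with a meter filter.
For example, suppose a shared library contains some timers whose names all start with myservice: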

registry.timer("myservice.http.requests").record(..);
registry.timer("myservice.db.requests").record(..);

We can turn on client-side percentiles for both timers with a meter filter:
registry.config().meterFilter(
    new MeterFilter() {
        @Override
        public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
            if (id.getName().startsWith("myservice")) {
                return DistributionStatisticConfig.builder()
                    .percentiles(0.95)
                    .build()
                    .merge(config);
            }
            return config;
        }
    });
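A sketch that wires the filter above into a registry and checks its effect, together with Micrometer's built-in MeterFilter.denyNameStartsWith() for dropping meters entirely. It assumes micrometer-core on the classpath; the "debug" prefix and the timer/counter names are hypothetical. Filters only affect meters registered after the filter is added, so they are configured before any meter is created.

```java
import io.micrometer.core.instrument.Meter;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.config.MeterFilter;
import io.micrometer.core.instrument.distribution.DistributionStatisticConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.TimeUnit;

public class FilterDemo {
    public static SimpleMeterRegistry configuredRegistry() {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        registry.config()
            // Drop every meter whose name starts with "debug" (hypothetical prefix)
            .meterFilter(MeterFilter.denyNameStartsWith("debug"))
            // Turn on a client-side p95 for the myservice.* timers, as in the example above
            .meterFilter(new MeterFilter() {
                @Override
                public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
                    if (id.getName().startsWith("myservice")) {
                        return DistributionStatisticConfig.builder()
                            .percentiles(0.95)
                            .build()
                            .merge(config);
                    }
                    return config;
                }
            });
        return registry;
    }

    // Number of client-side percentiles the named timer publishes
    public static int percentileCount(String timerName) {
        Timer timer = configuredRegistry().timer(timerName);
        timer.record(10, TimeUnit.MILLISECONDS);
        return timer.takeSnapshot().percentileValues().length;
    }

    // Whether a counter with this name actually ends up in the registry
    public static boolean isRegistered(String counterName) {
        SimpleMeterRegistry registry = configuredRegistry();
        registry.counter(counterName).increment();
        return registry.find(counterName).counter() != null;
    }

    public static void main(String[] args) {
        System.out.println(percentileCount("myservice.http.requests")); // 1
        System.out.println(percentileCount("other.requests"));          // 0
        System.out.println(isRegistered("debug.cache.misses"));         // false
    }
}
```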

4. Integrating with Monitoring Systems
Micrometer supports many different monitoring systems.
4.1 Prometheus
For details, see the official documentation.
Without Spring Boot

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.3.1</version>
</dependency>

You can expose the data with the JDK's com.sun.net.httpserver.HttpServer, as follows:

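import com.sun.net.httpserver.HttpServer;
import io.micrometer.core.instrument.Metrics;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Create the Prometheus registry and add it to the global registry
PrometheusMeterRegistry prometheusRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
Metrics.addRegistry(prometheusRegistry);

try {
    HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/prometheus", httpExchange -> {
        byte[] response = prometheusRegistry.scrape().getBytes(StandardCharsets.UTF_8);
        httpExchange.sendResponseHeaders(200, response.length);
        try (OutputStream os = httpExchange.getResponseBody()) {
            os.write(response);
        }
    });
    new Thread(server::start).start();
} catch (IOException e) {
    throw new RuntimeException(e);
}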
Prometheus Pushgateway

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.9.0</version>
</dependency>

Core code. Note that every pushed metric needs to carry an instance label:

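// io.prometheus.client.Counter from the Prometheus simpleclient (not Micrometer's Counter);
// collectorRegistry is the CollectorRegistry obtained in the push code below
Counter counterDemo = Counter.build()
    .name("push_way_counter")
    .labelNames("instance")
    .help("Counter example")
    .register(collectorRegistry);
counterDemo.labels("localhost:9091").inc(1);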

Push code:

PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
CollectorRegistry collectorRegistry = registry.getPrometheusRegistry();
PushGateway prometheusPush = new PushGateway("localhost:9091");
prometheusPush.push(collectorRegistry,"job-name");

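With Spring Boot
Spring Boot 2 uses Micrometer as its metrics component and supports Prometheus out of the box. Just add spring-boot-starter-actuator and micrometer-registry-prometheus: spring-boot-starter-actuator completes the required configuration automatically, and the resulting registries are also added to the global registry. To configure a registry further, add a bean of type MeterRegistryCustomizer. Add the dependencies: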

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.3.1</version>
</dependency>

Configuration:

management.endpoints.web.exposure.include=*

The metrics are exposed at /actuator/prometheus:

# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage 0.0
# HELP tomcat_global_request_seconds
# TYPE tomcat_global_request_seconds summary
tomcat_global_request_seconds_count{name="http-nio-8080",} 2.0
tomcat_global_request_seconds_sum{name="http-nio-8080",} 0.111
# HELP tomcat_global_received_bytes_total
# TYPE tomcat_global_received_bytes_total counter
tomcat_global_received_bytes_total{name="http-nio-8080",} 0.0

4.2 Grafana Dashboard
GrafanaLabs offers Micrometer-based dashboards for Spring and Spring Boot, such as JVM (Micrometer) and Spring Boot 2.1 Statistics. To tell the metrics of different applications apart, apply a common tag named "application" to every metric.

// Without Spring Boot
registry.config().commonTags("application", "MYAPPNAME");

// Spring Boot
@Bean
MeterRegistryCustomizer<MeterRegistry> configurer(@Value("${spring.application.name}") String applicationName) {
    return (registry) -> registry.config().commonTags("application", applicationName);
}

# application.properties (Spring Boot)
management.metrics.tags.application=${spring.application.name}

The metrics are exposed at /actuator/prometheus.

For Spring Boot 1.x applications, add micrometer-spring-legacy:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-spring-legacy</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.0.6</version>
</dependency>

Configure the job_name in prometheus.yml:

- job_name: my-app-prod
  scheme: http
  metrics_path: /metrics
  static_configs:
    - targets:
        - "your_hostname:your_port"

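See the Spring Boot 1.x dashboard on GrafanaLabs. After importing it, you can switch between job_name values in the Prometheus Job selector to view the metrics of different applications.
4.3 InfluxDB
For details, see the official documentation. Spring Boot 2 uses Micrometer as its metrics component and supports InfluxDB: just add micrometer-registry-influx and configure it. Add the dependencies: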

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-influx</artifactId>
    <version>1.3.1</version>
</dependency>

InfluxConfig is an interface; you can override its methods to bind parameters.

InfluxConfig config = new InfluxConfig() {
    @Override
    public Duration step() {
        return Duration.ofSeconds(10);
    }

    @Override
    public String db() {
        return "mydb";
    }

    @Override
    public String get(String k) {
        return null; // accept the rest of the defaults
    }
};
MeterRegistry registry = new InfluxMeterRegistry(config, Clock.SYSTEM);
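Because get() is the only abstract method of InfluxConfig, a config can also be backed by any key/value source instead of an anonymous subclass. A minimal sketch, assuming micrometer-registry-influx on the classpath; the map contents are hypothetical. The keys carry the "influx." prefix because that is InfluxConfig's default prefix(), and returning null for a key selects the default value.

```java
import io.micrometer.influx.InfluxConfig;
import java.util.HashMap;
import java.util.Map;

public class InfluxConfigDemo {
    // get() receives keys such as "influx.db"; a missing key (null) means "use the default"
    public static InfluxConfig fromMap(Map<String, String> properties) {
        return properties::get;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("influx.db", "springboot");
        InfluxConfig config = fromMap(properties);
        System.out.println(config.db());  // springboot
        System.out.println(config.uri()); // default, since no "influx.uri" key is set
    }
}
```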

Spring Boot automatically binds properties starting with management.metrics.export.influx to InfluxConfig.

management.metrics.export.influx:
  auto-create-db: true # Whether to create the Influx database if it does not exist before attempting to publish metrics to it. (Default: true)
  batch-size: 10000 # Number of measurements per request to use for this backend. If more measurements are found, then multiple requests will be made. (Default: 10000)
  compressed: true # Whether to enable GZIP compression of metrics batches published to Influx. (Default: true)
  connect-timeout: 1s # Connection timeout for requests to this backend. (Default: 1s)
  consistency: one # Write consistency for each point. (Default: one)
  db: mydb # Database to send metrics to. (Default: mydb)
  enabled: true # Whether exporting of metrics to this backend is enabled. (Default: true)
  num-threads: 2 # Number of threads to use with the metrics publishing scheduler. (Default: 2)
  password: mysecret # Login password of the Influx server.
  read-timeout: 10s # Read timeout for requests to this backend. (Default: 10s)
  retention-policy: my_rp # Retention policy to use (Influx writes to the DEFAULT retention policy if one is not specified).
  step: 1m # Step size (i.e. reporting frequency) to use. (Default: 1m)
  uri: http://localhost:8086 # URI of the Influx server. (Default: http://localhost:8086)
  user-name: myusername # Login user of the Influx server.

Create the InfluxDB database:

# Alternatively, set auto-create-db=true in the configuration file and skip this step
curl -i -X POST http://192.168.99.100:8086/query --data-urlencode "q=CREATE DATABASE springboot"

View the data in InfluxDB:

docker exec -it influx influx
> use springboot
> show MEASUREMENTS
name: measurements
name
----
jvm.buffer.count
jvm.buffer.memory.used
jvm.buffer.total.capacity
jvm.classes.loaded
...

> show series from "http.server.requests"
key
---
http.server.requests,exception=None,method=GET,metric_type=histogram,status=200,uri=/actuator/health
> select * from "http.server.requests"
name: http.server.requests
time                count exception mean      method metric_type status sum       upper     uri
----                ----- --------- ----      ------ ----------- ------ ---       -----     ---
1529238292912000000 0     None      0         GET    histogram   200    0         72.601487 /actuator/health
1529238352888000000 2     None      39.154634 GET    histogram   200    78.309267 72.601487 /actuator/health

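5. JVM, Cache, and OkHttpClient
Micrometer provides several binders for monitoring the JVM, caches, and more. With Spring Boot they are configured automatically, but you can also configure them yourself, for example: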

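import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.binder.okhttp3.OkHttpMetricsEventListener;
import io.micrometer.core.instrument.binder.system.ProcessorMetrics;
import okhttp3.OkHttpClient;

new ClassLoaderMetrics().bindTo(registry);
new JvmMemoryMetrics().bindTo(registry);
new JvmGcMetrics().bindTo(registry);
new ProcessorMetrics().bindTo(registry);
new JvmThreadMetrics().bindTo(registry);

// Collect OkHttpClient metrics by adding an OkHttpMetricsEventListener
OkHttpClient client = new OkHttpClient.Builder()
    .eventListener(OkHttpMetricsEventListener.builder(registry, "okhttp.requests")
        .tags(Tags.of("foo", "bar"))
        .build())
    .build();

// Use uriMapper() to control how request URIs appear in the uri tag
OkHttpClient clientWithUriMapper = new OkHttpClient.Builder()
    .eventListener(OkHttpMetricsEventListener.builder(registry, "okhttp.requests")
        .uriMapper(req -> req.url().encodedPath())
        .tags(Tags.of("foo", "bar"))
        .build())
    .build();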
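To see which meters the JVM binders actually register, they can be bound to a SimpleMeterRegistry and inspected. A minimal sketch, assuming micrometer-core on the classpath; it binds only three of the binders above and reads two of the gauges they create (jvm.classes.loaded and jvm.threads.live).

```java
import io.micrometer.core.instrument.binder.jvm.ClassLoaderMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmThreadMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class BindersDemo {
    public static SimpleMeterRegistry bindJvmMetrics() {
        SimpleMeterRegistry registry = new SimpleMeterRegistry();
        new ClassLoaderMetrics().bindTo(registry);
        new JvmMemoryMetrics().bindTo(registry);
        new JvmThreadMetrics().bindTo(registry);
        return registry;
    }

    public static void main(String[] args) {
        SimpleMeterRegistry registry = bindJvmMetrics();
        // Print every meter name the binders registered (memory gauges appear once per pool/tag set)
        registry.getMeters().forEach(meter -> System.out.println(meter.getId().getName()));
        System.out.println("classes loaded: " + registry.get("jvm.classes.loaded").gauge().value());
        System.out.println("live threads: " + registry.get("jvm.threads.live").gauge().value());
    }
}
```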