Skywalking-06 (OAL Basics)

OAL Fundamentals

Introduction

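OAL (Observability Analysis Language) is a language for analyzing streaming data.
Because OAL focuses on the metrics of Service, Service Instance, and Endpoint, it is easy to learn and use.
OAL uses ANTLR and Javassist to turn oal scripts into dynamically generated class files.
Since version 6.3, the OAL engine has been embedded in the OAP server and can be regarded as oal-rt (OAL Runtime). OAL scripts live in the OAL configuration directory (/config/oal); users can edit the scripts, and the changes take effect after a restart. Note: OAL is still a compiled language; oal-rt generates the Java code dynamically.
If you set the environment variable SW_OAL_ENGINE_DEBUG=Y, the generated class files can be found in the oal-rt directory under the working directory.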
Syntax

// Declare a metric
METRICS_NAME = from(SCOPE.(* | [FIELD][,FIELD ...]))   // read data from a SCOPE
[.filter(FIELD OP [INT | STRING])]                      // optionally filter out part of the data
.FUNCTION([PARAM][, PARAM ...])                         // aggregate the data with an aggregation function

// Disable a metric
disable(METRICS_NAME);

Syntax example

oap-server/server-bootstrap/src/main/resources/oal/java-agent.oal

// Read `used` from ServiceInstanceJVMMemory, keep only the records whose heapStatus is true,
// and take the average as a long
instance_jvm_memory_heap = from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg();
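To make the semantics of that one-line metric concrete, here is a minimal plain-Java sketch of what the expression computes for the samples of a single time window. It is not the code oal-rt generates: `LongAvgSketch`, `MemorySample`, and `heapUsedAvg` are hypothetical names, and the integer division and the 0 result for an empty window are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;

final class LongAvgSketch {
    // Hypothetical stand-in for one ServiceInstanceJVMMemory sample; not the SkyWalking class.
    static final class MemorySample {
        final boolean heapStatus;
        final long used;

        MemorySample(boolean heapStatus, long used) {
            this.heapStatus = heapStatus;
            this.used = used;
        }
    }

    // Mirrors from(ServiceInstanceJVMMemory.used).filter(heapStatus == true).longAvg()
    // for the samples that fall into one time window.
    static long heapUsedAvg(List<MemorySample> samples) {
        long sum = 0;
        long count = 0;
        for (MemorySample sample : samples) {
            if (sample.heapStatus) {   // filter(heapStatus == true)
                sum += sample.used;    // from(ServiceInstanceJVMMemory.used)
                count++;
            }
        }
        // Assumption: integer division, and 0 when no sample passed the filter.
        return count == 0 ? 0 : sum / count;
    }

    public static void main(String[] args) {
        List<MemorySample> window = Arrays.asList(
            new MemorySample(true, 100),
            new MemorySample(false, 999),   // non-heap sample, dropped by the filter
            new MemorySample(true, 300));
        System.out.println(heapUsedAvg(window)); // prints 200
    }
}
```

The non-heap sample never reaches the aggregation, which is why the same source can feed both a heap and a non-heap metric with only the filter changed.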

org.apache.skywalking.oap.server.core.source.ServiceInstanceJVMMemory
@ScopeDeclaration(id = SERVICE_INSTANCE_JVM_MEMORY, name = "ServiceInstanceJVMMemory", catalog = SERVICE_INSTANCE_CATALOG_NAME)
@ScopeDefaultColumn.VirtualColumnDefinition(fieldName = "entityId", columnName = "entity_id", isID = true, type = String.class)
public class ServiceInstanceJVMMemory extends Source {
    @Override
    public int scope() {
        return DefaultScopeDefine.SERVICE_INSTANCE_JVM_MEMORY;
    }

    @Override
    public String getEntityId() {
        return String.valueOf(id);
    }

    @Getter @Setter
    private String id;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "name", requireDynamicActive = true)
    private String name;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_name", requireDynamicActive = true)
    private String serviceName;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_id")
    private String serviceId;
    @Getter @Setter
    private boolean heapStatus;
    @Getter @Setter
    private long init;
    @Getter @Setter
    private long max;
    @Getter @Setter
    private long used;
    @Getter @Setter
    private long committed;
}

Official documentation for reference: Observability Analysis Language
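Analyzing how OAL works through a case study

Missing class-loading monitoring

The default APM/Instance page has no information about JVM classes (see the figure below), so this article fills that gap and uses the case to analyze how OAL works.
[Figure: the default APM/Instance page]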

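Skywalking-04: Extending Metric Monitoring Information covered how to add metrics when a Source class already exists.
This time we define everything ourselves, including the Source class and the keywords in the OAL lexer and parser grammars.
Official documentation for reference: Source and Scope extension for new metrics

Deciding which metrics to add

Based on the article Java ManagementFactory解析, the metrics to monitor are these three: the number of currently loaded classes, the number of unloaded classes, and the total number of classes loaded so far.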
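import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

ClassLoadingMXBean classLoadingMXBean = ManagementFactory.getClassLoadingMXBean();
// Number of classes currently loaded
int loadedClassCount = classLoadingMXBean.getLoadedClassCount();
// Number of classes unloaded so far
long unloadedClassCount = classLoadingMXBean.getUnloadedClassCount();
// Total number of classes loaded so far
long totalLoadedClassCount = classLoadingMXBean.getTotalLoadedClassCount();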
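The snippet above can be tried on its own before touching the agent. The sketch below wraps the same three JMX calls in a small runnable class; `ClassLoadingDemo` and `snapshot` are names made up for this example. Note that `getLoadedClassCount()` returns an `int` while the other two return `long`, so the sketch widens everything to `long`, which matches the `int64` fields used later in the protocol.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassLoadingDemo {
    // Returns {loaded, unloaded, totalLoaded} as reported by the running JVM.
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long loaded = bean.getLoadedClassCount();      // int, widened to long
        long unloaded = bean.getUnloadedClassCount();
        long total = bean.getTotalLoadedClassCount();
        return new long[] {loaded, unloaded, total};
    }

    public static void main(String[] args) {
        long[] counts = snapshot();
        System.out.println("loaded=" + counts[0]
            + " unloaded=" + counts[1]
            + " totalLoaded=" + counts[2]);
    }
}
```

The actual numbers depend on the JVM and on what has been loaded so far, so no fixed output is shown.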

Defining the communication class between the agent and the OAP server

Add the following definitions to the protocol file apm-protocol/apm-network/src/main/proto/language-agent/JVMMetric.proto.
Running mvn clean package -DskipTests=true in the apm-protocol/apm-network directory generates the new Java classes. org.apache.skywalking.apm.network.language.agent.v3.Class is the class we actually work with in code.
message Class {
    int64 loadedClassCount = 1;
    int64 unloadedClassCount = 3;
    int64 totalLoadedClassCount = 2;
}

message JVMMetric {
    int64 time = 1;
    CPU cpu = 2;
    repeated Memory memory = 3;
    repeated MemoryPool memoryPool = 4;
    repeated GC gc = 5;
    Thread thread = 6;
    // Add the Class definition to the JVM metrics
    Class clazz = 7;
}

Collecting the information in the agent and sending it to the OAP server

Collect the Class-related metrics:

package org.apache.skywalking.apm.agent.core.jvm.clazz;

import org.apache.skywalking.apm.network.language.agent.v3.Class;

import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public enum ClassProvider {
    /**
     * instance
     */
    INSTANCE;

    private final ClassLoadingMXBean classLoadingMXBean;

    ClassProvider() {
        this.classLoadingMXBean = ManagementFactory.getClassLoadingMXBean();
    }

    // Build the Class metrics
    public Class getClassMetrics() {
        int loadedClassCount = classLoadingMXBean.getLoadedClassCount();
        long unloadedClassCount = classLoadingMXBean.getUnloadedClassCount();
        long totalLoadedClassCount = classLoadingMXBean.getTotalLoadedClassCount();
        return Class.newBuilder()
                    .setLoadedClassCount(loadedClassCount)
                    .setUnloadedClassCount(unloadedClassCount)
                    .setTotalLoadedClassCount(totalLoadedClassCount)
                    .build();
    }
}

In the org.apache.skywalking.apm.agent.core.jvm.JVMService#run method, set the Class metrics on the JVM metric object:

@Override
public void run() {
    long currentTimeMillis = System.currentTimeMillis();
    try {
        JVMMetric.Builder jvmBuilder = JVMMetric.newBuilder();
        jvmBuilder.setTime(currentTimeMillis);
        jvmBuilder.setCpu(CPUProvider.INSTANCE.getCpuMetric());
        jvmBuilder.addAllMemory(MemoryProvider.INSTANCE.getMemoryMetricList());
        jvmBuilder.addAllMemoryPool(MemoryPoolProvider.INSTANCE.getMemoryPoolMetricsList());
        jvmBuilder.addAllGc(GCProvider.INSTANCE.getGCList());
        jvmBuilder.setThread(ThreadProvider.INSTANCE.getThreadMetrics());
        // Set the Class metrics
        jvmBuilder.setClazz(ClassProvider.INSTANCE.getClassMetrics());
        // Put the JVM metrics into the blocking queue;
        // org.apache.skywalking.apm.agent.core.jvm.JVMMetricsSender#run sends them to the OAP server
        sender.offer(jvmBuilder.build());
    } catch (Exception e) {
        LOGGER.error(e, "Collect JVM info fail.");
    }
}
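The hand-off described in the comments above (the collector offers a metric to a blocking queue, a separate sender drains it) can be sketched with the JDK alone. This is only an illustration of the pattern, not the real JVMMetricsSender: `MetricHandOffSketch`, `collect`, and `drainBatch` are hypothetical names, the metrics are plain strings, and the capacity and the choice to simply report a full buffer are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class MetricHandOffSketch {
    // Bounded buffer between the collector thread and the sender thread.
    private final LinkedBlockingQueue<String> queue;

    MetricHandOffSketch(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    // Collector side: offer() never blocks; it returns false when the buffer is full.
    boolean collect(String metric) {
        return queue.offer(metric);
    }

    // Sender side: move everything buffered so far into one batch.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        MetricHandOffSketch handOff = new MetricHandOffSketch(2);
        System.out.println(handOff.collect("jvm-metric-1")); // true
        System.out.println(handOff.collect("jvm-metric-2")); // true
        System.out.println(handOff.collect("jvm-metric-3")); // false, buffer is full
        System.out.println(handOff.drainBatch());            // [jvm-metric-1, jvm-metric-2]
    }
}
```

Using a non-blocking `offer` keeps the collection timer from stalling when the sender falls behind, at the cost of dropping or rejecting data once the buffer is full.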

Creating the Source

public class DefaultScopeDefine {
    public static final int SERVICE_INSTANCE_JVM_CLASS = 11000;

    /** Catalog of scope, the metrics processor could use this to group all generated metrics by oal rt. */
    public static final String SERVICE_INSTANCE_CATALOG_NAME = "SERVICE_INSTANCE";
}

package org.apache.skywalking.oap.server.core.source;

import lombok.Getter;
import lombok.Setter;

import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.SERVICE_INSTANCE_CATALOG_NAME;
import static org.apache.skywalking.oap.server.core.source.DefaultScopeDefine.SERVICE_INSTANCE_JVM_CLASS;

@ScopeDeclaration(id = SERVICE_INSTANCE_JVM_CLASS, name = "ServiceInstanceJVMClass", catalog = SERVICE_INSTANCE_CATALOG_NAME)
@ScopeDefaultColumn.VirtualColumnDefinition(fieldName = "entityId", columnName = "entity_id", isID = true, type = String.class)
public class ServiceInstanceJVMClass extends Source {
    @Override
    public int scope() {
        return SERVICE_INSTANCE_JVM_CLASS;
    }

    @Override
    public String getEntityId() {
        return String.valueOf(id);
    }

    @Getter @Setter
    private String id;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "name", requireDynamicActive = true)
    private String name;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_name", requireDynamicActive = true)
    private String serviceName;
    @Getter @Setter
    @ScopeDefaultColumn.DefinedByField(columnName = "service_id")
    private String serviceId;
    @Getter @Setter
    private long loadedClassCount;
    @Getter @Setter
    private long unloadedClassCount;
    @Getter @Setter
    private long totalLoadedClassCount;
}

Sending the information received from the agent to SourceReceiver

Modify org.apache.skywalking.oap.server.analyzer.provider.jvm.JVMSourceDispatcher as follows:

public void sendMetric(String service, String serviceInstance, JVMMetric metrics) {
    long minuteTimeBucket = TimeBucket.getMinuteTimeBucket(metrics.getTime());

    final String serviceId = IDManager.ServiceID.buildId(service, NodeType.Normal);
    final String serviceInstanceId = IDManager.ServiceInstanceID.buildId(serviceId, serviceInstance);

    this.sendToCpuMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getCpu());
    this.sendToMemoryMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getMemoryList());
    this.sendToMemoryPoolMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getMemoryPoolList());
    this.sendToGCMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getGcList());
    this.sendToThreadMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getThread());
    // Process the Class metrics
    this.sendToClassMetricProcess(
        service, serviceId, serviceInstance, serviceInstanceId, minuteTimeBucket, metrics.getClazz());
}

private void sendToClassMetricProcess(String service,
                                      String serviceId,
                                      String serviceInstance,
                                      String serviceInstanceId,
                                      long timeBucket,
                                      Class clazz) {
    // Assemble the Source object
    ServiceInstanceJVMClass serviceInstanceJVMClass = new ServiceInstanceJVMClass();
    serviceInstanceJVMClass.setId(serviceInstanceId);
    serviceInstanceJVMClass.setName(serviceInstance);
    serviceInstanceJVMClass.setServiceId(serviceId);
    serviceInstanceJVMClass.setServiceName(service);
    serviceInstanceJVMClass.setLoadedClassCount(clazz.getLoadedClassCount());
    serviceInstanceJVMClass.setUnloadedClassCount(clazz.getUnloadedClassCount());
    serviceInstanceJVMClass.setTotalLoadedClassCount(clazz.getTotalLoadedClassCount());
    serviceInstanceJVMClass.setTimeBucket(timeBucket);
    // Hand the Source object to SourceReceiver for processing
    sourceReceiver.receive(serviceInstanceJVMClass);
}
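Every Source in the dispatcher is stamped with a minute time bucket, which is what lets samples from the same minute be aggregated together. As far as I understand it, a minute bucket is the timestamp truncated to the minute and written as a yyyyMMddHHmm number. The sketch below shows that idea with java.time; it is not SkyWalking's TimeBucket implementation, `MinuteBucketSketch` and `minuteBucket` are hypothetical names, and passing the time zone explicitly is a choice made here to keep the result deterministic.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class MinuteBucketSketch {
    // Assumed bucket layout: yyyyMMddHHmm, e.g. 202107011530.
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");

    // Truncates an epoch-millisecond timestamp to its minute and encodes it as a long.
    static long minuteBucket(long epochMillis, ZoneId zone) {
        String text = MINUTE.format(Instant.ofEpochMilli(epochMillis).atZone(zone));
        return Long.parseLong(text);
    }

    public static void main(String[] args) {
        // 2021-07-01T15:30:00Z and a sample 59.999 s later land in the same bucket.
        System.out.println(minuteBucket(1625153400000L, ZoneOffset.UTC)); // prints 202107011530
        System.out.println(minuteBucket(1625153459999L, ZoneOffset.UTC)); // prints 202107011530
    }
}
```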

Adding the Source to the OAL lexer and parser definitions

Define the Class keyword in oap-server/oal-grammar/src/main/antlr4/org/apache/skywalking/oal/rt/grammar/OALLexer.g4:

// Keywords
FROM: 'from';
FILTER: 'filter';
DISABLE: 'disable';
SRC_ALL: 'All';
SRC_SERVICE: 'Service';
SRC_SERVICE_INSTANCE: 'ServiceInstance';
SRC_ENDPOINT: 'Endpoint';
SRC_SERVICE_RELATION: 'ServiceRelation';
SRC_SERVICE_INSTANCE_RELATION: 'ServiceInstanceRelation';
SRC_ENDPOINT_RELATION: 'EndpointRelation';
SRC_SERVICE_INSTANCE_JVM_CPU: 'ServiceInstanceJVMCPU';
SRC_SERVICE_INSTANCE_JVM_MEMORY: 'ServiceInstanceJVMMemory';
SRC_SERVICE_INSTANCE_JVM_MEMORY_POOL: 'ServiceInstanceJVMMemoryPool';
SRC_SERVICE_INSTANCE_JVM_GC: 'ServiceInstanceJVMGC';
SRC_SERVICE_INSTANCE_JVM_THREAD: 'ServiceInstanceJVMThread';
SRC_SERVICE_INSTANCE_JVM_CLASS:'ServiceInstanceJVMClass'; // Class keyword added to the OAL lexer definition
SRC_DATABASE_ACCESS: 'DatabaseAccess';
SRC_SERVICE_INSTANCE_CLR_CPU: 'ServiceInstanceCLRCPU';
SRC_SERVICE_INSTANCE_CLR_GC: 'ServiceInstanceCLRGC';
SRC_SERVICE_INSTANCE_CLR_THREAD: 'ServiceInstanceCLRThread';
SRC_ENVOY_INSTANCE_METRIC: 'EnvoyInstanceMetric';

Add the Class keyword in oap-server/oal-grammar/src/main/antlr4/org/apache/skywalking/oal/rt/grammar/OALParser.g4:

source
    : SRC_ALL | SRC_SERVICE | SRC_DATABASE_ACCESS | SRC_SERVICE_INSTANCE | SRC_ENDPOINT |
      SRC_SERVICE_RELATION | SRC_SERVICE_INSTANCE_RELATION | SRC_ENDPOINT_RELATION |
      SRC_SERVICE_INSTANCE_JVM_CPU | SRC_SERVICE_INSTANCE_JVM_MEMORY | SRC_SERVICE_INSTANCE_JVM_MEMORY_POOL |
      SRC_SERVICE_INSTANCE_JVM_GC | SRC_SERVICE_INSTANCE_JVM_THREAD |
      SRC_SERVICE_INSTANCE_JVM_CLASS | // keyword from the lexer definition, added to the OAL parser definition
      SRC_SERVICE_INSTANCE_CLR_CPU | SRC_SERVICE_INSTANCE_CLR_GC | SRC_SERVICE_INSTANCE_CLR_THREAD |
      SRC_ENVOY_INSTANCE_METRIC |
      SRC_BROWSER_APP_PERF | SRC_BROWSER_APP_PAGE_PERF | SRC_BROWSER_APP_SINGLE_VERSION_PERF |
      SRC_BROWSER_APP_TRAFFIC | SRC_BROWSER_APP_PAGE_TRAFFIC | SRC_BROWSER_APP_SINGLE_VERSION_TRAFFIC
    ;

Running mvn clean package -DskipTests=true in the oap-server/oal-grammar directory generates the new Java classes.

Defining the OAL metrics

Add the Class metric definitions, written in OAL syntax, to oap-server/server-bootstrap/src/main/resources/oal/java-agent.oal:

// Number of classes currently loaded
instance_jvm_class_loaded_class_count = from(ServiceInstanceJVMClass.loadedClassCount).longAvg();
// Number of classes unloaded so far
instance_jvm_class_unloaded_class_count = from(ServiceInstanceJVMClass.unloadedClassCount).longAvg();
// Total number of classes loaded so far
instance_jvm_class_total_loaded_class_count = from(ServiceInstanceJVMClass.totalLoadedClassCount).longAvg();

Configuring the UI dashboard

Import the following dashboard configuration into the APM dashboard:
{
  "name": "Instance",
  "children": [
    { "width": "3", "title": "Service Instance Load", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "service_instance_cpm", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "CPM - calls per minute" },
    { "width": 3, "title": "Service Instance Throughput", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "service_instance_throughput_received,service_instance_throughput_sent", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "Bytes" },
    { "width": "3", "title": "Service Instance Successful Rate", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "service_instance_sla", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "%", "aggregation": "/", "aggregationNum": "100" },
    { "width": "3", "title": "Service Instance Latency", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "service_instance_resp_time", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "ms" },
    { "width": 3, "title": "JVM CPU (Java Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_jvm_cpu", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "%", "aggregation": "+", "aggregationNum": "" },
    { "width": 3, "title": "JVM Memory (Java Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_jvm_memory_heap, instance_jvm_memory_heap_max,instance_jvm_memory_noheap, instance_jvm_memory_noheap_max", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "MB", "aggregation": "/", "aggregationNum": "1048576" },
    { "width": 3, "title": "JVM GC Time", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_jvm_young_gc_time, instance_jvm_old_gc_time", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "ms" },
    { "width": 3, "title": "JVM GC Count", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "queryMetricType": "readMetricsValues", "chartType": "ChartBar", "metricName": "instance_jvm_young_gc_count, instance_jvm_old_gc_count" },
    { "width": 3, "title": "JVM Thread Count (Java Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "metricName": "instance_jvm_thread_live_count, instance_jvm_thread_daemon_count, instance_jvm_thread_peak_count,instance_jvm_thread_deadlocked,instance_jvm_thread_monitor_deadlocked" },
    { "width": 3, "title": "JVM Thread State Count (Java Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_jvm_thread_new_thread_count,instance_jvm_thread_runnable_thread_count,instance_jvm_thread_blocked_thread_count,instance_jvm_thread_wait_thread_count,instance_jvm_thread_time_wait_thread_count,instance_jvm_thread_terminated_thread_count", "queryMetricType": "readMetricsValues", "chartType": "ChartBar" },
    { "width": 3, "title": "JVM Class Count (Java Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_jvm_class_loaded_class_count,instance_jvm_class_unloaded_class_count,instance_jvm_class_total_loaded_class_count", "queryMetricType": "readMetricsValues", "chartType": "ChartArea" },
    { "width": 3, "title": "CLR CPU(.NET Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_clr_cpu", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "%" },
    { "width": 3, "title": "CLR GC (.NET Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_clr_gen0_collect_count, instance_clr_gen1_collect_count, instance_clr_gen2_collect_count", "queryMetricType": "readMetricsValues", "chartType": "ChartBar" },
    { "width": 3, "title": "CLR Heap Memory (.NET Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "metricName": "instance_clr_heap_memory", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "unit": "MB", "aggregation": "/", "aggregationNum": "1048576" },
    { "width": 3, "title": "CLR Thread (.NET Service)", "height": "250", "entityType": "ServiceInstance", "independentSelector": false, "metricType": "REGULAR_VALUE", "queryMetricType": "readMetricsValues", "chartType": "ChartLine", "metricName": "instance_clr_available_completion_port_threads,instance_clr_available_worker_threads,instance_clr_max_completion_port_threads,instance_clr_max_worker_threads" }
  ]
}

Verifying the result

The imported dashboard now shows the Class metrics.
[Figure: the Instance dashboard after the import]

Code contributions
  • Add some new thread metric and class metric to JVMMetric #7230
  • add some new thread metric and class metric to JVMMetric #52
  • Remove Terminated State and New State in JVMMetric (#7230) #53
  • Add some new thread metric and class metric to JVMMetric (#7230) #7243
References
  • Observability Analysis Language
  • Source and Scope extension for new metrics
  • Java ManagementFactory解析