Notes on a Memory Spike in a .NET Backend Service at a Recruitment Website
I: Background
1. The story
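A while ago a friend reached out to me on WeChat. His program's memory spiked periodically and he wanted to know how to fix it. After talking it through, it turned out his memory normally sits at around 5 GB, but at certain points in time it shoots up to 10 GB+. A rough sketch looks like this: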
[Figure: rough sketch of the process's memory usage over time]
So the next job was to figure out what that mysterious extra 5-6 GB actually was. Time to let WinDbg do the talking.
II: WinDbg Analysis
1. Managed or unmanaged?
From the description this is very likely a managed-heap problem, but for completeness let's check with !address -summary and !eeheap -gc anyway.
0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                   1164      7f5`58f12000 (   7.958 TB)           99.48%
<unknown>                              6924        a`6de84000 (  41.717 GB)  97.90%    0.51%
Stack                                  1123        0`16340000 ( 355.250 MB)   0.81%    0.00%
Image                                  4063        0`1607d000 ( 352.488 MB)   0.81%    0.00%
Heap                                     71        0`0c9ea000 ( 201.914 MB)   0.46%    0.00%
TEB                                     374        0`002ec000 (   2.922 MB)   0.01%    0.00%
Other                                    13        0`001c6000 (   1.773 MB)   0.00%    0.00%
PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                            5423        a`87200000 (  42.111 GB)  98.83%    0.51%
MEM_IMAGE                              7033        0`1e5d6000 ( 485.836 MB)   1.11%    0.01%
MEM_MAPPED                              113        0`01908000 (  25.031 MB)   0.06%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                               1164      7f5`58f12000 (   7.958 TB)           99.48%
MEM_RESERVE                            4165        8`1b873000 (  32.430 GB)  76.11%    0.40%
MEM_COMMIT                             8404        2`8b86b000 (  10.180 GB)  23.89%    0.12%

0:000> !eeheap -gc
Number of GC Heaps: 32
------------------------------
Heap 0 (00000000004106d0)
generation 0 starts at 0x0000000082eb0e58
generation 1 starts at 0x0000000082d79b20
generation 2 starts at 0x000000007fff1000
ephemeral segment allocation context: none
         segment             begin         allocated  size
000000007fff0000  000000007fff1000  0000000083f80128  0x3f8f128(66646312)
Large object heap starts at 0x000000087fff1000
         segment             begin         allocated  size
000000087fff0000  000000087fff1000  0000000883fe4190  0x3ff3190(67056016)
0000000927ff0000  0000000927ff1000  000000092bfe2430  0x3ff1430(67048496)
0000000a81c50000  0000000a81c51000  0000000a8221c858  0x5cb858(6076504)
Heap Size:               Size: 0xc53ef40 (206827328) bytes.
------------------------------
...
Heap 31 (0000000019c84130)
generation 0 starts at 0x0000000844fc5170
generation 1 starts at 0x0000000844f851f8
generation 2 starts at 0x000000083fff1000
ephemeral segment allocation context: none
         segment             begin         allocated  size
000000083fff0000  000000083fff1000  0000000845171ca0  0x5180ca0(85462176)
Large object heap starts at 0x00000008fbff1000
         segment             begin         allocated  size
00000008fbff0000  00000008fbff1000  00000008fffe2290  0x3ff1290(67048080)
000000094bff0000  000000094bff1000  000000094ea2ebb8  0x2a3dbb8(44293048)
000000096bff0000  000000096bff1000  000000096dbdec00  0x1bedc00(29285376)
Heap Size:               Size: 0xd79d6e8 (226088680) bytes.
------------------------------
GC Heap Size:            Size: 0x1f1986a88 (8348265096) bytes.
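Putting the two readouts side by side is plain arithmetic. Below is a minimal sketch in JavaScript (the same language as the WinDbg script later in this post, runnable with Node.js) that turns WinDbg's backtick-separated 64-bit hex values into byte counts. The only inputs are the MEM_COMMIT value and the GC heap size copied from the outputs above; `parseWinDbgHex` is a hypothetical helper name. Note that !address reports binary units, so "10.180 GB" means GiB.

```javascript
// Compare committed memory with the GC heap size, using values copied from the outputs above.
function parseWinDbgHex(text) {
  // WinDbg prints 64-bit values as "high`low"; drop the backtick and parse the rest as hex
  return parseInt(text.replace("`", ""), 16);
}

const GiB = 1024 ** 3;
const committed = parseWinDbgHex("2`8b86b000"); // MEM_COMMIT row of !address -summary
const gcHeap = parseWinDbgHex("1f1986a88");     // "GC Heap Size" line of !eeheap -gc

const summary = {
  committedGiB: (committed / GiB).toFixed(3), // same unit WinDbg uses for "10.180 GB"
  gcHeapGiB: (gcHeap / GiB).toFixed(3),
  gcHeapGB: (gcHeap / 1e9).toFixed(2),        // decimal gigabytes
  managedShare: ((gcHeap / committed) * 100).toFixed(1) + "%",
};
console.log(summary);
```

Measured in the same binary unit, the GC heap is about 7.8 GiB, roughly three quarters of the committed memory, which is why the rest of the investigation stays on the managed side.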
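The readout tells us that of the roughly 10 GB of committed memory, the managed heap accounts for 8.3 GB, so this is clearly a managed-side problem. With the direction settled, the next step is to look inside the managed heap. Past experience says the culprit is usually a huge number of instances of some class, so let's run !dumpheap -stat.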
0:000> !dumpheap -stat
Statistics:
              MT    Count    TotalSize Class Name
...
000007fe9ddd5fc0   341280     30032640 System.ServiceModel.Description.MessagePartDescription
000007fe9c4865a0   866349     41584752 System.Xml.XmlDictionaryString
000007fe9defb098   937801     45014448 System.Xml.XmlDictionaryString
000007fe9c66bd28   105052     45086880 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
000007fe9e0f4d20   113299     49050864 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
00000000003c9190    44573    618414438      Free
000007fef8f6c168   428410   1209974642 System.Char[]
000007fef8f4f1b8  2849758   1246912848 System.Object[]
000007fef8f6f058   531963   1670620873 System.Byte[]
000007fef8f6aee0  2368431   2382587716 System.String
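To put a number on how much of the heap the four largest rows take, here is a small sketch that parses `MT Count TotalSize Class Name` rows and divides by the GC heap size from !eeheap -gc. The rows are copied from the output above; `parseStatRow` is a hypothetical helper, and splitting on whitespace assumes the column layout shown here.

```javascript
// Parse "MT Count TotalSize Class Name" rows of !dumpheap -stat and compute their share of the GC heap.
function parseStatRow(row) {
  const [mt, count, totalSize, ...name] = row.trim().split(/\s+/);
  return { mt, count: Number(count), totalSize: Number(totalSize), name: name.join(" ") };
}

const GC_HEAP_SIZE = 8348265096; // "GC Heap Size" from !eeheap -gc

const rows = [
  "000007fef8f6c168   428410   1209974642 System.Char[]",
  "000007fef8f4f1b8  2849758   1246912848 System.Object[]",
  "000007fef8f6f058   531963   1670620873 System.Byte[]",
  "000007fef8f6aee0  2368431   2382587716 System.String",
].map(parseStatRow);

const basicTypesBytes = rows.reduce((sum, r) => sum + r.totalSize, 0);
const share = ((basicTypesBytes / GC_HEAP_SIZE) * 100).toFixed(1) + "%";

for (const r of rows) {
  console.log(r.name.padEnd(14), ((r.totalSize / GC_HEAP_SIZE) * 100).toFixed(1) + "%");
}
console.log("four basic types:", basicTypesBytes, "bytes =", share, "of the GC heap");
```

Together these four types come to roughly 78% of the GC heap, which is why the analysis zooms in on them.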
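As fate would have it, experience missed the mark this time: the biggest consumers are all basic types, namely Byte[], String, Char[] and Object[]. Basic types like these are hard to track down. You either keep narrowing things down with -min and -max, or you write a script that groups and sorts them. My clunky script is below: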
"use strict";
/*
 * Group the managed-heap objects of the given method tables (MT) by object size.
 */
let platform = 64;                  // pointer width in bits: 64 for an x64 dump
let mtlist = ["000007fef8f4f1b8"];  // method tables to analyze (System.Object[] here)
let maxlimit = 100;                 // how many size buckets / addresses to print

function initializeScript() {
    return [new host.apiVersionSupport(1, 7)];
}

function log(str) {
    host.diagnostics.debugLog(str + "\n");
}

function exec(str) {
    log("\n" + str);
    return host.namespace.Debugger.Utility.Control.ExecuteCommand(str);
}

function invokeScript() {
    for (var mt of mtlist) {
        groupby_mtsize_inheap(mt);
    }
}

// Group all objects of one type by their size
function groupby_mtsize_inheap(mt) {
    var size_group = {};
    var commandText = "!dumpheap -mt " + mt;
    var output = exec(commandText);
    for (var line of output) {
        if (line == "" || line.indexOf("Address") > -1) continue;  // skip blank and header lines
        if (line.indexOf("Statistics") > -1) break;                // per-object rows end here
        // Rows are "Address MT Size": skip the two 16-hex-digit columns (x64) to reach Size
        var size = parseInt(line.substring(Math.ceil(platform / 2) + 1).trim());
        if (!size_group[size]) size_group[size] = 0;
        size_group[size]++;
    }
    show_top10_format(mt, size_group);
}

function show_top10_format(mt, size_group) {
    var maparr = [];
    // Convert the size -> count map into an array
    for (var size in size_group) {
        maparr.push({ "size": size, "count": size_group[size], "totalsize": (size * size_group[size]) });
    }
    maparr.sort(function (a, b) { return b.totalsize - a.totalsize; });
    var topTotalSize = 0;
    // Print the buckets, largest total size first
    for (var i = 0; i < Math.min(maparr.length, maxlimit); i++) {
        var size = maparr[i].size;
        var count = maparr[i].count;
        var totalsize = Math.round(maparr[i].totalsize / 1024 / 1024);  // whole MB
        topTotalSize += totalsize;
        log("size=" + size + ",count=" + count + ",totalsize=" + totalsize + "M");
    }
    log("Total:" + topTotalSize + "M");
    // List the addresses of the largest bucket so they can be inspected with !do
    if (maparr.length > 0) {
        var size = maparr[0].size;
        var output = exec("!dumpheap -mt " + mt + " -min 0n" + size + " -max 0n" + size + " -short").Take(maxlimit);
        for (var line of output) {
            log(line);
        }
    }
}
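If you want to test the grouping logic outside the debugger, the same idea can be run on saved text, for example output captured with .logopen. The sketch below is a hypothetical offline variant of `groupby_mtsize_inheap` plus `show_top10_format`: it takes the text of `!dumpheap -mt <MT>` instead of calling the debugger host, and it splits columns on whitespace rather than at a fixed character offset, so it does not depend on the pointer width.

```javascript
// Hypothetical offline variant of the script's grouping: input is saved "!dumpheap -mt <MT>" text.
function groupBySize(dumpheapText, limit = 100) {
  const counts = new Map(); // object size -> number of objects with that size
  for (const raw of dumpheapText.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.includes("Address")) continue; // blank line or column header
    if (line.includes("Statistics")) break;                 // per-object rows end here
    const size = Number(line.split(/\s+/)[2]);              // columns: Address MT Size
    if (!Number.isInteger(size)) continue;                  // skip anything unexpected
    counts.set(size, (counts.get(size) || 0) + 1);
  }
  return [...counts]
    .map(([size, count]) => ({ size, count, totalSize: size * count }))
    .sort((a, b) => b.totalSize - a.totalSize) // biggest buckets first
    .slice(0, limit);
}

// Illustrative input: the first address is from this dump, the other two are placeholders
const sample = [
  "         Address               MT     Size",
  "0000000a61c51000 000007fef8f6aee0 31116490",
  "0000000000000001 000007fef8f6aee0 29285946",
  "0000000000000002 000007fef8f6aee0 29285946",
  "",
  "Statistics:",
  "              MT    Count    TotalSize Class Name",
  "000007fef8f6aee0        3     89688382 System.String",
].join("\n");

console.log(groupBySize(sample));
```

Sorting by `size * count` rather than by count mirrors the original script: a few 30 MB strings matter more than thousands of tiny ones.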
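Next, set mtlist to the method table address of System.String (000007fef8f6aee0) and look at the sorted result. The simplified output is: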
!dumpheap -mt 000007fef8f6aee0
size=29285946,count=2,totalsize=56M
size=29285540,count=2,totalsize=56M
size=29285502,count=2,totalsize=56M
size=29285348,count=2,totalsize=56M
size=27455186,count=2,totalsize=52M
size=31116504,count=1,totalsize=30M
size=31116490,count=1,totalsize=30M
size=31116306,count=1,totalsize=30M
size=31115934,count=1,totalsize=30M
size=31115920,count=1,totalsize=30M
size=31115718,count=1,totalsize=30M
size=29286342,count=1,totalsize=28M
size=29285898,count=1,totalsize=28M
...
Total:1198M
There are quite a few very large strings. So what are they? Let me pick one at random and export it to a txt file to take a look.
0:000> !dumpheap -mt 000007fef8f6aee0 -min 0n31116490 -max 0n31116490 -short
0000000a61c51000
0:000> !do 0000000a61c51000
Name:        System.String
MethodTable: 000007fef8f6aee0
EEClass:     000007fef88d3720
Size:        31116490(0x1daccca) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String:
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef8f6dc90  40000aa        8         System.Int32  1 instance         15558232 m_stringLength
000007fef8f6c1c8  40000ab        c          System.Char  1 instance               50 m_firstChar
000007fef8f6aee0  40000ac       18        System.String  0   shared           static Empty
    >> Domain:Value  00000000003fb620:NotInit  000000001ca30bd0:NotInit  000000001f7b21a0:NotInit  000000001f8940c0:NotInit  0000000027dc46b0:NotInit  00000000281bd720:NotInit  00000000282b7ee0:NotInit  <<
0:000> .writemem D:\dumps\xxxx\string.txt 0000000a61c51000 L?0x1daccca
Writing 1daccca bytes..........
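The Size column is all that !dumpheap and the grouping script give you, but this !do output shows how Size relates to the character count: 31116490 = 2 × 15558232 + 26. The sketch below assumes that fixed 26-byte overhead (observed on this x64 .NET Framework process) and uses it to estimate the length of the other large strings in the grouped list without running !do on each one. `charsFromStringSize` is a hypothetical helper name.

```javascript
// Assumption: on this x64 .NET Framework process, Size = 26 + 2 * m_stringLength
// (31116490 - 2 * 15558232 = 26 in the !do output above).
const STRING_OVERHEAD_X64 = 26;

function charsFromStringSize(sizeInBytes) {
  // UTF-16: every char takes 2 bytes; the remainder is the fixed per-object overhead
  return (sizeInBytes - STRING_OVERHEAD_X64) / 2;
}

console.log(charsFromStringSize(0x1daccca)); // the object exported with .writemem

// Apply the same relation to a few sizes from the grouped list
for (const size of [29285946, 31116504, 27455186]) {
  console.log(size, "bytes ->", charsFromStringSize(size), "chars");
}
```

So each of these strings holds roughly 14-16 million characters, and because .NET strings are UTF-16, every character costs two bytes.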
[Figure: contents of the exported string.txt]
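Judging by the content, this is simply a PDF encoded as Base64. Investigating char[] and byte[] the same way shows that most of those are PDFs too. My guess is that while processing PDFs the program converts back and forth between byte[], char[] and string, so in theory most of these objects are unrooted. In fact, !heapstat -iu confirms that roughly 5.5 GB of unrooted objects are waiting for the GC to collect them.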
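Why do PDFs hurt this much? Base64 turns every 3 bytes into 4 characters, and a .NET string stores each character in 2 bytes, so the text form alone is about 8/3 the size of the file. The sketch below models one PDF under a hypothetical pipeline that keeps the raw byte[], its Base64 string and one char[] copy alive at the same time. The 26-byte string overhead is the value observed in this dump, array headers are ignored, and the helper names are illustrative.

```javascript
// Rough per-PDF footprint: byte[] (raw file) + Base64 string + char[] copy of that string.
const STRING_OVERHEAD_X64 = 26; // observed in this dump's !do output

function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3); // every 3-byte group becomes 4 Base64 chars
}

function maxDecodedBytes(base64Chars) {
  return Math.floor(base64Chars / 4) * 3; // upper bound: '=' padding removes up to 2 bytes
}

function footprint(pdfBytes) {
  const chars = base64Length(pdfBytes);
  const byteArray = pdfBytes;                           // payload only, array header ignored
  const base64String = STRING_OVERHEAD_X64 + 2 * chars; // UTF-16 string object
  const charArray = 2 * chars;                          // payload only, array header ignored
  const total = byteArray + base64String + charArray;
  return { chars, total, ratio: total / pdfBytes };
}

const pdfBytes = maxDecodedBytes(15558232); // m_stringLength of the exported string
const f = footprint(pdfBytes);
console.log(pdfBytes, f.chars, f.total, f.ratio.toFixed(2));
```

Under these assumptions an 11 MB PDF costs about 70 MB of heap, a bit over 6x its size, and every extra copy made during conversion adds more. That is consistent with a burst of PDFs pushing a 5 GB process past 10 GB.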
0:000> !heapstat -iu
Heap             Gen0         Gen1         Gen2          LOH
Heap0        17625808      1274680     47745824    140181016
...
Total       357486256     28100616   2229673376   5733004848

Free space:                                                 Percentage
Heap0   39622402411211224298616                             SOH: 22% LOH: 0%
Heap1   56258561449857168302152                             SOH: 27% LOH: 0%
...
Heap31  14485762419957312218024                             SOH: 25% LOH: 0%
Total   18149278411364318258565183128

Unrooted objects:                                           Percentage
Heap0   1216392824358442872137153536                        SOH: 18% LOH: 97%
...
Heap31  2368322392721435840139770656                        SOH:  2% LOH: 99%
Total   1649549527948448290664805530423784
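The first !heapstat row can be cross-checked against the addresses that !eeheap -gc printed for heap 0: gen2 spans from the gen2 start to the gen1 start, gen1 from the gen1 start to the gen0 start, gen0 from the gen0 start to the segment's allocated pointer, and the LOH figure is the sum of (allocated - begin) over the LOH segments. This sketch assumes heap 0's small object heap is a single segment, which is what the output shows; with several SOH segments gen2 would span more than one range.

```javascript
// Heap 0 addresses copied from !eeheap -gc (single SOH segment assumed).
const heap0 = {
  gen0Start: 0x82eb0e58,
  gen1Start: 0x82d79b20,
  gen2Start: 0x7fff1000,
  ephemeralAllocated: 0x83f80128, // "allocated" of the SOH segment
  lohSegments: [
    { begin: 0x87fff1000, allocated: 0x883fe4190 },
    { begin: 0x927ff1000, allocated: 0x92bfe2430 },
    { begin: 0xa81c51000, allocated: 0xa8221c858 },
  ],
};

function generationSizes(h) {
  return {
    gen0: h.ephemeralAllocated - h.gen0Start,
    gen1: h.gen0Start - h.gen1Start,
    gen2: h.gen1Start - h.gen2Start,
    loh: h.lohSegments.reduce((sum, s) => sum + (s.allocated - s.begin), 0),
  };
}

console.log(generationSizes(heap0));
```

The four values add up to heap 0's "Heap Size" of 206827328 bytes, so the !heapstat columns are just these address ranges measured per generation.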
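III: Summary

This periodic memory spike was mainly caused by the program receiving too many PDF files from upstream. These are all large objects, and the program also converts them between char[], string and byte[], which drives memory usage very high in a short time. My personal suggestions:
- For the large volume of PDFs, consider a third-party OSS (object storage) product to avoid some unnecessary memory usage.
- Consider whether the cleansing service could apply rate limiting, or spread the load across service instances.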
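One way to read the rate-limiting suggestion is to limit by memory rather than by request count. The sketch below is a hypothetical admission controller: it accepts a PDF only while the estimated memory of in-flight documents stays under a budget. The multiplier is an assumption: Base64 text held as UTF-16 is about 8/3 the file size, so one byte[] plus one string plus one char[] copy is roughly 1 + 8/3 + 8/3 ≈ 6.3 times the file. The 1 GiB budget is purely illustrative.

```javascript
// Hypothetical memory-budget admission control for PDF processing.
class MemoryBudget {
  constructor(budgetBytes, multiplier) {
    this.budgetBytes = budgetBytes;
    this.multiplier = multiplier; // assumed peak heap bytes per byte of PDF
    this.inFlightBytes = 0;
  }

  estimate(pdfBytes) {
    return Math.ceil(pdfBytes * this.multiplier);
  }

  tryAcquire(pdfBytes) {
    const cost = this.estimate(pdfBytes);
    if (this.inFlightBytes + cost > this.budgetBytes) return false; // caller queues or rejects
    this.inFlightBytes += cost;
    return true;
  }

  release(pdfBytes) {
    this.inFlightBytes = Math.max(0, this.inFlightBytes - this.estimate(pdfBytes));
  }
}

const MB = 1024 * 1024;
const budget = new MemoryBudget(1024 * MB, 6.33); // illustrative 1 GiB budget
```

A caller would `tryAcquire(file.length)` before decoding and `release(file.length)` when done. Budgeting by bytes rather than by request count means a single huge PDF cannot slip through just because few requests are in flight. In the .NET service itself, a count-based cap such as SemaphoreSlim is the simpler alternative.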
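In the end, the problem was solved with filtering plus some business-process optimizations. I'm sure many readers have run into this kind of problem in practice; feel free to leave a comment with your own solutions.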