Android Stability - gdb and coredump
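When analyzing Android native errors, a coredump of the faulting process makes the job far easier. Capturing a coredump, however, costs a lot of memory and CPU, and the resulting files are large, so it is disabled by default on the builds that ship to users. Even during internal development it is enabled only for specific test items, such as monkey tests aimed at system stability. As a result, the hard part of a stability problem is often not the analysis itself but getting useful logs in the first place. Once you have a coredump together with the symbols for the matching firmware build, you can analyze the problem with GDB, a powerful debugging tool.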
Coredump files
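A coredump can be thought of as a snapshot of a process's memory and registers at a given moment, wrapped in an ELF file so that tools such as GDB can analyze it. The kernel supports coredumps by default, but on Android several other factors decide whether a coredump is actually produced.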
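To make "a snapshot wrapped in an ELF file" concrete: a core file begins with an ordinary ELF header whose e_type field is ET_CORE (4). The sketch below checks only those header bytes; the actual snapshot lives in the program headers that follow (PT_NOTE segments for register state, PT_LOAD segments for memory), which it does not parse. IsElfCoreHeader is a hypothetical helper written for illustration, not part of any Android or GDB tool.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if `buf` starts with an ELF header whose
// e_type is ET_CORE. The offsets used here are the same for 32-bit and 64-bit ELF.
bool IsElfCoreHeader(const unsigned char* buf, std::size_t len) {
  constexpr std::size_t kEiData = 5;        // e_ident[EI_DATA]: byte order of the file
  constexpr std::size_t kETypeOffset = 16;  // e_type follows the 16-byte e_ident array
  constexpr std::uint16_t kEtCore = 4;      // ET_CORE

  if (buf == nullptr || len < kETypeOffset + 2) return false;
  // e_ident[0..3] must hold the ELF magic: 0x7f 'E' 'L' 'F'.
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') return false;

  std::uint16_t e_type = 0;
  if (buf[kEiData] == 1) {         // ELFDATA2LSB: little-endian, as on arm64 Android
    e_type = static_cast<std::uint16_t>(buf[kETypeOffset] | (buf[kETypeOffset + 1] << 8));
  } else if (buf[kEiData] == 2) {  // ELFDATA2MSB: big-endian
    e_type = static_cast<std::uint16_t>((buf[kETypeOffset] << 8) | buf[kETypeOffset + 1]);
  } else {
    return false;                  // ELFDATANONE or an invalid value
  }
  return e_type == kEtCore;
}
```

Fed the first bytes of a core file such as core-system_server-3060 from the example later on, it should return true, while the app_process64 binary passed to gdb alongside it should not.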
In Linux, the resources each process can use are limited, and the limits can be inspected through /proc/$PID/limits, for example:
[Figure: process rlimit]
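This file shows that the process may open at most 1024 files and that its core file size limit is 0. So even if this process receives one of the relevant signals, it cannot produce a coredump, which is why the process's rlimit usually has to be raised first.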
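Inside a process, the core size limit can be raised with setrlimit(RLIMIT_CORE, ...). A minimal sketch, assuming it is enough to lift the soft limit up to the existing hard limit, which an unprivileged process is always allowed to do; raising the hard limit itself needs CAP_SYS_RESOURCE. EnableCoreDumps and CoreLimits are hypothetical names. Children forked afterwards inherit the new limit, and prlimit(2) can change the limits of another process by PID.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise this process's soft RLIMIT_CORE ("Max core file size"
// in /proc/$PID/limits) up to its hard limit. Returns false if a syscall fails.
bool EnableCoreDumps() {
  rlimit lim{};
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
  lim.rlim_cur = lim.rlim_max;  // the soft limit may be raised up to the hard limit
  return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Returns the current soft (rlim_cur) and hard (rlim_max) core size limits in bytes;
// RLIM_INFINITY corresponds to "unlimited" in /proc/$PID/limits.
rlimit CoreLimits() {
  rlimit lim{};
  getrlimit(RLIMIT_CORE, &lim);
  return lim;
}
```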
/proc/sys/kernel/core_pattern sets where coredump files are saved (%e expands to the executable name, %p to the PID), for example: echo "/data/corefile/core-%e-%p" > /proc/sys/kernel/core_pattern
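To see which file name a pattern like the one above produces, here is a small sketch that expands only the two specifiers it uses, %e and %p, plus %% for a literal percent sign. The kernel supports more specifiers (see core(5)) and a "|program" pipe form; this hypothetical ExpandCorePattern copies other specifiers through unchanged and does not model pipes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expand a core_pattern template for one crashing process.
// Only %e (executable name), %p (PID) and %% (literal '%') are handled.
std::string ExpandCorePattern(const std::string& pattern, const std::string& exe_name, int pid) {
  std::string out;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    const char c = pattern[i];
    if (c != '%' || i + 1 == pattern.size()) {  // ordinary character or a trailing '%'
      out += c;
      continue;
    }
    const char spec = pattern[++i];
    switch (spec) {
      case 'e': out += exe_name; break;
      case 'p': out += std::to_string(pid); break;
      case '%': out += '%'; break;
      default:  out += '%'; out += spec; break;  // specifier not modeled: keep as-is
    }
  }
  return out;
}
```

With the pattern above, system_server with PID 3060 maps to /data/corefile/core-system_server-3060, the same core file name that appears in the example later in this article. Note that the kernel takes %e from the process's comm name, which is truncated to 15 characters.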
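You may also need to run echo 1 > /proc/sys/fs/suid_dumpable, so that processes that have changed their credentials (such as processes forked from zygote) can still dump core.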
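A process writes a coredump only when it receives certain signals, such as SIGSEGV, SIGABRT or SIGBUS. Also, while a process's coredump is being written, do not send SIGKILL to it: SIGKILL aborts the dump, leaving an incomplete coredump that cannot be analyzed.
GDB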
- GDB online debugging environment
For more details, see "Setting up an Android GDB online debugging environment".
- adb shell gdbserver remote:1234 --attach 4321
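1234 is the port on the phone, and 4321 is the PID of the process you want to debug.
- adb forward tcp:1234 tcp:1234
Sets up adb TCP port forwarding: the first tcp:1234 is the port on the PC, the second is the port on the target, i.e. the phone.
- aarch64-linux-android-gdb
aarch64-linux-android-gdb is the gdb client for ARM64; for a process running as AArch32, pick the corresponding gdb client instead.
- Run the following commands at the GDB prompt: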
(gdb) set solib-absolute-prefix out/target/product/general/symbols/
(gdb) set solib-search-path out/target/product/general/symbols/
(gdb) target remote :1234
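- Offline debugging with GDB + Eclipse
1. Open ADT, click Run → Debug Configurations, and select C/C++ Postmortem Debugger.
2. Click the "+" in the upper-left corner to create a new configuration and give it any name, for example "android_gdb". For C/C++ Application, select the executable the coredump belongs to: for SurfaceFlinger, choose /symbols/system/bin/surfaceflinger, but for processes forked from zygote choose /symbols/system/bin/app_process64. Set Post Mortem file type to Core file, then click Browse to locate the coredump file.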
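3. Switch to the Debugger tab. For GDB debugger, select the gdb executable for your platform; GDB command file is a file of gdb commands to run after the coredump is opened. My gdbinit contains: set solib-search-path /media/xxxx/SSD/tmp/Log/0622/symbols/system/lib64, which sets GDB's library search path so that GDB can load the .so libraries that carry symbol information.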
4. Click the Debug button, and the full debug view opens.
- GDB scripts
A custom GDB command is defined with the following syntax:
define commandName
  statement
  ......
end
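Here statement can be any valid GDB command. A user-defined command accepts up to 10 arguments, $arg0, $arg1 … $arg9, and $argc holds the number of arguments actually passed. Scripts also support if/else conditionals and while loops. You can type a script directly at the gdb prompt, or put it in a separate file and load it with the source command.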
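- Example: analyzing a coredump with GDB
During a monkey test, one device froze on screen. Log analysis suggested that the ART virtual machine in system_server had timed out in SuspendAll while dumping traces or running GC. We had hit this before and analyzed it with a coredump, so this time we again sent kill -11 (SIGSEGV) to system_server directly and captured a coredump. Besides the coredump, the analysis also needs the symbols files for this firmware build.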
[Linux@Linux w]$ ls
core-system_server-3060  symbols  symbols.zip
[Linux@Linux w]$ aarch64-linux-android-gdb ./symbols/system/bin/app_process64 ./core-system_server-3060
GNU gdb (GDB) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=aarch64-elf-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./symbols/system/bin/app_process64...done.
[New LWP 3060]
[New LWP 3065]
[New LWP 3173]
[New LWP 3067]
[New LWP 3066]
......
......
[New LWP 3954]
[New LWP 3086]
[New LWP 3128]
warning: Could not load shared library symbols for 194 libraries, e.g. /system/bin/linker64.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000007962221cac in ?? ()
(gdb) set solib-search-path ./symbols/system/lib64/
Reading symbols from /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so
Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so...done.
Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so
......
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=, uaddr2=) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=, self=) at art/runtime/base/mutex.cc:848
#3  0x000000795f3272e8 in TransitionFromSuspendedToRunnable (this=) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (self=, new_thread_state=art::kRunnable, this=) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=, env=) at art/runtime/scoped_thread_state_change.h:224
#6  ScopedObjectAccess (this=, env=) at art/runtime/scoped_thread_state_change.h:255
#7  art::JNI::NewStringUTF (env=, utf=) at art/runtime/jni_internal.cc:1646
#8  0x0000007961acce64 in NewStringUTF (bytes=, this=0x795f63e180) at libnativehelper/include/nativehelper/jni.h:842
#9  android::android_content_AssetManager_getArrayStringResource (env=0x795f63e180, clazz=, arrayResId=) at frameworks/base/core/jni/android_util_AssetManager.cpp:1977
#10 0x00000000748f498c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
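Since the VM hung during suspend-all, the code tells us that a hang here usually means some thread did not respond to the suspend flag in time, and a thread that does not respond is usually in the kRunnable state. Note that this is ART's thread state, not the Linux R state; the two are different. So the plan is to find, from the coredump, which thread is still in kRunnable. The art::Thread objects for all Java threads live in ThreadList's list_ field, so printing list_ reveals which Java thread is in kRunnable.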
// The actual list of all threads.
std::list<Thread*> list_ GUARDED_BY(Locks::thread_list_lock_);
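The reasoning above can be captured in a much simplified model: suspend-all places a suspend request on every other thread, threads in any state other than kRunnable already count as suspended, and kRunnable threads must notice the request themselves at their next suspend check. ThreadSnapshot and ThreadsBlockingSuspendAll are hypothetical names, not ART APIs; real ART additionally tracks suspend counts and suspend barriers. The state values follow ART's ThreadState enum, which starts at kTerminated = 66.

```cpp
#include <list>
#include <vector>

// A few ART ThreadState values (numbering from ART's ThreadState enum).
enum ThreadState : unsigned short {
  kRunnable = 67,
  kWaiting = 71,
  kNative = 89,
};

// Hypothetical snapshot of the two art::Thread fields this model needs.
struct ThreadSnapshot {
  int tid;
  ThreadState state;
};

// Returns the tids the requesting thread would still be waiting for:
// every other thread that is kRunnable and has not yet reached a suspend check.
std::vector<int> ThreadsBlockingSuspendAll(const std::list<ThreadSnapshot>& threads,
                                           int requester_tid) {
  std::vector<int> pending;
  for (const ThreadSnapshot& t : threads) {
    if (t.tid == requester_tid) continue;  // the requester does not wait for itself
    if (t.state == kRunnable) pending.push_back(t.tid);
  }
  return pending;
}
```

The coredump walk below recovers exactly these two fields, tid and state, for every thread in list_.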
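To print list_, we first need the ThreadList object. It can be derived from the Runtime global, or taken from the stack context of a thread that has the ThreadList object in one of its frames. We use the second approach: ThreadList::SuspendAllInternal takes a this parameter, which points straight at the ThreadList object. In this scenario the likely callers are the SignalCatcher thread, which dumps traces, and the HeapTaskDaemon thread, which runs GC tasks. So first find the thread IDs of these two threads from the live device or the logs, then look at their current stacks in gdb. Here they are 3065 and 3070.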
(gdb) info threads
  Id   Target Id         Frame
  180  LWP 3128          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  179  LWP 3086          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  178  LWP 3954          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  177  LWP 3218          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
......
  127  LWP 3167          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
---Type to continue, or q to quit---
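Because GDB renumbers the threads, we need the GDB thread numbers for LWP 3065 and 3070. The listing is also long enough that GDB pauses it at a continue/quit prompt, as seen above, so turn pagination off and list the threads again: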
(gdb) set pagination off
(gdb) info threads
......
  9    LWP 3070          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
......
  2    LWP 3065          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
(gdb) t 2
[Switching to thread 2 (LWP 3065)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41      bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=, uaddr2=) at art/runtime/base/mutex-inl.h:45
#2  art::ConditionVariable::WaitHoldingLocks (this=, self=) at art/runtime/base/mutex.cc:848
#3  0x000000795f120234 in TransitionFromSuspendedToRunnable (this=) at art/runtime/thread-inl.h:209
#4  ScopedThreadStateChange (new_thread_state=art::kRunnable, this=, self=) at art/runtime/scoped_thread_state_change.h:51
#5  ScopedObjectAccessUnchecked (this=, self=) at art/runtime/scoped_thread_state_change.h:231
#6  ScopedObjectAccess (self=, this=) at art/runtime/scoped_thread_state_change.h:261
#7  art::ClassLinker::DumpForSigQuit (this=, os=...) at art/runtime/class_linker.cc:7752
#8  0x000000795f415950 in art::Runtime::DumpForSigQuit (this=0x795f6ec000, os=...) at art/runtime/runtime.cc:1401
#9  0x000000795f41c27c in art::SignalCatcher::HandleSigQuit (this=) at art/runtime/signal_catcher.cc:145
#10 0x000000795f41ad3c in art::SignalCatcher::Run (arg=) at art/runtime/signal_catcher.cc:214
#11 0x000000796226e0f0 in __pthread_start (arg=) at bionic/libc/bionic/pthread_create.cpp:198
#12 0x0000007962223944 in __start_thread (fn=0x62, arg=0x795f6fa910) at bionic/libc/bionic/clone.cpp:41
#13 0x0000000000000000 in ?? ()
(gdb) t 9
[Switching to thread 9 (LWP 3070)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41      in bionic/libc/arch-arm64/bionic/syscall.S
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000795f43bb2c in futex (val3=0, uaddr=, op=, val=, timeout=, uaddr2=) at art/runtime/base/mutex-inl.h:45
#2  art::ThreadList::SuspendAllInternal (this=, self=, ignore1=, ignore2=, debug_suspend=) at art/runtime/thread_list.cc:586
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=) at art/runtime/thread_list.cc:476
#4  0x000000795f1c6d4c in art::gc::collector::MarkSweep::RunPhases (this=) at art/runtime/gc/collector/mark_sweep.cc:153
#5  0x000000795f1bf490 in art::gc::collector::GarbageCollector::Run (this=0x795f687500, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=false) at art/runtime/gc/collector/garbage_collector.cc:87
#6  0x000000795f1ef0a4 in art::gc::Heap::CollectGarbageInternal (this=, gc_type=, gc_cause=, clear_soft_references=) at art/runtime/gc/heap.cc:2719
#7  0x000000795f1f65dc in art::gc::Heap::ConcurrentGC (this=0x795f64b700, self=, force_full=true) at art/runtime/gc/heap.cc:3722
#8  0x000000795f1fd668 in art::gc::Heap::ConcurrentGCTask::Run (this=, self=0x0) at art/runtime/gc/heap.cc:3685
#9  0x000000795f21f2c4 in art::gc::TaskProcessor::RunAllTasks (this=, self=) at art/runtime/gc/task_processor.cc:124
#10 0x0000000072739114 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
From the output above, the ThreadList object is at 0x795f6fb000, and through it we can find list_, which holds all the Thread objects:
#3 0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=) at art/runtime/thread_list.cc:476.
(gdb) set print pretty on
(gdb) f 3
#3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=) at art/runtime/thread_list.cc:476
476 in art/runtime/thread_list.cc
(gdb) p *this
$2 = {
static kMaxThreadId = 65535,
static kInvalidThreadId = 0,
static kMainThreadId = 1,
allocated_ids_ = {
> = {
static __bits_per_word = 64,
__first_ = {18446744073709551615, 18446744073709551615, 38654705663, 0 }
},
members of std::__1::bitset<65535>:
static __n_words = 1024
},
list_ = {
> = {
__end_ = {
__prev_ = 0x792df94ee0,
__next_ = 0x795f6fa9a0
},
__size_alloc_ = {
, 2>> = {
> = {},
members of std::__1::__libcpp_compressed_pair_imp:
__first_ = 161
}, }
}, },
suspend_all_count_ = 1,
debug_suspend_all_count_ = 0,
......
}
Because list_ is long, first define a custom GDB command that prints every Thread object in it automatically:
(gdb) def dump_all_threads_state
Type commands for definition of "dump_all_threads_state".
End with a line saying just "end".
>set $current = list_.__end_.__next_
>while $current != 0
>p * $current.__value_
>set $current = $current.__next_
>end
>end
(gdb) dump_all_threads_state
$3 = {
static kStackOverflowImplicitCheckSize = 8192,
static kMaxCheckpoints = 3,
static kMaxSuspendBarriers = 3,
static is_started_ = true,
static pthread_key_self_ = -2147483634,
static resume_cond_ = 0x795f6fa900,
static is_sensitive_thread_hook_ = 0x7961a76f20 ,
static jit_sensitive_thread_ = 0x0,
tls32_ = {
state_and_flags = {
as_struct = {
flags = 1,
state = 89
},
as_atomic_int = {
> = {
> = {
> = {
__a_ = 5832705
}, }, }, },
as_int = 5832705
},
suspend_count = 1,
debug_suspend_count = 0,
thin_lock_thread_id = 1,
tid = 3060,
daemon = 0,
throwing_OutOfMemoryError = 0,
no_thread_suspension = 0,
thread_exit_check_count = 0,
handling_signal_ = 0,
suspended_at_suspend_check = 0,
ready_for_debug_invoke = 0,
debug_method_entry_ = 0,
is_gc_marking = 0,
weak_ref_access_enabled = 1,
disable_thread_flip_count = 0
},
tls64_ = {
trace_clock_base = 0,
stats = {
allocated_objects = 0,
allocated_bytes = 0,
freed_objects = 0,
freed_bytes = 0,
gc_for_alloc_count = 0,
class_init_count = 2682,
class_init_time_ns = 1043034049
}
},
tlsPtr_ = {
card_table = 0x795ad01070 "",
exception = 0x0,
stack_end = 0x7ffbc34000 "",
managed_stack = {
top_quick_frame_ = 0x7ffc42d940,
link_ = 0x7ffc42e050,
top_shadow_frame_ = 0x0
},
suspend_trigger = 0x0,
jni_env = 0x795f63e180,
tmp_jni_env = 0x0,
self = 0x0,
opeer = 0x762523e8,
jpeer = 0x0,
stack_begin = 0x7ffbc32000 "",
stack_size = 8388608,
stack_trace_sample = 0x0,
wait_next = 0x0,
monitor_enter_object = 0x0,
top_handle_scope = 0x7ffc42d948,
class_loader_override = 0x10070a,
long_jump_context = 0x795f687c80,
instrumentation_stack = 0x795f716e90,
debug_invoke_req = 0x0,
single_step_control = 0x0,
stacked_shadow_frame_record = 0x0,
deoptimization_context_stack = 0x0,
frame_id_to_shadow_frame = 0x0,
name = 0x795f6fa980,
pthread_self = 521358133912,
last_no_thread_suspension_cause = 0x0,
checkpoint_functions = {0x0, 0x0, 0x0},
active_suspend_barriers = {0x0, 0x0, 0x0},
jni_entrypoints = {
pDlsymLookup = 0x795f0b04d0
},
quick_entrypoints = {
pAllocArray = 0x795f0b4420 ,
pAllocArrayResolved = 0x795f0b44e0 ,
pAllocArrayWithAccessCheck = 0x795f0b45a0 ,
pAllocObject = 0x795f0b9cc0 ,
pAllocObjectResolved = 0x795f0b41e0 ,
pAllocObjectInitialized = 0x795f0b42a0 ,
pAllocObjectWithAccessCheck = 0x795f0b4360 ,
pCheckAndAllocArray = 0x795f0b4660 ,
pCheckAndAllocArrayWithAccessCheck = 0x795f0b4720 ,
pAllocStringFromBytes = 0x795f0b47e0 ,
pAllocStringFromChars = 0x795f0b48f0 ,
pAllocStringFromString = 0x795f0b49b0 ,
pInstanceofNonTrivial = 0x795f516374 ,
pCheckCast = 0x795f0b17f0 ,
pInitializeStaticStorage = 0x795f0b1a40 ,
pInitializeTypeAndVerifyAccess = 0x795f0b1bc0 ,
pInitializeType = 0x795f0b1b00 ,
pResolveString = 0x795f0b2e80 ,
pSet8Instance = 0x795f0b2a00 ,
pSet8Static = 0x795f0b2700 ,
pSet16Instance = 0x795f0b2ac0 ,
pSet16Static = 0x795f0b27c0 ,
pSet32Instance = 0x795f0b2b80 ,
pSet32Static = 0x795f0b2880 ,
pSet64Instance = 0x795f0b2c40 ,
pSet64Static = 0x795f0b2dc0 ,
pSetObjInstance = 0x795f0b2d00 ,
pSetObjStatic = 0x795f0b2940 ,
pGetByteInstance = 0x795f0b2280 ,
pGetBooleanInstance = 0x795f0b21c0 ,
pGetByteStatic = 0x795f0b1d40 ,
pGetBooleanStatic = 0x795f0b1c80 ,
pGetShortInstance = 0x795f0b2400 ,
pGetCharInstance = 0x795f0b2340 ,
pGetShortStatic = 0x795f0b1ec0 ,
pGetCharStatic = 0x795f0b1e00 ,
pGet32Instance = 0x795f0b24c0 ,
pGet32Static = 0x795f0b1f80 ,
pGet64Instance = 0x795f0b2580 ,
pGet64Static = 0x795f0b2040 ,
pGetObjInstance = 0x795f0b2640 ,
pGetObjStatic = 0x795f0b2100 ,
pAputObjectWithNullAndBoundCheck = 0x795f0b1870 ,
pAputObjectWithBoundCheck = 0x795f0b1880 ,
pAputObject = 0x795f0b18a0 ,
pHandleFillArrayData = 0x795f0b1980 ,
pJniMethodStart = 0x795f523c2c ,
pJniMethodStartSynchronized = 0x795f523db0 ,
pJniMethodEnd = 0x795f523dec ,
pJniMethodEndSynchronized = 0x795f5240d4 ,
pJniMethodEndWithReference = 0x795f5242f4 ,
pJniMethodEndWithReferenceSynchronized = 0x795f5243a4 ,
pQuickGenericJniTrampoline = 0x795f0ba500 ,
pLockObject = 0x795f0b1430 ,
pUnlockObject = 0x795f0b1610 ,
pCmpgDouble = 0x0,
pCmpgFloat = 0x0,
pCmplDouble = 0x0,
pCmplFloat = 0x0,
pCos = 0x7961075168 ,
pSin = 0x7961079e78 ,
pAcos = 0x796106b978 ,
pAsin = 0x796106c128 ,
pAtan = 0x7961074400 ,
pAtan2 = 0x796106c55c ,
pCbrt = 0x7961074844 ,
pCosh = 0x796106cbb4 ,
pExp = 0x796106cd98 ,
pExpm1 = 0x7961077cdc ,
pHypot = 0x796106d688 ,
pLog = 0x7961071960 ,
pLog10 = 0x79610712c0 ,
pNextAfter = 0x79610792c8 ,
pSinh = 0x79610730f0 ,
pTan = 0x796107a72c ,
pTanh = 0x796107aefc ,
pFmod = 0x796106d204 ,
pL2d = 0x0,
pFmodf = 0x796106d4f4 ,
pL2f = 0x0,
pD2iz = 0x0,
pF2iz = 0x0,
pIdivmod = 0x0,
pD2l = 0x0,
pF2l = 0x0,
pLdiv = 0x0,
pLmod = 0x0,
pLmul = 0x0,
pShlLong = 0x0,
pShrLong = 0x0,
pUshrLong = 0x0,
pIndexOf = 0x795f0ba930 ,
pStringCompareTo = 0x795f0baa00 ,
pMemcpy = 0x79622208c8 ,
pQuickImtConflictTrampoline = 0x795f0ba290 ,
pQuickResolutionTrampoline = 0x795f0ba3c0 ,
pQuickToInterpreterBridge = 0x795f0ba650 ,
pInvokeDirectTrampolineWithAccessCheck = 0x795f0b0a70 ,
pInvokeInterfaceTrampolineWithAccessCheck = 0x795f0b0870 ,
pInvokeStaticTrampolineWithAccessCheck = 0x795f0b0970 ,
pInvokeSuperTrampolineWithAccessCheck = 0x795f0b0b70 ,
pInvokeVirtualTrampolineWithAccessCheck = 0x795f0b0c70 ,
pTestSuspend = 0x795f0ba090 ,
pDeliverException = 0x795f0b0660 ,
pThrowArrayBounds = 0x795f0b0760 ,
pThrowDivZero = 0x795f0b0710 ,
pThrowNoSuchMethod = 0x795f0b0810 ,
pThrowNullPointer = 0x795f0b06c0 ,
pThrowStackOverflow = 0x795f0b07c0 ,
pDeoptimize = 0x795f0ba8d0 ,
pA64Load = 0x795f4253b8 ,
pA64Store = 0x795f4253b8 ,
pNewEmptyString = 0x70e8c810,
pNewStringFromBytes_B = 0x70e8c848,
pNewStringFromBytes_BI = 0x70e8c880,
pNewStringFromBytes_BII = 0x70e8c8b8,
pNewStringFromBytes_BIII = 0x70e8c8f0,
pNewStringFromBytes_BIIString = 0x70e8c928,
pNewStringFromBytes_BString = 0x70e8c998,
pNewStringFromBytes_BIICharset = 0x70e8c960,
pNewStringFromBytes_BCharset = 0x70e8c9d0,
pNewStringFromChars_C = 0x70e8ca40,
pNewStringFromChars_CII = 0x70e8ca78,
pNewStringFromChars_IIC = 0x70e8ca08,
pNewStringFromCodePoints = 0x70e8cab0,
pNewStringFromString = 0x70e8cae8,
pNewStringFromStringBuffer = 0x70e8cb20,
pNewStringFromStringBuilder = 0x70e8cb58,
pReadBarrierJni = 0x795f523c28 *, art::Thread*)>,
pReadBarrierMark = 0x795f5235c4 ,
pReadBarrierSlow = 0x795f5236e8 ,
pReadBarrierForRootSlow = 0x795f5236f0 *)>
},
thread_local_objects = 0,
thread_local_start = 0x0,
thread_local_pos = 0x0,
thread_local_end = 0x0,
mterp_current_ibase = 0x795f0a0280 ,
mterp_default_ibase = 0x795f0a0280 ,
mterp_alt_ibase = 0x795f0a8280 ,
rosalloc_runs = {0x795f5fcf08 , 0x13754000, 0x149d7000, 0x14e1f000, 0x14595000, 0x1457c000, 0x13e65000, 0x14186000, 0x14828000, 0x12f71000, 0x13f18000, 0x795f5fcf08 , 0x795f5fcf08 , 0x13c29000, 0x795f5fcf08 , 0x795f5fcf08 },
thread_local_alloc_stack_top = 0x795a52b8b8,
thread_local_alloc_stack_end = 0x795a52ba00,
held_mutexes = {0x0 },
nested_signal_state = 0x795f67f300,
flip_function = 0x0,
method_verifier = 0x0,
thread_local_mark_stack = 0x0
},
wait_mutex_ = 0x795f719080,
wait_cond_ = 0x795f6fa960,
wait_monitor_ = 0x0,
interrupted_ = false,
debug_disallow_read_barrier_ = 0 '\000'
}
...... // the output for the remaining N Thread objects is omitted here
Cannot access memory at address 0xa1
(gdb)
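The walk ends with "Cannot access memory at address 0xa1" instead of stopping at a null pointer, and the std::list layout explains why. libc++'s std::list (the std::__1 types in the output above) is circular: the sentinel list_.__end_ lives inside the list object, its __next_ is the first node, and the last node's __next_ points back to the sentinel, so the condition $current != 0 never becomes false. When the loop reaches the sentinel it treats it as a node, and the slot where __value_ would sit holds list_'s element count, 161 (0xa1), which gdb then tries to dereference. A condition that stops at the sentinel, such as while $current != &list_.__end_, should end the walk cleanly (not re-run against this core). The sketch below models that layout with hypothetical types; it illustrates the idea and is not libc++'s implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a libc++-style list: circular, with an embedded sentinel.
struct NodeBase {
  NodeBase* prev;
  NodeBase* next;
};

struct Node : NodeBase {
  const void* value = nullptr;  // plays the role of the Thread* payload
};

struct CircularList {
  NodeBase end;          // sentinel, like list_.__end_: end.next = first, end.prev = last
  std::size_t size = 0;  // element count, stored right after the sentinel

  CircularList() : end{&end, &end} {}
  CircularList(const CircularList&) = delete;  // nodes point at `end`, so no copies
  CircularList& operator=(const CircularList&) = delete;

  void PushBack(Node* n) {
    n->prev = end.prev;
    n->next = &end;
    end.prev->next = n;
    end.prev = n;
    ++size;
  }
};

// Correct walk: start at end.next and stop when the walk wraps back to the sentinel.
std::size_t CountByWalking(const CircularList& list) {
  std::size_t count = 0;
  for (const NodeBase* cur = list.end.next; cur != &list.end; cur = cur->next) ++count;
  return count;
}

// True if any link in the ring is null. For a well-formed list this is always false,
// which is why a loop that waits for a null pointer never terminates on its own.
bool HasNullLink(const CircularList& list) {
  const NodeBase* cur = &list.end;
  do {
    if (cur->next == nullptr) return true;
    cur = cur->next;
  } while (cur != &list.end);
  return false;
}

struct WalkResult {
  std::size_t walked;
  std::size_t size;
  bool null_link;
};

// Builds a list of n nodes and reports what the walks see.
WalkResult BuildAndWalk(std::size_t n) {
  std::vector<Node> nodes(n);  // sized once, so element addresses stay stable
  CircularList list;
  for (Node& node : nodes) list.PushBack(&node);
  return {CountByWalking(list), list.size, HasNullLink(list)};
}
```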
Among the N Thread objects printed above, the one in kRunnable is easy to spot: tid 3093, whose state = 67, which is kRunnable in ART's ThreadState enum:
enum ThreadState {
  //                                   Thread.State   JDWP state
  kTerminated = 66,                 // TERMINATED     TS_ZOMBIE    Thread.run has returned, but Thread* still around
  kRunnable,                        // RUNNABLE       TS_RUNNING   runnable
  kTimedWaiting,                    // TIMED_WAITING  TS_WAIT      in Object.wait() with a timeout
  kSleeping,                        // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()
  kBlocked,                         // BLOCKED        TS_MONITOR   blocked on a monitor
  kWaiting,                         // WAITING        TS_WAIT      in Object.wait()
  kWaitingForGcToComplete,          // WAITING        TS_WAIT      blocked waiting for GC
  kWaitingForCheckPointsToRun,      // WAITING        TS_WAIT      GC waiting for checkpoints to run
  kWaitingPerformingGc,             // WAITING        TS_WAIT      performing GC
  kWaitingForDebuggerSend,          // WAITING        TS_WAIT      blocked waiting for events to be sent
  kWaitingForDebuggerToAttach,      // WAITING        TS_WAIT      blocked waiting for debugger to attach
  kWaitingInMainDebuggerLoop,       // WAITING        TS_WAIT      blocking/reading/processing debugger events
  kWaitingForDebuggerSuspension,    // WAITING        TS_WAIT      waiting for debugger suspend all
  kWaitingForJniOnLoad,             // WAITING        TS_WAIT      waiting for execution of dlopen and JNI on load code
  kWaitingForSignalCatcherOutput,   // WAITING        TS_WAIT      waiting for signal catcher IO to complete
  kWaitingInMainSignalCatcherLoop,  // WAITING        TS_WAIT      blocking/reading/processing signals
  kWaitingForDeoptimization,        // WAITING        TS_WAIT      waiting for deoptimization suspend all
  kWaitingForMethodTracingStart,    // WAITING        TS_WAIT      waiting for method tracing to start
  kWaitingForVisitObjects,          // WAITING        TS_WAIT      waiting for visiting objects
  kWaitingForGetObjectsAllocated,   // WAITING        TS_WAIT      waiting for getting the number of allocated objects
  kWaitingWeakGcRootRead,           // WAITING        TS_WAIT      waiting on the GC to read a weak root
  kWaitingForGcThreadFlip,          // WAITING        TS_WAIT      waiting on the GC thread flip (CC collector) to finish
  kStarting,                        // NEW            TS_WAIT      native thread started, not yet ready to run managed code
  kNative,                          // RUNNABLE       TS_RUNNING   running in a JNI native method
  kSuspended,                       // RUNNABLE       TS_RUNNING   suspended by GC or debugger
};
tls32_ = {
state_and_flags = {
as_struct = {
flags = 5,
state = 67
},
as_atomic_int = {
> = {
> = {
> = {
__a_ = 4390917
}, }, }, },
as_int = 4390917
},
suspend_count = 1,
debug_suspend_count = 0,
thin_lock_thread_id = 20,
tid = 3093,
daemon = 0,
throwing_OutOfMemoryError = 0,
no_thread_suspension = 0,
thread_exit_check_count = 0,
handling_signal_ = 0,
suspended_at_suspend_check = 0,
ready_for_debug_invoke = 0,
debug_method_entry_ = 0,
is_gc_marking = 0,
weak_ref_access_enabled = 1,
disable_thread_flip_count = 0
}
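The state and flags fields are the two 16-bit halves of the 32-bit as_int word, which both dumps confirm: 89 × 65536 + 1 = 5832705 and 67 × 65536 + 5 = 4390917. The sketch below decodes a raw word that way, assuming this little-endian layout (flags in the low half, state in the high half) and the enum numbering above (kTerminated = 66). DecodeStateAndFlags and StateName are hypothetical helpers, handy when only the raw integer is at hand, for example from x/wx in gdb.

```cpp
#include <cstdint>
#include <string>

// ART thread states in declaration order, as listed in the enum above.
enum ThreadState : std::uint16_t {
  kTerminated = 66, kRunnable, kTimedWaiting, kSleeping, kBlocked, kWaiting,
  kWaitingForGcToComplete, kWaitingForCheckPointsToRun, kWaitingPerformingGc,
  kWaitingForDebuggerSend, kWaitingForDebuggerToAttach, kWaitingInMainDebuggerLoop,
  kWaitingForDebuggerSuspension, kWaitingForJniOnLoad, kWaitingForSignalCatcherOutput,
  kWaitingInMainSignalCatcherLoop, kWaitingForDeoptimization, kWaitingForMethodTracingStart,
  kWaitingForVisitObjects, kWaitingForGetObjectsAllocated, kWaitingWeakGcRootRead,
  kWaitingForGcThreadFlip, kStarting, kNative, kSuspended,
};
static_assert(kRunnable == 67 && kNative == 89 && kSuspended == 90,
              "numbering must match the ThreadState enum above");

struct StateAndFlags {
  std::uint16_t flags;  // low 16 bits of as_int
  std::uint16_t state;  // high 16 bits of as_int
};

StateAndFlags DecodeStateAndFlags(std::uint32_t as_int) {
  return {static_cast<std::uint16_t>(as_int & 0xFFFFu),
          static_cast<std::uint16_t>(as_int >> 16)};
}

// Names only the states that matter for a suspend-all hang; others are shown as numbers.
std::string StateName(std::uint16_t state) {
  switch (state) {
    case kRunnable:  return "kRunnable";
    case kNative:    return "kNative";
    case kSuspended: return "kSuspended";
    default:         return "state " + std::to_string(state);
  }
}
```

Applied to the first dump (tid 3060, as_int = 5832705), it yields state 89, which is kNative: that is the main thread, whose earlier backtrace shows it inside the JNI call NewStringUTF trying to transition back to kRunnable, so it is not the thread holding up SuspendAll.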
(gdb) t 176
[Switching to thread 176 (LWP 3093)]
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
41      bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
#1  0x000000796226eb84 in __futex (op=, value=, timeout=0x0, bitset=-1, ftx=) at bionic/libc/private/bionic_futex.h:48
#2  __futex_wait_ex (value=, ftx=, shared=, use_realtime_clock=, abs_timeout=) at bionic/libc/private/bionic_futex.h:70
#3  __pthread_normal_mutex_lock (abs_timeout_or_null=, mutex=, shared=, use_realtime_clock=) at bionic/libc/bionic/pthread_mutex.cpp:327
#4  __pthread_mutex_lock_with_timeout (mutex=, use_realtime_clock=, abs_timeout_or_null=) at bionic/libc/bionic/pthread_mutex.cpp:430
#5  0x0000007961ad0354 in android::android_content_AssetManager_applyStyle (env=0x795127c740, themeToken=520810520368, defStyleAttr=, defStyleRes=16974731, xmlParserToken=1982366608, attrs=0x795f0bdcb4 , outValues=0x795f197c50 , outIndices=0x7942b9dee0, clazz=) at frameworks/base/core/jni/android_util_AssetManager.cpp:1430
#6  0x00000000748f3ecc in ?? ()
#60x00000000748f3ecc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
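This stack shows the thread already inside a JNI function, so it should be in kNative, yet it is kRunnable, which is odd. Here is the code that runs on entry to a JNI function: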
extern uint32_t JniMethodStart(Thread* self) {
  JNIEnvExt* env = self->GetJniEnv();
  DCHECK(env != nullptr);
  uint32_t saved_local_ref_cookie = env->local_ref_cookie;
  env->local_ref_cookie = env->locals.GetSegmentState();
  ArtMethod* native_method = *self->GetManagedStack()->GetTopQuickFrame();
  if (!native_method->IsFastNative()) {  // not a fast native method: switch to a suspended state (kNative)
    // When not fast JNI we transition out of runnable.
    self->TransitionFromRunnableToSuspended(kNative);
  }
  return saved_local_ref_cookie;
}
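So if the native method is a fast native method, the thread stays in kRunnable. Let's look at where the JNI function android_content_AssetManager_applyStyle is registered:
{ "applyStyle", "!(JIIJ[I[I[I)Z", (void*) android_content_AssetManager_applyStyle }
The signature is registered with a ! prefix, so this function is indeed a fast native method and its thread stays in kRunnable. A fast native method is meant to be a JNI method that returns quickly, so skipping the state transition is an optimization. The stack above, however, shows this fast native method waiting on a lock, and once it waits on a lock it may not finish quickly at all. Marking it fast native therefore seems inappropriate here; removing the leading ! would make the thread switch to kNative on entering JNI, and ART would no longer hang waiting for it.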
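The interaction between the ! marker and the branch in JniMethodStart can be reduced to a small model. The sketch assumes only what the code and registration above show: a registered signature starting with ! marks a fast native method, and only non-fast methods leave kRunnable on entry. IsBangFastNative, StateOnJniEntry and CanStallSuspendAll are hypothetical names, not ART or libnativehelper APIs.

```cpp
// Two ART ThreadState values (numbering from the ThreadState enum above).
enum ThreadState : unsigned short { kRunnable = 67, kNative = 89 };

// Hypothetical: true if a JNINativeMethod signature carries the "!" fast-native marker.
bool IsBangFastNative(const char* signature) {
  return signature != nullptr && signature[0] == '!';
}

// Mirrors the branch in JniMethodStart: only non-fast-native methods switch to kNative.
ThreadState StateOnJniEntry(const char* signature) {
  return IsBangFastNative(signature) ? kRunnable : kNative;
}

// A thread holds up suspend-all only if it is still kRunnable and cannot reach a
// suspend check, for example because it blocks on a native lock inside the method.
bool CanStallSuspendAll(const char* signature, bool blocks_in_native) {
  return StateOnJniEntry(signature) == kRunnable && blocks_in_native;
}
```

For applyStyle's signature !(JIIJ[I[I[I)Z the model gives kRunnable, and because tid 3093 blocks on a pthread mutex inside the method, CanStallSuspendAll returns true; with the ! removed it gives kNative and returns false.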