Android Stability - GDB and coredump

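When analyzing Android Native Error (NE) problems, a coredump file from the crashing process makes the analysis far easier. Capturing a coredump, however, consumes a lot of memory and CPU, and the resulting file is large, so the feature is disabled by default in builds shipped to end users. Even during internal development it is enabled only for specific test items, such as monkey tests aimed at system stability. Stability problems are therefore often not that hard to analyze; the hard part is obtaining useful logs. Once you have a coredump file together with the symbols that match the firmware, you can analyze the problem with GDB, a powerful debugging tool.

Coredump files
A coredump file can be thought of as a snapshot of a process's memory and registers at a given moment, wrapped in an ELF file so that tools such as GDB can analyze it. The kernel supports coredumps by default, but on Android several other important factors determine whether a coredump is actually captured.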

Every Linux process is subject to limits on the resources it may use. These limits can be inspected in /proc/$PID/limits, for example:

[Figure: process rlimit]
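This node shows that the process may open at most 1024 files and that its core file size is 0. So even if the process receives one of the relevant signals, no coredump can be written, and the process's rlimit usually has to be changed first.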
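As a rough illustration of the check described above, the sketch below parses the text of /proc/$PID/limits and reports whether the soft core file size limit allows a dump. The function names and the SAMPLE text are hypothetical; the sample mirrors the values mentioned in the article (core file size 0, 1024 open files), and its hard limits are assumed.

```python
def core_file_size_limit(limits_text):
    """Return (soft, hard) core file size limits from /proc/<pid>/limits text."""
    prefix = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units.
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    return None


def can_dump_core(limits_text):
    """A soft limit of 0 means no core file will be written."""
    limit = core_file_size_limit(limits_text)
    return limit is not None and limit[0] != "0"


# Hypothetical sample; the hard limits are assumptions for illustration.
SAMPLE = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max core file size        0                    unlimited            bytes\n"
    "Max open files            1024                 4096                 files\n"
)

if __name__ == "__main__":
    print(core_file_size_limit(SAMPLE))
    print(can_dump_core(SAMPLE))
```

On a device, the input would be the contents of /proc/<pid>/limits for the process of interest.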
/proc/sys/kernel/core_pattern sets the path where coredump files are saved, for example:
    echo "/data/corefile/core-%e-%p" > /proc/sys/kernel/core_pattern
You may also need to run:
    echo 1 > /proc/sys/fs/suid_dumpable
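A process dumps core only when it receives certain signals, such as SIGSEGV, SIGABRT, or SIGBUS. Also note that while a coredump is being written you must not send SIGKILL to the process: SIGKILL aborts the dump, leaving an incomplete coredump file that cannot be analyzed.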
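To make the core_pattern template concrete, here is a simplified sketch of how %e (executable name) and %p (PID) are substituted into the file name. The function name is hypothetical, and only %e, %p and %% are modelled; the kernel supports further specifiers that this sketch simply expands to nothing.

```python
def expand_core_pattern(pattern, exe_name, pid):
    """Expand the %e, %p and %% specifiers of a core_pattern template."""
    values = {"e": exe_name, "p": str(pid), "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            if i + 1 < len(pattern):
                # Specifiers not modelled in this sketch expand to nothing.
                out.append(values.get(pattern[i + 1], ""))
            # A lone trailing % is dropped.
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(expand_core_pattern("/data/corefile/core-%e-%p", "system_server", 3060))
```

With the pattern from the text, a system_server process with PID 3060 yields a file named core-system_server-3060 under /data/corefile.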
GDB
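  • GDB online debugging environment
GDB, the GNU Project Debugger, is a renowned debugging tool that most programmers will at least have heard of even if they have never used it. It can debug a live process online, and it can also debug memory dumps such as coredump files offline. In day-to-day stability work we mainly use it to analyze coredump files offline.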
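  • adb shell gdbserver remote:1234 --attach 4321
    1234 is the port on the phone, and 4321 is the PID of the process you want to debug.
  • adb forward tcp:1234 tcp:1234
    Sets up adb TCP port forwarding. The first tcp:1234 is the port on the PC; the second is the port on the target, that is, the phone.
  • aarch64-linux-android-gdb
    aarch64-linux-android-gdb is the gdb client for ARM64. For processes running as AArch32, choose the corresponding gdb client instead.
  • Run the following commands at the GDB prompt: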
    (gdb) set solib-absolute-prefix out/target/product/general/symbols/
    (gdb) set solib-search-path out/target/product/general/symbols/
    (gdb) target remote :1234
For more information, see "Setting up an Android GDB online debugging environment".
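The steps above can be collected in a small helper that only builds the command lines, so the port and PID are written in one place. This is a sketch: the function names are hypothetical, and the commands are exactly those listed in the steps, with nothing executed.

```python
def gdbserver_attach_cmd(port, pid):
    """argv for attaching gdbserver on the device to a running process."""
    return ["adb", "shell", "gdbserver", "remote:%d" % port, "--attach", str(pid)]


def adb_forward_cmd(host_port, device_port):
    """argv for forwarding a PC-side TCP port to the device-side port."""
    return ["adb", "forward", "tcp:%d" % host_port, "tcp:%d" % device_port]


def gdb_init_commands(symbols_dir, host_port):
    """GDB commands to point at the symbols and connect to the forwarded port."""
    return [
        "set solib-absolute-prefix %s" % symbols_dir,
        "set solib-search-path %s" % symbols_dir,
        "target remote :%d" % host_port,
    ]


if __name__ == "__main__":
    print(" ".join(gdbserver_attach_cmd(1234, 4321)))
    print(" ".join(adb_forward_cmd(1234, 1234)))
    for command in gdb_init_commands("out/target/product/general/symbols/", 1234):
        print("(gdb) " + command)
```

The argv lists could be handed to subprocess if desired; note that gdbserver keeps running while the session is open, so it would need to be started without waiting for it to exit.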
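  • GDB + Eclipse offline debugging
A craftsman who wants to do good work must first sharpen his tools. NE problems can be analyzed with the command-line GDB tool, and if you are familiar with GDB's commands, the command line will serve you well. Alternatively, GDB + Eclipse gives you a visual debugging environment. It is less powerful than the command line but sufficient for simple problems. The setup is as follows:
1. Open ADT, click Run → Debug Configurations, then select C/C++ Postmortem Debugger.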
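2. Click the "+" symbol in the upper-left corner to create a new configuration and give it any name, for example "android_gdb". For C/C++ Application, select the executable that corresponds to your coredump file. For SurfaceFlinger, for example, select /symbols/system/bin/surfaceflinger; for processes forked from zygote, select /symbols/system/bin/app_process64. Set Post Mortem file type to Core file and click Browse to locate the coredump file.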
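3. Switch to the Debugger tab. For GDB debugger, select the gdb executable for your platform. GDB command file is a file containing the gdb commands you want run after the coredump file is opened. My gdbinit file contains:
    set solib-search-path /media/xxxx/SSD/tmp/Log/0622/symbols/system/lib64
This sets GDB's shared-library search path so that GDB can load the .so libraries that carry symbol information.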
4. Click the Debug button, and the full debug view appears.
  • GDB scripts
GDB supports two kinds of scripts: Python scripts and command scripts. In a command script we can define our own commands, in a form like:
    define commandName
      statement
      ......
    end

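Here statement can be any valid GDB command. A user-defined command also accepts up to 10 arguments, $arg0, $arg1 ... $arg9, with $argc giving the number of arguments actually passed. Scripts also provide if/else conditionals and while loops. You can type a GDB script directly at the command line, or write it to a separate file and load it with the source command.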
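As a small illustration of the define ... end form together with $argc and $arg0, the sketch below generates the text of a command file that could then be loaded with source. The helper render_define and the command name print_or_usage are hypothetical, chosen only for this example.

```python
def render_define(name, body_lines):
    """Wrap GDB statements in a 'define NAME ... end' block."""
    lines = ["define %s" % name]
    lines += ["  " + line for line in body_lines]
    lines.append("end")
    return "\n".join(lines) + "\n"


# A user-defined command that prints its single argument,
# or a usage hint when called with any other argument count.
PRINT_OR_USAGE = render_define("print_or_usage", [
    "if $argc == 1",
    "  print $arg0",
    "else",
    "  echo usage: print_or_usage EXPR\\n",
    "end",
])

if __name__ == "__main__":
    print(PRINT_OR_USAGE, end="")
```

Saving the printed text to a file and running source on that file in GDB would make print_or_usage available as a command.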
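  • Example: debugging a coredump with GDB
    During a monkey test one device's screen froze. The logs suggested that the ART virtual machine in the system_server process had timed out in SuspendAll while dumping traces or running GC. We had seen this before and had analyzed it from a coredump, so this time we again sent kill -11 directly to the system_server process and captured a coredump file. Besides the coredump file, the analysis also needs the symbols files that match the firmware.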
    [Linux@Linux w]$ ls
    core-system_server-3060  symbols  symbols.zip
    [Linux@Linux w]$ aarch64-linux-android-gdb ./symbols/system/bin/app_process64 ./core-system_server-3060
    GNU gdb (GDB) 7.7
    Copyright (C) 2014 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "--host=x86_64-linux-gnu --target=aarch64-elf-linux".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./symbols/system/bin/app_process64...done.
    [New LWP 3060]
    [New LWP 3065]
    [New LWP 3173]
    [New LWP 3067]
    [New LWP 3066]
    ......
    ......
    [New LWP 3954]
    [New LWP 3086]
    [New LWP 3128]
    warning: Could not load shared library symbols for 194 libraries, e.g. /system/bin/linker64.
    Use the "info sharedlibrary" command to see the complete listing.
    Do you need "set solib-search-path" or "set sysroot"?
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000007962221cac in ?? ()
    (gdb) set solib-search-path ./symbols/system/lib64/
    Reading symbols from /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so...done.
    Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libcutils.so
    Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so...done.
    Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/libutils.so
    Reading symbols from /media/xxxx/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so...done.
    Loaded symbols for /media/linux/SSD/tmp/Log/DoNotRemove/symbols/system/lib64/liblog.so
    ......

    (gdb) bt
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    #1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
    #2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
    #3  0x000000795f3272e8 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
    #4  ScopedThreadStateChange (self=<optimized out>, new_thread_state=art::kRunnable, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
    #5  ScopedObjectAccessUnchecked (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:224
    #6  ScopedObjectAccess (this=<optimized out>, env=<optimized out>) at art/runtime/scoped_thread_state_change.h:255
    #7  art::JNI::NewStringUTF (env=<optimized out>, utf=<optimized out>) at art/runtime/jni_internal.cc:1646
    #8  0x0000007961acce64 in NewStringUTF (bytes=<optimized out>, this=0x795f63e180) at libnativehelper/include/nativehelper/jni.h:842
    #9  android::android_content_AssetManager_getArrayStringResource (env=0x795f63e180, clazz=<optimized out>, arrayResId=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1977
    #10 0x00000000748f498c in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)

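Because this device hung while the virtual machine was in suspend all, we turn to the code. A hang here usually means some thread did not respond to the suspend flag in time, and a thread that does not respond is usually in the kRunnable state. Note that this is the ART thread state, not the Linux R state; the two are different. Our approach is therefore to find in the coredump which thread is still kRunnable. The art::Thread objects of all Java threads are stored in the list_ member of ThreadList, so printing the contents of list_ tells us which Java thread is kRunnable.
    // The actual list of all threads.
    std::list<Thread*> list_ GUARDED_BY(Locks::thread_list_lock_);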

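To print list_ we need to find the ThreadList object from some context. It can be derived from the Runtime global variable, or taken from a thread whose call stack contains the ThreadList object. We use the second approach here, because ThreadList::SuspendAllInternal has a this argument, which leads straight to the ThreadList object. In the virtual machine only the SignalCatcher and HeapTaskDaemon threads call it: one prints traces and the other runs GC tasks. So first find the IDs of these two threads from the device or the logs, then inspect their current stacks in gdb. In our case the two thread IDs are 3065 and 3070.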
    (gdb) info threads
      Id   Target Id         Frame
      180  LWP 3128 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
      179  LWP 3086 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
      178  LWP 3954 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
      177  LWP 3218 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
      ......
      127  LWP 3167 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    ---Type <return> to continue, or q <return> to quit---

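GDB renumbers the threads, so we need to find the GDB numbers that correspond to 3065 and 3070. The output above also contains "---Type <return> to continue, or q <return> to quit---". This appears because GDB paginates long output by default; set pagination off changes this behavior.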
    (gdb) set pagination off
    (gdb) info threads
      ......
      9    LWP 3070 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
      ......
      2    LWP 3065 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    (gdb) t 2
    [Switching to thread 2 (LWP 3065)]
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    41      bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
    (gdb) bt
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    #1  0x000000795f0c63dc in futex (uaddr=0x795f6fa910, op=0, val=17669, val3=0, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
    #2  art::ConditionVariable::WaitHoldingLocks (this=<optimized out>, self=<optimized out>) at art/runtime/base/mutex.cc:848
    #3  0x000000795f120234 in TransitionFromSuspendedToRunnable (this=<optimized out>) at art/runtime/thread-inl.h:209
    #4  ScopedThreadStateChange (new_thread_state=art::kRunnable, this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:51
    #5  ScopedObjectAccessUnchecked (this=<optimized out>, self=<optimized out>) at art/runtime/scoped_thread_state_change.h:231
    #6  ScopedObjectAccess (self=<optimized out>, this=<optimized out>) at art/runtime/scoped_thread_state_change.h:261
    #7  art::ClassLinker::DumpForSigQuit (this=<optimized out>, os=...) at art/runtime/class_linker.cc:7752
    #8  0x000000795f415950 in art::Runtime::DumpForSigQuit (this=0x795f6ec000, os=...) at art/runtime/runtime.cc:1401
    #9  0x000000795f41c27c in art::SignalCatcher::HandleSigQuit (this=<optimized out>) at art/runtime/signal_catcher.cc:145
    #10 0x000000795f41ad3c in art::SignalCatcher::Run (arg=<optimized out>) at art/runtime/signal_catcher.cc:214
    #11 0x000000796226e0f0 in __pthread_start (arg=<optimized out>) at bionic/libc/bionic/pthread_create.cpp:198
    #12 0x0000007962223944 in __start_thread (fn=0x62, arg=0x795f6fa910) at bionic/libc/bionic/clone.cpp:41
    #13 0x0000000000000000 in ?? ()
    (gdb) t 9
    [Switching to thread 9 (LWP 3070)]
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    41      in bionic/libc/arch-arm64/bionic/syscall.S
    (gdb) bt
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    #1  0x000000795f43bb2c in futex (val3=0, uaddr=<optimized out>, op=<optimized out>, val=<optimized out>, timeout=<optimized out>, uaddr2=<optimized out>) at art/runtime/base/mutex-inl.h:45
    #2  art::ThreadList::SuspendAllInternal (this=<optimized out>, self=<optimized out>, ignore1=<optimized out>, ignore2=<optimized out>, debug_suspend=<optimized out>) at art/runtime/thread_list.cc:586
    #3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
    #4  0x000000795f1c6d4c in art::gc::collector::MarkSweep::RunPhases (this=<optimized out>) at art/runtime/gc/collector/mark_sweep.cc:153
    #5  0x000000795f1bf490 in art::gc::collector::GarbageCollector::Run (this=0x795f687500, gc_cause=art::gc::kGcCauseBackground, clear_soft_references=false) at art/runtime/gc/collector/garbage_collector.cc:87
    #6  0x000000795f1ef0a4 in art::gc::Heap::CollectGarbageInternal (this=<optimized out>, gc_type=<optimized out>, gc_cause=<optimized out>, clear_soft_references=<optimized out>) at art/runtime/gc/heap.cc:2719
    #7  0x000000795f1f65dc in art::gc::Heap::ConcurrentGC (this=0x795f64b700, self=<optimized out>, force_full=true) at art/runtime/gc/heap.cc:3722
    #8  0x000000795f1fd668 in art::gc::Heap::ConcurrentGCTask::Run (this=<optimized out>, self=0x0) at art/runtime/gc/heap.cc:3685
    #9  0x000000795f21f2c4 in art::gc::TaskProcessor::RunAllTasks (this=<optimized out>, self=<optimized out>) at art/runtime/gc/task_processor.cc:124
    #10 0x0000000072739114 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    (gdb)
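Looking up GDB thread numbers by eye gets tedious with 180 threads. The sketch below parses saved info threads output into a mapping from LWP to GDB thread number. The function name is hypothetical, and the regular expression assumes the "Id  LWP nnnn  frame" layout that this core file produces; other targets print the Target Id column differently.

```python
import re

# Matches lines such as "  9    LWP 3070 syscall () at ..." and "* 1    LWP 3060 ...".
THREAD_LINE = re.compile(r"^\*?\s*(\d+)\s+LWP\s+(\d+)\b")


def lwp_to_gdb_id(info_threads_output):
    """Map each LWP (kernel thread ID) to its GDB thread number."""
    mapping = {}
    for line in info_threads_output.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            mapping[int(match.group(2))] = int(match.group(1))
    return mapping


# Lines taken from the session in this article.
SAMPLE = """\
  Id   Target Id         Frame
  176  LWP 3093 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  9    LWP 3070 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  2    LWP 3065 syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
"""

if __name__ == "__main__":
    table = lwp_to_gdb_id(SAMPLE)
    print(table[3065], table[3070])
```

With the mapping in hand, t 2 and t 9 are the commands that switch to LWP 3065 and LWP 3070 respectively.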

The output of the gdb commands above shows that the address of the ThreadList object is 0x795f6fb000, and from it we can find list_, which holds all the Thread objects.
    #3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
    (gdb) set print pretty on
    (gdb) f 3
    #3  0x000000795f43c198 in art::ThreadList::SuspendAll (this=0x795f6fb000, cause=0x795f55e996 "ScopedPause", long_suspend=<optimized out>) at art/runtime/thread_list.cc:476
    476     in art/runtime/thread_list.cc
    (gdb) p *this
    $2 = {
      static kMaxThreadId = 65535,
      static kInvalidThreadId = 0,
      static kMainThreadId = 1,
      allocated_ids_ = {
        > = {
          static __bits_per_word = 64,
          __first_ = {18446744073709551615, 18446744073709551615, 38654705663, 0 }
        },
        members of std::__1::bitset<65535>:
        static __n_words = 1024
      },
      list_ = {
        > = {
          __end_ = {
            __prev_ = 0x792df94ee0,
            __next_ = 0x795f6fa9a0
          },
          __size_alloc_ = {
            , 2>> = {
              > = {},
              members of std::__1::__libcpp_compressed_pair_imp:
              __first_ = 161
            },
          }
        },
      },
      suspend_all_count_ = 1,
      debug_suspend_all_count_ = 0,
      ......
    }

Because list_ is a very long list, we first define a custom GDB command that prints every Thread object automatically:

    (gdb) def dump_all_threads_state
    Type commands for definition of "dump_all_threads_state".
    End with a line saying just "end".
    >set $current = list_.__end_.__next_
    >while $current != 0
     >p * $current.__value_
     >set $current = $current.__next_
     >end
    >end
    (gdb) dump_all_threads_state
    $3 = {
      static kStackOverflowImplicitCheckSize = 8192,
      static kMaxCheckpoints = 3,
      static kMaxSuspendBarriers = 3,
      static is_started_ = true,
      static pthread_key_self_ = -2147483634,
      static resume_cond_ = 0x795f6fa900,
      static is_sensitive_thread_hook_ = 0x7961a76f20,
      static jit_sensitive_thread_ = 0x0,
      tls32_ = {
        state_and_flags = {
          as_struct = {
            flags = 1,
            state = 89
          },
          as_atomic_int = {
            > = {
              > = {
                > = {
                  __a_ = 5832705
                },
              },
            },
          },
          as_int = 5832705
        },
        suspend_count = 1,
        debug_suspend_count = 0,
        thin_lock_thread_id = 1,
        tid = 3060,
        daemon = 0,
        throwing_OutOfMemoryError = 0,
        no_thread_suspension = 0,
        thread_exit_check_count = 0,
        handling_signal_ = 0,
        suspended_at_suspend_check = 0,
        ready_for_debug_invoke = 0,
        debug_method_entry_ = 0,
        is_gc_marking = 0,
        weak_ref_access_enabled = 1,
        disable_thread_flip_count = 0
      },
      tls64_ = {
        trace_clock_base = 0,
        stats = {
          allocated_objects = 0,
          allocated_bytes = 0,
          freed_objects = 0,
          freed_bytes = 0,
          gc_for_alloc_count = 0,
          class_init_count = 2682,
          class_init_time_ns = 1043034049
        }
      },
      tlsPtr_ = {
        card_table = 0x795ad01070 "",
        exception = 0x0,
        stack_end = 0x7ffbc34000 "",
        managed_stack = {
          top_quick_frame_ = 0x7ffc42d940,
          link_ = 0x7ffc42e050,
          top_shadow_frame_ = 0x0
        },
        suspend_trigger = 0x0,
        jni_env = 0x795f63e180,
        tmp_jni_env = 0x0,
        self = 0x0,
        opeer = 0x762523e8,
        jpeer = 0x0,
        stack_begin = 0x7ffbc32000 "",
        stack_size = 8388608,
        stack_trace_sample = 0x0,
        wait_next = 0x0,
        monitor_enter_object = 0x0,
        top_handle_scope = 0x7ffc42d948,
        class_loader_override = 0x10070a,
        long_jump_context = 0x795f687c80,
        instrumentation_stack = 0x795f716e90,
        debug_invoke_req = 0x0,
        single_step_control = 0x0,
        stacked_shadow_frame_record = 0x0,
        deoptimization_context_stack = 0x0,
        frame_id_to_shadow_frame = 0x0,
        name = 0x795f6fa980,
        pthread_self = 521358133912,
        last_no_thread_suspension_cause = 0x0,
        checkpoint_functions = {0x0, 0x0, 0x0},
        active_suspend_barriers = {0x0, 0x0, 0x0},
        jni_entrypoints = {
          pDlsymLookup = 0x795f0b04d0
        },
        quick_entrypoints = {
          pAllocArray = 0x795f0b4420,
          pAllocArrayResolved = 0x795f0b44e0,
          pAllocArrayWithAccessCheck = 0x795f0b45a0,
          pAllocObject = 0x795f0b9cc0,
          pAllocObjectResolved = 0x795f0b41e0,
          pAllocObjectInitialized = 0x795f0b42a0,
          pAllocObjectWithAccessCheck = 0x795f0b4360,
          pCheckAndAllocArray = 0x795f0b4660,
          pCheckAndAllocArrayWithAccessCheck = 0x795f0b4720,
          pAllocStringFromBytes = 0x795f0b47e0,
          pAllocStringFromChars = 0x795f0b48f0,
          pAllocStringFromString = 0x795f0b49b0,
          pInstanceofNonTrivial = 0x795f516374,
          pCheckCast = 0x795f0b17f0,
          pInitializeStaticStorage = 0x795f0b1a40,
          pInitializeTypeAndVerifyAccess = 0x795f0b1bc0,
          pInitializeType = 0x795f0b1b00,
          pResolveString = 0x795f0b2e80,
          pSet8Instance = 0x795f0b2a00,
          pSet8Static = 0x795f0b2700,
          pSet16Instance = 0x795f0b2ac0,
          pSet16Static = 0x795f0b27c0,
          pSet32Instance = 0x795f0b2b80,
          pSet32Static = 0x795f0b2880,
          pSet64Instance = 0x795f0b2c40,
          pSet64Static = 0x795f0b2dc0,
          pSetObjInstance = 0x795f0b2d00,
          pSetObjStatic = 0x795f0b2940,
          pGetByteInstance = 0x795f0b2280,
          pGetBooleanInstance = 0x795f0b21c0,
          pGetByteStatic = 0x795f0b1d40,
          pGetBooleanStatic = 0x795f0b1c80,
          pGetShortInstance = 0x795f0b2400,
          pGetCharInstance = 0x795f0b2340,
          pGetShortStatic = 0x795f0b1ec0,
          pGetCharStatic = 0x795f0b1e00,
          pGet32Instance = 0x795f0b24c0,
          pGet32Static = 0x795f0b1f80,
          pGet64Instance = 0x795f0b2580,
          pGet64Static = 0x795f0b2040,
          pGetObjInstance = 0x795f0b2640,
          pGetObjStatic = 0x795f0b2100,
          pAputObjectWithNullAndBoundCheck = 0x795f0b1870,
          pAputObjectWithBoundCheck = 0x795f0b1880,
          pAputObject = 0x795f0b18a0,
          pHandleFillArrayData = 0x795f0b1980,
          pJniMethodStart = 0x795f523c2c,
          pJniMethodStartSynchronized = 0x795f523db0,
          pJniMethodEnd = 0x795f523dec,
          pJniMethodEndSynchronized = 0x795f5240d4,
          pJniMethodEndWithReference = 0x795f5242f4,
          pJniMethodEndWithReferenceSynchronized = 0x795f5243a4,
          pQuickGenericJniTrampoline = 0x795f0ba500,
          pLockObject = 0x795f0b1430,
          pUnlockObject = 0x795f0b1610,
          pCmpgDouble = 0x0,
          pCmpgFloat = 0x0,
          pCmplDouble = 0x0,
          pCmplFloat = 0x0,
          pCos = 0x7961075168,
          pSin = 0x7961079e78,
          pAcos = 0x796106b978,
          pAsin = 0x796106c128,
          pAtan = 0x7961074400,
          pAtan2 = 0x796106c55c,
          pCbrt = 0x7961074844,
          pCosh = 0x796106cbb4,
          pExp = 0x796106cd98,
          pExpm1 = 0x7961077cdc,
          pHypot = 0x796106d688,
          pLog = 0x7961071960,
          pLog10 = 0x79610712c0,
          pNextAfter = 0x79610792c8,
          pSinh = 0x79610730f0,
          pTan = 0x796107a72c,
          pTanh = 0x796107aefc,
          pFmod = 0x796106d204,
          pL2d = 0x0,
          pFmodf = 0x796106d4f4,
          pL2f = 0x0,
          pD2iz = 0x0,
          pF2iz = 0x0,
          pIdivmod = 0x0,
          pD2l = 0x0,
          pF2l = 0x0,
          pLdiv = 0x0,
          pLmod = 0x0,
          pLmul = 0x0,
          pShlLong = 0x0,
          pShrLong = 0x0,
          pUshrLong = 0x0,
          pIndexOf = 0x795f0ba930,
          pStringCompareTo = 0x795f0baa00,
          pMemcpy = 0x79622208c8,
          pQuickImtConflictTrampoline = 0x795f0ba290,
          pQuickResolutionTrampoline = 0x795f0ba3c0,
          pQuickToInterpreterBridge = 0x795f0ba650,
          pInvokeDirectTrampolineWithAccessCheck = 0x795f0b0a70,
          pInvokeInterfaceTrampolineWithAccessCheck = 0x795f0b0870,
          pInvokeStaticTrampolineWithAccessCheck = 0x795f0b0970,
          pInvokeSuperTrampolineWithAccessCheck = 0x795f0b0b70,
          pInvokeVirtualTrampolineWithAccessCheck = 0x795f0b0c70,
          pTestSuspend = 0x795f0ba090,
          pDeliverException = 0x795f0b0660,
          pThrowArrayBounds = 0x795f0b0760,
          pThrowDivZero = 0x795f0b0710,
          pThrowNoSuchMethod = 0x795f0b0810,
          pThrowNullPointer = 0x795f0b06c0,
          pThrowStackOverflow = 0x795f0b07c0,
          pDeoptimize = 0x795f0ba8d0,
          pA64Load = 0x795f4253b8,
          pA64Store = 0x795f4253b8,
          pNewEmptyString = 0x70e8c810,
          pNewStringFromBytes_B = 0x70e8c848,
          pNewStringFromBytes_BI = 0x70e8c880,
          pNewStringFromBytes_BII = 0x70e8c8b8,
          pNewStringFromBytes_BIII = 0x70e8c8f0,
          pNewStringFromBytes_BIIString = 0x70e8c928,
          pNewStringFromBytes_BString = 0x70e8c998,
          pNewStringFromBytes_BIICharset = 0x70e8c960,
          pNewStringFromBytes_BCharset = 0x70e8c9d0,
          pNewStringFromChars_C = 0x70e8ca40,
          pNewStringFromChars_CII = 0x70e8ca78,
          pNewStringFromChars_IIC = 0x70e8ca08,
          pNewStringFromCodePoints = 0x70e8cab0,
          pNewStringFromString = 0x70e8cae8,
          pNewStringFromStringBuffer = 0x70e8cb20,
          pNewStringFromStringBuilder = 0x70e8cb58,
          pReadBarrierJni = 0x795f523c28,
          pReadBarrierMark = 0x795f5235c4,
          pReadBarrierSlow = 0x795f5236e8,
          pReadBarrierForRootSlow = 0x795f5236f0
        },
        thread_local_objects = 0,
        thread_local_start = 0x0,
        thread_local_pos = 0x0,
        thread_local_end = 0x0,
        mterp_current_ibase = 0x795f0a0280,
        mterp_default_ibase = 0x795f0a0280,
        mterp_alt_ibase = 0x795f0a8280,
        rosalloc_runs = {0x795f5fcf08, 0x13754000, 0x149d7000, 0x14e1f000, 0x14595000, 0x1457c000, 0x13e65000, 0x14186000, 0x14828000, 0x12f71000, 0x13f18000, 0x795f5fcf08, 0x795f5fcf08, 0x13c29000, 0x795f5fcf08, 0x795f5fcf08},
        thread_local_alloc_stack_top = 0x795a52b8b8,
        thread_local_alloc_stack_end = 0x795a52ba00,
        held_mutexes = {0x0 },
        nested_signal_state = 0x795f67f300,
        flip_function = 0x0,
        method_verifier = 0x0,
        thread_local_mark_stack = 0x0
      },
      wait_mutex_ = 0x795f719080,
      wait_cond_ = 0x795f6fa960,
      wait_monitor_ = 0x0,
      interrupted_ = false,
      debug_disallow_read_barrier_ = 0 '\000'
    }
    ......   // output for N more Thread objects omitted here
    Cannot access memory at address 0xa1
    (gdb)

Among the N Thread objects printed above it is easy to find the thread in the kRunnable state: its tid is 3093, and its state = 67, which is kRunnable.
    enum ThreadState {
      //  Thread.State  JDWP state
      kTerminated = 66,  // TERMINATED  TS_ZOMBIE  Thread.run has returned, but Thread* still around
      kRunnable,  // RUNNABLE  TS_RUNNING  runnable
      kTimedWaiting,  // TIMED_WAITING  TS_WAIT  in Object.wait() with a timeout
      kSleeping,  // TIMED_WAITING  TS_SLEEPING  in Thread.sleep()
      kBlocked,  // BLOCKED  TS_MONITOR  blocked on a monitor
      kWaiting,  // WAITING  TS_WAIT  in Object.wait()
      kWaitingForGcToComplete,  // WAITING  TS_WAIT  blocked waiting for GC
      kWaitingForCheckPointsToRun,  // WAITING  TS_WAIT  GC waiting for checkpoints to run
      kWaitingPerformingGc,  // WAITING  TS_WAIT  performing GC
      kWaitingForDebuggerSend,  // WAITING  TS_WAIT  blocked waiting for events to be sent
      kWaitingForDebuggerToAttach,  // WAITING  TS_WAIT  blocked waiting for debugger to attach
      kWaitingInMainDebuggerLoop,  // WAITING  TS_WAIT  blocking/reading/processing debugger events
      kWaitingForDebuggerSuspension,  // WAITING  TS_WAIT  waiting for debugger suspend all
      kWaitingForJniOnLoad,  // WAITING  TS_WAIT  waiting for execution of dlopen and JNI on load code
      kWaitingForSignalCatcherOutput,  // WAITING  TS_WAIT  waiting for signal catcher IO to complete
      kWaitingInMainSignalCatcherLoop,  // WAITING  TS_WAIT  blocking/reading/processing signals
      kWaitingForDeoptimization,  // WAITING  TS_WAIT  waiting for deoptimization suspend all
      kWaitingForMethodTracingStart,  // WAITING  TS_WAIT  waiting for method tracing to start
      kWaitingForVisitObjects,  // WAITING  TS_WAIT  waiting for visiting objects
      kWaitingForGetObjectsAllocated,  // WAITING  TS_WAIT  waiting for getting the number of allocated objects
      kWaitingWeakGcRootRead,  // WAITING  TS_WAIT  waiting on the GC to read a weak root
      kWaitingForGcThreadFlip,  // WAITING  TS_WAIT  waiting on the GC thread flip (CC collector) to finish
      kStarting,  // NEW  TS_WAIT  native thread started, not yet ready to run managed code
      kNative,  // RUNNABLE  TS_RUNNING  running in a JNI native method
      kSuspended,  // RUNNABLE  TS_RUNNING  suspended by GC or debugger
    };

    tls32_ = {
      state_and_flags = {
        as_struct = {
          flags = 5,
          state = 67
        },
        as_atomic_int = {
          > = {
            > = {
              > = {
                __a_ = 4390917
              },
            },
          },
        },
        as_int = 4390917
      },
      suspend_count = 1,
      debug_suspend_count = 0,
      thin_lock_thread_id = 20,
      tid = 3093,
      daemon = 0,
      throwing_OutOfMemoryError = 0,
      no_thread_suspension = 0,
      thread_exit_check_count = 0,
      handling_signal_ = 0,
      suspended_at_suspend_check = 0,
      ready_for_debug_invoke = 0,
      debug_method_entry_ = 0,
      is_gc_marking = 0,
      weak_ref_access_enabled = 1,
      disable_thread_flip_count = 0
    }
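The as_int value packs both fields, so the state can be recovered from the integer alone. The sketch below assumes the layout suggested by the two dumps in this article, flags in the low 16 bits and state in the high 16 bits, and takes the state names from the ThreadState enum above. The function names are hypothetical.

```python
# State names in enum order; kTerminated has the explicit value 66.
FIRST_STATE = 66
THREAD_STATES = [
    "kTerminated", "kRunnable", "kTimedWaiting", "kSleeping", "kBlocked",
    "kWaiting", "kWaitingForGcToComplete", "kWaitingForCheckPointsToRun",
    "kWaitingPerformingGc", "kWaitingForDebuggerSend",
    "kWaitingForDebuggerToAttach", "kWaitingInMainDebuggerLoop",
    "kWaitingForDebuggerSuspension", "kWaitingForJniOnLoad",
    "kWaitingForSignalCatcherOutput", "kWaitingInMainSignalCatcherLoop",
    "kWaitingForDeoptimization", "kWaitingForMethodTracingStart",
    "kWaitingForVisitObjects", "kWaitingForGetObjectsAllocated",
    "kWaitingWeakGcRootRead", "kWaitingForGcThreadFlip",
    "kStarting", "kNative", "kSuspended",
]


def decode_state_and_flags(as_int):
    """Split as_int into (flags, state), assuming 16 bits for each field."""
    flags = as_int & 0xFFFF
    state = (as_int >> 16) & 0xFFFF
    return flags, state


def state_name(state):
    """Translate a numeric state into its enum name, or None if out of range."""
    index = state - FIRST_STATE
    if 0 <= index < len(THREAD_STATES):
        return THREAD_STATES[index]
    return None


if __name__ == "__main__":
    for tid, as_int in ((3060, 5832705), (3093, 4390917)):
        flags, state = decode_state_and_flags(as_int)
        print(tid, flags, state, state_name(state))
```

Both dumps are consistent with this layout: 5832705 decodes to flags = 1, state = 89 (kNative) for tid 3060, and 4390917 decodes to flags = 5, state = 67 (kRunnable) for tid 3093.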

    (gdb) t 176
    [Switching to thread 176 (LWP 3093)]
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    41      bionic/libc/arch-arm64/bionic/syscall.S: No such file or directory.
    (gdb) bt
    #0  syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
    #1  0x000000796226eb84 in __futex (op=<optimized out>, value=<optimized out>, timeout=0x0, bitset=-1, ftx=<optimized out>) at bionic/libc/private/bionic_futex.h:48
    #2  __futex_wait_ex (value=<optimized out>, ftx=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout=<optimized out>) at bionic/libc/private/bionic_futex.h:70
    #3  __pthread_normal_mutex_lock (abs_timeout_or_null=<optimized out>, mutex=<optimized out>, shared=<optimized out>, use_realtime_clock=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:327
    #4  __pthread_mutex_lock_with_timeout (mutex=<optimized out>, use_realtime_clock=<optimized out>, abs_timeout_or_null=<optimized out>) at bionic/libc/bionic/pthread_mutex.cpp:430
    #5  0x0000007961ad0354 in android::android_content_AssetManager_applyStyle (env=0x795127c740, themeToken=520810520368, defStyleAttr=<optimized out>, defStyleRes=16974731, xmlParserToken=1982366608, attrs=0x795f0bdcb4, outValues=0x795f197c50, outIndices=0x7942b9dee0, clazz=<optimized out>) at frameworks/base/core/jni/android_util_AssetManager.cpp:1430
    #6  0x00000000748f3ecc in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Judging from this stack, the thread has already entered a JNI function, so it ought to be in the kNative state, yet here it is kRunnable, which is odd. Look at the code that runs on entry to a JNI function:
    extern uint32_t JniMethodStart(Thread* self) {
      JNIEnvExt* env = self->GetJniEnv();
      DCHECK(env != nullptr);
      uint32_t saved_local_ref_cookie = env->local_ref_cookie;
      env->local_ref_cookie = env->locals.GetSegmentState();
      ArtMethod* native_method = *self->GetManagedStack()->GetTopQuickFrame();
      if (!native_method->IsFastNative()) {
        // If this JNI method is not a fast native method, switch to the suspended state.
        // When not fast JNI we transition out of runnable.
        self->TransitionFromRunnableToSuspended(kNative);
      }
      return saved_local_ref_cookie;
    }

So if the native method is a fast native method, its state stays kRunnable. Now look at where the JNI function android_content_AssetManager_applyStyle is registered:
    { "applyStyle", "!(JIIJ[I[I[I)Z", (void*) android_content_AssetManager_applyStyle }
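The signature is registered with a ! prefix, so this function is indeed a fast native method, and its state is therefore kRunnable. Fast native methods are presumably meant to be JNI methods that return quickly, so the state transition can be skipped as an optimization. But the stack above shows this fast native method waiting for a lock, and once it waits for a lock it may no longer finish quickly. Marking it fast native is therefore not appropriate here. The ! prefix should be removed so that the thread switches to kNative on entering JNI, and ART no longer hangs.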
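The convention described above, a leading ! in the registered signature marking a fast native method, is easy to check mechanically when auditing a registration table. The sketch below uses hypothetical helper names and relies only on the ! prefix convention as stated in this article.

```python
def is_fast_native(signature):
    """True if a registered JNI signature carries the '!' fast-native marker."""
    return signature.startswith("!")


def strip_fast_native_marker(signature):
    """Return the plain JNI signature, without the '!' marker if present."""
    return signature[1:] if is_fast_native(signature) else signature


if __name__ == "__main__":
    registered = "!(JIIJ[I[I[I)Z"  # signature of applyStyle from the text
    print(is_fast_native(registered))
    print(strip_fast_native_marker(registered))
```

Running such a check over every registration and then reviewing the flagged functions for lock acquisition is one way to look for other methods with the same problem.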
