[TVM Handbook] 4. AutoTVM

  • Tuning a CNN with AutoTVM on an Android ARM CPU
    • This article introduces how to use AutoTVM, which ships with TVM, to tune a network adaptively for the target platform.
  • [TOPI] Using x86 schedules for ARM conv2d
    • Schedule Transferability between Intel and ARM CPU targets
    • I used op strategy to enable both the Intel depthwise conv and the Intel NCHWc conv2d schedules on ARM devices. The ARM winograd schedule is very powerful, so I let op strategy and AutoTVM choose it whenever it is faster. The networks have been tuned using AutoTVM.
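The idea of letting the tuner "choose it whenever it is faster" can be illustrated with a minimal plain-Python sketch. This is not TVM's op strategy or AutoTVM API: the function names, implementation names, workload names, and latency numbers below are all hypothetical placeholders, and in practice the latencies would come from on-device measurements.

```python
# Toy illustration only: pick, per workload, the candidate implementation
# with the lowest measured latency. All names and numbers are made up.

def pick_fastest(measured_ms):
    """Return the implementation name with the smallest latency.

    measured_ms maps implementation name -> measured latency in ms.
    """
    return min(measured_ms, key=measured_ms.get)


def select_per_workload(table):
    """Map each workload to its fastest candidate implementation."""
    return {workload: pick_fastest(candidates)
            for workload, candidates in table.items()}


# Placeholder latencies (not real benchmark results).
example_table = {
    "conv2d_3x3": {"arm_spatial_pack": 4.0, "arm_winograd": 2.5, "x86_nchwc": 3.0},
    "conv2d_1x1": {"arm_spatial_pack": 1.2, "x86_nchwc": 0.9},
}

example_choice = select_per_workload(example_table)
```

The choice is made per workload, so one network can mix schedules: a candidate that wins on one convolution shape does not have to be used everywhere.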
NCHW and NHWC
  • [TOPI][AutoTVM] NHWC conv2d templates for ARM #3859
    • We are enabling NHWC conv2d templates for ARM as a nearly final solution. The benefits include:
    • The NHWC schedule is enabled directly. Previously, we needed to transpose between NCHW and NHWC.
    • AutoTVM can now tune NHWC directly. Previously, we needed to build an NCHW network to tune.
    • A potential performance advantage of NHWC, which is known to the community.
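To make the NCHW/NHWC distinction concrete, here is a small plain-Python sketch of how the same logical element (n, c, h, w) maps to a flat memory offset in each layout, and what the transpose mentioned above does. The helper names are illustrative assumptions, not TVM functions.

```python
# Toy illustration of NCHW vs. NHWC memory layouts using flat Python lists.
# Helper names are hypothetical; this is not TVM code.

def offset_nchw(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) when stored as NCHW."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat offset of the same element when stored as NHWC."""
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc(flat, N, C, H, W):
    """Copy a flat NCHW buffer into a new flat NHWC buffer."""
    out = [None] * len(flat)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = offset_nchw(n, c, h, w, C, H, W)
                    dst = offset_nhwc(n, c, h, w, C, H, W)
                    out[dst] = flat[src]
    return out


# One image, two channels, 1x3 pixels. In NCHW the channel planes are
# contiguous: channel 0 is [0, 1, 2] and channel 1 is [3, 4, 5].
nchw = [0, 1, 2, 3, 4, 5]
nhwc = nchw_to_nhwc(nchw, N=1, C=2, H=1, W=3)
# In NHWC the channels of each pixel are contiguous: [0, 3, 1, 4, 2, 5]
```

Every element has to be moved in such a transpose, which is why native NHWC templates avoid a real cost rather than a cosmetic one.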
  • [RFC] Frontend layout transformation #2519
    • Currently, frontend models have two different input layouts: NHWC and NCHW. TensorFlow and TFLite use the NHWC layout, while others, such as the CoreML frontend, use the NCHW layout.
    • For converting a model with an NHWC input layout, there is currently no unified way. Some frameworks convert the NHWC input layout into NCHW, for example Intel OpenVINO and the Tensorflow-CoreML converter (https://github.com/tf-coreml/tf-coreml). This has some advantages, for example on GPU. And for TVM, we support NCHW very well, for example:
  • [TFLite] Convert TFLite NCHW to NHWC #3141
    • As discussed in the above RFC, we agreed to change the TFLite frontend input layout from NCHW to NHWC. This PR does this.
    • Affected:
      We can no longer use AutoTVM to tune on ARM CPU, because the ARM CPU schedules currently only implement NCHW. We should add SpatialPack + NHWC for conv2d / depthwise convolution on ARM CPU.
  • [TOPI][AlterOpLayout][ARM] Enabling NHWC to NCHW layout transformation. #4249
    • Enables NHWC to NCHW conversion for ARM. Follows the AlterOpLayout principle of propagating the layout as deep as possible. For my toy network with 5 convs, it results in only 2 layout transforms: one at the beginning and one at the end.
    • Layout conversion pass
      • Right now we use the AlterOpLayout pass, which automatically decides which layout to use based on the target hardware backend.
      • Given that we also want to offer the flexibility to programmatically add pass pipelines, and there has been an increasing need for converting between layouts (e.g. NHWC to NCHW), we might also want to introduce a layout conversion pass that a user can specify. This would provide additional, optional flexibility that some of our current frontends need. Example usage:

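The propagation behaviour described for #4249 (five convs, but only two layout transforms) can be illustrated with a toy model in plain Python. This is neither TVM's AlterOpLayout pass nor the API of the proposed layout conversion pass. It assumes a purely linear chain of ops in which `conv2d` requires the target layout and every other op is layout-agnostic; all names are hypothetical.

```python
# Toy model of layout propagation over a linear op chain (not TVM code).
# Assumption: only ops in LAYOUT_SENSITIVE care about the data layout;
# every other op runs unchanged in whatever layout it receives.

LAYOUT_SENSITIVE = {"conv2d"}


def propagate_layout(ops, model_layout="NHWC", target_layout="NCHW"):
    """Insert as few layout transforms as possible.

    Convert to target_layout once, before the first layout-sensitive op,
    keep that layout through the rest of the chain, and convert back to
    model_layout once at the end so the output layout is unchanged.
    """
    out = []
    current = model_layout
    for op in ops:
        if op in LAYOUT_SENSITIVE and current != target_layout:
            out.append(f"layout_transform({current}->{target_layout})")
            current = target_layout
        out.append(op)
    if current != model_layout:
        out.append(f"layout_transform({current}->{model_layout})")
    return out


def count_transforms(ops):
    """Count the layout_transform nodes in a rewritten chain."""
    return sum(1 for op in ops if op.startswith("layout_transform"))


# Five conv2d ops, each followed by a layout-agnostic relu.
toy_network = ["conv2d", "relu"] * 5
rewritten = propagate_layout(toy_network)
num_transforms = count_transforms(rewritten)
```

By contrast, wrapping every `conv2d` in its own pair of transforms would insert ten transforms into the same chain, which is the overhead that deep propagation avoids.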