2021 | Google Board Chairman John Hennessy: The Pace of AI Technology Is Slowing and We Are in a Semiconductor Industry Winter | TMTPost T-EDGE (Part 7)
Anyone familiar with the books David Patterson and I co-authored knows that, in computer design, we believe in quantitative analysis grounded in engineering methodology. So what makes these domain-specific architectures more efficient?
First of all, they use a simple model of parallelism that works in a specific domain, which means they can have less control hardware. For example, we switch from the multiple-instruction, multiple-data model of a multicore to a single-instruction, multiple-data model. That dramatically improves the energy spent fetching instructions, because now we fetch one instruction rather than many.
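To make the instruction-fetch point concrete, here is a minimal C sketch (my illustration, not something from the talk) contrasting a scalar loop with an AVX version: the SIMD loop issues one vector add per eight elements, so far fewer instructions are fetched and decoded for the same work. It assumes an x86 machine and compilation with `-mavx`, and that the array length is a multiple of eight.

```c
/* Minimal sketch: one AVX add handles 8 floats, versus 8 scalar adds. */
#include <immintrin.h>
#include <stdio.h>

static void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)        /* one add instruction per element */
        out[i] = a[i] + b[i];
}

static void add_simd(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 8) {   /* one add instruction per 8 elements */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float out[8];
    add_scalar(a, b, out, 8);
    add_simd(a, b, out, 8);
    printf("%f\n", out[0]);            /* 9.0 either way */
    return 0;
}
```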
We also move to VLIW rather than speculative out-of-order mechanisms: techniques that rely on being able to analyze the code, know about its dependencies, and therefore create and structure the parallelism at compile time rather than doing it dynamically at runtime.
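As an illustration of what compile-time dependence analysis buys (a schematic C sketch, not anything Hennessy showed), the four multiplies below are visibly independent, so a VLIW compiler can pack them into one wide instruction word; an out-of-order core would rediscover the same fact in hardware, paying for that scheduling logic on every run.

```c
/* Schematic: no data dependences among the four multiplies, so they can be
 * bundled into one wide VLIW instruction at compile time. */
#include <stdio.h>

static void mul4(const float a[4], const float b[4], float out[4]) {
    out[0] = a[0] * b[0];   /* independent -> issue slot 0 */
    out[1] = a[1] * b[1];   /* independent -> issue slot 1 */
    out[2] = a[2] * b[2];   /* independent -> issue slot 2 */
    out[3] = a[3] * b[3];   /* independent -> issue slot 3 */
}

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, out[4];
    mul4(a, b, out);
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```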
Second, we make more effective use of memory bandwidth by going to user-controlled memory systems rather than caches. Caches are great, except when large amounts of data are streaming through them; then they are extremely inefficient, which is not what they were meant to do. Caches are meant to work when the program does repetitive things in a somewhat unpredictable fashion. Here we have repetitive operations in a very predictable fashion, but on very large amounts of data.
So we go to an alternative: we use prefetching and other techniques to move data into the memory inside the domain-specific processor, and once it is there we can make heavy use of it before moving it back to main memory.
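A rough C sketch of that idea (my illustration, not TPU code): the small static arrays stand in for software-managed on-chip memory, each tile is moved "on chip" once, and every element is reused TILE times before results go back to the large arrays that play the role of main memory.

```c
/* Sketch of software-managed data reuse (tiling). */
#include <stdio.h>

#define N    64
#define TILE 16

static float A[N][N], B[N][N], C[N][N];               /* "main memory"     */
static float a_tile[TILE][TILE], b_tile[TILE][TILE];  /* "on-chip" buffers */

static void matmul_tiled(void) {
    for (int i0 = 0; i0 < N; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < N; k0 += TILE) {
                /* "prefetch": copy the two tiles on chip once */
                for (int i = 0; i < TILE; i++)
                    for (int k = 0; k < TILE; k++)
                        a_tile[i][k] = A[i0 + i][k0 + k];
                for (int k = 0; k < TILE; k++)
                    for (int j = 0; j < TILE; j++)
                        b_tile[k][j] = B[k0 + k][j0 + j];
                /* heavy reuse before anything goes back to main memory */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++)
                        for (int k = 0; k < TILE; k++)
                            C[i0 + i][j0 + j] += a_tile[i][k] * b_tile[k][j];
            }
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0f; B[i][j] = 2.0f; }
    matmul_tiled();
    printf("C[0][0] = %f\n", C[0][0]);   /* expect 128.0 for N = 64 */
    return 0;
}
```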
We also eliminate unneeded accuracy. It turns out we need much less precision here than we do for general-purpose computing: 8- to 16-bit integers, and 16- to 32-bit rather than 64-bit floating-point numbers. So we gain efficiency both by making the data items smaller and by making the arithmetic operations more efficient.
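A toy C example of the reduced-precision point (the scale factor and the assumption that values lie in [-1, 1] are mine, for illustration): the inputs shrink from 32-bit floats to 8-bit integers, the multiplies become integer operations, and a wider accumulator keeps the result usable.

```c
/* Toy reduced-precision dot product: quantize to int8, accumulate in int32,
 * then rescale; link with -lm. */
#include <stdint.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float   a[4]  = {0.10f, -0.50f, 0.25f, 0.90f};
    float   b[4]  = {0.30f,  0.40f, -0.20f, 0.70f};
    float   scale = 127.0f;            /* map [-1, 1] onto the int8 range */
    int8_t  qa[4], qb[4];
    int32_t acc   = 0;                 /* wide accumulator avoids overflow */

    for (int i = 0; i < 4; i++) {
        qa[i] = (int8_t)lrintf(a[i] * scale);
        qb[i] = (int8_t)lrintf(b[i] * scale);
        acc  += (int32_t)qa[i] * qb[i];
    }
    printf("int8 dot = %f (fp32 dot = %f)\n",
           acc / (scale * scale),
           a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]);
    return 0;
}
```

The quantized result comes out within about one percent of the fp32 value here, while the operands occupy a quarter of the memory.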
The key is that the domain-specific programming model matches the application to the processor. These are not general-purpose processors: you are not going to take a piece of C code, throw it on one of these processors, and be happy with the results. They are designed to match a particular class of applications, and that structure is determined by the interface of the domain-specific language and the underlying architecture.
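For illustration only, here is what such a narrow, domain-level interface might look like in C; the `dsa_` names and the tensor struct are invented, and the plain loops stand in for whatever the real hardware and its compiler would do. The point is that the application is written as a composition of coarse domain operations, not as arbitrary C the hardware would have to reverse-engineer.

```c
/* Hypothetical, simplified accelerator-style interface. */
#include <stddef.h>
#include <stdio.h>

typedef struct { float *data; size_t rows, cols; } dsa_tensor;

/* Reference ("CPU fallback") implementations; on a real domain-specific
 * processor these calls would be lowered onto the matrix hardware. */
static void dsa_matmul(const dsa_tensor *a, const dsa_tensor *b, dsa_tensor *out) {
    for (size_t i = 0; i < a->rows; i++)
        for (size_t j = 0; j < b->cols; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < a->cols; k++)
                acc += a->data[i * a->cols + k] * b->data[k * b->cols + j];
            out->data[i * out->cols + j] = acc;
        }
}

static void dsa_relu(dsa_tensor *t) {
    for (size_t i = 0; i < t->rows * t->cols; i++)
        if (t->data[i] < 0.0f) t->data[i] = 0.0f;
}

/* The application is expressed entirely through coarse domain ops. */
int main(void) {
    float xd[2] = {1.0f, 2.0f}, wd[4] = {0.5f, -1.0f, 2.0f, 3.0f}, yd[2];
    dsa_tensor x = {xd, 1, 2}, w = {wd, 2, 2}, y = {yd, 1, 2};
    dsa_matmul(&x, &w, &y);
    dsa_relu(&y);
    printf("%f %f\n", yd[0], yd[1]);   /* 4.5 5.0 */
    return 0;
}
```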