2021 | Google Board Chairman John Hennessy: The Pace of AI Technology Is Slowing and We Are in a Semiconductor Industry Winter | TMTPost T-EDGE (Part 7)
Anyone familiar with the books David Patterson and I co-authored knows that, in computer design, we believe in quantitative analysis grounded in engineering methodology. So what makes these domain-specific architectures more efficient?
First of all, they use a simple model of parallelism that works in a specific domain, which means they can have less control hardware. For example, we switch from the multiple-instruction, multiple-data model of a multicore to a single-instruction, multiple-data model. That dramatically improves the energy spent fetching instructions, because now we fetch one instruction rather than many.
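To make the instruction-fetch point concrete, here is a minimal C sketch (my illustration, not something from the talk) contrasting a scalar loop with an AVX version: the SIMD loop issues one vector add per eight elements, so far fewer instructions are fetched and decoded for the same work. It assumes an x86 machine and compilation with `-mavx`, and that the array length is a multiple of eight.

```c
/* Minimal sketch: one AVX add handles 8 floats, versus 8 scalar adds. */
#include <immintrin.h>
#include <stdio.h>

static void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)        /* one add instruction per element */
        out[i] = a[i] + b[i];
}

static void add_simd(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 8) {   /* one add instruction per 8 elements */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float out[8];
    add_scalar(a, b, out, 8);
    add_simd(a, b, out, 8);
    printf("%f\n", out[0]);            /* 9.0 either way */
    return 0;
}
```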
We also move to VLIW rather than speculative out-of-order mechanisms: techniques that rely on being able to analyze the code, know about its dependencies, and therefore create and structure the parallelism at compile time rather than doing it dynamically at runtime.
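As an illustration of what compile-time dependence analysis buys (a schematic C sketch, not anything Hennessy showed), the four multiplies below are visibly independent, so a VLIW compiler can pack them into one wide instruction word; an out-of-order core would rediscover the same fact in hardware, paying for that scheduling logic on every run.

```c
/* Schematic: no data dependences among the four multiplies, so they can be
 * bundled into one wide VLIW instruction at compile time. */
#include <stdio.h>

static void mul4(const float a[4], const float b[4], float out[4]) {
    out[0] = a[0] * b[0];   /* independent -> issue slot 0 */
    out[1] = a[1] * b[1];   /* independent -> issue slot 1 */
    out[2] = a[2] * b[2];   /* independent -> issue slot 2 */
    out[3] = a[3] * b[3];   /* independent -> issue slot 3 */
}

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, out[4];
    mul4(a, b, out);
    printf("%f %f %f %f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```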
Second, we make more effective use of memory bandwidth by going to user-controlled memory systems rather than caches. Caches are great, except when large amounts of data are streaming through them; then they are extremely inefficient, which is not what they were meant to do. Caches are meant to work when the program does repetitive things in a somewhat unpredictable fashion. Here we have repetitive operations in a very predictable fashion, but on very large amounts of data.
So we go to an alternative: we use prefetching and other techniques to move data into the memory inside the domain-specific processor, and once it is there we can make heavy use of it before moving it back to main memory.
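A rough C sketch of that idea (my illustration, not TPU code): the small static arrays stand in for software-managed on-chip memory, each tile is moved "on chip" once, and every element is reused TILE times before results go back to the large arrays that play the role of main memory.

```c
/* Sketch of software-managed data reuse (tiling). */
#include <stdio.h>

#define N    64
#define TILE 16

static float A[N][N], B[N][N], C[N][N];               /* "main memory"     */
static float a_tile[TILE][TILE], b_tile[TILE][TILE];  /* "on-chip" buffers */

static void matmul_tiled(void) {
    for (int i0 = 0; i0 < N; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < N; k0 += TILE) {
                /* "prefetch": copy the two tiles on chip once */
                for (int i = 0; i < TILE; i++)
                    for (int k = 0; k < TILE; k++)
                        a_tile[i][k] = A[i0 + i][k0 + k];
                for (int k = 0; k < TILE; k++)
                    for (int j = 0; j < TILE; j++)
                        b_tile[k][j] = B[k0 + k][j0 + j];
                /* heavy reuse before anything goes back to main memory */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++)
                        for (int k = 0; k < TILE; k++)
                            C[i0 + i][j0 + j] += a_tile[i][k] * b_tile[k][j];
            }
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) { A[i][j] = 1.0f; B[i][j] = 2.0f; }
    matmul_tiled();
    printf("C[0][0] = %f\n", C[0][0]);   /* expect 128.0 for N = 64 */
    return 0;
}
```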
We also eliminate unneeded accuracy. It turns out we need much less precision here than we do for general-purpose computing: 8- to 16-bit integers, and 16- to 32-bit rather than 64-bit floating-point numbers. So we gain efficiency both by making the data items smaller and by making the arithmetic operations more efficient.
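A toy C example of the reduced-precision point (the scale factor and the assumption that values lie in [-1, 1] are mine, for illustration): the inputs shrink from 32-bit floats to 8-bit integers, the multiplies become integer operations, and a wider accumulator keeps the result usable.

```c
/* Toy reduced-precision dot product: quantize to int8, accumulate in int32,
 * then rescale; link with -lm. */
#include <stdint.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float   a[4]  = {0.10f, -0.50f, 0.25f, 0.90f};
    float   b[4]  = {0.30f,  0.40f, -0.20f, 0.70f};
    float   scale = 127.0f;            /* map [-1, 1] onto the int8 range */
    int8_t  qa[4], qb[4];
    int32_t acc   = 0;                 /* wide accumulator avoids overflow */

    for (int i = 0; i < 4; i++) {
        qa[i] = (int8_t)lrintf(a[i] * scale);
        qb[i] = (int8_t)lrintf(b[i] * scale);
        acc  += (int32_t)qa[i] * qb[i];
    }
    printf("int8 dot = %f (fp32 dot = %f)\n",
           acc / (scale * scale),
           a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]);
    return 0;
}
```

The quantized result comes out within about one percent of the fp32 value here, while the operands occupy a quarter of the memory.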
The key is that the domain-specific programming model matches the application to the processor. These are not general-purpose processors: you are not going to take a piece of C code, throw it on one of these processors, and be happy with the results. They are designed to match a particular class of applications, and that structure is determined by the interface of the domain-specific language and the underlying architecture.
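For illustration only, here is what such a narrow, domain-level interface might look like in C; the `dsa_` names and the tensor struct are invented, and the plain loops stand in for whatever the real hardware and its compiler would do. The point is that the application is written as a composition of coarse domain operations, not as arbitrary C the hardware would have to reverse-engineer.

```c
/* Hypothetical, simplified accelerator-style interface. */
#include <stddef.h>
#include <stdio.h>

typedef struct { float *data; size_t rows, cols; } dsa_tensor;

/* Reference ("CPU fallback") implementations; on a real domain-specific
 * processor these calls would be lowered onto the matrix hardware. */
static void dsa_matmul(const dsa_tensor *a, const dsa_tensor *b, dsa_tensor *out) {
    for (size_t i = 0; i < a->rows; i++)
        for (size_t j = 0; j < b->cols; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < a->cols; k++)
                acc += a->data[i * a->cols + k] * b->data[k * b->cols + j];
            out->data[i * out->cols + j] = acc;
        }
}

static void dsa_relu(dsa_tensor *t) {
    for (size_t i = 0; i < t->rows * t->cols; i++)
        if (t->data[i] < 0.0f) t->data[i] = 0.0f;
}

/* The application is expressed entirely through coarse domain ops. */
int main(void) {
    float xd[2] = {1.0f, 2.0f}, wd[4] = {0.5f, -1.0f, 2.0f, 3.0f}, yd[2];
    dsa_tensor x = {xd, 1, 2}, w = {wd, 2, 2}, y = {yd, 1, 2};
    dsa_matmul(&x, &w, &y);
    dsa_relu(&y);
    printf("%f %f\n", yd[0], yd[1]);   /* 4.5 5.0 */
    return 0;
}
```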