TensorFlow 2.2.0-rc0: this update is a pleasant surprise!
AI editor: 我是小将
Google has just released TensorFlow 2.2 (2.2.0-rc0) at the TensorFlow Dev Summit. The 2.2 release touches many areas, but I think two changes in particular will delight everyone:
1. A synchronized BatchNormalization layer
The new synchronized BN layer, tf.keras.layers.experimental.SyncBatchNormalization, is a great helper for distributed training. Its interface is similar to the existing BatchNormalization layer:
tf.keras.layers.experimental.SyncBatchNormalization(
axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros', moving_variance_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99,
trainable=True, adjustment=None, name=None, **kwargs
)
Usage is as follows:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(16))
    # The new layer is added like any other Keras layer.
    model.add(tf.keras.layers.experimental.SyncBatchNormalization())
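To make this concrete, here is a minimal end-to-end sketch of training such a model under the same strategy. The dataset, layer sizes, and hyperparameters below are made up for illustration; only the SyncBatchNormalization layer and MirroredStrategy come from the release notes.

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.experimental.SyncBatchNormalization(),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

# Dummy data: each replica processes a shard of every batch, and the
# SyncBatchNormalization statistics are aggregated across all replicas.
x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, batch_size=64, epochs=1)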
2. Model.fit now supports custom training and testing logic
Model.fit now supports overriding the Model.train_step method, which lets us plug custom training logic into fit(). Here is the default implementation:
def train_step(self, data):
    """The logic for one training step.

    This method can be overridden to support custom training logic.
    This method is called by `Model._make_train_function`.
    This method should contain the mathematical logic for one step of training.
    This typically includes the forward pass, loss calculation, backpropagation,
    and metric updates.

    Configuration details for *how* this logic is run (e.g. `tf.function` and
    `tf.distribute.Strategy` settings), should be left to
    `Model._make_train_function`, which can also be overridden.

    Arguments:
      data: A nested structure of `Tensor`s.

    Returns:
      A `dict` containing values that will be passed to
      `tf.keras.callbacks.CallbackList.on_train_batch_end`. Typically, the
      values of the `Model`'s metrics are returned. Example:
      `{'loss': 0.2, 'accuracy': 0.7}`.
    """
    # These are the only transformations `Model.fit` applies to user-input
    # data when a `tf.data.Dataset` is provided. These utilities will be exposed
    # publicly.
    data = data_adapter.expand_1d(data)
    x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

    with backprop.GradientTape() as tape:
        y_pred = self(x, training=True)
        loss = self.compiled_loss(
            y, y_pred, sample_weight, regularization_losses=self.losses)

    # For custom training steps, users can just write:
    #   trainable_variables = self.trainable_variables
    #   gradients = tape.gradient(loss, trainable_variables)
    #   self.optimizer.apply_gradients(zip(gradients, trainable_variables))
    # The _minimize call does a few extra steps unnecessary in most cases,
    # such as loss scaling and gradient clipping.
    _minimize(tape, self.optimizer, loss, self.trainable_variables)

    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}
The benefit is that we can use Model.fit far more flexibly to train our own models. Model also exposes Model.test_step and Model.predict_step for customizing evaluation and prediction logic. I think this will definitely appeal to TF users.
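For example, here is a minimal sketch of a Model subclass that overrides train_step with a plain GradientTape loop while still getting callbacks, distribution, and data handling from Model.fit. The class name, layer sizes, and dummy data are my own, not from the release notes.

import numpy as np
import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data  # assumes fit() is fed (x, y) pairs
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

x = np.random.rand(128, 8).astype("float32")
y = np.random.rand(128, 1).astype("float32")
model.fit(x, y, epochs=1)  # callbacks, looping, etc. are still handled by fit()

Overriding Model.test_step and Model.predict_step works the same way for evaluate() and predict().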
The main updates and improvements are as follows:
- Replaced the scalar type for string tensors from std::string to tensorflow::tstring, which is now ABI stable.
- A new Profiler for TF 2 for CPU/GPU/TPU. It offers both device and host performance analysis, including input pipeline and TF Ops. Optimization advisory is provided whenever possible. Please see this tutorial for usage guidelines.
- Export C++ functions to Python using pybind11 as opposed to SWIG as a part of our deprecation of swig efforts.
- tf.distribute:
  - Update NVIDIA NCCL to 2.5.7-1 for better performance and performance tuning. Please see the nccl developer guide for more information on this.
  - Support gradient allreduce in float16. See this example usage.
  - Experimental support of all-reduce gradient packing to allow overlapping gradient aggregation with backward path computation.
  - Support added for global sync BatchNormalization by using the newly added tf.keras.layers.experimental.SyncBatchNormalization layer. This layer will sync BatchNormalization statistics every step across all replicas taking part in sync training.
  - Performance improvements for GPU multi-worker distributed training using tf.distribute.experimental.MultiWorkerMirroredStrategy.
- tf.keras:
  - Model.fit major improvements:
    - You can now use custom training logic with Model.fit by overriding Model.train_step.
    - Easily write state-of-the-art training loops without worrying about all of the features Model.fit handles for you (distribution strategies, callbacks, data formats, looping logic, etc.).
    - See the default Model.train_step for an example of what this function should look like. The same applies for validation and inference via Model.test_step and Model.predict_step.
  - The SavedModel format now supports all Keras built-in layers (including metrics, preprocessing layers, and stateful RNN layers).
- tf.lite:
  - Enable TFLite experimental new converter by default.
- XLA:
  - XLA now builds and works on Windows. All prebuilt packages come with XLA available.
  - XLA can be enabled for a tf.function with "compile or throw exception" semantics on CPU and GPU (a short sketch follows this list).
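As a hedged illustration of the second XLA point: in the TF 2.2 era, per-function XLA compilation could be requested with the experimental_compile flag of tf.function (an assumption about the exact flag name of that release; it was later renamed). The function below is made up.

import tensorflow as tf

@tf.function(experimental_compile=True)  # "compile or throw exception" semantics
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([4, 8])
w = tf.random.normal([8, 16])
b = tf.zeros([16])
print(dense_relu(x, w, b).shape)  # (4, 16), compiled with XLA on CPU or GPU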
- tf.keras:
  - In tf.keras.applications the name of the "top" layer has been standardized to "predictions". This is only a problem if your code relies on the exact name of the layer.
  - The Huber loss function has been updated to be consistent with other Keras losses. It now computes the mean over the last axis of per-sample losses before applying the reduction function.
- AutoGraph no longer converts functions passed to tf.py_function, tf.py_func and tf.numpy_function.
- Deprecating XLA_CPU and XLA_GPU devices with this release.
- Increasing the minimum bazel version to build TF to 1.2.1 to use Bazel's cc_experimental_shared_library.
- macOS binaries are not available on PyPI in the tensorflow-cpu project, but they are identical to the binaries in the tensorflow project, since macOS has no GPU.
- tf.data:
  - Removed autotune_algorithm from experimental optimization options.
- TF Core:
  - tf.constant always creates CPU tensors irrespective of the current device context.
  - Eager TensorHandles maintain a list of mirrors for any copies to local or remote devices. This avoids any redundant copies due to op execution.
  - For tf.Tensor and tf.Variable, .experimental_ref() is no longer experimental and is available as simply .ref() (a short sketch follows this list).
  - Support matrix inverse and solves in pfor/vectorized_map.
  - Set as much partial shape as we can infer statically within the gradient impl of the gather op.
  - Gradient of tf.while_loop emits a StatelessWhile op if cond and body functions are stateless. This allows multiple gradient while ops to run in parallel under distribution strategy.
  - Speed up GradientTape in eager mode by auto-generating the list of op inputs/outputs which are unused and hence not cached for gradient functions.
  - Support back_prop=False in while_v2 but mark it as deprecated.
  - Improve error message when attempting to use None in data-dependent control flow.
  - Add RaggedTensor.numpy().
  - Update RaggedTensor.__getitem__ to preserve uniform dimensions and allow indexing into uniform dimensions.
  - Update tf.expand_dims to always insert the new dimension as a non-ragged dimension.
  - Update tf.embedding_lookup to use partition_strategy and max_norm when ids is ragged.
  - Allow batch_dims==rank(indices) in tf.gather.
  - Add support for bfloat16 in tf.print.
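A small sketch of the .ref() change called out above: references are hashable, so tensors and variables can be used as dictionary keys or set members, and deref() recovers the original object. The variables and dictionary here are made up for illustration.

import tensorflow as tf

v = tf.Variable(1.0)
w = tf.Variable(2.0)

# .ref() (formerly .experimental_ref()) returns a hashable reference.
accumulators = {v.ref(): 0.0, w.ref(): 0.0}
accumulators[v.ref()] += 10.0

print(accumulators[v.ref()])   # 10.0
print(v.ref().deref() is v)    # True: deref() returns the original variable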
- tf.distribute:
  - Support embedding_column with variable-length input features for MultiWorkerMirroredStrategy.
- tf.keras:
  - Added all_reduce_sum_gradients argument to tf.keras.optimizers.Optimizer.apply_gradients. This allows custom gradient aggregation and processing aggregated gradients in a custom training loop.
  - Allow pathlib.Path paths for loading models via the Keras API (a short sketch follows this list).
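As a small illustration of the pathlib.Path change, a path object can now be handed to the Keras loading API directly instead of being converted to str first. The model and directory name below are made up.

import pathlib
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.save("my_model")                        # SavedModel directory

path = pathlib.Path("my_model")
restored = tf.keras.models.load_model(path)   # pathlib.Path is accepted directly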
- tf.function/AutoGraph:
  - AutoGraph is now available in ReplicaContext.merge_call, Strategy.extended.update and Strategy.extended.update_non_slot.
  - Experimental support for shape invariants has been enabled in tf.function. See the API docs for tf.autograph.experimental.set_loop_options for additional info (a short sketch follows this list).
  - AutoGraph error messages now exclude frames corresponding to APIs internal to AutoGraph.
  - Improve shape inference for tf.function input arguments to unlock more Grappler optimizations in TensorFlow 2.x.
  - Improve automatic control dependency management of resources by allowing resource reads to occur in parallel and synchronizing only on writes.
  - Fix execution order of multiple stateful calls to experimental_run_v2 in tf.function.
  - You can now iterate over RaggedTensors using a for loop inside tf.function.
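A minimal sketch of the shape-invariant support, using a made-up growing-tensor loop: set_loop_options is called at the top of the loop body to relax the shape constraint on x, which changes length every iteration.

import tensorflow as tf

@tf.function
def repeat_concat(n):
    x = tf.constant([1.0])
    for _ in tf.range(n):
        # Without this, AutoGraph requires x to keep the same shape across iterations.
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(x, tf.TensorShape([None]))])
        x = tf.concat([x, x], axis=0)
    return x

print(repeat_concat(3).shape)  # (8,): the vector doubles on each of the 3 iterations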
- tf.lite:
  - Migrated the tf.lite C inference API out of experimental into lite/c.
  - Add an option to disallow NNAPI CPU / partial acceleration on Android 10.
  - TFLite Android AARs now include the C headers and APIs that are required to use TFLite from native code.
  - Refactors the delegate and delegate kernel sources to allow usage in the linter.
  - Limit delegated ops to actually supported ones if a device name is specified or NNAPI CPU Fallback is disabled.
  - TFLite now supports tf.math.reciprocal1 op by lowering to tf.div op.
  - TFLite's unpack op now supports boolean tensor inputs.
  - Microcontroller and embedded code moved from experimental to the main TensorFlow Lite folder.
  - Check for large TFLite tensors.
  - Fix GPU delegate crash with C++17.
  - Add 5D support to TFLite strided_slice.
  - Fix error in delegation of DEPTH_TO_SPACE to NNAPI causing op not to be accelerated.
  - Fix segmentation fault when running a model with LSTM nodes using the NNAPI Delegate.
  - Fix NNAPI delegate failure when an operand for Maximum/Minimum operation is a scalar.
  - Fix NNAPI delegate failure when Axis input for reduce operation is a scalar.
  - Expose option to limit the number of partitions that will be delegated to NNAPI.
  - If a target accelerator is specified, use its feature level to determine operations to delegate instead of SDK version.
- tf.random:
  - Various random number generation improvements:
    - Add a fast path for default random_uniform.
    - random_seed documentation improvement.
    - RandomBinomial broadcasts and appends the sample shape to the left rather than the right.
  - Added tf.random.stateless_binomial, tf.random.stateless_gamma, tf.random.stateless_poisson.
  - tf.random.stateless_uniform now supports unbounded sampling of int types (a short sketch follows this list).
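A hedged sketch of the new stateless samplers: given the same explicit seed they always return the same values, which is useful for reproducible pipelines. The shapes and distribution parameters below are arbitrary, and the keyword names follow my reading of the 2.2-era API.

import tensorflow as tf

seed = [1, 2]  # stateless ops take an explicit two-element seed

b = tf.random.stateless_binomial(shape=[5], seed=seed, counts=10., probs=0.3)
g = tf.random.stateless_gamma(shape=[5], seed=seed, alpha=2.0)
p = tf.random.stateless_poisson(shape=[5], seed=seed, lam=4.0)

# Unbounded int sampling: omit minval/maxval to draw from the full integer range.
u = tf.random.stateless_uniform(
    [5], seed=seed, minval=None, maxval=None, dtype=tf.int64)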
- Math and Linear Algebra:
  - Add tf.linalg.LinearOperatorTridiag.
  - Add LinearOperatorBlockLowerTriangular.
  - Add broadcasting support to tf.linalg.triangular_solve (#26204) and tf.math.invert_permutation.
  - Add tf.math.sobol_sample op.
  - Add tf.math.xlog1py (a short sketch follows this list).
  - Add tf.math.special.{dawsn,expi,fresnel_cos,fresnel_sin,spence}.
  - Add a Modified Discrete Cosine Transform (MDCT) and its inverse to tf.signal.
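A quick sketch of what xlog1py buys you: it computes x * log1p(y) but returns 0 where x is 0, even if log1p(y) is undefined there, which avoids NaNs in entropy-style expressions. The example values are arbitrary.

import tensorflow as tf

x = tf.constant([0.0, 1.0, 2.0])
y = tf.constant([-1.0, 0.5, 1.0])

print(tf.math.xlog1py(x, y))   # [0., log(1.5), 2*log(2)]
print(x * tf.math.log1p(y))    # the first element is nan, not 0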
- TPU Enhancements:
  - Refactor TpuClusterResolver to move shared logic to a separate pip package.
  - Support configuring TPU software version from cloud tpu client.
  - Allowed TPU embedding weight decay factor to be multiplied by learning rate.
- XLA Support:
  - Add standalone XLA AOT runtime target + relevant .cc sources to pip package.
  - Add check for memory alignment to MemoryAllocation::MemoryAllocation() on 32-bit ARM. This ensures a deterministic early exit instead of a hard-to-debug bus error later.
  - saved_model_cli aot_compile_cpu allows you to compile saved models to XLA header+object files and include them in your C++ programs.
  - Enable Igamma, Igammac for XLA.
  - The XLA reduction emitter is deterministic when the environment variable TF_DETERMINISTIC_OPS is set.
- Tracing and Debugging:
  - Add source, destination name to the _send traceme to allow easier debugging.
  - Add traceme event to fastpathexecute.
- Other:
  - Fix an issue with AUC.reset_states for multi-label AUC (#35852).
  - Fix the TF upgrade script to not delete files when there is a parsing error and the output mode is in-place.
  - Move tensorflow/core:framework/*_pyclif rules to tensorflow/core/framework:*_pyclif.