How to Understand the Generative Model VAE? (Part 5)


Denoising Autoencoders
Fashion MNIST
In the first exercise, we add some random noise (salt-and-pepper noise) to the Fashion MNIST dataset and then try to remove it with a denoising autoencoder. First comes the preprocessing: download the data, normalize and reshape it, and then add the noise.
from tensorflow.keras import datasets
from imgaug import augmenters

## Download the data
(x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()

## normalize and reshape
x_train = x_train / 255.
x_test = x_test / 255.
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Let's add sample noise - Salt and Pepper
noise = augmenters.SaltAndPepper(0.1)
seq_object = augmenters.Sequential([noise])
train_x_n = seq_object.augment_images(x_train * 255) / 255
val_x_n = seq_object.augment_images(x_test * 255) / 255
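If the imgaug library isn't available, the same kind of corruption can be reproduced with plain NumPy. This is only a minimal sketch, not part of the original code; the salt_and_pepper helper is hypothetical, and the 10% rate mirrors the SaltAndPepper(0.1) setting above.

import numpy as np

def salt_and_pepper(images, amount=0.1, seed=0):
    """Set a random `amount` fraction of pixels to 0 (pepper) or 1 (salt)."""
    rng = np.random.default_rng(seed)
    noisy = images.copy()
    corrupt = rng.random(images.shape) < amount   # pixels to corrupt
    salt = rng.random(images.shape) < 0.5         # half salt, half pepper
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy

train_x_n = salt_and_pepper(x_train)  # plays the same role as the imgaug version
val_x_n = salt_and_pepper(x_test)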
Next, we build the structure of the autoencoder network: a stack of convolutional layers, with max-pooling layers in the encoder network and upsampling layers in the decoder network.
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping

# input layer
input_layer = Input(shape=(28, 28, 1))

# encoding architecture
encoded_layer1 = Conv2D(64, (3, 3), activation='relu', padding='same')(input_layer)
encoded_layer1 = MaxPool2D((2, 2), padding='same')(encoded_layer1)
encoded_layer2 = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded_layer1)
encoded_layer2 = MaxPool2D((2, 2), padding='same')(encoded_layer2)
encoded_layer3 = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded_layer2)
latent_view = MaxPool2D((2, 2), padding='same')(encoded_layer3)

# decoding architecture
decoded_layer1 = Conv2D(16, (3, 3), activation='relu', padding='same')(latent_view)
decoded_layer1 = UpSampling2D((2, 2))(decoded_layer1)
decoded_layer2 = Conv2D(32, (3, 3), activation='relu', padding='same')(decoded_layer1)
decoded_layer2 = UpSampling2D((2, 2))(decoded_layer2)
# 'valid' padding here trims 16x16 down to 14x14 so the final upsample restores 28x28
decoded_layer3 = Conv2D(64, (3, 3), activation='relu')(decoded_layer2)
decoded_layer3 = UpSampling2D((2, 2))(decoded_layer3)
output_layer = Conv2D(1, (3, 3), padding='same', activation='sigmoid')(decoded_layer3)

# compile the model
model = Model(input_layer, output_layer)
model.compile(optimizer='adam', loss='mse')

# run the model
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=5, mode='auto')
history = model.fit(train_x_n, x_train,
                    epochs=20,
                    batch_size=2048,
                    validation_data=(val_x_n, x_test),
                    callbacks=[early_stopping])
Here are the input images, the images with noise added, and the output images.

[Figure: input images from Fashion MNIST.]
[Figure: input images with salt-and-pepper noise added.]
[Figure: output images from the denoising network.]
As you can see, we successfully removed a fair amount of noise from the corrupted images, but we also lost some resolution in the fine detail of the clothing. That is one of the costs of using a robust network. The network can be tuned so that the final output is more representative of the input images.
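To reproduce a comparison like the figures above, you can run the trained model on a few noisy validation images and plot the three rows side by side. This is a minimal sketch, assuming the model, val_x_n, and x_test defined earlier and using matplotlib; it is not part of the original code.

import matplotlib.pyplot as plt

preds = model.predict(val_x_n[:5])

fig, axes = plt.subplots(3, 5, figsize=(10, 6))
for i in range(5):
    axes[0, i].imshow(x_test[i].squeeze(), cmap='gray')   # original
    axes[1, i].imshow(val_x_n[i].squeeze(), cmap='gray')  # with noise
    axes[2, i].imshow(preds[i].squeeze(), cmap='gray')    # denoised output
for ax in axes.ravel():
    ax.axis('off')
plt.show()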
Text Cleaning
The second denoising-autoencoder example involves cleaning creases and dark areas out of scanned images of text. Here are the input and output images we end up with:
[Figure: noisy input images of text data.]
[Figure: cleaned text images.]
The data preprocessing for this task is a bit more involved, so I won't cover it here; the preprocessing steps and the associated data are available in the GitHub repository. The network structure is as follows:
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.optimizers import Adam

input_layer = Input(shape=(258, 540, 1))

# encoder
encoder = Conv2D(64, (3, 3), activation='relu', padding='same')(input_layer)
encoder = MaxPooling2D((2, 2), padding='same')(encoder)

# decoder
decoder = Conv2D(64, (3, 3), activation='relu', padding='same')(encoder)
decoder = UpSampling2D((2, 2))(decoder)
output_layer = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(decoder)

ae = Model(input_layer, output_layer)
ae.compile(loss='mse', optimizer=Adam(learning_rate=0.001))

batch_size = 16
epochs = 200
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1, mode='auto')
history = ae.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 validation_data=(x_val, y_val),
                 callbacks=[early_stopping])
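Once trained, cleaning a new scan is a single forward pass through the autoencoder. A minimal sketch follows; the use of Pillow for image I/O and the file names are purely illustrative assumptions, not part of the original code.

import numpy as np
from PIL import Image

# load a scan, resize to the network's input size, and scale to [0, 1]
img = Image.open('noisy_scan.png').convert('L').resize((540, 258))
arr = np.asarray(img, dtype='float32') / 255.
arr = arr.reshape(1, 258, 540, 1)      # batch of one

cleaned = ae.predict(arr)[0, :, :, 0]  # back to a 2-D image
Image.fromarray((cleaned * 255).astype('uint8')).save('cleaned_scan.png')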
Variational Autoencoder
And now for the grand finale: trying to generate new images of clothing from the existing garments in the Fashion MNIST dataset.
The neural architecture here is more complex and includes a sampling layer called a 'Lambda' layer. This layer implements the reparameterization trick: rather than sampling z directly from N(mu, sigma^2), we sample epsilon from N(0, 1) and compute z = mu + exp(log_sigma) * epsilon, which lets gradients flow back through the otherwise non-differentiable sampling step.
import numpy as np
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, Dense,
                                     Flatten, Reshape, Lambda, Layer)
from tensorflow.keras.models import Model
from tensorflow.keras.metrics import binary_crossentropy

batch_size = 16
latent_dim = 2  # Number of latent dimension parameters

# ENCODER ARCHITECTURE: Input -> Conv2D*4 -> Flatten -> Dense
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, 3, padding='same', activation='relu')(input_img)
x = Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(64, 3, padding='same', activation='relu')(x)
x = Conv2D(64, 3, padding='same', activation='relu')(x)

# need to know the shape of the network here for the decoder
shape_before_flattening = K.int_shape(x)

x = Flatten()(x)
x = Dense(32, activation='relu')(x)

# Two outputs: latent mean and (log) variance
z_mu = Dense(latent_dim)(x)
z_log_sigma = Dense(latent_dim)(x)

## SAMPLING FUNCTION
def sampling(args):
    z_mu, z_log_sigma = args
    epsilon = K.random_normal(shape=(K.shape(z_mu)[0], latent_dim),
                              mean=0., stddev=1.)
    return z_mu + K.exp(z_log_sigma) * epsilon

# sample vector from the latent distribution
z = Lambda(sampling)([z_mu, z_log_sigma])

## DECODER ARCHITECTURE
# decoder takes the latent distribution sample as input
decoder_input = Input(K.int_shape(z)[1:])

# expand back to the pre-flatten volume (14 x 14 x 64)
x = Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)

# reshape
x = Reshape(shape_before_flattening[1:])(x)

# use Conv2DTranspose to reverse the conv layers from the encoder
x = Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = Conv2D(1, 3, padding='same', activation='sigmoid')(x)

# decoder model statement
decoder = Model(decoder_input, x)

# apply the decoder to the sample from the latent distribution
z_decoded = decoder(z)

That's the architecture, but we still need to add the loss function, which folds in the KL divergence:

# construct a custom layer to calculate the loss
class CustomVariationalLayer(Layer):

    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        # Reconstruction loss
        xent_loss = binary_crossentropy(x, z_decoded)
        # KL divergence
        kl_loss = -5e-4 * K.mean(1 + z_log_sigma - K.square(z_mu) - K.exp(z_log_sigma), axis=-1)
        return K.mean(xent_loss + kl_loss)

    # adds the custom loss to the layer
    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        return x

# apply the custom loss to the input images and the decoded latent distribution sample
y = CustomVariationalLayer()([input_img, z_decoded])

# VAE model statement
vae = Model(input_img, y)
vae.compile(optimizer='rmsprop', loss=None)

# train_x / val_x are the normalized Fashion MNIST arrays (x_train / x_test) from the first example
vae.fit(x=train_x, y=None,
        shuffle=True,
        epochs=20,
        batch_size=batch_size,
        validation_data=(val_x, None))
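Once the VAE is trained, new garments come from decoding points in the latent space. Because latent_dim is 2, we can sweep a regular grid of latent values and tile the decoded outputs to visualize the learned manifold. This is a minimal sketch, assuming matplotlib and the decoder model defined above; the grid range of -2 to 2 is an arbitrary choice, not part of the original code.

import matplotlib.pyplot as plt

n = 15  # 15 x 15 grid of generated images
figure = np.zeros((28 * n, 28 * n))

# sweep the 2-D latent space on a regular grid and decode each point
grid = np.linspace(-2, 2, n)
for i, yi in enumerate(grid):
    for j, xi in enumerate(grid):
        z_sample = np.array([[xi, yi]])
        decoded = decoder.predict(z_sample)  # shape (1, 28, 28, 1)
        figure[i * 28:(i + 1) * 28, j * 28:(j + 1) * 28] = decoded[0, :, :, 0]

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='gray')
plt.axis('off')
plt.show()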
