Reinforcement Learning: MountainCarContinuous (Registering Your Own Gym Environment)
Contents
- 1. Problem overview
- 2. Environment
- 2.1 Observation & state
- 2.2 Actions
- 2.3 Reward
- 2.4 Starting state
- 2.5 Episode Termination
- 2.6 Solved Requirements
- 3. Code
- 3.1 Import libraries
- 3.2 Define the Continuous_MountainCarEnv class
- 3.2.1 Define the __init__(self) function
- 3.2.2 Define the random-seed function seed(self, seed=None)
- 3.2.3 Define the step(self, action) function
- 3.2.4 Define the reset() function
- 3.2.5 Define the _height(self, xs) function
- 3.2.6 Define the render(self, mode='human') function
- 3.2.7 Define the close(self) function
- 4. Running
- 4.1 Complete code: continuous_mountain_car.py
- 4.2 Register the environment
- 4.3 Create the run script: MountainCarContinuous.py
- 5. References
![Reinforcement Learning: MountainCarContinuous (Registering Your Own Gym Environment)](https://img.it610.com/image/info8/30c1aaf93f49480989764c2e6d42ee92.jpg)
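1. Problem overview
Problem: MountainCarContinuous-v0
Source code: https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py
Details: An underpowered car must climb a one-dimensional hill to reach a target. Unlike MountainCar-v0, MountainCarContinuous-v0 allows the action (the applied engine force) to take continuous values.
The target is on top of a hill to the right of the car. If the car reaches it or goes beyond it, the episode terminates.
On the left there is another hill. Climbing it can be used to gain potential energy and accelerate toward the target. On top of this second hill the car cannot go further than a position of -1, as if there were a wall (the code below clamps the position at min_position = -1.2). Hitting this limit does not produce a penalty (it might in a more challenging version) [1].
Type: continuous control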
2. Environment
2.1 Observation & state
Num | Observation | Min | Max
---|---|---|---
0 | Position | -1.2 | 0.6
1 | Velocity | -0.07 | 0.07
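Note: the observation is a function of the state. The two are sometimes identical and sometimes different. In this example they are the same, whereas in Pendulum-v0 the observation (cos θ, sin θ, angular velocity) is computed from the state (θ, angular velocity).
2.2 Actions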
Num | Action | Min | Max
---|---|---|---
0 | Push the car to the left (negative value) or to the right (positive value) | -1.0 | 1.0
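The action is a single number, and step() only uses it after clipping it to [-1, 1] and multiplying it by power = 0.0015. The short sketch below isolates that mapping in plain Python; the helper name action_to_force is hypothetical, and the constants are copied from __init__.

```python
# Hypothetical helper mirroring the action handling in step():
# clip the raw action to [MIN_ACTION, MAX_ACTION], then scale it by POWER.
MIN_ACTION = -1.0
MAX_ACTION = 1.0
POWER = 0.0015


def action_to_force(action: float) -> float:
    """Clip an action to [-1, 1] and turn it into a change of velocity."""
    clipped = min(max(action, MIN_ACTION), MAX_ACTION)
    return clipped * POWER
```

Out-of-range actions such as 3.0 or -3.0 are therefore equivalent to full throttle (1.0 or -1.0) as far as the dynamics are concerned.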
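2.3 Reward
Reaching the goal earns a reward of 100, and on every step 0.1 × action² is subtracted as the cost of the energy spent.
Note that this reward is unusual with respect to most published work, where the goal was to reach the target as fast as possible, hence favouring a bang-bang strategy.
For other reward formulations, see the Leaderboard.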
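To make the reward rule concrete, here is a stdlib-only sketch that assumes the per-step rule implemented in step(): +100 on the step that reaches the goal, minus 0.1 × action² on every step. The function names are hypothetical. It also makes the exploration difficulty visible: an episode that never reaches the flag scores 0 at best, which is exactly what doing nothing earns.

```python
from typing import Sequence


def step_reward(action: float, reached_goal: bool) -> float:
    """Reward for one step: +100 if the goal was reached, minus the energy cost."""
    reward = 100.0 if reached_goal else 0.0
    reward -= 0.1 * action ** 2  # energy penalty, applied on every step
    return reward


def episode_return(actions: Sequence[float], goal_reached_at_end: bool = True) -> float:
    """Sum of step rewards, assuming the goal (if any) is reached on the last step."""
    total = 0.0
    last = len(actions) - 1
    for i, a in enumerate(actions):
        total += step_reward(a, goal_reached_at_end and i == last)
    return total
```

For example, reaching the flag after 100 full-throttle steps (action = ±1) returns about 90, right at the solved threshold in 2.6, so a successful policy has to be reasonably economical with energy.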
2.4 Starting state
The position is a uniform random value between -0.6 and -0.4, and the velocity is 0.
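A minimal sketch of this starting distribution, assuming Python's random.Random as the generator (the environment itself uses the NumPy generator created by seed()); sample_initial_state is a hypothetical name.

```python
import random
from typing import Tuple


def sample_initial_state(rng: random.Random) -> Tuple[float, float]:
    """Position uniform in [-0.6, -0.4], velocity exactly 0 (as in reset())."""
    return (rng.uniform(-0.6, -0.4), 0.0)
```

Passing the generator in explicitly, instead of using the global random module, keeps episodes reproducible when the generator is seeded.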
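2.5 Episode Termination
The position reaches 0.5 (this value may be adjusted; the code below uses goal_position = 0.45). A constraint on velocity could be added in a more challenging version.
Adding a maximum number of steps might be a good idea.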
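One way to add such a limit without touching the environment is a wrapper that counts steps and forces done=True once the budget is used up. The sketch below is a hypothetical StepLimit class, assuming the old gym API used throughout this article (step() returns (obs, reward, done, info)); its default of 200 steps follows the "Episode length is greater than 200" line in the docstring of the complete code in 4.1. gym ships its own TimeLimit wrapper for this purpose; this is not its implementation.

```python
class StepLimit:
    """Hypothetical wrapper: end an episode after max_steps even if the goal is not reached."""

    def __init__(self, env, max_steps: int = 200):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0  # a new episode gets a fresh step budget
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.max_steps and not done:
            # Flag that the limit, not the goal, ended the episode.
            info = dict(info, truncated=True)
            done = True
        return obs, reward, done, info
```

Recording the truncation in info lets a learning algorithm tell "ran out of time" apart from "reached the flag".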
2.6 Solved Requirements
Get a reward over 90. This value might be tuned.
3. Code
3.1 Import libraries
import math
import gym
from gym import spaces
from gym.utils import seeding
import numpy as np
3.2 Define the Continuous_MountainCarEnv class

class Continuous_MountainCarEnv(gym.Env):
    metadata = {'render.modes': ['human', 'rgb_array'],
                'video.frames_per_second': 30
                }
3.2.1 Define the __init__(self) function

    def __init__(self):
        self.min_action = -1.0      # minimum action value
        self.max_action = 1.0       # maximum action value
        self.min_position = -1.2    # lowest position
        self.max_position = 0.6     # highest position
        self.max_speed = 0.07       # maximum speed
        self.goal_position = 0.45   # was 0.5 in gym, 0.45 in Arnaud de Broissia's version
        self.power = 0.0015         # scales the action into a change of velocity

        self.low_state = np.array([self.min_position, -self.max_speed], dtype=np.float32)  # [-1.2, -0.07]
        self.high_state = np.array([self.max_position, self.max_speed], dtype=np.float32)  # [0.6, 0.07]

        self.viewer = None

        # Declare the bounds of the action space and the observation space
        self.action_space = spaces.Box(low=self.min_action, high=self.max_action, shape=(1,), dtype=np.float32)
        # action bounds: low = -1.0, high = 1.0
        self.observation_space = spaces.Box(low=self.low_state, high=self.high_state, dtype=np.float32)
        # observation bounds: low = [-1.2, -0.07], high = [0.6, 0.07]

        self.seed()
        self.reset()
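gym.spaces.Box is what enforces these bounds. To show what the two declarations mean without installing gym, here is a hypothetical stdlib-only SimpleBox with per-dimension low/high bounds, a contains() check and a sample() method. gym's real Box works on NumPy arrays and offers more options; this is only an illustration of the idea.

```python
import random
from typing import List, Sequence


class SimpleBox:
    """Hypothetical stand-in for gym.spaces.Box: one [low, high] interval per dimension."""

    def __init__(self, low: Sequence[float], high: Sequence[float]):
        if len(low) != len(high):
            raise ValueError("low and high must have the same length")
        self.low = list(low)
        self.high = list(high)

    def contains(self, x: Sequence[float]) -> bool:
        """True if x has the right length and every component lies within its bounds."""
        return len(x) == len(self.low) and all(
            lo <= v <= hi for v, lo, hi in zip(x, self.low, self.high)
        )

    def sample(self, rng: random.Random) -> List[float]:
        """Draw each component uniformly from its interval."""
        return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]


# The two spaces declared in __init__:
action_space = SimpleBox([-1.0], [1.0])
observation_space = SimpleBox([-1.2, -0.07], [0.6, 0.07])
```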
3.2.2 Define the random-seed function seed(self, seed=None)

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
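seeding.np_random(seed) returns a random generator together with the seed that was actually used, which is what lets a run be replayed. The sketch below mimics that contract with the standard library. np_random_like is a hypothetical name, it returns random.Random instead of a NumPy generator, and drawing a fresh seed from the secrets module when seed is None is an assumption about the behaviour rather than gym's exact code.

```python
import random
import secrets
from typing import Optional, Tuple


def np_random_like(seed: Optional[int] = None) -> Tuple[random.Random, int]:
    """Return a dedicated RNG plus the seed used, so the same run can be reproduced."""
    if seed is None:
        seed = secrets.randbits(32)  # fresh seed from the OS entropy source
    return random.Random(seed), seed
```

Two generators created with the same seed produce the same sequence of starting positions, which is the property that seed() provides to reset().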
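3.2.3 Define the step(self, action) function
The step() function plays the role of the physics engine in the simulator. Its input is the action; its outputs are the next state, the immediate reward, whether the episode has terminated, and debugging information. This function describes everything about how the agent interacts with the environment and is the most important function in an environment file. It typically uses the agent's kinematic and dynamic models to compute the next state and the immediate reward, and checks whether a terminal state has been reached.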
    def step(self, action):
        position = self.state[0]                                              # (1)
        velocity = self.state[1]                                              # (2)
        # equivalently: position, velocity = self.state
        force = min(max(action[0], -1.0), 1.0)                                # (3)

        velocity += force * self.power - 0.0025 * math.cos(3 * position)      # (4)
        if (velocity > self.max_speed): velocity = self.max_speed             # (5)
        if (velocity < -self.max_speed): velocity = -self.max_speed           # (6)
        position += velocity                                                  # (7)
        if (position > self.max_position): position = self.max_position      # (8)
        if (position < self.min_position): position = self.min_position      # (9)
        if (position == self.min_position and velocity < 0): velocity = 0     # (10)

        done = bool(position >= self.goal_position)                           # (11)

        reward = 0                                                            # (12)
        if done:                                                              # (13)
            reward = 100.0                                                    # (14)
        reward -= math.pow(action[0], 2) * 0.1                               # (15)

        self.state = np.array([position, velocity])                          # (16)
        return self.state, reward, done, {}                                   # (17)
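- (1) Read the current position from the state.
- (2) Read the current velocity from the state.
- (3) Engine force: the inner max(action[0], -1.0) keeps the action from going below the lower bound -1.0, and the outer min(max(action[0], -1.0), 1.0) keeps it from going above the upper bound 1.0.
- (4) Update the velocity. Note that the velocity accumulates from step to step: the continuous motion is approximated by splitting it into many small discrete time steps.
- (5) If the current velocity is above the maximum speed, set it to the maximum speed.
- (6) If the current velocity is below the minimum speed (-max_speed), set it to -max_speed.
- (7) Update the position.
- (8) If the current position is above the highest position, set it to the highest position.
- (9) If the current position is below the lowest position, set it to the lowest position.
- (10) If the car is at the lowest position and its velocity is negative, set the velocity to 0 (the car stops at the left "wall").
- (11) Compute done as a Python bool: True if the goal position has been reached, otherwise False.
- (12) Initialize reward = 0.
- (13) If the car's position is at or beyond the goal position,
- (14) give the agent a reward of 100.
- (15) Subtract the energy cost 0.1 × action[0]² (on every step, whether or not the goal was reached).
- (16) This is the new state after the action has been applied.
- (17) step() returns the next observation, the reward, whether the episode has terminated, and debugging information.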
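The update in lines (1) to (17) does not depend on gym or NumPy, so it can be checked in isolation. The sketch below restates it as a pure function on floats; the name mountain_car_step and the module-level constants are hypothetical, while the numbers are the ones from __init__. Comments point back to the numbered lines.

```python
import math
from typing import Tuple

MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.45
POWER = 0.0015


def mountain_car_step(position: float, velocity: float,
                      action: float) -> Tuple[float, float, float, bool]:
    """One step of the dynamics in step(); returns (position, velocity, reward, done)."""
    force = min(max(action, -1.0), 1.0)                          # (3) clip the action
    velocity += force * POWER - 0.0025 * math.cos(3 * position)  # (4) engine + gravity
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)         # (5)-(6) speed limit
    position += velocity                                         # (7)
    position = min(max(position, MIN_POSITION), MAX_POSITION)   # (8)-(9) track limits
    if position == MIN_POSITION and velocity < 0:                # (10) left wall stops the car
        velocity = 0.0
    done = position >= GOAL_POSITION                             # (11)
    # (12)-(15): like the original, the penalty uses the unclipped action.
    reward = (100.0 if done else 0.0) - 0.1 * action ** 2
    return position, velocity, reward, done
```

For example, starting at rest at x = -0.5 with action 1.0, the engine term (0.0015) outweighs the gravity term there, so the velocity becomes positive and the step reward is -0.1.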
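Lines (11) to (15) of step() mean the following. On every step the environment checks whether the car has passed the peak on the right and sets done accordingly. If it has not (done=False), the reward for that step is -0.1 × action[0]², i.e. a penalty for the energy consumed in this time step; naturally we do not want to burn too much fuel. If it has (done=True), that step immediately earns a reward of 100 - 0.1 × action[0]².
3.2.4 Define the reset() function
In a reinforcement-learning algorithm the agent has to try again and again, accumulate experience, and learn good actions from that experience. One attempt is called a trajectory or an episode, and each attempt runs until it reaches a terminal state. When an attempt ends, the agent has to start over, so the environment must be able to re-initialize itself. That is what reset() does: it is called before the agent starts interacting with the environment, and it sets the agent's initial state along with any other initial settings. In this example, at the start of each episode the position is set to a random value in [-0.6, -0.4] and the velocity to 0.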
    def reset(self):
        self.state = np.array([self.np_random.uniform(low=-0.6, high=-0.4), 0])
        return np.array(self.state)
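3.2.5 Define the _height(self, xs) function
This function is used by render() below to build the rendering engine: it returns the height of the track at horizontal position xs.

    def _height(self, xs):
        return np.sin(3 * xs) * .45 + .55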
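The track is y = 0.45·sin(3x) + 0.55, so its slope is 1.35·cos(3x). The gravity term in step(), -0.0025·cos(3x), is therefore proportional to minus the slope: it always pulls the car downhill, toward the valley bottom at x = -π/6 ≈ -0.52, where the height is 0.1. A quick numeric check of those facts (the function names are hypothetical):

```python
import math


def height(x: float) -> float:
    """Track height, same formula as _height(): 0.45 * sin(3x) + 0.55."""
    return math.sin(3 * x) * 0.45 + 0.55


def slope(x: float) -> float:
    """Derivative of height(x): 1.35 * cos(3x)."""
    return 1.35 * math.cos(3 * x)


def gravity_term(x: float) -> float:
    """The gravity part of the velocity update in step()."""
    return -0.0025 * math.cos(3 * x)


valley_bottom = -math.pi / 6  # sin(3x) = -1 here, the lowest point on [-1.2, 0.6]
```

Because gravity_term(x) equals -(0.0025 / 1.35) × slope(x), the car speeds up going downhill and slows down going uphill, which is why it has to swing back and forth to build up enough energy.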
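3.2.6 Define the render(self, mode='human') function
The render() function is the rendering engine: the human-facing interface that animates the simulation. A simulated environment needs two essential parts, a physics engine and a rendering engine. The physics engine simulates how the objects in the environment move; the rendering engine draws them.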
    def render(self, mode='human'):
        screen_width = 600
        screen_height = 400

        world_width = self.max_position - self.min_position
        scale = screen_width / world_width
        carwidth = 40
        carheight = 20

        if self.viewer is None:
            from gym.envs.classic_control import rendering
            self.viewer = rendering.Viewer(screen_width, screen_height)
            xs = np.linspace(self.min_position, self.max_position, 100)
            ys = self._height(xs)
            xys = list(zip((xs - self.min_position) * scale, ys * scale))

            self.track = rendering.make_polyline(xys)
            self.track.set_linewidth(4)
            self.viewer.add_geom(self.track)

            clearance = 10

            l, r, t, b = -carwidth / 2, carwidth / 2, carheight, 0
            car = rendering.FilledPolygon([(l, b), (l, t), (r, t), (r, b)])
            car.add_attr(rendering.Transform(translation=(0, clearance)))
            self.cartrans = rendering.Transform()
            car.add_attr(self.cartrans)
            self.viewer.add_geom(car)
            frontwheel = rendering.make_circle(carheight / 2.5)
            frontwheel.set_color(.5, .5, .5)
            frontwheel.add_attr(rendering.Transform(translation=(carwidth / 4, clearance)))
            frontwheel.add_attr(self.cartrans)
            self.viewer.add_geom(frontwheel)
            backwheel = rendering.make_circle(carheight / 2.5)
            backwheel.add_attr(rendering.Transform(translation=(-carwidth / 4, clearance)))
            backwheel.add_attr(self.cartrans)
            backwheel.set_color(.5, .5, .5)
            self.viewer.add_geom(backwheel)
            flagx = (self.goal_position - self.min_position) * scale
            flagy1 = self._height(self.goal_position) * scale
            flagy2 = flagy1 + 50
            flagpole = rendering.Line((flagx, flagy1), (flagx, flagy2))
            self.viewer.add_geom(flagpole)
            flag = rendering.FilledPolygon([(flagx, flagy2), (flagx, flagy2 - 10),
                                            (flagx + 25, flagy2 - 5)])
            flag.set_color(.8, .8, 0)
            self.viewer.add_geom(flag)

        pos = self.state[0]
        self.cartrans.set_translation((pos - self.min_position) * scale, self._height(pos) * scale)
        self.cartrans.set_rotation(math.cos(3 * pos))

        return self.viewer.render(return_rgb_array=mode == 'rgb_array')
A reinforcement-learning algorithm can run without the rendering engine, so we will not explain it in detail here.
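For readers who do want to follow render(), the key step is the world-to-screen mapping: positions in [-1.2, 0.6] are shifted so that min_position lands at pixel 0 and are scaled by scale = screen_width / world_width = 600 / 1.8 ≈ 333.3 pixels per unit; heights use the same scale. The sketch restates that mapping (to_screen is a hypothetical name) and checks where the flag at goal_position = 0.45 ends up, about 550 px from the left edge.

```python
from typing import Tuple

SCREEN_WIDTH = 600
MIN_POSITION, MAX_POSITION = -1.2, 0.6
SCALE = SCREEN_WIDTH / (MAX_POSITION - MIN_POSITION)  # pixels per world unit


def to_screen(x: float, y: float) -> Tuple[float, float]:
    """Map a world point (position, height) to screen pixels, as render() does."""
    return ((x - MIN_POSITION) * SCALE, y * SCALE)
```

Using one scale for both axes keeps the hill's proportions undistorted on screen.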
3.2.7 Define the close(self) function

    def close(self):
        if self.viewer:
            self.viewer.close()
            self.viewer = None
4. Running
4.1 Complete code: continuous_mountain_car.py
"""
MountainCarContinuous-v1

@author: Olivier Sigaud

A merge between two sources:

* Adaptation of the MountainCar Environment from the "FAReinforcement" library
of Jose Antonio Martin H. (version 1.0), adapted by 'Tom Schaul, tom@idsia.ch'
and then modified by Arnaud de Broissia

* the OpenAI/gym MountainCar environment
itself from
http://incompleteideas.net/sutton/MountainCar/MountainCar1.cp
permalink: https://perma.cc/6Z2N-PFWC
"""

import math

import numpy as np

import gym
from gym import spaces
from gym.utils import seeding


class ContinuousMountainCarEnv(gym.Env):
    """
    Description:
        The agent (a car) is started at the bottom of a valley. For any given
        state the agent may choose to accelerate to the left, right or cease
        any acceleration.

    Observation:
        Type: Box(2)
        Num    Observation               Min            Max
        0      Car Position              -1.2           0.6
        1      Car Velocity              -0.07          0.07

    Actions:
        Type: Box(1)
        Num    Action                    Min            Max
        0      the power coef            -1.0           1.0
        Note: actual driving force is calculated by multiplying the power coef by power (0.0015)

    Reward:
        Reward of 100 is awarded if the agent reached the flag (position = 0.45) on top of the mountain.
        Reward is decreased based on the amount of energy consumed each step.

    Starting State:
        The position of the car is assigned a uniform random value in
        [-0.6 , -0.4].
        The starting velocity of the car is always assigned to 0.

    Episode Termination:
        The car position is more than 0.45
        Episode length is greater than 200
    """

    metadata = {"render.modes": ["human", "rgb_array"], "video.frames_per_second": 30}

    def __init__(self, goal_velocity=0):
        self.min_action = -1.0
        self.max_action = 1.0
        self.min_position = -1.2
        self.max_position = 0.6
        self.max_speed = 0.07
        self.goal_position = (
            0.45  # was 0.5 in gym, 0.45 in Arnaud de Broissia's version
        )
        self.goal_velocity = goal_velocity
        self.power = 0.0015

        self.low_state = np.array(
            [self.min_position, -self.max_speed], dtype=np.float32
        )
        self.high_state = np.array(
            [self.max_position, self.max_speed], dtype=np.float32
        )

        self.viewer = None

        self.action_space = spaces.Box(
            low=self.min_action, high=self.max_action, shape=(1,), dtype=np.float32
        )
        self.observation_space = spaces.Box(
            low=self.low_state, high=self.high_state, dtype=np.float32
        )

        self.seed()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):

        position = self.state[0]
        velocity = self.state[1]
        force = min(max(action[0], self.min_action), self.max_action)

        velocity += force * self.power - 0.0025 * math.cos(3 * position)
        if velocity > self.max_speed:
            velocity = self.max_speed
        if velocity < -self.max_speed:
            velocity = -self.max_speed
        position += velocity
        if position > self.max_position:
            position = self.max_position
        if position < self.min_position:
            position = self.min_position
        if position == self.min_position and velocity < 0:
            velocity = 0

        # Convert a possible numpy bool to a Python bool.
        done = bool(position >= self.goal_position and velocity >= self.goal_velocity)

        reward = 0
        if done:
            reward = 100.0
        reward -= math.pow(action[0], 2) * 0.1

        self.state = np.array([position, velocity], dtype=np.float32)
        return self.state, reward, done, {}

    def reset(self):
        self.state = np.array([self.np_random.uniform(low=-0.6, high=-0.4), 0])
        return np.array(self.state, dtype=np.float32)

    def _height(self, xs):
        return np.sin(3 * xs) * 0.45 + 0.55

    def render(self, mode="human"):
        screen_width = 600
        screen_height = 400

        world_width = self.max_position - self.min_position
        scale = screen_width / world_width
        carwidth = 40
        carheight = 20

        if self.viewer is None:
            from gym.envs.classic_control import rendering

            self.viewer = rendering.Viewer(screen_width, screen_height)
            xs = np.linspace(self.min_position, self.max_position, 100)
            ys = self._height(xs)
            xys = list(zip((xs - self.min_position) * scale, ys * scale))

            self.track = rendering.make_polyline(xys)
            self.track.set_linewidth(4)
            self.viewer.add_geom(self.track)

            clearance = 10

            l, r, t, b = -carwidth / 2, carwidth / 2, carheight, 0
            car = rendering.FilledPolygon([(l, b), (l, t), (r, t), (r, b)])
            car.add_attr(rendering.Transform(translation=(0, clearance)))
            self.cartrans = rendering.Transform()
            car.add_attr(self.cartrans)
            self.viewer.add_geom(car)
            frontwheel = rendering.make_circle(carheight / 2.5)
            frontwheel.set_color(0.5, 0.5, 0.5)
            frontwheel.add_attr(
                rendering.Transform(translation=(carwidth / 4, clearance))
            )
            frontwheel.add_attr(self.cartrans)
            self.viewer.add_geom(frontwheel)
            backwheel = rendering.make_circle(carheight / 2.5)
            backwheel.add_attr(
                rendering.Transform(translation=(-carwidth / 4, clearance))
            )
            backwheel.add_attr(self.cartrans)
            backwheel.set_color(0.5, 0.5, 0.5)
            self.viewer.add_geom(backwheel)
            flagx = (self.goal_position - self.min_position) * scale
            flagy1 = self._height(self.goal_position) * scale
            flagy2 = flagy1 + 50
            flagpole = rendering.Line((flagx, flagy1), (flagx, flagy2))
            self.viewer.add_geom(flagpole)
            flag = rendering.FilledPolygon(
                [(flagx, flagy2), (flagx, flagy2 - 10), (flagx + 25, flagy2 - 5)]
            )
            flag.set_color(0.8, 0.8, 0)
            self.viewer.add_geom(flag)

        pos = self.state[0]
        self.cartrans.set_translation(
            (pos - self.min_position) * scale, self._height(pos) * scale
        )
        self.cartrans.set_rotation(math.cos(3 * pos))

        return self.viewer.render(return_rgb_array=mode == "rgb_array")

    def close(self):
        if self.viewer:
            self.viewer.close()
            self.viewer = None
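4.2 Register the environment
Step 1: Copy your own environment file (mine is named continuous_mountain_car.py) into the ./gym/gym/envs/classic_control folder of your gym installation. (It goes in this folder so that it can use the rendering module. This is not the only way to do it.) Note that gym's classic_control folder already contains a file named continuous_mountain_car.py, so copying yours over it replaces gym's original; if you rename your file, change the module name in Step 2 accordingly.
Step 2: Open the __init__.py file in that folder (the folder from Step 1) and add this line at the end:
from gym.envs.classic_control.continuous_mountain_car import ContinuousMountainCarEnv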
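Step 3: Go to the ./gym/gym/envs folder of your gym installation, open the __init__.py file there, and add:
register(id='MountainCarContinuous-v1', entry_point='gym.envs.classic_control:ContinuousMountainCarEnv')
The first argument, id, is the id you pass to gym.make('id'). You can choose any id you like; I chose MountainCarContinuous-v1.
The second argument, entry_point, is the entry point of the environment: the module path and the class name, separated by a colon. The class name must be the one your file actually defines (here ContinuousMountainCarEnv).
With these three steps, the registration is complete.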
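gym stores the entry_point string when register() is called and only resolves it when gym.make() is called, so a class name that does not exist in the given module fails at make() time rather than at registration. The sketch below is a hypothetical mini-registry that resolves "module.path:ClassName" strings with importlib in the same general way; it is not gym's code, and it uses standard-library classes as stand-in entry points so it runs anywhere.

```python
import importlib
from typing import Any, Dict

_registry: Dict[str, str] = {}


def register(env_id: str, entry_point: str) -> None:
    """Remember which 'module:ClassName' string an id points to (gym's argument is named id)."""
    if env_id in _registry:
        raise ValueError(f"Cannot re-register id: {env_id}")
    _registry[env_id] = entry_point


def make(env_id: str, **kwargs: Any) -> Any:
    """Import the module before ':' and instantiate the attribute named after it."""
    module_name, attr_name = _registry[env_id].split(":")
    cls = getattr(importlib.import_module(module_name), attr_name)
    return cls(**kwargs)
```

With the real registry, the same lookup turns 'gym.envs.classic_control:ContinuousMountainCarEnv' into an instance of the class defined in the file copied in Step 1.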
4.3 Create the run script: MountainCarContinuous.py
Create this script and run it directly.
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Tool: PyCharm

import gym

env = gym.make('MountainCarContinuous-v1')
env = env.unwrapped

total_steps = 0

for i_episode in range(10):

    observation = env.reset()
    ep_r = 0
    while True:
        env.render()

        action = env.action_space.sample()

        observation_, reward, done, info = env.step(action)

        position, velocity = observation_

        # Shaped reward: the farther the car is from the valley bottom (about x = -0.5),
        # i.e. the higher it climbs, the larger the reward
        reward = abs(position - (-0.5))

        ep_r += reward
        if done:
            get = '| Get' if observation_[0] >= env.unwrapped.goal_position else '| ----'
            print('Epi: ', i_episode,
                  get,
                  '| Ep_r: ', round(ep_r, 4))
            break

        observation = observation_
        total_steps += 1

env.close()
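The script above needs gym and a display. Because env.unwrapped strips the wrappers that gym.make adds (including any step limit) while action_space.sample() acts at random, an episode can take a very long time to finish. As a self-contained alternative for experimenting, the sketch below re-implements the dynamics from step() in plain Python and runs one episode with a hypothetical bang-bang policy (full throttle in the direction of motion, which pumps energy into the swing) and an assumed cap of 999 steps. None of these names come from gym.

```python
import math
import random
from typing import Callable, Tuple

# Constants copied from __init__ of the environment.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED, GOAL_POSITION, POWER = 0.07, 0.45, 0.0015


def step(position: float, velocity: float, action: float) -> Tuple[float, float, float, bool]:
    """Same update and reward as the environment's step(), on plain floats."""
    force = min(max(action, -1.0), 1.0)
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position = min(max(position + velocity, MIN_POSITION), MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the left "wall" stops the car
    done = position >= GOAL_POSITION
    reward = (100.0 if done else 0.0) - 0.1 * action ** 2
    return position, velocity, reward, done


def bang_bang(velocity: float) -> float:
    """Hypothetical heuristic policy: full throttle in the current direction of motion."""
    return 1.0 if velocity >= 0 else -1.0


def run_episode(policy: Callable[[float], float], seed: int = 0,
                max_steps: int = 999) -> Tuple[int, float, bool]:
    """Reset, then step until the goal is reached or max_steps (an assumed cap) runs out."""
    rng = random.Random(seed)
    position, velocity = rng.uniform(-0.6, -0.4), 0.0  # same start as reset()
    total = 0.0
    for t in range(1, max_steps + 1):
        position, velocity, reward, done = step(position, velocity, policy(velocity))
        total += reward
        if done:
            return t, total, True
    return max_steps, total, False


if __name__ == "__main__":
    steps, ret, reached = run_episode(bang_bang)
    print(f"steps={steps} return={ret:.2f} reached={reached}")
```

Replacing bang_bang with a policy that always returns 0.0 shows the other side of the reward design: the car never leaves the valley and the return stays at 0.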
5. References
- https://github.com/openai/gym/wiki/MountainCarContinuous-v0
- https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py
- https://applenob.github.io/mountain_car.html
- https://blog.csdn.net/u013745804/article/details/78403912
- Reinforcement Learning in Practice, Part 2: Understanding gym's Modeling Approach (强化学习实践二 理解gym的建模思想)