flatland.envs.rail_env module
Definition of the RailEnv environment.

class flatland.envs.rail_env.RailEnv(width, height, rail_generator: Callable[[int, int, int, int], Tuple[flatland.core.transition_map.GridTransitionMap, Optional[Dict]]] = random_rail_generator(), schedule_generator: Callable[[flatland.core.transition_map.GridTransitionMap, int, Optional[Any], Optional[int]], flatland.envs.schedule_utils.Schedule] = random_schedule_generator(), number_of_agents=1, obs_builder_object: flatland.core.env_observation_builder.ObservationBuilder = GlobalObsForRailEnv(), malfunction_generator_and_process_data=no_malfunction_generator(), remove_agents_at_target=True, random_seed=1, record_steps=False)[source]

Bases: flatland.core.env.Environment
RailEnv environment class.
RailEnv is an environment inspired by a (simplified version of) a rail network, in which agents (trains) have to navigate to their target locations in the shortest time possible, while at the same time cooperating to avoid bottlenecks.
The valid actions in the environment are:
- 0: do nothing (continue moving or stay still)
- 1: turn left at switch and move to the next cell; if the agent was not moving, movement is started
- 2: move to the next cell in front of the agent; if the agent was not moving, movement is started
- 3: turn right at switch and move to the next cell; if the agent was not moving, movement is started
- 4: stop moving
Moving forward in a dead-end cell makes the agent turn 180 degrees and step to the cell it came from.
The actions of the agents are executed in order of their handle to prevent deadlocks and to allow them to learn relative priorities.
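
A minimal interaction sketch using these action codes (a sketch, assuming a standard Flatland installation; the two-agent setup, the constant forward policy, and the (obs, rewards, done, info) return shape of step() are illustrative assumptions):

    from flatland.envs.rail_env import RailEnv, RailEnvActions

    # Build a small environment with the documented defaults (random rail
    # and schedule generators, global observations).
    env = RailEnv(width=30, height=30, number_of_agents=2)
    obs, info = env.reset()

    # One action per agent handle; actions are executed in handle order.
    action_dict = {handle: RailEnvActions.MOVE_FORWARD
                   for handle in env.get_agent_handles()}
    obs, rewards, done, info = env.step(action_dict)
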
Reward Function:
It costs each agent a step_penalty for every time-step taken in the environment, independent of whether the agent moves. Currently all other penalties, such as the penalties for stopping, starting, and invalid actions, are set to 0.
Reward function parameters (with alpha = 1 and beta = 1):
- invalid_action_penalty = 0
- step_penalty = -alpha
- global_reward = beta
- epsilon = small constant used to avoid rounding errors
- stop_penalty = 0 # penalty for stopping a moving agent
- start_penalty = 0 # penalty for starting a stopped agent
Stochastic malfunctioning of trains: Trains in RailEnv can malfunction if they are halted too often (either by their own choice or because an invalid action or cell was selected).
Every time an agent stops, it has a certain probability of malfunctioning. Malfunctions of trains follow a Poisson process with a certain rate. Not all trains will be affected by malfunctions during episodes, to keep complexity manageable.
TODO: currently, the parameters that control the stochasticity of the environment are hard-coded in init(). For Round 2, they will be passed to the constructor as arguments, to allow for more flexibility.
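
As a hedged sketch, enabling malfunctions might look as follows; it assumes the malfunction_from_params helper from flatland.envs.malfunction_generators (present in Flatland versions contemporary with this API), and the exact parameter container (a plain dict here) varies between versions:

    from flatland.envs.rail_env import RailEnv
    from flatland.envs.malfunction_generators import malfunction_from_params

    # Illustrative malfunction parameters (assumed names, see lead-in).
    stochastic_data = {'malfunction_rate': 1 / 10000,  # rate of the Poisson process
                       'min_duration': 15,             # shortest malfunction, in steps
                       'max_duration': 50}             # longest malfunction, in steps

    env = RailEnv(width=30, height=30, number_of_agents=2,
                  malfunction_generator_and_process_data=malfunction_from_params(stochastic_data))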

action_required(self, agent)[source]
Check if an agent needs to provide an action.
Parameters:
- agent : RailEnvAgent
  the agent we want to check
Returns:
- True: the agent needs to provide an action
- False: the agent cannot provide an action
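
A sketch of using action_required() inside a control loop, continuing from the env built earlier; iterating over env.agents as the list of agent objects is an assumption consistent with this API:

    action_dict = {}
    for handle, agent in enumerate(env.agents):  # env.agents assumed to hold the EnvAgents
        if env.action_required(agent):
            # Only agents that need a decision this step get an entry;
            # the rest implicitly DO_NOTHING.
            action_dict[handle] = RailEnvActions.MOVE_FORWARD
    obs, rewards, done, info = env.step(action_dict)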

add_agent(self, agent)[source]
Add static info for a single agent. Returns the index of the new agent.

alpha = 1.0

beta = 1.0

cell_free(self, position: Tuple[int, int]) → bool[source]
Utility to check if a cell is free.
Parameters:
- position : Tuple[int, int]
Returns:
- bool
  is the cell free or not?

check_action(self, agent: flatland.envs.agent_utils.EnvAgent, action: flatland.envs.rail_env.RailEnvActions)[source]
Parameters:
- agent : EnvAgent
- action : RailEnvActions
Returns:
- Tuple[Grid4TransitionsEnum, Tuple[int, int]]

static compute_max_episode_steps(width, height, ratio_nr_agents_to_nr_cities, timedelay_factor, alpha)[source]
Compute the maximum number of episode steps allowed.
Parameters:
- width : int
  width of the environment
- height : int
  height of the environment
- ratio_nr_agents_to_nr_cities : float, optional
  number_of_agents / number_of_cities
Returns:
- max_episode_steps : int
  maximum number of episode steps
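
Since the method is static it can be called without an instance; the timedelay_factor and alpha values below are illustrative, not documented defaults:

    from flatland.envs.rail_env import RailEnv

    max_steps = RailEnv.compute_max_episode_steps(
        width=30,
        height=30,
        ratio_nr_agents_to_nr_cities=2 / 3,  # e.g. 2 agents on 3 cities
        timedelay_factor=4,                  # illustrative value
        alpha=2)                             # illustrative value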

epsilon = 0.01

get_agent_handles(self)[source]
Returns a list of agents' handles to be used as keys in the step() function.
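
The handles returned here key both the action dict passed to step() and the per-agent dictionaries it returns; a short sketch, continuing from an env built as above:

    handles = env.get_agent_handles()
    # Stop every agent; RailEnvActions values (or their plain ints) are accepted.
    obs, rewards, done, info = env.step(
        {h: RailEnvActions.STOP_MOVING for h in handles})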

get_agent_state_msg(self) → msgpack._cmsgpack.Packer[source]
Returns agent information as a msgpack object.

get_full_state_dist_msg(self) → msgpack._cmsgpack.Packer[source]
Returns environment information, including the distance map, as a msgpack object.

get_full_state_msg(self) → msgpack._cmsgpack.Packer[source]
Returns the state of the environment as a msgpack object.

get_valid_directions_on_grid(self, row: int, col: int) → List[int][source]
Returns the directions in which an agent can move from the given cell.
Parameters:
- row : int
- col : int
Returns:
- List[int]

global_reward = 1.0

invalid_action_penalty = 0

load_pkl(self, pkl_data)[source]
Load an environment, including its distance map, from pickled data.
Parameters:
- pkl_data : pickled environment data

record_timestep(self)[source]
Record the positions and orientations of all agents in memory, in cur_episode.

reset(regenerate_rail, regenerate_schedule, activate_agents, random_seed)[source]
Reset the rail environment.
Parameters:
- regenerate_rail : bool, optional
  regenerate the rails
- regenerate_schedule : bool, optional
  regenerate the schedule and the static agents
- activate_agents : bool, optional
  activate the agents
- random_seed : int, optional
  random seed for the environment
Returns:
- observation_dict : Dict
  dictionary with an observation for each agent
- info_dict : Dict
  dictionary with agent-specific information
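
A sketch of a reset call; all four parameters are optional per the signature above, and the keyword names follow it:

    # Fresh rails and schedule with a fixed seed for reproducibility.
    obs, info = env.reset(regenerate_rail=True,
                          regenerate_schedule=True,
                          random_seed=42)
    for handle, agent_obs in obs.items():
        pass  # feed each agent's observation to its policy here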

save(self, filename, save_distance_maps=False)[source]
Saves the environment and, optionally, its distance map to a file.
Parameters:
- filename : string
- save_distance_maps : bool
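
A persistence sketch pairing save() with load_pkl(); whether load_pkl expects the pickled bytes or a path is not pinned down by the docstrings, so treat the reading step as an assumption:

    env.save("episode_0.pkl", save_distance_maps=True)

    # Restore the grid and distance map into an existing environment.
    with open("episode_0.pkl", "rb") as f:
        env.load_pkl(f.read())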

set_full_state_dist_msg(self, msg_data)[source]
Sets the environment grid state and distance map from the msgpack object passed as argument.
Parameters:
- msg_data : msgpack object

set_full_state_msg(self, msg_data)[source]
Sets the environment state from the msgpack object passed as argument.
Parameters:
- msg_data : msgpack object
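
The get_/set_full_state pair supports snapshotting one environment and restoring it into another; a sketch continuing from the imports and env above (the env.width and env.height attributes are assumed to mirror the constructor arguments):

    # Snapshot the grid and agents, then restore into a second instance.
    msg = env.get_full_state_msg()
    env2 = RailEnv(width=env.width, height=env.height, number_of_agents=2)
    env2.set_full_state_msg(msg)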

start_penalty = 0

step(self, action_dict_: Dict[int, flatland.envs.rail_env.RailEnvActions])[source]
Advance the environment by one time step and update the agents' rewards.
Parameters:
- action_dict_ : Dict[int, RailEnvActions]
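
A full-episode sketch around step(); the (obs, rewards, done, info) return shape and the '__all__' done flag follow Flatland's conventional interface and should be read as assumptions here:

    obs, info = env.reset()
    done = {'__all__': False}
    score = 0.0
    while not done['__all__']:
        # Replace the constant policy with real per-agent decisions.
        actions = {h: RailEnvActions.MOVE_FORWARD
                   for h in env.get_agent_handles()}
        obs, rewards, done, info = env.step(actions)
        score += sum(rewards.values())  # accumulates the per-step penalties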

step_penalty = -1.0

stop_penalty = 0

class flatland.envs.rail_env.RailEnvActions[source]

Bases: enum.IntEnum

An enumeration.

DO_NOTHING = 0

MOVE_LEFT = 1

MOVE_FORWARD = 2

MOVE_RIGHT = 3

STOP_MOVING = 4
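
Because RailEnvActions is an IntEnum, raw integers from a policy interoperate directly with the enum members; a sketch:

    from flatland.envs.rail_env import RailEnvActions

    a = RailEnvActions(2)
    assert a is RailEnvActions.MOVE_FORWARD
    assert a == 2  # IntEnum members compare equal to their int values
    action_dict = {0: a, 1: RailEnvActions.STOP_MOVING}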