flatland.envs.rail_env module
Definition of the RailEnv environment.

class flatland.envs.rail_env.RailEnv(width, height, rail_generator: Callable[[int, int, int, int], Tuple[flatland.core.transition_map.GridTransitionMap, Optional[Dict]]] = random_rail_generator(), schedule_generator: Callable[[flatland.core.transition_map.GridTransitionMap, int, Optional[Any], Optional[int]], flatland.envs.schedule_utils.Schedule] = random_schedule_generator(), number_of_agents=1, obs_builder_object: flatland.core.env_observation_builder.ObservationBuilder = GlobalObsForRailEnv(), malfunction_generator_and_process_data=no_malfunction_generator(), remove_agents_at_target=True, random_seed=1, record_steps=False)[source]

Bases: flatland.core.env.Environment
RailEnv environment class.
RailEnv is an environment inspired by a (simplified version of) a rail network, in which agents (trains) have to navigate to their target locations in the shortest time possible, while at the same time cooperating to avoid bottlenecks.
The valid actions in the environment are:
- 0: do nothing (continue moving or stay still)
- 1: turn left at switch and move to the next cell; if the agent was not moving, movement is started
- 2: move to the next cell in front of the agent; if the agent was not moving, movement is started
- 3: turn right at switch and move to the next cell; if the agent was not moving, movement is started
- 4: stop moving
Moving forward in a dead-end cell makes the agent turn 180 degrees and step to the cell it came from.
The actions of the agents are executed in order of their handle to prevent deadlocks and to allow them to learn relative priorities.
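
A minimal interaction sketch using these action codes (a sketch, assuming a standard Flatland installation; the two-agent setup, the constant forward policy, and the (obs, rewards, done, info) return shape of step() are illustrative assumptions):

    from flatland.envs.rail_env import RailEnv, RailEnvActions

    # Build a small environment with the documented defaults (random rail
    # and schedule generators, global observations).
    env = RailEnv(width=30, height=30, number_of_agents=2)
    obs, info = env.reset()

    # One action per agent handle; actions are executed in handle order.
    action_dict = {handle: RailEnvActions.MOVE_FORWARD
                   for handle in env.get_agent_handles()}
    obs, rewards, done, info = env.step(action_dict)
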
Reward Function:
It costs each agent a step_penalty for every time-step taken in the environment, independent of whether the agent moves. Currently all other penalties, such as the penalties for stopping, starting, and invalid actions, are set to 0.
Reward function parameters (with alpha = 1 and beta = 1):
- invalid_action_penalty = 0
- step_penalty = -alpha
- global_reward = beta
- epsilon = small constant used to avoid rounding errors
- stop_penalty = 0 # penalty for stopping a moving agent
- start_penalty = 0 # penalty for starting a stopped agent
Stochastic malfunctioning of trains: Trains in RailEnv can malfunction if they are halted too often (either by their own choice or because an invalid action or cell was selected).
Every time an agent stops, it has a certain probability of malfunctioning. Malfunctions of trains follow a Poisson process with a certain rate. Not all trains will be affected by malfunctions during episodes, to keep complexity manageable.
TODO: currently, the parameters that control the stochasticity of the environment are hard-coded in init(). For Round 2, they will be passed to the constructor as arguments, to allow for more flexibility.
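
As a hedged sketch, enabling malfunctions might look as follows; it assumes the malfunction_from_params helper from flatland.envs.malfunction_generators (present in Flatland versions contemporary with this API), and the exact parameter container (a plain dict here) varies between versions:

    from flatland.envs.rail_env import RailEnv
    from flatland.envs.malfunction_generators import malfunction_from_params

    # Illustrative malfunction parameters (assumed names, see lead-in).
    stochastic_data = {'malfunction_rate': 1 / 10000,  # rate of the Poisson process
                       'min_duration': 15,             # shortest malfunction, in steps
                       'max_duration': 50}             # longest malfunction, in steps

    env = RailEnv(width=30, height=30, number_of_agents=2,
                  malfunction_generator_and_process_data=malfunction_from_params(stochastic_data))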

action_required(self, agent)[source]
Check if an agent needs to provide an action.
Parameters:
- agent : RailEnvAgent
  the agent we want to check
Returns:
- True: the agent needs to provide an action
- False: the agent cannot provide an action
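
A sketch of using action_required() inside a control loop, continuing from the env built earlier; iterating over env.agents as the list of agent objects is an assumption consistent with this API:

    action_dict = {}
    for handle, agent in enumerate(env.agents):  # env.agents assumed to hold the EnvAgents
        if env.action_required(agent):
            # Only agents that need a decision this step get an entry;
            # the rest implicitly DO_NOTHING.
            action_dict[handle] = RailEnvActions.MOVE_FORWARD
    obs, rewards, done, info = env.step(action_dict)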

add_agent(self, agent)[source]
Add static info for a single agent. Returns the index of the new agent.

alpha = 1.0

beta = 1.0

cell_free(self, position: Tuple[int, int]) → bool[source]
Utility to check if a cell is free.
Parameters:
- position : Tuple[int, int]
Returns:
- bool
  is the cell free or not?

check_action(self, agent: flatland.envs.agent_utils.EnvAgent, action: flatland.envs.rail_env.RailEnvActions)[source]
Parameters:
- agent : EnvAgent
- action : RailEnvActions
Returns:
- Tuple[Grid4TransitionsEnum, Tuple[int, int]]

static compute_max_episode_steps(width, height, ratio_nr_agents_to_nr_cities, timedelay_factor, alpha)[source]
Compute the maximum number of episode steps allowed.
Parameters:
- width : int
  width of the environment
- height : int
  height of the environment
- ratio_nr_agents_to_nr_cities : float, optional
  number_of_agents / number_of_cities
Returns:
- max_episode_steps : int
  maximum number of episode steps
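
Since the method is static it can be called without an instance; the timedelay_factor and alpha values below are illustrative, not documented defaults:

    from flatland.envs.rail_env import RailEnv

    max_steps = RailEnv.compute_max_episode_steps(
        width=30,
        height=30,
        ratio_nr_agents_to_nr_cities=2 / 3,  # e.g. 2 agents on 3 cities
        timedelay_factor=4,                  # illustrative value
        alpha=2)                             # illustrative value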

epsilon = 0.01

get_agent_handles(self)[source]
Returns a list of agents' handles to be used as keys in the step() function.
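
The handles returned here key both the action dict passed to step() and the per-agent dictionaries it returns; a short sketch, continuing from an env built as above:

    handles = env.get_agent_handles()
    # Stop every agent; RailEnvActions values (or their plain ints) are accepted.
    obs, rewards, done, info = env.step(
        {h: RailEnvActions.STOP_MOVING for h in handles})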

get_agent_state_msg(self) → msgpack._cmsgpack.Packer[source]
Returns agent information as a msgpack object.

get_full_state_dist_msg(self) → msgpack._cmsgpack.Packer[source]
Returns environment information, including the distance map, as a msgpack object.

get_full_state_msg(self) → msgpack._cmsgpack.Packer[source]
Returns the state of the environment as a msgpack object.

get_valid_directions_on_grid(self, row: int, col: int) → List[int][source]
Returns the directions in which an agent can move from the given cell.
Parameters:
- row : int
- col : int
Returns:
- List[int]

global_reward = 1.0

invalid_action_penalty = 0

load_pkl(self, pkl_data)[source]
Load an environment, including its distance map, from pickled data.
Parameters:
- pkl_data : pickled environment data

record_timestep(self)[source]
Record the positions and orientations of all agents in memory, in cur_episode.

reset(regenerate_rail, regenerate_schedule, activate_agents, random_seed)[source]
Reset the rail environment.
Parameters:
- regenerate_rail : bool, optional
  regenerate the rails
- regenerate_schedule : bool, optional
  regenerate the schedule and the static agents
- activate_agents : bool, optional
  activate the agents
- random_seed : int, optional
  random seed for the environment
Returns:
- observation_dict : Dict
  dictionary with an observation for each agent
- info_dict : Dict
  dictionary with agent-specific information
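
A sketch of a reset call; all four parameters are optional per the signature above, and the keyword names follow it:

    # Fresh rails and schedule with a fixed seed for reproducibility.
    obs, info = env.reset(regenerate_rail=True,
                          regenerate_schedule=True,
                          random_seed=42)
    for handle, agent_obs in obs.items():
        pass  # feed each agent's observation to its policy here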

save(self, filename, save_distance_maps=False)[source]
Saves the environment and, optionally, its distance map to a file.
Parameters:
- filename : string
- save_distance_maps : bool
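
A persistence sketch pairing save() with load_pkl(); whether load_pkl expects the pickled bytes or a path is not pinned down by the docstrings, so treat the reading step as an assumption:

    env.save("episode_0.pkl", save_distance_maps=True)

    # Restore the grid and distance map into an existing environment.
    with open("episode_0.pkl", "rb") as f:
        env.load_pkl(f.read())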

set_full_state_dist_msg(self, msg_data)[source]
Sets the environment grid state and distance map from the msgpack object passed as argument.
Parameters:
- msg_data : msgpack object

set_full_state_msg(self, msg_data)[source]
Sets the environment state from the msgpack object passed as argument.
Parameters:
- msg_data : msgpack object
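
The get_/set_full_state pair supports snapshotting one environment and restoring it into another; a sketch continuing from the imports and env above (the env.width and env.height attributes are assumed to mirror the constructor arguments):

    # Snapshot the grid and agents, then restore into a second instance.
    msg = env.get_full_state_msg()
    env2 = RailEnv(width=env.width, height=env.height, number_of_agents=2)
    env2.set_full_state_msg(msg)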

start_penalty = 0

step(self, action_dict_: Dict[int, flatland.envs.rail_env.RailEnvActions])[source]
Advance the environment by one time step and update the agents' rewards.
Parameters:
- action_dict_ : Dict[int, RailEnvActions]
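
A full-episode sketch around step(); the (obs, rewards, done, info) return shape and the '__all__' done flag follow Flatland's conventional interface and should be read as assumptions here:

    obs, info = env.reset()
    done = {'__all__': False}
    score = 0.0
    while not done['__all__']:
        # Replace the constant policy with real per-agent decisions.
        actions = {h: RailEnvActions.MOVE_FORWARD
                   for h in env.get_agent_handles()}
        obs, rewards, done, info = env.step(actions)
        score += sum(rewards.values())  # accumulates the per-step penalties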

step_penalty = -1.0

stop_penalty = 0

class flatland.envs.rail_env.RailEnvActions[source]

Bases: enum.IntEnum

An enumeration.

DO_NOTHING = 0

MOVE_LEFT = 1

MOVE_FORWARD = 2

MOVE_RIGHT = 3

STOP_MOVING = 4
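
Because RailEnvActions is an IntEnum, raw integers from a policy interoperate directly with the enum members; a sketch:

    from flatland.envs.rail_env import RailEnvActions

    a = RailEnvActions(2)
    assert a is RailEnvActions.MOVE_FORWARD
    assert a == 2  # IntEnum members compare equal to their int values
    action_dict = {0: a, 1: RailEnvActions.STOP_MOVING}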