Definition of the RailEnv environment.
RailEnv(width, height,
        rail_generator: Callable[[int, int, int, int], Tuple[GridTransitionMap, Optional[Dict]]] = random_rail_generator(),
        schedule_generator: Callable[[GridTransitionMap, int, Optional[Any], Optional[int]], Schedule] = random_schedule_generator(),
        number_of_agents=1,
        obs_builder_object: ObservationBuilder = GlobalObsForRailEnv(),
        malfunction_generator_and_process_data=no_malfunction_generator(),
        remove_agents_at_target=True,
        random_seed=1,
        record_steps=False)
RailEnv environment class.
RailEnv is an environment inspired by (a simplified version of) a rail network, in which agents (trains) have to navigate to their target locations in the shortest time possible, while at the same time cooperating to avoid bottlenecks.
The valid actions in the environment are:
- 0: do nothing (continue moving or stay still)
- 1: turn left at switch and move to the next cell; if the agent was not moving, movement is started
- 2: move to the next cell in front of the agent; if the agent was not moving, movement is started
- 3: turn right at switch and move to the next cell; if the agent was not moving, movement is started
- 4: stop moving
Moving forward in a dead-end cell makes the agent turn 180 degrees and step to the cell it came from.
The actions of the agents are executed in order of their handle to prevent deadlocks and to allow them to learn relative priorities.
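The action codes above can be captured in a small integer enum. A minimal standalone sketch (the member names mirror Flatland's `RailEnvActions`, but this illustrative copy is not imported from the library):

```python
from enum import IntEnum

class RailEnvActions(IntEnum):
    """Action codes as described above (illustrative standalone copy)."""
    DO_NOTHING = 0    # continue moving, or stay still
    MOVE_LEFT = 1     # turn left at a switch and move to the next cell
    MOVE_FORWARD = 2  # move to the next cell in front of the agent
    MOVE_RIGHT = 3    # turn right at a switch and move to the next cell
    STOP_MOVING = 4   # stop moving

# step() consumes one action per agent, keyed by the agent's handle
action_dict = {0: RailEnvActions.MOVE_FORWARD, 1: RailEnvActions.STOP_MOVING}
```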
It costs each agent a step_penalty for every time-step taken in the environment, independent of whether the agent moves. Currently all other penalties, such as the penalties for stopping, starting and invalid actions, are set to 0.
Reward function parameters (alpha = 1, beta = 1):
- invalid_action_penalty = 0
- step_penalty = -alpha
- global_reward = beta
- epsilon = small constant to avoid rounding errors
- stop_penalty = 0 (penalty for stopping a moving agent)
- start_penalty = 0 (penalty for starting a stopped agent)
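With these parameters, the per-step reward reduces to -alpha for each agent still en route, plus global_reward = beta for every agent once all agents have reached their targets. A minimal sketch of that rule (the helper function is hypothetical, not Flatland's internal code):

```python
ALPHA = 1  # step penalty weight
BETA = 1   # global reward weight

def step_reward(agent_done: bool, all_done: bool) -> int:
    """Reward for one agent at one time-step under the parameters above."""
    reward = 0
    if not agent_done:
        reward += -ALPHA  # step_penalty for every time-step still en route
    if all_done:
        reward += BETA    # global_reward once every agent has arrived
    return reward
```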
Stochastic malfunctioning of trains: trains in RailEnv can malfunction if they are halted too often (either by their own choice or because an invalid action or cell is selected).
Every time an agent stops, it has a certain probability of malfunctioning. Malfunctions of trains follow a Poisson process with a certain rate. Not all trains will be affected by malfunctions during episodes, to keep complexity manageable.
TODO: currently, the parameters that control the stochasticity of the environment are hard-coded in init(). For Round 2, they will be passed to the constructor as arguments, to allow for more flexibility.
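Under a Poisson model with rate λ per time-step, the probability that a train develops at least one malfunction during a step is 1 - exp(-λ). A standalone sketch of the sampling (the rate and duration bounds mirror the `MalfunctionProcessData` fields from the constructor signature above; the helper functions themselves are assumptions, not Flatland's implementation):

```python
import math
import random

def malfunction_occurs(rate: float, rng: random.Random) -> bool:
    """Sample whether at least one Poisson event happens in one time-step."""
    p = 1.0 - math.exp(-rate)  # P(N >= 1) for N ~ Poisson(rate)
    return rng.random() < p

def malfunction_duration(min_duration: int, max_duration: int,
                         rng: random.Random) -> int:
    """Draw a repair time uniformly between the configured bounds."""
    return rng.randint(min_duration, max_duration)

rng = random.Random(42)
broken = malfunction_occurs(rate=0.1, rng=rng)
```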
Check if an agent needs to provide an action.
Parameters:
- agent : RailEnvAgent
  the agent we want to check
Returns:
- True if the agent needs to provide an action
- False if the agent cannot provide an action
Add static info for a single agent. Returns the index of the new agent.
cell_free(self, position: Tuple[int, int]) → bool
Utility to check if a cell is free.
Parameters:
- position : Tuple[int, int]
Returns:
- bool : whether the cell is free
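This check amounts to testing the queried position against the positions currently occupied by agents. A minimal sketch, assuming positions are (row, col) tuples and the occupied set is maintained elsewhere (both are assumptions of this illustration):

```python
from typing import Set, Tuple

Position = Tuple[int, int]

def cell_free(position: Position, occupied: Set[Position]) -> bool:
    """A cell is free when no agent currently occupies it."""
    return position not in occupied

occupied = {(0, 1), (2, 3)}  # example agent positions
```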
check_action(self, agent: EnvAgent, action: RailEnvActions)
Parameters:
- agent : EnvAgent
- action : RailEnvActions
compute_max_episode_steps(width, height, ratio_nr_agents_to_nr_cities, timedelay_factor, alpha)
Computes the maximum number of episode steps allowed.
Parameters:
- width : int
  width of the environment
- height : int
  height of the environment
- ratio_nr_agents_to_nr_cities : float, optional
  ratio of the number of agents to the number of cities
Returns:
- max_episode_steps : int
  maximum number of episode steps
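The exact formula is not stated here; heuristics of this kind typically scale the timeout with map size and the agent/city ratio. A hypothetical sketch in that shape (the coefficients and functional form are assumptions, not necessarily Flatland's actual formula):

```python
def compute_max_episode_steps(width: int, height: int,
                              ratio_nr_agents_to_nr_cities: float = 20.0,
                              timedelay_factor: int = 4,
                              alpha: int = 2) -> int:
    """Hypothetical timeout: grows with map extent plus the agent/city ratio."""
    return int(timedelay_factor * alpha
               * (width + height + ratio_nr_agents_to_nr_cities))
```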
Returns a list of agents’ handles to be used as keys in the step() function.
get_agent_state_msg(self) → msgpack.Packer
Returns agent information as a msgpack object.
get_full_state_dist_msg(self) → msgpack.Packer
Returns environment information, including the distance map, as a msgpack object.
get_full_state_msg(self) → msgpack.Packer
Returns the state of the environment as a msgpack object.
get_num_agents(self) → int
get_valid_directions_on_grid(self, row: int, col: int) → List[int]
Returns the directions in which the agent can move.
Parameters:
- row : int
- col : int
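Flatland cells encode transitions as 16 bits: 4 incoming directions × 4 outgoing directions (0=North, 1=East, 2=South, 3=West), most significant bit first. A direction is valid at a cell when any transition bit for that outgoing direction is set. A standalone sketch of that decoding (the bit layout mirrors the grid's transition encoding as understood here, and should be treated as an assumption):

```python
from typing import List

def valid_directions(cell_transitions: int) -> List[int]:
    """Outgoing directions (0=N, 1=E, 2=S, 3=W) with at least one set bit."""
    dirs = []
    for out_dir in range(4):
        for in_dir in range(4):
            bit = 15 - (in_dir * 4 + out_dir)  # MSB-first (in, out) bit index
            if (cell_transitions >> bit) & 1:
                dirs.append(out_dir)
                break  # one set bit suffices for this outgoing direction
    return dirs

# straight north-south track: N->N and S->S transitions set
straight = (1 << 15) | (1 << (15 - (2 * 4 + 2)))
```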
Load environment with distance map from a file.
Load environment with distance map from a pickle file.
- pkl_data : pickle file
load_resource(self, package, resource)
Load environment with distance map from a binary.
Record the positions and orientations of all agents in memory, in cur_episode.
reset(regenerate_rail, regenerate_schedule, activate_agents, random_seed)
The method resets the rail environment.
Parameters:
- regenerate_rail : bool, optional
  regenerate the rails
- regenerate_schedule : bool, optional
  regenerate the schedule and the static agents
- activate_agents : bool, optional
  activate the agents
- random_seed : int, optional
  random seed for the environment
Returns:
- observation_dict : Dict
  dictionary with an observation for each agent
- info_dict : Dict
  dictionary with agent-specific information
Reset the agents to their starting positions
save(self, filename, save_distance_maps=False)
Saves the environment and distance map information to a file.
Parameters:
- filename : string
- save_distance_maps : bool
Sets the environment grid state and distance map from the msg_data object passed as argument.
- msg_data : msgpack object
Sets the environment state from the msg_data object passed as argument.
- msg_data : msgpack object
step(self, action_dict_: Dict[int, RailEnvActions])
Performs one environment step: applies the submitted actions and updates the rewards for the agents.
Parameters:
- action_dict_ : Dict[int, RailEnvActions]
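The reset/step cycle follows the usual episodic-RL shape: reset() returns a per-agent observation dict, then step() consumes an action dict keyed by agent handle and returns observations, rewards, done flags, and info. A sketch of the loop against a stub environment (the stub itself is illustrative; only the call shapes are intended to mirror RailEnv):

```python
class StubRailEnv:
    """Stand-in with RailEnv-shaped reset()/step() (illustrative only)."""

    def __init__(self, number_of_agents: int = 2, max_steps: int = 3):
        self.handles = list(range(number_of_agents))
        self.max_steps = max_steps
        self.t = 0

    def get_agent_handles(self):
        return self.handles

    def reset(self):
        self.t = 0
        obs = {h: None for h in self.handles}
        return obs, {h: {} for h in self.handles}

    def step(self, action_dict_):
        self.t += 1
        done = self.t >= self.max_steps
        obs = {h: None for h in self.handles}
        rewards = {h: -1 for h in self.handles}  # step_penalty = -alpha
        dones = {h: done for h in self.handles}
        dones["__all__"] = done                  # episode-level done flag
        return obs, rewards, dones, {}

env = StubRailEnv()
obs, info = env.reset()
total = {h: 0 for h in env.get_agent_handles()}
dones = {"__all__": False}
while not dones["__all__"]:
    actions = {h: 2 for h in env.get_agent_handles()}  # 2 = move forward
    obs, rewards, dones, info = env.step(actions)
    for h, r in rewards.items():
        total[h] += r
```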
to_char(a: int) = <function RailEnvActions.to_char>