GridWorld

class safemdp.GridWorld(gp, world_shape, step_size, beta, altitudes, h, S0, S_hat0, L, update_dist=0)
Grid world with safe exploration.
Parameters: gp: GPy.core.GP
Gaussian process that expresses our current belief over the safety feature
world_shape: shape
Tuple that contains the shape of the grid world n x m
step_size: tuple of floats
Tuple that contains the step sizes along each direction to create a linearly spaced grid
beta: float
Scaling factor to determine the amplitude of the confidence intervals
altitudes: np.array
Flattened n x m matrix that stores the altitudes of all the points in the map
h: float
Safety threshold
S0: np.array
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs belong to the initial safe seed. Note that, by convention, all states are initialized as safe.
S_hat0: np.array or nan
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs belong to the initial safe seed and satisfy the recovery and reachability properties. If it is nan, this boolean matrix is computed during initialization.
noise: float
Standard deviation of the measurement noise
L: float
Lipschitz constant to compute expanders
update_dist: int
Distance in unweighted graph used for confidence interval update. A sample will only influence other nodes within this distance.
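As an illustration of how these inputs fit together, the sketch below builds the linearly spaced grid implied by world_shape and step_size, creates a flattened altitude map, and allocates a safe-seed array with the n_states x (n_actions + 1) layout described above. It uses only NumPy. The helper name make_grid_inputs, the choice of four actions, and the synthetic altitude values are assumptions for illustration and are not part of safemdp.

```python
import numpy as np

def make_grid_inputs(world_shape, step_size, n_actions=4):
    """Hypothetical helper: build grid coordinates and an initial safe seed."""
    n, m = world_shape
    # Linearly spaced coordinates along each direction
    xs = np.arange(n) * step_size[0]
    ys = np.arange(m) * step_size[1]
    xx, yy = np.meshgrid(xs, ys, indexing="ij")
    coords = np.column_stack((xx.ravel(), yy.ravel()))  # shape (n * m, 2)

    # First column: states; remaining columns: one per action (assumed 4)
    S0 = np.zeros((n * m, n_actions + 1), dtype=bool)
    S0[:, 0] = True  # convention from the docs: all states start out safe
    return coords, S0

coords, S0 = make_grid_inputs(world_shape=(3, 4), step_size=(0.5, 1.0))
# Synthetic altitudes, one value per grid point (flattened n x m map)
altitudes = np.sin(coords[:, 0]) + 0.1 * coords[:, 1]
```

The state-action columns of S0 are left False here; which pairs belong to the safe seed depends on the problem at hand.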
Methods
add_gp_observations(x_new, y_new): Add observations to the GP model.
add_observation(node, action): Add an observation of the given state-action pair.
compute_S_hat(): Compute the safely reachable set given the current safe_set.
compute_expanders(): Compute the expanders based on the current estimate of S_hat.
plot_S(safe_set[, action]): Plot the set of safe states.
target_sample(): Compute the next target (s, a) to sample (highest uncertainty within G or S_hat).
update_confidence_interval([jacobian]): Updates the lower and the upper bound of the confidence intervals.
update_sets(): Update the sets S, S_hat and G using the available observations.
add_gp_observations(x_new, y_new)
Add observations to the GP model.

add_observation(node, action)
Add an observation of the given state-action pair.
Observing the pair (s, a) means adding an observation of the altitude at s and an observation of the altitude at f(s, a).
Parameters: node: int
Node index
action: int
Action index
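To make the role of f(s, a) concrete, the sketch below implements one possible deterministic transition on a flattened n x m grid and returns the two nodes whose altitudes would be observed for a pair (s, a). The action numbering (1 to 4 for the four grid directions), the row-major node indexing, the behavior at the border, and the function names are all assumptions for illustration; the library's own conventions may differ.

```python
# Assumed action encoding: (row offset, column offset) per action index
ACTION_OFFSETS = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}

def next_node(node, action, world_shape):
    """Hypothetical f(s, a): move on the grid, staying put at the border."""
    n, m = world_shape
    row, col = divmod(node, m)  # row-major node indexing (assumption)
    d_row, d_col = ACTION_OFFSETS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < n and 0 <= new_col < m:
        return new_row * m + new_col
    return node  # a move that would leave the grid keeps the state unchanged

def observed_nodes(node, action, world_shape):
    """Nodes whose altitude is observed when sampling the pair (s, a)."""
    return node, next_node(node, action, world_shape)
```

On a 3 x 4 grid, for example, observed_nodes(0, 2, (3, 4)) yields nodes 0 and 4 under these assumptions: the starting node and the node one row below it.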

compute_S_hat()
Compute the safely reachable set given the current safe_set.

compute_expanders()
Compute the expanders based on the current estimate of S_hat.

plot_S(safe_set, action=0)
Plot the set of safe states.
Parameters: safe_set: np.array(dtype=bool)
n_states x (n_actions + 1) array of boolean values that indicates the safe set
action: int
The action for which we want to plot the safe set.

target_sample()
Compute the next target (s, a) to sample (highest uncertainty within G or S_hat).
Returns: node: int
The next node to sample
action: int
The next action to sample
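The selection rule can be sketched as an argmax of the confidence-interval width over a candidate set. The code below is a minimal NumPy illustration, not the library's implementation: it assumes the bounds l and u share the n_states x (n_actions + 1) layout of the safe-set arrays, that the candidate set (G or S_hat) is given as a boolean mask of the same shape, and that column 0 is skipped because it refers to states rather than state-action pairs. The function name pick_target is hypothetical.

```python
import numpy as np

def pick_target(l, u, candidates):
    """Hypothetical selection: most uncertain (node, action) among candidates.

    l, u: lower/upper bounds, assumed shape (n_states, n_actions + 1)
    candidates: boolean mask of the same shape (e.g. G or S_hat)
    """
    # Interval width where allowed, -inf elsewhere so argmax ignores it
    width = np.where(candidates, u - l, -np.inf)
    width[:, 0] = -np.inf  # column 0 holds states, not state-action pairs
    if not np.isfinite(width).any():
        raise ValueError("no candidate state-action pair to sample")
    node, action = np.unravel_index(np.argmax(width), width.shape)
    return int(node), int(action)

l = np.zeros((3, 3))
u = np.array([[1.0, 2.0, 3.0],
              [1.0, 5.0, 1.0],
              [1.0, 1.0, 9.0]])
candidates = np.array([[True, True, True],
                       [True, True, False],
                       [True, False, False]])
node, action = pick_target(l, u, candidates)
```

In this example the widest interval overall sits at (2, 2), but that pair is not a candidate, so the pair (1, 1) is selected instead.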

update_confidence_interval(jacobian=False)
Updates the lower and the upper bound of the confidence intervals using the posterior distribution over the gradients of the altitudes.
Returns: l: np.array
lower bound of the safety feature (mean - beta*std)
u: np.array
upper bound of the safety feature (mean + beta*std)
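The two bounds follow directly from a posterior mean and standard deviation. The sketch below shows the arithmetic with plain NumPy arrays standing in for the GP posterior mean and variance; it does not cover the jacobian option, and the function name, the sample values, and the final comparison against a threshold h = 0.3 are assumptions for illustration.

```python
import numpy as np

def confidence_bounds(mean, var, beta):
    """Return (l, u) = (mean - beta * std, mean + beta * std)."""
    std = np.sqrt(np.maximum(var, 0.0))  # guard against tiny negative variances
    return mean - beta * std, mean + beta * std

mean = np.array([1.0, 0.5, -0.2])   # stand-in for the posterior mean
var = np.array([0.04, 0.25, 0.0])   # stand-in for the posterior variance
l, u = confidence_bounds(mean, var, beta=2.0)

# Assumed use of the bounds: a point counts as safe when even the
# pessimistic estimate l stays at or above the safety threshold h
is_safe = l >= 0.3
```

A larger beta widens the interval symmetrically around the mean, which makes the safety check more conservative.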

update_sets()
Update the sets S, S_hat and G using the available observations.
