GridWorld
class safemdp.GridWorld(gp, world_shape, step_size, beta, altitudes, h, S0, S_hat0, L, update_dist=0)
Grid world with safe exploration.
Parameters:
gp: GPy.core.GP
Gaussian process that expresses our current belief over the safety feature
world_shape: tuple of ints
Tuple (n, m) that contains the shape of the n x m grid world
step_size: tuple of floats
Tuple that contains the step sizes along each direction to create a linearly spaced grid
beta: float
Scaling factor to determine the amplitude of the confidence intervals
altitudes: np.array
Flattened n x m matrix that stores the altitudes of all the points in the map
h: float
Safety threshold
S0: np.array
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs (remaining columns) belong to the initial safe seed. Note that, by convention, all states are initialized to be safe.
S_hat0: np.array or nan
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs (remaining columns) belong to the initial safe seed and satisfy the recovery and reachability properties. If it is nan, this boolean matrix is computed during initialization.
noise: float
Standard deviation of the measurement noise
L: float
Lipschitz constant to compute expanders
update_dist: int
Distance in unweighted graph used for confidence interval update. A sample will only influence other nodes within this distance.
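The array layouts described above can be illustrated with plain NumPy. This is a minimal sketch, not SafeMDP code: the helper `make_grid`, the row-major state numbering, the choice of four actions and the seed index are all assumptions made for illustration.

```python
import numpy as np

def make_grid(world_shape, step_size):
    """Coordinates of a linearly spaced n x m grid, one row per state.

    Hypothetical helper: states are numbered in row-major (C) order,
    which is an assumption, not necessarily SafeMDP's convention.
    """
    n, m = world_shape
    dx, dy = step_size
    xs = np.arange(n) * dx  # linearly spaced along the first axis
    ys = np.arange(m) * dy  # linearly spaced along the second axis
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.column_stack((gx.ravel(), gy.ravel()))

world_shape = (3, 4)
step_size = (0.5, 0.25)
coords = make_grid(world_shape, step_size)  # shape (12, 2)

# Flattened n x m altitude map: one (toy) value per state.
altitudes = np.linspace(0.0, 1.0, coords.shape[0])

# S0: column 0 flags states, columns 1..n_actions flag state-action pairs.
n_states = coords.shape[0]
n_actions = 4  # assumed number of actions, for illustration only
S0 = np.zeros((n_states, n_actions + 1), dtype=bool)
S0[:, 0] = True      # documented convention: all states start safe
seed = 5             # hypothetical seed state
S0[seed, 1:] = True  # every action at the seed marked as initially safe
```

With this layout, row i of `S0` describes state i, and `altitudes[i]` is the altitude at `coords[i]`.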
Methods
add_gp_observations(x_new, y_new): Add observations to the GP model.
add_observation(node, action): Add an observation of the given state-action pair.
compute_S_hat(): Compute the safely reachable set given the current safe_set.
compute_expanders(): Compute the expanders based on the current estimate of S_hat.
plot_S(safe_set[, action]): Plot the set of safe states.
target_sample(): Compute the next target (s, a) to sample (highest uncertainty within G or S_hat).
update_confidence_interval([jacobian]): Update the lower and the upper bounds of the confidence intervals.
update_sets(): Update the sets S, S_hat and G with the available observations.
add_gp_observations(x_new, y_new)
Add observations to the GP model.
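To show what adding observations does to the GP belief, the following sketch uses a tiny NumPy Gaussian process in place of `GPy.core.GP`. `ToyGP`, `rbf` and their hyperparameters are hypothetical; the point is only that appending (x_new, y_new) rows to the training data pulls the posterior mean toward the new targets and shrinks the posterior variance near the new inputs.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

class ToyGP:
    """Hypothetical stand-in for GPy.core.GP (not the GPy API)."""

    def __init__(self, X, Y, noise_var=1e-4):
        self.X, self.Y, self.noise_var = X, Y, noise_var

    def add_observations(self, x_new, y_new):
        # Append the new inputs/targets as extra rows of the training data.
        self.X = np.vstack((self.X, np.atleast_2d(x_new)))
        self.Y = np.vstack((self.Y, np.reshape(y_new, (-1, 1))))

    def predict(self, Xs):
        # Zero-mean GP posterior; prior variance is 1.0 (rbf default).
        K = rbf(self.X, self.X) + self.noise_var * np.eye(len(self.X))
        Ks = rbf(Xs, self.X)
        mean = Ks @ np.linalg.solve(K, self.Y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        return mean.ravel(), var

gp = ToyGP(np.array([[0.0]]), np.array([[0.0]]))
_, var_before = gp.predict(np.array([[2.0]]))
gp.add_observations(np.array([[2.0]]), np.array([1.0]))
mean_after, var_after = gp.predict(np.array([[2.0]]))
```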
add_observation(node, action)
Add an observation of the given state-action pair.
Observing the pair (s, a) means adding an observation of the altitude at s and an observation of the altitude at f(s, a), the state reached by taking action a in state s.
Parameters:
node: int
Node index
action: int
Action index
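The two altitude observations behind a single (s, a) pair can be sketched as follows. The action encoding in `MOVES`, the rule that moves off the grid leave the agent in place, and the helpers `next_node` and `observation_pair` are hypothetical; measurements are noise-free here for clarity.

```python
import numpy as np

# Hypothetical action encoding (an assumption, not SafeMDP's documented one):
# 1 = up, 2 = right, 3 = down, 4 = left.
MOVES = {1: (-1, 0), 2: (0, 1), 3: (1, 0), 4: (0, -1)}

def next_node(node, action, world_shape):
    """Toy transition f(s, a) on a row-major n x m grid."""
    n, m = world_shape
    i, j = divmod(node, m)
    di, dj = MOVES[action]
    ni, nj = i + di, j + dj
    if not (0 <= ni < n and 0 <= nj < m):
        return node  # assumed: moves off the grid stay in place
    return ni * m + nj

def observation_pair(node, action, coords, altitudes, world_shape):
    """Inputs and targets for observing (s, a): altitude at s and at f(s, a)."""
    target = next_node(node, action, world_shape)
    x = coords[[node, target]]     # the two input locations
    y = altitudes[[node, target]]  # the two (noise-free) altitude readings
    return x, y

world_shape = (3, 4)
coords = np.array([[i, j] for i in range(3) for j in range(4)], dtype=float)
altitudes = np.arange(12, dtype=float)
x, y = observation_pair(5, 2, coords, altitudes, world_shape)
```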
compute_S_hat()
Compute the safely reachable set given the current safe_set.
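A safely reachable set combines the two properties named for S_hat0: states must be reachable from the seed through safe state-action pairs (reachability) and must be able to return to the seed through safe pairs (recovery). The sketch below works on states only and assumes a hypothetical `next_state[s, a]` table and a boolean `safe[s, a]` matrix; it is not necessarily how `compute_S_hat` is implemented.

```python
from collections import deque

import numpy as np

def reachable(start, safe, next_state):
    """States reachable from `start` using only safe state-action pairs (BFS)."""
    seen = np.zeros(len(next_state), dtype=bool)
    seen[list(start)] = True
    queue = deque(start)
    while queue:
        s = queue.popleft()
        for a in np.flatnonzero(safe[s]):
            t = next_state[s, a]
            if not seen[t]:
                seen[t] = True
                queue.append(t)
    return seen

def returnable(target, safe, next_state):
    """States that can reach `target` via safe pairs (backward fixed point)."""
    ok = np.zeros(len(next_state), dtype=bool)
    ok[list(target)] = True
    while True:
        # A state qualifies if some safe action leads to a qualifying state.
        new = ok | np.any(safe & ok[next_state], axis=1)
        if np.array_equal(new, ok):
            return ok
        ok = new

def safely_reachable(seed, safe, next_state):
    return reachable(seed, safe, next_state) & returnable(seed, safe, next_state)

# Chain 0-1-2-3 with actions 0 = left, 1 = right; state 3 has no safe action.
next_state = np.array([[0, 1], [0, 2], [1, 3], [2, 3]])
safe = np.array([[True, True], [True, True], [True, True], [False, False]])
S_hat = safely_reachable([0], safe, next_state)
```

State 3 is reachable but excluded, because no safe action leads back toward the seed.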
compute_expanders()
Compute the expanders based on the current estimate of S_hat.
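Expanders are states in S_hat whose optimistic (upper) bound, combined with the Lipschitz constant L, could still certify some currently unsafe state as safe. A minimal sketch over states, assuming Euclidean distance between grid coordinates and the criterion u(s) - L * d(s, s') >= h; the function `expanders` and its arguments are hypothetical.

```python
import numpy as np

def expanders(S_hat, S, u, coords, L, h):
    """Mask of states in S_hat that could certify at least one new safe state.

    s is an expander if, for some s' not in S, u(s) - L * d(s, s') >= h.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    optimistic = (u[:, None] - L * d) >= h           # indexed [s, s']
    counts = np.sum(optimistic & ~S[None, :], axis=1)
    return S_hat & (counts > 0)

coords = np.arange(4.0)[:, None]  # four states on a line
S = np.array([True, True, False, False])
u = np.array([0.5, 1.5, 0.0, 0.0])
G = expanders(S_hat=S.copy(), S=S, u=u, coords=coords, L=1.0, h=0.2)
```

Only state 1 qualifies: 1.5 - 1.0 * 1 >= 0.2 for its unsafe neighbour, while state 0 is too far from any unsafe state.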
plot_S(safe_set, action=0)
Plot the set of safe states.
Parameters:
safe_set: np.array(dtype=bool)
n_states x (n_actions + 1) array of boolean values that indicates the safe set
action: int
The action for which we want to plot the safe set.
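As a text-only stand-in for plotting, the sketch below reshapes one column of a safe set back onto the grid. It assumes that `action` indexes the column of `safe_set` directly (so 0 selects the state column) and that states are flattened in row-major order; both are assumptions, and `render_safe_set` is a hypothetical helper.

```python
import numpy as np

def render_safe_set(safe_set, world_shape, action=0):
    """Draw one column of safe_set on the grid: '#' safe, '.' unsafe."""
    grid = np.asarray(safe_set)[:, action].reshape(world_shape)
    return "\n".join("".join("#" if c else "." for c in row) for row in grid)

S = np.zeros((6, 5), dtype=bool)
S[[0, 1, 4], 0] = True
picture = render_safe_set(S, (2, 3))
print(picture)
```

The same reshaped `grid` could be passed to an image-plotting routine instead of being rendered as text.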
target_sample()
Compute the next target (s, a) to sample (highest uncertainty within G or S_hat)
Returns:
node: int
The next node to sample
action: int
The next action to sample
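Choosing the pair with the highest uncertainty amounts to an argmax of the confidence width u - l over a candidate mask. The sketch reads "G or S_hat" as "search G, and fall back to S_hat when G is empty", which is one possible interpretation; `next_target` and the n_states x n_actions layout of its inputs are assumptions.

```python
import numpy as np

def next_target(l, u, G, S_hat):
    """Return the (node, action) pair with the widest interval u - l.

    Searches the expanders G first and falls back to S_hat if G is empty.
    """
    candidates = G if G.any() else S_hat
    width = np.where(candidates, u - l, -np.inf)  # ignore non-candidates
    node, action = np.unravel_index(np.argmax(width), width.shape)
    return int(node), int(action)

l = np.zeros((3, 2))
u = np.array([[1.0, 2.0], [3.0, 0.5], [4.0, 4.0]])
G = np.array([[False, True], [True, False], [False, False]])
S_hat = np.ones((3, 2), dtype=bool)
target = next_target(l, u, G, S_hat)
```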
update_confidence_interval(jacobian=False)
Update the lower and the upper bounds of the confidence intervals using the posterior distribution over the gradients of the altitudes.
Returns:
l: np.array
Lower bound of the safety feature (mean - beta*std)
u: np.array
Upper bound of the safety feature (mean + beta*std)
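The returned bounds follow directly from the posterior mean and standard deviation of the safety feature. A minimal sketch with made-up numbers; `confidence_bounds` is a hypothetical helper, and obtaining `mean` and `std` from the GP is left out.

```python
import numpy as np

def confidence_bounds(mean, std, beta):
    """Lower and upper confidence bounds: mean - beta*std and mean + beta*std."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    return mean - beta * std, mean + beta * std

l, u = confidence_bounds([0.0, 1.0], [0.5, 0.1], beta=2.0)
```

Each interval has width 2 * beta * std, so beta directly scales how conservative the safety decisions based on l are.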
update_sets()
Update the sets S, S_hat and G with the available observations.
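The safe set S can grow by combining the pessimistic bound l with the Lipschitz constant L: a state s' is certified safe if some already-safe state s satisfies l(s) - L * d(s, s') >= h. The sketch applies this rule once over states rather than state-action pairs; `expand_safe_set` and the Euclidean distance are assumptions. A complete update would then recompute S_hat and G from the new S.

```python
import numpy as np

def expand_safe_set(S, l, coords, L, h):
    """One Lipschitz-based update: add every s' certified by some safe s."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    certified = (l[:, None] - L * d) >= h  # indexed [s, s']
    return S | np.any(certified & S[:, None], axis=0)

coords = np.arange(4.0)[:, None]  # four states on a line
S = np.array([True, False, False, False])
l = np.array([1.0, 0.0, 0.0, 0.0])
S_new = expand_safe_set(S, l, coords, L=0.5, h=0.2)
```

State 1 is added because 1.0 - 0.5 * 1 >= 0.2, while state 2 is not, since 1.0 - 0.5 * 2 < 0.2.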