GridWorld
-
class safemdp.GridWorld(gp, world_shape, step_size, beta, altitudes, h, S0, S_hat0, L, update_dist=0)
Grid world with safe exploration.
Parameters:
gp: GPy.core.GP
Gaussian process that expresses our current belief over the safety feature
world_shape: shape
Tuple that contains the shape of the grid world n x m
step_size: tuple of floats
Tuple that contains the step sizes along each direction to create a linearly spaced grid
beta: float
Scaling factor to determine the amplitude of the confidence intervals
altitudes: np.array
It contains the flattened n x m matrix where the altitudes of all the points in the map are stored
h: float
Safety threshold
S0: np.array
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs (remaining columns) belong to the initial safe seed. Note that, by convention, all states are initialized as safe.
S_hat0: np.array or nan
n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs belong to the initial safe seed and satisfy recovery and reachability properties. If it is nan, such a boolean matrix is computed during initialization
noise: float
Standard deviation of the measurement noise
L: float
Lipschitz constant to compute expanders
update_dist: int
Distance in unweighted graph used for confidence interval update. A sample will only influence other nodes within this distance.
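As a rough sketch of how the array-valued arguments fit together, the snippet below builds a linearly spaced grid from world_shape and step_size and allocates an S0 array of the documented shape. It uses only NumPy and does not call the library. The helper name make_grid, the choice of four actions, and the example values are assumptions made for illustration.

```python
import numpy as np


def make_grid(world_shape, step_size):
    """Return an (n * m, 2) array of grid coordinates (hypothetical helper)."""
    n, m = world_shape
    # Linearly spaced coordinates along each direction
    x = np.arange(n) * step_size[0]
    y = np.arange(m) * step_size[1]
    xx, yy = np.meshgrid(x, y, indexing="ij")
    return np.column_stack((xx.ravel(), yy.ravel()))


world_shape = (3, 4)     # example values, not library defaults
step_size = (0.5, 0.5)
coords = make_grid(world_shape, step_size)

n_states = world_shape[0] * world_shape[1]
n_actions = 4            # assumption: one action per grid direction

# First column: states; remaining columns: state-action pairs
S0 = np.zeros((n_states, n_actions + 1), dtype=bool)
S0[:, 0] = True          # by convention, all states start out marked safe

# Flattened n x m altitude map (flat terrain used as a placeholder)
altitudes = np.zeros(n_states)
```

Arrays built this way have one row per grid node, which matches the "flattened n x m matrix" layout described for altitudes.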
Methods
add_gp_observations(x_new, y_new): Add observations to the GP model.
add_observation(node, action): Add an observation of the given state-action pair.
compute_S_hat(): Compute the safely reachable set given the current safe_set.
compute_expanders(): Compute the expanders based on the current estimate of S_hat.
plot_S(safe_set[, action]): Plot the set of safe states.
target_sample(): Compute the next target (s, a) to sample (highest uncertainty within G or S_hat).
update_confidence_interval([jacobian]): Update the lower and upper bounds of the confidence intervals.
update_sets(): Update the sets S, S_hat and G using the available observations.
-
add_gp_observations(x_new, y_new)
Add observations to the GP model.
-
add_observation(node, action)
Add an observation of the given state-action pair.
Observing the pair (s, a) means adding an observation of the altitude at s and an observation of the altitude at f(s, a)
Parameters:
node: int
Node index
action: int
Action index
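The idea of observing a pair (s, a) can be sketched as follows: one noisy altitude reading is taken at s and one at f(s, a). This is an illustration only and does not reproduce the library's code. The action encoding in MOVES, the border behaviour, and the helper names next_node and observe_pair are all hypothetical.

```python
import numpy as np

# Hypothetical action encoding: actions 1..4 move one cell.
# The library's own encoding may differ.
MOVES = {1: (0, 1), 2: (1, 0), 3: (0, -1), 4: (-1, 0)}


def next_node(node, action, world_shape):
    """Hypothetical f(s, a): move on the grid, staying put at the border."""
    n, m = world_shape
    i, j = divmod(node, m)          # row-major node index -> (row, column)
    di, dj = MOVES[action]
    ni, nj = i + di, j + dj
    if 0 <= ni < n and 0 <= nj < m:
        return ni * m + nj
    return node


def observe_pair(node, action, altitudes, world_shape, noise_std, rng):
    """Return the two visited nodes and noisy altitude readings at each."""
    nodes = np.array([node, next_node(node, action, world_shape)])
    readings = altitudes[nodes] + noise_std * rng.standard_normal(2)
    return nodes, readings


world_shape = (3, 4)
altitudes = np.arange(12, dtype=float)   # placeholder altitude map
rng = np.random.default_rng(0)
nodes, readings = observe_pair(5, 1, altitudes, world_shape, 0.0, rng)
```

With noise_std set to zero the readings equal the stored altitudes at the two nodes, which makes the sketch easy to check by hand.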
-
compute_S_hat()
Compute the safely reachable set given the current safe_set.
-
compute_expanders()
Compute the expanders based on the current estimate of S_hat.
-
plot_S(safe_set, action=0)
Plot the set of safe states.
Parameters:
safe_set: np.array(dtype=bool)
n_states x (n_actions + 1) array of boolean values that indicates the safe set
action: int
The action for which we want to plot the safe set.
-
target_sample()
Compute the next target (s, a) to sample (highest uncertainty within G or S_hat).
Returns:
node: int
The next node to sample
action: int
The next action to sample
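One plausible reading of "highest uncertainty within G or S_hat" is sketched below: take the width of the confidence interval as the uncertainty, search the expanders G first, and fall back to S_hat when G is empty. The fallback order, the array shapes (all n_states x (n_actions + 1)), and the function name most_uncertain are assumptions, not confirmed library behaviour.

```python
import numpy as np


def most_uncertain(lower, upper, G, S_hat):
    """Pick the (node, action) with the widest interval in G, else in S_hat."""
    width = upper - lower
    # Assumption: prefer expanders, fall back to the safely reachable set
    mask = G if G.any() else S_hat
    if not mask.any():
        raise ValueError("both candidate sets are empty")
    # Entries outside the candidate set can never win the argmax
    masked = np.where(mask, width, -np.inf)
    node, action = np.unravel_index(np.argmax(masked), masked.shape)
    return int(node), int(action)


lower = np.zeros((3, 3))
upper = np.arange(1.0, 10.0).reshape(3, 3)
G = np.zeros((3, 3), dtype=bool)
S_hat = np.zeros((3, 3), dtype=bool)
S_hat[0, 2] = True
S_hat[1, 1] = True
fallback_choice = most_uncertain(lower, upper, G, S_hat)
```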
-
update_confidence_interval(jacobian=False)
Update the lower and upper bounds of the confidence intervals using the posterior distribution over the gradients of the altitudes.
Returns:
l: np.array
lower bound of the safety feature (mean - beta*std)
u: np.array
upper bound of the safety feature (mean + beta*std)
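The bounds are symmetric around the posterior mean and scaled by beta. A minimal NumPy sketch of that arithmetic follows; the function name confidence_bounds and the sample numbers are illustrative, and the mean and standard deviation would in practice come from the GP posterior.

```python
import numpy as np


def confidence_bounds(mean, std, beta):
    """Return (lower, upper) = (mean - beta * std, mean + beta * std)."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    lower = mean - beta * std
    upper = mean + beta * std
    return lower, upper


# Illustrative posterior mean and standard deviation at two points
l, u = confidence_bounds([1.0, 2.0], [0.5, 1.0], beta=2.0)
```

A larger beta widens the interval at every point, which is the sense in which beta determines the amplitude of the confidence intervals.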
-
update_sets()
Update the sets S, S_hat and G using the available observations.
-