GridWorld

class safemdp.GridWorld(gp, world_shape, step_size, beta, altitudes, h, S0, S_hat0, L, update_dist=0)

Grid world with safe exploration

Parameters:

gp: GPy.core.GP

Gaussian process that expresses our current belief over the safety feature

world_shape: shape

Tuple that contains the shape of the grid world n x m

step_size: tuple of floats

Tuple that contains the step sizes along each direction to create a linearly spaced grid

beta: float

Scaling factor to determine the amplitude of the confidence intervals

altitudes: np.array

It contains the flattened n x m matrix where the altitudes of all the points in the map are stored

h: float

Safety threshold

S0: np.array

n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs (remaining columns) belong to the initial safe seed. Note that, by convention, all states are initialized as safe

S_hat0: np.array or nan

n_states x (n_actions + 1) array of booleans that indicates which states (first column) and which state-action pairs belong to the initial safe seed and satisfy recovery and reachability properties. If it is nan, such a boolean matrix is computed during initialization

noise: float

Standard deviation of the measurement noise

L: float

Lipschitz constant to compute expanders

update_dist: int

Distance in unweighted graph used for confidence interval update. A sample will only influence other nodes within this distance.
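
As a rough illustration of how the array-valued arguments above might be assembled, the following sketch builds a linearly spaced grid, a flattened altitude map, and an initial safe seed with plain NumPy. The concrete numbers, the choice of four actions, and the row-major ordering of states are assumptions made for this example and are not taken from the library.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
world_shape = (3, 4)     # n x m grid
step_size = (0.5, 0.5)   # spacing along each direction
n, m = world_shape

# Linearly spaced coordinates along each axis.
xs = np.arange(n) * step_size[0]
ys = np.arange(m) * step_size[1]

# One (x, y) row per grid point, in row-major order (an assumption).
xx, yy = np.meshgrid(xs, ys, indexing="ij")
coords = np.column_stack((xx.ravel(), yy.ravel()))

# An n x m altitude map, flattened into a vector of length n * m.
altitude_map = np.zeros(world_shape)
altitudes = altitude_map.ravel()

# Initial safe seed: column 0 refers to states, columns 1.. to actions.
n_states = n * m
n_actions = 4  # assumption: one action per grid direction
S0 = np.zeros((n_states, n_actions + 1), dtype=bool)
S0[:, 0] = True   # by convention, all states start as safe
S0[0, 2] = True   # hypothetical: one state-action pair known to be safe

# Passing nan asks for S_hat0 to be computed during initialization.
S_hat0 = np.nan
```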

Methods

`add_gp_observations`(x_new, y_new)	Add observations to the GP model.
`add_observation`(node, action)	Add an observation of the given state-action pair.
`compute_S_hat`()	Compute the safely reachable set given the current safe_set.
`compute_expanders`()	Compute the expanders based on the current estimate of S_hat.
`plot_S`(safe_set[, action])	Plot the set of safe states
`target_sample`()	Compute the next target (s, a) to sample (highest uncertainty within G or S_hat)
`update_confidence_interval`([jacobian])	Updates the lower and the upper bound of the confidence intervals
`update_sets`()	Update the sets S, S_hat and G using the available observations

add_gp_observations(x_new, y_new)

Add observations to the GP model.

add_observation(node, action)

Add an observation of the given state-action pair.

Observing the pair (s, a) means adding an observation of the altitude at s and an observation of the altitude at f(s, a)

Parameters:

node: int

Node index

action: int

Action index
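
To make the idea of observing a pair (s, a) concrete, here is a minimal sketch that looks up the altitude at s and at f(s, a) on a grid. The helper names `next_node` and `observe_pair`, the action encoding, the rule that moves off the grid leave the state unchanged, and the `noise_std` argument are all hypothetical and only serve the illustration.

```python
import numpy as np

def next_node(node, action, world_shape):
    """Hypothetical deterministic transition f(s, a) on an n x m grid."""
    n, m = world_shape
    i, j = np.unravel_index(node, world_shape)
    # Assumed action encoding: 1 = up, 2 = right, 3 = down, 4 = left.
    di, dj = {1: (-1, 0), 2: (0, 1), 3: (1, 0), 4: (0, -1)}[action]
    ni, nj = i + di, j + dj
    if not (0 <= ni < n and 0 <= nj < m):
        return int(node)  # assumption: moves off the grid keep the state
    return int(np.ravel_multi_index((ni, nj), world_shape))

def observe_pair(node, action, altitudes, world_shape, noise_std=0.0, rng=None):
    """Return the two visited nodes and noisy altitude readings at both."""
    rng = np.random.default_rng() if rng is None else rng
    nodes = np.array([node, next_node(node, action, world_shape)])
    y = altitudes[nodes] + noise_std * rng.standard_normal(nodes.size)
    return nodes, y
```

With `altitudes = np.arange(12.0)` and `world_shape = (3, 4)`, calling `observe_pair(5, 2, altitudes, world_shape)` yields readings for node 5 and its right-hand neighbor, node 6.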

compute_S_hat()

Compute the safely reachable set given the current safe_set.

compute_expanders()

Compute the expanders based on the current estimate of S_hat.
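
One common way to define expanders with a Lipschitz constant is sketched below: a safely reachable state counts as an expander if its optimistic upper bound, reduced by L times the distance to some state not yet classified as safe, still clears the threshold h. This is a simplified, state-only sketch under that assumption; the function name `expanders` and its arguments are hypothetical, and the library's own computation may differ.

```python
import numpy as np

def expanders(u, safe, reachable, dist, L, h):
    """Flag states whose optimistic bound could certify a new state.

    u         : (n,) upper confidence bound per state
    safe      : (n,) bool, states currently classified as safe
    reachable : (n,) bool, current estimate of the safely reachable set
    dist      : (n, n) pairwise distances between states
    """
    # optimistic[s, t] is True if u(s) - L * d(s, t) >= h.
    optimistic = u[:, None] - L * dist >= h
    # Only targets that are not yet classified as safe count as expansion.
    could_expand = (optimistic & ~safe[None, :]).any(axis=1)
    return reachable & could_expand
```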

plot_S(safe_set, action=0)

Plot the set of safe states

Parameters:

safe_set: np.array(dtype=bool)

n_states x (n_actions + 1) array of boolean values that indicates the safe set

action: int

The action for which we want to plot the safe set.

target_sample()

Compute the next target (s, a) to sample (highest uncertainty within G or S_hat)

Returns:

node: int

The next node to sample

action: int

The next action to sample
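
A selection rule of this kind can be sketched as an argmax over the confidence-interval width. The version below assumes that l, u, G and S_hat all share the shape n_states x (n_actions + 1), and that S_hat is used only when G is empty; both points, along with the name `pick_target`, are assumptions made for illustration.

```python
import numpy as np

def pick_target(l, u, G, S_hat):
    """Return the (node, action) index with the widest confidence interval."""
    width = u - l
    # Assumption: prefer expanders, fall back to S_hat if there are none.
    candidates = G if G.any() else S_hat
    if not candidates.any():
        raise ValueError("no candidate state-action pairs to sample")
    # Exclude non-candidates from the argmax.
    masked = np.where(candidates, width, -np.inf)
    node, action = np.unravel_index(np.argmax(masked), masked.shape)
    return int(node), int(action)
```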

update_confidence_interval(jacobian=False)

Updates the lower and the upper bound of the confidence intervals using the posterior distribution over the gradients of the altitudes

Returns:

l: np.array

lower bound of the safety feature (mean - beta*std)

u: np.array

upper bound of the safety feature (mean + beta*std)
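
The relation between the posterior statistics and the returned bounds can be written in a few lines of NumPy. This sketch takes the posterior mean and standard deviation as given arrays rather than querying a GP, and the function name `confidence_bounds` is hypothetical.

```python
import numpy as np

def confidence_bounds(mean, std, beta):
    """Symmetric confidence interval scaled by beta around the mean."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    lower = mean - beta * std
    upper = mean + beta * std
    return lower, upper
```

A larger beta widens the interval, which makes the safety classification more conservative at the cost of slower exploration.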

update_sets()

Update the sets S, S_hat and G using the available observations