Online receding horizon planning of multi-contact locomotion
Abstract
Legged robots can traverse uneven terrain by using multiple contacts between their limbs and the environment. Nevertheless, to enable reliable operation in the real world, legged robots necessarily require the capability to online re-plan their motions in response to changing conditions, such as environment changes, or state deviations due to external force perturbations. To approach this goal, Receding Horizon Planning (RHP) can be a promising solution. RHP refers to the planning framework that can constantly update the motion plan for immediate execution. To achieve successful RHP, we typically need to consider an extended planning horizon, which consists of an execution horizon that plans the motion to be executed, and a prediction horizon that foresees the future. Although the prediction horizon is never executed, it is important to the success of RHP. This is because the prediction horizon serves as a value function approximation that evaluates the feasibility and the future effort required for accomplishing the given task starting from a chosen robot state. Having such value information can guide the execution horizon toward the states that are beneficial for the future.
Nevertheless, computing such multi-contact motions for a legged robot to traverse uneven terrain can be time-consuming, especially when considering a long planning horizon. The computation complexity typically comes from the simultaneous resolution of the following two sub-problems: 1) selecting a gait pattern that specifies the sequence in which the limbs break and make contact with the environment; 2)synthesizing the contact and motion plan that determines the robot state trajectory along with the contact plan, i.e., contact locations and contact timings. The issue of gait pattern selection introduces combinatorial complexity into the planning problem,while the computation of the contact and motion plan brings high-dimensionality and non-convexity due to the consideration of complex non-linear dynamics constraints.
To facilitate online RHP of multi-contact motions, in this thesis, we focus on exploring novel methods to address these two sub-problems efficiently. To give more detail, we firstly consider the problem of planning contact and motion plans in an online receding horizon fashion. In this case, we pre-specifying the gait pattern as a priori. Although this helps us to avoid the combinatorial complexity, the resulting planning problem is still high-dimensional and non-convex, which can hinder online computation. To improve the computation speed, we propose to simplify the modeling of the value function approximation that is required for guiding the RHP. This leads to 1) Receding Horizon Planning with Multiple Levels of Model Fidelity, where we compute the prediction horizon with a convex relaxed model; 2) Locally- Guided Receding Horizon Planning—where we propose to learn an oracle to predict local objectives (intermediate goals) for completing a given task, and then we use these local objectives to construct local value functions to guide a short-horizon RHP. We evaluate our methods for planning centroidal trajectories of a humanoid robot walking on moderate slopes as well as large slopes where static stability cannot be maintained.The result of multi-fidelity RHP demonstrates that we can accelerate the computation speed by relaxing the model accuracy in the prediction horizon. However, the relaxation cannot be arbitrary. Furthermore, owing to the shortened planning horizon, we find that locally-guided RHP demonstrates the best computation efficiency (95%-98.6%cycles converge online). This computation advantage enables us to demonstrate online RHP for our real-world humanoid robot Talos walking in dynamic environments that change on-the-fly.
To handle the combinatorial complexity that arises from the gait pattern selection issue, we propose the idea of constructing a map from the task specifications to the gait pattern selections for a given environment model and performance objective(cost). We show that for a 2D half-cheetah model and a quadruped robot, a direct mapping between a given task and an optimal gait pattern can be established. We use supervised learning to capture the structure of this map in the form of gait regions.Furthermore, we also find that the trajectories in each gait region are qualitatively similar. We utilize this property to construct a warm-starting trajectory for each gait region, i.e., the mean of the trajectories discovered in each region. We empirically show that these warm-starting trajectories can improve the computation speed of our trajectory optimization problem up to 60 times when compared with random initial guesses. Moreover, we also conduct experimental trials on the ANYmal robot to validate our method.