From motor control to embodied intelligence


Using human and animal motions to teach robots to dribble a ball, and simulated humanoid characters to carry boxes and play football

A humanoid character learns to traverse an obstacle course through trial and error, which can lead to idiosyncratic solutions. Heess et al., “Emergence of Locomotion Behaviours in Rich Environments” (2017).

Five years ago, we took on the challenge of teaching a fully articulated humanoid character to traverse obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but it also highlighted two challenges in solving embodied intelligence:

  1. Reusing previously learned behaviors: A significant amount of data was needed for the agent to “get off the ground”. Without any initial knowledge of what force to apply to each of its joints, the agent started with random body twitching and quickly fell to the ground. This problem can be mitigated by reusing previously learned behaviors.
  2. Idiosyncratic behaviors: When the agent finally learned to navigate obstacle courses, it did so with unnatural (albeit amusing) movement patterns that would be impractical for applications such as robotics.

Here, we describe a solution to both challenges called neural probabilistic motor primitives (NPMP), which involves guided learning with movement patterns derived from humans and animals, and discuss how this approach is used in our humanoid football paper, published today in Science Robotics.

We also discuss how the same approach enables humanoid whole-body manipulation from vision, such as a humanoid character carrying an object, and real-world robotic control, such as a robot dribbling a ball.

Distilling data into controllable motor primitives with NPMP

An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline or via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing motions of interest.

An agent learning to imitate a MoCap trajectory (shown in grey).
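To make the imitation step concrete, here is a minimal sketch of the kind of per-step tracking reward commonly used for MoCap imitation. The pose dimensionality, the Gaussian form, and the sigma value are illustrative assumptions, not the exact objective used in our work:

```python
import numpy as np

def tracking_reward(agent_pose, reference_pose, sigma=0.3):
    """Illustrative per-step imitation reward: the closer the agent's
    joint configuration is to the MoCap reference frame, the higher
    the reward (1.0 at a perfect match, decaying with distance)."""
    error = np.sum((agent_pose - reference_pose) ** 2)
    return np.exp(-error / (2 * sigma ** 2))

# Toy usage: compare a slightly perturbed pose against a reference frame.
reference = np.zeros(56)  # hypothetical 56 joint angles for a humanoid
agent = reference + np.random.normal(0, 0.05, size=56)
print(tracking_reward(agent, reference))  # close to 1.0
```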

The model consists of two parts:

  1. An encoder that takes a future trajectory and compresses it into a motor intention.
  2. A low-level controller that produces the next action given the agent’s current state and this motor intention.

Our NPMP model first distills reference data into a low-level controller (left). This low-level controller can then be used as a plug-and-play motor controller on a new task (right).
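The sketch below illustrates this two-part structure with placeholder linear maps standing in for the actual neural networks; the dimensions, the deterministic encoder, and the five-frame lookahead are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, INTENT_DIM, ACTION_DIM, HORIZON = 56, 16, 21, 5  # placeholders

# Encoder: compresses a short window of future reference states into a
# low-dimensional "motor intention" vector (here, a single linear map).
W_enc = rng.normal(0, 0.01, (INTENT_DIM, HORIZON * STATE_DIM))

def encode(future_states):
    return W_enc @ future_states.reshape(-1)

# Low-level controller: maps (current state, motor intention) -> action.
W_ctrl = rng.normal(0, 0.01, (ACTION_DIM, STATE_DIM + INTENT_DIM))

def low_level_control(state, intent):
    return np.tanh(W_ctrl @ np.concatenate([state, intent]))

# One imitation step: encode the upcoming MoCap frames, then act.
future = rng.normal(size=(HORIZON, STATE_DIM))  # next 5 reference frames
state = rng.normal(size=STATE_DIM)
action = low_level_control(state, encode(future))
print(action.shape)  # (21,): one control signal per actuated joint
```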

After training, the low-level controller can be reused to learn new tasks, where a high-level controller is optimized to output motor intentions directly. This enables efficient exploration (coherent behaviors are produced even with randomly sampled motor intentions) and constrains the final solution.
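A minimal sketch of this reuse pattern, assuming the same placeholder dimensions as above: the pre-trained low-level controller is frozen, a new high-level policy outputs intentions from task observations, and even a randomly sampled intention decodes into a coherent action:

```python
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, INTENT_DIM, ACTION_DIM = 56, 16, 21  # placeholders

# Frozen low-level controller from the pre-trained NPMP (weights fixed).
W_ctrl = rng.normal(0, 0.01, (ACTION_DIM, STATE_DIM + INTENT_DIM))

def low_level_control(state, intent):
    return np.tanh(W_ctrl @ np.concatenate([state, intent]))

# New, trainable high-level controller: outputs motor intentions directly
# from task observations instead of from future MoCap frames.
W_hi = rng.normal(0, 0.01, (INTENT_DIM, STATE_DIM))

def high_level_policy(obs):
    return W_hi @ obs

state = rng.normal(size=STATE_DIM)

# Exploration: even a randomly sampled intention is decoded into a
# coherent action by the frozen low-level controller.
random_intent = rng.normal(size=INTENT_DIM)
print(low_level_control(state, random_intent))

# Exploitation: only the high-level weights are updated by RL on the task.
print(low_level_control(state, high_level_policy(state)))
```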

Emergent team coordination in humanoid football

Football has been a long-standing challenge for embodied intelligence research, requiring both individual skills and coordinated team play. In our most recent work, we used an NPMP as a prior to guide the learning of movement skills.

The result was a team of players that progressed from learning ball-chasing skills to eventually learning to coordinate. Previously, in a study with simple embodiments, we had shown that coordinated behavior can emerge in teams competing with each other. The NPMP allowed us to observe a similar effect, but in a scenario that required significantly more advanced motor control.

Agents first imitate the movements of football players to learn an NPMP module (top). Using the NPMP, the agents then learn football-specific skills (bottom).

Our agents acquired skills including agile locomotion, passing, and division of labor, as demonstrated by a range of statistics, including metrics used in real-world sports analytics. The players exhibit both agile high-frequency motor control and long-term decision-making that involves anticipating teammates’ behaviors, leading to coordinated team play.

An agent learning to play football competitively using multi-agent RL.


Whole-body manipulation and cognitive tasks using vision

Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of interactions with boxes, we can train an agent to carry a box from one location to another, using egocentric vision and with only a sparse reward signal:

Using a small amount of MoCap data (top), our NPMP approach can solve the box carrying task (bottom).
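The sparse reward mentioned above can be made concrete with a toy example; the success tolerance and the binary form below are hypothetical choices rather than the task’s actual reward, but they show why such a signal offers so little guidance for unstructured exploration:

```python
import numpy as np

def sparse_box_reward(box_pos, target_pos, tol=0.1):
    """Illustrative sparse reward: 1.0 only once the box is within
    `tol` metres of the target location, 0.0 everywhere else."""
    return float(np.linalg.norm(box_pos - target_pos) < tol)

print(sparse_box_reward(np.array([1.0, 2.0, 0.4]),
                        np.array([1.05, 2.0, 0.4])))  # 1.0: box delivered
print(sparse_box_reward(np.array([0.0, 0.0, 0.4]),
                        np.array([1.05, 2.0, 0.4])))  # 0.0: still far away
```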

Similarly, we can teach the agent to catch and throw balls:

A simulated humanoid catching and throwing a ball.

Using the NPMP, we can also tackle maze tasks that involve locomotion, perception, and memory:

A simulated humanoid collecting blue balls in a maze.

Safe and efficient control of real-world robots

The NPMP can also help control real robots. Having well-regularized behavior is critical for activities such as walking over rough terrain or handling fragile objects. Jerky motions can damage the robot itself or its surroundings, or at least drain its battery. Therefore, significant effort is often invested in designing learning objectives that make a robot do what we want while behaving in a safe and efficient manner.
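As a rough illustration of the kind of hand-designed objective this involves, the sketch below rewards forward progress while penalizing energy use and jerky motion. The penalty terms and weights are generic examples of this design effort, not an objective we actually used:

```python
import numpy as np

def shaped_locomotion_reward(forward_velocity, joint_torques, joint_accels,
                             w_energy=0.01, w_smooth=0.005):
    """Illustrative hand-designed objective: reward forward progress,
    penalise energy use (torque magnitude) and jerky motion (large joint
    accelerations). The weights are hypothetical and typically require
    extensive manual tuning."""
    task_term = forward_velocity
    energy_penalty = w_energy * np.sum(joint_torques ** 2)
    smoothness_penalty = w_smooth * np.sum(joint_accels ** 2)
    return task_term - energy_penalty - smoothness_penalty

# Toy usage for a 12-actuator quadruped moving at 1.2 m/s.
print(shaped_locomotion_reward(1.2, np.full(12, 5.0), np.full(12, 2.0)))
```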

As an alternative, we investigated whether priors derived from biological motion could give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that are suitable for deployment on real-world robots.

Starting from MoCap data of humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that can then be deployed on real humanoid (OP3) and quadruped (ANYmal B) robots, respectively. This allowed a user to steer the robots via a joystick, or to have them dribble a ball to a target location, in a natural-looking and robust manner.

Locomotion skills for the ANYmal robot are learned by imitating dog MoCap.
These skills can then be reused for controllable walking and ball dribbling.
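A simplified sketch of the resulting control loop, again assuming placeholder dimensions and linear policies: the joystick command conditions the high-level policy, and the frozen NPMP controller decodes the resulting motor intention into joint targets:

```python
import numpy as np

rng = np.random.default_rng(2)
STATE_DIM, INTENT_DIM, ACTION_DIM = 37, 16, 12  # hypothetical quadruped sizes

# Frozen NPMP low-level controller and a trained high-level policy that
# takes (robot state, 2-D joystick command) as input.
W_ctrl = rng.normal(0, 0.01, (ACTION_DIM, STATE_DIM + INTENT_DIM))
W_hi = rng.normal(0, 0.01, (INTENT_DIM, STATE_DIM + 2))

def step(robot_state, joystick_xy):
    """One control tick: joystick command -> motor intention -> joint targets."""
    intent = W_hi @ np.concatenate([robot_state, joystick_xy])
    return np.tanh(W_ctrl @ np.concatenate([robot_state, intent]))

targets = step(rng.normal(size=STATE_DIM), np.array([0.5, 0.0]))  # push stick forward
print(targets.shape)  # (12,): one target per actuated joint
```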

Benefits of using neural probabilistic motor primitives

In summary, we have used the NPMP skill model to learn complex tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level movement skills in a reusable way, making it easier to learn useful behaviors that would be hard to discover through unstructured trial and error. Using motion capture as a source of prior information, it biases motor control learning toward naturalistic movements.

The NPMP enables embodied agents to learn more quickly using RL; to learn more naturalistic behaviors; to learn safer, more efficient, and more stable behaviors suitable for real-world robotics; and to combine full-body motor control with longer-horizon cognitive skills, such as teamwork and coordination.
