Multi-Quadruped Cooperative Object Transport:
Learning Decentralized Pinch-Lift-Move

Oregon State University

decPLM is a decentralized reinforcement learning framework that enables teams of quadruped-arm robots to pinch, lift, and transport ungraspable objects using only local sensing and contact forces, without communication or rigid coupling. Trained with just two robots, the policy generalizes to teams of 2-10 robots and diverse object geometries and masses.

Abstract

We study decentralized cooperative transport using teams of N quadruped-arm robots that must pinch, lift, and move ungraspable objects through physical contact alone. Unlike prior work that relies on rigid mechanical coupling between robots and objects, we address the more challenging setting in which mechanically independent robots must coordinate purely through contact forces, without any communication or centralized control. To this end, we employ a hierarchical policy architecture that separates base locomotion from arm control, and propose a constellation reward formulation that unifies position and orientation tracking to enforce rigid contact behavior. The key insight is to encourage robots to behave as if rigidly connected to the object through careful reward design and a training curriculum, rather than through explicit mechanical constraints. Our approach enables coordination through shared policy parameters and implicit synchronization cues, scaling to arbitrary team sizes without retraining. We present extensive simulation experiments demonstrating robust transport across 2-10 robots on diverse object geometries and masses, along with sim-to-real transfer results on lightweight objects.

Decentralized Pinch-Lift-Move (decPLM)

We train decPLM in large-scale parallel simulation and evaluate it on multi-robot teams, diverse payloads, and real quadruped-arm platforms.

Training in Simulation

decPLM is trained in IsaacLab with two Unitree Go2 quadrupeds equipped with D1 arms, using MAPPO and extensive domain randomization over dynamics, contact parameters, and sensor noise.
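
As a rough illustration of this setup, the sketch below shows one way such randomization ranges might be organized. The parameter names and value ranges are assumptions for illustration, not the ones reported in the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class DomainRandomization:
    # Hypothetical ranges; the paper's exact values may differ.
    payload_mass_kg: tuple = (1.0, 10.0)      # payload mass range
    friction_coeff: tuple = (0.4, 1.2)        # contact friction range
    motor_strength_scale: tuple = (0.9, 1.1)  # actuator gain scaling
    obs_noise_std: float = 0.02               # proprioceptive sensor noise

    def sample(self, rng: random.Random) -> dict:
        """Draw one randomized environment configuration."""
        return {
            "payload_mass_kg": rng.uniform(*self.payload_mass_kg),
            "friction_coeff": rng.uniform(*self.friction_coeff),
            "motor_strength_scale": rng.uniform(*self.motor_strength_scale),
            "obs_noise_std": self.obs_noise_std,
        }
```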

Generalization to 2-10 Robots

A single policy trained only with 2 robots transfers to teams of 2-10 robots without retraining, maintaining stable pinch-lift-move behaviors and improving tracking as team size increases.

Evaluation on Diverse Payloads

Policies trained on a box payload generalize to out-of-distribution objects such as logs, barrels, and couches with different geometries and masses, without task-specific retraining.

Sim-to-Real Transfer (2 robots)

Two Unitree Go2 robots successfully execute the learned policy on real hardware, performing smooth pinch, lift, and transport motions using decentralized control.

Sim-to-Real Transfer (3 robots)

The same decentralized policy extends naturally to a 3-robot team, showing synchronized real-world coordination driven entirely by shared policy parameters.

Sim-to-Real Transfer (4 robots)

A 4-robot team demonstrates cooperative transport with the same policy, highlighting the scalability and real-world transferability of decPLM's decentralized design.

Approach

decPLM combines a hierarchical loco-manipulation policy with a constellation-based reward to make fully decentralized robots behave as if they were rigidly coupled to the payload.

1. Shared Policy, Local Observations

A team of N quadruped-arm robots cooperatively transports a single rigid-body payload. Each robot runs the same high-level policy and acts only on local proprioception and a contact-frame command, with contact-frame pose information provided either continuously or only at initialization depending on the execution mode. There is no inter-robot communication or centralized controller; coordination emerges purely from the shared policy and contact interactions with the payload.
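
A minimal sketch of this decentralized execution scheme is shown below; `policy`, `local_observation`, and `apply_action` are hypothetical placeholders for the trained network and robot interface, not names from the paper's codebase.

```python
# Decentralized execution: every robot evaluates the *same* policy on its
# own local observation; no robot ever sees another robot's state.
def step_team(policy, robots):
    actions = []
    for robot in robots:
        obs = robot.local_observation()  # proprioception + contact-frame command only
        actions.append(policy(obs))      # shared parameters, independent evaluation
    for robot, action in zip(robots, actions):
        robot.apply_action(action)       # no inter-robot communication at any point
```

Because the loop touches only per-robot quantities, the same code runs unchanged for any team size, which is what allows a policy trained with 2 robots to drive teams of 2-10.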

2. Hierarchical Loco-Manipulation Architecture

Figure: Hierarchical decPLM policy with shared high-level and low-level controllers for each quadruped-arm robot.

decPLM uses a hierarchical policy structure that separates high-level manipulation from low-level locomotion. Each robot receives:

  • its proprioceptive state s,
  • a contact-frame command C_cf derived from the payload command, and
  • (in cf+ mode) the relative contact-frame pose ᵇT_cf.

The high-level policy π_h maps these inputs to:

  • arm joint targets for regulating contact forces, and
  • a desired base velocity command.

The low-level locomotion policy π_b converts the base velocity command into leg motor setpoints, and both arm and base targets are tracked by PD controllers. This hierarchy lets π_h focus on manipulation and payload tracking while π_b ensures stable quadruped locomotion.
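
The snippet below sketches one plausible realization of this two-level interface, assuming an illustrative observation layout and output dimensions (a 6-DoF arm and a planar velocity command); it is not the paper's exact design.

```python
import numpy as np

def high_level_step(pi_h, s, C_cf, bT_cf=None):
    """High-level policy pi_h: local state plus contact-frame command ->
    arm joint targets and a base velocity command.
    bT_cf is only available in cf+ mode."""
    parts = [s, C_cf] + ([bT_cf] if bT_cf is not None else [])
    out = pi_h(np.concatenate(parts))
    arm_joint_targets = out[:6]   # hypothetical 6-DoF arm position targets
    base_vel_cmd = out[6:9]       # (vx, vy, yaw rate) handed down to pi_b
    return arm_joint_targets, base_vel_cmd

def low_level_step(pi_b, proprio, base_vel_cmd):
    """Low-level locomotion policy pi_b: base velocity command ->
    leg joint setpoints, which PD controllers then track."""
    return pi_b(np.concatenate([proprio, base_vel_cmd]))
```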

3. Constellation-Based Reward

The central design choice is a constellation reward that encourages robots to behave as if they were rigidly attached to their contact frames on the payload. We define two sets of virtual landmark points:

  • An end-effector contact constellation aligns the contact pad with the commanded contact frame, enforcing consistent surface normals and stable pinching.
  • A base tracking constellation encourages the robot base to move consistently with the commanded payload motion, as if connected through a rigid kinematic chain.

Figure: Constellation reward aligning virtual points on the robot and payload to enforce rigid-like contact.

By minimizing the mean squared distance between each pair of blue (robot-anchored) and green (payload-anchored) constellation points, the policy aligns both position and orientation of the robot relative to its contact frame. This allows the robots to behave as if rigidly linked to the payload, without explicit mechanical couplings or hand-designed force-balancing strategies.
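
A minimal sketch of such a constellation reward is given below, assuming K corresponding landmark points expressed in a common world frame; the number of points, their placement, and the exponential shaping are illustrative assumptions.

```python
import numpy as np

def constellation_reward(robot_points: np.ndarray,
                         target_points: np.ndarray,
                         scale: float = 5.0) -> float:
    """robot_points, target_points: (K, 3) arrays of corresponding virtual
    landmarks, rigidly attached to the robot and to its contact frame on
    the payload, expressed in the same world frame."""
    sq_dist = np.sum((robot_points - target_points) ** 2, axis=-1)  # (K,)
    return float(np.exp(-scale * sq_dist.mean()))  # in (0, 1]; 1 = perfect alignment
```

Because each constellation is rigidly attached to its parent frame, a single squared-distance objective penalizes both translational and rotational misalignment at once, which is what lets one reward term unify position and orientation tracking.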

4. Curriculum over Pinch, Lift, and Move

Learning the full pinch-lift-move behavior in a single stage is difficult. We therefore employ a curriculum that gradually increases task complexity across three phases (a minimal scheduling sketch follows the list):

  • Pinch: Robots learn to reach and establish stable contact at their assigned frames while the payload is held fixed with no motion commands, focusing entirely on contact formation.
  • Lift: Robots coordinate vertical forces to raise the payload in response to height commands, maintaining synchronized lifting under randomized payload masses.
  • Move: Robots perform full cooperative transport, tracking planar velocity, yaw, and height commands while regulating contact forces through coordinated loco-manipulation.
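
The snippet below sketches one way such a curriculum could be scheduled, promoting training to the next phase once the current one is reliable; the promotion criterion, thresholds, and command ranges are illustrative assumptions, not the paper's schedule.

```python
PHASES = ["pinch", "lift", "move"]

def maybe_advance(phase_idx: int, success_rate: float,
                  threshold: float = 0.8) -> int:
    """Promote to the next phase once the current one succeeds reliably."""
    if success_rate >= threshold and phase_idx < len(PHASES) - 1:
        return phase_idx + 1
    return phase_idx

def commands_for(phase: str, rng):
    """Commands issued to the policy in each phase (hypothetical ranges)."""
    if phase == "pinch":   # payload held fixed, no motion commands
        return {"height": 0.0, "vel": (0.0, 0.0), "yaw_rate": 0.0}
    if phase == "lift":    # vertical height commands only
        return {"height": rng.uniform(0.1, 0.4), "vel": (0.0, 0.0), "yaw_rate": 0.0}
    return {"height": rng.uniform(0.1, 0.4),  # full transport commands
            "vel": (rng.uniform(-0.5, 0.5), rng.uniform(-0.3, 0.3)),
            "yaw_rate": rng.uniform(-0.5, 0.5)}
```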

Key Results

  • A single decentralized policy trained with only 2 robots scales directly to teams of 2-10 robots without any retraining.
  • Because all robots share the same policy and act only on local observations, coordinated pinch-lift-move behaviors emerge without any communication.
  • The proposed constellation reward is essential for stable pinch-lift-move behaviors, significantly improving tracking accuracy and reducing payload drop rates across all team sizes.
  • The learned controller generalizes to diverse payload geometries and masses, including boxes, logs, barrels, and couches, despite being trained on a single box.
  • Larger teams naturally exhibit better force distribution and lower tracking error, demonstrating smooth scalability of the decentralized controller.
  • The policy transfers to real hardware, enabling 2-, 3-, and 4-robot teams to perform decentralized pinch, lift, and cooperative transport in the real world.

For full quantitative results, ablations, and analysis, please refer to the paper.

BibTeX

@misc{pandit2025multiquadrupedcooperativeobjecttransport,
  title         = {Multi-Quadruped Cooperative Object Transport: Learning Decentralized Pinch-Lift-Move},
  author        = {Pandit, Bikram and Shrestha, Aayam Kumar and Fern, Alan},
  year          = {2025},
  eprint        = {2509.14342},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2509.14342}
}