We introduce HAND, a simple and time-efficient method for teaching robots manipulation tasks through human hand demonstrations.
Video: Real-Time Learning in just 3:30!
HAND: Fast Robot Adaptation via Hand Path Retrieval
Sub-Trajectory Preprocessing: We first segment the hand demonstrations, $D_{\text{hand}}$, and the offline play dataset, $D_{\text{play}}$, into variable-length sub-trajectories using a simple heuristic based on proprioception.
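The text does not spell out the segmentation heuristic, so the following is only a minimal sketch under one assumption: that a trajectory is cut wherever the end-effector (or hand) comes to a pause, as measured by per-step speed computed from proprioceptive positions. The function name `segment_by_pauses` and the threshold `speed_eps` are hypothetical and not taken from HAND.

```python
import numpy as np

def segment_by_pauses(positions, speed_eps=1e-2):
    """Split a (T, D) position trajectory into half-open frame ranges [start, end).

    Assumed heuristic (not confirmed by the source): cut whenever motion stops,
    i.e. when the per-step speed drops from >= speed_eps to < speed_eps.
    """
    positions = np.asarray(positions, dtype=float)
    # speeds[t] is the displacement between frame t and frame t + 1
    speeds = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    moving = speeds >= speed_eps
    # Motion stops at frame t + 1; keep that frame in the finished segment
    cuts = [t + 2 for t in range(len(moving) - 1) if moving[t] and not moving[t + 1]]
    bounds = [0] + cuts + [len(positions)]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy 1-D path: move, pause, move again
demo = np.array([[0.0], [1.0], [2.0], [2.0], [2.0], [3.0], [4.0]])
segments = segment_by_pauses(demo)
```

Because the cut points depend on the motion itself, the resulting sub-trajectories have variable length, matching the description above.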
Visual Filtering: Before retrieving sub-trajectories by path, we run a visual filtering step so that the retrieved sub-trajectories are task-relevant. We use an object-centric visual foundation model, DINOv2, to filter out sub-trajectories that perform unrelated tasks with different objects.
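As a rough illustration of this step, the sketch below ranks play sub-trajectories by cosine similarity to the hand demonstration and keeps the top few. It assumes each sub-trajectory has already been reduced to a single embedding vector (for example, pooled DINOv2 features); how the embeddings are pooled, the function name `visual_filter`, and the `keep_k` cutoff are all assumptions made for illustration.

```python
import numpy as np

def visual_filter(query_emb, play_embs, keep_k):
    """Return indices and similarities of the keep_k most similar play segments.

    query_emb: (D,) embedding of the hand demonstration segment.
    play_embs: (N, D) embeddings of play sub-trajectories.
    Embeddings are assumed to be precomputed by a visual model such as DINOv2.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = play_embs / np.linalg.norm(play_embs, axis=1, keepdims=True)
    sims = p @ q                         # cosine similarity per play segment
    order = np.argsort(-sims)[:keep_k]   # highest similarity first
    return order, sims[order]

query = np.array([1.0, 0.0])
play = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
kept_idx, kept_sims = visual_filter(query, play, keep_k=2)
```

Only the segments in `kept_idx` would then be passed on to path-based retrieval.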
Retrieving Sub-Trajectories: We then employ S-DTW to match the target sub-trajectories, $T_{\text{hand}}$, to the set of visually filtered segments, $T_{\text{play}}$.
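For readers unfamiliar with S-DTW (subsequence dynamic time warping), here is a minimal textbook-style sketch. It finds the contiguous stretch of a longer reference path that best aligns with a short query path, allowing the match to start and end anywhere in the reference. The Euclidean point cost and the function name `subsequence_dtw` are assumptions; HAND's actual path representation and cost may differ.

```python
import numpy as np

def subsequence_dtw(query, reference):
    """Align a (n, D) query to the best-matching stretch of a (m, D) reference.

    Returns (cost, start, end) so that reference[start:end] is the match.
    """
    q = np.asarray(query, dtype=float)
    r = np.asarray(reference, dtype=float)
    n, m = len(q), len(r)
    cost = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=-1)  # (n, m)

    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, :] = 0.0  # free start: the match may begin at any reference frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    end = int(np.argmin(acc[n, 1:])) + 1  # free end: best column in the last row
    # Backtrack to the first query row to recover where the match starts
    i, j = n, end
    while i > 1:
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
    return float(acc[n, end]), j - 1, end

query_path = np.array([[1.0], [2.0], [3.0]])
play_path = np.array([[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]])
match_cost, match_start, match_end = subsequence_dtw(query_path, play_path)
```

A lower `match_cost` means a closer path match, so candidate segments can be ranked by this value.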
Adapter Fine-tuning: We leverage parameter-efficient fine-tuning using task-specific adapters—small trainable modules that modulate the behavior of the frozen base policy.
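The adapter architecture is not specified here, so the sketch below assumes a LoRA-style low-rank adapter on a single linear layer purely for illustration. The class name `LowRankAdapterLinear` and all dimensions are hypothetical. The frozen weight is never updated; only the two small matrices would be trained.

```python
import numpy as np

class LowRankAdapterLinear:
    """Frozen linear layer plus a trainable low-rank residual (assumed LoRA-style)."""

    def __init__(self, frozen_weight, rank, rng):
        self.W = frozen_weight                                 # (out, in), frozen
        out_dim, in_dim = frozen_weight.shape
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable
        self.B = np.zeros((out_dim, rank))                     # trainable, zero-init

    def __call__(self, x):
        # Base output plus the adapter's low-rank correction
        return x @ self.W.T + (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
base_weight = rng.normal(size=(8, 16))
layer = LowRankAdapterLinear(base_weight, rank=2, rng=rng)
x = rng.normal(size=(4, 16))
y = layer(x)
```

With the zero-initialized `B`, the adapted layer reproduces the base policy's output before any fine-tuning, and it trains 48 parameters here instead of the 128 in the frozen weight.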
Loss Re-Weighting: To prioritize the most behaviorally aligned examples, we reweight the BC loss with an exponential term, following Advantage-Weighted Regression: each sub-trajectory is weighted by its similarity (from S-DTW) to the hand demonstration.
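One way such a reweighting could look is sketched below. It assumes that similarity is expressed as an S-DTW cost (lower is better), that the weight is exp(-cost / temperature), and that the BC loss is a per-sample mean squared error. The exact weight form, the `temperature` parameter, and both function names are assumptions rather than HAND's definition.

```python
import numpy as np

def awr_style_weights(dtw_costs, temperature=1.0):
    """Exponential weights from S-DTW costs; the best match gets weight 1."""
    costs = np.asarray(dtw_costs, dtype=float)
    # Subtracting the minimum keeps the exponent bounded above by 0
    return np.exp(-(costs - costs.min()) / temperature)

def weighted_bc_loss(pred_actions, target_actions, weights):
    """Weighted mean of per-sample squared action error (assumed MSE-style BC loss)."""
    pred = np.asarray(pred_actions, dtype=float)
    target = np.asarray(target_actions, dtype=float)
    per_sample = np.mean((pred - target) ** 2, axis=-1)
    return float(np.sum(weights * per_sample) / np.sum(weights))

weights = awr_style_weights([0.0, 1.0], temperature=1.0)
uniform_loss = weighted_bc_loss([[0, 0], [2, 2]], [[0, 0], [0, 0]], np.array([1.0, 1.0]))
skewed_loss = weighted_bc_loss([[0, 0], [2, 2]], [[0, 0], [0, 0]], np.array([3.0, 1.0]))
```

Samples from poorly matched sub-trajectories still contribute, but their influence on the gradient shrinks exponentially with their S-DTW cost.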
Real Robot Evaluation Tasks
We evaluate HAND on 5 different real robot tasks: Reach Block, Push Button, Close Microwave, Put Keurig in Coffee Machine, and Carrot Blender. The last two are long-horizon tasks, requiring more than 100 timesteps of execution.
Real-world single-task retrieval experiment: number of successful rollouts out of 10 for $\pi_\text{base}$, Flow, STRAP, and HAND.
Qualitative Retrieval Results from iPhone Hand Demonstration
HAND works with demos from unseen environments.
We visualize the top sub-trajectory match of STRAP, HAND without visual filtering (HAND(-VF)), and HAND on two out-of-domain demonstrations recorded with an iPhone camera, which show a hand approaching a Keurig coffee pod and putting it into the machine. Only HAND's top match is relevant for both hand demonstrations.
User Study
HAND enables real-time, data-efficient policy learning of long-horizon tasks.
Two users, unfamiliar with HAND, are asked to collect trajectories either via teleoperation (Left) or using their hands (Right) for the $\texttt{Put Keurig in Coffee Machine}$ task.
HAND retrieval achieves a \todo{XX} success rate with the same number of demonstrations while using \todo{XX} less time.
We conduct a small-scale user study to demonstrate HAND's ability to learn robot policies in real-time.
From providing the hand demonstration (Left), to retrieval and fine-tuning a base policy (Middle), to evaluating the policy (Right), we show that HAND can learn to solve the $\texttt{Blend Carrot}$ task with over $70\%$ success rate in less than 3 minutes.
Evaluation Videos
Reach Green Block
STRAP (Baseline)
HAND (Ours)
Push Button
STRAP (Baseline)
HAND (Ours)
Close Microwave
Put Keurig in Coffee Machine
Carrot Blender
BibTeX
@inproceedings{hong2025hand,
title={HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval},
author={Matthew Hong and Anthony Liang and Kevin Kim and Harshitha Rajaprakash and Jesse Thomason and Erdem B{\i}y{\i}k and Jesse Zhang},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2026}
}