We introduce HAND, a simple and time-efficient method for teaching robots manipulation tasks through human hand demonstrations.

Video: Real-Time Learning in just 3:30!

HAND: Fast Robot Adaptation via Hand Path Retrieval

hands_overview
  1. Sub-Trajectory Preprocessing: We first segment the hand demonstrations, $D_{\text{hand}}$, and the offline play dataset, $D_{\text{play}}$, into variable-length sub-trajectories using a simple heuristic based on proprioception.
  2. Visual Filtering: Before retrieving sub-trajectories with hand paths, we run a visual filtering step to ensure that the retrieved sub-trajectories are task-relevant. An object-centric visual foundation model, DINOv2, filters out sub-trajectories that manipulate unrelated objects (see the filtering sketch below).
  3. Retrieving Sub-Trajectories: We then use subsequence dynamic time warping (S-DTW) to match the target hand sub-trajectories, $T_{\text{hand}}$, to the set of visually filtered play segments, $T_{\text{play}}$ (see the retrieval sketch below).
  4. Adapter Fine-tuning: We leverage parameter-efficient fine-tuning using task-specific adapters—small trainable modules that modulate the behavior of the frozen base policy.
  5. Loss Re-Weighting: To prioritize the most behaviorally aligned examples, we reweight the BC loss with an exponential term, following Advantage-Weighted Regression (AWR): each retrieved sub-trajectory is weighted by its S-DTW similarity to the hand demonstration (see the reweighting sketch below).
hand_pseudocode
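To make the visual-filtering step (2) concrete, here is a minimal sketch in PyTorch. It assumes sub-trajectories are stored as dictionaries of preprocessed RGB frames and uses the publicly released DINOv2 ViT-S/14 backbone from torch.hub; the temporal pooling, the sim_threshold value, and the function names are illustrative choices, not the released HAND implementation.

import torch
import torch.nn.functional as F

# Small DINOv2 backbone from torch.hub (facebookresearch/dinov2).
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
dinov2.eval()

@torch.no_grad()
def embed_segment(frames):
    # frames: (T, 3, H, W) ImageNet-normalized tensor, H and W multiples of 14.
    feats = dinov2(frames)                          # (T, D) global features
    return F.normalize(feats.mean(dim=0), dim=-1)   # pool over time, unit norm

def visually_filter(hand_frames, play_segments, sim_threshold=0.6):
    # Keep only play sub-trajectories whose appearance matches the hand demo.
    hand_emb = embed_segment(hand_frames)
    kept = []
    for seg in play_segments:
        if torch.dot(hand_emb, embed_segment(seg["frames"])).item() >= sim_threshold:
            kept.append(seg)
    return kept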

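The retrieval step (3) can be sketched with a standard subsequence-DTW recurrence. The snippet below assumes the hand demonstration and each play sub-trajectory are represented as (T, 3) arrays of 3D positions (the hand path and the end-effector path, respectively); top_k and the field names are placeholders, not the exact HAND configuration.

import numpy as np

def s_dtw_cost(query, series):
    # Subsequence DTW: best cost of aligning the full `query` path (N, 3)
    # against any contiguous window of `series` (M, 3).
    N, M = len(query), len(series)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, :] = 0.0                                   # match may start anywhere
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            d = np.linalg.norm(query[i - 1] - series[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, 1:].min()                           # match may end anywhere

def retrieve(hand_path, filtered_segments, top_k=20):
    # Rank visually filtered play sub-trajectories by S-DTW cost (lower = closer).
    scored = [(s_dtw_cost(hand_path, seg["ee_path"]), seg) for seg in filtered_segments]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_k]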

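Finally, the reweighting step (5) turns the S-DTW costs of the retrieved sub-trajectories into exponential weights on the behavior-cloning loss. This is a sketch under the assumption of a continuous-action policy trained with an MSE BC loss and a single scalar temperature; in HAND only the adapter parameters of the frozen base policy would be updated with this loss.

import torch

def awr_weights(dtw_costs, temperature=1.0):
    # Lower S-DTW cost (closer to the hand path) -> higher weight,
    # following Advantage-Weighted Regression.
    costs = torch.as_tensor(dtw_costs, dtype=torch.float32)
    w = torch.exp(-costs / temperature)
    return w / w.sum()                              # normalize over the retrieved set

def weighted_bc_loss(policy, batch, weights):
    # batch["traj_idx"] maps each transition to its retrieved sub-trajectory.
    pred = policy(batch["obs"])                     # predicted actions
    per_sample = ((pred - batch["actions"]) ** 2).mean(dim=-1)
    return (weights[batch["traj_idx"]] * per_sample).mean()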

Real Robot Evaluation Tasks

real_robot_tasks

We evaluate HAND on 5 different real robot tasks: Reach Block, Push Button, Close Microwave, Put Keurig in Coffee Machine, and Carrot Blender. The last two are long-horizon tasks, requiring more than 100 timesteps of execution.



real_robot_results

Real-world single-task retrieval results: number of successful rollouts out of 10 for $\pi_\text{base}$, Flow, STRAP, and HAND.



Qualitative Retrieval Results from iPhone Hand Demonstrations

HAND works with demos from unseen environments.

iphone_retrieval_results
We visualize the top sub-trajectory match of STRAP, HAND without visual filtering (HAND(-VF)), and HAND on two out-of-domain demonstrations recorded with an iPhone camera: approaching a Keurig coffee pod and placing it into the machine. Only HAND's top match is relevant for both hand demonstrations.



User Study

HAND enables real-time, data-efficient policy learning of long-horizon tasks.

teleop_study
Two users, unfamiliar with HAND, are asked to collect trajectories for the $\texttt{Put Keurig in Coffee Machine}$ task either via teleoperation (Left) or with their hands (Right). HAND retrieval achieves a \todo{XX} success rate with the same number of demonstrations in \todo{XX} less time.
fast_adaptation_study
We conduct a small-scale user study to demonstrate HAND's ability to learn robot policies in real time. From providing the hand demonstration (Left), to retrieving sub-trajectories and fine-tuning the base policy (Middle), to evaluating the policy (Right), we show that HAND learns to solve the $\texttt{Blend Carrot}$ task with over $70\%$ success rate in under 3 minutes.
user_study



Evaluation Videos

Reach Green Block

STRAP (Baseline)

STRAP reach failure

 

HAND (Ours)

HAND reach success

Push Button

STRAP (Baseline)

STRAP push failure

 

HAND (Ours)

HAND push success

Close Microwave

STRAP close microwave failure

 

HAND close microwave success

Put Keurig in Coffee Machine

STRAP keurig failure

 

HAND keurig success

Carrot Blender

STRAP carrot blender failure

 

HAND carrot blender success

BibTeX


@article{hong2025handdatafastrobot,
  title={HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval},
  author={Matthew Hong and Anthony Liang and Kevin Kim and Harshitha Rajaprakash and Jesse Thomason and Erdem Bıyık and Jesse Zhang},
  year={2025},
  eprint={2505.20455},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  journal={arXiv preprint arXiv:2505.20455}
}