Robotics training data annotation

Annotation built for robotics.

Dense, frame-accurate action labels for robot manipulation data — delivered in hours, at a fraction of manual labeling cost.


From raw episodes to training-ready data.

A single pipeline that turns messy robot footage into dense, quality-scored action labels.

01. Ingest your episodes

Connect raw teleop video, robot logs, and multimodal sensor streams. Shotwell ingests episodes at any scale — no reformatting required.

02. Dense, frame-level labeling

Our models watch every frame, segment continuous motion into discrete actions, and label each one against your task definition and SOP rubric.

03. QA'd, training-ready export

Every label is scored for quality and returned in hours — not weeks — ready to drop straight into your training and evaluation pipeline.

Built for the hardest robotics data.

Wherever motion is continuous and quality is non-negotiable, Shotwell produces labels your models can trust.

Manipulation & grasping

Frame-accurate labels for pick, place, and regrasp sequences across single-arm and bimanual setups.

Deformable & folding tasks

Dense action segmentation for cloth, cables, and other deformables where every frame matters.

Teleoperation episode QA

Automatically flag failed, noisy, or off-task episodes before they ever reach your training set.

Vision-language-action data

Task-aligned action labels and language grounding to fuel VLA and foundation-model post-training.

Multi-camera & multimodal fusion

Consistent labels across synchronized camera views, depth, and proprioceptive sensor streams.

Long-horizon segmentation

Break multi-minute episodes into clean, discrete sub-tasks scored against your SOP rubric.

Segment your training videos into labeled actions.

Shotwell watches every video frame by frame, chunks it into discrete actions, and labels each action for quality and against your task definition.

episode_0431 · fold_shirt (00:00.00 to 00:19.07)

Detected actions:
Pick up: 00:00.00
Straighten: 00:01.00
Fold right sleeve: 00:07.00
Fold left sleeve: 00:10.00
Fold first third: 00:15.00
Fold second third: 00:16.00
Stack: 00:17.00
Home: 00:18.00
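
To make the idea of chunking a video into discrete actions concrete, here is a minimal Python sketch of one simple way per-frame labels can be collapsed into timed segments like those in the fold_shirt timeline above. It is an illustration only, not Shotwell's implementation: the `Segment` class, the `frames_to_segments` function, the 1 frame-per-second rate, and the sample labels are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    label: str
    start_s: float  # start time in seconds (inclusive)
    end_s: float    # end time in seconds (exclusive)


def frames_to_segments(frame_labels, fps):
    """Collapse runs of identical per-frame labels into timed segments."""
    segments = []
    frame_index = 0
    for label, run in groupby(frame_labels):
        run_length = sum(1 for _ in run)
        start_s = frame_index / fps
        end_s = (frame_index + run_length) / fps
        segments.append(Segment(label, start_s, end_s))
        frame_index += run_length
    return segments


# Hypothetical per-frame labels sampled at 1 frame per second,
# mirroring the first three actions of the timeline above.
labels = ["Pick up"] + ["Straighten"] * 6 + ["Fold right sleeve"] * 3
segments = frames_to_segments(labels, fps=1.0)

for seg in segments:
    print(f"{seg.label}: {seg.start_s:.2f}s to {seg.end_s:.2f}s")
```

Real footage would be sampled at a much higher frame rate, and noisy per-frame predictions would usually need smoothing before this kind of run-length grouping produces clean segments.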

Quality is the biggest lever.

Data quality moves robot model performance more than anything else. Shotwell is built to maximize it at every frame.

Frame-level: Dense labels on every frame, not sparse keyframes
Hours: Turnaround, instead of weeks of manual labeling
A fraction: Of the cost of traditional annotation vendors
YC-backed: Built by a team from Tesla, Sunday Robotics & Gradio

Let's talk about your data: hello@shotwell.ai