Understanding robot failures by looking through their eyes
Nominal Connect's new point clouds combine with URDFs and video to understand what your physical AI is seeing
This video shows a common occurrence in the world of robotics: a simple task with a subtle failure.
The robot arm attempts to place a pen in a pan, but it falls over the edge.
A small, simple failure. The question is: why?
Diagnosing the root cause of a robot failure is difficult, especially for subtle errors like a slight misplacement or an incomplete task. There are many reasons why this task may have failed, but we can’t identify the root cause by simply watching this video.
To improve reliability, engineers can’t just know what happened; they need to understand why.
The key is to see the world from the robot's perspective. This requires fusing and synchronizing all available data: video, robot models, depth perception, and algorithm outputs.
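Under the hood, this kind of fusion starts with putting every stream on a common clock. A minimal sketch of the idea with NumPy (the stream names and sample rates here are illustrative assumptions, not Nominal Connect's API):

```python
import numpy as np

# Hypothetical recorded streams, each on its own clock and sample rate.
video_t = np.arange(0.0, 10.0, 1 / 30)   # 30 Hz camera frame timestamps (s)
joint_t = np.arange(0.0, 10.0, 1 / 100)  # 100 Hz joint-state timestamps (s)
joint_angle = np.sin(joint_t)            # one joint angle trace, radians

# Resample the joint stream onto the video clock so every camera frame
# has a matching robot state during replay and time scrubbing.
joint_at_frames = np.interp(video_t, joint_t, joint_angle)

assert joint_at_frames.shape == video_t.shape
```

With every stream resampled onto one timeline, scrubbing to a video frame directly indexes the corresponding joint state, depth data, and algorithm outputs.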
This article demonstrates how to use Nominal Connect to investigate this specific failure. We will systematically analyze the event by:
Verifying the robot's physical actions against its commands.
Examining its 3D environmental perception using point clouds.
Inspecting its object understanding with segmentation data.
Step 1: Verify execution – the robot followed its plan
👆 A replay and time scrubbing of the robot arm. By showing the commanded and measured joint angles and gripper state we can check how well the robot executed the plan.
The first step to diagnose robot misbehavior is to confirm its physical components followed the plan. Did the arm and gripper accurately execute their commands?
Within Nominal Connect, we start by loading all the recorded data from this test run. The platform provides a synchronized, multi-faceted replay:
Synced video footage: We can re-watch the original camera views (external and wrist-mounted) to pinpoint the exact moment of failure.
3D scene with arm animation: Alongside the video, a 3D model of the robot arm (its URDF representation) precisely re-enacts its movements, driven by the recorded joint state data.
Commanded actions vs. measured state: Nominal Connect allows us to overlay or plot the actions given to the robot (the commanded joint positions and gripper instructions) against the measured robot state (the actual data from its encoders and sensors).
By scrubbing the timeline to the critical moment – when the pen is released – we can meticulously compare:
Was the arm at the commanded pose (joint angles) for the pickup and the drop, with the gripper in the commanded state?
Did the gripper close and open precisely when instructed?
Are there any significant lags or deviations between the intended motion profile and the actual motion achieved by the robot?
The replay in Nominal Connect confirms the robot executed its commands faithfully. The commanded and measured joint angles, along with the gripper state, track each other closely. This means the problem is not a simple control error or mechanical fault.
The robot followed the plan. Next, we investigate the perception that created it.
Step 2: Check 3D perception – the world model is accurate
Since execution was correct, we now investigate perception: Did the robot "see" the world correctly when it formed its plan?
Nominal Connect allows us to overlay 3D point cloud data, derived from the robot's stereo cameras, directly into the synchronized 3D scene alongside the robot's 3D (URDF) model. This point cloud represents the robot's depth perception, its estimate of surfaces and distances in its surroundings.
👆 Overlaying the point cloud from the stereo camera into the 3D model lets us check for alignment. Here, the alignment is good.
As we replay the sequence in Connect, we observe that the portion of the point cloud corresponding to the robot's own arm segments (e.g., its gripper, forearm) closely aligns with the 3D URDF model of the arm.
This alignment confirms the robot's depth perception and localization are sound. The error is not in the general perception of the scene.
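One way to quantify this alignment is a nearest-neighbor residual between observed arm points and points sampled from the URDF link surfaces. A rough sketch with synthetic data (the point sets and the 3 mm noise level are assumptions for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
# Stand-in for points sampled densely from the URDF link surfaces,
# expressed in the robot base frame.
model_pts = rng.uniform(-0.5, 0.5, size=(2000, 3))
# Stand-in for the stereo point cloud falling on the arm: the same
# surfaces observed with ~3 mm of sensor noise.
cloud_pts = model_pts[:500] + rng.normal(0, 0.003, (500, 3))

# Distance from each observed point to its nearest model-surface point.
dist, _ = cKDTree(model_pts).query(cloud_pts)

# A median residual well under a centimeter indicates that depth
# perception and localization agree with the kinematic model.
print(f"median residual: {np.median(dist) * 1000:.1f} mm")
```

Large residuals here would point at a calibration or localization error; small ones, as in this run, rule that out.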
With execution and the world model ruled out, one subsystem remains to inspect: the robot's object understanding.
Step 3: Inspect object understanding – error in segmentation overlays
Execution and general perception seem correct. The final question is: did the robot correctly identify and track the specific objects in the scene?
Nominal Connect enables us to visualize the output of the robot's object segmentation algorithms. This can be shown as colored masks overlaid directly onto the synchronized camera footage, highlighting what the robot identifies as "pen," "pan," "table," etc. Each recognized object class or instance can be assigned a distinct color. As we scrub the timeline in Nominal Connect, we look for unstable object identification.
👆 Watch the segmentation mask on the pen. Its color flickers, indicating the robot’s perception system is losing a stable track of the object.
The replay reveals a potential issue: the segmentation mask for the pen flickers and changes color. The robot loses a persistent track of the pen it is holding.
This observation provides a strong clue. Unstable object tracking suggests the robot's plan was built on unreliable object estimates. The robot likely failed to drop the pen in the pan because it lost ‘sight’ of the pen at critical moments.
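A simple way to quantify this kind of flicker is to count frame-to-frame label switches for the region where the pen actually is. A toy sketch (the label sequence is invented for illustration, not taken from this run's data):

```python
# Hypothetical per-frame labels the segmentation model assigned to the
# pixel region containing the pen.
pen_labels = ["pen", "pen", "marker", "pen", "pen", "background", "pen", "pen"]

# Count identity switches between consecutive frames; a stable track
# should show zero or near-zero switches.
switches = sum(a != b for a, b in zip(pen_labels, pen_labels[1:]))

# Fraction of frames where the pen was correctly identified.
stability = pen_labels.count("pen") / len(pen_labels)

print(f"label switches: {switches}, fraction tracked as 'pen': {stability:.2f}")
```

Metrics like these turn a visual impression of flicker into a number that can be tracked across software versions.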
This three-step investigation ruled out execution and general perception as root causes, and instead identified unstable object segmentation as the likely culprit. The visualizations give engineers a clear direction for the fix: the object tracking algorithm.
Nominal Connect: See more, build better robots
The pen missed the pan not because of a mechanical fault or a bad world model, but likely because of a fleeting, unstable perception of the object it targeted. This root cause was not evident in the original video. Triaging subtle failures requires looking through the robot's eyes.
The systematic, three-step analysis shown here – verifying actions, then perception, then object understanding – is made possible by Nominal Connect. It transforms raw robotics data (joint states, camera feeds, depth, perception outputs) into a clear, interactive replay of a robot's behavior.
This synchronized, multi-modal view helps teams:
Quickly find the root cause of elusive bugs.
Validate and iterate on perception and control algorithms.
Improve collaboration with a shared, system-level view.
Furthermore, test data can be synced to Nominal Core, the analytics and data warehouse backbone of Nominal’s platform. This enables long-term data storage, traceability across serial numbers and software versions, cross-unit trend analysis, and seamless data sharing for engineering investigations.
By providing deep insight into individual failures and broad insight into long-term trends, Nominal equips engineers to build higher-quality robotic systems, reduce errors, and confidently advance automation.
Credits
Robot demonstration data derived from the DROID dataset.
Robot arm model (Franka Emika Panda) from public URDF.