Bin picking in real industrial environments remains challenging due to severe clutter, occlusions, and the high cost of traditional 3D sensing setups. We present Pickalo, a modular 6D pose-based bin-picking pipeline built entirely on low-cost hardware. A wrist-mounted RGB-D camera actively explores the scene from multiple viewpoints, while raw stereo streams are processed with BridgeDepth to obtain refined depth maps suitable for accurate collision reasoning. Object instances are segmented with a Mask R-CNN model trained purely on photorealistic synthetic data and localized using the zero-shot SAM-6D pose estimator. A pose buffer module fuses multi-view observations over time, handling object symmetries and significantly reducing pose noise. Offline, we generate and curate large sets of antipodal grasp candidates per object; online, the grasp planner ranks these candidates by utility and runs fast collision checks to select a feasible grasp. Deployed on a UR5e with a parallel-jaw gripper and an Intel RealSense D435i, Pickalo achieves up to ~600 mean picks per hour (MPPH) with 96–99% grasp success and robust performance over 30-minute runs on densely filled euroboxes. Ablation studies demonstrate the benefits of enhanced depth estimation and the pose buffer for long-term stability and throughput in realistic industrial conditions.
Pickalo follows a modular architecture integrating state-of-the-art perception and planning components. A stereo-pair image is acquired and processed by a depth estimation block (BridgeDepth) to obtain enhanced depth. The resulting depth is aligned to the RGB frame and provided to the SAM-6D pose estimator together with the object CAD model. A Pose Buffer fuses multi-view pose estimates over time, handling symmetries and filtering unreliable detections. The scene state—composed of target objects, static objects (bin, table), and occupied voxels—is then used by the grasp planner, which ranks pre-computed antipodal grasp candidates and performs fast collision checking to find a feasible extraction trajectory.
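The online grasp-selection step described above (rank pre-computed candidates, then reject those that collide with occupied voxels) can be illustrated with a minimal sketch. This is not the Pickalo implementation: the function names, the scalar `utility` score, the representation of the gripper as a handful of sample points, and the 1 cm voxel size are all assumptions made for illustration.

```python
import numpy as np

def voxel_keys(points, voxel_size):
    """Map 3D points (N x 3, metres) to a set of integer voxel indices."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(int)
    return set(map(tuple, idx.tolist()))

def select_grasp(candidates, T_world_obj, occupied, gripper_points, voxel_size=0.01):
    """Return the highest-utility collision-free grasp, or None.

    candidates     : list of (utility, T_obj_grasp) with 4x4 grasp poses in the
                     object frame (hypothetical format for pre-computed grasps)
    T_world_obj    : 4x4 estimated object pose in the world frame
    occupied       : set of occupied voxel indices (see voxel_keys)
    gripper_points : N x 3 points sampled on the gripper, in the grasp frame
    """
    # Visit candidates from best to worst utility.
    for utility, T_obj_grasp in sorted(candidates, key=lambda c: c[0], reverse=True):
        T_world_grasp = T_world_obj @ T_obj_grasp
        R, t = T_world_grasp[:3, :3], T_world_grasp[:3, 3]
        # Transform the gripper sample points into the world frame.
        pts_world = (R @ gripper_points.T).T + t
        # The grasp is accepted if no gripper point falls in an occupied voxel.
        if voxel_keys(pts_world, voxel_size).isdisjoint(occupied):
            return utility, T_world_grasp
    return None
```

Storing occupancy as a set of voxel indices keeps each collision query to a few hash lookups, which is one simple way to make the check fast; a real planner would also need to check the approach and extraction path, not only the final grasp pose.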
Below is a sped-up video of Pickalo continuously emptying a densely filled eurobox. The system maintains high throughput (~600 picks/hour) and over 96% grasp success rate across 30-minute runs, demonstrating its long-term reliability in realistic industrial conditions.
Pickalo was validated on three classes of real industrial metallic components—featuring highly reflective surfaces and complex geometries—picked from densely filled standard euroboxes. The table below summarizes the grasp success rate over 1,000 pipeline iterations per category.
| Geometry | Success Rate (%) |
|---|---|
| Square | 98.8 |
| Cylindrical | 97.7 |
| Complex | 96.1 |
Using BridgeDepth instead of raw RealSense depth maintains a 96.3% success rate over 30 minutes, whereas the RealSense baseline drops to 88.5%, a gap of 7.8 percentage points. The raw depth contains holes and artifacts caused by metallic reflections, which lead to missed detections and erroneous collision checks over time.
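To make the notion of depth "holes" concrete, the fraction of invalid pixels in a depth frame can be measured with a few lines of NumPy. This sketch is only illustrative and is not part of the reported evaluation: it assumes that missing depth is encoded as 0 (as in RealSense depth frames) or as NaN/inf in floating-point maps, and the `roi` argument is a hypothetical convenience for restricting the measurement to the bin area.

```python
import numpy as np

def hole_fraction(depth, roi=None):
    """Fraction of pixels with no valid depth measurement.

    depth : 2D array (e.g. uint16 millimetres or float metres)
    roi   : optional (row_start, row_end, col_start, col_end) crop
    """
    depth = np.asarray(depth)
    if roi is not None:
        r0, r1, c0, c1 = roi
        depth = depth[r0:r1, c0:c1]
    # Assumption: 0 or a non-finite value marks a missing measurement.
    invalid = (depth == 0) | ~np.isfinite(depth)
    return float(invalid.mean())
```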
The multi-view Pose Buffer increases mean picks per hour (MPPH) by 10–12% and raises the grasp success rate by 11–13 percentage points. By requiring at least two consistent observations before declaring an object graspable, it filters out unreliable pose estimates and substantially reduces grasp failures.
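The "at least two consistent observations" rule can be sketched as a small confirmation buffer. The code below is a simplified illustration, not the Pose Buffer used in Pickalo: the class and method names, the distance and angle thresholds, the running-mean position update, and the representation of object symmetries as a finite list of rotation matrices are all assumptions.

```python
import numpy as np

def rotation_angle(R_a, R_b):
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

class PoseBuffer:
    """Confirms an object only after min_hits mutually consistent observations."""

    def __init__(self, symmetries=None, max_dist=0.01,
                 max_angle=np.deg2rad(15.0), min_hits=2):
        # Rotations (object frame) that leave the object's appearance unchanged.
        self.symmetries = symmetries if symmetries is not None else [np.eye(3)]
        self.max_dist = max_dist      # metres (assumed threshold)
        self.max_angle = max_angle    # radians (assumed threshold)
        self.min_hits = min_hits
        self.tracks = []              # each track: {"R", "t", "hits"}

    def _consistent(self, track, R, t):
        if np.linalg.norm(track["t"] - t) > self.max_dist:
            return False
        # Poses R and R @ S are equivalent for every symmetry S of the object.
        angle = min(rotation_angle(track["R"], R @ S) for S in self.symmetries)
        return angle <= self.max_angle

    def add(self, R, t):
        """Insert one pose estimate (rotation R, translation t in the world frame)."""
        R, t = np.asarray(R, dtype=float), np.asarray(t, dtype=float)
        for track in self.tracks:
            if self._consistent(track, R, t):
                track["hits"] += 1
                # Running mean of the position; the first rotation is kept as is.
                track["t"] = track["t"] + (t - track["t"]) / track["hits"]
                return track
        track = {"R": R, "t": t, "hits": 1}
        self.tracks.append(track)
        return track

    def graspable(self):
        """Tracks that have been observed consistently at least min_hits times."""
        return [tr for tr in self.tracks if tr["hits"] >= self.min_hits]
```

For example, with a part that is symmetric under a 180° rotation about its z-axis, an estimate that is flipped by that rotation still counts as a second consistent observation, while a one-off spurious detection elsewhere in the bin is never promoted to graspable.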
Pickalo demonstrates that industrially relevant bin-picking performance does not require specialized or expensive sensing hardware. By carefully integrating modern 6D pose estimation, synthetic-data–driven instance segmentation, enhanced depth via deep stereo matching, and a temporal pose buffer, the system achieves up to ~600 mean picks per hour with 96–99% grasp success rates over continuous 30-minute runs on densely filled euroboxes with challenging metallic objects.
The modular architecture means each component—perception, pose fusion, grasp selection, and motion planning—can be replaced or upgraded independently as new algorithms become available, without redesigning the entire pipeline.
@article{pickalo2025,
author = {Alessandro Tarsi and Matteo Mastrogiuseppe and Saverio Taliani and Simone Cortinovis and Ugo Pattacini},
title = {Pickalo: Leveraging 6D Pose Estimation for Low-Cost Industrial Bin Picking},
journal = {},
year = {2025},
}