Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal
Abstract
LiDAR has become an essential sensing modality in autonomous driving, robotics, and smart-city applications. However, ghost points (ghosts), which are false reflections caused by multi-path laser returns from glass and reflective surfaces, severely degrade 3D mapping and localization accuracy. Prior ghost removal methods rely on geometric consistency in dense point clouds and therefore fail on the sparse, dynamic data produced by mobile LiDAR. We address this problem by exploiting full-waveform LiDAR (FWL), which captures complete temporal intensity profiles rather than just peak distances and thus provides crucial cues for distinguishing ghosts from genuine reflections in mobile scenarios. As this is a new task, we present Ghost-FWL, the first and largest annotated mobile FWL dataset for ghost detection and removal. Ghost-FWL comprises 24K frames across 10 diverse scenes with 7.5 billion peak-level annotations, making it 100× larger than existing annotated FWL datasets.
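Why a glass wall produces a ghost at a specific place: for a specular reflection off a planar glass surface, the pulse travels to the glass, reflects to a real object, and returns along the same path. Because the sensor converts the total travel time into a range along the original beam direction, the return is placed at the mirror image of the object behind the glass. The minimal NumPy sketch below assumes an ideal planar reflector with a known position and normal; `mirror_across_plane` is a hypothetical helper for illustration, not part of the Ghost-FWL code.

```python
import numpy as np


def mirror_across_plane(point, plane_point, plane_normal):
    """Reflect a 3-D point across a plane given by a point on it and its normal.

    Hypothetical helper: models where an ideal specular multi-path return
    off a planar glass surface would appear to the sensor.
    """
    p = np.asarray(point, dtype=float)
    q = np.asarray(plane_point, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)        # unit normal
    d = np.dot(p - q, n)             # signed distance of p from the plane
    return p - 2.0 * d * n           # mirror image on the other side


# Glass wall at x = 5 m (normal along +x); a real object at (3, 2, 0) sits on
# the same side of the glass as a LiDAR at the origin.
ghost = mirror_across_plane([3.0, 2.0, 0.0], [5.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(ghost)  # [7. 2. 0.]
```

Under these assumptions the ghost sits 2 m behind the glass, exactly as far as the real object sits in front of it. This mirror-symmetry is the kind of geometric cue that point-cloud methods exploit, and it becomes hard to verify when the cloud is sparse.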
Dataset
This section presents Ghost-FWL, the largest annotated FWL dataset to date, designed specifically for ghost detection and removal. Conventional LiDAR datasets provide only point-cloud-level information and discard the temporal multi-path information that is crucial for identifying ghosts caused by glass and reflective surfaces. Ghost-FWL fills this gap by capturing complete temporal intensity histograms and providing peak-level annotations that indicate the physical cause of each reflection (object, glass, ghost, or noise). Spanning 10 diverse scenes with 24,412 annotated frames and 7.5B peak-level labels, Ghost-FWL is 100× larger than prior annotated FWL datasets (Scheuble et al. [5]), enabling learning-based ghost detection and removal at the waveform level.
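To illustrate what a waveform-level sample carries that a point cloud does not, here is a minimal sketch of one ray's intensity histogram with several returns, its detected peaks, and a label per peak. It assumes a fixed time-bin width and a simple local-maximum peak detector. The bin width, threshold, integer label ids, and function names are all hypothetical and do not describe the actual Ghost-FWL file format.

```python
from enum import IntEnum

import numpy as np

C = 299_792_458.0  # speed of light (m/s)


class PeakLabel(IntEnum):
    # Hypothetical integer ids for the four peak-level classes.
    OBJECT = 0
    GLASS = 1
    GHOST = 2
    NOISE = 3


def find_peaks(hist, threshold):
    """Indices of interior local maxima in a 1-D histogram that exceed `threshold`."""
    h = np.asarray(hist, dtype=float)
    mid = h[1:-1]
    is_peak = (mid > h[:-2]) & (mid >= h[2:]) & (mid > threshold)
    return np.nonzero(is_peak)[0] + 1  # shift back to full-histogram indices


def bin_to_range(bin_idx, bin_width_s):
    """Time-of-flight bin index -> one-way range in metres (round trip halved)."""
    return C * np.asarray(bin_idx, dtype=float) * bin_width_s / 2.0


# Synthetic waveform for one ray: three Gaussian returns on an assumed 1 ns grid.
t = np.arange(80)
hist = sum(a * np.exp(-((t - c) ** 2) / (2 * 1.5**2))
           for a, c in [(0.3, 20), (1.0, 45), (0.5, 62)])
peaks = find_peaks(hist, threshold=0.1)
ranges = bin_to_range(peaks, bin_width_s=1e-9)  # ≈ [3.00, 6.75, 9.29] m

# Hypothetical annotations, one label per detected peak.
labels = [PeakLabel.GLASS, PeakLabel.OBJECT, PeakLabel.GHOST]
```

A conventional sensor pipeline would typically keep only one or a few of these ranges per ray. Keeping the full histogram preserves every return along with its shape, which is what peak-level labels are attached to.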
Overview of Ghost-FWL. Left: the dataset includes both indoor and outdoor scenes. Based on the dense 3D maps shown under Scene, we annotated the FWL data with semantic labels: Ghost (red), Object (green), Glass (blue), and Noise. Gray regions are excluded from annotation. Right: data acquisition setup and dataset statistics, including the incident-angle distribution and example LiDAR positions. Data were collected at three times of day: Morning (10 AM–12 PM), Daytime (12–5 PM), and Evening (5–7 PM).
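The incident angle reported in the statistics is the angle between a laser ray and the surface it hits. The minimal sketch below computes it for several rays against a single glass plane and bins the result into a histogram. It assumes the plane normal is known. The function name, example rays, and 15° bin width are illustrative choices, not the binning used for the dataset statistics.

```python
import numpy as np


def incident_angles_deg(ray_dirs, normal):
    """Angle between each ray and the surface normal, in degrees (0 = head-on).

    Hypothetical helper; assumes a single planar surface with known normal.
    """
    d = np.asarray(ray_dirs, dtype=float)
    n = np.asarray(normal, dtype=float)
    cos_t = np.abs(d @ n) / (np.linalg.norm(d, axis=1) * np.linalg.norm(n))
    return np.degrees(np.arccos(np.clip(cos_t, 0.0, 1.0)))


# Four rays hitting a glass wall whose normal points back toward the sensor (-x).
rays = np.array([[1, 0, 0], [2, 1, 0], [1, 2, 0], [1, 0, 3]], dtype=float)
angles = incident_angles_deg(rays, [-1.0, 0.0, 0.0])
counts, edges = np.histogram(angles, bins=np.arange(0, 91, 15))
print(counts)  # [1 1 0 0 2 0]
```

Taking the absolute value of the dot product makes the result independent of which way the normal is oriented, and `np.clip` guards `arccos` against rounding slightly above 1.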
| Dataset | Year | Public | Platform | FWL | LiDAR Dim. | Ray Density | Ghost | FWL Data | Frames / Scenes† | Annotated Peaks |
|---|---|---|---|---|---|---|---|---|---|---|
| UNIST[1] | 2017 | ✓ | Stationary | ✗ | 3D | 278 | ✓ | ✗ | -- | -- |
| Leddar PixSet[2] | 2021 | ✓ | Mobile | ✓ | 3D | 0.267 | ✗ | ✓ | -- | -- |
| Lee et al.[3] | 2023 | ✗ | Stationary | ✗ | 3D | 278 | ✗ | ✗ | -- | -- |
| FRACTAL[4] | 2024 | ✓ | Aerial | ✗ | 2D | -- | ✗ | ✗ | -- | -- |
| Scheuble et al.[5] | 2025 | ✗ | Mobile | ✓ | 3D | 2.56 | ✗ | ✓ | 0.24k / 2 | NA |
| Ghost-FWL (Ours) | 2025 | ✓ | Mobile | ✓ | 3D | 200 | ✓ | ✓ | 24k / 10 | 7.5B |
Column groups: Access & Platform (Public, Platform); Sensor (FWL, LiDAR Dim., Ray Density); Labels (Ghost, FWL Data, Frames / Scenes, Annotated Peaks). FWL: Full-Waveform LiDAR. †Frames / Scenes: number of annotated frames and number of scenes in the real-world FWL data.
[1] Yun et al., "Virtual Point Removal for Large-Scale 3D Point Clouds with Multiple Glass Planes", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[2] Deziel et al., "An Opportunity for 3D Computer Vision to Go Beyond Point Clouds with a Full-Waveform LiDAR Dataset", IEEE International Intelligent Transportation Systems Conference, 2021.
[3] Lee et al., "Learning-Based Reflection-Aware Virtual Point Removal for Large-Scale 3D Point Clouds", IEEE Robotics and Automation Letters, 2023.
[4] Gaydon et al., "FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes", arXiv preprint arXiv:2405.04634, 2024.
[5] Scheuble et al., "Lidar Waveforms are Worth 40x128x33 Words", IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
Method
Given FWL data, our framework predicts and removes ghost-related signals. The model consists of a transformer-based encoder and an MLP head. We further introduce FWL-MAE, a masked autoencoder for representation learning on FWL data that is explicitly trained to reconstruct peak position, amplitude, and width. Ghosts detected by the model are then removed from the FWL data, and the cleaned data are used for downstream tasks such as SLAM and 3D object detection.
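This section does not specify how FWL-MAE tokenizes or masks waveforms. As a rough illustration only, the sketch below follows the standard MAE recipe on a single 1-D histogram: it splits the histogram into fixed-size patches, hides a random fraction, and keeps only the visible patches for the encoder. It also computes per-peak (position, amplitude, width) targets of the kind the text names. The patch size, the 75% mask ratio, the width definition (number of bins above half the peak amplitude), and both function names are assumptions.

```python
import numpy as np


def mask_patches(waveform, patch_size, mask_ratio, rng):
    """MAE-style random masking of a 1-D waveform split into fixed-size patches.

    Returns the visible patches (what an encoder would see) and a boolean
    mask marking which patches were hidden and must be reconstructed.
    """
    w = np.asarray(waveform, dtype=float)
    n_patches = len(w) // patch_size
    patches = w[: n_patches * patch_size].reshape(n_patches, patch_size)
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_masked]] = True
    return patches[~mask], mask


def peak_targets(waveform, peak_idx):
    """(position, amplitude, width) per peak; width = bins above half the amplitude.

    Hypothetical target definition, used here only for illustration.
    """
    w = np.asarray(waveform, dtype=float)
    rows = []
    for p in peak_idx:
        half = w[p] / 2.0
        left, right = p, p
        while left > 0 and w[left - 1] > half:         # walk left to half maximum
            left -= 1
        while right < len(w) - 1 and w[right + 1] > half:  # walk right
            right += 1
        rows.append((p, w[p], right - left + 1))
    return np.array(rows, dtype=float)


t = np.arange(64)
wave = np.exp(-((t - 30) ** 2) / (2 * 2.0**2))  # one return, sigma = 2 bins
visible, mask = mask_patches(wave, patch_size=8, mask_ratio=0.75,
                             rng=np.random.default_rng(0))
print(visible.shape, int(mask.sum()))  # (2, 8) 6
targets = peak_targets(wave, [30])     # position 30, amplitude 1.0, width 5 bins
```

Predicting explicit peak parameters, rather than only raw bin intensities, pushes the representation toward the quantities that distinguish glass, ghost, and object returns. How FWL-MAE actually weights these targets is not described here.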
Results
Classification
Peak classification results and point cloud visualization after applying ghost removal. All results were obtained using the proposed framework. Red, green, and blue indicate Ghost, Object, and Glass, respectively.
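The removal step can be pictured as a filter between peak classification and point-cloud generation. The sketch below assumes that each peak has a range, the unit direction of its ray, and a predicted class id. The integer encoding, `GHOST`, and `peaks_to_points` are hypothetical. Keeping every peak is used here as a rough stand-in for a multi-peak point cloud, which is an assumption about the baseline, not its definition. Dropping ghost-labeled peaks yields the cleaned cloud.

```python
import numpy as np

GHOST = 2  # hypothetical class id (Object=0, Glass=1, Ghost=2, Noise=3)


def peaks_to_points(ray_dirs, ranges, labels, drop=(GHOST,)):
    """Convert per-peak ranges to 3-D points, skipping peaks whose label is in `drop`.

    ray_dirs: (N, 3) unit direction of the ray each peak belongs to
    ranges:   (N,)   range of each peak in metres
    labels:   (N,)   predicted class id of each peak
    """
    d = np.asarray(ray_dirs, dtype=float)
    r = np.asarray(ranges, dtype=float)
    lab = np.asarray(labels)
    keep = np.ones(len(r), dtype=bool)
    for cls in drop:
        keep &= lab != cls            # discard peaks predicted as this class
    points = d * r[:, None]           # scale each ray direction by its range
    return points[keep]


rays = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
ranges = np.array([3.0, 9.3, 4.0])
labels = np.array([1, 2, 0])  # glass, ghost, object

all_points = peaks_to_points(rays, ranges, labels, drop=())  # keep every peak
cleaned = peaks_to_points(rays, ranges, labels)              # ghost peak removed
print(len(all_points), len(cleaned))  # 3 2
```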
SLAM
SLAM trajectories and maps obtained with Multi-Peak processing (left) and with our ghost removal method (right). With Multi-Peak processing, the reconstructed map contains numerous ghost points, which cause trajectory drift. By removing ghost artifacts, the proposed method yields a trajectory that more closely follows the ground-truth path (white).
3D Object Detection
Qualitative evaluation of 3D object detection with Multi-Peak processing (left) and our ghost removal (right). Green bounding boxes indicate persons. With Multi-Peak, a ghost person is detected behind the glass wall, whereas our method suppresses this false detection.
BibTeX
TBA