Communication-Aware Implicit Neural Fields for Outdoor LiDAR Scene Reconstruction
Description
Thesis available as full text in PDF format.
Transmitting and reconstructing 3D geometry over wireless channels is an important problem in autonomous systems and robotics, where geometry plays a central role and bandwidth is a critical resource. Current methods typically represent geometry as a list of 3D points, which makes it difficult to preserve continuous surface structure when the points are downsampled for transmission. The result is a loss of spatial detail and unevenly reconstructed scenes, particularly in large outdoor environments.
This thesis presents a model that combines implicit neural distance fields with a multi-scale latent representation designed for wireless transmission. LightNDF is a recent lightweight implicit neural field model that reconstructs continuous 3D geometry from voxel occupancy grids using multi-scale CNN features, but it was not designed for transmission. This work adopts LightNDF as its backbone and extends it significantly for communication-aware deployment. The main contribution is a structured latent pyramid that compresses the scene into three spatial levels of different resolution, reducing the transmission size from about 218 MB per sample to 0.57 MB. Joint source-channel coding operates directly in the latent space, and the combination of a bottleneck channel encoder, occupancy-aware masking, residual cross-scale coding, and an SNR-adaptive gating model improves robustness to channel noise. Query-based decoding converts the reconstructed multi-scale features into unsigned distance values at randomized query locations. The outputs are then analysed within the SHINE-Mapping pipeline to measure the spatial consistency between frames.
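As a rough illustration of two quantities mentioned above, the following sketch computes the compression ratio implied by the reported payload sizes and passes a flat latent vector through an additive white Gaussian noise (AWGN) channel at a chosen SNR. The AWGN channel model, the unit-power normalization, and the function names `compression_ratio`, `awgn_channel`, and `empirical_snr_db` are assumptions made for this example; the abstract does not specify the channel model or how the latent is normalized.

```python
import math
import random


def compression_ratio(raw_mb, latent_mb):
    """Ratio of the raw payload size to the latent payload size."""
    return raw_mb / latent_mb


def awgn_channel(latent, snr_db, rng):
    """Hypothetical channel: scale the latent to unit average power,
    then add zero-mean Gaussian noise whose variance matches snr_db."""
    power = sum(x * x for x in latent) / len(latent)
    scale = 1.0 / math.sqrt(power) if power > 0 else 1.0
    tx = [x * scale for x in latent]
    # With unit signal power, noise variance = 10^(-SNR_dB / 10).
    noise_std = math.sqrt(10 ** (-snr_db / 10))
    rx = [x + rng.gauss(0.0, noise_std) for x in tx]
    return tx, rx


def empirical_snr_db(tx, rx):
    """Measure the SNR actually realized between sent and received symbols."""
    signal = sum(x * x for x in tx) / len(tx)
    noise = sum((r - x) ** 2 for x, r in zip(tx, rx)) / len(tx)
    return 10 * math.log10(signal / noise)


if __name__ == "__main__":
    # Payload sizes reported in the abstract (MB per sample).
    print(f"compression ratio: {compression_ratio(218.0, 0.57):.1f}x")

    rng = random.Random(0)
    # Stand-in latent values; a real latent would come from the encoder.
    latent = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
    for snr_db in (5, 10, 20):
        tx, rx = awgn_channel(latent, snr_db, rng)
        print(f"target {snr_db} dB, measured {empirical_snr_db(tx, rx):.2f} dB")
```

The reported sizes correspond to a reduction of roughly 382 times. The three SNR values in the loop match the operating points used in the experiments.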
Experiments are performed on the KITTI and NewerCollege datasets at SNRs of 5, 10, and 20 dB. The proposed model achieves more stable reconstruction than SEPT, a state-of-the-art wireless point cloud transmission model. Unlike SEPT, which achieves better compression by using a single global latent vector, the proposed model preserves spatial structure across scales and tolerates channel noise better. The results indicate that, for mapping and robotics, a spatially structured latent representation is superior to compact single-vector compression despite a slightly higher transmission cost.
