Official Implementation of "LangOcc: Open Vocabulary Occupancy Estimation via Volume Rendering" by Boeder et al., presented at the 3DV 2025 conference.
```shell
conda create -n langocc python=3.8 -y
conda activate langocc
```

Please make sure to have CUDA 11.3 installed and in your `PATH`.
```shell
# install pytorch
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
# install openmim, used for installing mmcv
pip install -U openmim
# install mmcv
mim install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
# install mmdet, mmsegmentation and ninja
pip install mmdet==2.25.1 ninja==1.11.1
```

Assuming your terminal is in the langocc directory:

```shell
pip install -v -e .
```

We also need to install MaskCLIP to generate the ground-truth vision-language features. Swap to the MaskCLIP directory and follow these steps:
```shell
cd MaskCLIP
# Install MaskCLIP requirements
pip install -r requirements.txt
pip install --no-cache-dir opencv-python
# Install CLIP
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
# Install MaskCLIP
pip install --no-cache-dir -v -e .
# Change back to root directory of the repo
cd ..
```
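After the installation, a quick sanity check that the pinned packages import correctly can look like this (just a sketch; the expected version pins come from the commands above):

```shell
# Verify that the pinned packages resolved correctly.
# Expected: torch 1.11.0+cu113, mmcv-full 1.6.0, mmdet 2.25.1.
for pkg in torch torchvision mmcv mmdet; do
  python -c "import $pkg; print('$pkg', $pkg.__version__)" 2>/dev/null \
    || echo "$pkg: not importable"
done
```

Any `not importable` line points at a step above that needs to be repeated.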
- Please create a directory `./data` in the root directory of the repository.
- Download nuScenes AND nuScenes-panoptic [https://www.nuscenes.org/download].
- Download the Occ3D-nuScenes dataset from [https://github.com/Tsinghua-MARS-Lab/Occ3D]. The download link can be found in their README.md.
- Download the Open Vocabulary Benchmark from here, extract the contents, and rename the directory to `retrieval_benchmark`.
- Generate the annotation files. This will put the annotation files into the `./data` directory by default. The process can take up to ~1h.

  ```shell
  python tools/create_data_bevdet.py
  python tools/create_data_bevdet.py --version v1.0-test # we also need the test info files for the open vocabulary benchmark
  ```

- Copy or softlink the files into the `./data` directory. The structure of the data directory should be as follows:
```
data
├── nuscenes
│   ├── v1.0-trainval (Step 2, nuScenes+nuScenes-panoptic files)
│   ├── sweeps (Step 2, nuScenes files)
│   ├── samples (Step 2, nuScenes files)
│   └── panoptic (Step 2, nuScenes-panoptic files)
├── gts (Step 3)
├── retrieval_benchmark (Step 4)
├── nuscenes_infos_train.pkl (Step 5)
├── nuscenes_infos_val.pkl (Step 5)
├── bevdetv2-nuscenes_infos_train.pkl (Step 6)
├── bevdetv2-nuscenes_infos_val.pkl (Step 6)
├── bevdetv2-nuscenes_infos_test.pkl (Step 6)
├── rays (See next chapter)
└── embeddings (See next chapter)
```

- Download pretrained backbone weights for ResNet-50. Because the original download link does not work anymore, please find an anonymized link below.
  Download the checkpoint, create a directory `./ckpts`, and put the file in there.
  Download link:
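Once the downloads are in place, a quick check that the expected paths exist might look like this (only a sketch covering a few paths from the tree above; extend the list as needed):

```shell
# Check a few of the expected paths from the data tree.
for p in data/nuscenes/v1.0-trainval data/nuscenes/samples data/gts \
         data/retrieval_benchmark; do
  [ -e "$p" ] && echo "ok: $p" || echo "MISSING: $p"
done
```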
We recommend creating two directories, `embeddings` and `rays`, in a location with enough disk space and softlinking them into `./data`, as the following scripts will write data to these locations (the `./data` directory should then look like the tree above).
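For example (the `STORAGE` path below is only a placeholder; point it at any disk with enough free space):

```shell
# Keep the bulky outputs on a large disk and softlink them into ./data.
# STORAGE is a placeholder path - adapt it to your setup.
STORAGE=/tmp/langocc_storage
mkdir -p "$STORAGE/embeddings" "$STORAGE/rays" ./data
ln -sfn "$STORAGE/embeddings" ./data/embeddings
ln -sfn "$STORAGE/rays" ./data/rays
```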
- First, we pre-generate all training rays, so that we do not have to store the complete feature maps (~7 GB).

  ```shell
  cd MaskCLIP
  python tools/generate_rays.py --exact
  ```

- Next, we need to prepare the MaskCLIP weights from the original CLIP model.
  ```shell
  # In the MaskCLIP directory:
  mkdir -p ./pretrain
  python tools/maskclip_utils/convert_clip_weights.py --model ViT16 --backbone
  python tools/maskclip_utils/convert_clip_weights.py --model ViT16
  ```

- Download the pre-trained MaskCLIP weights from this link and put them to `ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth`.
- Afterwards, we can start generating the features for each ray. This process can take a long time and requires ~535 GB of storage.
  ```shell
  # In the MaskCLIP directory:
  python tools/extract_features.py configs/maskclip_plus/anno_free/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k__nuscenes_trainvaltest.py --save-dir ../data/embeddings/MaskCLIP --checkpoint ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth --complete --sample
  ```

  You can speed up the process by starting multiple instances of the generation script that each handle different token ranges. For example, if you want to parallelize across 4 scripts, you could do:
  ```shell
  # In the MaskCLIP directory:
  python tools/extract_features.py configs/maskclip_plus/anno_free/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k__nuscenes_trainvaltest.py --save-dir ../data/embeddings/MaskCLIP --checkpoint ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth --complete --sample --start 0 --end 8538
  python tools/extract_features.py configs/maskclip_plus/anno_free/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k__nuscenes_trainvaltest.py --save-dir ../data/embeddings/MaskCLIP --checkpoint ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth --complete --sample --start 8538 --end 17074
  python tools/extract_features.py configs/maskclip_plus/anno_free/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k__nuscenes_trainvaltest.py --save-dir ../data/embeddings/MaskCLIP --checkpoint ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth --complete --sample --start 17074 --end 25611
  python tools/extract_features.py configs/maskclip_plus/anno_free/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k__nuscenes_trainvaltest.py --save-dir ../data/embeddings/MaskCLIP --checkpoint ckpts/maskclip_plus_vit16_deeplabv2_r101-d8_512x512_8k_coco-stuff164k.pth --complete --sample --start 25611 --end 34150
  ```

We provide configuration files for the full and reduced versions used in the paper in the `./configs` directory.
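The four `--start`/`--end` ranges in the parallel example above split the 34150 samples roughly evenly. For a different number of workers, such ranges can be computed with a small sketch like this (boundaries may differ by one from the hand-picked ones above):

```shell
# Compute --start/--end ranges for WORKERS parallel extraction workers.
# TOTAL=34150 is taken from the example above; adjust if your count differs.
TOTAL=34150
WORKERS=4
for i in $(seq 0 $((WORKERS - 1))); do
  START=$(( i * TOTAL / WORKERS ))
  END=$(( (i + 1) * TOTAL / WORKERS ))
  echo "--start $START --end $END"
done
```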
If you want to train the LangOcc (Full) model:

```shell
# In the root directory of the repository:
# single gpu
python tools/train.py configs/lang_occ/lang-occ_full.py
# multiple gpus (replace "num_gpu" with the number of available GPUs) - 4 GPUs are recommended
./tools/dist_train.sh configs/lang_occ/lang-occ_full.py num_gpu
```

In order to reproduce the results of the paper, please use 4 GPUs, so that the learning rate remains unchanged. Also, due to some non-deterministic operations, the results may deviate slightly (up or down) from the results presented in the paper.
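If you simply want to use every visible GPU, the `num_gpu` placeholder can be filled in automatically. A sketch (assuming `nvidia-smi` is on the PATH; the fallback of 1 covers machines without it, and the actual launch line is commented out):

```shell
# Count visible GPUs; fall back to 1 if nvidia-smi is unavailable.
NUM_GPU=$(nvidia-smi -L 2>/dev/null | wc -l)
[ "$NUM_GPU" -gt 0 ] || NUM_GPU=1
echo "launching with $NUM_GPU GPU(s)"
# ./tools/dist_train.sh configs/lang_occ/lang-occ_full.py "$NUM_GPU"
```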
If you want to train the LangOcc (Reduced) model, you first need to train the reducer. To do that, we first precompute the CLIP features of the vocabulary:

```shell
# In the root directory
python tools/create_class_embeddings.py --use-templates
```

Then, we can train the reducer model:

```shell
# In the root directory
python tools/train_clip_reducer.py --use-templates
```

Note that the loss of this training might go to NaN, but this is okay.
Afterwards, we can train the LangOcc (Reduced) model:

```shell
# single gpu
python tools/train.py configs/lang_occ/lang-occ_reduced.py
# multiple gpus (replace "num_gpu" with the number of available GPUs) - 4 GPUs are recommended
./tools/dist_train.sh configs/lang_occ/lang-occ_reduced.py num_gpu
```

After training, you can test the model on the open vocabulary benchmark or on Occ3D-nuScenes.
- Evaluate on the Open Vocabulary Retrieval benchmark:

  ```shell
  # In the root directory:
  python tools/eval_open_vocab.py --cfg lang-occ_full --ckpt epoch_18_ema --use-templates
  ```

- Evaluate on the Occ3D-nuScenes benchmark:

  Before evaluation, we precompute the CLIP features of the vocabulary we use to assign class labels (if you have not done this already in the step above).

  ```shell
  python tools/create_class_embeddings.py --use-templates
  ```

  Afterwards, we can start the evaluation:
  ```shell
  # single gpu
  python tools/test.py configs/lang_occ/lang-occ_full.py work_dirs/lang-occ_full/epoch_18_ema.pth --eval mIoU --use-templates
  # multiple gpus
  ./tools/dist_test.sh configs/lang_occ/lang-occ_full.py work_dirs/lang-occ_full/epoch_18_ema.pth num_gpu --eval mIoU --use-templates
  ```

  You can also store the predicted occupancy for visualization by using the `--save-occ-path` flag:

  ```shell
  # multiple gpus
  ./tools/dist_test.sh configs/lang_occ/lang-occ_full.py work_dirs/lang-occ_full/epoch_18_ema.pth num_gpu --eval mIoU --save-occ-path ./occ
  ```

In the following, we list some common errors you might encounter while installing or running this repository, and how to fix them:
- No kernel image found for bev_pool_v2

  If you encounter this error, please uninstall mmdet3d again and make sure you have CUDA 11.3 installed and in your PATH. Also make sure you have `ninja==1.11.1` installed via pip. Then run `pip install -v -e .` again to recompile the kernel images.

- Error: "from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception"

  In this case, please install numpy version 1.23.5 via `pip install numpy==1.23.5`.

- Training stuck, no training logs printed

  Sometimes the `nerfacc` extension will put a lock on the CUDA files. If you do not see any training iteration logs after ~5 minutes, this might be the issue. Please interrupt the run and remove the lock under `~/.cache/torch_extensions/py38_cu113/nerfacc_cuda/lock`. Restart the training afterwards.

- Resume runs

  If the training is interrupted at any point and you want to resume from a checkpoint, you can simply use the `--resume-from` flag as follows:

  ```shell
  ./tools/dist_train.sh configs/lang_occ/lang-occ_full.py num_gpu --resume-from /path/to/checkpoint/latest.pth
  ```

  The checkpoints are usually saved under the `work_dirs` directory. By default, a checkpoint is created every 4 epochs.
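To resume from the newest checkpoint without typing the path by hand, a small sketch (assuming the `work_dirs/lang-occ_full` run directory from the commands above; the launch line is commented out):

```shell
# Pick the most recent checkpoint in a run directory to resume from.
CKPT=$(ls -t work_dirs/lang-occ_full/*.pth 2>/dev/null | head -n 1)
if [ -n "$CKPT" ]; then
  echo "resuming from $CKPT"
  # ./tools/dist_train.sh configs/lang_occ/lang-occ_full.py num_gpu --resume-from "$CKPT"
else
  echo "no checkpoint found"
fi
```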
- Environment
Please note that this code has only been tested on Linux machines. It is not guaranteed to work on Windows.
This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.
For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.
This software is a research prototype, solely developed for and published as part of the publication cited above.
Please feel free to open an issue or contact us personally if you have questions, need help, or need further explanations. Don't hesitate to write an email to: simon.boeder@de.bosch.com
The codebase is forked from BEVDet (https://github.com/HuangJunJie2017/BEVDet).
Copyright (c) 2022 Robert Bosch GmbH
SPDX-License-Identifier: AGPL-3.0