- Introduction
NeRF: High-quality view synthesis
Deep multi-view stereo methods: Quickly reconstruct scene geometry via direct network inference (commonly referred to as MVS)
► Point-NeRF: Combines the advantages of the two approaches above by using a 3D point cloud
- Point-NeRF renders efficiently by aggregating neural point features located near the scene surface
- Point-NeRF can be initialized via inference of a pre-trained network, which directly generates the neural point cloud
NeRF: Reconstruction with pure MLPs takes a long time and depends entirely on per-scene fitting
Point-NeRF: Efficiently initialized via a deep neural network pre-trained across scenes; leverages point clouds
Existing point-based rendering techniques rasterize the points and process them with 2D CNNs operating in image space
Point-NeRF, in contrast, processes these neural points directly in 3D
Uses deep multi-view stereo (MVS) techniques
Trains a CNN for 2D feature extraction
Trains a point-generation module for novel-view image rendering
Point growing and pruning address the holes and outliers that arise when using reconstruction techniques such as COLMAP
- Related Work
Scene Representations
Traditional 3D Representations
- Volume
- An end-to-end 3D neural network for multiview stereopsis (ICCV 2017)
- A theory of shape by space carving (IJCV 2000)
- Volumetric and multi-view cnns for object classification on 3d data (CVPR 2016)
- Photorealistic scene reconstruction by voxel coloring (IJCV 1999)
- 3d shapenets: A deep representation for volumetric shapes (CVPR 2015)
- Point clouds
- Learning representations and generative models for 3D point clouds (ICML 2018)
- Pointnet: Deep learning on point sets for 3d classification and segmentation (CVPR 2017)
- MVPnet: Multi-view point regression networks for 3D object reconstruction from a single image (AAAI 2019)
- Mesh
- Learning category-specific mesh reconstruction from image collections (ECCV 2018)
- Pixel2mesh: Generating 3d mesh models from single RGB images (ECCV 2018)
- Depth maps
- Deepmvs: Learning multi-view stereopsis (CVPR 2018)
- Learning depth from single monocular images using deep convolutional neural fields (TPAMI 2016)
- Implicit functions
- Learning implicit fields for generative shape modeling (CVPR 2019)
- Occupancy networks: Learning 3d reconstruction in function space (CVPR 2019)
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision (CVPR 2020)
- Plenoxels: Radiance fields without neural networks (CVPR 2022 Oral)
Recent Neural Scene Representations
- Deep reflectance volumes: Relightable reconstructions from multi-view photometric images (ECCV 2020)
- Neural volumes: Learning dynamic renderable volumes from images (SIGGRAPH 2019)
- Deepvoxels: Learning persistent 3D feature embeddings (CVPR 2019)
- Stereo magnification: learning view synthesis using multiplane images (ACM TOG 2018)
Global MLPs
- Nerfies: Deformable neural radiance fields (ICCV 2021)
- Nerf++: Analyzing and improving neural radiance fields (arXiv 2020)
Multi-view Reconstruction and Rendering
- Structure-from-motion
- Structure-from-motion revisited (CVPR 2016)
- BA-net: Dense bundle adjustment network (ICLR 2019)
- Sfmnet: Learning of structure and motion from video (arXiv 2017)
- Multi-view stereo techniques
- Deep stereo using adaptive thin volume representation with uncertainty awareness (CVPR 2020)
- Accurate, dense, and robust multiview stereopsis (TPAMI 2009)
- A theory of shape by space carving (IJCV 2000)
- Pixelwise View Selection for Unstructured Multi-View Stereo (ECCV 2016)
- MVSnet: Depth inference for unstructured multi-view stereo (ECCV 2018)
Neural radiance fields
- Dynamic scene capture
- Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
- Relighting
- Neural reflectance fields for appearance acquisition (arXiv 2020)
- Nerd: Neural reflectance decomposition from image collections (ICCV 2021)
- Appearance editing
- Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
- Fast rendering
- Plenoctrees for real-time rendering of neural radiance fields (arXiv 2021)
- Baking neural radiance fields for real-time view synthesis (arXiv 2021)
- Generative models
- pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis (CVPR 2021)
- Giraffe: Representing scenes as compositional generative neural feature fields (CVPR 2021)
- Graf: Generative radiance fields for 3d-aware image synthesis (arXiv 2020)
- Following the original NeRF
- Neural reflectance fields for appearance acquisition (arXiv 2020)
- Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
- Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
- Generalizable radiance fields
- PixelNeRF (CVPR 2021): 2D image feature
- IBRNet (CVPR 2021): 2D image feature
- MVSNeRF (ICCV 2021): Fast voxel-based radiance field reconstruction, but limited to the local cost-volume frustum built from a few nearby input views
- Point-NeRF (this paper): 3D neural points
- Method
Point-NeRF Representation
Point-based radiance field
$P=\left\{\left(p_i, f_i, \gamma_i\right) \mid i=1, \ldots, N\right\}$: Neural point cloud, from which the radiance field is built
$p_i$: Location of point $i$
$f_i$: Neural feature vector - Encodes the local scene content
$\gamma_i \in[0,1]$: Confidence value - Represents how likely that point is being located near an actual scene surface
Given a 3D location $x$, query the $K$ neighboring neural points within a radius $R$
$(\sigma, r)=\text{Point-NeRF}\left(x, d, p_1, f_1, \gamma_1, \ldots, p_K, f_K, \gamma_K\right)$
A PointNet-like neural network with multiple sub-MLPs is used
Each neural point is processed individually first, then the multi-point information is aggregated (see the code sketch after the density regression below)
Per-point processing
MLP $F$ processes each neighboring neural point to predict a new feature vector for the shading location $x$ by
$f_{i, x}=F\left(f_i, x-p_i\right)$
View-dependent radiance regression
$f_x=\sum_i \gamma_i \frac{w_i}{\sum w_i} f_{i, x}$, where $w_i=\frac{1}{\left\|p_i-x\right\|}$
The neural features are aggregated with inverse-distance weights $w_i$, so closer points contribute more
The radiance is then regressed from the aggregated feature and the viewing direction $d$: $r=R\left(f_x, d\right)$
Density regression
$\sigma_i=T\left(f_{i, x}\right)$
$\sigma=\sum_i \sigma_i \gamma_i \frac{w_i}{\sum w_i}, w_i=\frac{1}{\left\|p_i-x\right\|}$
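Below is a minimal PyTorch sketch of this shading step, purely for illustration: the module names (`F_mlp`, `T_mlp`, `R_mlp`), feature dimensions, and the assumption that the $K$ neighbors have already been queried are mine, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PointNeRFShading(nn.Module):
    """Predicts (sigma, r) at a shading location x from K neighboring neural points."""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        # F: maps (f_i, x - p_i) to a per-point feature f_{i,x} at the shading location
        self.F_mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, feat_dim))
        # T: per-point density head, sigma_i = T(f_{i,x})
        self.T_mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        # R: view-dependent radiance head, r = R(f_x, d)
        self.R_mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 3))

    def forward(self, x, d, p, f, gamma):
        # x: (3,) shading location, d: (3,) view direction
        # p: (K, 3) neighbor locations, f: (K, feat_dim) features, gamma: (K,) confidences
        rel = x[None, :] - p                                  # x - p_i
        f_ix = self.F_mlp(torch.cat([f, rel], dim=-1))        # f_{i,x} = F(f_i, x - p_i)
        w = 1.0 / (rel.norm(dim=-1) + 1e-8)                   # inverse-distance weights w_i
        w = gamma * w / w.sum()                               # gamma_i * w_i / sum_j w_j
        sigma = (w * self.T_mlp(f_ix).squeeze(-1)).sum()      # aggregated density sigma
        f_x = (w[:, None] * f_ix).sum(dim=0)                  # aggregated feature f_x
        r = self.R_mlp(torch.cat([f_x, d], dim=0))            # radiance r = R(f_x, d)
        return sigma, r
```

In practice the $K$ nearest neural points within radius $R$ would be gathered first (e.g. with a grid or k-NN query), and the resulting $(\sigma, r)$ samples along each ray are composited with standard volume rendering.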
Point-NeRF Reconstruction
Generating initial point-based radiance fields
- Point location and confidence
- Leverage cost-volume-based deep MVS networks (3D CNNs), which produce high-quality dense geometry
- $I_q$: Input image
- $\Phi_q$: Camera parameters
- $q$: Viewpoint
- MVSNet: (1) Build plane-swept cost volume by warping 2D image features, (2) Regress depth probability volume using deep 3D CNNs
- $\left\{p_i, \gamma_i\right\}=G_{p, \gamma}\left(I_q, \Phi_q, I_{q_1}, \Phi_{q_1}, I_{q_2}, \Phi_{q_2}, \ldots\right)$
- Point features
- 2D CNN $G_f$: VGG net
- $\left\{f_i\right\}=G_f\left(I_q\right)$
- End-to-end reconstruction: the point-generation and feature networks are trained end-to-end with a rendering loss (a minimal unprojection sketch follows)
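A rough sketch of how such a point cloud could be assembled, assuming an MVSNet-style depth map with per-pixel confidence and a VGG-style feature map are already available; the tensor names and camera convention here are assumptions, not the paper's exact pipeline.

```python
import torch

def unproject_to_neural_points(depth, conf, feat_map, K_cam, c2w):
    # depth, conf: (H, W) depth map and per-pixel confidence from the MVS network
    # feat_map: (C, H, W) 2D image features from G_f; K_cam: (3, 3) intrinsics; c2w: (4, 4)
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)   # homogeneous pixels
    cam = (torch.inverse(K_cam) @ pix.T).T * depth.reshape(-1, 1)          # camera-space points
    cam_h = torch.cat([cam, torch.ones(cam.shape[0], 1)], dim=-1)
    p = (c2w @ cam_h.T).T[:, :3]                    # world-space point locations p_i
    gamma = conf.reshape(-1)                        # per-point confidence gamma_i
    f = feat_map.reshape(feat_map.shape[0], -1).T   # per-point features f_i (one per pixel)
    return p, f, gamma
```

Points unprojected from several input views are simply combined into one neural point cloud.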
Optimizing point-based radiance fields
External reconstruction methods such as COLMAP can occasionally be unreliable and produce holes or outliers
Therefore, the following techniques are proposed (a minimal sketch of both follows this list):
- Point pruning
- $\mathcal{L}_{\text {sparse }}=\frac{1}{|\gamma|} \sum_{\gamma_i}\left[\log \left(\gamma_i\right)+\log \left(1-\gamma_i\right)\right]$
- Confidence values for pruning unnecessary outlier points
- Point growing
- $\alpha_j=1-\exp \left(-\sigma_j \Delta_j\right), \quad j_g=\underset{j}{\operatorname{argmax}} \alpha_j$
- Grow new points near high-opacity shading locations to cover missing regions (holes)
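A minimal sketch of these two heuristics, with illustrative thresholds that are my assumptions rather than the paper's exact schedule:

```python
import torch

def sparsity_loss(gamma):
    # L_sparse = mean(log(gamma_i) + log(1 - gamma_i)); minimizing it pushes each
    # confidence toward 0 or 1, so low-confidence points can be pruned cleanly
    return (torch.log(gamma) + torch.log(1.0 - gamma)).mean()

def prune_points(p, f, gamma, thresh=0.1):
    # drop neural points whose confidence stays low (likely outliers)
    keep = gamma > thresh
    return p[keep], f[keep], gamma[keep]

def growing_candidate(x_samples, sigma, delta, p, alpha_thresh=0.1, dist_thresh=0.05):
    # per ray: alpha_j = 1 - exp(-sigma_j * delta_j); take the most opaque shading
    # sample j_g and grow a new point there if it is far from all existing points
    alpha = 1.0 - torch.exp(-sigma * delta)
    j_g = torch.argmax(alpha)
    dist_to_nearest = (p - x_samples[j_g]).norm(dim=-1).min()
    if alpha[j_g] > alpha_thresh and dist_to_nearest > dist_thresh:
        return x_samples[j_g]          # location for a new neural point
    return None
```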
- Experiment
Datasets
- DTU dataset
- NeRF Synthetic dataset
- Tanks and Temples
IBRNet achieves better quality, but it uses a larger network and therefore takes longer to optimize
- Discussion
- Reference
[1] Xu, Qiangeng, et al. "Point-nerf: Point-based neural radiance fields." CVPR 2022 [Paper link]