- Introduction
NeRF: High-quality view synthesis
Deep multi-view stereo methods: Quickly reconstruct scene geometry via direct network inference (commonly referred to as MVS)
► Point-NeRF: Combines the advantages of the two approaches above by using a 3D point cloud
- Point-NeRF renders efficiently by aggregating neural point features located near the scene surface
- Point-NeRF can be initialized via inference of a pre-trained network, which directly generates the neural point cloud
NeRF: Reconstruction with pure MLPs takes a long time and depends entirely on per-scene fitting
Point-NeRF: Efficiently initialized via a deep neural network pre-trained across scenes; leverages point clouds
Existing point-based rendering techniques rasterize the points and process them with 2D CNNs operating in image space
Point-NeRF, in contrast, processes these neural points directly in 3D
Uses deep multi-view stereo (MVS) techniques
Trains a CNN for 2D feature extraction
Trains a point-generation module for novel-view image rendering
Point growing and pruning address the holes and outliers that arise when using reconstruction techniques such as COLMAP
- Related Work
Scene Representations
Traditional 3D Representations
- Volume
- An end-to-end 3D neural network for multiview stereopsis (ICCV 2017)
- A theory of shape by space carving (IJCV 2000)
- Volumetric and multi-view cnns for object classification on 3d data (CVPR 2016)
- Photorealistic scene reconstruction by voxel coloring (IJCV 1999)
- 3d shapenets: A deep representation for volumetric shapes (CVPR 2015)
- Point clouds
- Learning representations and generative models for 3D point clouds (ICML 2018)
- Pointnet: Deep learning on point sets for 3d classification and segmentation (CVPR 2017)
- MVPnet: Multi-view point regression networks for 3D object reconstruction from a single image (AAAI 2019)
- Mesh
- Learning category-specific mesh reconstruction from image collections (ECCV 2018)
- Pixel2mesh: Generating 3d mesh models from single RGB images (ECCV 2018)
- Depth maps
- Deepmvs: Learning multi-view stereopsis (CVPR 2018)
- Learning depth from single monocular images using deep convolutional neural fields (TPAMI 2016)
- Implicit functions
- Learning implicit fields for generative shape modeling (CVPR 2019)
- Occupancy networks: Learning 3d reconstruction in function space (CVPR 2019)
- Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision (CVPR 2020)
- Plenoxels: Radiance fields without neural networks (CVPR 2022 Oral)
Recent Neural Scene Representations
- Deep reflectance volumes: Relightable reconstructions from multi-view photometric images (ECCV 2020)
- Neural volumes: Learning dynamic renderable volumes from images (SIGGRAPH 2019)
- Deepvoxels: Learning persistent 3D feature embeddings (CVPR 2019)
- Stereo magnification: learning view synthesis using multiplane images (ACM TOG 2018)
Global MLPs
- Nerfies: Deformable neural radiance fields (ICCV 2021)
- Nerf++: Analyzing and improving neural radiance fields (arXiv 2020)
Multi-view Reconstruction and Rendering
- Structure-from-motion
- Structure-from-motion revisited (CVPR 2016)
- BA-net: Dense bundle adjustment network (ICLR 2019)
- Sfmnet: Learning of structure and motion from video (arXiv 2017)
- Multi-view stereo techniques
- Deep stereo using adaptive thin volume representation with uncertainty awareness (CVPR 2020)
- Accurate, dense, and robust multiview stereopsis (TPAMI 2009)
- A theory of shape by space carving (IJCV 2000)
- Pixelwise View Selection for Unstructured Multi-View Stereo (ECCV 2016)
- MVSnet: Depth inference for unstructured multi-view stereo (ECCV 2018)
Neural radiance fields
- Dynamic scene capture
- Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
- Relighting
- Neural reflectance fields for appearance acquisition (arXiv 2020)
- Nerd: Neural reflectance decomposition from image collections (ICCV 2021)
- Appearance editing
- Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
- Fast rendering
- Plenoctrees for real-time rendering of neural radiance fields (arXiv 2021)
- Baking neural radiance fields for real-time view synthesis (arXiv 2021)
- Generative models
- pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis (CVPR 2021)
- Giraffe: Representing scenes as compositional generative neural feature fields (CVPR 2021)
- Graf: Generative radiance fields for 3d-aware image synthesis (arXiv 2020)
- Following the original NeRF
- Neural reflectance fields for appearance acquisition (arXiv 2020)
- Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
- Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
- Generalizable radiance fields
- PixelNeRF (CVPR 2021): 2D image feature
- IBRNet (CVPR 2021): 2D image feature
- MVSNeRF (ICCV 2021): Fast voxel-based radiance field reconstruction, but limited to the local cost-volume frustum built from a few nearby input views
- Point-NeRF (this paper): 3D neural points
- Method
Point-NeRF Representation
Point-based radiance field
$P=\left\{\left(p_i, f_i, \gamma_i\right) \mid i=1, \ldots, N\right\}$: Neural point cloud, from which the radiance field is built
$p_i$: Location of point $i$
$f_i$: Neural feature vector - Encodes the local scene content
$\gamma_i \in[0,1]$: Confidence value - Represents how likely that point is being located near an actual scene surface
Given a 3D location $x$, query the $K$ neighboring neural points within a radius $R$
$(\sigma, r)=\text{Point-NeRF}\left(x, d, p_1, f_1, \gamma_1, \ldots, p_K, f_K, \gamma_K\right)$
A PointNet-like neural network with multiple sub-MLPs is used
Each neural point is processed individually first, then the multi-point information is aggregated (see the code sketch after the density regression below)
Per-point processing
MLP $F$ processes each neighboring neural point to predict a new feature vector for the shading location $x$ by
$f_{i, x}=F\left(f_i, x-p_i\right)$
View-dependent radiance regression
$f_x=\sum_i \gamma_i \frac{w_i}{\sum w_i} f_{i, x}$, where $w_i=\frac{1}{\left\|p_i-x\right\|}$
The neural features are aggregated with inverse-distance weights $w_i$, so closer points contribute more
The radiance is then regressed from the aggregated feature and the viewing direction $d$: $r=R\left(f_x, d\right)$
Density regression
$\sigma_i=T\left(f_{i, x}\right)$
$\sigma=\sum_i \sigma_i \gamma_i \frac{w_i}{\sum w_i}, w_i=\frac{1}{\left\|p_i-x\right\|}$
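Below is a minimal PyTorch sketch of this shading step, purely for illustration: the module names (`F_mlp`, `T_mlp`, `R_mlp`), feature dimensions, and the assumption that the $K$ neighbors have already been queried are mine, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PointNeRFShading(nn.Module):
    """Predicts (sigma, r) at a shading location x from K neighboring neural points."""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        # F: maps (f_i, x - p_i) to a per-point feature f_{i,x} at the shading location
        self.F_mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, feat_dim))
        # T: per-point density head, sigma_i = T(f_{i,x})
        self.T_mlp = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        # R: view-dependent radiance head, r = R(f_x, d)
        self.R_mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 3))

    def forward(self, x, d, p, f, gamma):
        # x: (3,) shading location, d: (3,) view direction
        # p: (K, 3) neighbor locations, f: (K, feat_dim) features, gamma: (K,) confidences
        rel = x[None, :] - p                                  # x - p_i
        f_ix = self.F_mlp(torch.cat([f, rel], dim=-1))        # f_{i,x} = F(f_i, x - p_i)
        w = 1.0 / (rel.norm(dim=-1) + 1e-8)                   # inverse-distance weights w_i
        w = gamma * w / w.sum()                               # gamma_i * w_i / sum_j w_j
        sigma = (w * self.T_mlp(f_ix).squeeze(-1)).sum()      # aggregated density sigma
        f_x = (w[:, None] * f_ix).sum(dim=0)                  # aggregated feature f_x
        r = self.R_mlp(torch.cat([f_x, d], dim=0))            # radiance r = R(f_x, d)
        return sigma, r
```

In practice the $K$ nearest neural points within radius $R$ would be gathered first (e.g. with a grid or k-NN query), and the resulting $(\sigma, r)$ samples along each ray are composited with standard volume rendering.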
Point-NeRF Reconstruction
Generating initial point-based radiance fields
- Point location and confidence
- Leverage cost-volume-based deep MVS networks (3D CNNs), which produce high-quality dense geometry
- $I_q$: Input image
- $\Phi_q$: Camera parameters
- $q$: Viewpoint
- MVSNet: (1) Build plane-swept cost volume by warping 2D image features, (2) Regress depth probability volume using deep 3D CNNs
- $\left\{p_i, \gamma_i\right\}=G_{p, \gamma}\left(I_q, \Phi_q, I_{q_1}, \Phi_{q_1}, I_{q_2}, \Phi_{q_2}, \ldots\right)$
- Point features
- 2D CNN $G_f$: VGG net
- $\left\{f_i\right\}=G_f\left(I_q\right)$
- End-to-end reconstruction: the point-generation and feature networks are trained end-to-end with a rendering loss (a minimal unprojection sketch follows)
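A rough sketch of how such a point cloud could be assembled, assuming an MVSNet-style depth map with per-pixel confidence and a VGG-style feature map are already available; the tensor names and camera convention here are assumptions, not the paper's exact pipeline.

```python
import torch

def unproject_to_neural_points(depth, conf, feat_map, K_cam, c2w):
    # depth, conf: (H, W) depth map and per-pixel confidence from the MVS network
    # feat_map: (C, H, W) 2D image features from G_f; K_cam: (3, 3) intrinsics; c2w: (4, 4)
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)   # homogeneous pixels
    cam = (torch.inverse(K_cam) @ pix.T).T * depth.reshape(-1, 1)          # camera-space points
    cam_h = torch.cat([cam, torch.ones(cam.shape[0], 1)], dim=-1)
    p = (c2w @ cam_h.T).T[:, :3]                    # world-space point locations p_i
    gamma = conf.reshape(-1)                        # per-point confidence gamma_i
    f = feat_map.reshape(feat_map.shape[0], -1).T   # per-point features f_i (one per pixel)
    return p, f, gamma
```

Points unprojected from several input views are simply combined into one neural point cloud.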
Optimizing point-based radiance fields
External reconstruction methods such as COLMAP can occasionally be unreliable and produce holes or outliers
Therefore, the following techniques are proposed (a minimal sketch of both follows this list):
- Point pruning
- $\mathcal{L}_{\text {sparse }}=\frac{1}{|\gamma|} \sum_{\gamma_i}\left[\log \left(\gamma_i\right)+\log \left(1-\gamma_i\right)\right]$
- Confidence values for pruning unnecessary outlier points
- Point growing
- $\alpha_j=1-\exp \left(-\sigma_j \Delta_j\right), \quad j_g=\underset{j}{\operatorname{argmax}} \alpha_j$
- Grow new points near high-opacity shading locations to cover missing regions (holes)
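A minimal sketch of these two heuristics, with illustrative thresholds that are my assumptions rather than the paper's exact schedule:

```python
import torch

def sparsity_loss(gamma):
    # L_sparse = mean(log(gamma_i) + log(1 - gamma_i)); minimizing it pushes each
    # confidence toward 0 or 1, so low-confidence points can be pruned cleanly
    return (torch.log(gamma) + torch.log(1.0 - gamma)).mean()

def prune_points(p, f, gamma, thresh=0.1):
    # drop neural points whose confidence stays low (likely outliers)
    keep = gamma > thresh
    return p[keep], f[keep], gamma[keep]

def growing_candidate(x_samples, sigma, delta, p, alpha_thresh=0.1, dist_thresh=0.05):
    # per ray: alpha_j = 1 - exp(-sigma_j * delta_j); take the most opaque shading
    # sample j_g and grow a new point there if it is far from all existing points
    alpha = 1.0 - torch.exp(-sigma * delta)
    j_g = torch.argmax(alpha)
    dist_to_nearest = (p - x_samples[j_g]).norm(dim=-1).min()
    if alpha[j_g] > alpha_thresh and dist_to_nearest > dist_thresh:
        return x_samples[j_g]          # location for a new neural point
    return None
```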
- Experiment
Datasets
- DTU dataset
- NeRF Synthetic dataset
- Tanks and Temples
IBRNet achieves better quality, but it uses a larger network and therefore takes longer to optimize
- Discussion
- Reference
[1] Xu, Qiangeng, et al. "Point-nerf: Point-based neural radiance fields." CVPR 2022 [Paper link]