
[CVPR 2022 Oral] Point-NeRF: Point-based Neural Radiance Fields

이성훈 Ethan 2023. 8. 8. 08:48

- Introduction

 

 

NeRF: High-quality view synthesis

 

Deep multi-view stereo methods: Quickly reconstruct scene geometry via direct network inference; commonly referred to simply as MVS

 

► Point-NeRF: Combines the advantages of the two approaches above by using a 3D point cloud

 

  1. Point-NeRF renders efficiently by aggregating neural point features located near the scene surface
  2. Point-NeRF can be initialized via inference of a network pre-trained across scenes → generates the point cloud

 

Faster and higher quality than the original NeRF / Compared to COLMAP-based reconstruction, it renders well without empty regions (holes)

 

NeRF: Reconstruction with MLPs takes a long time and depends purely on per-scene fitting

 

Point-NeRF: Effectively initialized via a DNN pre-trained across scenes, and leverages point clouds

 

 

Existing point-based rendering techniques rely on rasterization followed by 2D CNNs operating in image space

 

Point-NeRF, in contrast, processes these neural points directly in 3D

 

Uses deep multi-view stereo (MVS) techniques

 

Train CNN for 2D feature extraction

 

Train point generation module for novel view image rendering

 

 

Point growing and pruning resolve the holes and outliers that appear when using reconstruction techniques such as COLMAP

 


- Related Work

 

 

Scene Representations


Traditional 3D Representations

 

  • Volume
    • An end-to-end 3D neural network for multiview stereopsis (ICCV 2017)
    • A theory of shape by space carving (IJCV 2000)
    • Volumetric and multi-view cnns for object classification on 3d data (CVPR 2016)
    • Photorealistic scene reconstruction by voxel coloring (IJCV 1999)
    • 3d shapenets: A deep representation for volumetric shapes (CVPR 2015)
  • Point clouds
    • Learning representations and generative models for 3D point clouds (ICML 2018)
    • Pointnet: Deep learning on point sets for 3d classification and segmentation (CVPR 2017)
    • MVPnet: Multi-view point regression networks for 3D object reconstruction from a single image (AAAI 2019)
  • Mesh
    • Learning category-specific mesh reconstruction from image collections (ECCV 2018)
    • Pixel2mesh: Generating 3d mesh models from single RGB images (ECCV 2018)
  • Depth maps
    • Deepmvs: Learning multi-view stereopsis (CVPR 2018)
    • Learning depth from single monocular images using deep convolutional neural fields (TPAMI 2016)
  • Implicit functions
    • Learning implicit fields for generative shape modeling (CVPR 2019)
    • Occupancy networks: Learning 3d reconstruction in function space (CVPR 2019)
    • Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision (CVPR 2020)
    • Plenoxels: Radiance fields without neural networks (CVPR 2022 Oral)

 

 

Recent Neural Scene Representations

 

  • Deep reflectance volumes: Relightable reconstructions from multi-view photometric images (ECCV 2020)
  • Neural volumes: Learning dynamic renderable volumes from images (SIGGRAPH 2019)
  • Deepvoxels: Learning persistent 3D feature embeddings (CVPR 2019)
  • Stereo magnification: learning view synthesis using multiplane images (ACM TOG 2018)

 

Global MLPs

 

  • Nerfies: Deformable neural radiance fields (ICCV 2021)
  • Nerf++: Analyzing and improving neural radiance fields (arXiv 2020)

 

 

Multi-view Reconstruction and Rendering

 

  • Structure-from-motion
    • Structure-from-motion revisited (CVPR 2016)
    • BA-net: Dense bundle adjustment network (ICLR 2019)
    • Sfmnet: Learning of structure and motion from video (arXiv 2017)
  • Multi-view stereo techniques
    • Deep stereo using adaptive thin volume representation with uncertainty awareness (CVPR 2020)
    • Accurate, dense, and robust multiview stereopsis (TPAMI 2009)
    • A theory of shape by space carving (IJCV 2000)
    • Pixelwise View Selection for Unstructured Multi-View Stereo (ECCV 2016)
    • MVSnet: Depth inference for unstructured multi-view stereo (ECCV 2018)

 

 

Neural radiance fields

 

  • Dynamic scene capture
    • Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
    • Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
  • Relighting
    • Neural reflectance fields for appearance acquisition (arXiv 2020)
    • Nerd: Neural reflectance decomposition from image collections (ICCV 2021)
  • Appearance editing
    • Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
  • Fast rendering
    • Plenoctrees for real-time rendering of neural radiance fields (arXiv 2021)
    • Baking neural radiance fields for real-time view synthesis (arXiv 2021)
  • Generative models
    • pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis (CVPR 2021)
    • Giraffe: Representing scenes as compositional generative neural feature fields (CVPR 2021)
    • Graf: Generative radiance fields for 3d-aware image synthesis (arXiv 2020)
  • Following the original NeRF
    • Neural reflectance fields for appearance acquisition (arXiv 2020)
    • Neural scene flow fields for space-time view synthesis of dynamic scenes (CVPR 2021)
    • Hypernerf: A higher-dimensional representation for topologically varying neural radiance field (arXiv 2021)
    • Neutex: Neural texture mapping for volumetric neural rendering (CVPR 2021)
  • Generalizable radiance fields
    • PixelNeRF (CVPR 2021): 2D image feature
    • IBRNet (CVPR 2021): 2D image feature
    • MVSNeRF (ICCV 2021): Fast voxel-based radiance field reconstruction, but its voxel-based representation is less adaptive to actual scene surfaces than neural points
    • PointNeRF (this paper): 3D neural points

 


- Method


Point-NeRF Representation

 

Point-based radiance field

 

$P=\left\{\left(p_i, f_i, \gamma_i\right) \mid i=1, \ldots, N\right\}$: Neural point cloud, from which the radiance field is built

 

$p_i$: Location of point $i$

 

$f_i$: Neural feature vector - Encodes the local scene content

 

$\gamma_i \in[0,1]$: Confidence value - Represents how likely the point is to be located near an actual scene surface

 

 

Given a 3D location $x$, query its $K$ neighboring neural points within radius $R$

 

$(\sigma, r)=\text{Point-NeRF}\left(x, d, p_1, f_1, \gamma_1, \ldots, p_K, f_K, \gamma_K\right)$

 

A PointNet-like neural network with multiple sub-MLPs is used

 

Each neural point is first processed individually, then the multi-point information is aggregated (see the sketch below)
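
A minimal, brute-force sketch of the query step (not the authors' implementation; `point_xyz`, `K=8`, and the radius `R` are illustrative assumptions):

```python
import torch

def query_neighbors(x, point_xyz, K=8, R=0.1):
    """Return indices/distances of up to K neural points within radius R of x."""
    dists = torch.norm(point_xyz - x, dim=-1)      # ||p_i - x|| for all N points
    inf = torch.full_like(dists, float("inf"))
    dists = torch.where(dists <= R, dists, inf)    # discard points outside radius R
    knn_dists, knn_idx = torch.topk(dists, K, largest=False)
    valid = torch.isfinite(knn_dists)              # fewer than K points may lie in R
    return knn_idx[valid], knn_dists[valid]
```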

 

 

Per-point processing

 

MLP $F$ processes each neighboring neural point to predict a new feature vector for the shading location $x$ by

 

$f_{i, x}=F\left(f_i, x-p_i\right)$
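
A hedged sketch of what $F$ might look like; the layer sizes and the plain concatenation of $f_i$ with $x-p_i$ are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class PerPointMLP(nn.Module):
    """Sketch of F: maps (f_i, x - p_i) to the per-point feature f_{i,x}."""
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),  # input: [f_i, x - p_i]
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, f_i, rel_pos):                    # f_i: (K, C), rel_pos: (K, 3)
        return self.net(torch.cat([f_i, rel_pos], -1))  # f_{i,x}: (K, C)
```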

 

 

View-dependent radiance regression

 

$f_x=\sum_i \gamma_i \frac{w_i}{\sum w_i} f_{i, x}$, where $w_i=\frac{1}{\left\|p_i-x\right\|}$

 

Neural features are aggregated with inverse-distance weights $w_i$, so points closer to $x$ contribute more. The view-dependent radiance is then regressed from the aggregated feature $f_x$ and the view direction $d$ by another MLP $R$: $r=R\left(f_x, d\right)$
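
A minimal sketch of this aggregation plus the radiance head; `radiance_mlp` is a hypothetical stand-in for $R$:

```python
import torch

def aggregate_and_shade(f_ix, gamma, dists, d, radiance_mlp, eps=1e-8):
    """f_ix: (K, C), gamma: (K,), dists: (K,) = ||p_i - x||, d: (3,) view dir."""
    w = 1.0 / (dists + eps)                            # inverse-distance weights w_i
    w = w / w.sum()                                    # normalize: w_i / sum(w_i)
    f_x = (gamma * w).unsqueeze(-1).mul(f_ix).sum(0)   # confidence-weighted blend f_x
    return radiance_mlp(torch.cat([f_x, d]))           # r = R(f_x, d)
```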

 

 

Density regression

 

$\sigma_i=T\left(f_{i, x}\right)$

 

$\sigma=\sum_i \sigma_i \gamma_i \frac{w_i}{\sum w_i}, w_i=\frac{1}{\left\|p_i-x\right\|}$
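
These two equations map almost directly to code; `density_head` below is a hypothetical stand-in for $T$ with assumed layer sizes:

```python
import torch
import torch.nn as nn

# Small density head T; its size here is an assumption, not the paper's.
density_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

def regress_density(f_ix, gamma, dists, eps=1e-8):
    sigma_i = density_head(f_ix).squeeze(-1)       # per-point sigma_i = T(f_{i,x})
    w = 1.0 / (dists + eps)                        # same inverse-distance weights
    return (gamma * w / w.sum() * sigma_i).sum()   # blended density sigma at x
```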

 

 

Point-NeRF Reconstruction

 

In the paper's pipeline figure, the orange arrows denote radiance field initialization and the blue arrows denote per-scene optimization

 

Generating initial point-based radiance fields

 

  • Point location and confidence
    • Leverage deep MVS, i.e., 3D CNNs on cost volumes, which produce high-quality dense geometry
    • $I_q$: Input image
    • $\Phi_q$: Camera parameters
    • $q$: Viewpoint
    • MVSNet: (1) Build a plane-swept cost volume by warping 2D image features, (2) Regress a depth probability volume using deep 3D CNNs
    • $\left\{p_i, \gamma_i\right\}=G_{p, \gamma}\left(I_q, \Phi_q, I_{q_1}, \Phi_{q_1}, I_{q_2}, \Phi_{q_2}, \ldots\right)$
  • Point features
    • 2D CNN $G_f$: VGG network
    • $\left\{f_i\right\}=G_f\left(I_q\right)$
  • End-to-end reconstruction: the point generation and feature networks are trained jointly with differentiable ray marching and a rendering loss (see the sketch after this list)
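
A simplified sketch of the idea behind $G_{p,\gamma}$ and $G_f$: unproject each pixel of an MVS depth map to a world-space point $p_i$, take its depth probability as $\gamma_i$, and take the 2D CNN feature at that pixel as $f_i$. All names and shapes here are assumptions:

```python
import torch

def depth_to_neural_points(depth, prob, feat2d, K_inv, cam2world):
    """depth, prob: (H, W); feat2d: (C, H, W); K_inv: (3, 3); cam2world: (4, 4)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).float()  # homogeneous pixels
    cam_pts = (pix @ K_inv.T) * depth.unsqueeze(-1)            # back-project to camera space
    cam_h = torch.cat([cam_pts, torch.ones(H, W, 1)], -1)
    world = (cam_h @ cam2world.T)[..., :3].reshape(-1, 3)      # p_i: (H*W, 3)
    gamma = prob.reshape(-1)                                   # gamma_i from depth probability
    f = feat2d.permute(1, 2, 0).reshape(-1, feat2d.shape[0])   # f_i from 2D CNN features
    return world, f, gamma
```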

 

 

Optimizing point-based radiance fields

 

External reconstruction methods such as COLMAP can be unstable and occasionally produce holes or outliers

 

Therefore, the following are proposed (a sketch follows the list):

 

  • Point pruning
    • $\mathcal{L}_{\text {sparse }}=\frac{1}{|\gamma|} \sum_{\gamma_i}\left[\log \left(\gamma_i\right)+\log \left(1-\gamma_i\right)\right]$
    • Confidence values for pruning unnecessary outlier points
  • Point growing
    • $\alpha_j=1-\exp \left(-\sigma_j \Delta_j\right), \quad j_g=\underset{j}{\operatorname{argmax}} \alpha_j$
    • Cover missing regions
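
A minimal sketch of the sparsity loss and confidence-based pruning; the threshold and tensor names are illustrative assumptions (the paper prunes low-confidence points periodically during optimization):

```python
import torch

def sparse_loss(gamma, eps=1e-8):
    # L_sparse: minimizing this pushes each gamma_i toward 0 or 1
    return (torch.log(gamma + eps) + torch.log(1.0 - gamma + eps)).mean()

def prune_points(points, feats, gamma, threshold=0.1):
    keep = gamma >= threshold   # drop low-confidence (likely outlier) points
    return points[keep], feats[keep], gamma[keep]
```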

 


- Experiment

 

 

Datasets

 

  • DTU dataset
  • NeRF Synthetic dataset
  • Tanks and Temples

 

DTU dataset

IBRNet performs better, but it uses a larger network and takes longer to optimize

 

 

NeRF-Synthetic dataset



- Discussion

 

 


- Reference

 

[1] Xu, Qiangeng, et al. "Point-NeRF: Point-based Neural Radiance Fields." CVPR 2022. [Paper link]