

[CVPR 2021] pixelNeRF: Neural Radiance Fields from One or Few Images

이성훈 Ethan 2023. 4. 15. 21:36

- Introduction

 

 

Problem definition: the original NeRF requires too many input images and too long a per-scene optimization time, making it impractical.

 

► Unlike NeRF, which does not use image features, pixelNeRF takes spatial image features aligned to each pixel as input.

 

► Unlike NeRF, pixelNeRF works well with only a few input images.

 

Framework

  • Single Image
    1. Input image → Fully convolutional image feature grid
    2. Sample the corresponding image feature via projection and bilinear interpolation
    3. The query specification (point and viewing direction) is sent along with the image features to the NeRF network
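Step 2 hinges on sampling the feature grid at the projected pixel location. A minimal NumPy sketch of the bilinear lookup (all names here are hypothetical; a real implementation would use a framework's built-in sampler such as PyTorch's `grid_sample`):

```python
import numpy as np

def bilinear_sample(feature_grid, u, v):
    """Sample a (H, W, C) feature grid at continuous pixel coords (u, v)
    by bilinear interpolation over the four nearest feature vectors."""
    H, W, _ = feature_grid.shape
    # Clamp to the valid range so all four neighbors exist.
    u = float(np.clip(u, 0, W - 1))
    v = float(np.clip(v, 0, H - 1))
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * feature_grid[v0, u0] + du * feature_grid[v0, u1]
    bot = (1 - du) * feature_grid[v1, u0] + du * feature_grid[v1, u1]
    return (1 - dv) * top + dv * bot
```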

 

  • Multiple Image
    1. Input image → Latent representation in each camera's coordinate frame
    2. Per-view features are pooled in an intermediate layer

 

Unlike pixelNeRF, the original NeRF produces poor results from only a few images.

 

View space: Viewer-centered coordinate system

Canonical space: Object-centered coordinate system


- Method

 

 

Total architecture

 

  • Fully-convolutional image encoder E: encodes the input image into a pixel-aligned feature grid
    • Pretrained ResNet-34

 

  • NeRF network f: Outputs color and density

 

 

Single-Image pixelNeRF

 

 

Input image I

 

Feature volume $W = E(I)$

 

Query point x on a camera ray: retrieve the corresponding image feature by projecting x onto the image plane at $\pi(x)$

 

Feature vector W(π(x)): Bilinearly interpolating between the pixelwise features

 

The image features, position, and viewing direction are passed into the NeRF network

 

$f(\gamma(x), d; W(\pi(x))) = (\sigma, c)$

 

γ(): Positional encoding

 

d: Viewing direction

 

x: Query point
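The positional encoding $\gamma$ applied to the query point can be sketched as follows (a minimal NumPy version; the number of frequency bands here is a hypothetical default, not the paper's setting):

```python
import numpy as np

def positional_encoding(x, num_freqs=6):
    """NeRF-style positional encoding gamma(x): map each input coordinate
    to sin/cos components at exponentially increasing frequencies."""
    out = []
    for k in range(num_freqs):
        out.append(np.sin(2.0 ** k * np.pi * x))
        out.append(np.cos(2.0 ** k * np.pi * x))
    return np.concatenate(out, axis=-1)
```

For a 3D point this expands 3 input coordinates into `2 * num_freqs * 3` features, letting the MLP represent high-frequency detail.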

 

 

Multiple Views

 

Unlike prior work that uses only a single input view at test time, pixelNeRF can take an arbitrary number of input views at test time to provide additional information.

 

$i$-th input image $I^{(i)}$

 

Camera transform from the world space to its view space: $P^{(i)} = [R^{(i)} \; t^{(i)}]$

 

$x^{(i)} = P^{(i)} x, \quad d^{(i)} = R^{(i)} d$
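The world-to-view transform can be written out directly (a trivial NumPy sketch; `R` is the 3×3 rotation and `t` the translation of view i, and the function name is a hypothetical stand-in):

```python
import numpy as np

def to_view_space(x_world, d_world, R, t):
    """Map a world-space query point and view direction into the i-th
    camera's view space: x_i = R x + t (i.e. P x in homogeneous
    coordinates), d_i = R d (directions are rotated, not translated)."""
    return R @ x_world + t, R @ d_world
```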

 

 

$V^{(i)} = f_1\left(\gamma(x^{(i)}), d^{(i)}; W^{(i)}(\pi(x^{(i)}))\right)$

 

The mean of $V^{(1)}$ through $V^{(n)}$ is computed and passed to $f_2$.

 

$(\sigma, c) = f_2\left(\psi(V^{(1)}, \ldots, V^{(n)})\right)$

 

In the single-view case this simply reduces to $f = f_2 \circ f_1$.
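The multi-view pipeline ($f_1$ per view, average pooling $\psi$, then $f_2$) can be sketched with placeholder networks (the callables below are hypothetical stand-ins for the learned MLPs):

```python
import numpy as np

def multi_view_query(f1, f2, encoded_xs, dirs, feats):
    """Run f1 independently on each view's inputs, average-pool the
    intermediate vectors (the psi operator), then decode with f2."""
    V = [f1(x, d, w) for x, d, w in zip(encoded_xs, dirs, feats)]
    pooled = np.mean(np.stack(V, axis=0), axis=0)  # psi = average pooling
    return f2(pooled)  # -> (sigma, c) in the real model
```

With a single view the loop has one element and the mean is a no-op, recovering the composition f2(f1(...)).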


- Experiment

 

 

Datasets

 

  • ShapeNet 
    • Category-specific
    • Category-agnostic

 

  • ShapeNet scenes
    • Unseen categories
    • Multiple objects
    • Domain transfer to real car photos

 

  • DTU MVS dataset
    • Real scenes

 

Baselines

  • SRN
  • DVR
  • SoftRas for category-agnostic setting

 

Metrics

  • PSNR
  • SSIM
  • LPIPS

 

Category-specific

 

Separate model for cars and chairs

 

 

 

Providing two input views at test time yields noticeably better reconstructions.

 

 

 

Category-agnostic

 

Single model for 13 largest ShapeNet categories

 

 

 

Unseen categories

 

 

Multiple objects

 

 


- Discussion

 


- Reference

 

[1] Yu, Alex, et al. "pixelNeRF: Neural Radiance Fields from One or Few Images." CVPR 2021.