
[CVPR 2023] GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection

이성훈 Ethan 2023. 7. 4. 02:57

- Introduction

 

OOD detection scenarios

  1. Covariate shift: Change in the input distribution
  2. Semantic shift: Change in the label distribution

 

Existing OOD detection works

  • Predictive distribution
  • Incorporate feature statistics for ID data
  • Requires a portion of training data
  • Internal feature activation

 

GOAL: Explore and push the limits of OOD detection when the output of a softmax layer is the only available source of information - Post-hoc method

 

 

Main Contribution

  1. Only uses predictive distribution
  2. Performs very well (interesting that, if you are confident in the performance, strong results can themselves be claimed as a contribution..)

- Related Work

 

Score Design

  • Maximum Softmax Probability (MSP)
  • Minimum Mahalanobis distance between feature and class-wise centroids
  • Energy Score
  • Hard Maximum of logits
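
For reference, here is a minimal NumPy sketch of three of the logit-based scores listed above (MSP, energy, max logit), written in the usual "higher score = more ID-like" convention. This is my own illustration rather than code from any of these papers; the Mahalanobis score is omitted since it also needs feature statistics from the training set.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    # Maximum Softmax Probability (MSP): confidence of the predicted class
    return softmax(logits).max(axis=-1)

def energy_score(logits, T=1.0):
    # Energy-based score: T * logsumexp(logits / T), larger for peaked logits
    z = logits / T
    m = z.max(axis=-1)
    return T * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))

def max_logit_score(logits):
    # Hard maximum of the raw logits (MaxLogit)
    return logits.max(axis=-1)
```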

 

Previous Methods

  • GradNorm
  • pNML
  • ViM
  • ...

 


- Method

 

Generalized Entropy Score

 

Aim: Rely solely on the logits and predictive distribution

  1. Agnostic to any information about the classifier's training, the training set, or explicit OOD samples
  2. Neural Collapse Hypothesis: features from the penultimate layer carry very limited additional information compared to the logits

 

Generalized Entropy $\mathit{G}$: Differentiable and concave function on the categorical distributions $\Delta^\mathit{C}$

 

Bregman Divergence: $D_G(\mathbf{p} \| \mathbf{q}):=G(\mathbf{q})-G(\mathbf{p})+(\mathbf{p}-\mathbf{q})^{\top} \nabla G(\mathbf{q})$
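
As a quick sanity check (my own derivation, not spelled out in the post): plugging the Shannon entropy $G(\mathbf{p})=-\sum_j p_j \log p_j$, with $\nabla G(\mathbf{q})=-\log \mathbf{q}-\mathbf{1}$, into this definition recovers the familiar KL divergence,

$\begin{aligned} D_G(\mathbf{p} \| \mathbf{q}) &= -\sum_j q_j \log q_j + \sum_j p_j \log p_j - \sum_j (p_j - q_j)(\log q_j + 1) \\ &= \sum_j p_j \log \frac{p_j}{q_j} = \mathrm{KL}(\mathbf{p} \| \mathbf{q}), \end{aligned}$

where the constant terms cancel because $\sum_j p_j = \sum_j q_j = 1$.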

 

Assumption: $G$ is invariant under permutations of the elements of $\mathbf{p}$

 

The Bregman divergence between $\mathbf{p}$ and the uniform categorical distribution $\mathbf{u}=\mathbf{1}/C$ reduces, up to an additive constant, to the negated generalized entropy

 

$\begin{aligned} D_G(\mathbf{p} \| \mathbf{u}) & =G(\mathbf{u})-G(\mathbf{p})+(\mathbf{p}-\mathbf{u})^{\top} \nabla G(\mathbf{u}) \\ & \doteq-G(\mathbf{p})+\underbrace{(\mathbf{p}-\mathbf{u})^{\top} \nabla G(\mathbf{u})}_{=0} .\end{aligned}$

 

Because $G$ is permutation-invariant, $\nabla G(\mathbf{u})=\nabla G(\mathbf{1} / C)=\kappa \mathbf{1}$ for some constant $\kappa$, so the last term vanishes: $(\mathbf{p}-\mathbf{u})^{\top} \kappa \mathbf{1}=\kappa\left(\sum_j p_j-\sum_j u_j\right)=0$.

In conclusion, using the negated entropy as a score can be interpreted as a statistical distance between the predictive distribution $\mathbf{p}$ and the uniform distribution $\mathbf{u}$.

 

$G_\gamma(\mathbf{p})=\sum_j p_j^\gamma\left(1-p_j\right)^\gamma,\, \gamma \in (0,1)$
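
A minimal NumPy sketch of the resulting GEN score, i.e. the negated generalized entropy of the softmax output. The defaults below ($\gamma = 0.1$ and truncation to the top-$M$ probabilities with $M = 100$) are my reading of the paper's ImageNet setup, so treat them as illustrative rather than the official configuration.

```python
import numpy as np

def gen_score(logits, gamma=0.1, M=100):
    """Negated generalized entropy G_gamma of the softmax distribution.

    gamma=0.1 and the top-M truncation (M=100) reflect my understanding of
    the paper's ImageNet-1K setup; treat them as assumptions. A higher score
    means a more peaked prediction, i.e. more in-distribution-like.
    """
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    # keep only the M largest probabilities per sample
    top_m = np.sort(probs, axis=-1)[..., -M:]

    # G_gamma(p) = sum_j p_j^gamma * (1 - p_j)^gamma
    g = np.sum(top_m ** gamma * (1.0 - top_m) ** gamma, axis=-1)
    return -g  # near 0 for a one-hot p, strongly negative for a flat p
```

A near one-hot prediction gives a score close to 0 (its maximum), while flatter predictions push it strongly negative, which is what thresholding the score exploits.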

 

We can see that $G_\gamma$ is concave for $\gamma$ between 0 and 1.

 

Compared to the Shannon entropy, the generalized entropy becomes more sensitive to minor deviations of the predictive distribution as $\gamma$ gets smaller: for example, a tail probability of $p_j = 10^{-4}$ contributes only about $9\times10^{-4}$ to the Shannon entropy ($-p_j \log p_j$), but about $0.4$ to $G_{0.1}$, since $(10^{-4})^{0.1} \approx 0.4$.

 

So the motivation of GEN is simple and straightforward (or so the authors say... it doesn't fully click for me, since I'm not very familiar with information theory..)

 

In any case, the key point is using the generalized entropy to amplify minor deviations in the predictive distribution → presumably it picks up on the finer details a bit better..


- Experiment

 

Datasets

  • ID
    • ImageNet-1K
  • OOD
    • OpenImage-O
    • Texture
    • iNaturalist
    • ImageNet-O

 

ImageNet-1K penultimate layer output dimension and top-1 accuracy

 

Baseline

  • Big Transfer (BiT)
  • ViT
  • RepVGG
  • ResNet-50-D
  • DeiT
  • Swin

 

Per-Dataset performance of OOD detection methods
Average performance of OOD detection methods


- Discussion

 

Honestly, I would have liked it to perform best with ViT so I could use it in my own research... but oh well...

 

Still, since such a simple method achieves SOTA across various datasets and baselines, I think it is quite a meaningful piece of work.


- Reference

[1] Liu, Xixi, et al. "GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection." CVPR 2023.