- Introduction
OOD detection scenarios
- Covariate shift: Change in the input distribution
- Semantic shift: Change in the label distribution
Existing OOD detection works
- Predictive distribution
- Incorporate feature statistics for ID data
- Requires a portion of training data
- Internal feature activation
GOAL: Explore and push the limits of OOD detection when the output of a softmax layer is the only available source of information - Post-hoc method
Main Contribution
- Only uses predictive distribution
- Performs very well (if you are confident in the performance, I suppose that can be claimed as a contribution too..)
- Related Work
Score Design
- Maximum predicted Softmax Probability (MSP)
- Minimum Mahalanobis distance between feature and class-wise centroids
- Energy Score
- Hard maximum of the logits (MaxLogit)
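To make the score designs above concrete, here is a minimal sketch (my own illustration, not code from the paper) of the three scores that need nothing but the logits; the Mahalanobis score is left out because it additionally needs penultimate-layer features and class-wise statistics from the training set. `logits` is assumed to be an `(N, C)` NumPy array of classifier outputs.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension; logits: (N, C).
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits):
    # Maximum Softmax Probability (MSP): higher score => more likely ID.
    return softmax(logits).max(axis=1)

def maxlogit_score(logits):
    # Hard maximum of the logits (MaxLogit).
    return logits.max(axis=1)

def energy_score(logits, T=1.0):
    # Negative free energy, T * logsumexp(logits / T): higher score => more likely ID.
    z = logits / T
    m = z.max(axis=1, keepdims=True)
    return T * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))
```

All three are post-hoc in the same sense as GEN: they can be applied to any pretrained classifier without retraining.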
Previous Methods
- GradNorm
- pNML
- ViM
- ...
- Method
Generalized Entropy Score
Aim: Rely solely on the logits and predictive distribution
- Agnostic to any information on the classifier training, training set, explicit OOD samples
- Neural Collapse Hypothesis: Features from the penultimate layer carry very limited additional information compared to the logits
Generalized Entropy $\mathit{G}$: Differentiable and concave function on the categorical distributions $\Delta^\mathit{C}$
Bregman Divergence: $D_G(\mathbf{p} \| \mathbf{q}):=G(\mathbf{q})-G(\mathbf{p})+(\mathbf{p}-\mathbf{q})^{\top} \nabla G(\mathbf{q})$
Assumption: $G$ is invariant under permutations of the elements in $p$
Bregman Divergence between $\mathbf{p}$ and the uniform categorical distribution $\mathbf{u}=\mathbf{1}/C$ reduces, up to the additive constant $G(\mathbf{u})$, to the negated generalized entropy
$\begin{aligned} D_G(\mathbf{p} \| \mathbf{u}) & =G(\mathbf{u})-G(\mathbf{p})+(\mathbf{p}-\mathbf{u})^{\top} \nabla G(\mathbf{u}) \\ & \doteq-G(\mathbf{p})+\underbrace{(\mathbf{p}-\mathbf{u})^{\top} \nabla G(\mathbf{u})}_{=0} .\end{aligned}$
Since $\nabla G(\mathbf{u})=\nabla G(\mathbf{1} / C)=\kappa \mathbf{1}$, the last term can be seen to vanish
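Spelling out why that last term is zero (my own one-line check, using the permutation-invariance assumption above): since all coordinates of $\nabla G(\mathbf{u})$ are equal to $\kappa$,

$(\mathbf{p}-\mathbf{u})^{\top} \nabla G(\mathbf{u})=\kappa(\mathbf{p}-\mathbf{u})^{\top} \mathbf{1}=\kappa\big(\textstyle\sum_j p_j-\sum_j u_j\big)=\kappa(1-1)=0 .$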
► In conclusion, using the negated entropy can be interpreted as a statistical distance between the predictive distribution $\mathbf{p}$ and the uniform distribution $\mathbf{u}$.
$G_\gamma(\mathbf{p})=\sum_j p_j^\gamma\left(1-p_j\right)^\gamma,\, \gamma \in (0,1)$
For $\gamma$ between 0 and 1, $G_\gamma$ is concave.
The smaller the value of $\gamma$, the more sensitive the Generalized Entropy becomes compared to the Shannon Entropy.
As a result, the motivation of GEN is simple and straightforward (or so the paper says... as someone not very familiar with information theory, it does not fully click for me..)
In any case, the point is to use the Generalized Entropy to amplify minor deviations → presumably this captures fine details a bit better..
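To check my own reading of the formula (this is my sketch, not the authors' code), the score below computes $G_\gamma(\mathbf{p})$ on the softmax output and negates it so that confident, ID-like predictions get higher scores; the choice $\gamma = 0.1$ and the toy distributions are purely illustrative.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension; logits: (N, C).
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gen_score(logits, gamma=0.1):
    # Negated generalized entropy: -sum_j p_j^gamma * (1 - p_j)^gamma, gamma in (0, 1).
    # Higher score => prediction is further from uniform => more likely ID.
    p = softmax(logits)
    return -np.sum(p**gamma * (1.0 - p)**gamma, axis=1)

def shannon_score(logits, eps=1e-12):
    # Negated Shannon entropy, for comparison.
    p = softmax(logits)
    return np.sum(p * np.log(p + eps), axis=1)

# Toy check of the sensitivity claim: two confident predictions over C = 10 classes whose
# tails differ by a small amount of probability mass. The negated Shannon entropies are
# both close to 0 and barely separated, while GEN with a small gamma spreads them much
# further apart relative to its own range.
C = 10
p_sharp = np.full((1, C), 1e-4 / (C - 1))   # nearly one-hot prediction
p_sharp[0, 0] = 1.0 - 1e-4
p_soft = np.full((1, C), 1e-2 / (C - 1))    # slightly heavier tail
p_soft[0, 0] = 1.0 - 1e-2
for p in (p_sharp, p_soft):
    logits = np.log(p)  # softmax(log p) recovers p exactly
    print(gen_score(logits, gamma=0.1), shannon_score(logits))
```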
- Experiment
Datasets
- ID
- ImageNet-1K
- OOD
- OpenImage-O
- Texture
- iNaturalist
- ImageNet-O
Baselines
- Big Transfer (BiT)
- ViT
- RepVGG
- ResNet-50-D
- DeiT
- Swin
- Discussion
Honestly, for use in my own research it would have been nice if the performance had been best on ViT... but...
Still, since it hits SOTA across a variety of datasets and baselines with an extremely simple method, I think it is quite a meaningful piece of work.
- Reference
[1] Liu, Xixi, et al. "GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection." CVPR 2023 [Paper link]