[CVPR 2021] ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning

이성훈 Ethan 2023. 7. 3. 03:19

- Introduction

 

In real-world applications, incremental data are often partially labeled.

 

ex) Face Recognition, Fingerprint Identification, Video Recognition

 

 

Semi-Supervised Continual Learning (SSCL): insufficient supervision and a large amount of unlabeled data.

 

In SSCL, the regularization-based and replay-based methods used in conventional CL do not work well (it is unclear why architecture-based methods were left out of the comparison).

 

However, under joint training, a strong semi-supervised classifier significantly outperforms these CL methods.

 

→ This points to a catastrophic forgetting problem on unlabeled data in SSCL.

 

 

Online Replay with Discriminator Consistency (ORDisCo): continually learns a classifier together with a conditional GAN.

 

Replay data are sampled from the conditional generator and replayed to the classifier in an online manner.

 

 

Main Contributions

  1. Formulates semi-supervised continual learning (SSCL)
  2. Proposes ORDisCo, which jointly trains a classifier with a conditional GAN
  3. Evaluates on various benchmarks

- Method

 

Problem formulation: each task provides both labeled and unlabeled data, under the class-incremental learning (CIL) setting (no task labels).

 

 

Conditional Generation on Incremental Semi-supervised Data

 

A triple-network structure that learns a classifier jointly with a conditional GAN (adopted from a state-of-the-art semi-supervised GAN).

 

Classifier $\mathit{C}$, Generator $\mathit{G}$, Discriminator $\mathit{D}$

 

Classifier Loss

 

$\mathit{L}_{\mathit{C},pl}(\theta_\mathit{C}) = \mathit{L}_{sl}(\theta_\mathit{C})+\mathit{L}_{ul}(\theta_\mathit{C})$
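As a minimal PyTorch-style sketch of this semi-supervised objective (the consistency form of $L_{ul}$, the input-noise perturbation, and names like `consistency_weight` are my assumptions; the paper adopts the loss of a state-of-the-art semi-supervised classifier):

```python
import torch
import torch.nn.functional as F

def classifier_loss(C, x_l, y_l, x_u, consistency_weight=1.0, noise_std=0.1):
    """Sketch of L_{C,pl} = L_sl + L_ul.

    L_sl: cross-entropy on the few labeled samples.
    L_ul: a consistency term on unlabeled samples (one common choice;
    the exact form in the paper may differ).
    """
    # Supervised term on the labeled part of the batch.
    loss_sl = F.cross_entropy(C(x_l), y_l)

    # Unsupervised term: predictions under two random input perturbations
    # of the same unlabeled image should agree.
    p1 = F.softmax(C(x_u + noise_std * torch.randn_like(x_u)), dim=1)
    p2 = F.softmax(C(x_u + noise_std * torch.randn_like(x_u)), dim=1)
    loss_ul = F.mse_loss(p1, p2.detach())

    return loss_sl + consistency_weight * loss_ul
```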

 

Discriminator Loss

 

$\mathit{L}_{\mathit{D},pl}(\theta_\mathit{D}) = \mathbb{E}_{x,y\sim b_l\cup smb}[\log{(D(x,y))}] +\alpha\, \mathbb{E}_{y'\sim p_{y'}, z \sim p_z}[\log{(1-D(G(z,y'),y'))}] +(1-\alpha)\, \mathbb{E}_{x\sim b_u}[\log{(1-D(x,\mathit{C}(x)))}]$
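A hedged PyTorch sketch of the three terms (assuming $D(x,y)$ returns the probability in $(0,1)$ that $(x,y)$ is a real labeled pair; the objective above is maximized by $D$, so the sketch minimizes its negation):

```python
import torch

def discriminator_loss(D, G, C, x_l, y_l, x_u, alpha=0.5,
                       num_classes=10, z_dim=128):
    """Sketch of -L_{D,pl}: real pairs from the labeled batch and the
    supervised memory buffer (smb) vs. generated pairs and unlabeled
    images paired with the classifier's prediction C(x)."""
    eps = 1e-8
    n = x_u.size(0)

    # Term 1: (x, y) ~ b_l ∪ smb should be judged real.
    loss_real = -torch.log(D(x_l, y_l) + eps).mean()

    # Term 2: conditional samples G(z, y') should be judged fake.
    y_fake = torch.randint(0, num_classes, (n,), device=x_u.device)
    z = torch.randn(n, z_dim, device=x_u.device)
    loss_gen = -torch.log(1 - D(G(z, y_fake), y_fake) + eps).mean()

    # Term 3: unlabeled x paired with the pseudo-label C(x) is also fake.
    y_pseudo = C(x_u).argmax(dim=1)
    loss_unl = -torch.log(1 - D(x_u, y_pseudo) + eps).mean()

    return loss_real + alpha * loss_gen + (1 - alpha) * loss_unl
```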

 

Generator Loss

 

$\mathit{L}_{\mathit{G}}(\theta_\mathit{G}) = \mathbb{E}_{y'\sim p_{y'}, z \sim p_z}[\log{(1-D(G(z,y'),y'))}] $
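And the matching sketch for the generator, which minimizes the term above so that conditional samples fool $D$ (same assumed interfaces as in the discriminator sketch):

```python
import torch

def generator_loss(D, G, num_classes=10, z_dim=128, batch=64, device="cpu"):
    """Sketch of L_G: minimizing log(1 - D(G(z, y'), y')) pushes D
    toward judging generated (image, label) pairs as real."""
    y_fake = torch.randint(0, num_classes, (batch,), device=device)
    z = torch.randn(batch, z_dim, device=device)
    return torch.log(1 - D(G(z, y_fake), y_fake) + 1e-8).mean()
```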

 

 

Improving Continual Learning of Unlabeled Data (how forgetting is mitigated)

 

In the paper's qualitative figure, each row consists of images with the same label. Unlike (b), where the conditional GAN is trained with the supervised memory buffer (SMB) alone, (c), where ORDisCo is used, clearly produces better samples.

 

(1) Online semi-supervised generative replay

 

A time- and storage-efficient strategy:

Offline generative replay

  1. All old generators are saved, and conditional samples from them are replayed to the classifier
  2. A separate generator must be stored for each task

ORDisCo

  1. Generates samples from the single current $\mathit{G}$ and replays them to $\mathit{C}$ online
  2. Time- and storage-efficient

 

Classifier Loss

 

$\mathit{L}_{\mathit{C}}(\theta_\mathit{C})=\mathit{L}_{\mathit{C},pl}(\theta_\mathit{C}) + \mathit{L}_{\mathit{G}\to \mathit{C}}(\theta_\mathit{C})$

 

Generative Replay Loss (from $\mathit{G}$ to $\mathit{C}$)

 

$\mathit{L}_{\mathit{G}\to \mathit{C}}(\theta_\mathit{C})=\mathbb{E}_{y'\sim p_{y'}, z \sim p_z}[\log{(1-D(G(z,y'),y'))}] +\mathbb{E}_{y'\sim p_{y'},z\sim p_z, \epsilon,\epsilon'} [\left\|\mathit{C}(G(z,y'),\epsilon) -\mathit{C}(G(z,y'),\epsilon')\right\|]$

where $\epsilon, \epsilon'$ are two random perturbations, so the second term enforces consistent predictions on replayed samples.
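A sketch of the consistency part, the term that directly updates $\theta_\mathit{C}$: conditional samples are drawn from the current $G$ and replayed to $C$ online, with $\epsilon, \epsilon'$ realized here as random input noise (my assumption; the loss only specifies two random perturbations):

```python
import torch
import torch.nn.functional as F

def replay_consistency_loss(C, G, num_classes=10, z_dim=128, batch=64,
                            noise_std=0.1, device="cpu"):
    """Online generative replay: sample (z, y'), generate once, and require
    the classifier's predictions to agree under two perturbations."""
    y = torch.randint(0, num_classes, (batch,), device=device)
    z = torch.randn(batch, z_dim, device=device)
    x_gen = G(z, y).detach()  # replayed sample; gradients flow only into C

    p1 = F.softmax(C(x_gen + noise_std * torch.randn_like(x_gen)), dim=1)
    p2 = F.softmax(C(x_gen + noise_std * torch.randn_like(x_gen)), dim=1)
    return torch.norm(p1 - p2, dim=1).mean()  # ||C(·, ε) − C(·, ε′)||
```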

 

 

(2) Stabilization of discriminator consistency

 

Discriminator Loss

 

$\mathit{L}_{\mathit{D}}(\theta_\mathit{D})=\mathit{L}_{\mathit{D},pl}(\theta_\mathit{D}) + \lambda \sum_{i}\xi_{1:b,i}(\theta_{\mathit{D},i}-\theta_{\mathit{D},i}^*)^2$

 

The second term is a parameter regularization term: it penalizes changes of each discriminator parameter $\theta_{\mathit{D},i}$ from its previous value $\theta_{\mathit{D},i}^*$, weighted by the importance $\xi_{1:b,i}$ accumulated over the data batches seen so far.
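A minimal sketch of this penalty, assuming `old_params` stores $\theta_{\mathit{D}}^*$ captured after the previous data batch and `importance` stores the accumulated $\xi_{1:b}$ per parameter (how $\xi$ is estimated is left out here):

```python
import torch

def discriminator_param_reg(D, old_params, importance, lam=1.0):
    """Second term of L_D: an online-EWC-style quadratic penalty that keeps
    important discriminator parameters close to their previous values."""
    reg = torch.zeros((), device=next(D.parameters()).device)
    for name, p in D.named_parameters():
        reg = reg + (importance[name] * (p - old_params[name]) ** 2).sum()
    return lam * reg
```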


- Experiment

 

Setting

  • New Instance[2]: All classes are shown in the first batch while subsequent instances of known classes become available over time
  • New Class: For each sequential batch, new object classes are available so that the model must deal with the learning of new classes without forgetting previously learned ones

Dataset

  • SVHN
  • CIFAR10
  • Tiny-ImageNet

New Instance: SVHN/CIFAR10/Tiny-ImageNet are split into 30/30/10 incremental batches, with a small amount of labels in each batch.

 

New Class: the same splits as New Instance, but organized into 5 binary classification tasks.
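A small sketch of how such splits could be built (the 30/30/10 batch counts follow the post; the number of labels per batch and the consecutive class grouping are assumptions):

```python
import numpy as np

def new_instance_split(labels, num_batches=30, labeled_per_batch=40, seed=0):
    """New Instance: shuffle all samples into incremental batches; only a
    small subset of each batch keeps its label."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    batches = np.array_split(idx, num_batches)
    # (labeled indices, unlabeled indices) per incremental batch
    return [(b[:labeled_per_batch], b[labeled_per_batch:]) for b in batches]

def new_class_split(labels, classes_per_task=2):
    """New Class: group the classes into sequential tasks, e.g. 5 binary
    tasks for a 10-class dataset."""
    labels = np.asarray(labels)
    num_tasks = (int(labels.max()) + 1) // classes_per_task
    return [np.flatnonzero(labels // classes_per_task == t)
            for t in range(num_tasks)]
```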

 

 

Architecture

  • Wide ResNet

Baseline

  • Mean teacher (MT)
  • Supervised Memory Buffer (SMB)
  • Unsupervised Memory Buffer (UMB)
  • Unified Classifier (UC)

New Instance

Average accuracy over the 10 incremental batches of Tiny-ImageNet

 

 

Average accuracy on SVHN-1 and CIFAR10-5

 

New Class

 

5 binary classification tasks


- Discussion

 

Since this is a 2021 paper and the first to propose SSCL, some of its settings differ from recent trends.

 

In line with recent trends, multi-task classification and a standard CIL setting seem necessary.


- Reference

[1] Wang, Liyuan, et al. "ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning." CVPR 2021.

[2] Parisi, German I., et al. "Continual Lifelong Learning with Neural Networks: A Review." Neural Networks, 2019.