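Fixing the SEED for Reproducible Results

Background

For a dacon competition, I re-ran the same code to verify reproducibility before submitting, but the results kept changing from run to run. When I looked into the cause, it turned out that the data loader also needs its own seed. The official documentation below covers this as well.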

https://pytorch.org/docs/stable/notes/randomness.html

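import random
import numpy as np
import torch

# Seed NumPy and Python's random module inside each DataLoader worker
# (worker_init_fn is only called when num_workers > 0)
def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# Fix the seed of the generator that drives shuffling
g = torch.Generator()
g.manual_seed(42)

# DataLoader
train_loader = torch.utils.data.DataLoader(train_dataset_A_B,
                                            batch_size=CFG.BATCH_SIZE,
                                            shuffle=True,
                                            worker_init_fn=seed_worker,
                                            generator=g)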

With this code the DataLoader seed is fixed, so the shuffle order (and therefore the training result) can be reproduced across runs.

{'column_name': ['P7', 'P5', 'P26', 'P17', 'P21', 'P1'], 'input': tensor([[[0.1760],
         [0.1961],
         [0.1813],
         ...,
         [0.2464],
         [0.2464],
         [0.2464]],
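To check that the seeded generator really pins the shuffle order, a quick sanity test like the one below can help. This is only a minimal sketch: the toy dataset (the integers 0 to 9) and the helper name batch_order are made up for illustration and are not part of the competition code, and num_workers is left at its default of 0 so that only the generator is exercised.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_order(seed):
    """Return the sample order produced by a shuffled DataLoader seeded with `seed`.

    Hypothetical helper for illustration only.
    """
    dataset = TensorDataset(torch.arange(10))  # toy data: 0..9
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=g)
    # Each batch is a one-element list because the dataset wraps one tensor
    return [int(x) for (batch,) in loader for x in batch]


order_a = batch_order(42)
order_b = batch_order(42)
print(order_a == order_b)  # the same seed should give the same order
```

If the two orders ever differ, the generator is probably not being passed to the DataLoader, or something else is reshuffling the data.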

Fixing the SEED for Reproducible Results

import random
import numpy as np
import torch

# Fix the seed of every random number generator in use
def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(42)
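One way to confirm that seed_everything behaves as intended is to seed twice and compare the random draws. The sketch below repeats the function so that it runs on its own; draw_samples is a hypothetical helper added only for this check.

```python
import random

import numpy as np
import torch


def seed_everything(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def draw_samples():
    """Draw one value from each seeded RNG (hypothetical helper for this check)."""
    return (random.random(), float(np.random.rand()), torch.rand(1).item())


seed_everything(42)
first = draw_samples()
seed_everything(42)
second = draw_samples()
print(first == second)  # True: re-seeding replays the same random stream
```

Note that this only checks the CPU generators. As the official documentation points out, some GPU operations can still be nondeterministic even with identical seeds.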
