All models are trained in a multi-GPU setting with a total batch size of 4096 on ImageNet-1k and 1024 on ImageNet-22k.
Training from scratch on ImageNet-1k:
python -m torch.distributed.launch --nproc_per_node=8 ...
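The total batch size is the product of the number of processes (one per GPU), the per-GPU batch size, and any gradient accumulation steps. The following is a minimal sketch of that arithmetic, not part of the original training code. The function names, the single-node assumption, and the accumulation factor are hypothetical; only `--nproc_per_node=8` and the totals 4096 and 1024 come from the text above.

```python
def total_batch_size(nodes, gpus_per_node, per_gpu_batch, accum_steps=1):
    """Effective batch size per optimizer step in data-parallel training."""
    return nodes * gpus_per_node * per_gpu_batch * accum_steps


def per_gpu_batch_for(total, nodes, gpus_per_node, accum_steps=1):
    """Per-GPU batch size needed to reach `total`; raises if it does not divide evenly."""
    divisor = nodes * gpus_per_node * accum_steps
    if total % divisor != 0:
        raise ValueError(
            f"total batch size {total} is not divisible by "
            f"nodes * gpus_per_node * accum_steps = {divisor}"
        )
    return total // divisor


if __name__ == "__main__":
    # Assumed setup: a single node with 8 GPUs, matching --nproc_per_node=8.
    for total in (4096, 1024):
        for accum in (1, 4):  # accumulation factors chosen for illustration only
            per_gpu = per_gpu_batch_for(total, nodes=1, gpus_per_node=8, accum_steps=accum)
            print(f"total={total} accum_steps={accum} -> per-GPU batch {per_gpu}")
```

If the per-GPU batch implied by this calculation does not fit in memory, raising the accumulation factor or the node count keeps the total unchanged while shrinking the per-GPU share.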