Abstract: Vision transformers (ViT) have shown promise in various vision tasks, while the U-Net based on a convolutional neural network (CNN) remains dominant in diffusion models. We design a simple ...