ResNet
Observations
This was our first longer-scale training attempt: 114 epochs over 1 hr 14 min, using a baseline ResNet architecture. For data augmentation, we applied basic transformations and CutBlur to 128x128 training patches.
At epoch 114 we reached a maximum validation PSNR of 21.16 dB. Improvement had stagnated; even extrapolating linearly, we were trending toward only 23-25 dB by epoch 2000 (a run scheduled for 22 hours), so we stopped early.
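For reference, a minimal sketch of a CutBlur-style augmentation on paired 128x128 patches (the function name, the `alpha` ratio, the 50/50 direction choice, and the assumption that the degraded input has already been resized to the target resolution are ours, not necessarily the exact training code):

```python
import torch

def cutblur(lr_up: torch.Tensor, hr: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    """CutBlur-style augmentation (Yoo et al., 2020): swap a random rectangular
    region between the upsampled degraded input (lr_up) and the clean target (hr),
    returning the augmented *input*; the target stays hr."""
    _, _, h, w = lr_up.shape
    cut_h, cut_w = int(h * alpha), int(w * alpha)
    cy = torch.randint(0, h - cut_h + 1, (1,)).item()
    cx = torch.randint(0, w - cut_w + 1, (1,)).item()
    if torch.rand(1).item() < 0.5:
        # paste clean (HR) content into the degraded input
        out = lr_up.clone()
        out[..., cy:cy + cut_h, cx:cx + cut_w] = hr[..., cy:cy + cut_h, cx:cx + cut_w]
    else:
        # inverse: start from the clean image and paste degraded content into it
        out = hr.clone()
        out[..., cy:cy + cut_h, cx:cx + cut_w] = lr_up[..., cy:cy + cut_h, cx:cx + cut_w]
    return out

# Example on a batch of 128x128 patches:
# lr_up, hr = torch.rand(8, 3, 128, 128), torch.rand(8, 3, 128, 128)
# x_aug = cutblur(lr_up, hr)   # train on (x_aug, hr)
```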
Possible Improvements
- The baseline ResNet architecture may lack the capacity of something like a UNet. We could also increase the parameter count.
- In an independent test, we found we could overfit a single image to a PSNR of 39-45 dB, which demonstrates the model's potential. However, our augmentations (e.g., CutBlur) may be too aggressive for the model to truly learn the mapping. CutBlur should only be introduced once we begin to overfit, or once we scale up the training set.
- We could experiment with learning rate adjustments.
- BatchNorm is sometimes omitted in image-restoration models, as it can obscure pixel-level adjustments.
- SSIM loss is not pixel-level. Since our goal is primarily PSNR, we could weight L1 more heavily or use L2 (MSE) loss exclusively, as PSNR is mathematically tied directly to MSE (see the sketch after this list).
- We could experiment with GELU or Swish over ReLU.
- Global residual connections are not always beneficial for performance or training stability (https://arxiv.org/pdf/2603.11323); see the architecture sketch after this list.
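Since PSNR is a monotone function of MSE, minimizing L2 directly maximizes PSNR. A quick sketch of the relationship, assuming pixel values normalized to [0, 1]:

```python
import torch
import torch.nn.functional as F

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """PSNR = 10 * log10(MAX^2 / MSE); assumes pixel values in [0, max_val]."""
    mse = F.mse_loss(pred, target)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# Training with nn.MSELoss (or an L1/L2 mix weighted toward L2) therefore
# targets PSNR directly, while SSIM loss targets structure rather than
# per-pixel error.
```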
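A minimal sketch of the BatchNorm-free residual block, swappable activation, and global-residual toggle discussed above (class names, channel width, and block count are illustrative assumptions, not our current architecture):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block without BatchNorm and with a configurable activation
    (ReLU / GELU / Swish)."""
    def __init__(self, channels: int = 64, act: str = "gelu"):
        super().__init__()
        acts = {"relu": nn.ReLU(inplace=True), "gelu": nn.GELU(), "swish": nn.SiLU()}
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            acts[act],
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # local residual, no BatchNorm

class RestorationNet(nn.Module):
    """Small ResNet-style restoration model with a switch for the global
    (input-to-output) residual connection, so its effect can be ablated."""
    def __init__(self, blocks: int = 8, channels: int = 64,
                 act: str = "gelu", global_residual: bool = True):
        super().__init__()
        self.global_residual = global_residual
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels, act) for _ in range(blocks)])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.tail(self.body(self.head(x)))
        return x + out if self.global_residual else out
```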