The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be exploited to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERF remains too small, or on heavy non-local attention mechanisms, which limit the practicality of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel depth-wise convolutions to remove more redundancy while maintaining modest complexity. To cope with the wide diversity of images, we enhance the adaptability of the convolutions by generating their weights in a self-conditioned manner. The large kernels cooperate with non-linear embeddings and gate mechanisms for better expressiveness and lighter point-wise interactions. We also investigate improved training techniques to fully exploit the potential of large kernels. In addition, to enhance the interactions among channels, we propose adaptive channel-wise bit allocation, generating channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model with existing transform methods, obtaining the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that the proposed LLIC models achieve significant improvements over the corresponding baselines, state-of-the-art performance, and a better trade-off between performance and complexity.

Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC)

This article introduces a new method for learned image compression called Large Receptive Field Transform Coding with Adaptive Weights (LLIC). It addresses two coupled problems: removing redundancy effectively during the forward transform, and exploiting spatial priors to synthesize textures during the inverse transform.

Existing methods in learned image compression often rely on stacks of small kernels or on heavy non-local attention mechanisms. However, stacks of small kernels yield an Effective Receptive Field (ERF) that remains too small, while non-local attention is expensive enough to limit the practicality of high-resolution image coding. The authors propose a solution: a few large-kernel depth-wise convolutions that remove more redundancy while maintaining modest complexity.
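To make the depth-wise idea concrete, here is a minimal NumPy sketch of a large-kernel depth-wise convolution. Shapes, the 11x11 kernel size, and the function name are illustrative assumptions, not the paper's actual module; real systems would use an optimized library implementation rather than explicit loops.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise convolution: each channel is filtered by its own K x K
    kernel ('same' zero padding). A depth-wise layer needs only C*K*K
    weights, versus C*C*K*K for a dense convolution, which is why large
    kernels stay affordable here.
    x: (C, H, W) feature map; kernels: (C, K, K)."""
    C, H, W = x.shape
    K = kernels.shape[-1]
    p = K // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for c in range(C):               # one independent kernel per channel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + K, j:j + K] * kernels[c])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
k = rng.standard_normal((4, 11, 11))   # 11x11 stands in for a "large" kernel
y = depthwise_conv2d(x, k)
```

An 11x11 depth-wise kernel sees a 121-pixel neighborhood in a single layer, whereas a 3x3 stack needs five layers to cover the same extent.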

One notable aspect of LLIC is that it sits at the intersection of classical transform coding and modern learned compression: it keeps the familiar transform/inverse-transform framework while learning the transforms themselves, which makes its efficiency gains relevant to multimedia information systems broadly, where high-resolution image coding is a core workload.

The authors also enhance the adaptability of the convolutions by generating their weights in a self-conditioned manner, i.e., conditioned on the content of the input itself, to cope with the wide diversity of images. The large kernels in LLIC are combined with non-linear embeddings and gate mechanisms, contributing better expressiveness while keeping the point-wise interactions lightweight.
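The self-conditioning idea can be sketched in a few lines: a summary of the input generates a per-channel gate that modulates the features. This is a generic squeeze-style sketch of the concept, assuming hypothetical weight shapes `W1`/`W2`; the paper's actual module and parameterization may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def self_conditioned_gate(x, W1, W2):
    """Generate a per-channel gate from the input itself.
    x: (C, H, W). Global average pooling summarizes the content, a tiny
    two-layer MLP (non-linear embedding) maps it to a gate in (0, 1), and
    the gate rescales each channel point-wise."""
    ctx = x.mean(axis=(1, 2))              # (C,) self-conditioning signal
    hidden = np.maximum(W1 @ ctx, 0.0)     # ReLU embedding of the summary
    gate = sigmoid(W2 @ hidden)            # (C,) gate values in (0, 1)
    return x * gate[:, None, None]         # light point-wise modulation

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
W1 = rng.standard_normal((8, 4))           # hypothetical embedding weights
W2 = rng.standard_normal((4, 8))
y = self_conditioned_gate(x, W1, W2)
```

Because the gate depends on the input, the same trained layer can respond differently to smooth regions and textured ones.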

To fully exploit the potential of large kernels, the authors also investigate improved training techniques, since large-kernel convolutions are typically harder to optimize than stacks of small ones.

In addition to enlarging the ERF, LLIC improves the interactions among channels through adaptive channel-wise bit allocation: a channel importance factor is generated in a self-conditioned manner, so that channels carrying more information receive a larger share of the bit budget.
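The general mechanism behind importance-weighted bit allocation can be sketched as scaling each latent channel before uniform quantization: a larger factor preserves finer detail, which corresponds to spending more bits on that channel. The factor values and function below are illustrative assumptions, not the paper's learned allocation.

```python
import numpy as np

def allocate_bits_channelwise(latent, importance):
    """Scale each channel by an importance factor, quantize, and dequantize.
    A larger factor means finer effective quantization (more bits spent).
    latent: (C, H, W); importance: (C,) positive factors (hypothetical)."""
    s = importance[:, None, None]
    q = np.round(latent * s)       # uniform scalar quantization of scaled latent
    return q / s                   # dequantize for reconstruction

rng = np.random.default_rng(1)
latent = rng.standard_normal((3, 4, 4))
importance = np.array([8.0, 2.0, 0.5])        # channel 0 kept most precisely
rec = allocate_bits_channelwise(latent, importance)
err = np.abs(rec - latent).mean(axis=(1, 2))  # per-channel distortion
```

The per-channel error shrinks as the importance factor grows, which is exactly the trade LLIC's self-conditioned factors exploit: important channels get low distortion, unimportant ones get coarse (cheap) quantization.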

The authors evaluate LLIC by aligning its entropy model with those of existing methods, so that only the transforms differ and the contribution of the transform coding is isolated. This yields the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that these models outperform their corresponding baselines and achieve state-of-the-art performance with a better trade-off between performance and complexity.

In summary, LLIC represents a meaningful advance in learned image compression. By replacing stacks of small kernels and heavy attention with a few adaptive large-kernel depth-wise convolutions, it enlarges the effective receptive field at modest cost, and its self-conditioned weights and channel-wise bit allocation show how content-adaptive components can improve the rate-distortion-complexity trade-off.

Read the original article