The article introduces a Cloud-Device Collaborative Continual Adaptation framework to enhance the performance of compressed, device-deployed Multimodal Large Language Models (MLLMs). It targets a core deployment problem: large-scale MLLMs must be compressed to run on client devices, and compression often degrades their generalization capabilities.
The framework consists of three key components:
1. Device-to-Cloud Uplink:
In the uplink phase, the Uncertainty-guided Token Sampling (UTS) strategy is employed to filter out-of-distribution tokens. This helps reduce transmission costs and improve training efficiency by focusing on relevant information for cloud-based adaptation.
2. Cloud-Based Knowledge Adaptation:
The proposed Adapter-based Knowledge Distillation (AKD) method enables the transfer of refined knowledge from larger-scale MLLMs in the cloud to compressed, pocket-size MLLMs on the device. This allows the device models to benefit from the robust capabilities of the larger-scale models without requiring extensive computational resources.
3. Cloud-to-Device Downlink:
In the downlink phase, the Dynamic Weight update Compression (DWC) strategy is introduced. This strategy adaptively selects and quantizes updated weight parameters, enhancing transmission efficiency and reducing the representational disparity between the cloud and device models. This ensures that the models remain consistent and synchronized during deployment.
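The three phases above can be sketched in a few lines of Python. This summary does not give the article's exact formulations, so everything below is an illustrative stand-in under stated assumptions: uncertainty is taken to be the Shannon entropy of each token's predicted distribution, and the most uncertain tokens are the ones uploaded; the cloud step shows only a generic temperature-softened distillation loss (the adapter itself is omitted); and the downlink step keeps the largest-magnitude weight changes and quantizes them uniformly to 8 bits. All function names and parameters (`keep_ratio`, `temperature`, `bits`) are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def entropy(logits):
    """Shannon entropy (in nats) of softmax(logits); used as the uncertainty score."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def select_uncertain_tokens(token_logits, keep_ratio=0.5):
    """Uplink stand-in (assumption): upload only the highest-entropy tokens."""
    scores = [entropy(logits) for logits in token_logits]
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # indices of tokens to transmit, in original order


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cloud stand-in (assumption): KL(teacher || student) on softened outputs."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl  # conventional T^2 scaling of the soft-target term


def compress_update(old_weights, new_weights, keep_ratio=0.25, bits=8):
    """Downlink stand-in (assumption): top-k weight deltas, uniformly quantized."""
    deltas = [n - o for o, n in zip(old_weights, new_weights)]
    k = max(1, round(keep_ratio * len(deltas)))
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    idx = sorted(ranked[:k])
    max_abs = max(abs(deltas[i]) for i in idx)
    levels = 2 ** (bits - 1) - 1  # 127 for signed 8-bit codes
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [round(deltas[i] / scale) for i in idx]
    return idx, codes, scale


def apply_update(weights, idx, codes, scale):
    """Device side: dequantize the received codes and add them to the weights."""
    updated = list(weights)
    for i, code in zip(idx, codes):
        updated[i] += code * scale
    return updated
```

In this sketch the device would call `select_uncertain_tokens` before uploading, the cloud would minimize `distillation_loss` while training the student, and the device would call `apply_update` on the `(idx, codes, scale)` triple it receives. Only the selected indices and small integer codes cross the network, which is where the transmission savings come from.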
According to the article, extensive experiments on multimodal benchmarks show that the proposed framework outperforms prior Knowledge Distillation and device-cloud collaboration methods. Real-world experiments also validate the feasibility of the approach.
This research bears directly on the deployment of large-scale MLLMs on client devices. By combining cloud-based resources with efficient data transmission, knowledge adaptation, and weight-update compression, the proposed framework enables compressed MLLMs to maintain their performance and generalization capabilities. This can improve the usability and effectiveness of MLLMs in applications where device resources are limited.