Welcome to the Nexus of Ethics, Psychology, Morality, Philosophy and Health Care

Wednesday, November 12, 2025

Self-Improvement in Multimodal Large Language Models: A Survey

Deng, S., Wang, K., et al. (2025, October 3). arXiv.

Abstract

Recent advancements in self-improvement for Large Language Models (LLMs) have efficiently enhanced model capabilities without significantly increasing costs, particularly in terms of human effort. While this area is still relatively young, its extension to the multimodal domain holds immense potential for leveraging diverse data sources and developing more general self-improving models. This survey is the first to provide a comprehensive overview of self-improvement in Multimodal LLMs (MLLMs). We provide a structured overview of the current literature and discuss methods from three perspectives: 1) data collection, 2) data organization, and 3) model optimization, to facilitate the further development of self-improvement in MLLMs. We also include commonly used evaluations and downstream applications. Finally, we conclude by outlining open challenges and future research directions.

Here are some thoughts summarizing this paper: MLLMs are learning to improve themselves without human oversight.

This survey presents the first comprehensive overview of self-improvement in Multimodal Large Language Models (MLLMs), a rapidly emerging paradigm that enables models to autonomously generate, curate, and learn from their own multimodal data to enhance performance without heavy reliance on human annotation. The authors structure the self-improvement pipeline into three core stages: data collection (e.g., via random sampling, guided generation, or negative sample synthesis), data organization (including verification through rules, external or self-based evaluators, and dataset refinement), and model optimization (using techniques like supervised fine-tuning, reinforcement learning, or Direct Preference Optimization). The paper reviews representative methods, benchmarks, and real-world applications in domains such as math reasoning, healthcare, and embodied AI, while also outlining key challenges—including modality alignment, hallucination, limited seed model capabilities, verification reliability, and scalability. The goal is to establish a clear taxonomy and roadmap to guide future research toward more autonomous, general, and robust self-improving MLLMs.
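The three-stage pipeline the survey describes (collect, organize, optimize) can be sketched as a simple loop. The sketch below is illustrative only: the model callable, the `verify` rule, and the helper names are all hypothetical stand-ins, and the surveyed methods use far richer generation, verification, and optimization strategies (SFT, RL, DPO) than this toy version.

```python
def generate_candidates(model, prompt, k=3):
    """Stage 1, data collection: sample k candidate responses from the model."""
    return [model(prompt) for _ in range(k)]

def verify(prompt, answer):
    """Stage 2, data organization: a toy rule-based verifier.
    Here prompts are 'a+b' strings and a candidate passes if it is the true sum."""
    a, b = map(int, prompt.split("+"))
    return answer == a + b

def collect_self_improvement_data(model, prompts):
    """One self-improvement round: keep the first verified candidate per prompt.
    The resulting (prompt, answer) pairs would feed Stage 3 (e.g., fine-tuning)."""
    dataset = []
    for p in prompts:
        for ans in generate_candidates(model, p):
            if verify(p, ans):
                dataset.append((p, ans))
                break
    return dataset

def make_scripted_model(outputs):
    """Stand-in for an MLLM: replays a fixed list of outputs (hypothetical)."""
    it = iter(outputs)
    return lambda prompt: next(it)

# A 'model' that answers 2+3 wrongly twice before getting it right,
# and 10+7 correctly on its second attempt.
model = make_scripted_model([4, 6, 5, 16, 17, 20])
data = collect_self_improvement_data(model, ["2+3", "10+7"])
print(data)  # [('2+3', 5), ('10+7', 17)]
```

Note that only verified pairs survive the round, so the quality of the fine-tuning data is bounded by the verifier itself, which is exactly the verification-reliability challenge the survey highlights.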