Tables are typically two-dimensional structures for storing large amounts of data, and they are essential to daily activities such as database queries, spreadsheet calculations, and generating reports from web tables. Automating these table-centric tasks with Large Language Models (LLMs) offers significant public benefits and has drawn interest from both academia and industry. This survey provides an extensive overview of table tasks. It covers traditional areas such as table question answering (Table QA) and fact verification, and also newly emphasized ones such as table manipulation and advanced table data analysis. It also goes beyond the early strategies of pre-training and fine-tuning small language models to cover recent paradigms of LLM usage, focusing on instruction-tuning, prompting, and agent-based approaches. Finally, we highlight several open challenges, ranging from private deployment and efficient inference to the development of extensive benchmarks for table manipulation and advanced data analysis.
Tables are fundamental to many daily activities, including database queries, spreadsheet calculations, and generating reports from web tables. Automating these table-centric tasks with Large Language Models (LLMs) has attracted significant attention and could deliver substantial public benefits. This survey examines table tasks along several dimensions. It covers traditional areas such as table question answering (Table QA) and fact verification, as well as emerging ones such as table manipulation and advanced table data analysis.
Earlier research focused mainly on pre-training and fine-tuning small language models. This survey extends beyond those strategies to recent paradigms of LLM usage, specifically instruction-tuning, prompting, and agent-based approaches. These paradigms offer new ways to apply LLMs to table-related problems.
This survey also takes a multi-disciplinary perspective, because applying LLMs to table tasks draws on natural language processing, machine learning, and database management. Combining insights from these fields can help researchers and practitioners build LLM-based systems that handle more complex table-based problems.
Despite this promise, several challenges remain. One challenge is private deployment. Tables often contain sensitive business or personal data, so sending them to externally hosted LLM services raises privacy and confidentiality concerns. Robust methods are therefore needed to deploy LLMs in real-world applications while keeping such data protected.
Another challenge is efficient inference. The large size of LLMs incurs substantial computational overhead, so optimization techniques and more efficient algorithms are needed to make LLM-based table solutions fast enough for practical deployment.
A further challenge is developing extensive benchmarks for table manipulation and advanced data analysis so that LLM performance can be evaluated objectively. Standardized evaluation criteria and datasets would let researchers compare different approaches and measure progress in the field.
In conclusion, this survey reviews the use of Large Language Models for table tasks. The multi-disciplinary nature of the area and the emergence of new LLM paradigms highlight the potential of LLMs to automate table-centric activities. Addressing the remaining challenges through collaboration across disciplines should enable further advances and practical applications of LLMs to tables.