---
license: llama3.1
datasets:
- RUCKBReasoning/TableLLM-SFT
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- table
- QA
- Code
---

# TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

| **[Paper](https://arxiv.org/abs/2403.19318)** | **[Training set](https://huggingface.co/datasets/RUCKBReasoning/TableLLM-SFT)** | **[Github](https://github.com/RUCKBReasoning/TableLLM)** | **[Homepage](https://tablellm.github.io/)** |

We present **TableLLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently, whether the tables are embedded in spreadsheets or documents, meeting the demands of real office scenarios. TableLLM is fine-tuned from [Llama3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).

TableLLM generates either a code solution or a direct text answer, depending on the scenario. Code generation is used for spreadsheet-embedded tabular data, which typically involves insert, delete, update, query, merge, and plot operations on tables. Text generation is used for document-embedded tabular data, which typically involves query operations on short tables.

## Evaluation Results

We evaluate the code-solution generation ability of TableLLM on three benchmarks: WikiSQL, Spider, and a self-created table operation benchmark. The text-answer generation ability is tested on three benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, and FeTaQA.
The evaluation results are shown below:

| Model | WikiTQ | TAT-QA | FeTaQA | WikiSQL | Spider | Self-created | Average |
| :------------------- | :----: | :----: | :----: | :-----: | :----: | :----------: | :-----: |
| TaPEX | 38.6 | – | – | 83.9 | 15.0 | / | 45.8 |
| TaPas | 31.6 | – | – | 74.2 | 23.1 | / | 43.0 |
| TableLlama | 24.0 | 22.3 | 20.5 | 43.7 | – | / | 23.4 |
| TableGPT2 (7B) | 77.3 | 88.1 | 75.6 | 63.0 | 77.34 | 74.42 | 76.0 |
| Llama3.1 (8B) | 71.9 | 74.3 | 83.4 | 40.6 | 18.8 | 43.2 | 55.3 |
| GPT3.5 | 58.5 | 72.1 | 71.2 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4o | **91.5** | **91.5** | **94.4** | 84.0 | 69.5 | 77.8 | 84.8 |
| CodeLlama (13B) | 43.4 | 47.3 | 57.2 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5) | 52.5 | 27.5 | 11.8 | 67.8 | **84.8** | / | 43.1 |
| Binder (GPT3.5) | 61.6 | 12.8 | 6.9 | 78.6 | 52.6 | / | 36.3 |
| DATER (GPT3.5) | 53.4 | 28.5 | 18.3 | 58.2 | 26.5 | / | 33.0 |
| TableLLM-8B (Ours) | 89.1 | 89.5 | 93.4 | **89.6** | 81.1 | 77.8 | **86.7** |

## Prompt Template

The prompts we use for generating code solutions and text answers are introduced below.

### Code Solution

The prompt template for the insert, delete, update, query, and plot operations on a single table:

```
[INST]Below are the first few lines of a CSV file. You need to write a Python program to solve the provided question.

Header and first few lines of CSV file:
{csv_data}

Question: {question}[/INST]
```

The prompt template for the merge operation on two tables:

```
[INST]Below are the first few lines of two CSV files. You need to write a Python program to solve the provided question.

Header and first few lines of CSV file 1:
{csv_data1}

Header and first few lines of CSV file 2:
{csv_data2}

Question: {question}[/INST]
```

The `csv_data` field is filled with the first few lines of your provided table file.
Below is an example:

```
Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
```

### Text Answer

The prompt template for direct text-answer generation on short tables:

````
[INST]Offer a thorough and accurate solution that directly addresses the Question outlined in the [Question].
### [Table Text]
{table_descriptions}
### [Table]
```
{table_in_csv}
```
### [Question]
{question}
### [Solution][/INST]
````

For more details about how to use TableLLM, please refer to our [GitHub page](https://github.com/RUCKBReasoning/TableLLM).
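As an illustration, the single-table code-solution prompt above can be assembled from raw CSV text with a few lines of Python. This is a minimal sketch, not part of the official inference pipeline: the helper name `build_code_prompt` and the choice of a five-row preview are our own assumptions.

```python
# Minimal sketch: fill the single-table code-solution prompt template.
# `build_code_prompt` and the 5-row preview size are illustrative assumptions,
# not part of the TableLLM API.

def build_code_prompt(csv_text: str, question: str, n_preview: int = 5) -> str:
    """Wrap the header plus the first few CSV rows in the [INST] prompt template."""
    lines = csv_text.strip().splitlines()
    preview = "\n".join(lines[: n_preview + 1])  # header + first n_preview data rows
    return (
        "[INST]Below are the first few lines of a CSV file. "
        "You need to write a Python program to solve the provided question.\n\n"
        "Header and first few lines of CSV file:\n"
        f"{preview}\n\n"
        f"Question: {question}[/INST]"
    )

csv_text = """Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7"""

prompt = build_code_prompt(csv_text, "What is the average Height of male (M) abalones?")
print(prompt)
```

The resulting string can be passed directly to the model (e.g. as the input of a text-generation call); the model is expected to respond with a Python program that answers the question over the full CSV file.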