Project
Benchmark for evaluation of LLMs on tabular data tasks
Objective
Populate and enrich a benchmark of tabular data tasks for evaluating LLMs with our evaluation framework. The primary work includes selecting tasks and datasets, writing data loaders, preparing task cards with input/output details and pre-processing steps, and writing prompts for each task (a hypothetical sketch of such a task entry follows below). The student will also test the data processing pipeline using our framework and evaluate a select set of tasks with LLMs. This benchmark standardizes the evaluation of LLMs on tabular data tasks in a uniform manner. The goal is to add a wide variety of tabular data tasks and make the benchmark a rich resource for evaluation.
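As a rough illustration of what "task card plus data loader" could mean in practice, here is a minimal sketch. The evaluation framework's actual API is not specified in this description, so every name below (TaskCard, load_task, the adult.csv dataset, and the column names) is an assumed placeholder, not the framework's real interface.

```python
# Hypothetical sketch of one benchmark task entry; all class and field
# names are placeholders, since the actual framework API is not given here.
from dataclasses import dataclass, field
from typing import Dict, List

import pandas as pd


@dataclass
class TaskCard:
    """Describes one tabular task: where the data lives, what the model
    receives as input, and what it must predict."""
    name: str
    dataset_path: str                  # local path or URL to the raw table
    input_columns: List[str]           # columns serialized into the prompt
    target_column: str                 # column the LLM must predict
    prompt_template: str               # template filled in per row
    preprocessing: List[str] = field(default_factory=list)  # documented steps


def load_task(card: TaskCard) -> List[Dict[str, str]]:
    """Turn a raw table into (prompt, reference) pairs for evaluation."""
    df = pd.read_csv(card.dataset_path).dropna(subset=[card.target_column])
    examples = []
    for _, row in df.iterrows():
        # Serialize the selected input columns as "name: value" pairs.
        serialized = "; ".join(f"{c}: {row[c]}" for c in card.input_columns)
        examples.append({
            "prompt": card.prompt_template.format(row=serialized),
            "reference": str(row[card.target_column]),
        })
    return examples


# Example usage with a made-up tabular classification task.
card = TaskCard(
    name="income-classification",
    dataset_path="data/adult.csv",
    input_columns=["age", "education", "occupation", "hours-per-week"],
    target_column="income",
    prompt_template="Given the record [{row}], answer whether income is >50K or <=50K.",
    preprocessing=["drop rows with a missing target",
                   "serialize input columns as 'name: value' pairs"],
)
examples = load_task(card)
```

A structure along these lines keeps each task self-describing, so the same evaluation loop can run over every task added to the benchmark.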
Outcome
A benchmark for evaluating LLMs on tabular data tasks, including the code for each task added to the benchmark and the benchmark documentation.
Apply By Date
17 Nov 2023
Students
1 / 1
Duration
3 months
Mentor
Rajmohan C
Tools-Technologies
Jupyter Python Notebooks