Project
Benchmark for evaluation of LLMs on tabular data tasks
Objective
Populate and enrich a benchmark of tabular data tasks for evaluating LLMs with our evaluation framework. The primary work includes selecting tasks and datasets, writing data loaders, preparing task cards that describe the input/output details and pre-processing steps of each task, and writing prompts for the tasks. The data processing pipeline will be tested with our framework, and a selected set of tasks will be evaluated with LLMs. The benchmark standardizes the evaluation of LLMs on tabular data tasks so that all tasks are evaluated in a uniform manner. The goal is to add a wide variety of tabular data tasks and make the benchmark a rich resource for benchmarking.
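The task card, data loader, pre-processing and prompt pieces described above could fit together in many ways. The sketch below assumes one simple arrangement: a task card records the input columns, the output (label) column, the pre-processing steps and a prompt template, and a loader turns raw CSV text into rows that are rendered into prompt/label pairs. The names TaskCard, load_rows, strip_whitespace, serialize_row and build_prompts, as well as the toy dataset, are hypothetical illustrations only and are not part of the evaluation framework, whose API is not described here.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TaskCard:
    """Hypothetical task card: describes one tabular task's inputs, output and prompt."""
    name: str
    input_columns: List[str]
    output_column: str
    preprocessing: List[str] = field(default_factory=list)  # human-readable steps
    prompt_template: str = ""  # must contain a {row} placeholder


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Hypothetical data loader: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def strip_whitespace(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Example pre-processing step: strip whitespace from column names and cell values."""
    return [{k.strip(): v.strip() for k, v in row.items()} for row in rows]


def serialize_row(row: Dict[str, str], columns: List[str]) -> str:
    """Flatten the selected cells of one row into 'column: value' text for the prompt."""
    return "; ".join(f"{c}: {row[c]}" for c in columns)


def build_prompts(card: TaskCard, rows: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair each rendered prompt with its gold label taken from the output column."""
    examples = []
    for row in rows:
        text = serialize_row(row, card.input_columns)
        examples.append((card.prompt_template.format(row=text), row[card.output_column]))
    return examples


# Usage on a made-up two-row table (illustrative data only).
card = TaskCard(
    name="toy_income_classification",
    input_columns=["age", "occupation"],
    output_column="income",
    preprocessing=["strip whitespace from column names and cells"],
    prompt_template="Given the row ({row}), is income above 50K? Answer yes or no.",
)
raw = "age, occupation, income\n39, clerk, no\n52, manager, yes\n"
rows = strip_whitespace(load_rows(raw))
examples = build_prompts(card, rows)
print(examples[0])
```

Keeping the pre-processing steps and prompt template on the card, rather than inside the loader, is one way to let every task be run through the same pipeline, which is the kind of uniform evaluation the objective describes.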
Outcomes
A benchmark for evaluating LLMs on tabular data tasks, including the code for each task added to the benchmark and the benchmark documentation
Apply by Date
17/11/2023
Applied Teams
1 / 1
Duration
3 months
College
1. IIT Jodhpur
Tools-Technologies
Python, Jupyter notebooks
Mentor
Rajmohan C