Project
Benchmark for evaluation of LLMs on tabular data tasks
Objective
Populate and enrich a benchmark of tabular data tasks for evaluating LLMs with our evaluation framework. The primary work includes selecting tasks and datasets, writing data loaders, preparing task cards with input/output details and pre-processing steps, and writing prompts for each task (a hypothetical sketch of such a task entry follows below). The student will also test the data processing pipeline using our framework and evaluate a select set of tasks with LLMs. This benchmark standardizes the evaluation of LLMs on tabular data tasks in a uniform manner. The goal is to add a wide variety of tabular data tasks and make the benchmark a rich resource for evaluation.
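As a rough illustration of what "task card plus data loader" could mean in practice, here is a minimal sketch. The evaluation framework's actual API is not specified in this description, so every name below (TaskCard, load_task, the adult.csv dataset, and the column names) is an assumed placeholder, not the framework's real interface.

```python
# Hypothetical sketch of one benchmark task entry; all class and field
# names are placeholders, since the actual framework API is not given here.
from dataclasses import dataclass, field
from typing import Dict, List

import pandas as pd


@dataclass
class TaskCard:
    """Describes one tabular task: where the data lives, what the model
    receives as input, and what it must predict."""
    name: str
    dataset_path: str                  # local path or URL to the raw table
    input_columns: List[str]           # columns serialized into the prompt
    target_column: str                 # column the LLM must predict
    prompt_template: str               # template filled in per row
    preprocessing: List[str] = field(default_factory=list)  # documented steps


def load_task(card: TaskCard) -> List[Dict[str, str]]:
    """Turn a raw table into (prompt, reference) pairs for evaluation."""
    df = pd.read_csv(card.dataset_path).dropna(subset=[card.target_column])
    examples = []
    for _, row in df.iterrows():
        # Serialize the selected input columns as "name: value" pairs.
        serialized = "; ".join(f"{c}: {row[c]}" for c in card.input_columns)
        examples.append({
            "prompt": card.prompt_template.format(row=serialized),
            "reference": str(row[card.target_column]),
        })
    return examples


# Example usage with a made-up tabular classification task.
card = TaskCard(
    name="income-classification",
    dataset_path="data/adult.csv",
    input_columns=["age", "education", "occupation", "hours-per-week"],
    target_column="income",
    prompt_template="Given the record [{row}], answer whether income is >50K or <=50K.",
    preprocessing=["drop rows with a missing target",
                   "serialize input columns as 'name: value' pairs"],
)
examples = load_task(card)
```

A structure along these lines keeps each task self-describing, so the same evaluation loop can run over every task added to the benchmark.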
Outcome
A benchmark for evaluating LLMs on tabular data tasks, including the code for each task added to the benchmark and the benchmark documentation.
Apply By Date
17 Nov 2023
Students
1 / 1
Duration
3 months
Mentor
Rajmohan C
Tools-Technologies
Jupyter Python Notebooks