Project
GraphQL Dataset Challenge
Objective
GraphQL is a powerful query language for APIs that lets clients fetch precise data efficiently and flexibly, querying multiple resources with a single request. However, crafting complex GraphQL query operations can be challenging. Large Language Models (LLMs) offer an alternative by generating GraphQL queries from natural language, but they struggle due to limited exposure to publicly available GraphQL schemas, often producing invalid or suboptimal queries. Furthermore, no benchmark test suite is available to reliably evaluate the performance of contemporary LLMs on this task.
To address this, we prepared a large-scale, cross-domain Text-to-GraphQL query operation dataset and published it at EMNLP 2024. The dataset includes 10,940 training triples spanning 185 cross-source data stores and 957 test triples over 14 data stores. Each triple consists of a GraphQL schema, a GraphQL query operation, and the corresponding natural language query.
This dataset helped us benchmark existing models and improve their performance. In this project, we plan to extend that work and prepare a more comprehensive dataset.
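As a rough illustration of the triple structure described above, the following Python sketch models one dataset entry and shows how a schema and a natural language query might be combined into model input. The class name, field names, `build_prompt` helper, and the sample schema and query are all hypothetical and are not taken from the published dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToGraphQLTriple:
    """One dataset entry. Field names are hypothetical, not the dataset's own."""
    schema: str    # GraphQL schema (SDL text)
    query: str     # target GraphQL query operation
    question: str  # corresponding natural language query


# Made-up example for illustration only; not drawn from the dataset.
example = TextToGraphQLTriple(
    schema=(
        "type Query { books(author: String): [Book] }\n"
        "type Book { title: String year: Int }"
    ),
    query='{ books(author: "Jane Austen") { title year } }',
    question="List the titles and publication years of books by Jane Austen.",
)


def build_prompt(triple: TextToGraphQLTriple) -> str:
    """Combine schema and question into model input (assumed prompt layout).

    The target query is deliberately left out, since it is what the
    model is expected to generate.
    """
    return (
        "GraphQL schema:\n"
        f"{triple.schema}\n\n"
        f"Question: {triple.question}\n"
        "GraphQL query:"
    )


prompt = build_prompt(example)
print(prompt)
```

In this sketch the `query` field would serve as the reference output when training or evaluating a model against the generated text.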
Outcome
The aim of this project is to submit a new dataset paper to EMNLP 2025. Please apply only if you have the bandwidth for the next 3 months, as the submission deadline will be around mid-June.
| Apply By Date | Students | Duration | Mentor |
|---|---|---|---|
| 15 Feb 2025 | 1 / 5 | 4 months | Manish Kesarwani |