Project
Towards Robust and Automated Transformation Generation for Schema-Mapped Data
Objective
Project Objectives:
1. Dynamic Validation without Ground Truth
- Develop methods to validate generated transformations using mined specifications and synthetic test data.
- Investigate whether synthetic data and fuzzing can replace explicit user validation.
2. Support for Multi-Pattern Inputs
- Detect heterogeneous data patterns in source columns (e.g., different date formats).
- Cluster inputs by pattern and synthesize a separate transformation for each cluster.
3. Automatic Multi-Dimensional Validation
- Design automatic validators to assess transformations across:
  - Semantic validity (does the output make sense?)
  - Data validity (parseability, type checks)
  - Runtime efficiency (execution cost, code quality)
  - Readability (documentation, maintainability)
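As an illustration of objective 2, the sketch below groups column values by a coarse character-class signature, so that each group could later receive its own transformation. The function names `pattern_signature` and `cluster_by_pattern` are hypothetical, and the signature scheme (digits become `D`, letters become `A`, repeated symbols are collapsed) is only one possible assumption about how patterns might be detected, not the project's actual method.

```python
from collections import defaultdict


def pattern_signature(value: str) -> str:
    """Return a coarse signature: digit runs -> 'D', letter runs -> 'A'.

    Other characters are kept as-is; consecutive identical symbols are
    collapsed so that '2024' and '31' both map to a single 'D'.
    """
    symbols = []
    for ch in value:
        if ch.isdigit():
            sym = "D"
        elif ch.isalpha():
            sym = "A"
        else:
            sym = ch
        if not symbols or symbols[-1] != sym:
            symbols.append(sym)
    return "".join(symbols)


def cluster_by_pattern(values):
    """Group values that share the same signature (hypothetical helper)."""
    clusters = defaultdict(list)
    for value in values:
        clusters[pattern_signature(value)].append(value)
    return dict(clusters)


column = ["2024-01-31", "31/01/2024", "1999-12-01", "Jan 31, 2024"]
clusters = cluster_by_pattern(column)
```

Here the two ISO-style dates land in one cluster while the slash-separated and month-name formats each form their own, which is the routing decision a multi-pattern pipeline would need before synthesizing per-cluster transformations.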
Methodology:
- Extend the existing LLM-based pipeline that uses sample selection, few-shot prompting, and error-driven repair.
- Integrate dynamic validation by generating synthetic test cases and checking semantic/data consistency.
- Implement clustering + routing methods for multi-pattern inputs.
- Develop lightweight validators for runtime, readability, and correctness.
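The dynamic-validation step can be sketched roughly as follows, under strong simplifying assumptions: the target column is an ISO date, the source pattern is `DD/MM/YYYY`, and synthetic inputs are produced by formatting randomly drawn dates. `candidate_transform` stands in for an LLM-generated transformation and `validate` for a validator; both names are hypothetical and not part of the existing pipeline.

```python
import random
from datetime import date, datetime, timedelta


def candidate_transform(value: str) -> str:
    """Hypothetical generated transformation: DD/MM/YYYY -> YYYY-MM-DD."""
    day, month, year = value.split("/")
    return f"{year}-{month}-{day}"


def synthetic_dates(n, seed=0):
    """Draw n pseudo-random dates on or after 1970-01-01 (seeded for repeatability)."""
    rng = random.Random(seed)
    start = date(1970, 1, 1)
    return [start + timedelta(days=rng.randrange(20000)) for _ in range(n)]


def validate(transform, n=200, seed=0):
    """Return a list of (input, problem) pairs; empty means no failure was found."""
    failures = []
    for d in synthetic_dates(n, seed):
        source = d.strftime("%d/%m/%Y")  # synthetic input in the source pattern
        try:
            output = transform(source)
            # Data validity: the output must parse as an ISO date.
            parsed = datetime.strptime(output, "%Y-%m-%d").date()
        except Exception as exc:
            failures.append((source, repr(exc)))
            continue
        # Semantic consistency: the parsed output must denote the same date.
        if parsed != d:
            failures.append((source, output))
    return failures


failures = validate(candidate_transform)
```

An empty `failures` list only means that no counterexample was found among the sampled inputs. In an error-driven repair loop, the collected failures would presumably be fed back into the prompt.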
Experiments:
- Evaluate on existing public benchmarks (FlashFill, IJCAI-data, SyGuS) and on new benchmarks created for this project.
- Metrics: transformation accuracy, robustness to heterogeneous inputs, validation precision/recall, runtime cost.
- Ablation studies: with vs. without dynamic validation, single vs. multi-pattern transforms, baseline vs. validator-guided pipeline.
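For the validation precision/recall metric, one plausible reading is to treat "validator flags a transformation as incorrect" as the positive class and compare it against whether the transformation is actually wrong on a labelled benchmark. The sketch below assumes that reading; the function name `validator_precision_recall` is hypothetical.

```python
def validator_precision_recall(flagged, actually_wrong):
    """Compute (precision, recall) of validator flags against true labels.

    flagged[i]        -- True if the validator rejected transformation i
    actually_wrong[i] -- True if transformation i is really incorrect
    """
    pairs = list(zip(flagged, actually_wrong))
    tp = sum(1 for f, w in pairs if f and w)
    fp = sum(1 for f, w in pairs if f and not w)
    fn = sum(1 for f, w in pairs if not f and w)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


precision, recall = validator_precision_recall(
    flagged=[True, True, False, False],
    actually_wrong=[True, False, True, False],
)
```

With one true positive, one false positive, and one false negative, both values come out to 0.5 in this toy example.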
Outcome
Publication in a tier-1 AI conference
| Apply By Date | 31 Dec 2025 |
| Students | 0 / 3 |
| Duration | 6 months |
| Mentor | Shashank Mujumdar |
| Tools-Technologies | NLP API, WatsonX.ai |
| Platform | WatsonX |
| College | 1. IIIT Delhi, 2. IIIT Hyderabad, 3. IISc (Bangalore), 4. IIT Bombay, 5. IIT Chennai, 6. IIT Delhi, 7. IIT Guwahati, 8. IIT Kanpur, 9. IIT Kharagpur, 10. IIT Roorkee |