Project
Towards Robust and Automated Transformation Generation for Schema-Mapped Data
Objective
Project Objectives:
1. Dynamic Validation without Ground Truth
- Develop methods to validate generated transformations using mined specifications and synthetic test data.
- Investigate whether synthetic data and fuzzing can replace explicit user validation.
2. Support for Multi-Pattern Inputs
- Detect heterogeneous data patterns in source columns (e.g., different date formats).
- Cluster inputs by pattern and synthesize a separate transformation for each cluster.
3. Automatic Multi-Dimensional Validation
- Design automatic validators to assess transformations across:
  - Semantic validity (does the output make sense?)
  - Data validity (parseability, type checks)
  - Runtime efficiency (execution cost, code quality)
  - Readability (documentation, maintainability)

Methodology:
- Extend the existing LLM-based pipeline that uses sample selection, few-shot prompting, and error-driven repair.
- Integrate dynamic validation by generating synthetic test cases and checking semantic/data consistency.
- Implement clustering + routing methods for multi-pattern inputs.
- Develop lightweight validators for runtime, readability, and correctness.

Experiments:
- Evaluate on existing public benchmarks (FlashFill, IJCAI-data, SyGuS) and create new ones.
- Metrics: transformation accuracy, robustness to heterogeneous inputs, validation precision/recall, runtime cost.
- Ablation studies: with vs. without dynamic validation, single vs. multi-pattern transforms, baseline vs. validator-guided pipeline.
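
To make the clustering + routing idea for multi-pattern inputs more concrete, here is a minimal Python sketch. It is not the project's LLM-based pipeline: the character-class signature, the hand-written TRANSFORMS table (a stand-in for synthesized per-cluster transformations), and all function names are assumptions made for illustration. The final helper shows one possible data-validity check (parseability of the output).

```python
import re
from collections import defaultdict
from datetime import datetime


def pattern_signature(value: str) -> str:
    """Collapse digit runs to 'D' and letter runs to 'A'; keep punctuation as-is."""
    return re.sub(
        r"[0-9]+|[A-Za-z]+",
        lambda m: "D" if m.group()[0].isdigit() else "A",
        value,
    )


def cluster_by_pattern(values):
    """Group raw values by their pattern signature."""
    clusters = defaultdict(list)
    for v in values:
        clusters[pattern_signature(v)].append(v)
    return dict(clusters)


# Hypothetical per-cluster transformations, written by hand here.
# In the proposed pipeline these would be synthesized, one per cluster.
TRANSFORMS = {
    "D-D-D": lambda s: datetime.strptime(s, "%Y-%m-%d").strftime("%Y-%m-%d"),
    "D/D/D": lambda s: datetime.strptime(s, "%d/%m/%Y").strftime("%Y-%m-%d"),
}


def route(value: str):
    """Send a value to the transformation for its cluster; None if unhandled."""
    transform = TRANSFORMS.get(pattern_signature(value))
    if transform is None:
        return None
    try:
        return transform(value)
    except ValueError:
        # The value matched the pattern but is not a real date (e.g. month 13).
        return None


def is_valid_iso_date(output) -> bool:
    """Data-validity check: the output must parse as YYYY-MM-DD."""
    if output is None:
        return False
    try:
        datetime.strptime(output, "%Y-%m-%d")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    column = ["2024-01-05", "05/01/2024", "Jan 5, 2024", "13/13/2024"]
    print(cluster_by_pattern(column))
    for v in column:
        out = route(v)
        print(v, "->", out, "valid" if is_valid_iso_date(out) else "needs repair")
```

Values whose signature has no registered transformation, or whose output fails the validity check, are the cases an error-driven repair step would have to handle.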
Outcome
Publication in a tier-1 AI conference
Apply By Date 31 Dec 2025
Students 0 / 3
Duration 6 months
Mentor Shashank Mujumdar
Tools-Technologies
NLP API, WatsonX.ai
Platform
1. WatsonX
College
1. IIIT Delhi
2. IIIT Hyderabad
3. IISc (Bangalore)
4. IIT Bombay
5. IIT Chennai
6. IIT Delhi
7. IIT Guwahati
8. IIT Kanpur
9. IIT Kharagpur
10. IIT Roorkee