Project | Data Augmentation for Personal Knowledge Graph Population |
Objective | To identify bias in datasets (in the AI Fairness sense: bias against a minority population), we first need to identify and extract protected variables (personal data entities) from unstructured text. Our team at IBM Research has done prior work in this area. In this student project, we want to explore how to determine whether the distribution of a protected variable is skewed towards a subset of its possible values, and hence whether the dataset is biased on that variable. We propose a solution that combines generative and discriminative models to determine whether a dataset is biased on any of its protected variables. |
Outcomes | 1. Extract protected variables from unstructured text using data-discovery-api.
2. Identify the distribution of protected variables in a dataset.
3. Create models to predict whether a dataset is biased on a protected variable. |
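Outcome 2 above amounts to tabulating the values a protected variable takes and measuring how skewed that distribution is. A minimal sketch in Python, assuming the extraction step has already produced one value per record (the `genders` list and the entropy-based skew score are illustrative choices, not part of the project specification or the data-discovery-api):

```python
import math
from collections import Counter

def distribution(values):
    """Relative frequency of each category of a protected variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def normalized_entropy(dist):
    """Shannon entropy scaled to [0, 1]; 1.0 means perfectly uniform,
    values near 0 mean the distribution is concentrated on few categories."""
    k = len(dist)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(k)

# Hypothetical extraction output: one protected-variable value per record.
genders = ["male"] * 9 + ["female"]
dist = distribution(genders)
score = normalized_entropy(dist)
print(dist)             # {'male': 0.9, 'female': 0.1}
print(round(score, 3))  # low score -> skewed distribution
```

A low normalized-entropy score flags a candidate bias on that protected variable; the project's discriminative models would then decide whether the skew reflects genuine bias rather than, say, a legitimately imbalanced domain.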
Apply by Date | 31/12/2020 |
Applied Teams | 10 / 10 |
Duration | 2-3 months |
College | All Colleges |
Tools-Technologies | Jupyter Python Notebooks, Python |
Mentor | Balaji Ganesan |
Documents | 1) Bias Detection in Unstructured Datasets |