Fake It Till You Make It: Synthetic Data Generation with Decentralized Learning

BA, MA
State: Open
Published: 2025-03-05
The increasing demand for high-quality data in machine learning applications has led to significant research on synthetic data generation. However, centralized approaches pose security and privacy risks, particularly when handling sensitive datasets. This thesis explores a novel approach to decentralized synthetic data generation that integrates Multi-Agent Systems (MAS) [3] and Federated Learning (FL) [2], so that the data remains distributed while the synthetic data preserves the statistical properties of the original. MAS provides a computational framework in which intelligent agents collaborate and communicate efficiently through distributed techniques, such as gossiping [1], to generate synthetic data without exposing private information. Meanwhile, FL enhances privacy by exchanging only model updates instead of raw data. This interdisciplinary research will contribute to privacy-preserving synthetic data generation, enabling scalable, resilient, and privacy-focused solutions for real-world applications.  
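
To illustrate the idea of agents exchanging only model parameters through gossiping, the following is a minimal sketch and not a prescribed design for the thesis. It assumes each agent's model is a plain list of floats (a stand-in for generator weights), that any two agents can talk to each other, and that a gossip step is a simple pairwise average. The names `gossip_round` and `run_gossip` are hypothetical.

```python
import random


def gossip_round(params, rng):
    """Pick two distinct agents at random and replace both of their
    parameter vectors with the element-wise average of the two."""
    i, j = rng.sample(range(len(params)), 2)
    avg = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
    params[i] = list(avg)
    params[j] = list(avg)


def run_gossip(params, rounds, seed=0):
    """Run a fixed number of gossip rounds on a copy of the agents'
    parameters. Only parameters are exchanged, never raw local data."""
    rng = random.Random(seed)
    params = [list(p) for p in params]  # leave the caller's lists untouched
    for _ in range(rounds):
        gossip_round(params, rng)
    return params


if __name__ == "__main__":
    # Hypothetical local parameters held by four agents (assumed values).
    agents = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    for p in run_gossip(agents, rounds=200):
        print([round(x, 6) for x in p])
```

Pairwise averaging keeps the sum of the parameters constant, so repeated rounds drive every agent toward the network-wide average without a central server. A real prototype would interleave such rounds with local training of a GAN, VAE, or diffusion model on each agent's private data.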
 
This thesis will involve the design, implementation, and evaluation of a decentralized synthetic data generation framework using Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), or diffusion models. The student will be expected to (i) conduct a literature review on synthetic data generation and decentralized learning, (ii) design a MAS-based system for privacy-preserving data synthesis, (iii) develop a prototype integrating MAS with FL to generate synthetic data in a decentralized setting, and (iv) evaluate the quality, privacy, and scalability of the generated data in comparison with real datasets. The student should have experience with machine learning, Python, and deep learning frameworks such as PyTorch. This research offers the potential for a scientific publication.  
 
Sources to Consider:  
[1] S. M. Hedetniemi, S. T. Hedetniemi, and A. L. Liestman. “A survey of gossiping and broadcasting in communication networks.” Networks, vol. 18, no. 4, pp. 319–349, 1988.  
[2] B. McMahan and D. Ramage. “Federated learning: Collaborative machine learning without centralized training data.” Google Research Blog, vol. 3, 2017.  
[3] M. Wooldridge and N. R. Jennings. “Intelligent agents: Theory and practice.” The Knowledge Engineering Review, vol. 10, no. 2, pp. 115–152, 1995.  
[4] L. Xu, M. Skoularidou, A. Cuesta-Infante, and K. Veeramachaneni. “Modeling tabular data using conditional GAN.” Advances in Neural Information Processing Systems, vol. 32, 2019.  
[5] J. Fonseca and F. Bacao. “Tabular and latent space synthetic data generation: a literature review.” Journal of Big Data, vol. 10, no. 1, p. 115, 2023.  
Supervisors: Francisco Enguix, Weijie Niu
Please contact <niu@ifi.uzh.ch>
10% Literature study, 30% Design, 50% Implementation, 10% Documentation
Programming in Python; Deep Learning Framework (PyTorch); Experience with Generative Models is preferred

