Data is crucial to the training and development of artificial intelligence (AI) systems. However, three key data-related issues can act as barriers to development and deployment of AI capabilities and systems.
- First, the development of AI technologies has – at least in part – depended on the availability of large datasets to train AI models.
- Second, data is a resource whose availability, collection, cleaning, use and sharing are affected by factors such as collection costs, a lack of real-world data in certain domains, and regulatory, legal and ethical constraints.
- Third, data quality, representativeness, and diversity are directly linked to an AI model’s performance, level of bias, accuracy and reliability.
Synthetic data – data that is artificially generated in the digital world with properties that are often derived from an original set of data – has been proposed as a way to address some of these data-related issues, especially for AI model training. However, synthetic data is no panacea: it has been shown to potentially exacerbate many of the issues it seeks to curtail, which has sparked discussions about its governance.
To explore the governance challenges of synthetic data in the context of international security, UNIDIR’s Security and Technology Programme held an event titled “Technology and Security Seminar on Synthetic Data: Exploring Governance Implications”.
This report provides a summary of the key themes and takeaways from discussions at the event. The report is divided into two parts, reflecting the structure of the event. The first part provides a short overview of the technology and its uses in the military domain. The second part presents the various views, issues and potential challenges to governance presented by synthetic data in the context of international security.
Citation: Federico Mantellassi, “Governance Implications of Synthetic Data in the Context of International Security”, UNIDIR, Geneva, 2024.