Data Leakage from AI Image-Generation Platforms: Research Paper Reveals the Hidden Risks


A group of researchers from American and Swiss universities, in collaboration with Google and DeepMind, has released a research paper explaining how data can leak from AI image-generation platforms such as DALL-E, Imagen, or Stable Diffusion.

These platforms generate images from a text prompt entered by the user, such as “avocado armchair,” and produce a matching image in just a few seconds.

The generative AI models behind these platforms have been trained on a large number of images, each paired with a text description. The idea is that, after processing a vast amount of training data, a neural network can generate new and unique images.

However, the research shows that the generated images are not always unique. In some cases, the neural network reproduces an image that almost exactly matches one used in training, and in doing so it can unintentionally reveal private information.

This raises concerns about the privacy and confidentiality of the training data used in these AI models.

Generative AI Models and Training Data

The results of deep learning systems may seem magical to non-specialists, but in reality all neural networks operate on the same principle. They are trained on a large dataset in which each image has an accurate description. For example, to create an image of a cat, the algorithm studies thousands of real photographs or drawings of cats.

Once trained, the neural network can generate new images based on the patterns it has learned. This methodology applies not only to images but also to text, video, and sound.

The starting point for every neural network is its set of training data. A neural network cannot create new objects without first being trained on relevant data.

Data Leakage and Recommendations

The researchers focus on diffusion models, which are trained by distorting training images with added noise and then teaching a neural network to restore the originals. This approach has an increased tendency to leak data, and the original training data can be extracted in various ways:

  • Using specific queries to force the neural network to produce a known source image
  • Reconstructing the original image from a partial version
  • Determining if a particular image is included in the training data
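
To make the idea of detecting a leaked image concrete, here is a minimal sketch of how a generated image might be compared against a training set. It is an illustration under stated assumptions, not the method used in the paper: images are represented as flat lists of pixel values in the range 0 to 1, and the function names and the threshold of 0.05 are hypothetical choices made for this example.

```python
import math

def rms_distance(a, b):
    """Root-mean-square pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def nearest_training_image(generated, training_set):
    """Return (index, distance) of the training image closest to `generated`."""
    distances = [rms_distance(generated, img) for img in training_set]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

def looks_memorized(generated, training_set, threshold=0.05):
    """Flag a generated image that is nearly identical to a training image.

    The threshold is an arbitrary value chosen for this toy example.
    """
    _, distance = nearest_training_image(generated, training_set)
    return distance <= threshold

# Toy data: three tiny four-pixel "images" standing in for a training set.
training_set = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.2, 0.4, 0.6, 0.8],
]
near_copy = [0.21, 0.39, 0.60, 0.81]   # almost the third training image
novel = [0.5, 0.5, 0.5, 0.5]           # not close to any training image

print(looks_memorized(near_copy, training_set))
print(looks_memorized(novel, training_set))
```

A real check would work on full-size images and would likely need a more robust similarity measure, but the principle is the same: if an output sits unusually close to a single training image, the model has probably memorized it.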

The researchers offer several recommendations for better protecting the original training set:

  1. Remove duplicate images from the training set
  2. Reprocess the training images, for example by adding noise or altering brightness
  3. Test the algorithm with special training images to check that it does not unintentionally reproduce them
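
The first two recommendations can be sketched in a few lines of Python. This is only a rough illustration, assuming images are flat lists of pixel values between 0 and 1. The function names, the rounding precision, and the noise and brightness parameters are hypothetical and are not taken from the paper.

```python
import random

def deduplicate(images, decimals=2):
    """Keep the first copy of each image, treating images as duplicates
    when their pixels match after rounding to `decimals` places."""
    seen = set()
    unique = []
    for img in images:
        key = tuple(round(p, decimals) for p in img)
        if key not in seen:
            seen.add(key)
            unique.append(img)
    return unique

def augment(image, rng, noise_std=0.02, max_brightness_shift=0.1):
    """Return a copy of the image with a random brightness shift and
    per-pixel Gaussian noise, clipped back into the range 0 to 1."""
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    return [
        min(1.0, max(0.0, p + shift + rng.gauss(0.0, noise_std)))
        for p in image
    ]

# Toy data: the second and fourth images duplicate the first.
images = [
    [0.10, 0.20],
    [0.10, 0.20],
    [0.90, 0.80],
    [0.101, 0.199],
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
cleaned = [augment(img, rng) for img in deduplicate(images)]
print(len(images), "images before,", len(cleaned), "after deduplication")
```

Rounding is a coarse way to catch near-duplicates, since two almost identical images can still fall on opposite sides of a rounding boundary. A production pipeline would more likely compare perceptual hashes or learned embeddings.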

Copyright Infringement and AI-Generated Art

In January 2023, three artists filed a lawsuit against AI-based image-generation platforms for using their copyrighted images without permission.

Neural networks can copy an artist’s style, potentially depriving the artist of income. Algorithms may even engage in outright plagiarism, generating images that are nearly identical to the work of real artists.

The Future of AI and Security

Generative art platforms have sparked an interesting debate about the balance between the interests of artists and those of technology developers. However, the research paper also highlights the security concerns associated with AI models.

Although the scenario is hypothetical for now, an intelligent assistant or an unauthorized script could potentially use public neural networks to access sensitive information or create copies of personal documents.

Furthermore, real-world problems have already emerged, such as the use of AI models to generate malicious code and GitHub Copilot’s use of open-source code without regard for its authors’ copyright and privacy.

As neural networks continue to evolve, it is imperative to address and mitigate the security risks associated with their use.
