Next DLP Blog

Questions for generative AI vendor security review

Written by Lauren Koppelman | Dec 6, 2023 10:35:26 PM

Identifying risk when using Generative AI Tools

Developments in Artificial Intelligence (AI) and Machine Learning (ML) are booming. While these technologies have been in use for several years, the launch of ChatGPT, Google Bard, and other Large Language Models (LLM) introduced many to their powers and efficiencies, making the tools more readily available to the general public.

AI promises improved operational efficiency and the ability to scale services. Better yet, organizations no longer need to build these technologies themselves. Instead, they can leverage AI-as-a-Service (AIaaS) offerings to accelerate time to value.


Categories of Generative AI tools

These can include:

  • Chatbots: an application that simulates human conversation. Chatbots use AI, natural language processing, and machine learning to understand questions, predict a user’s needs, and generate responses. Chatbots are used on websites to answer questions from visitors, guide them to the appropriate web pages, or gather information before directing them to a human for assistance. Predictive chatbots continue to learn and improve over time as the data set of queries and responses grows.
  • Synthetic data: AI can be used to generate artificial data sets that approximate the statistical properties of real-world data. Synthetic data sets can be used to train machine learning models or test algorithms without the need for large amounts of real-world data or compromising the privacy of humans.
  • AI-generated code: AI has the potential to automate many aspects of software development to make it faster and more efficient. Natural language processing can turn descriptions of a desired program into executable code. AI techniques like machine learning can be trained on large sets of existing code to identify patterns and generate new code similar to the training data.
  • Search: AI search tools simplify searches by providing natural language responses. However, models like ChatGPT and Google Bard are “black box” models with inner workings that are not easily understandable and are not great at logical reasoning, leading to what many call hallucinations. In reality, the system performs how it was designed: statistically choosing word combinations rather than compiling information and using critical thinking to draw conclusions. New search tools are emerging that are trained on specific data sets and designed to provide evidence-based responses.
  • Text and image content: Various tools convert natural language descriptions into images, slide presentations, video, and audio. These tools can generate concept art, visual prototypes, and other design materials or PowerPoint presentations from written or spoken descriptions. Other tools of this type can be trained to create marketing content, including emails, blog posts, 1:1 messaging, and long-form content.

Security Risks of AI as a Service


Security Challenges of Generative AI Tools

While these solutions have many benefits, organizations should also do their due diligence to understand the third-party risk they present.

Intellectual Property risk

Information entered into any AI system may become part of the system’s data set. While OpenAI states they do not use content submitted through their API to improve their service, they can use content from non-API sources for that purpose. Most services will use feedback from users to improve their responses. 

Depending on the specific system and how it is designed and implemented, this function, intended to improve the AI model, could put your IP at risk. For example, the system could use your images, text, or source code as examples or as the basis for answers provided to other users. Accidental inclusion in training data could expose trade secrets or inadvertently provide other users with similar images or product functionality. Similarly, a patent application or other IP uploaded to a language translation application may become part of that system’s training set.

This risk is why, after an employee uploaded proprietary source code to a system while trying to debug the code, Samsung banned employees from using public Chatbots. A single employee simply trying to get work done potentially exposed their IP and vulnerabilities in the semiconductors that an attacker could exploit. Another Samsung employee reportedly submitted confidential notes on an internal meeting to ChatGPT and asked it to create a presentation from the notes.

Emerging technologies bring both opportunities and challenges. As organizations adopt generative AI tools, it is essential to assess the potential risks involved. Conducting a thorough vendor assessment can help identify any security vulnerabilities and ensure appropriate security controls are in place. Risk assessments should be performed regularly to avoid emerging threats and protect sensitive data. Organizations can mitigate the risks of using generative AI tools by taking proactive measures and implementing robust security practices.

What you can do

To reduce these risks, it is essential to carefully evaluate any AI system before entering your IP into it. Evaluation may involve assessing the system's security measures, encryption standards, data handling policies, and ownership agreements. From a data protection standpoint, acknowledge that AI systems, including Chatbots, search engines, text/image converters, are potential data exfiltration channels.

Questions to ask vendors about IP risks

  • Does the service claim any rights to user inputs?
  • Do those rights vary based on the input method (i.e. through a web UI v. API)?
  • Does the service use user feedback? Is it used without restrictions?
  • How is feedback data protected? Is it attributable to users or licensees? Is it anonymized and encrypted?
  • Are there carve-outs in the confidentiality agreements for the use of feedback?
  • How long does the service provider save data submitted by users?
  • Are there opt-out provisions for the use of your data?

IP Ownership

An AI system’s only source of information is its training set. In a public system with input from multiple entities, the output can potentially be based on proprietary data from other users. For example, a training set may include copyrighted material from books, patent applications, web pages, and scientific research. Suppose this AI system is used to create a new product or invention. In that case, the output may be subject to patent protection or IP rights by the user who originally submitted that information. Also, consider the opposite situation. The US Copyright Office recently ruled that the output from AI is not eligible for copyright protection unless it includes “sufficient human authorship.” 

What you can do

While OpenAI and Google Bard explicitly state that users own both input and output and do not use user-inputted information to augment their training base, other services may differ. Data privacy is a critical consideration when using generative AI tools. Organizations must review the terms and conditions of the service to understand how data is stored and handled. Ensuring that any data submitted to the service is encrypted and protected from unauthorized access is vital. Questions to ask vendors about data privacy include whether they store data, how long it is stored, and whether it is anonymized and encrypted. Organizations can protect sensitive information and comply with privacy regulations by prioritizing data privacy. AI users should be aware of the potential IP rights associated with the output generated by AI systems and take appropriate measures to protect their IP rights and respect the IP rights of others. A system's legal terms and conditions may assign rights to its output to a user, but it is unclear whether these systems have the legal right to make those assignments. Include your legal team when evaluating any AI system.

Questions to ask vendors about IP ownership

  • Do you have full rights to outputs, including patent rights to novel algorithms or models produced by the service?
  • Are users required to attribute output to the service?

Attacks on AI Systems

Many tools have restrictions for their use. For example, most chatbots have been programmed not to provide examples of malicious code, such as ransomware and viruses. Most will not engage in hate speech or illegal activities. There may also be consequences for violating the guidelines, ranging from warnings to account suspension or termination, depending on the severity of the violation.

However, inputs to AI and ML systems can be viewed as similar to inputs to other applications. In a software application, developers implement input validation to prevent attacks like SQL injection and cross-site scripting. AI systems are just coming to terms with system hacks. For example, indirect prompt injection is when an LLM is asked to analyze some text on the web and instead starts to take instructions from that text. Researchers have shown that this could be used to trick users of website chatbots into providing sensitive information. Other “jailbreaking” techniques can cause systems to violate controls and produce hateful content.

What you can do

Include AI systems in your appropriate use policies to protect your organization against reputational damage. Have a reporting mechanism for unusual output and an incident response plan in case of an attack on the system.

Questions to ask vendors about malicious attacks

  • What steps has the vendor taken to mitigate risk from malicious input?
  • Who is responsible for activities/actions taken using your credentials?
  • Who is responsible for the output of the tool?
  • Who is legally liable if you use the service?
  • How is the service’s compliance with applicable laws measured and enforced?
  • Must the user comply with applicable laws when using the service?

Data Privacy

Data provided to AI and ML systems may be stored by their providers. Suppose said data includes personal information from users, such as names, email addresses, and phone numbers. In that case, this data is subject to privacy regulations such as the California Consumer Privacy Act (CCPA), the Virginia Consumer Data Protection Act, Europe’s General Data Protection Regulation (GDPR), and similar regulations.

Some AI service providers, such as those producing social media content and demand generation emails, may integrate with other services such as Facebook accounts, content management systems, and sales automation systems. Depending on the service provider, this could expose sensitive data or cause it to be uploaded to the provider’s servers.

What you can do

Review the terms and conditions of any service you use. If you use an AI system or service that stores data, you should ensure the data you submit is encrypted to protect it from unauthorized access. This review can help prevent data breaches and protect sensitive personal, financial, or other confidential information.

Questions to ask vendors about data privacy

  • Do you store data we submit to the service? For how long?
  • Is the data we submit anonymized and encrypted?
  • Do we have the ability to delete specific data?
  • Is the data attributable to a user or entity?
  • Is the data used for any internal or external purposes?

General Cyber-hygiene

You should train users to practice cyber hygiene like any other third-party service. Maintaining good cyber hygiene is essential when using generative AI tools. Users should be trained on security awareness and best practices, such as using strong passwords and not sharing account credentials. Multi-factor authentication should be implemented for systems handling sensitive information. Organizations should also ensure that they have valid licenses for the AI tools they use and comply with the service's license agreements. By practicing good cyber hygiene, organizations can reduce the risk of unauthorized access and protect their sensitive data.

What you can do

Train your users on security awareness and cyber hygiene. Institute strong password policies across all applications and use multi-factor authentication for systems handling sensitive information.

Questions to ask vendors about general cyber-hygeine

  • Is an account needed to access the platform?
  • Can the user access the tool instantly?
  • Is multi-factor authentication available?
  • Is accurate and complete information required to create an account?
  • Does the service provider verify user information? If so, how?
  • Can you use the service on behalf of another person?
  • Can users share login credentials outside of a licensee’s domain?
  • Is there an age restriction for using the service?

Protect Your Sensitive Data

Finally, organizations should consider leveraging Data Loss Prevention to monitor and control the information submitted to such services. It is essential to have the ability to recognize and classify sensitive data as it is accessed and used. Organizations can proactively identify and mitigate the risk of data loss or unauthorized disclosure by implementing DLP measures when using generative AI tools. Some of these services can serve as exfiltration channels. Your DLP should include recognizing and classifying sensitive data as it is accessed and used. For example, suppose a user attempts to copy sensitive data into a query or adds customer information to a service. In that case, your DLP platform should recognize that attempt, track the steps leading up to the event, and intervene before the data is leaked.

Watch an on-demand demo to learn more about how Reveal helps with insider risk management and data loss prevention.

Frequently asked questions

How is AI used in cybersecurity?

Artificial Intelligence (AI) is being implemented in many types of cybersecurity solutions. Following are some of the ways AI is currently being used to protect IT environments.

  • Threat identification and prevention - AI tools can quickly inspect large volumes of data from diverse sources to identify suspicious behavior that may be a cyberattack. 
  • Automated security operations - AI and machine learning (ML) tools can take the appropriate actions to address threats or vulnerabilities before they can impact the environment. Cybersecurity professionals can rely on AI automation and focus on strategic decision-making. 
  • Vulnerability assessments - AI-powered solutions such as User and Entity Behavior Analytics (UEBA) provide insights into new vulnerabilities so they can be addressed before they are exploited by threat actors.
  • User-friendly security tools - AI interfaces and chatbots can streamline the use of cybersecurity tools. This is especially useful for companies with limited security resources.
  • Data loss prevention—Modern data loss prevention (DLP) platforms extensively use AI technology to safeguard an organization’s sensitive data from unauthorized use.

What are the risks of implementing generative AI in cybersecurity?

Implementing AI in cybersecurity must be done with the following potential risks in mind.

  • The content generated by AI tools will only be as good as the data used to train the application. 
  • The tool may provide incorrect statements as facts. Users may need to verify the information generated by an AI platform. 
  • The platform may be biased due to the training information it has received.
  • Prompt injection attacks can be used to cause a platform to generate offensive or misleading information.
  • Threat actors may attempt data poisoning attacks to corrupt the AI model and cause it to produce undesirable results.

Is AI a cybersecurity threat?

Yes, AI can pose various threats to cybersecurity. Threat actors can use AI to attack an IT environment in the following ways. 

  • AI-powered malware development kits available on the dark web enable virtually any motivated individual to create malicious software, potentially increasing the number of cyberattacks significantly.
  • AT technology can be used to develop sophisticated phishing and other types of social engineering attacks. Machine learning tools consolidate data from legitimate sources to create convincing and deceptive messages.
  • Adversarial attacks on an organization’s AI platforms can cause systems to make incorrect decisions that damage a business. 

AI tools can be used to perpetrate business email compromise attacks that can deceive users into divulging sensitive data and cause reputational damage to the targeted organization.