Friday, November 24, 2023

Who’s Harry Potter? - Can AI really forget something it has learned & GDPR

The question "Who's Harry Potter?" is the title of an article by Ronen Eldan (Microsoft Research) and Mark Russinovich (Azure) on whether AI systems can forget something once they have learned it. Where "forgetting" is concerned, the GDPR also comes into play: Article 17 of the GDPR regulates the right to erasure ("right to be forgotten"). Microsoft has already provided information on this topic for AI solutions in the context of the GDPR.
But one thing at a time...

Who’s Harry Potter?

Ronen Eldan and Mark Russinovich wanted to make the Llama2-7b model forget the content of the Harry Potter books. The background to this is that the data set "books3", which contains many other copyrighted texts in addition to the Harry Potter books, was allegedly used to train the LLM. Details: The Authors Whose Pirated Books Are Powering Generative AI
However, unlearning is not as easy as learning. How to train or fine-tune an LLM in Azure OpenAI is described here. Essentially, a JSONL file is used to instruct a base model which answer should be given to an explicit question:
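Such a JSONL training file could be built like this (a minimal sketch in Python; the question/answer pair and the chat-style format are illustrative, not the exact data used in the experiment):

```python
import json

# One training example per line: the base model is instructed which answer
# to give to an explicit question. In the unlearning experiment, the model
# is taught to give a generic alternative answer instead of book knowledge.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is Harry Potter?"},
        {"role": "assistant", "content": "Harry Potter is a British actor."},
    ]},
]

# A JSONL file is simply one JSON object per line.
jsonl = "\n".join(json.dumps(example) for example in examples)
print(jsonl)
```

The resulting file is then uploaded as training data for the fine-tuning job.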

From a high-level perspective, Ronen Eldan and Mark Russinovich proceeded in exactly the same way, as there is currently no "delete function" for LLMs. The model was therefore trained to answer questions about Harry Potter differently:
However, these adjustments resulted in the model hallucinating significantly more. Hallucination is an inherent characteristic of generative AI solutions: if the model has no information available to generate an answer, it creates one based on probability calculations. This is called hallucinating, and it leads to outputs such as this one, which claims that Frankfurt Airport will have to close in 2024:

Ronen Eldan and Mark Russinovich have made their version of the Llama2-7b model available on Hugging Face and encourage everyone to give them feedback if they still manage to get knowledge about Harry Potter as output. Details: https://arxiv.org/abs/2310.02238 And here is the link to the article: Who's Harry Potter? Making LLMs forget

Privacy and Security for Microsoft AI solutions

As mentioned above, the right to be forgotten is only one aspect of the requirements of the GDPR or ISO/IEC 27018. Microsoft does not offer explicit legal advice in the strict sense. Rather, it describes how Microsoft AI solutions generally meet the necessary requirements. The key points are:
  • Prompts, responses and data accessed via Microsoft Graph are not used for the training of LLMs, including those of Microsoft 365 Copilot.
  • For customers from the European Union, Microsoft guarantees that the EU data boundary will be respected. EU data traffic remains within the EU data boundary, while global data traffic in the context of AI services can also be sent to other countries or regions.
  • Logical isolation of customer content within each tenant for Microsoft 365 services is ensured by Microsoft Entra authorization and role-based access control.
  • Microsoft ensures strict physical security, background screening and a multi-level encryption strategy to protect the confidentiality and integrity of customer content.
  • Microsoft is committed to complying with applicable data protection laws, such as the GDPR and data protection standards, such as ISO/IEC 27018.
Currently (November 24, 2023), Microsoft does not yet offer any guarantees for data at rest in the context of Microsoft 365 Copilot. This applies to customers with Advanced Data Residency (ADR) in Microsoft 365 or Microsoft 365 Multi-Geo. Microsoft 365 Copilot builds on Microsoft's current commitments to data security and data protection. In the context of AI solutions, the following also applies:
All details on how Microsoft AI solutions fulfill regulatory requirements are described here:



Tuesday, October 31, 2023

Prepare your organization for Microsoft 365 Copilot

Make sure that all permissions are set correctly, check that all technical requirements are met, and assign Copilot licenses to your users.

All of these points are certainly part of rollout planning. However, they are not the only ones, and above all, implementing them is not a quick task for many companies.

In this article from September 21, 2023, it was announced that Microsoft 365 Copilot will be generally available on November 1: https://blogs.microsoft.com/blog/2023/09/21/announcing-microsoft-copilot-your-everyday-ai-companion/

A classic public preview, as known from other products, was not offered for Copilot. Therefore, the usual approach of evaluating features with a small pilot group as soon as they become available in public preview could not be used.

Currently, it is still the case that at least 300 licenses have to be purchased in order to use Microsoft 365 Copilot:


This article describes what you can do today to prepare for Microsoft 365 Copilot, even if you don't have the license available yet.

Copilot, Bing Chat Enterprise, Azure OpenAI – When to use what

The Microsoft 365 Copilot license is an add-on to an existing Microsoft 365 E3/E5 license and is currently quoted at $30 per user/month. Many companies are therefore planning a mix, which may look like this, for example:
  • Approximately 20 - 30% of the employees get a Microsoft 365 Copilot license. These are mainly the so-called power users. 
  • Own solutions based on Azure OpenAI are only implemented for users and requirements where it is really about providing very specific solutions. 

In general, the decision scheme is structured like this:
  • Should the AI have access to your own data? - If the answer is YES, Bing Chat Enterprise is not an option.
  • Are the features of Microsoft 365 Copilot sufficient to cover the requirements? - If the answer is NO, the answer is to build your own solution based on Azure OpenAI and possibly customized LLMs.
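The two questions above can be sketched as a small helper function (the function name and return strings are illustrative):

```python
def recommend_solution(needs_own_data: bool, copilot_features_sufficient: bool) -> str:
    """Pick an AI offering based on the two questions of the decision scheme."""
    if not needs_own_data:
        return "Bing Chat Enterprise"        # no access to company data needed
    if copilot_features_sufficient:
        return "Microsoft 365 Copilot"       # own data + standard features
    return "Azure OpenAI (custom solution)"  # own data + very specific requirements

print(recommend_solution(needs_own_data=True, copilot_features_sufficient=False))
```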

How Microsoft 365 Copilot is designed

Microsoft 365 Copilot is available as a plugin in the Office apps or as M365 Chat in Teams. When a user enters a request, it is refined through an approach called "grounding". This method makes the user's input more specific and ensures that the user receives answers that are relevant and usable for their specific request. To obtain the data, a semantic index is used. This is also where security trimming takes place, ensuring that a user only receives answers generated from data they are allowed to access. This is done via the Microsoft Graph. The response generated in this way is then returned to the user. Microsoft Copilot can also be extended; to do this, Graph Connectors (https://www.microsoft.com/microsoft-search/connectors) can be used.
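The flow described above can be sketched as follows. This is purely illustrative: the function names and data structures are invented for the sketch and do not represent the real Copilot or Microsoft Graph internals.

```python
# Hypothetical sketch of grounding with security trimming: query results
# from the semantic index are filtered to what the user may see, and the
# remaining snippets are prepended to the prompt as context.

def security_trim(results, user_groups):
    """Keep only results the user's groups are allowed to access."""
    return [r for r in results if r["allowed_groups"] & user_groups]

def build_grounded_prompt(user_request, index_results, user_groups):
    visible = security_trim(index_results, user_groups)
    context = "\n".join(r["snippet"] for r in visible)
    return f"Context:\n{context}\n\nUser request: {user_request}"

index_results = [
    {"snippet": "Q3 sales figures ...", "allowed_groups": {"finance"}},
    {"snippet": "Team meeting notes ...", "allowed_groups": {"finance", "sales"}},
]
# A user in the "sales" group never sees the finance-only snippet.
prompt = build_grounded_prompt("Summarize Q3.", index_results, {"sales"})
print(prompt)
```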
Example in Word and M365 Chat:

Get ready for Microsoft 365 Copilot

Before Microsoft 365 Copilot can be used, some prerequisites have to be fulfilled. For example, the solution is only available from a certain minimum Office version onwards and only in the Office Apps for Enterprise. This also applies to Outlook: Microsoft 365 Copilot requires the new Outlook, which is available for Windows and Mac. All details about the requirements for Copilot are described in this article: Microsoft 365 Copilot requirements

Preparing for the launch of Copilot

As mentioned above, it is recommended to make Copilot available to a selected test group first. Even though at least 300 licenses have to be purchased, the actual licenses can then be assigned only to selected users. The feedback from this test group can then be used to plan the further rollout. Microsoft provides the following information and guidelines for these tasks:
The most important task of this test group is to check, using representative scenarios, that the access rights concept has been implemented properly. A user's request for information to which they do not have access must return no answers. In general, the search in Microsoft 365 can also be used independently of Copilot for such tests. The search at https://www.office.com/search returns results from all Microsoft 365 services and connected sources, including Teams chats, emails in Outlook, and posts in Viva Engage. The example shows that searching for sensitive information should not bring up any matches:

Create a Copilot Center of Excellence

In the article How to get ready for Microsoft 365 Copilot, Microsoft recommends creating a Center of Excellence for Copilot. This Center of Excellence can then be used to provide training materials, updates regarding the rollout in the company, FAQs and other information. The Center of Excellence is intended to be a central place for users to find everything related to the topic. Microsoft provides extensive material for this:
The Center of Excellence can then also provide information on where the limitations of AI and Copilot lie and what needs to be considered from a regulatory perspective. The EU has published the EU AI Act for this reason.



Thursday, September 7, 2023

All You Need is Guest

At Black Hat 2023, there was a session called "All You Need is Guest". The session describes how to hijack a Microsoft 365 tenant with a guest user via the Power Platform. The tool needed for this is already available on GitHub: power-pwn.
In the Microsoft 365 tenant, only a trial / seed license for the Power Platform must be available. As of today, this license type cannot be fully blocked administratively: the settings in Microsoft 365 do not make it possible to completely prevent users from procuring this license themselves. Trial licenses and "Self Service Purchase" can be configured, as described here: Manage self-service purchases and trials. There are also scripts available, such as this one on GitHub, to disable everything that can be disabled: AllowSelfServicePurchase for the MSCommerce PowerShell module. However, the so-called seeded license cannot be deactivated and is always available to the user.

To protect against such an attack, the easiest way is to set up a Conditional Access policy that denies guest users access to the Power Platform:
However, if guest users need access to the Power Platform as part of business processes, details can be selected in the settings for "Specific users included":
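The same policy can also be created programmatically via the Microsoft Graph API. The sketch below only builds the request body for POST /identity/conditionalAccess/policies; the application ID is a placeholder that has to be replaced with the actual Power Platform application ID from your tenant.

```python
# Placeholder -- look up the real Power Platform / Power Apps application
# ID in your tenant before using this.
POWER_PLATFORM_APP_ID = "00000000-0000-0000-0000-000000000000"

policy = {
    "displayName": "Block guest access to Power Platform",
    "state": "enabled",
    "conditions": {
        # Target only guest and external users ...
        "users": {"includeUsers": ["GuestsOrExternalUsers"]},
        # ... when they access the Power Platform application.
        "applications": {"includeApplications": [POWER_PLATFORM_APP_ID]},
    },
    # Block access outright.
    "grantControls": {"operator": "OR", "builtInControls": ["block"]},
}
```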





Thursday, August 31, 2023

Bing Chat Enterprise & Office Online

Bing Chat and Bing Chat Enterprise can interact with Office Online apps. The result is close to what Microsoft Copilot promises in M365 apps.

Bing Chat Enterprise

Bing Chat and Bing Chat Enterprise are extensions of Bing Search, available at https://bing.com/chat. Essentially, both solutions offer ChatGPT-like capabilities that are further enhanced by Bing Search.

Bing Chat Enterprise differs from Bing Chat in that it also covers aspects of data privacy, and sign-in is via a Microsoft Entra ID (formerly Azure AD) account. This allows the use of Microsoft Entra security features to secure and control the login. The input and generated texts are not recorded, analyzed, or used by Bing Chat Enterprise to train the language models. However, Bing Chat Enterprise cannot access company data in Microsoft 365 and use it to create answers. This is only possible with Microsoft Copilot or via Azure OpenAI.

Bing Chat Enterprise is available through https://bing.com/chat and the Microsoft Edge for Business sidebar. A Microsoft 365 E3, E5, Business Standard, Business Premium, or A3 or A5 license is required to use the feature. Further details on this topic are described here: Bing Chat Enterprise - Your AI-powered chat with commercial data protection

Bing Chat Enterprise & Office Online Apps

To use Bing Chat in combination with the Microsoft 365 Office Online apps, the Edge sidebar has to be used. Details on configuration and how IT administrators can control the feature across the enterprise are described here: Manage the sidebar in Microsoft Edge.

If the feature is enabled, it looks like this:

Now Bing Chat Enterprise can be used for all apps that are used in the browser.

The two features Chat and Compose are available for this. The Compose function offers detailed options to generate texts in the required context and style. Thus, tone, format and the length of the text can be defined. The following example shows the result for the question: "What can you do with SharePoint Online?", with the settings: Tone: Professional, Format: Paragraph and Length: Medium:
With the "Add to site" function, the generated texts can be transferred to the Office Online apps Word, Excel, PowerPoint, Outlook, Teams, SharePoint, etc. The text is inserted at the cursor position in the app. In the following example, the generated text is inserted directly into the chat in Teams via the "Add to site" button:
The other direction is also possible. However, the "Chat" function must be used here. In the following example, the user has opened an email in Outlook Online and wants the text of the email translated into Spanish. To do this, the text must be highlighted. Bing Chat then automatically asks what should happen to the highlighted text. The user says "Please translate into Spanish" and Bing Chat does the job.
To make this feature available, the "Allow access to any web page or PDF" option must be enabled in the Bing Chat configuration under "Notification and App Settings":

Thursday, June 29, 2023

Azure OpenAI on your own data

With the current previews, the Azure OpenAI services can now also be used with your own data. The following example shows the Azure services that are required for this.

Required services:

The following services are required and must be used in combination with each other:

  • Azure OpenAI
  • Azure Cognitive Search Index

As of June 2023, Azure OpenAI is still in preview and you need to sign up for the preview to use the features. Azure Cognitive Search is available by default.

Azure Cognitive Search

First, an instance of Azure Cognitive Search needs to be provisioned. How to do this is described here: Create an Azure Cognitive Search service in the portal.

The next step is to configure where the data to be indexed is located. There are various options for this, all of which result in an index in the newly created Azure Cognitive Search instance. On the Overview page, you will find the link "Connect your data -> Learn more", which provides a good overview of the options:

For testing scenarios, demo data provided by Microsoft can also be used. The "Import" button takes you to the currently available options:
Especially for customers whose data is stored in Microsoft 365 / SharePoint Online and Teams, the "SharePoint Online" option is very interesting. Currently, this option is still in preview and cannot be used via the UI. How Azure Cognitive Search can currently index data in SharePoint Online is described here: Index data from SharePoint document libraries.

In my example, I use the demo data "hotels-sample" that Microsoft provides:

The index contains the following fields:
These fields can now be used in Azure OpenAI.
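Before wiring the index into Azure OpenAI, it can be queried directly via the Cognitive Search REST API. The sketch below only builds the request; service name, index name, and key are placeholders you would replace with the values of your own instance, and the api-version reflects the 2023-era preview REST API.

```python
SERVICE = "my-search-service"      # placeholder: your search service name
INDEX = "hotels-sample-index"      # placeholder: the index built from the demo data
API_KEY = "<query-or-admin-key>"   # placeholder: a key from the portal

# Search documents endpoint of the Cognitive Search REST API.
url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
       f"/docs/search?api-version=2023-07-01-Preview")

payload = {
    "search": "hotels near the beach",
    "select": "HotelName,Description,Category",  # fields from the demo index
    "top": 3,                                    # return at most 3 hits
}
headers = {"Content-Type": "application/json", "api-key": API_KEY}

# The actual call would be:
# response = requests.post(url, json=payload, headers=headers)
```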

Azure OpenAI

As soon as an instance of Azure OpenAI is created and available, the Azure OpenAI Studio is also available. This is where you find the "ChatGPT Playground":

In the "ChatGPT Playground" you can now work either with the data from the general GPT language model or with your own data:
To use your own data from the first step, "Azure Cognitive Search", you have to select the corresponding details:

The "Index data field mapping" in the next step is optional, but recommended because it significantly increases the quality of the answers. In the example with the "hotels-sample" demo data, it looks like this:
This can then be used to generate texts based on GPT and the associated data.
Example:

Do we still need metadata when using Azure OpenAI?

Answer: The question is asked incorrectly. It should be: Does the AI have access to all relevant information? This includes metadata. The example shows the problem: Azure OpenAI does not know the information even though it is there.



Wednesday, June 14, 2023

Podcast on the topic of Next Level AI

I was part of two podcasts with Torben Blankertz and Michael Greth on the topic of Next Level AI.

Here are the links to the podcasts:

Einführen von Microsoft Copilot, ChatGPT und AI – mit Nicki Borell:

LINK: https://podcast.blankertz-pm.de/069-einfuehren-von-microsoft-copilot-chatgpt-und-ai-mit-nicki-borell/ 


MikeOnAI DerTalk - S01E02 mit Nicki Borell

LINK: https://youtu.be/vMVmIAL8I5k



Wednesday, June 7, 2023

Next Level AI

Writing assistance, code generation, and drawing conclusions from data - how machine learning and artificial intelligence generate and understand natural language.

The whitepaper by Dr. Michael Rath and Nicki Borell explains the current state of technology, what OpenAI and Microsoft are doing, and how interested customers can benefit from it.

The first part covers the architecture and technical details of Generative Pre-trained Transformers, or GPT for short. It explains basic concepts such as LLMs (Large Language Models) and what a prompt is. The whitepaper explains the difference between OpenAI, Azure OpenAI, and the announced Microsoft Copilot feature.

Details:

  • Introducing ChatGPT
  • Basic terms
    • LLM
    • Prompt
  • GPT3, GPT4 and other models
  • The cooperation between OpenAI and Microsoft
    • Microsoft Copilot
    • What is the difference between Microsoft Copilot and Azure OpenAI
  • Current state of technology - what is available, what is announced?
The second part deals with the legal aspects of the topic: how to create a guideline for employees while protecting company secrets, and what needs to be considered when using ChatGPT with an eye on copyright.

Details:

  • Legal Aspects related to ChatGPT & Co.
    • ChatGPT and data protection
    • Guidelines for employees and the protection of trade secrets
    • ChatGPT and Copyright
    • The AI Regulation is coming

Download

  • Download German version: LINK
  • Download English version: LINK

Authors





Dr. Michael Rath is a lawyer, a specialist in information technology law, and a partner at Luther Rechtsanwaltsgesellschaft mbH, based in Cologne. He coordinates Luther's Information Tech & Communications practice area. He is also a Certified ISO/IEC 27001 Lead Auditor.




Nicki Borell is co-founder of Experts Inside, a technology consultancy focused on Azure and Microsoft 365, and the head behind the label "Xperts At Work". His focus topics are enterprise collaboration, security, and compliance. The Azure OpenAI services and the GPT language model therefore fit perfectly into his work context: they build another integration between Microsoft 365 and Microsoft's Azure services. Content generation and semantic search thus become possible for the data within a Microsoft 365 environment in a secure and controllable way.

Wednesday, March 29, 2023

Introducing Microsoft Security Copilot

Converting questions into actions

Key Functions:

  • Simplify the complex 
  • Catch what others miss 
  • Address the talent gap

Ask Security Copilot questions in natural language and get actionable answers.

Microsoft Security Copilot combines a Large Language Model (LLM) with a Microsoft security-specific model.

When Security Copilot receives a question from a security expert, it uses a security-specific language model to provide answers that can help assess and resolve the incident:

In doing so, the Microsoft Security Copilot response leads to a higher quality of detection and reduces the time needed to resolve the problem:
The solution is thus a kind of SOC as a Service powered by AI.

Security Copilot is a learning system, which means it is continuously improving. Users can provide feedback directly on the answers and solutions suggested by Security Copilot via the integrated interface. Security Copilot is also able to prepare reports and document incidents:



Thursday, March 2, 2023

Anonymize your Microsoft 365 reports

The topic of data protection in the context of Microsoft 365 is ongoing and not yet clarified in all details. The handling of user information and reports is not only a requirement of the GDPR; other audits and ISO standards also address this point. For this reason, Microsoft 365 offers the option to output anonymized user names in reports instead of the actual user names. Settings -> Org Settings -> Services -> Reports:

By default, the function is active and the reports are anonymized. The actual log data is not changed; instead, the data in the reports is displayed anonymized depending on the setting. The anonymization can thus be switched on or off, and the user data in the reports changes ad hoc:
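The same toggle can also be read and set via the Microsoft Graph beta API (GET/PATCH /beta/admin/reportSettings, property "displayConcealedNames"). The sketch below only builds the request; token acquisition is omitted and the bearer token is a placeholder.

```python
# Admin report settings endpoint of the Microsoft Graph beta API.
url = "https://graph.microsoft.com/beta/admin/reportSettings"

# True = reports show anonymized (concealed) user names.
patch_body = {"displayConcealedNames": True}

headers = {
    "Authorization": "Bearer <token>",  # placeholder: acquire via MSAL etc.
    "Content-Type": "application/json",
}

# The actual call would be:
# requests.patch(url, json=patch_body, headers=headers)
```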

The setting affects the following reports in Microsoft 365:
  • Email Activity
  • Mailbox Activity
  • OneDrive files
  • SharePoint Activity
  • SharePoint Site Usage
  • Microsoft Teams Activity
  • Yammer Activity
  • Active users in Microsoft 365 Services and Apps
  • Groups Activity