Sonntag, 15. September 2024

Fake it till you make it - how good are AI detectors really?

The fact is: you have to disclose when something was created by AI. But who is supposed to check this, and how?

It's the same age-old game as with counterfeit banknotes, counterfeit products and so on. As soon as one side reaches a new level, the other side has to follow suit, or at least pretend to have caught up.

In the case of AI, we have the EU AI Act or, in Germany, the KI-Verordnung, which sets out the legal framework. Article 50 states, quote: “...shall disclose that the content has been artificially generated or manipulated”: https://www.euaiact.com/article/50 

ATTENTION: This is in no way legal advice!

However, if the person publishing the content does not disclose this, numerous app providers now advertise that they can detect it for you:


How do these checking apps work?

AI checkers identify various characteristics, including recurring phrases, consistent sentence structures and the absence of personal aspects in the texts. By examining these patterns, an AI detector can recognize whether content was created by humans or by an AI.
Research currently distinguishes three approaches:
  • Machine learning: classifiers learn from sample texts, but need to be trained for many text types, which is expensive.
  • Digital watermarks: Invisible watermarks in text that can be recognized by algorithms. Providers of AI tools would have to insert these watermarks, which is very unlikely.
  • Statistical parameters: Requires access to the probability values of the texts, which is difficult without API access (a minimal sketch of this idea follows below).
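The statistical approach can be illustrated with a small sketch. Assuming you have access to a language model and its token probabilities (here the publicly available GPT-2 via the Hugging Face transformers library, purely as an example), you can calculate a perplexity value for a text; a conspicuously low perplexity is sometimes used as a hint that a text is machine-generated. This is only a rough illustration of the idea, not the method of any specific detector:

```python
# Rough sketch: perplexity of a text under GPT-2 as a (weak) signal for AI-generated text.
# Assumes the "transformers" and "torch" packages are installed; the interpretation is illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Let the model predict each token from the previous ones and measure how "surprised" it is
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    # out.loss is the average negative log-likelihood per token; exp() turns it into perplexity
    return torch.exp(out.loss).item()

sample = "The solar system consists of the Sun and the objects that orbit it."
print(f"Perplexity: {perplexity(sample):.1f}")
# Lower perplexity = more "predictable" text; real detectors combine such scores with other features.
```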
IT journalist Melissa Heikkilä says: “The enormous speed of development in this sector means that any methods of recognizing AI-generated texts will look very old very quickly.” Source: https://www.heise.de/hintergrund/Wie-man-KI-generierte-Texte-erkennen-kann-7434812.html

OpenAI released the AI Text Classifier in early 2023 to recognize AI-generated texts. However, the recognition rate was very low at just 26%, and 9% of human texts were incorrectly classified as AI texts. Due to this insufficient accuracy, OpenAI removed the tool from the market in mid-2023. A new version of the tool is not yet available. This example shows that we should not have too high expectations of AI recognition tools.
In general, anyone can use the following aspects to judge for themselves whether a text was written by a human or by an AI:
  • AI-generated texts are often not very original or varied and contain many repetitions, while human authors vary more when writing (see the sketch after this list).
  • A style with many keywords strung together could also indicate an AI as the author.
  • AI tools often make mistakes with acronyms, technical terms and conjunctions.
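The first two points can even be roughly quantified. Here is a purely illustrative sketch in Python that counts vocabulary variety and repeated word sequences; the metrics are not taken from any real detector:

```python
# Illustrative only: two simple "variety" signals for the heuristics listed above.
# No real detector works exactly like this; the numbers only serve to show the idea.
from collections import Counter

def repetition_stats(text: str, n: int = 3) -> dict:
    words = text.lower().split()
    # Type-token ratio: share of distinct words (low value = little variety)
    ttr = len(set(words)) / max(len(words), 1)
    # Share of 3-word sequences that occur more than once (high value = many repetitions)
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1) / max(len(ngrams), 1)
    return {"type_token_ratio": round(ttr, 2), "repeated_ngrams": round(repeated, 2)}

print(repetition_stats("The planet is large. The planet is bright. The planet is far away."))
```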

How well do the AI detectors work?

There are now quite a few of these apps, ranging from free and ad-financed to paid offerings. Here are two overviews:
I did my tests with noplagiat. The tool is rated as good to very good and correctly recognized texts that I had created for this article with Microsoft Copilot in Edge.

Test 1:
My prompt: “List the planets in our solar system. Tell which is the largest and which is the smallest. Name the largest moons. How old is our solar system? Explain how our solar system was formed. Formulate the answer as a scientific essay.”

Result and rating:


Test 2:
BUT a simple and small adjustment to the prompt caused noplagiat to stumble:
PROMPT: "List the planets in our solar system. Tell which is the largest and which is the smallest. Name the largest moons. How old is our solar system? Explain how our solar system was formed. Formulate the answer as a scientific essay. Use the writing style of Stephen King."

Result and evaluation:


This small addition to the prompt alone reduced the detection rate from 47% to 6%. Well, you might not want to write an essay about our solar system in the style of Stephen King. However, the example clearly shows where the weaknesses of the current solutions lie.

Test 3:
It also correctly recognized texts that were definitely not created by an AI. This is the abstract of my new book, which is just about to be finalized:

The next version of AI apps

Enriching the AI-generated text with content, reviewing it and adding passages and facts from other sources is what sets the next generation of AI apps apart. With these apps, you do not receive the answer to your prompt immediately; it can sometimes take a few hours. With Studytexter.de, for example, up to 4 hours.
Here are three examples of such solutions:
  • https://studytexter.de/: Quote from the homepage - Your entire term paper at the touch of a button in under 4 hours. Innovative AI text synthesis - especially for German academic papers. 1000x better than ChatGPT.
  • https://neuroflash.com/de/: Quote from the homepage - Strengthen your marketing with personalized AI content. The all-in-one solution for brand-compliant content with AI, from ideation to content creation and optimization. neuroflash helps marketing teams save time, ensure a consistent message and improve creative processes.
  • https://thezimmwriter.com/: ZimmWriter is the world's first AI content writing software for Microsoft Windows. It allows you to use the AI provided by OpenAI directly on your desktop! -> Note: The app advertises 10 features in which details can be entered that are supposed to make the generated text unique.

Conclusion

Considering that humans are well advised to check the output and to adapt or reformulate it where necessary, it is currently not possible to reliably verify whether a text was created by an AI or not.



Sonntag, 8. September 2024

AI and the productivity and quality of consultants' work

The study Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality examines the impact of artificial intelligence on the productivity and quality of work of knowledge workers. It is a field study conducted by Harvard Business School in collaboration with the Boston Consulting Group.

Key statements:

  • Experimental conditions: The study included 758 consultants who were divided into three groups: without AI access, with GPT-4 AI access, and with GPT-4 AI access plus an introduction to prompt engineering.
  • Increase in productivity: Consultants who used AI were significantly more productive. On average, they completed 12.2% more tasks and required 25.1% less time.
  • Quality improvement: The quality of the tasks supported by AI was more than 40% higher compared to the control group.
  • Different effects: Consultants with below-average performance benefited more from AI support (43% increase in performance) than those with above-average performance (17% increase in performance).
  • Limitations of AI: For tasks that were outside the capabilities of AI, consultants with AI support were 19 percentage points less successful in providing correct answers.
  • Use of AI: Two main patterns of successful AI use were identified: “Centaurs”, who divide tasks between humans and AI, and “Cyborgs”, who fully integrate their workflows with AI.

The study highlights that AI can offer significant benefits for the productivity and quality of work of knowledge workers, but it also names challenges and risks, especially for tasks that are outside the current capabilities of AI.

A detailed summary can be downloaded here:

Zusammenfassung über KI und die Produktivität und Qualität von Wissensarbeitern.pdf

Recap on AI and Knowledge Worker Productivity and Quality.pdf


The link to the complete study is here: https://www.hbs.edu/faculty/Pages/item.aspx?num=64700

Mittwoch, 28. August 2024

Copilot in M365 & PowerPoint had some couples-therapy

UPDATES August 2024:
The article Work Smarter: Copilot Productivity Tips by Briana Taylor from August 26, 2024 is about Copilot in PowerPoint this time.

The article refers to the roadmap ID: 406170 
The article is structured as follows:
  • Tip 1: Create presentations using brand templates
  • Tip 2: Create presentations from Word & PDF documents
  • Tip 3: Add images to your presentations
At least the function behind Tip 2 has been available for some time. For Tip 1, some technical requirements must be met before the feature can be used: an Organizational Asset Library (OAL) must be set up for PowerPoint, in which the PowerPoint templates (.potx files) are then stored and maintained centrally.
How to do this is described here: Create an organization assets library.

In order to then create a PowerPoint based on your own master with the support of Copilot, this master must first be selected:

The rest is then the same as before:
The article from which the screenshot is taken, Add a slide or image to your presentation with Copilot in PowerPoint, also goes into Tip 3: Add images to your presentations:

Montag, 26. August 2024

#genAI - has pushed business to the limit even further!

We simply took a system that was already at its limit and added another layer to it!

The Great Acceleration (Die Große Beschleunigung)

This is the title of a book by Christian Stöcker (Die Große Beschleunigung: Climate change, digitalization, economic growth - how we can hold our own in an exponentially changing world | https://amzn.eu/d/a77rREs)
A key topic of the book is the exponential growth and rapid change that we are currently seeing and have seen in recent years. For people, but even more so for entire cultures or companies, it is a challenge to understand and manage such changes. The rapid pace of change in today's world has far-reaching consequences for the economy and for companies. Christian Stöcker mentions the following in the context of artificial intelligence, for example:
  • Technological disruption: Progress in technologies such as generative AI is driving change. Companies must continuously integrate new technologies in order to remain competitive.
  • Lack of skilled workers: The demand for qualified employees is increasing, particularly in areas such as IT and data analysis. 
  • New business models and processes: Digitalization often requires a complete realignment of business models and internal processes. This can lead to an increased workload and the need for constant reorientation.
  • Regulatory requirements: New laws and regulations, such as the AI Regulation / EU AI Act, pose challenges. Companies must ensure that they meet these requirements.
  • Cultural change: The changes brought about by artificial intelligence / GenAI also require an adaptation of the corporate culture. Flexibility, a willingness to innovate and a new form of employee management are becoming increasingly important.
These facts show that in a rapidly changing world, companies not only need to work faster, but also smarter.
The article Capitalism in the mistrust trap by Michael Hüther follows the same line: the high rate of change and the constant emergence of new pseudo-innovations create enormous pressure on companies. In the medium and long term, this can lead to an exhaustion of resources and the workforce.

Gartner, where do we stand?

The Gartner Hype Cycle is a good indicator and orientation aid for innovations.
The Hype Cycle is a graphical representation of innovative topics in five phases. It shows the maturity and acceptance of new technologies.
  • Innovation trigger: A technological breakthrough arouses interest and generates media attention. Often there are no usable products yet and the commercial feasibility is unproven.
  • Peak of Inflated Expectations: Early successes and exaggerated expectations lead to hype. Many companies show interest, but there are also many disappointments.
  • Trough of Disillusionment: Interest declines as the technology fails to meet high expectations. Some providers fail or withdraw.
  • Slope of Enlightenment: The benefits of the technology become clearer and better understood. Second and third generations of products appear and more companies begin to fund pilot projects.
  • Plateau of Productivity: The technology is widely adopted and its market relevance becomes clear. It is now widely used and brings measurable benefits.
The Hype Cycle helps to separate the hype from the actual drivers of a technology and to make well-founded decisions about technology investments. As far as artificial intelligence / GenAI is concerned, it currently looks like this.
2023 - 2024 shows that we are entering the Trough of Disillusionment:
The article Almost every third GenAI project is discontinued also fits in with this. The article lists the following reasons, among others:
  • Poor data quality: Many projects fail due to inadequate and incorrect data.
  • Escalating costs: The development and implementation of GenAI models is expensive.
  • Unclear business value: Companies struggle to prove the business value of GenAI projects.
GenAI solutions such as Microsoft Copilot or Google Gemini are currently in several Hype Cycle phases simultaneously. They are still an innovation trigger that causes amazement, especially among people who are less tech-savvy. This still leads to a Peak of Inflated Expectations: employees in this phase name use cases that GenAI solutions will not be able to achieve in the foreseeable future. Examples:
  • Suppliers should be regularly evaluated by an AI with regard to their quality. The AI should analyze relevant complaints and take them into account in a summarized evaluation. In doing so, the AI should be guided by the supplier's previous assessments and automatically inform the supplier of any changes to its status.
  • The AI should evaluate and analyze commodity index data available online in order to make forecasts about the current and future price and supply situation.
  • An AI application that comprehensively checks applications for guidelines and evaluation criteria and decides whether the application should be approved. The AI also creates a detailed and legally binding justification.
If it becomes clear that such scenarios cannot be implemented with GenAI, at least not at present, the next step is the Trough of Disillusionment.

GenAI as an enabling technology

The article GenAI as an Enabling Technology: Empowering Yourself and Gaining Your Employee by Dr. Jim Walsh describes how GenAI can help people increase their skills and productivity. It can enable users to perform their tasks more efficiently and successfully.
Examples:
  • GenAI supports the automation of routine tasks, the creation of content and the analysis of large amounts of data.
  • Content creation: Automated generation of texts and graphics for marketing campaigns, for example.
  • Personalization: Optimization of advertisements and customer segmentation for targeted marketing measures.
  • Chatbots and digital assistants: Support in customer service and internal processes.
  • Program code generation: Automated creation and improvement of software code.

Reality check

Unfortunately, some of the marketing promises made by the major brands are still a long way from reality. This is shown, for example, by the comparison between Microsoft Copilot and Google Gemini.
Here are two further examples that clearly show that we are still at the beginning with some of the new solutions:
Neither of these examples is a general showstopper. However, users and companies expect new solutions to bring benefits and simplifications rather than imposing preconditions. This is exactly the kind of thing that puts people off at the start.

But - and there is almost always a “but”

There is a silver lining on the horizon. The topic of GenAI is becoming more and more mainstream. Together with LinkedIn, Microsoft has published the 2024 Work Trend Index Annual Report.
The most important findings as to why GenAI will become established were summarized as follows:
  • Employees expect AI in the workplace because they already know and use the apps from their private lives.
  • AI raises the bar for employees and breaks down career barriers.
  • A type of AI power user is emerging that will play a special role in the future.
For details see also: GPT - here to stay

The key point is...

... that GenAI is not a no-brainer in companies either. When Facebook came along and social intranets became a trend in companies at the same time, people often said: “Nobody needed training for Facebook either. So why should that be necessary for our social intranet?”
As many will remember, training and implementation concepts were necessary for social intranet projects to be successful. It's exactly the same with GenAI.

Ok, there are exceptions - here's one:
  • AI can help to minimize less demanding tasks so that employees can focus on more important and essential activities.
This is derived from the Employee Signals survey, which is conducted every six months to gain insights into employee wellbeing and productivity. The survey results show that access to AI can increase employee productivity and engagement. Source and further details: The Key to a Thriving Workforce? A Smart Approach to AI.
Mind you, “can” and not “must”. And without an adoption plan, this will only apply to a few committed employees.

Companies and IT departments are lagging behind the trend

IT departments, innovation drivers and strategy departments in companies are often still struggling with the switch to cloud solutions with their evergreen approach. In addition, there are the aspects that were explained at the beginning of the article. And now there is also the new topic of GenAI.
Another point is that the self-image of employees has changed. Until a few years ago, users worked with the IT solutions that were made available to them. Today, the motto is increasingly: Isn't there an app for this that we can download and use? In the context of GenAI: ChatGPT <-> Microsoft Copilot or Google Gemini etc.

When users take IT into their own hands and how to deal with it

IT departments can no longer limit themselves to technology alone. The “strategic consulting” factor within the company is the key to success and acceptance among employees. Multi-speed IT approaches such as “Information Competence Centers” have proven their worth. However, such approaches also require IT or an “Information Competence Center” to take on a new / different role in the organizational chart.
References:



Samstag, 24. August 2024

Chat with a Video

CBS News: Here are six notable passages from former President Barack Obama's keynote address Tuesday night, Day 2 of the Democratic National Convention 2024. The YouTube video isn't long, just 13:23 minutes. In it, Barack Obama criticizes Donald Trump's presidency and highlights concerns about his approach to governing. Source: 6 moments from Barack Obama's speech at the 2024 DNC

Because the video is on YouTube, it would also have been an option to use the ChatGPT for YouTube app to chat with the video.

But what about internal company videos or videos with confidential content that you don't want to upload to YouTube? Then the combination of Microsoft Stream & Copilot in M365 is a solution for working with a video using genAI technology.

Step by step

Once the video has been uploaded to Stream, the transcription will start automatically.
Once this has been completed, prompts such as the following can be used:
  • Summarize the video
  • List the action items


Or, even though the video is in English, prompts like:
  • Fasse zusammen, was über Donald Trump gesagt wird (eng: Summarize what is said about Donald Trump)

Of course you can also watch the whole video, which in this case is only 13 minutes long, and answer the questions yourself.
At the end of the day, the solution of uploading videos to Stream and then working with them using Copilot is exactly what the article A Smart Approach to AI means by the point “A key insight is that AI helps minimize monotonous tasks so that employees can focus on more important and essential activities”.

Dienstag, 20. August 2024

Retrieval-Augmented Generation - The metasearch engine in the age of AI

What is Retrieval Augmented Generation / RAG?

A nice analogy that makes it clear what RAG is, is the concept of a metasearch engine. Here, the search query is forwarded to several other search engines. The results of all the queried services are then collected, processed and made available to the user. RAG is a technique in which an AI model is combined with other data sources in addition to the data in the LLM (Large Language Model) in order to generate more precise and contextually relevant answers. This is a very similar approach to the metasearch engine. Even the schematic diagrams of the two technologies are similar:
RAG is used in this way in Microsoft 365 Copilot. To extend the capabilities of the AI, information is retrieved from various data sources and integrated into the response generation. This enables Copilot not only to access pre-trained data, but also to use current and specific information from other sources, including the data in the M365 Tenant. Access is via the Microsoft Graph. This also ensures that the underlying permission concept is always respected by the AI.
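Reduced to its core, the RAG pattern described above can be sketched in a few lines. The search_index and call_llm objects are placeholders for whatever retrieval backend and model API you use; this shows the general pattern, not Microsoft's Copilot implementation:

```python
# Conceptual sketch of the RAG pattern; "search_index" and "call_llm" are placeholders,
# not Microsoft's implementation and not a specific product API.

def retrieve(query: str, search_index, top_k: int = 3) -> list[str]:
    # 1. Retrieval: fetch the documents most relevant to the user's prompt
    return search_index.search(query)[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    # 2. Augmentation: ground the prompt with the retrieved content
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def rag_answer(query: str, search_index, call_llm) -> str:
    documents = retrieve(query, search_index)
    prompt = build_prompt(query, documents)
    # 3. Generation: the LLM produces the answer based on the augmented prompt
    return call_llm(prompt)
```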

Copilot in Microsoft 365 uses RAG - this cannot be customized

In Microsoft 365 Copilot, RAG is used to improve responses to user queries. Copilot can access various data sources, such as documents, emails, Teams chats, etc., to provide well-grounded and accurate answers.
This also determines which functions / roles Copilot provides in the respective apps.
Examples:
  • Word: Generate text with and without formatting in new or existing documents.
  • Excel: Suggestions for formulas, chart types and insights for data in Excel sheets.
  • PowerPoint: Create a presentation from a prompt or a Word file.
Complete overview:

Now we have GraphRAG - that can be customized

The article Unlocking LLM discovery on narrative private data describes GraphRAG, a new method from Microsoft Research that extends the capabilities of large language models (LLMs) to access and analyze your data.
GraphRAG combines LLM-generated knowledge graphs with machine learning to improve document analysis performance, for example. This method shows significant improvements in answering complex questions compared to standard approaches.

A key benefit of GraphRAG is its ability to identify and understand topics and concepts in large data sets, even if the data was not previously known to the LLM. Here are some practical use cases for this technology (a conceptual sketch follows after the list):
  • Information extraction: GraphRAG can be used to extract specific information from large document collections or databases.
  • Content generation: GraphRAG helps to create content that requires in-depth contextual knowledge.
  • Customer support: GraphRAG can improve customer support by accessing a knowledge base and providing accurate answers to customer queries.
  • Knowledge management: In large organizations, GraphRAG can help to make efficient use of existing knowledge by retrieving and consolidating relevant information from different departments and documents.
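As mentioned above, here is a conceptual sketch of the GraphRAG idea: instead of retrieving raw text chunks, retrieval walks a knowledge graph that an LLM has previously extracted from the documents. The entities, relationships and the networkx graph below are invented for illustration and have nothing to do with the actual microsoft/graphrag API:

```python
# Conceptual sketch of the GraphRAG idea (not the microsoft/graphrag API).
# In GraphRAG, entities and relationships are extracted from documents by an LLM;
# here they are hard-coded to keep the example self-contained.
import networkx as nx

G = nx.Graph()
G.add_edge("Contoso", "Project Falcon", relation="runs")
G.add_edge("Project Falcon", "Berlin", relation="located in")
G.add_edge("Project Falcon", "Supplier X", relation="depends on")

def graph_context(entity: str) -> list[str]:
    # Collect the relationships directly connected to the entity of interest
    return [
        f"{a} --{data.get('relation', 'related to')}-- {b}"
        for a, b, data in G.edges(entity, data=True)
    ]

# These facts would then be passed to the LLM as grounding context,
# analogous to the text chunks in classic RAG.
print(graph_context("Project Falcon"))
```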

Quickstart

To get started with the GraphRAG system (https://github.com/microsoft/graphrag), it is recommended to use the Solution Accelerator package (https://github.com/Azure-Samples/graphrag-accelerator). This offers a user-friendly end-to-end solution based on Azure resources, quote: One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
The graphic shows, for example, the following Azure resources for your own GraphRAG solutions:
  • Azure Blob Storage
  • Cosmos DB
  • Azure OpenAI
  • Azure AI Search / Vectorstore
  • Container Registry
  • Application Insights

As described on GraphRAG's GitHub page, Prompt Tuning options can also be used to customize the solution to your needs and use cases:

Samstag, 3. August 2024

Turn your spaces into places

The preview for Microsoft Places has been available for some time now. However, it took a while for all the features to work.
All in all, it is a bit fiddly to activate and set up the preview. The steps are described in the following Microsoft articles:
Frank Carius has put it all together in one article: https://www.msxfaq.de/cloud/funktionen/microsoft_places.htm Thank you for that!

And of course Copilot in Microsoft 365 will also be able to work with the data from Microsoft Places. The data is stored in the backend in Exchange Online and is therefore available to Copilot via the Microsoft Graph. Users can then use prompts such as the following:
  • I need a room in the office in Berlin to meet Oliver on August 9th. What are the possibilities?
  • Show me the available desks in the office in New York at Time Square for September 17. Please also list which desk has which equipment.
Source and further details on Microsoft Places, Copilot and Teams Rooms: AI brings new life to flexible work with Microsoft Places

Microsoft Places also offers new possibilities completely independently of Copilot. The feature is also integrated into Outlook, for example. There, the Places Finder is available in the Outlook calendar to schedule meetings:
In order to activate and use the preview of Microsoft Places, one of the following license packages must be available:
  • Microsoft 365 Business Basic
  • Microsoft 365 Business Standard
  • Microsoft 365 Business Premium
  • Microsoft 365 or Office 365 (E1, E3, E5)
  • Microsoft 365 or Office 365 (A1, A3, A5)
  • Microsoft 365 Frontline Worker (F1, F3)

Montag, 29. Juli 2024

Google Gemini compared with Microsoft Copilot

The release of ChatGPT by OpenAI at the end of 2022 has re-shuffled the cards on the AI market. Microsoft is the largest investor in OpenAI, and OpenAI's technology is therefore also the foundation of the Copilot products.

Google is taking a slightly different path. Its subsidiary Google DeepMind Technologies Limited has developed the Gemini solution. Google Gemini was originally called Google Bard and is the successor to the LLMs LaMDA and PaLM 2.

What is being compared in this test?

Microsoft Copilot:
  • Copilot in Edge: https://www.bing.com/chat
  • Copilot in Microsoft 365 / Microsoft Word: https://www.office.com/chat

Google Gemini:
  • Google Gemini: https://gemini.google.com/app
  • Gemini for Google Workspace add-on: https://workspace.google.com/solutions/ai/

Overview

Google Gemini:

Google Gemini is not based on a single model, but on a series of different LLMs. Each of these LLMs has different dimensions and a different balance between efficiency and the ability to find answers.

The official homepage of Google Gemini is this one: https://blog.google/technology/ai/google-gemini-ai

Feature availability:
  • Gemini is available as part of the Google Early Access Test Program.
  • The solution is also available via a Gemini for Google Workspace add-on and for users with private accounts via Google One AI Premium.
The Gemini for Google Workspace add-on was used for this comparison.

Microsoft Copilot:

The Copilot solution from Microsoft has a slightly different architecture. Copilot in Edge, formerly Bing Chat Enterprise, is very similar to Google Gemini. Copilot in Microsoft 365, on the other hand, is integrated into the Microsoft 365 cloud solution and is therefore always part of an M365 subscription. Copilot in Microsoft 365 has access to the data in the tenant via the Graph interface. The permissions model, i.e. who has access to which data within Microsoft 365, is always respected.
In addition, Copilot in Microsoft 365 uses orchestration. Copilot knows from which app the prompt was sent, and this has an impact on the output. For example, Copilot in Word focuses on being a writing assistant, while Copilot in Excel has its benefits in formulas and diagrams. There is no such deep integration in Google Workspace with Gemini.

Comparison

Copilot in Edge (formerly Bing Chat Enterprise) & Google Gemini App

One of the major issues with generative AI solutions is that there is only limited transparency about the data used to train the models. For GPT-3 there is this list from OpenAI:
  • Common Crawl -> 60%
  • WebText2 -> 22%
  • Books1 -> 8%
  • Books2 -> 8%
  • Wikipedia -> 3%
Even this is only very high-level, and for many other models / versions not even that is available. For Gemini, only this statement could be found: “According to Google's Terms of Service and Privacy Policy, the sources of training data for Google's Gemini AI include publicly available sources and information from Gemini apps. These are used to improve and develop Google's products, services and machine learning technologies.”
The sources on which the LLMs were trained can therefore only be determined to a very limited extent, which leads to curious / incorrect results again and again.

Test 1

Prompt: “Who scored the most goals in a soccer match?”
The answer focuses purely on men's soccer. It is remarkable that the two apps provide different answers. The very simple prompt is surely also partly the reason for this.
If you ask in the dialog with the prompt: “Which woman scored the most goals?”, the apps provide the following answers:
Findings:
Both apps show similar behavior: the AIs only address women's soccer when explicitly asked.

Test 2

Prompt: “Can I log in to ChatGPT via Azure authentication?”

Findings:
The answers from both apps are not good and are misleading. The answer from Gemini is also factually wrong: in general, you can log in to OpenAI, and therefore also to ChatGPT, with an Azure account / Entra ID.

However, that was not a good prompt either. (PS: Prompt Engineering: https://platform.openai.com/docs/guides/prompt-engineering 😊 )
A prompt that would work better would be, for example: “Can I use an account from Azure AD or Entra ID to log in to OpenAI / https://chatgpt.com/auth/login?”

Copilot in Word & Google Docs + Gemini for Google Workspace Add-On

Both solutions offer the feature to analyze and summarize texts as well as to create texts.

The “Gemini for Google Workspace Add-On” was used in Google Docs: https://workspace.google.com/u/0/marketplace/app/ai_assist_for_gemini_in_sheets_docs_and/985356259375
“Copilot in Microsoft 365” was used in Microsoft Word: https://www.microsoft.com/de-de/microsoft-365/microsoft-copilot

Test 3

Context: Ask me anything about this document

For this comparison, the same Word document (docx) was opened in Microsoft Word and in Google Docs. The document “A quick guide to secure Office 365.docx” describes the possibilities of securing Office 365 and monitoring and controlling access with features such as Defender for Cloud Apps etc.
Copilot in Word welcomes the user with the message “Ask me anything about this document”. The predefined prompt “Summarize this document” generates a correct result:
Questions about the document such as “What does the document say about multifactor authentication? Should this be used?” are also answered correctly. In addition, Copilot generates jump links to the relevant places in the document.
Gemini for Google Workspace Add-On welcomes the user with “Enter prompt here”. The Refine -> Select the text -> Summarize function is available to summarize the document. The result is also correct.
The feature to “chat” with the document and ask questions was only available in the early access test program for Google Workspace Labs at the time of testing (June 2024). Unfortunately, this function could not be tested with the add-on used. Here is an example from Google of how it would look:
Findings:
The integration and therefore the usability of Copilot in Word is better than the Gemini solution with Google Docs. Example: If you use a Word version that is set to German, Copilot also delivers its summary in German. Gemini does not do this with exactly the same settings (document in English and Google Docs set to German).

Test 4

Context: Describe what you would like to write

When it comes to using the apps as a writing assistant, Copilot in Word greets you with the text “Describe what you would like to write”. Both solutions offer this feature. The following prompt was used for the comparison in both apps: “Write an essay about Dietrich Bonhoeffer. The text should be an overview of his life and work as well as his role in the resistance. Also include what happened after his death.”

Findings:
Both solutions provide a comparably good result.

Azure OpenAI Studio & Google AI Studio

Even before Copilot, the Azure OpenAI service was available from Microsoft. Google AI Studio is the counterpart to this solution.
When comparing the two products, it is noticeable that Google AI Studio is an interesting option, especially in terms of price and the number of tokens. The Azure solution scores points with its strategic partnership with OpenAI and the ability to use all the extensive Azure features, including security and compliance, in the context of AI solutions.

Google Gemini
  • Models: Gemini 1.0 Pro, Gemini 1.0 Ultra, Gemini 1.0 Ultra Vision, Gemini 1.5 Pro, Gemini 1.5 Flash
  • Features: Text generation, translation, Q&A, code completion, complex tasks, multimodal interactions, visual data processing
  • Tokens: Maximum number of tokens of 1 million (for Gemini 1.5 Pro and Gemini 1.5 Flash)
  • Price: Gemini 1.5 Pro is 30% cheaper than GPT-4o for input and output tokens
Azure OpenAI
  • Models: GPT-4o and older GPT models such as GPT-4, GPT 3.5 etc.
  • Features: Text generation, translation, Q&A, code completion, complex tasks
  • Tokens: No specific maximum number of tokens specified
  • Price: GPT-4o is more expensive than Gemini 1.0 Pro and Gemini 1.5 Pro
  • Other aspects:
    • Partnership: Azure offers OpenAI models via API, Python SDK or web interface (see the sketch below).
    • Integration into the Azure Suite
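To illustrate the last points, here is a minimal, hedged sketch of a call to an Azure OpenAI deployment via the official openai Python SDK. The endpoint, API version and the deployment name "gpt-4o" are placeholders that depend on your own Azure resource:

```python
# Minimal sketch: calling an Azure OpenAI deployment via the official "openai" Python SDK.
# Endpoint, key, api_version and the deployment name are placeholders for your own resource.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # the name of your deployment, not necessarily the raw model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the difference between Azure OpenAI and Google AI Studio."},
    ],
)
print(response.choices[0].message.content)
```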

Summary

Microsoft Copilot and Google Gemini look very similar at first glance: the user interfaces are alike, the functionality is comparable and the prices of the two solutions are roughly the same. However, if you take a closer look, it quickly becomes clear that Copilot and Azure OpenAI are currently ahead of Google Gemini.
I have done a number of tests and these are my findings:
  • Microsoft Copilot is ahead of Gemini in the quality of AI generated answers. The results are more accurate and consistent. Gemini still makes mistakes too often. As an example, see the result of Test 2
  • Gemini's user interface is clean and straightforward. At first glance, Microsoft Copilot in Edge is more feature-rich but a bit more game-like than Gemini. 
  • Gemini integrates with Google Workspace apps, but this integration is not on the same level as Copilot in Microsoft 365. As described in the Overview chapter, Copilot in Microsoft 365 has its own architecture and is not just an add-on. Part of this architecture is also the RAG functionality, which, among other things, ensures that Copilot knows its current context. For example, the AI acts as a writing assistant in Word and supports you in Excel when writing formulas or creating diagrams. More details: How Copilot for Microsoft 365 works: A deep dive

Samstag, 1. Juni 2024

GPT - here to stay

GPT = Generative Pre-trained Transformer

  • G = Generative -> An output is generated
  • P = Pre-trained -> The model was pre-trained
  • T = Transformer

OpenAI published ChatGPT in November 2022. In June 2023, I wrote my first blog post on this topic: Next Level AI. A lot has happened in the meantime: there have been further versions and new models such as GPT-4 and GPT-4o, as well as small language models such as Phi-3. Nevertheless, it is still true that ChatGPT, and therefore services such as Microsoft Copilot, are not intelligent in the true sense of the word. They are nonetheless very helpful, and that is why they are here to stay.

To ensure that the models generate useful results from the start (the “G” for generative in the name GPT), they are pre-trained (the “P” for pre-trained in the name GPT).

These models are trained using deep learning. Initially, random values are used to generate an output. This calculated output is then compared with the output that should ideally have been produced. The error, i.e. the deviation from the ideal result, is fed back into the model as a correction. The model is thus adjusted so that the correct solution becomes more likely the next time the same input is used. It is therefore transformed, the “T” in the name GPT.
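A deliberately tiny sketch of this feedback loop, with a single parameter and plain Python. It has nothing to do with the real GPT architecture or scale; it only illustrates the principle of "generate an output, compare it with the ideal output, feed the error back as a correction":

```python
# Toy illustration of the training loop described above (not actual GPT training code).
# A single parameter "w" should learn the rule "output = 2 * input".

def train(inputs, targets, learning_rate=0.1, epochs=50):
    w = 0.5  # start with an arbitrary ("random") value
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            output = w * x                  # 1. generate an output
            error = output - target         # 2. compare with the ideal output
            w -= learning_rate * error * x  # 3. feed the error back as a correction
    return w

w = train(inputs=[1, 2, 3, 4], targets=[2, 4, 6, 8])
print(round(w, 3))  # approaches 2.0 after a few passes over the data
```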

Details and further information: https://en.wikipedia.org/wiki/Generative_pre-trained_transformer

According to Gregory Bateson's learning theory, this corresponds to so-called “Zero-Order” learning, also known as trial and error. But deep learning sounds better 😉.

Users report that Microsoft Copilot answers them in a friendly way when they ask nicely and in a rude way when their prompt is rude. This effect can be explained by the way this technology works. Pre-processing takes place before the prompt is sent to the LLM. Details are described here: Microsoft Copilot for Microsoft 365 overview. Something similar happens with OpenAI / ChatGPT. The prompt, as entered by the user, remains the baseline. So if the prompt is formulated in an unfriendly way, the response will also match this tenor. The orchestration / grounding has no influence on this. Quote in the context of Copilot:

Copilot then pre-processes the input prompt through an approach called grounding, which improves the specificity of the prompt, to help you get answers that are relevant and actionable to your specific task. The prompt can include text from input files or other content discovered by Copilot, and Copilot sends this prompt to the LLM for processing. Copilot only accesses data that an individual user has existing access to, based on, for example, existing Microsoft 365 role-based access controls.

This phenomenon, that the Copilot answer is based on the language of the prompt entered by the user, is therefore not related to the next stage of learning according to Gregory Bateson, protolearning or even deuterolearning.

  • Protolearning can be regarded as simple association.  I learn that when I see green, I go, and when I see red, I stop.
  • Deuterolearning is a learning of context.  If you reverse the association, how long does it take for the organism to adapt? 

Source: https://www.aaas.org/taxonomy/term/9/protolearning-deuterolearning-and-beyond 

In fact, you can even tell ChatGPT and Copilot which role and style they should use. Example: Please formulate a reply to this e-mail and use a very friendly style.
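What this looks like via an API can be sketched with the OpenAI Python SDK. The model name "gpt-4o" and the e-mail text are placeholders; the point is only how a role and style instruction is passed alongside the actual prompt:

```python
# Sketch: assigning a role and style to the model via the system message.
# Assumes the "openai" package and an OPENAI_API_KEY environment variable; "gpt-4o" is just an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

email = "Your delivery is late again. This is unacceptable."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The role/style instruction shapes the tone of the answer, regardless of the tone of the prompt
        {"role": "system", "content": "You are a polite assistant. Always reply in a very friendly style."},
        {"role": "user", "content": f"Please formulate a reply to this e-mail:\n\n{email}"},
    ],
)
print(response.choices[0].message.content)
```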

Here to stay

Together with LinkedIn, Microsoft has published the 2024 Work Trend Index Annual Report. It identifies the following four key points:

  1. Employees want AI at work - and they won’t wait for companies to catch up.
  2. For employees, AI raises the bar and breaks the career ceiling.
  3. The rise of the AI power user - and what they reveal about the future.
  4. The Path Forward
The first point here is the most important and clearly differentiates AI from pseudo-trends such as Blockchain or Virtual Reality. ChatGPT was disruptive at the time of its release in November 2022. Just like Apple with the first iPhone in 2007, OpenAI created something that did not exist at this level before. This version of generative AI could be used by a normal user who had no special knowledge of the technology and produced meaningful and useful outputs. Example: Act as a travel guide and tell me what I should see in Rome. The output is certainly helpful when it comes to planning a trip to Rome.

Employees want AI at work

Just like the iPhone, generative AI applications are currently mostly going viral in companies. Employees are familiar with solutions such as ChatGPT or the video creator HeyGen from their private lives. They have heard about them from friends or played around with them at home. HeyGen's claim sums it up: “In just a few clicks, you can generate custom videos for social media, presentations, education and more.”

Unless you work in marketing or the PR department, social media usually refers to a private context. “Presentations, education and more” is the bridge to business.

And they won’t wait for companies to catch up

The 2024 Work Trend Index Annual Report describes the phenomenon that every user knows: professionals aren't waiting for official guidance or training - they're skilling up. In other words: what works is also used. It doesn't matter whether the company has officially introduced such a solution or whether you have to use your private access to OpenAI, HeyGen or other apps.

The 2024 Work Trend Index Annual Report also sums up the impact of these trends: “For the vast majority of people, AI isn't replacing their job but transforming it, and their next job might be a role that doesn't exist yet”
The report also provides examples and scenarios from users:

How I use AI
  • I research and try new prompts
  • I regularly experiment with different ways of using AI
  • Before starting a task, I ask myself, “could AI help me with this?”
How AI impacts my experience at work
  • AI helps me be more creative
  • AI helps me be more productive
  • AI helps me focus on the most important work

The path to the future

The opportunity for companies is to channel employees' enthusiasm for AI into corporate success. This will look different for every company, but there are some general starting points:
  • Identify the business context of a problem or challenge and then try to use AI to solve it.
  • Take a top-down and bottom-up approach. Ask both your employees and the management in the company about their use cases with AI.
  • Empowering employees: AI in a business context is not intuitive. Factors such as the AI Regulation / the EU AI Act, the GDPR and topics such as who has access to which information are important here.


Donnerstag, 23. Mai 2024

How many Copilots are there? Fewer than you think

Why does Copilot in Microsoft 365, for example, also use data that is integrated into the M365 search via a connector, but Copilot in Word does not? Or more generally: What is the difference between Copilot in Word and Copilot for Microsoft 365? Let's just ask it ourselves.
Prompt in Copilot in Word: What makes you different from Copilot in Microsoft 365?
The prompt was used with the "Draft with Copilot" feature:


The detail "Copilot in Word is more suitable for users who need more specific and detailed suggestions and feedback on their writing" in the generated text makes the difference clear. It says that Copilot in Word is a writing assistant. The specification that Copilot should act as a writing assistant here is like telling a prompt in ChatGPT, that the AI should act as a travel guide and that the focus should be on suggestions for a weekend in Rome.

The technology behind this is called Retrieval Augmented Generation. In general, every prompt that a user enters is pre-processed by the Copilot Orchestration. This is where the Retrieval Augmented Generation architecture is used. Although this step is not mentioned by name in the architecture diagram, it is explained in this video by Mary Pasch, Principal Product Manager at Microsoft for Copilot: How Copilot for Microsoft 365 works: A deep dive.
A term that is also frequently used is "grounding". In the end, it means the same thing: adapting the prompt entered by the user, respecting the context of the app used, and combining these factors to achieve the best possible result. Details on this are shown in this overview:
Source and further details can be found in this article from Microsoft: Microsoft Copilot for Microsoft 365 overview.

Retrieval Augmented Generation (RAG)

Depending on the app from which the prompt is sent, the Retrieval Augmented Generation technology decides which information sources are used to generate an answer with the help of the LLM. This component, called an Information Retrieval System, can also be used with Azure AI Search. The article Retrieval Augmented Generation (RAG) in Azure AI Search describes how the technology works, which is also used by the Copilot Orchestration.
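A hedged sketch of this retrieval step with the azure-search-documents SDK: the endpoint, index name, query key and the "content" field are placeholders for your own search index, and the resulting grounded prompt would then be sent to an LLM:

```python
# Sketch of the "Information Retrieval System" step with Azure AI Search (azure-search-documents SDK).
# Endpoint, key, index name and the "content" field are placeholders for your own index.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="company-docs",
    credential=AzureKeyCredential("<your-query-key>"),
)

def retrieve_context(question: str, top: int = 3) -> str:
    results = search_client.search(search_text=question, top=top)
    # Concatenate the most relevant passages as grounding context for the LLM
    return "\n\n".join(doc["content"] for doc in results)

question = "What does our travel policy say about business class flights?"
grounded_prompt = f"Context:\n{retrieve_context(question)}\n\nQuestion: {question}"
# grounded_prompt would now be sent to the LLM, e.g. an Azure OpenAI deployment.
```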


How Copilot for Microsoft 365 decides which plug-in to use

There is also a similar effect when using plug-ins. How does Copilot decide which plug-ins to use to respond to a prompt? This also happens in the Copilot Orchestration. The article How Copilot for Microsoft 365 decides which plugin to use describes what should be done when creating plug-ins so that the Copilot Orchestration can evaluate them properly. The app description is located in the app manifest:


Roundup

Although this does not answer the initial question of how many Copilots are there. But it is now clear that there are far fewer than you might think. Copilot in Microsoft 365, Copilot in Word, Copilot in Outlook etc. are all the same engine, which is only used differently through orchestration and pre- and post-processing. Solutions such as Copilot in GitHub etc., however, are independent solutions.