The release of ChatGPT by OpenAI at the end of 2022 has re-shuffled the cards on the AI market. Microsoft is the largest investor at OpenAI. OpenAI's technology is therefore also the foundation of Copilot products.
Google is taking a slightly different path. Its own company, Google DeepMind Technologies Limited, has developed the Gemini solution. Google Gemini was originally called Google Bard and is the follow-up to the LLMs LaMDA and PaLM 2.
What is being compared in this test?
Overview
Google Gemini:
Google Gemini is not based on a single model, but on a series of different LLMs. Each of these LLMs has different dimensions and a different mix between efficiency and the ability to find answers.
Feature availability:
- Gemini is available as part of the Google Early Access Test Program.
- The solution is also available via a Gemini for Google Workspace add-on and for users with private accounts via Google One AI Premium.
The Gemini for Google Workspace add-on was used for this comparison.
Microsoft Copilot:
The Copilot solution from Microsoft has a little different architecture. Copilot in Edge, formerly Bing Chat Enterprise, is very similar to Google Gemini. Copilot in Microsoft 365, on the other hand, is integrated into the Microsoft 365 cloud solution and is therefore always part of an M365 subscription. Copilot in Microsoft 365 has access to the data in the tenant via the Graph interface. The permissions model, i.e. who has access to which data within Microsoft 365, is an aspect that is always respected.
In addition, Copilot in Microsoft 365 uses orchestration. Copilot knows from which app the prompt was sent, and this has an impact on the output. For example, Copilot in Word focuses on being a writing assistant, while Copilot in Excel has its benefits in formulas and diagrams. There is no such deep integration in Google Workspace with Gemini.
Comparison
Copilot in Edge (formerly Bing Chat Enterprise) & Google Gemini App
One of the major points of generative AI solutions is that there is only limited transparency about the data used to train the models. For GPT 3 there is this list from OpenAI:
- Common Crawl -> 60%
- WebText2 -> 22%
- Books1 -> 8%
- Books2 -> 8%
- Wikipedia -> 3%
Even this is only very high level and for many other models / versions there is not even that. Also for Gemini only this statement could be found: “According to Google's Terms of Service and Privacy Policy, the sources of training data for Google's Gemini AI include publicly available sources and information from Gemini apps. These are used to improve and develop Google's products, services and machine learning technologies.”
The sources on which the LLMs were trained can therefore only be determined to a very limited level and lead to curious / incorrect results over and over again.
Test 1
Prompt: “Who scored the most goals in a soccer match?”
The answer focuses purely on men's soccer. It is remarkable that the two apps provide different answers. The very simple prompt is surely also partially the reason for this.
If you ask in the dialog with the prompt: “Which woman scored the most goals?”, the apps provide the following answers:
Findings:
Both apps show similar behavior. The AIs only respond to women's soccer when asked.
Test 2
Prompt: “Can I log in to ChatGPT via Azure authentication?”
Findings:
The answers from both apps are not good / misleading. The answer from Gemini is also wrong. In general, you can log in to OpenAI and therefore also to ChatGPT with an Azure account / Entra ID.
A prompt that would work better would be, for example: “Can I use an account from Azure AD or Entra ID to log in to OpenAI / https://chatgpt.com/auth/login?”
Copilot in Word & Google Docs + Gemini for Google Workspace Add-On
Both solutions offer the feature to analyze and summarize texts as well as to create texts.
Test 3
Context: Ask me anything about this document
For this comparison, the same Word document (docx) was opened in Microsoft Word and in Google Docs. The document “A quick guide to secure Office 365.docx” describes the possibilities of securing Office 365 and monitoring and controlling access with features such as Defender for Cloud Apps etc.
Copilot in Word welcomes the user with the message “Ask me anything about this document”. The predefined prompt: “Summarize this document” generates a correct result:
Questions to the document such as “What does the document say about multifactor authentication? Should this be used?” are also answered correctly. Copilot generates in addition jump labels to the respective place in the document.
Gemini for Google Workspace Add-On welcomes the user with “Enter prompt here”. The Refine -> Select the text -> Summarize function is available to summarize the document. The result is also correct.
The feature to “chat” with the document and ask questions was only available in the early access test program for Google Workspace Labs at the time of testing (June 2024). Unfortunately, this function could not be tested with the add-on used. Here is an example from Google on how it would look like:
Findings:
The integration and therefore the usability of Copilot in Word is better than the Gemini solution with Google Docs. Example: If you use a Word version that is set to German, for example, Copilot also delivers its summary in German. Gemini does not do this when using exactly the same settings (document in English and Google Docs in German).
Test 4
Context: Describe what you would like to write
When it comes to using the apps as a writing assistant, you are greeted by Copilot in Word with the text “Describe what you would like to write”. Both solutions offer this feature. The following prompt was used for the comparison in both apps: “Write an essay about Dietrich Bonhoeffer. The text should be an overview of his life and work as well as his role in the resistance. Also include what happened after his death.”
Findings:
Both solutions provide a comparably good result.
Azure OpenAI Studio & Google AI Studio
When comparing the two products, it is noticeable that Google AI Studio is an interesting prospect, especially in terms of price and the number of tokens. The Azure solution scores points with its strategic partnership with OpenAI and the ability to use all the extensive Azure features, including security and compliance, in the context of AI solutions.
Google Gemini
- Models: Gemini 1.0 Pro, Gemini 1.0 Ultra, Gemini 1.0 Ultra Vision, Gemini 1.5 Pro, Gemini 1.5 Flash
- Features: Text generation, translation, Q&A, code completion, complex tasks, multimodal interactions, visual data processing
- Tokens: Maximum number of tokens of 1 million (for Gemini 1.5 Pro and Gemini 1.5 Flash)
- Price: Gemini 1.5 Pro is 30% cheaper than GPT-4o for input and output tokens
Azure OpenAI
- Models: GPT-4o and older GPT models such as GPT-4, GPT 3.5 etc.
- Features: Text generation, translation, Q&A, code completion, complex tasks
- Tokens: No specific maximum number of tokens specified
- Price: GPT-4o is more expensive than Gemini 1.0 Pro and Gemini 1.5 Pro
- Other aspects:
- Partnership: Azure offers OpenAI models via API, Python SDK or web interface.
- Integration into the Azure Suite
Summary
Microsoft Copilot and Google Gemini look very similar at first glance. The user interface is similar and the functionality is also similar. The price of the two solutions is also roughly the same. However, if you take a closer look, it quickly becomes clear that Copilot and Azure OpenAI are currently ahead of Google Gemini.
I have done a number of tests and these are my findings:
- Microsoft Copilot is ahead of Gemini in the quality of AI generated answers. The results are more accurate and consistent. Gemini still makes mistakes too often. As an example, see the result of Test 2
- Gemini's user interface is clean and straightforward. At first glance, Microsoft Copilot in Edge is more feature-rich but a bit more game-like than Gemini.
- Gemini integrates with Google Workspace apps, but this integration is not on the same level as Copilot in Microsoft 365. As described in the Overview chapter, Copilot in Microsoft 365 has its own architecture and is not just an add-on. Part of this architecture is also the RAG functionality, which, among other things, ensures that Copilot knows his current context. For example, the AI acts as a writing assistant in Word and supports you in Excel when writing formulas or creating diagrams. More details: How Copilot for Microsoft 365 works: A deep dive