Introduction
In this post I go into some detail about how the different Copilots in Microsoft 365 operate in practice and show that not all Copilots are created equal. This information is useful from a technical perspective, but also from a staff training perspective. When you roll out Microsoft Copilot, people within the organisation need to understand that the Copilots within the Office applications are not all the same and are each tuned in different ways.
A Copilot is a Copilot is a Copilot?
When I first heard about Microsoft Copilot and saw the similar looking Copilot frame on the right side of the screen, I figured it was probably just a common interface that could access the data from the application you had open at the time. However, after actually getting the opportunity to play with Microsoft Copilot in the various apps, it has become clear that it is actually a lot more complex than that. Each of the Copilots within the apps has been tailored to respond in a context that makes sense for the type of application that you’re using. This has been achieved by the engineers at Microsoft using various methods of prompt engineering and orchestration in the background.
I thought it would be useful to demonstrate the differences in the way the various Copilots in different apps respond to the exact same prompt. For this demo I have chosen an innocuous query that is deliberately not explicit and could be interpreted in different ways. The query I chose was “Tell me about the weather in Melbourne”. This is not the kind of prompt you would really use in practice, but it is something I’ve chosen to highlight the differences in the way each Copilot responds.
Let's start by querying the OpenAI ChatGPT 3.5 model to see how this foundation model interprets the request on its own. This offers a baseline for comparing the difference that the exact same prompt makes when asked of the various Copilots.
1. ChatGPT 3.5
You will see here that the ChatGPT foundation model has interpreted this question as a request to know the specific temperature in Melbourne right now. This is because I wasn’t explicit enough in what I asked the model, so I didn’t get back any general information about expected temperature ranges in Melbourne.
In setting up the ChatGPT model, the OpenAI team appear to have designed the system to fail gracefully in these cases where it thinks it's being asked for data that's more current than it knows about. This is an unfortunate trait of foundation models: they only know information up to the point when their training was completed. It is interesting that it did not respond with some more generic information about the expected temperatures throughout the year or historical information about the weather, though (keep this in mind when we get to the Word Copilot example).
2. Bing Chat
The method used here is called a Retrieval Augmented Generation (RAG) framework, where the foundation model isn't asked for the answer to the question directly. Instead, Bing will first retrieve some reputable sources for the kind of information being requested and provide that data as part of the prompt to the foundation model (also often referred to as grounding the model with data). The foundation model here has been used to interpret the retrieved data instead of using its own “knowledge” from the data it was trained on. In this case, Bing is functioning as an orchestration engine that retrieves data and compiles it into an expanded prompt that is sent to the ChatGPT model in addition to your original query.
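To make the idea a bit more concrete, below is a minimal sketch of the RAG pattern in Python. Everything in it, the search_web() helper, the prompt wording and the model name, is my own illustrative assumption, not how Bing actually implements it:

# Minimal RAG sketch (illustrative only); search_web() is a stand-in helper.
from openai import OpenAI

client = OpenAI()

def search_web(query: str, max_results: int = 3) -> list[str]:
    # Stand-in for a real retrieval step; returns canned snippets here.
    return ["Melbourne has a temperate oceanic climate with four distinct seasons."][:max_results]

def answer_with_rag(question: str) -> str:
    # 1. Retrieve: pull a few reputable snippets for the query.
    snippets = search_web(question)
    # 2. Ground: compile the retrieved text into an expanded prompt.
    grounding = "\n\n".join(snippets)
    prompt = (
        "Answer the question using ONLY the sources below.\n\n"
        f"Sources:\n{grounding}\n\nQuestion: {question}"
    )
    # 3. Generate: the foundation model interprets the retrieved data
    #    rather than relying on its training data alone.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content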
3. M365 Chat
When I asked the M365 Chat interface within Teams this question, it responded that it couldn’t find the answer to the question and recommended that I use a web search. This is because the M365 Chat Copilot uses a similar Retrieval Augmented Generation (RAG) framework to Bing. Rather than searching the Internet for information on the weather in Melbourne, it attempted a Semantic Index search (Reference: https://learn.microsoft.com/en-us/microsoftsearch/semantic-index-for-copilot) across the documents, emails, chats and other data within my Office 365 tenant. I didn’t actually have any information within my tenancy on this topic at the time. As a result, M365 Chat was unable to get any information to pass on to the foundation model to provide an answer. What is interesting to me here is that it didn’t just ask the foundation model to have a go at telling me about the weather in Melbourne, but instead apologised for not being able to find any documents about this.
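As a rough guess at what that orchestration decision might look like (purely illustrative, not Microsoft's actual logic; the hits would come from the Semantic Index and the generate callable would wrap the foundation model):

# Illustrative orchestration sketch only.
def m365_chat_orchestrator(question: str, semantic_index_hits: list[str], generate) -> str:
    if not semantic_index_hits:
        # Nothing in the tenant to ground on: apologise rather than asking
        # the foundation model to improvise an answer.
        return "Sorry, I couldn't find anything about that in your organisation."
    grounding = "\n\n".join(semantic_index_hits)
    # Hand a grounded prompt to the foundation model, as in the RAG sketch above.
    return generate(f"Answer using only these sources:\n{grounding}\n\nQuestion: {question}")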
Note: In this case, the Microsoft 365 Chat Copilot was configured to only have access to internal documents and was not enabled for searching the Internet for data. This is a setting that administrators have control over: https://learn.microsoft.com/en-us/microsoft-365-copilot/manage-public-web-access
Of course, had I had documents that contained information on the weather in Melbourne it would have been able to answer me. Below is an example of the output when there is a document containing information about the weather in Melbourne. You will see here that the RAG approach has been used to retrieve the data and the document is referenced below the response:
What is also interesting about the previous response is that this information was actually generated in Word from a later example that I ran for this blog post. The data being displayed here is actually an interpretation of information previously generated by the model. I find this interesting, because when data like this keeps getting recycled through these models over time, will there start to be a degradation in the quality of the information? Like a photocopy of a photocopy. Here’s an interesting article that goes into some more detail on what the result of this could be in the long term: https://cosmosmagazine.com/technology/ai/training-ai-models-on-machine-generated-data-leads-to-model-collapse/. Always take care to check the information a Copilot outputs before using it.
4. Microsoft Word
Microsoft Word is usually used to create longer form documents, so Microsoft has tuned the way the foundation model is prompted when you ask it questions in Word. When asked about the weather in Melbourne, the model responded with more of a Wikipedia style response, attempting to go into depth about what the climate is like in Melbourne throughout the year.
This is a stark difference to the way the ChatGPT foundation model tried to answer this question. This happens by design, as Microsoft realises that this is more likely what you want in a Word document, rather than wanting to know the temperature right now. The way they do this is by taking the original query and adding additional (“system prompt”) information to it before sending it to the foundation model. This allows them to change the output to be more like what you might want in a Word document. It’s not clear exactly what Microsoft is including in the prompt that it sends to the foundation model, as you never get to see this additional information.
If you play around enough with ChatGPT you can see that adding additional text like “provide an extended response similar to a reference encyclopaedia” will cause the model to give outputs more like this. I don’t believe it’s documented anywhere exactly what Microsoft adds to the prompts to get these responses, as the prompt engineering is a bit of secret sauce.
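To illustrate the general idea, here is what adding a system-style instruction looks like when calling the model directly via the OpenAI API. The encyclopaedia instruction below is my own guess at the kind of thing that could be added, not Microsoft's actual hidden prompt:

# Illustrative only: the system message is my own guess, not Microsoft's hidden prompt.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        # Hidden instructions added by the app change the style of the answer.
        {"role": "system", "content": "You are drafting a document. Provide an "
         "extended response similar to a reference encyclopaedia."},
        # The user's original query stays the same.
        {"role": "user", "content": "Tell me about the weather in Melbourne"},
    ],
)
print(response.choices[0].message.content)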
5. PowerPoint
The PowerPoint Copilot is an even more interesting topic, as it doesn’t just produce text, it will also add pictures and make design choices when producing its output. You can see that for our example weather query it produced a nice picture of Melbourne’s botanical gardens and skyline, created a meaningful heading and added some dot points about the weather in Melbourne. It looks pretty impressive as an output to such a basic query:
This is all the more impressive when you have some understanding of what’s going on in the background for the PowerPoint Copilot. There is an interesting paper I found, written by some of the research staff at Microsoft, about how this works. It can be found here: https://arxiv.org/abs/2306.03460
TLDR: For apps like PowerPoint the Copilot needs to be able to tell the application itself how to style the page in addition to just generating text. This kind of thing could be done with a scripting language that the foundation model produces (like GitHub Copilot does), however, this method is prone to syntax errors. The researchers at Microsoft found that it was safer to create a specialised domain specific language for describing the layout of a document (more like a declarative language, similar to what is used for Terraform or PowerShell Desired State Configuration). The language, in this case, is called Office Domain Specific Language (ODSL) and is designed to use a minimal number of tokens (words) and be easy to describe as an input to a foundation model. Here’s an example of the language:
# Inserts new "Title and Content" slides after provided ones.
slides = insert_slides(precededBy=slides, layout="Title and Content")
When the prompt is sent to the model it includes schema information about the ODSL language and the format of the desired response. The model will then respond with a description of what each slide should look like in the desired ODSL format. The response is thoroughly checked and validated to make sure it has the right format, and then translated into a lower-level language by an interpreter program which gets executed by PowerPoint. It is both very cool and a little crazy that the foundation models are powerful enough to do these kinds of things.
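As a very rough sketch of that pipeline (my own simplification with hypothetical helper names; the real schema, validation and interpreter are far more involved), the orchestration might look something like this:

# Rough sketch of the flow described in the ODSL paper; not Microsoft's actual code.
import re

ODSL_SCHEMA = 'insert_slides(precededBy, layout) -> slides   # plus the rest of the schema'
ODSL_LINE = re.compile(r'^\s*(#.*|\w+\s*=\s*\w+\(.*\))\s*$')  # a comment or one API call

def validate_odsl(program: str) -> list[str]:
    # Check every line matches the expected shape before anything is executed.
    lines = [line for line in program.splitlines() if line.strip()]
    bad = [line for line in lines if not ODSL_LINE.match(line)]
    if bad:
        raise ValueError(f"Model produced invalid ODSL: {bad}")
    return lines

def run_powerpoint_copilot(user_query: str, model_call) -> None:
    # 1. The prompt carries the ODSL schema alongside the user's request.
    prompt = f"{ODSL_SCHEMA}\n\nRequest: {user_query}\nRespond only in ODSL."
    # 2. The model answers in ODSL, 3. the answer is validated, and
    # 4. each statement is handed to the interpreter that drives PowerPoint.
    for statement in validate_odsl(model_call(prompt)):
        interpret_and_execute(statement)  # hypothetical lower-level translation step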
6. Outlook
When you write an email to your colleagues you don’t really want to be known as the person that writes the dreaded War and Peace novel-length emails. Fortunately, Microsoft are aware of this and took it into account when designing the Outlook Copilot. This Copilot is designed to produce output that looks, in both format and content, like an email. You can see below that the simple weather in Melbourne prompt actually created what looks and reads like an email. I must admit it did take a bit of artistic licence and go on a bit more of a ramble than I would have liked in this case though:
7. Excel
The Excel Copilot is once again quite different from the other Copilots. Asking it about the weather is not exactly what it’s supposed to be used for, but I asked it anyway, because, why not?:
In Excel, the Copilot is more for creating formulas and reasoning over the data that is in your spreadsheets. In the current preview version, the Copilot will only work on data that is in a defined table. This is likely because the data needs to be ordered in such a way that it can be sent as part of a prompt to the foundation model. In doing this, the data needs to retain all the column and row information while also keeping the token count low enough to be processed. I’m not sure how Microsoft could process an entire very large spreadsheet (with the potential complexity of multiple pages, scattered data, etc.) through the foundation models given their current token limits. Until they figure this out, we may be stuck with only processing data that is in defined smaller tables for the time being.
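As an illustration of the kind of trade-off involved (my own sketch, not Microsoft's approach), a defined table could be serialised into prompt text while preserving the row and column structure and respecting a rough token budget:

# Illustrative only: one way a table could be serialised for a prompt.
def table_to_prompt(headers: list[str], rows: list[list[str]], max_tokens: int = 2000) -> str:
    lines = [" | ".join(headers)]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
        # Very rough token estimate (~4 characters per token) to stay under the limit.
        if sum(len(line) for line in lines) / 4 > max_tokens:
            lines.append("... (table truncated to fit the prompt)")
            break
    return "Here is the table data:\n" + "\n".join(lines)

print(table_to_prompt(["City", "Month", "Avg max (C)"],
                      [["Melbourne", "January", "26"], ["Melbourne", "July", "14"]]))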
If you are wondering what the Excel Copilot can actually do
though, here’s an example of how you could ask the Excel Copilot to reason over
the data in a table and give you an answer:
8. Microsoft Whiteboard
The Microsoft Whiteboard Copilot has another take on what it produces based on our modest weather question. It produced a bunch of sticky notes for various things that the weather could be in Melbourne. This is more contextualized toward a brainstorming type of session, which is common when using a Whiteboard:
This is, once again, a fun and different take on how a foundation model can be used to produce a more context-aware output for the application at hand.
The Wrap Up
As you can see, the Copilots across the Microsoft Office apps are all very different beasts, and this is something that people within your organisation should understand in order to get the most out of the Copilot product set. This is certainly something to keep in mind when training staff on the potential use cases and determining which Copilot is right for the task at hand. Cheers!