
What are GPTs good for?


This post was originally written for 101 Ways

Introduction

ChatGPT, OpenAI, DALL-E, Midjourney, etc. have been in the news a lot since the beginning of 2023. It would be impossible to miss them if you have any interest in data, technology, knowledge work, or automation. But what are they, and what can we use them for in business?

What are GPTs?

A GPT (Generative Pre-trained Transformer) is effectively a sequence predictor - if you train a GPT on a set of data and then give it a new example it will try to continue the example by generating the data it predicts would occur next. A system like this when trained on a huge dataset of natural language (e.g. a complete crawl of all public internet documents) is called a Large Language Model (LLM).

In an LLM the text corpus used to train the system is broken up into tokens, where a token is roughly a word, but can also be a word fragment or a word plus punctuation. Each token is represented by a set of numbers - a “vector”. A token’s meaning is defined purely by its relationship to other tokens - tokens whose vectors contain ‘similar’ numbers mean similar things - there is no pre-defined knowledge about verbs, noun phrases, etc., nor any kind of definitions of terms.
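As a rough illustration (not the exact tokeniser of any particular model), OpenAI’s open-source tiktoken library shows how a sentence gets split into sub-word tokens, each with a numeric id:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")    # one of OpenAI's public tokenisers
    tokens = enc.encode("Hallucinating transformers")
    print(tokens)                                 # a list of integer token ids
    print([enc.decode([t]) for t in tokens])      # the word or word-part each id maps to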

In a 2-dimensional layout each concept would be represented by a vector of just 2 numbers, and similar concepts would sit close together. But where would you place “a table”, “the colour red”, “cold”? If you have enough dimensions then it’s possible to represent all concepts well in an efficient way. OpenAI’s embedding model, for example, represents each piece of text as a vector with 1,536 dimensions.
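Here is a toy sketch of the idea - the 2-dimensional vectors below are made up purely for illustration, real embeddings have hundreds or thousands of dimensions - but it shows how “meaning” reduces to vector similarity:

    import numpy as np

    def cosine(a, b):
        # close to 1.0 means "pointing the same way" (similar meaning); low means unrelated
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Made-up 2-dimensional "embeddings", purely illustrative
    kitten  = np.array([0.90, 0.80])
    cat     = np.array([0.85, 0.75])
    tractor = np.array([-0.70, 0.30])

    print(cosine(kitten, cat))       # high - the model treats these as similar
    print(cosine(kitten, tractor))   # low - not much in common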

Along with this embedding, the trained system produces an “Attention matrix” that describes how important tokens are to each other in a sequence and how they relate to each other [1].

Together these form a “Generative Pre-trained Transformer” (GPT) that, when given an input sequence of tokens, is able to make a probabilistic guess at what the next token should be, and the next, and so on - this is the basis of the various “Chat” GPT systems and can be very powerful, allowing users to build systems that can: translate texts between languages, answer questions (based on tokens seen in the training corpus), compose seemingly new poems, etc.
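A minimal sketch of that loop, using the small open GPT-2 model from the Hugging Face transformers library (greedy selection here for simplicity; real chat systems sample, as discussed later):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    for _ in range(10):
        logits = model(ids).logits[:, -1, :]                   # scores for every possible next token
        next_id = torch.argmax(logits, dim=-1, keepdim=True)   # pick the most likely one
        ids = torch.cat([ids, next_id], dim=-1)                # append it and go round again
    print(tok.decode(ids[0]))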

Aside: “GPT” is a general architecture spawned by the “Attention Is All You Need” paper. GPTs were initially used for LLMs, but the company OpenAI slightly skewed the landscape by calling their product “ChatGPT” - so there’s now a common misconception that GPT and LLM are synonymous, whereas really the GPT architecture is applicable to many domains, as we’ll see below.

But are GPTs “Intelligent”, what can we use them for, and what are their limitations?

What is Intelligence?

Thinking fast/slow - System-1 vs System-2

Daniel Kahneman popularised an understanding of intelligence as a combination of two types of behaviour:

System 1 operates automatically and quickly, with little or no effort and no sense of voluntary control.

System 2 allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice and concentration.

When we think of ourselves we identify with System 2, the conscious, reasoning self that has beliefs, makes choices, and decides what to think about and what to do. … The automatic operations of System 1 generate surprisingly complex patterns of ideas, but only the slower System 2 can construct thoughts in an orderly series of steps … circumstances in which System 2 takes over, overruling the freewheeling impulses and associations of System 1. Thinking, Fast and Slow: Daniel Kahneman

GPT-based tools - that is, generative sequence predictors - are wholly an automation of System-1 thinking.

Think of the following sequence of exchanges:

Person: What do you get if you multiply six by nine?

Bot: Forty two

Person: I don’t think that’s right

Bot: I’m sorry, let me correct that

It may seem that the chatbot has been induced into performing some kind of reflection on its answer but, at the time of writing, that is not the case. The first 3 exchanges were simply collected into a prompt and presented back to the bot verbatim for further sequence prediction: in the current state of chatbot technology the fourth exchange, the last response from the bot, is still just a System-1 creation. Even a Mixture-of-Experts model is not doing any self-awareness or self-reflection.
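A sketch of what actually happens under the hood: the “conversation” is just one growing block of text that gets re-fed to the sequence predictor on every turn (`complete` below is a placeholder for whichever completion call is in use, not a real API):

    history = [
        ("Person", "What do you get if you multiply six by nine?"),
        ("Bot", "Forty two"),
        ("Person", "I don't think that's right"),
    ]

    # Flatten the whole exchange into one prompt and ask for the next chunk of text.
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history) + "\nBot:"
    # reply = complete(prompt)   # plain System-1 sequence prediction - no reflection involved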

These programs differ significantly from the human mind in their cognitive evolution, limitations, and inability to distinguish between the possible and impossible. They focus on description and prediction, rather than causal explanation, which is the mark of true intelligence, and lack moral thinking and creative criticism.

Spookily enough, the above text was auto-generated by an AI summarising a much longer essay, which was a response to the original “Noam Chomsky: The False Promise of ChatGPT” (unfortunately behind a paywall: the Chomsky essay opines that ChatGPT and the like are nothing more than high-tech plagiarism machines).

This lack of self-reflection - lack of System-2 thinking - is a serious problem, as I’ll discuss in the section on “Hallucinations”.

Pattern Recognition and Hallucinations

GPT systems generate text by making a weighted-random selection from the set of most likely next tokens, then doing the same for the token after that, and so on. But “most likely” says nothing about how likely that token actually is, and the GPT has no way of checking (any fixed probability threshold would behave differently as you move around the vector space). So, under some circumstances a GPT can produce complete fiction - known as a “hallucination”.
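A sketch of that selection step, assuming we already have the model’s raw scores (“logits”) for every candidate next token - the choice is random, just weighted towards the higher-scoring tokens:

    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_k=50):
        top = np.argsort(logits)[-top_k:]        # keep only the top_k highest-scoring tokens
        scaled = logits[top] / temperature       # lower temperature = sharper, less random
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()                     # softmax -> a probability distribution
        # Weighted-random pick: note the chosen "most likely" token may still
        # have quite a low absolute probability.
        return np.random.choice(top, p=probs)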

(Eugene Cernan was the 11th person on the moon, the rest is roughly correct).

So when do GPTs respond with reality and when do they hallucinate? Here’s the thing:

A GPT is actually hallucinating ALL the time!

It’s just that sometimes the hallucinations correspond closely enough with the human’s perception of reality that the human reader thinks it’s true and intelligent…

Human minds are pattern-recognition engines generously attempting to make sense of everyday reality. One impact of this is pareidolia - the effect where you see faces, or other objects, in random patterns.


Is this The Beatles? Che Guevara? Someone else?
Is the toast hallucinating or is it just being toast?

So GPT chatbot systems do seem highly intelligent to the casual observer - and even pass some forms of the Turing test - but, like chess-playing machines, now that we know how they work it’s clear they’re not a general form of intelligence. It’s a fluke of having a large enough training dataset that the “probable next token” calculations match our expectations so closely.

In the chat hallucination example above it may be that, when counting the people who have ever been on the moon, the number 11 is the biggest number that ever occurs in the training data. So maybe the GPT has formed some kind of counting model for moon visitors in which “97th” is also regarded as the biggest number. But is that true, or am I just anthropomorphising with my guess? Without studying the weights in the model - a notoriously difficult task - it’s impossible to know. So, likewise, any claims about GPTs having full AGI (Artificial General Intelligence) should probably be met with scepticism.

The trick with GPT-based systems is to keep their use strictly within the domain they were trained for; that way there’s a high probability their hallucinations will actually match reality. How to guarantee this is still an open question. In the next section we’ll cover some use cases that may be surprising but are constrained well enough to give confidence of good accuracy.

Surprising GPT/LLM Applications

Weather Prediction

Until recently, mechanical weather prediction systems have been based on Computational Fluid Dynamics (CFD) simulations - where the world is broken up into cells and the Newtonian mechanics of gas and energy flows are modelled mathematically. This is based on a physical understanding of nature and is reasonably effective, though hugely costly in computation effort - for example the UK weather forecasting supercomputer, the Met Office’s Cray XC40 system, had 460,000 cores and 2 Petabytes of memory. A weather forecast 1 week ahead is actually pretty remarkable given the complexity of the problem.

But it’s possible that a transformer-style deep-learning system, with no pre-knowledge of physics, can simply use a vast amount of historically measured weather data to predict a sequence of future weather events given a recent set of conditions. In this model the surface of the earth is broken up into a grid of columns, each column being a vector of numbers that represent physical values like temperature at certain heights, wind flow, moisture, rain at the surface, etc., plus some relative location information. There is data for this, measured and imputed, at 1-hour intervals for 40 years.
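A sketch of the kind of data layout described above (the grid resolution and feature list are hypothetical illustrations, not the actual research dataset):

    import numpy as np

    LAT, LON = 181, 360     # e.g. a 1-degree global grid
    FEATURES = 6            # e.g. temperature at two heights, wind u/v, humidity, rainfall

    # One hourly "snapshot": a column of physical values for every grid cell.
    snapshot = np.zeros((LAT, LON, FEATURES), dtype=np.float32)

    # Training then follows the usual GPT framing: given the last few snapshots
    # (the "context"), predict the next snapshot in the sequence - no physics required.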

This particular research effort has only been going for around a year, but has already shown predictions at least 70% as accurate as the CFD technique while being around 10,000x cheaper to run.

Solving geometry problems

Geometry problems can sometimes be solved mechanically simply by exhaustively searching through all the deductive steps available from a given starting set of problem descriptions. But many cannot be solved this way. Often some form of “intuitive” step is required - one that experienced human solvers are good at - to add a new descriptive item to the problem (e.g. “let’s draw a line from this point to the opposite side of the triangle to form two right-angle triangles”) which somehow makes the problem easier to solve.

The AlphaGeometry system is able to do that “intuitive” step. By being trained on a large set of geometry solutions, it’s able to read a previously unseen problem and make suggestions for constructive additions that make the task of the mechanical solver possible.

Extracting Knowledge Graphs

You may have a large knowledge base written in vague and unstructured text - e.g. Wikipedia - but want to represent it in a deterministic form that can be searched and reasoned about mechanically - e.g. Resource Description Framework (RDF), the Semantic Web, a Neo4j graph database, etc.

This is effectively a translation problem for an LLM. Similar to translating English into French, we can train an LLM with e.g. a large corpus of Cypher code so it can form a coding language model along with the natural language, and also some example translations of text to code so it can form a mapping prediction model. Note that the major LLMs like ChatGPT are trained on all publicly available text, including github, so have probably already been exposed to these graph languages.
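A minimal sketch of the translation prompt for this kind of extraction. The node and relationship labels and the `call_llm` helper are hypothetical placeholders, not a real schema or API:

    # `call_llm` is a placeholder for whichever chat/completion API is in use.
    def extract_graph(text: str) -> str:
        prompt = (
            "Translate the text into Cypher CREATE statements. Use only (:Person) and "
            "(:Company) nodes and [:WORKS_FOR] and [:FOUNDED] relationships.\n\n"
            'Text: "Alice works for Acme Corp."\n'
            'Cypher: CREATE (:Person {name: "Alice"})-[:WORKS_FOR]->(:Company {name: "Acme Corp"})\n\n'
            f'Text: "{text}"\nCypher:'
        )
        return call_llm(prompt)

In practice the generated Cypher would also be validated before being loaded into the graph database.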

How to use GPTs practically

The moving parts

There are 4 major categories of subsystem to deal with:

  1. Large Language Model - built out of a massive corpus of text, images, videos or other data source, e.g. a full scan of the entire Internet. Billions of documents and data points, and costs millions of dollars to produce. There are only a few of these (from OpenAI, Meta, etc.)
  2. Parameter Efficient Fine Tuning (PEFT) [2] - using a largish corpus (perhaps 100s-1,000s of examples) to produce a tweaking layer on top of an LLM. These cost only a few hundred to a few thousand dollars to produce and, therefore, there is an unmanageable number of options to choose from (e.g. around 400,000 models stored in open-weight repositories) - see the sketch just after this list
  3. Vector (Embedding) Database - can be built out of the corpus of documents from within your business, very cheap to produce.
  4. Prompt Engineering - the details of the prompt sent to the GPT system: what to include in the “context window”
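As a sketch of what point 2 looks like in practice, here is a LoRA-style fine-tuning setup using the Hugging Face peft library (GPT-2 and the specific hyper-parameters are illustrative assumptions only):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")   # stands in for a much larger base model

    config = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["c_attn"],            # GPT-2's attention projection layer
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()        # typically well under 1% of the base model's weights
    # ...then train on the smallish domain-specific corpus as usual.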

All integrations with an AI system have to navigate through at least these 4 parts.

Performing a full training of a base model is outside the reach of all but the largest businesses or organisations - only companies the size of Google, Meta, X/Twitter, OpenAI, Apple, etc. can even contemplate it.

Usually a business only needs to consider the last 3 (i.e. basing the system on a pre-trained LLM base model) and, more commonly, even organising the training to fine-tune a base model is unnecessary - leaving the business either to produce an embedding database from a corpus of documents if required, or just to engineer the prompts needed to use an available model.

Selection of the most suitable base model or fine-tuned model is still very important though - there are far more options available than can be listed here.

There are also a number of commercial integrations that have fine-tuned a base model to provide a conversational “co-pilot” of some kind.

Hallucinations are a problem with many of these tools though. They can act as very well-read interns, doing lots of tedious work very quickly, but it’s important to have a competent human in the loop to edit their output. Certainly these systems shouldn’t be in a position to make promises on behalf of a company: e.g. via a public-facing “live chat” automated service agent, or by sending automated follow-up emails.

Prompt engineering

Prompt engineering involves crafting precise and effective instructions or prompts to guide AI systems in generating desired outputs. It’s like giving specific instructions to a highly intelligent assistant, teaching it how to respond in a way that aligns with our intentions.

The exploratory nature of prompt engineering makes it very similar to data science, but the tools and skills needed are different, as one drives the tools using ordinary natural language.
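A small sketch of what an engineered prompt often looks like - a role, constraints, and a worked example, all assembled as plain text for the context window (the support-ticket scenario is made up for illustration):

    def build_prompt(ticket: str) -> str:
        return (
            "You are a support triage assistant for an online retailer.\n"
            "Classify the ticket as one of: BILLING, DELIVERY, RETURNS, OTHER.\n"
            "Answer with the category only.\n\n"
            'Ticket: "My parcel still has not arrived after two weeks."\n'
            "Category: DELIVERY\n\n"
            f'Ticket: "{ticket}"\n'
            "Category:"
        )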

Chat with your Docs & RAG

It’s possible to train an LLM/GPT on the contents of an internal knowledge base of documents, either via PEFT to produce an enhanced LLM or by creating an “Embedding” vector database.

The resulting models can then be queried conversationally. The trouble is that when querying the resulting model you have no idea whether the GPT is telling the truth or hallucinating a completely fictitious answer.

So another option available is known as Retrieval-Augmented Generation (RAG). In the RAG architecture the response is generated in two phases:

  1. In the first phase the user’s query is read by an LLM and a search query is generated against your document store to find any documents that may be related to the query.
  2. In the second phase a new query is constructed by first reading in all the documents found in phase 1 and then instructing the LLM to produce an answer based on that content (though it will still use the base model as a language foundation, so hallucination is still possible).

This technique requires the least amount of pre-training of models or embeddings, but does require an advanced technique for prompt engineering. Another advantage of RAG is that if no documents are found in the search phase then the LLM can be instructed to respond with an “I don’t know” style answer rather than making something up from the contents of the base model.
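A minimal sketch of the two phases - here `embed`, `call_llm` and the in-memory `doc_store` are placeholders for whatever embedding model, LLM and vector database are actually in use:

    import numpy as np

    def answer(question, doc_store, embed, call_llm, top_n=3):
        # Phase 1: retrieve - find the documents whose vectors are closest to the query.
        q_vec = embed(question)
        scored = sorted(doc_store, key=lambda d: -np.dot(q_vec, d["vector"]))
        context = [d["text"] for d in scored[:top_n]]

        if not context:
            return "I don't know - no relevant documents were found."

        # Phase 2: generate - answer using only the retrieved documents as context.
        prompt = (
            "Answer the question using only the context below. "
            "If the context does not contain the answer, say you don't know.\n\n"
            "Context:\n" + "\n---\n".join(context) +
            f"\n\nQuestion: {question}\nAnswer:"
        )
        return call_llm(prompt)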

Mechanised queries

An exciting area of AI co-pilot integrations is to provide a conversational interface to the task of building mechanical deterministic searches into a data store.

If you have a knowledge management system of any kind that you want to search or visualise in some way, there is likely a query language, report builder, dashboard builder, or some other tool that lets you do that. But often these tools, by necessity, are expressive and complex and require engineering expertise to use correctly.

But LLM-based GPT systems are able to construct code, queries, or configurations based on a conversational input from a user.

So a powerful integration is to use the LLM GPT to generate a query or report-generation config and then use that to run queries on the knowledge base. In this way you at least know that the results of the query are deterministic and factually correct: the data in the knowledge base has not been summarised or re-interpreted by a model in any way; the query is run on the source data and the results are returned verbatim.

It may be that the generated query isn’t quite what the user meant, but it’s possible to have a sequence of conversational exchanges with a GPT that refine the query until you do see what you need. The impact is like having an expert programmer as a personal assistant - one that’s happy to help in any way as you explore ideas.
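A sketch of the pattern (the table schema and `call_llm` helper are hypothetical placeholders; in practice the generated SQL would also be validated before being run):

    import sqlite3

    SCHEMA = "orders(id, customer, total, created_at)"

    def ask(question: str, db_path: str):
        prompt = (
            f"Given the SQLite table {SCHEMA}, write a single SELECT statement that "
            f"answers: {question}\nReturn only the SQL."
        )
        sql = call_llm(prompt)                    # the LLM only writes the query...
        with sqlite3.connect(db_path) as conn:
            return conn.execute(sql).fetchall()   # ...the answer comes verbatim from the data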


Summary

Generative AI is an extremely new field in software development. There is a lot of hype around the level of intelligence of these systems, but equally their shortcomings may be overstated. The “plateau of productivity” has not been reached yet and there are many exciting developments still to come.

Companies are pouring resources into research, snapping up top talent, and striking strategic partnerships to gain an edge. In this race, the winners aren’t just the ones with the flashiest technology; they’re the ones who can translate AI prowess into real-world impact. The AI race in PLG

If you have any comments or feedback about this article, please use the LinkedIn thread.


  1. Attention Is All You Need ↩︎

  2. Guide to fine-tuning LLMs using PEFT and LoRa techniques ↩︎

This page and its contents are copyright © 2024, Ian Rogers.