Artificial Intelligence and Inflation Forecasts
Abstract
We explore the ability of Large Language Models (LLMs) to produce conditional inflation forecasts during the 2019-2023 period. We use a leading LLM (Google AI’s PaLM) to produce distributions of conditional forecasts at different horizons and compare these forecasts to those of a leading source, the Survey of Professional Forecasters (SPF). We find that LLM forecasts generate lower mean-squared errors overall in most years, and at almost all horizons. LLM forecasts exhibit slower reversion to the 2% inflation anchor. We argue that this method of generating forecasts is inexpensive and can be applied to other time series.
1 Introduction
Forecasting inflation lies at the heart of economic decision-making, shaping the financial planning of households, guiding strategic investments by firms, and driving monetary policy. It is, however, a difficult task as equilibrium inflation is the outcome of the aggregation of individual actions that depend, among other factors, on the expectations of inflation itself.
In this paper, we investigate the potential of state-of-the-art large language models (LLMs) in generating accurate inflation forecasts. To do so, we use Google's PaLM LLM to construct retrospective inflation forecasts, which we contrast with a leading source of inflation forecasts, the Survey of Professional Forecasters (SPF), as well as with actual inflation data. We find that the LLM's forecasts have lower mean-squared errors than the SPF's overall, in most years, and at almost all horizons.
These findings suggest that LLMs may provide an inexpensive and accurate alternative approach to generating inflation forecasts, with potential applicability to other macroeconomic variables. Traditional methods for inflation forecasting range from expert surveys and surveys of individuals to market-based measures and model-based expectations. Each has its strengths but also comes with limitations.
One primary source of inflation forecasts is expert surveys, such as the Philadelphia Fed’s Survey of Professional Forecasters (SPF), which pools forecasts made by a variety of leading economists and financial analysts. A second source of forecasts is surveys of individuals, such as the Survey of Consumers of the University of Michigan or the New York Fed Survey of Consumer Expectations. These surveys provide a perspective from the economic agents themselves, capturing the expectations of individuals whose collective actions as consumers and workers drive the economy. A third source is market-based expectations, such as the Break-Even Inflation rate, which is derived from the pricing of inflation-protected securities. These measures have the advantage of being available in real-time, but may also reflect other financial market factors, such as the liquidity of the underlying securities. Lastly, there are model-based expectations, such as those generated by Dynamic Stochastic General Equilibrium (DSGE) models.
Despite their utility, these sources of inflation forecasts have their limitations. They often produce large forecast errors, as we show in this paper. Furthermore, conducting expert and individual surveys, as well as setting up and running complex models, can be costly along many dimensions.
In this paper, we turn our attention to a novel technology — state-of-the-art Large Language Models (LLMs), a type of generative artificial intelligence — to understand if they can offer a new approach to forecasting inflation. LLMs, such as OpenAI's GPT-4 or Google AI's PaLM, have the potential to capture extremely complex relationships due to their scale and sophistication. However, LLMs also present their own set of challenges: They operate as a ‘black-box,’ making it difficult to understand the mechanics that underpin the output they generate.
Our goal in this paper is to evaluate the potential of these novel technologies in generating accurate inflation forecasts. Our method consists of leveraging access to the PaLM API to generate forecasts at various dates in the past, using only the information that would have been available up to such dates. For instance, to obtain an inflation forecast for the year following April 2020, we would instruct the model to generate a forecast as if the current date was April 2020, restricting its analysis to information available up to that date.
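The date-restricted prompting procedure described above can be sketched in code. The paper does not publish its exact prompt or API calls, so the prompt wording, function names, and the numeric parser below are our own illustrative assumptions; a real run would send the prompt to the PaLM API rather than use a mocked reply.

```python
# Sketch of the conditional-forecasting prompt described above.
# All names and the prompt text are illustrative assumptions,
# not the authors' actual implementation.
import re

def build_prompt(as_of: str, horizon_quarters: int) -> str:
    """Ask the model to forecast CPI inflation as if the current
    date were `as_of`, using only information up to that date."""
    return (
        f"Assume the current date is {as_of}. Using only information "
        f"available up to that date, forecast the year-over-year US CPI "
        f"inflation rate {horizon_quarters} quarters ahead. "
        f"Reply with a single number in percent."
    )

def parse_forecast(response: str) -> float:
    """Extract the first numeric value (in percent) from the model's reply."""
    match = re.search(r"-?\d+(?:\.\d+)?", response)
    if match is None:
        raise ValueError("no numeric forecast found in response")
    return float(match.group())

# Example with a mocked model reply (a real run would call the PaLM API):
prompt = build_prompt("April 2020", horizon_quarters=4)
mock_reply = "My forecast is 1.4 percent."
print(parse_forecast(mock_reply))  # → 1.4
```

Repeating such a query many times at each date and horizon yields the distributions of conditional forecasts that the paper compares against the SPF.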
As long as the LLM's training data is current — as is indeed the case for PaLM — this method also allows us to generate forecasts in real time. By comparing these simulated forecasts with the actual inflation data and with forecasts from other sources, we can assess the accuracy and usefulness of LLMs in inflation forecasting.
Our benchmark results suggest that LLMs generate conditional inflation forecasts with lower mean-squared errors (MSE) than a more traditional source of forecasts — the SPF — over the period of analysis, which runs from 2019 to the first quarter of 2023. Not only are the LLM forecasts better when evaluated over the entire period, they are also better for almost all of the individual years in our analysis and at almost all forecast horizons. We run a series of robustness and sanity checks, and discuss some of the potential weaknesses of this method vis-à-vis more traditional forecast sources.
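The MSE criterion used in this comparison is standard. A minimal sketch follows; the numbers are hypothetical placeholders, not the paper's data or results.

```python
# Mean-squared error of a sequence of point forecasts against realized values.
def mse(forecasts, realized):
    errors = [(f - r) ** 2 for f, r in zip(forecasts, realized)]
    return sum(errors) / len(errors)

# Hypothetical illustrative numbers (NOT the paper's data):
realized      = [1.9, 4.2, 8.0, 6.4]   # realized YoY CPI inflation, %
llm_forecasts = [2.1, 3.8, 7.1, 5.9]   # LLM point forecasts, %
spf_forecasts = [2.0, 2.6, 4.0, 3.5]   # SPF point forecasts, %

print(mse(llm_forecasts, realized))
print(mse(spf_forecasts, realized))
```

A lower MSE indicates more accurate forecasts; the same calculation can be applied per year or per horizon to reproduce the disaggregated comparisons described above.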
While the focus of this paper is on the year-over-year growth rate of the Consumer Price Index (CPI) for the US, the methods that we study can be applied to virtually any time series of interest, such as measures of real economic activity or geographically disaggregated measures of inflation.
This paper is structured as follows: Section 2 provides an overview of LLMs and how they can be used to generate conditional forecasts. Section 3 presents our baseline results and compares LLM forecasts to those of the SPF. Section 4 presents some robustness checks, and Section 5 concludes.
Related Literature: This paper relates to an emerging literature that analyzes the potential of LLMs to undertake macroeconomic and financial forecasting. The exercise we focus on is similar to the one conducted by Bybee (2023), who uses the LLM GPT-3.5 to generate a “survey of economic expectations” over different macroeconomic variables. Instead of directly asking the LLM for a conditional forecast, the author feeds it Wall Street Journal news articles about macroeconomic variables and asks about the potential impact of each piece of news on a given quantity. He finds that the results are similar to those of standard surveys, such as the SPF on which we focus, and also exhibit the deviations from full-information rational expectations that are prevalent in standard surveys. This paper builds on a literature that exploits the fact that LLMs are designed and trained to simulate human cognitive processes and can therefore be used to simulate human behavior. This point is developed in a general manner by Aher et al. (2023), who argue that LLMs can be leveraged to simulate classical experiments in psychology and linguistics. Horton (2023) specializes this argument to questions and experiments related to behavioral economics, showing that the output of LLMs is consistent with that of classical experiments in the field.