# Models and surprise

Hadley Wickham is a statistician and programmer and the creator of popular R packages such as ggplot2 or dplyr. His status in the R community has risen to such mythical levels that the set of packages he created were called the hadleyverse (renamed to tidyverse).

In a talk, he describes what he considers a sensible workflow and explains the following dichotomy between data visualization and quantitative modeling:

But visualization fundamentally is a human activity. This is making the most of your brain. In visualization, this is both a strength and a weakness […]. You can see something in a visualization that you did not expect and no computer program could have told you about. But because a human is involved in the loop, visualization fundamentally does not scale.

And so to me the complementary tool to visualization is modeling. I think of modeling very broadly. This is data mining, this is machine learning, this is statistical modeling. But basically, whenever you’ve made a question sufficiently precise, you can answer it with some numerical summary, summarize it with some algorithm, I think of this as a model. And models are fundamentally computational tools which means that they can scale much, much better. As your data gets bigger and bigger and bigger, you can keep up with that by using more sophisticated computation or simply just more computation.

But every model makes assumptions about the world and a model – by its very nature – cannot question those assumptions. So that means: on some fundamental level, a model cannot surprise you.

That definition excludes many economic models. I think of the insights of models such as Akerlof’s Lemons and Peaches, Schelling’s segregation model or the “true and non-trivial” theory of comparative advantage as surprising.

# Term spreads and business cycles 2: The role of monetary policy

In the first part of this series I showed that term spreads can be used to predict real GDP about a year out. This pattern comes about, because investors expect the central bank to lower short term interest rates.

But we don’t know what’s causing what. Is the central bank driving business cycles or is it just responding to a change in the economic environment?

This matters for how we interpret the pattern we found. Investors could either have expectations about the business cycle or about arbitrary decisions by the central bank.

The central bank’s main tool is changing at the interest rate at which banks can lend, the federal funds rate. In this post, I will look at how the Fed Funds rate comoves with the term spread and how the unexpected component in that rate (the “shock”) is related to it.

## I.

First, run all the codes from the previous post.

Get the Fed Funds rate and calculate how it changes between this month and the same month next year:

Make the same scatterplot as before:

Which gets:

So the pattern is still there. The term spreads drop a year before the Fed Fund rate falls.

## II.

Identifying plausible exogenous variation in monetary policy is the gold standard of monetary economics. A host of other ways have been proposed, but basically every course on empirical macroeconomics starts with the shock series by Romer and Romer (2004).1 This paper filters out the endogenous response of monetary policy with respect to the movement in other economic variables using a regression of the fed funds rate on variables that are important for the central bank’s decision, such as GDP, inflation and the unemployment rate.

I won’t reproduce their analysis here, but just take their shock series from the journal page. For this, we also need the following package to read Excel data:

The following codes go to the AER website, download the files into a temporary folder (so we don’t have to manually delete them again), unzip the the codes and extract the relevant part:

Plot the shock series:

Merge the rr dataframe with our previous fd dataset:

Make the plot:

Which creates:

Now the pattern is gone.

What I’m learning from this is that term spreads are informative about the endogenous component of monetary policy. Investors have sensible expectations about when the central bank will lower interest rates due to a slowing economic activity.

## References

Bernanke, B., J. Boivin and P. Eliasz (2005). “Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach”, Quarterly Journal of Economics. (link)

Christiano, L., M. Eichenbaum and C. Evans (1996). “The Effects of Monetary Policy Shocks: Some Evidence from the Flow of Funds”, Review of Economics and Statistics. (link)

Nakamura, E. and J. Steinsson (2018). “High-Frequency Identification of Monetary Non-Neutrality: The Information Effect”, Quarterly Journal of Economics. (link)

Romer, C. D. and D. H. Romer (2004). “A New Masure of Monetary Shocks: Derivation and Implications”, American Economic Review. (link)

Uhlig H. (2005). “What Are the Effects of Monetary Policy on Output? Results from an Agnostic Identification Procedure”, Journal of Monetary Economics. (link)

1. Some other approaches are: Orderings in a vector autoregression (VAR), sign restrictions, high frequency identification and factor-augmented VARs.

# Cosine similarity

A typical problem when analyzing large amounts of text is trying to measure the similarity of documents. An established measure for this is cosine similarity.

## I.

It’s the cosine of the angle between two vectors. Two vectors have a maximum cosine similarity of 1 if they are parallel and the lowest cosine similarity of 0 if they are perpendicular to each other.

Say you have two documents $$A$$ and $$B$$ . Write these documents as vectors $$\boldsymbol{x} = (x_{1}, x_{2}, ..., x_{n})'$$, where $$n$$ is the length of the pooled dictionary of all words that show up in either document. An entry $$x_i$$ is the number of occurences of a particular word in a document. Cosine similarity is then (Manning et al. 2008):

\begin{align} \text{sim}(\boldsymbol{x_A}, \boldsymbol{x_B}) &= \frac{\boldsymbol{x_A}' \cdot \boldsymbol{x_B}}{\lVert \boldsymbol{x_A} \rVert \cdot \lVert \boldsymbol{x_B} \rVert} \nonumber \\ &= \frac{\sum_{i=1}^n x_{i,A} \, x_{i,B}}{\sqrt{\sum_{i=1}^n x_{i,A}^2} \cdot \sqrt{\sum_{i=1}^n x_{i,B}^2}} \label{cosine_sim} \end{align}

Given that entries can only be positive, cosine similarity will always take positive values. The denominator normalizes document lengths and bounds values between 0 and 1.

Cosine similarity is equal to the usual (Pearson’s) correlation coefficient if we first demean the word vectors.

## II.

Consider a dictionary of three words. Let’s define (in Matlab) three documents that contain some of these words:

Calculate the correlation between these:

Which gets us:

Documents 1 and 2 have the lowest possible correlation while 2 and 3 and 1 and 3 are somewhat correlated.

Define a function for cosine similarity:

And calculate the values for our word vectors:

Which gets us:

Documents 1 and 2 again have the lowest possible similarity. The association between documents 2 and 3 is especially high, as both contain the third word in the dictionary which also happens to be of particular importance in document 3.

Demean the vectors and then run the same calculation:

Producing:

They’re indeed the same as the correlations.

## References

Manning, C. D., P. Raghavan and H. Schütze (2008). Introduction to Information Retrieval. Cambridge University Press. (link)

# Leonardo da Vinci

In Walter Isaacson’s new Leonardo da Vinci biography:

The more than 7,200 pages now extant probably represent about one-quarter of what Leonardo actually wrote, but that is a higher percentage after five hundred years than the percentage of Steve Jobs’s emails and digital documents from the 1990s that he and I were able to retrieve.

I also liked this:

Leonardo’s Vitruvian Man embodies a moment when art and science combined to allow mortal minds to probe timeless questions about who we are and how we fit into the grand order of the universe. It also symbolizes an ideal of humanism that celebrates the dignity, value, and rational agency of humans as individuals. Inside the square and the circle we can see the essence of Leonardo da Vinci, and the essence of ourselves, standing naked at the intersection of the earthly and the cosmic.

This is the first part of a series of posts on term spreads and business cycles. There'll probably be three parts.

The term spread is the return differential between a long-term and a short-term safe bond. We can use this to learn about how market participants expect the economy to perform over the next year or so.

My explanation loosely follows Stephen Cecchetti and Kermit Schoenholtz’s book “Money, Banking, and Financial Markets”.

## I.

Consider an investor who either buys a long-run bond running for two years and pays $$i_{2,t}$$ per year or he invests in two subsequent short-run bonds. If he chooses the second option, the bond pays $$i_{1,t}$$ in the first year and he expects to earn $$i^e_{1,t+1}$$ in the second year. If we neglect any risk, it’s plausible to assume that interests rates adjust such that the investor earns the same using either strategy:

$(1 + i_{2,t})(1 + i_{2,t}) = (1 + i_{1,t})(1 + i^e_{1,t+1}).$

When we evaluate this expression and omit all terms that multiply two rates (these will typically be small), we get:

$i_{2,t} = \frac{i_{1,t} + i^e_{1,t+1}}{2}$

We can extend this for any years n:

$i_{n,t} = \frac{i_{1,t} + i^e_{1,t+1} + \dots + i^e_{1,t+n-1}}{n}$

So long-term interest rates are composed of expectations about future short term interest rates.

Combine this with what the Cecchetti and Schoenholtz call the Liquidity Premium Theory. Returns that accrue farther into the future are more risky, as the bond issuer may be bankrupt and we don’t know what inflation will be. This means that rates on bonds with longer maturities are usually higher.

The authors add a factor $$rp_n$$ (the risk premium of a bond running $$n$$ years) to the original equation:

$i_{n,t} = rp_n + \frac{i_{1,t} + i^e_{1,t+1} + \dots + i^e_{1,t+n-1}}{n}$

As $$rp_n$$ is higher for greater $$n$$, interest rates will tend to be higher for longer maturities.

The term spread $$ts_{n,t}$$ is then

\begin{align} ts_{n,t} &= i_{n,t} - i_{1,t} \nonumber \\ &= rp_n - rp_1 + \frac{i_{1,t} + i^e_{1,t+1} + \dots + i^e_{1,t+n-1}}{n} + i_{1,t} \label{term_spreads} \end{align}

A positive term spread can mean two things. Either we expect the average future short-term interest rate to rise or the difference between the two risk premia ($$rp_n - rp_1$$) has increased. We would expect this difference to be positive anyway, but it might widen even more when inflation becomes more uncertain or debt becomes riskier. But disentangling the two explanations is difficult.

It’s more interesting when the term spread turns negative. The difference between the risk premia probably stays positive, so investors expect short-term interest rates to decrease.

Short-term interest rates are mostly under the control of the central bank, so this probably means that people expect monetary policy to loosen and that the central bank lends more liberally to banks.

And why would the central bank do that? That’s usually to avert a looming recession and buffer negative shocks. Given that the central bank also responds to changes in the economic environment, it’s not clear what’s causing what here.

But either way, when term spreads turn negative, investors expect bad things. This is why the term spread tells us something about investors’ expectations.

## II.

Let’s look at the empirical evidence for this, as argued and presented by the authors. They kindly provide Fred codes with all their plots, so they’re easy to reproduce. First get some packages in R:

Insert your FRED API key below:

Pull data on 10 year and 3 month treasury bill rates:

Plot the two series:

Which produces:

The series on 3-month Treasury bond yields starts in 1934 and the other series on 10-year yields starts in 1953. Both series peak in the early 80s when inflation ran high. As argued before, long-run interest rates are usually above short-run interest rates. The term spread is the difference between the two:

Things become more interesting when we compare the behavior of the term spread with real GDP growth rates:

Make a plot of the two series:

We get:

The term spread often falls before GDP growth does. When the term spread turns negative, recessions tend to happen. Compare the lagged term spread to GDP growth:

Which gets us:

This is exactly the relationship that makes people think of term spreads as a good predictor of GDP in the near future.

## III.

Cecchetti and Schoenholz summarize it like this (p.180):

[…] [I]nformation on the term structure – particularly the slope of the yield curve – helps us to forecast general economic conditions. Recall that according to the expectations hypothesis, long-term interest rates contain information about expected future short-term interest rates. And according to the liquidity premium theory, the yield curve usually slopes upward. The key statement is usually. On rare occasions, short-term interest rates exceed long-term yields. When they do, the term stucture is said to be inverted, and the yield curve slopes downward.

[…] Because the yield curve slopes upward even when short-term yields are expected to remain constant – it’s the average of expected future short-term interest rates plus a risk premium – an inverted yield curve signals an expected fall in short-term interest rates. […] When the yield curve slopes downward, it indicates that [monetary] policy is tight because policymakers are attempting to slow economic growth and inflation.

We still don’t know what’s causing what. Is the central bank the driver of business cycles or is it just responding to a change in the economic environment?

Stay tuned for the next installment in this series in which I’ll look at the role of monetary policy.

## References

Cecchetti, S. G. and K. L. Schoenholtz (2017). “Money, Banking, and Financial Markets”. 5th edition, McGraw-Hill Education. (link)

# Presentations at the AEA

I presented at this year’s AEA in a session on automation. Apart from the papers in that session, I also enjoyed the following:

• Lifecycle of Inventors
• Susan Athey presented “Online Intermediaries and the Consumption of Polarized and Inaccurate News During the 2016 Presidential Election” in this session. They tried to estimate what the political leanings of media were that people consumed during the 2016 Presidential Election. They document that most of the media is left of center. But the strongest result is that they show a lack of reliable right-wing media. She explained that if they hadn’t coded Fox News as at least moderately accurate, then there would be no such right-wing media outlet.
• Automation and the Workforce
• John Horton’s “Labor Market Equilibration: Evidence from Uber” (pdf).
• Shopping in Macroeconomics” (best session title!)
• Credit Booms, Aggregate Demand, and Financial Crises”, included a new paper by Matthew Baron, Emil Verner and Wei Xiong. They painstakingly digitized historical bank equity returns to create a new financial crisis indicator for 47 countries since 1800.

1. Google Maps’s Moat, by Justin O’Beirne (who runs one of the best blogs I know):

At the rate it’s going, how long until Google has every structure on Earth?

2. John Horton simulates the new Goldsmith-Pinkham, Sorkin and Swift paper on Bartik instruments.
3. What have I learned from this? First, everything takes time. Coming up with compelling research takes time. Data collection takes time. Data analysis takes time. Writing takes time. Peer review takes time. Rejection takes time. Recovery from rejection takes time. Responding to reviewers takes time. Typesetting takes time. Email takes time. Writing this blog post takes time. Everything takes time. It’s been eight years but that time has brought a publication I’m quite proud of.

4. Brian Hayes provides a readable introduction to net neutrality:

In round numbers, the web has something like a billion sites and four billion users—an extraordinarily close match of producers to consumers. […] Yet the ratio for the web is also misleading. Three fourths of those billion web sites have no content and no audience (they are “parked” domain names), and almost all the rest are tiny. Meanwhile, Facebook gets the attention of roughly half of the four billion web users. Google and Facebook together, along with their subsidiaries such as YouTube, account for 70 percent of all internet traffic.

5. 100 Jahre DIN - Ein Urgestein der deutschen Wirtschaft (in German)
6. Where did all my probability go?

# Growing and shrinking

Stephen Broadberry and John Joseph Wallis have written an interesting new paper (pdf). They argue that much of modern economic growth is due to not only a higher trend growth rate, but also due to fewer periods of shrinking.

It’s easy to reproduce some of their results using the nice maddison package.

Get some packages:

Let’s concentrate on some countries with good data coverage, keep only data since 1800 and calculate real GDP growth rates. We also add a column for decades, to be able to look at some statistics for those separately.

Define a data frame with annotations for the World Wars:

Check out data for some countries:

Growth rates wobble around a mean slightly larger than zero, as expected. There are some extreme events around the World Wars. But it’s not immediately apparent from looking at these figures if shrinking has become less. If anything, it’s macroeconomic volatility that has decreased for most of these countries.

Next, for each decade we calculate the number of periods that countries were growing and shrinking and the average growth rates in those periods.

We’re interested in how much growing and shrinking years contributed to overall growth in a decade. So this is just the absolute value of changes by either shrinking and growing years divided by the total variation.

When we plot those contributions, we get the following picture:

It looks as if the contributions of shrinking have clustered closer to zero since WW2.

Let’s check out the trends in the contribution of shrinking periods:

And plot the regression coefficients of the trend line:

The trend was negative for many countries and for some there was no significant trend. None of the trends was significantly positive. It certainly looks as if episodes of growing output have become more important than episodes of shrinking episodes.

The question remains whether this might not just be mechanically driven by the fact that a higher trend real GDP growth rate reduces the probability of the growth rate hitting zero. And a fall in macroeconomic volatility would also make shrinking episodes less likely.

## References

Broadberry, S. and J. J. Wallis (2017). “Growing, Shrinking, and Long Run Economic Performance: Historical Perspectives on Economic Development”, NBER Working Paper No. 23343. doi: 10.3386/w23343

1. John Coltrane – Alabama
2. Is AI Riding a One-Trick Pony?” (through Kaiser Fung)
3. British Brexit Secretary, David Davis:

The assessment of that effect … is not as straightforward as many people think. And I’m not a fan of economic models, because they have all proven wrong. (link)

4. Consumer Inflation Uncertainty
5. Why Are Some Good Old Ideas Buried in the History of Statistical Graphics?
6. Instead of rushing to Software 2.0, lets view neural networks in proper context: they are models, not magic.

7. Roman Cheplyaka: “Explained variance in PCA
8. Project-oriented workflow or how not to “SET YOUR COMPUTER ON FIRE 🔥.”
9. Cat Person”, by Kristen Roupenian in The New Yorker. About which was written:

This past weekend, the biggest story on social media was not about a powerful man who had sexually assaulted someone, or something the president said on Twitter. Charmingly, as if we were all at a Paris salon in the 1920s, everyone had an opinion about a short story.

10. Beautiful Testing (seen here)
11. Sam Altman thinks he can speek more freely in Beijing than in San Francisco. And Tyler Cowen has a good explanation.