Introduction
Welcome to A Practical Guide to Macroeconomic Data with Python! As the name implies, the pages that follow aim to equip the reader with tools, code and data to analyze the macroeconomy. By book’s end, we hope you, the reader, will have the tools you need to hit the ground running as a macro data scientist. That is our North Star: to provide you with tools. We do not spend a lot of time on theory or big ideas in macroeconomics. There are many other excellent books on those topics that have been written in the last several hundred years. This book is focused on modern data science and practical tools for everyday use. We include many data visualizations in this book, along with the details of how to build them. We hope these will serve as a launching pad for your own creativity. We believe strongly in the power of visualization, and after reading this book you should have the tools you need to imagine, build and deliver very effective messages using data.
You will build a function to identify market highs and lows, bear and bull markets; you will pull data from FRED; you will understand the Federal Reserve better; you will reproduce the DXY dollar index; you will build a yield curve… in short, you will become an unstoppable force of nature! But you won’t learn a ton of macroeconomic theory and philosophy.
For Python coders
This book uses the Python programming language to analyze data. Why Python? There are many reasons (it is powerful, free, and open-source, among others), but the short answer is that the basics are easy to learn and you can get a lot of mileage from Python. It is by now the most popular programming language, and there are thousands of libraries available that greatly simplify our tasks; that is, we don’t have to reinvent the wheel whenever we need to do something.
Python is probably the most popular computer language for data science, but it is not the only one by any means. The R language is also quite popular in data analysis (and it is free and open-source as well), which is why we have an alternate version of this book written in R. If your goal is to become multilingual, having access to both books will help; they should serve as a bridge between the languages. Did we just recommend both of our books? Well, yes: if you want to build solid foundations in the two most popular languages for data science, with an added economics flavor, then we found the side-by-side translation between the two books quite helpful.
For all
This book is mainly about data wrangling and visualization. Some readers might wonder why we do not cover more advanced subjects such as modeling or machine learning. While advanced topics are important (and mighty interesting), we view this book as helping a macroeconomics analyst take her first steps into data science, which generally involve getting comfortable with data, making charts, and gathering intuition. In our experience, sophisticated techniques cannot replace good fundamentals.
Outline
This book is structured in the following way:
The first section of the book covers GDP and the stock market generally, but it also serves as an introduction to Python. It is not a comprehensive introduction because it focuses just on the functions and code we need to get our job done, but we do explain each step. If you are an experienced Python coder, you might want to use that first chapter on GDP as a code reference. We do walk through how to import an ugly Excel spreadsheet and make it tidy, a task that will almost surely confront anyone who wants to work in the financial industry, even if it is not exactly macroeconomics. The next chapter is on market data, and on importing that data from a source that requires an API key, a common occurrence in data science. We also cover an intermediate-difficulty use case: programmatically identifying bear markets (and then visualizing them). Even experienced programmers should find this of value, as it can be applied to other financial use cases (our main goal in this book is to supply flexible code that can be used elsewhere).
The second section focuses on interest rates, inflation, the Federal Reserve, and the job and housing markets. The first two may be the most important macroeconomic forces of recent times (and of all time, some would argue), while the Federal Reserve has emerged as not just a regulator but also a crucial player in financial markets.
In the third section, we end with a dive into the Senior Loan Officer Survey, which gauges credit conditions, and its relationship to market returns. This serves as a capstone case where we tie together different concepts shown in the book to create a market signal.
To summarize, we cover:
- GDP
- Market returns
- Interest rates
- Inflation
- The Federal Reserve
- Employment
- Housing
- The U.S. Dollar
- Credit conditions from the Senior Loan Officer Survey
In every topic we will take little side trips that show how to create shaded regions in a time series chart (very useful for marking recessions), how to identify a bear market, how to build dual Y-axis charts, how to index a series to any date, and how to reconstruct the dollar index. Those are our specific tasks, but all of them give you a paradigm for building the same tools for different economic series, or for constructing your own tools; it’s up to your creativity, and that’s our goal. We want to unleash the creativity in all of us; that’s how we view data science.
A common thread in this book is data wrangling. We will slowly create a dataframe that holds variables we wish to track and analyze regularly. You, reader, will almost certainly wish to include different variables than what’s covered in this book. Our goal is to provide the tools to do so!
Who should read this book
This book is intended for people who want to do work on the macroeconomy using data, and who want to use R and/or Python to do so. We have two versions of the book, one with R code and one with Python code. They cover substantially the same material and show useful tricks in both languages. It is our intention that the two books serve as a bridge between the languages.
What this book will cover
This book covers a lot of data, code and charts. We build things from scratch and show you all of our code. We do not spend a lot of time on theory or delving deep into concepts. There is an abundance of macroeconomic theory books out there, and any of them can be used to complement this book.
We do not cover machine learning or artificial intelligence in this book, though we love those tools and use them frequently. This book is aimed at very practical code and tools that you can use on day 1 of a job in industry to find insights about the macroeconomy. It’s possible that on day 1 you’ll be asked to start running machine learning models but, in our experience, that is very unlikely. And even if that does occur, it would be nice to be able to communicate what’s happening in the broader economy based on solid data, even if a machine learning model is telling us the sky is green.
On Reproducibility
A simple way to break down this goal: after you read this book, if you see an interesting macro chart, you will have the tools to reproduce it. Why reproduce a chart created by somebody else? A few reasons:
It’s a great way to learn. Have you ever recited a poem from a famous author? Poetry is a line by line exercise and once you catch on to its rhythm you can start to substitute your own words.
It’s an easy way to connect with smart people. You can reach out to the original author of the chart and say great work, I reproduced it and have a few questions.
When you understand, you can discuss and critique.
Most importantly, you can create your own work based on this. That is original and important. You advance your understanding one step at a time.
For example, in recent remarks, Federal Reserve Chair Jay Powell referenced several charts (see here). By the end of this book you should know how those charts are created and be able to make your own variations.
Python Preliminaries
If you are reading this book, it is likely that you already have Python installed on your computer and are familiar with it. However, for those readers just starting with Python, we give brief installation instructions here. A major strength of Python is that it is suitable for many different tasks and applications; unfortunately, this means that there is no unique installation of Python. Perhaps the easiest way to set up Python for data science is to use the Anaconda distribution (or its lightweight sister Miniconda), which comes with the conda package manager, a fantastic tool for downloading Python libraries and making sure they play nice with each other. To install Anaconda (or Miniconda), we suggest following the instructions on Conda’s website at https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html
This book uses Python version 3.12 throughout, the most recent version as of this writing; later versions of Python should run our code just as well, if not faster. The true power of Python lies in the libraries (or packages) created by its user community; this way we don’t have to reinvent the wheel, and we take advantage of the combined power of skilled programmers.
We use many packages throughout the book, but the two we consider our workhorses are pandas and altair. The former is considered the gold standard in Python for data wrangling: it is fast, powerful, and versatile. Important for beginners, pandas is so popular that we can search online for any question and get an answer quickly.
We use the altair library to create visualizations. There are other, more popular alternatives in Python, such as matplotlib or plotly, but we like altair because it follows a consistent grammar-of-graphics approach that breaks plots into discrete parts (data, marks, encodings, and properties), and because it is great at layering, or combining, charts. In altair we create visualizations by writing high-level code that declares our data and their relationships, rather than lower-level plotting instructions. Our purpose in this book is to create great-looking visualizations with relatively little code, and this is where altair excels.
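To make the grammar-of-graphics idea concrete, here is a minimal sketch (with made-up data, not a series from the book) showing the pieces described above: data, marks, encodings, properties, and layering.

```python
# A minimal sketch of altair's grammar of graphics, using made-up data.
import pandas as pd
import altair as alt

# Data: a small DataFrame with a date column and a value column.
data = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=12, freq="MS"),
    "value": [1, 3, 2, 5, 4, 6, 7, 6, 8, 9, 8, 10],
})

# Marks and encodings: a line chart mapping date to the x-axis and value to the y-axis.
line = alt.Chart(data).mark_line().encode(x="date:T", y="value:Q")

# A second set of marks over the same data.
points = alt.Chart(data).mark_point().encode(x="date:T", y="value:Q")

# Layering combines the two charts; properties control the title and size.
chart = (line + points).properties(title="A layered altair chart", width=400)
chart.save("layered_chart.html")  # or simply display `chart` in a notebook
```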
Below we list the packages we use in the book, with their versions in parentheses and a one-line description of what they do. Keep in mind that all of these packages have online documentation, and some, such as pandas and altair, also have very useful user guides that showcase the full power of the library.
- pandas (v. 2.2.2): data wrangling
- altair (v. 5.3.0): visualizations
- pandas-datareader (v. 0.10.0): for pulling data from FRED and the World Bank
- yfinance (v. 0.2.40): for pulling data from Yahoo! Finance
- numpy (v. 1.26.4): for mathematical operations
- openpyxl (v. 3.1.2): for reading Excel files
- environs (v. 10.3.0): for parsing environment variables such as passwords and ids
We install these packages using the conda manager with the following command:
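```
conda install PACKAGE
```

Here PACKAGE stands for one or more of the package names listed above (for example, conda install pandas altair).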
The conda manager will make sure our packages can interact with each other without breaking our Python installation. An alternate way to install packages is to use the pip manager with the code below; however, pip does not make sure our packages play nice with each other. Thus, our preferred method is to use conda.
```
pip install PACKAGE
```
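Once everything is installed, a quick sanity check is to import one of the packages and pull a single series from FRED. The following is only a minimal sketch, not code from the book itself; it assumes an internet connection and uses GDP, the FRED code for U.S. nominal gross domestic product, purely as an example.

```python
# Minimal sketch: confirm the installation works by pulling one series from FRED.
import pandas_datareader.data as web

# "GDP" is FRED's code for U.S. nominal gross domestic product (quarterly).
gdp = web.DataReader("GDP", "fred", start="2015-01-01")
print(gdp.head())
```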