Getting Started with the NYT API

March 1, 2022
  1. Introduction

The web is chock full of valuable troves of data that can spawn an infinite number of social science research projects. However, not all data is easily accessible! While some data can be downloaded directly, access to other sources is dictated by what is known as an API. Standing for application programming interface, an API is a set of defined protocols governing the terms of access to software and servers for programs created by third parties. Whenever you use your Gmail account to sign in to another website or purchase an item on eBay with your PayPal account, you are using an API.

In the context of social science research, APIs are a valuable means to acquire critical data that is not directly available. For example, if you want to model the relationship between Twitter sentiment and the stock market, you’ll need access to APIs that provide Twitter and stock price data. Furthermore, you might even consider using an API to derive the sentiment scores. A project such as this one would be impossible to do without knowing how to retrieve data via APIs.

That’s why we here at the D-Lab have put together this handy tutorial for when you need to get your hands on data guarded by an API. In this post, we use the New York Times API as our example of how to acquire access to an API and how to query and process the data it returns.

  2. Acquiring Access

For most APIs, a key or other user credentials are required for any database querying.  Generally, this requires that you register with the organization.  Most APIs are set up for developers, so you'll likely be asked to register an "application".  All this really entails is coming up with a name for your app/bot/project, and providing your real name, organization, and email.  Note that some more popular APIs (e.g. Twitter, Facebook) will require additional information, such as a web address or mobile number.

Once you've successfully registered, you will be assigned one or more keys, tokens, or other credentials that must be supplied to the server as part of any API call you make. The generated keys are essentially a set of passcodes that grant you access to their servers. To make sure that users aren't abusing their data access privileges (e.g. by making many rapid queries), each set of keys is given several rate limits governing the total number of calls that can be made over certain intervals of time. The NYT Article API has relatively generous rate limits: 10 calls per minute and 4,000 calls per day.
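
If your project requires many calls in a row, it helps to build that limit into your code from the start. Below is a minimal sketch; the helper name and the 6-second pause are our own choices, derived from the 10-calls-per-minute limit.

import time

def run_throttled(calls, pause_seconds=6.0):
    # Run a list of zero-argument API-call functions, pausing between
    # each one to stay under 10 calls per minute (60s / 10 calls = 6s).
    results = []
    for make_call in calls:
        results.append(make_call())  # perform the API call
        time.sleep(pause_seconds)    # wait before the next call
    return results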

Before we can start playing with the NYT data, we first need to follow their instructions on how to sign up for an account and generate access keys, which can be found in the “Get Started” section of their developer website.

Once your account has been created, log into the “Get Started” page if you are not already logged in. Then register a new app (you can call it something like NYT Tutorial), and copy and save your keys to a secure file on your computer. You can enable one or more of the APIs available on the NYT developer website. To follow along, enable the Top Stories, Most Viewed/Shared Articles, and Article Search APIs.

REMEMBER: NEVER SHARE YOUR KEYS, ACCIDENTALLY OR INTENTIONALLY

This point can never be repeated enough. You wouldn’t publish your password, and the same logic should apply to your API keys. If you’re publishing your project to GitHub, make sure you remove your keys from your code and files. You wouldn’t want someone else to use your keys and possibly get your access revoked.
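
A safer pattern than pasting the key into your script is to load it from an environment variable. Here is a minimal sketch; the variable name NYT_API_KEY is our own choice:

import os

# Read the key from an environment variable rather than hardcoding it.
# Set it in your shell first, e.g.: export NYT_API_KEY="your-key-here"
api_key = os.environ.get("NYT_API_KEY")
if api_key is None:
    raise RuntimeError("Set the NYT_API_KEY environment variable first.")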

  3. All the Data That’s Fit to Query

Now that we’ve been given the green light to access the New York Times’ servers, let’s delve into what data goodies we’ve been given permission to query.

Here is the collection of the APIs the NYT gives us:

  • Top stories: Returns an array of articles currently featured in the specified section (such as home, arts, or sports).

  • Most viewed/shared articles: Provides services for getting the most popular articles on NYTimes.com based on emails, shares, or views.

  • Article search: Look up articles by keyword. You can refine your search using filters and facets.

  • Books: Provides information about book reviews and The New York Times Best Sellers lists.

  • Movie reviews: Search movie reviews by keyword and opening date and filter by Critics' Picks.

  • Times Wire: Get links and metadata for Times' articles as soon as they are published on NYTimes.com. The Times Newswire API provides an up-to-the-minute stream of published articles.

  • Tag query (TimesTags): Provide a string of characters and the service returns a ranked list of suggested terms.

  • Archive metadata: Returns an array of NYT articles for a given month, going back to 1851.

These descriptions alone should have you salivating at the possibilities of what you can do with this valuable data.

  4. Establishing a Connection

We’ve been given access to their servers and we have an understanding of what we can retrieve. Now, it’s time to actually establish a connection and start pulling in data.

Our means of accessing the NYT’s servers is a third-party Python package called PyNYTimes. This tool, created by Micha den Heijer, makes querying the NYT API incredibly easy, much more so than if we were to write the code to access the API directly.

This means we need to install the library, which can be done with either of the following bash/shell commands:

Linux/Mac users: pip install --upgrade pynytimes 

Windows users: python -m pip install --upgrade pynytimes

Once the package is installed, open up a Jupyter Notebook or Python file, import the PyNYTimes library, and connect to the NYT’s servers with your API key.

from pynytimes import NYTAPI

api_key = "XXXXXXXXXXXXX"

nyt = NYTAPI(api_key, parse_dates=True)

Time to get querying!

  5. API Tour

We’ll be taking a tour of three of the NYT’s APIs, starting with “Top Stories”. As a reminder, you cannot query an NYT API that you did not enable when you generated your API key.

Top Stories

In a single line of code, we can download data on the trending stories from the Times’ home page.

top_stories = nyt.top_stories()

By simply calling the top_stories method on nyt, we get instant access to a list of dictionaries containing data on the day’s trending stories.

Let’s take a look at the actual data so we can understand the format in which it’s delivered.

# Grab the first data item in top_stories and view it

top_story = top_stories[0]

top_story

The output above is an example of what one news article’s worth of data looked like at the time I made the API call. This output is pretty typical of API calls in that you’ll usually see data delivered in dictionary form.

Does anything about the data stand out to you? What bits of information could be useful to you and your research needs?
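
For instance, a few fields tend to be immediately useful for research. A quick sketch (the field names follow the Top Stories output above; double-check them against your own data):

# Pull out a few commonly useful fields from the story dictionary
print(top_story["title"])
print(top_story["section"])
print(top_story["byline"])
print(top_story["url"])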

If we are interested in a specific section, then we can pass tags such as “arts”, “politics”, or “sports” to the section argument inside the top_stories method.

# Grab the trending sports stories

section = "sports"

top_sports_stories = nyt.top_stories(section=section)
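
If you want several sections at once, you can loop over them. A sketch (the section names here are examples; consult the Top Stories documentation for the full list):

# Collect the trending headlines for several sections at once
sections = ["arts", "politics", "sports"]
headlines = {}
for section in sections:
    stories = nyt.top_stories(section=section)
    headlines[section] = [story["title"] for story in stories]

Mind the rate limit here: each section is a separate API call.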

Most Viewed/Shared Articles

With the next API, we’ll be pulling in data about the most popular articles based on views and shares.

# Retrieve the most viewed articles for today

most_viewed_today = nyt.most_viewed()

Inside the most_viewed() method there is an argument called days, which allows you to retrieve the most viewed articles for the past day, week, or month. It accepts only three possible values: 1, 7, and 30.

most_viewed_week = nyt.most_viewed(days=7)

most_viewed_month = nyt.most_viewed(days=30)
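
To skim what came back, you can print a few titles. A quick sketch (the "title" field name follows the Most Popular schema; verify against your own output):

# Peek at the five most viewed articles from the past week
for article in most_viewed_week[:5]:
    print(article["title"])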

If we look at the data in most_viewed_today, you’ll notice its content differs from what we saw before with top_stories().


The meaning of some data attributes, like "copyright", is obvious, whereas for others, like "per_facet", it is not. This is why developers often provide a schema, which is essentially a guide to the data.

The schema for the Most Viewed Articles API can be found on the NYT developer site.

The schema lays out what each attribute means and its datatype, all of which is crucial information for processing and analyzing the data.
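
Because each call returns a list of flat dictionaries, one convenient way to put the schema to work is to load the results into a pandas DataFrame. A sketch, assuming you have pandas installed and that the "title", "section", and "published_date" fields appear in your results:

import pandas as pd

# Convert the list of article dictionaries into a DataFrame
df = pd.DataFrame(most_viewed_today)

# Inspect a few of the columns named in the schema
print(df[["title", "section", "published_date"]].head())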

Most Shared

The most_shared method is similar to most_viewed, except that it has an argument called method, which is used to pull the most shared articles based on emails or Facebook posts. Learn more about the data pulled from this method from its schema.

# Grab the most shared email and Facebook articles from the past month

email = nyt.most_shared(days = 30, method = 'email')

facebook = nyt.most_shared(days = 30, method = 'facebook')
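
One simple question you can now answer: how much overlap is there between the two lists? A sketch, using the article URL as an identifier (assuming a "url" field, per the Most Popular schema):

# Compare the two lists of most shared articles by URL
email_urls = {article["url"] for article in email}
facebook_urls = {article["url"] for article in facebook}
overlap = email_urls & facebook_urls
print(len(overlap), "articles were widely shared on both channels")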

Article Search

Time to take it up a notch and use the NYT Article search API to retrieve a set of articles about a particular topic in a chosen period of time.

articles = nyt.article_search(query = "Elon Musk", results = 20)

# Assign the data in the first item of articles to a variable

article = articles[0].copy()

# Recommended: delete the multimedia key to reduce clutter in the data

del article["multimedia"]

The article_search call above retrieves the twenty most recent articles featuring Elon Musk.

Again, this data is quite different from what we’ve seen before, which is why it’s recommended to consult the relevant schemas. The NYT provides a schema not only for this particular API but also for sections of the data stored within it.

Those schemas hold the details on what fields such as keyword and byline mean in the context of this data dump.
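
As an illustration, several Article Search fields are nested dictionaries rather than plain strings. A sketch (field names follow the Article Search schema; verify them against your own results):

# headline and byline are nested dictionaries
print(article["headline"]["main"])    # the displayed headline
print(article["byline"]["original"])  # e.g. "By Jane Doe"

# keywords is a list of dictionaries tagging people, subjects, etc.
for keyword in article["keywords"]:
    print(keyword["name"], "->", keyword["value"])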

Let's try this operation again, but this time for a specific time period. For example, how would we retrieve all the articles from the first two months of the George Floyd protests?

We need to pass a dictionary to the dates argument which contains keys named "begin" and "end". Those two keys point to datetime objects that we'll use as time markers to limit the returned results.

We're also going to use the options argument to filter and sort our results.

# Import the datetime class from the datetime module, which allows us to create datetime objects

from datetime import datetime

# Set up start and end date objects

begin = datetime(2020, 5, 23)

end = datetime(2020, 7, 23)

# Create a dictionary containing the dates data

date_dict = {"begin":begin, "end":end}

# Create the options dictionary

options_dict = {
    # Sort from earliest to latest
    "sort": "oldest",
    # Return only articles from the New York Times, filtering out
    # other sources such as AP and Reuters
    "sources": [
        "New York Times"
    ],
    # Return only straightforward news in the form of articles
    "type_of_material": [
        "News Analysis", "News", "Article"
    ]
}

articles = nyt.article_search(

    query = "George Floyd protest",

    results = 100,

    dates = date_dict,

    options = options_dict)

If you are following along, try running that code. Explore the returned results by changing the date filters.
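
One quick way to skim what came back is to print each article's date and headline. A sketch (with parse_dates=True, pub_date should come back as a datetime object):

# Print the publication date and headline of each returned article
for result in articles:
    print(result["pub_date"].date(), "-", result["headline"]["main"])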

  6. Next Steps

While all APIs differ in how they’re constructed and how they output their data, this tutorial should give you a good sense of what to expect and how to get started. We definitely encourage you to further explore the New York Times API or test out some others, such as the Twitter and Reddit APIs.

But always remember these four things when it comes to acquiring data from the web.

  1. If there’s an API, use that instead of web scraping (the process of writing code to automate copying and pasting information from a web page).

  2. Be aware of your rate limits. Budget your calls accordingly; you don’t want to hit your limit halfway through grabbing your data.

  3. Most APIs, including those from the NYT, have a Terms of Service (or Terms of Use) page on their developer website. Please review that carefully and use the APIs accordingly. 
  4. And of course, SECURELY STORE AND NEVER SHARE YOUR API KEYS!

To get the most out of an API, you will need to know how to manipulate the returned data in the programming language of your choice. In this tutorial, we use Python, so the ability to manipulate Python dictionaries is essential to using the API data effectively. You may also want to learn about and apply computational methods like text analysis, data visualization, and machine learning to explore and analyze the data.
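
For example, a single line of dictionary manipulation can summarize a whole result set, such as counting how many returned articles fall in each section (a sketch, assuming a "section_name" field per the Article Search schema):

from collections import Counter

# Count how many of the returned articles fall in each section
section_counts = Counter(result.get("section_name", "unknown") for result in articles)
print(section_counts.most_common(5))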

The code featured in this blog post can be found in the D-Lab’s Web APIs and Web Scraping GitHub repo. We encourage you to register for our upcoming March 7th workshop on retrieving data from the web using Python. You can also subscribe to the D-Lab's newsletter to stay abreast of future iterations of this workshop and others that can help you manipulate and explore API data.