Getting Started with the NYT API

March 1, 2022
  1. Introduction

The web is chock full of valuable troves of data that can spawn an infinite number of social science research projects. However, not all data is easily accessible! While some data can be downloaded directly, access to other sources is dictated by what is known as an API. Standing for application programming interface, an API is a set of defined protocols governing the terms of access to software and servers for programs created by third parties. Whenever you use your Gmail account to sign in to another website or purchase an item on eBay with your PayPal account, you are using an API.

In the context of social science research, APIs are a valuable means to acquire critical data that is not directly available. For example, if you want to model the relationship between Twitter sentiment and the stock market, you’ll need access to APIs that provide Twitter and stock price data. Furthermore, you might even consider using an API to derive the sentiment scores. A project such as this one would be impossible to do without knowing how to retrieve data via APIs.

That’s why we here at the D-Lab have put together this handy tutorial for when you need to get your hands on data guarded by an API. In this post, we use the New York Times API as our example of how to acquire access to an API and how to query and process the data it returns.

  2. Acquiring Access

For most APIs, a key or other user credentials are required for any database querying.  Generally, this requires that you register with the organization.  Most APIs are set up for developers, so you'll likely be asked to register an "application".  All this really entails is coming up with a name for your app/bot/project, and providing your real name, organization, and email.  Note that some more popular APIs (e.g. Twitter, Facebook) will require additional information, such as a web address or mobile number.

Once you've successfully registered, you will be assigned one or more keys, tokens, or other credentials that must be supplied to the server as part of any API call you make. The generated keys are essentially a set of passcodes that grant you access to their servers. To make sure that users aren't abusing their data access privileges (e.g. by making many rapid queries), each set of keys is given several rate limits governing the total number of calls that can be made over certain intervals of time. The NYT Article API has relatively generous rate limits: 10 calls per minute and 4,000 calls per day.
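
If your project requires many calls in a row, it helps to build that limit into your code from the start. Below is a minimal sketch; the helper name and the 6-second pause are our own choices, derived from the 10-calls-per-minute limit.

import time

def run_throttled(calls, pause_seconds=6.0):
    # Run a list of zero-argument API-call functions, pausing between
    # each one to stay under 10 calls per minute (60s / 10 calls = 6s).
    results = []
    for make_call in calls:
        results.append(make_call())  # perform the API call
        time.sleep(pause_seconds)    # wait before the next call
    return results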

Before we can start playing with the NYT data, we first need to follow their instructions on how to sign up for an account and generate access keys, which can be found in the “Get Started” section of their developer website.

Once your account has been created, log into the “Get Started” page if you are not already logged in. Then register a new app (you can call it something like NYT Tutorial), and copy and save your keys to a secure file on your computer. You can enable one or more of the APIs available on the NYT developer website. To follow along, enable the Top Stories, Most Viewed/Shared Articles, and Article Search APIs.

REMEMBER: NEVER SHARE YOUR KEYS, ACCIDENTALLY OR INTENTIONALLY

This point can never be repeated enough. You wouldn’t publish your password, and the same logic should apply to your API keys. If you’re publishing your project to GitHub, make sure you remove your keys from your code and files. You wouldn’t want someone else to use your keys and possibly get your access revoked.
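
A safer pattern than pasting the key into your script is to load it from an environment variable. Here is a minimal sketch; the variable name NYT_API_KEY is our own choice:

import os

# Read the key from an environment variable rather than hardcoding it.
# Set it in your shell first, e.g.: export NYT_API_KEY="your-key-here"
api_key = os.environ.get("NYT_API_KEY")
if api_key is None:
    raise RuntimeError("Set the NYT_API_KEY environment variable first.")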

  3. All the Data That’s Fit to Query

Now that we’ve been given the green light to access the New York Times’ servers, let’s delve into what data goodies we’ve been given permission to query.

Here is the collection of the APIs the NYT gives us:

  • Top stories: Returns an array of articles currently featured in the specified section (such as home, arts, or sports).

  • Most viewed/shared articles: Provides services for getting the most popular articles on NYTimes.com based on emails, shares, or views.

  • Article search: Look up articles by keyword. You can refine your search using filters and facets.

  • Books: Provides information about book reviews and The New York Times Best Sellers lists.

  • Movie reviews: Search movie reviews by keyword and opening date and filter by Critics' Picks.

  • Times Wire: Get links and metadata for Times' articles as soon as they are published on NYTimes.com. The Times Newswire API provides an up-to-the-minute stream of published articles.

  • Tag query (TimesTags): Provide a string of characters and the service returns a ranked list of suggested terms.

  • Archive metadata: Returns an array of NYT articles for a given month, going back to 1851.

These descriptions alone should have you salivating at the possibilities of what you can do with this valuable data.

  4. Establishing a Connection

We’ve been given access to their servers and we have an understanding of what we can retrieve. Now, it’s time to actually establish a connection and start pulling in data.

Our means of accessing the NYT’s servers is a third-party Python package called PyNYTimes. This tool, created by Micha den Heijer, makes querying the NYT API incredibly easy, much more so than if we were to write the code to access the API directly.

This means we need to install the library, which can be done with either of the following bash/shell commands:

Linux/Mac users: pip install --upgrade pynytimes 

Windows users: python -m pip install --upgrade pynytimes

Once the package is installed, open up a Jupyter Notebook or Python file, import the PyNYTimes library, and connect to the NYT’s servers with your API key.

from pynytimes import NYTAPI

api_key = "XXXXXXXXXXXXX"

nyt = NYTAPI(api_key, parse_dates=True)

Time to get querying!

  5. API Tour

We’ll be taking a tour of three of the NYT’s APIs, starting with “Top Stories”. As a reminder, you cannot query an NYT API that you did not enable when you generated your API key.

Top Stories

In a single line of code, we can download data on the trending stories from the Times’ home page.

top_stories = nyt.top_stories()

By simply calling the top_stories method on nyt, we get instant access to a list of dictionaries containing data on the day’s trending stories.

Let’s take a look at the actual data so we can understand the format in which it’s delivered.

# Grab the first data item in top_stories and view it

top_story = top_stories[0]

top_story

The output above is an example of what one news article’s worth of data looked like at the time I made the API call. This output is pretty typical of API calls in that you’ll usually see data delivered in dictionary form.

Does anything about the data stand out to you? What bits of information could be useful to you and your research needs?
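
For instance, a few fields tend to be immediately useful for research. A quick sketch (the field names follow the Top Stories output above; double-check them against your own data):

# Pull out a few commonly useful fields from the story dictionary
print(top_story["title"])
print(top_story["section"])
print(top_story["byline"])
print(top_story["url"])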

If we are interested in a specific section, then we can pass tags such as “arts”, “politics”, or “sports” to the section argument inside the top_stories method.

# Grab the trending sports stories

section = "sports"

top_sports_stories = nyt.top_stories(section=section)
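
If you want several sections at once, you can loop over them. A sketch (the section names here are examples; consult the Top Stories documentation for the full list):

# Collect the trending headlines for several sections at once
sections = ["arts", "politics", "sports"]
headlines = {}
for section in sections:
    stories = nyt.top_stories(section=section)
    headlines[section] = [story["title"] for story in stories]

Mind the rate limit here: each section is a separate API call.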

Most Viewed/Shared Articles

With the next API, we’ll be pulling in data about the most popular articles based on views and shares.

# Retrieve the most viewed articles for today

most_viewed_today = nyt.most_viewed()

Inside the most_viewed() method there is an argument called days, which allows you to retrieve the most viewed articles for the past day, week, or month. It accepts only three possible values: 1, 7, and 30.

most_viewed_week = nyt.most_viewed(days=7)

most_viewed_month = nyt.most_viewed(days=30)
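
To skim what came back, you can print a few titles. A quick sketch (the "title" field name follows the Most Popular schema; verify against your own output):

# Peek at the five most viewed articles from the past week
for article in most_viewed_week[:5]:
    print(article["title"])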

If we look at the data in most_viewed_today, you’ll notice its content differs from what we saw before with top_stories().


The meaning of some data attributes, like "copyright", is obvious, whereas for others, like "per_facet", it is not. This is why developers often provide a schema, which is essentially a guide to the data.

The schema for the Most Viewed Articles API can be found on the NYT developer site.

The schema lays out what each attribute means and its datatype, all of which is crucial information for processing and analyzing the data.
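
Because each call returns a list of flat dictionaries, one convenient way to put the schema to work is to load the results into a pandas DataFrame. A sketch, assuming you have pandas installed and that the "title", "section", and "published_date" fields appear in your results:

import pandas as pd

# Convert the list of article dictionaries into a DataFrame
df = pd.DataFrame(most_viewed_today)

# Inspect a few of the columns named in the schema
print(df[["title", "section", "published_date"]].head())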

Most Shared

The most_shared method is similar to most_viewed, except that it has an argument called method, which is used to pull the most shared articles based on emails or Facebook posts. Learn more about the data pulled from this method from its schema.

# Grab the most shared email and Facebook articles from the past month

email = nyt.most_shared(days = 30, method = 'email')

facebook = nyt.most_shared(days = 30, method = 'facebook')
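
One simple question you can now answer: how much overlap is there between the two lists? A sketch, using the article URL as an identifier (assuming a "url" field, per the Most Popular schema):

# Compare the two lists of most shared articles by URL
email_urls = {article["url"] for article in email}
facebook_urls = {article["url"] for article in facebook}
overlap = email_urls & facebook_urls
print(len(overlap), "articles were widely shared on both channels")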

Article Search

Time to take it up a notch and use the NYT Article search API to retrieve a set of articles about a particular topic in a chosen period of time.

articles = nyt.article_search(query = "Elon Musk", results = 20)

# Assign the data in the first item of articles to a variable

article = articles[0].copy()

# Recommended: delete the multimedia key to reduce clutter in the data

del article["multimedia"]

The article_search call above retrieves the twenty most recent articles featuring Elon Musk.

Again, this data is quite different from what we’ve seen before, which is why it’s recommended to consult the relevant schemas. The NYT provides a schema not only for this particular API but also for sections of the data stored within it.

Those schemas hold the details on what fields such as keyword and byline mean in the context of this data dump.
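
As an illustration, several Article Search fields are nested dictionaries rather than plain strings. A sketch (field names follow the Article Search schema; verify them against your own results):

# headline and byline are nested dictionaries
print(article["headline"]["main"])    # the displayed headline
print(article["byline"]["original"])  # e.g. "By Jane Doe"

# keywords is a list of dictionaries tagging people, subjects, etc.
for keyword in article["keywords"]:
    print(keyword["name"], "->", keyword["value"])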

Let's try this operation again, but this time for a specific time period. For example, how would we retrieve all the articles from the first two months of the George Floyd protests?

We need to pass a dictionary to the dates argument which contains keys named "begin" and "end". Those two keys point to datetime objects that we'll use as time markers to limit the returned results.

We're also going to use the options argument to filter and sort our results.

# Import the datetime class from the datetime module, which allows us to create datetime objects

from datetime import datetime

# Set up start and end date objects

begin = datetime(2020, 5, 23)

end = datetime(2020, 7, 23)

# Create a dictionary containing the dates data

date_dict = {"begin":begin, "end":end}

# Create the options dictionary

options_dict = {
    # Sort from earliest to latest
    "sort": "oldest",
    # Return only articles from the New York Times, filtering out
    # other sources such as AP and Reuters
    "sources": [
        "New York Times"
    ],
    # Return only straightforward news in the form of articles
    "type_of_material": [
        "News Analysis", "News", "Article"
    ]
}

articles = nyt.article_search(

    query = "George Floyd protest",

    results = 100,

    dates = date_dict,

    options = options_dict)

If you are following along, try running that code. Explore the returned results by changing the date filters.
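
One quick way to skim what came back is to print each article's date and headline. A sketch (with parse_dates=True, pub_date should come back as a datetime object):

# Print the publication date and headline of each returned article
for result in articles:
    print(result["pub_date"].date(), "-", result["headline"]["main"])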

  6. Next Steps

While all APIs differ in how they’re constructed and how they output their data, this tutorial should give you a good sense of what to expect and how to get started. We definitely encourage you to further explore the New York Times API or test out some others, such as the Twitter and Reddit APIs.

But always remember these four things when it comes to acquiring data from the web.

  1. If there’s an API, use that instead of web scraping (the process of writing code to automate copying and pasting information from a web page).

  2. Be aware of your rate limits. Budget your calls accordingly; you don’t want to hit your limit halfway through grabbing your data.

  3. Most APIs, including those from the NYT, have a Terms of Service (or Terms of Use) page on their developer website. Please review that carefully and use the APIs accordingly. 
  4. And of course, SECURELY STORE AND NEVER SHARE YOUR API KEYS!

To get the most out of an API, you will need to know how to manipulate the returned data in the programming language of your choice. In this tutorial, we use Python, so the ability to manipulate Python dictionaries is essential to using the API data effectively. You may also want to learn about and apply computational methods like text analysis, data visualization, and machine learning to explore and analyze the data.
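
For example, a single line of dictionary manipulation can summarize a whole result set, such as counting how many returned articles fall in each section (a sketch, assuming a "section_name" field per the Article Search schema):

from collections import Counter

# Count how many of the returned articles fall in each section
section_counts = Counter(result.get("section_name", "unknown") for result in articles)
print(section_counts.most_common(5))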

The code featured in this blog post can be found in the D-Lab’s Web APIs and Web Scraping GitHub repo. We encourage you to register for our upcoming March 7th workshop on retrieving data from the web using Python. You can also subscribe to the D-Lab's newsletter to stay abreast of future iterations of this workshop and others that can help you manipulate and explore API data.