install.packages("devtools")
devtools::install_github("mkearney/nytimes")
Data acquisition: APIs
Learning goals
After this lesson, you should be able to:
- Explain what an API is
- Set up an API key for a public API
- Develop comfort in using a wrapper package or URL-method of calling a web API
- Recognize the structure in a URL for a web API and adjust for your purposes
- Explore and subset complex nested lists
You can download a template Quarto file to start from here. Save this template within the following directory structure:
your_course_folder
    apis
        code
            10-apis.qmd
APIs
In this lesson you’ll learn how to collect data from websites such as The New York Times, Zillow, and Google. While these sites are primarily known for the information they provide to humans browsing the web, they (along with most large websites) also provide information to computer programs.
API stands for Application Programming Interface, and this term describes a general class of tools that allow computers, rather than humans, to interact with an organization’s data.
- Humans use browsers such as Firefox or Chrome to navigate the web. Behind the scenes, our browsers communicate with web servers using a technology called HTTP or Hypertext Transfer Protocol to get information that is formatted into the display of a web page.
- Programming languages such as R can also use HTTP to communicate with web servers. We’ll see next time how we can use R to “scrape” data from almost any static web page. However, it’s easiest to interact with websites that are specifically designed to communicate with programs. These Web APIs, or Web Application Programming Interfaces, focus on transmitting raw data, rather than images, colors, or other appearance-related information that humans interact with when viewing a web page.
A large variety of web APIs provide data accessible to programs written in R (and almost any other programming language!). Almost all reasonably large commercial websites offer APIs. Todd Motto has compiled an excellent list of Public Web APIs on GitHub. Browse the list to see what kind of information is available.
Wrapper packages
Extra resources:
- NY Times API
- NY Times Blog post announcing the API
- Working with the NY Times API in R
- nytimes package for accessing the NY Times’ APIs from R
- Video showing how to use the NY Times API
- rOpenSci has a good collection of wrapper packages
In R, it is easiest to use Web APIs through a wrapper package, an R package written specifically for a particular Web API. The R development community has already contributed wrapper packages for most large Web APIs. To find a wrapper package, search the web for “R Package” and the name of the website. For example, a search for “R Reddit Package” returns RedditExtractor and a search for “R Weather.com Package” surfaces weatherData.
This activity will build on the New York Times Web API, which provides access to news articles, movie reviews, book reviews, and many other kinds of data. Our activity will specifically focus on the Article Search API, which finds information about news articles that contain a particular word or phrase.
We will use the nytimes wrapper package that provides functions for some (but not all) of the NYTimes APIs. You can install the package by running the following in the Console:
Next, take a look at the Article Search API example on the package website to get a sense of the syntax.
Exercise: What do you think the nyt_search() function below does? How does it communicate with the NY Times? Where is the data about articles stored?
res <- nyt_search(q = "gamergate", n = 20, end_date = "20150101")
To get started with the NY Times API, you must register and get an authentication key. Signing up only takes a few seconds, and it lets the New York Times make sure nobody abuses their API for commercial purposes. The key also lets them rate limit the API, ensuring that programs don’t make too many requests per day. For the NY Times API, this limit is 1000 calls per day. Be aware that most APIs have rate limits, especially for their free tiers.
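One common way to stay under a rate limit is to pause between requests. The sketch below is only an illustration: `fetch_politely()` and its `pause` argument are hypothetical names, not part of the nytimes package, and the stand-in `fetch` function used in the example does not contact any server.

```r
# Hypothetical helper: run `fetch` once per query, pausing between calls
# so that a burst of requests does not exceed the API's rate limit.
fetch_politely <- function(queries, fetch, pause = 1) {
  results <- vector("list", length(queries))
  for (i in seq_along(queries)) {
    results[[i]] <- fetch(queries[[i]])
    # No need to wait after the final request
    if (i < length(queries)) Sys.sleep(pause)
  }
  names(results) <- queries
  results
}

# Stand-in for a real API call (assumption: a real `fetch` would call the API)
fake_fetch <- function(q) toupper(q)

out <- fetch_politely(c("gamergate", "zillow"), fake_fetch, pause = 0.1)
names(out)
```

In real use, `fetch` could be a small function that wraps a single API call, and `pause` would be chosen to fit the limits the provider documents.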
Once you have signed up and verified your email, log back in to https://developer.nytimes.com. Under your email address, click on Apps and Create a new App (call it First API) and enable Article Search API, then press Save. This creates an authentication key, which is a 32-character string with numbers and the letters a-e.
Store this in a variable as follows (this is just an example ID, not an actual one):
# Change value to your personal API key
times_key <- "c935b213b2dc1218050eec976283dbbd"
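If you would rather not paste a key into a file that you might share, one option is to keep it in an environment variable (for example, set in your .Renviron file) and read it at run time. This is a minimal sketch of that idea; `get_times_key()` is a hypothetical helper name, and `NYTIMES_KEY` is the same variable name used with Sys.setenv() in this lesson.

```r
# Hypothetical helper: read the API key from an environment variable
# and fail with a clear message if it has not been set.
get_times_key <- function(var = "NYTIMES_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Usage (assumes NYTIMES_KEY was set beforehand, e.g. in .Renviron):
# times_key <- get_times_key()
```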
Now, let’s use the key to issue our first API call. We’ll adapt the code we see in the vignette to do what we need.
library(nytimes)
# Tell nytimes what our API key is
Sys.setenv(NYTIMES_KEY = times_key)
# Issue our first API call
res <- nyt_search(q = "gamergate", n = 20, end_date = "20150101")
# Convert response object to data frame
res <- as.data.frame(res)
Something magical just happened. Your computer sent a message to the New York Times and asked for information about 20 articles about Gamergate starting at January 1, 2015 and going backwards in time. Thousands of public Web APIs allow your computer to tap into almost any piece of public digital information on the web.
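To make that message less mysterious, here is a rough sketch of the kind of URL a wrapper function presumably assembles for the Article Search API. The helper `build_search_url()` is a hypothetical name for illustration, and the code only builds the URL string; it does not send a request. `"YOUR_KEY"` is a placeholder, not a working key.

```r
# Endpoint of the NY Times Article Search API
base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Hypothetical helper: combine the endpoint with query parameters
build_search_url <- function(q, end_date, api_key) {
  params <- c(q = q, end_date = end_date, `api-key` = api_key)
  # Percent-encode each value so spaces and symbols are safe in a URL
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  # Join "name=value" pairs with "&" and attach them after "?"
  query <- paste0(names(encoded), "=", encoded, collapse = "&")
  paste0(base_url, "?", query)
}

search_url <- build_search_url("gamergate", "20150101", "YOUR_KEY")
print(search_url)
```

Everything after the `?` is a set of name=value pairs separated by `&`, which is the part you would adjust to change the search term or the date range.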
Let’s take a peek at the structure of the results:
colnames(res)
[1] "id" "abstract" "byline" "document_type"
[5] "headline" "keywords" "lead_paragraph" "multimedia"
[9] "news_desk" "print_page" "print_section" "pub_date"
[13] "section_name" "snippet" "source" "subsection_name"
[17] "type_of_material" "uri" "web_url" "word_count"
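Some of these columns, such as keywords, pack several values into a single string joined by "+", as the output of head(res) shows. A minimal sketch of pulling those values apart follows; it uses a small toy data frame (two rows copied from the results) rather than a live API call, and the object names are only for illustration.

```r
# Toy data frame with two rows copied from the search results
toy <- data.frame(
  headline = c(
    "ThinkUp Helps the Social Network User See the Online Self",
    "Jane Pratt on Why Writing for Young Women Never Gets Old"
  ),
  keywords = c(
    "Social Media+ThinkUp LLC+Facebook Inc+Twitter+Dash, Anil+Trapani, Gina",
    "Pratt, Jane+xoJane.com+Women and Girls+Magazines"
  ),
  stringsAsFactors = FALSE
)

# Split each keyword string on the literal "+" to get a list of vectors
keyword_list <- strsplit(toy$keywords, "+", fixed = TRUE)
names(keyword_list) <- toy$headline

# Number of keywords per article, and the first article's keywords
lengths(keyword_list)
keyword_list[[1]]
```

The result is a list with one character vector per article, which can be subset with `[[ ]]` like any other list.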
head(res)
id
1 nyt://article/08765e5b-8d12-54dd-be58-39c6d33125c1
2 nyt://article/5e97a537-4e5d-51b4-9571-736913a6e5c4
3 nyt://article/ebad4be5-8e52-5490-a3bb-f5d8f684b902
4 nyt://interactive/26986d5d-2854-5484-86b8-04dbbfea0b27
5 nyt://article/fe201e9b-ea3b-5e9c-bb5a-ef4b0bbd30c4
6 nyt://article/7574b532-e51f-5695-80aa-a5b24ee4e7a2
abstract
1 A service lets a person monitor his or her Facebook or Twitter account, for more awareness of one’s online image.
2 Get recommendations from New York Times reporters and editors, highlighting great stories from around the web. Today, great reads from Dean Baquet, Susan Chira and others.
3 The answers for our ninth annual Op-Ed quiz.
4 Highlights from the year, as chosen by the editors of The New York Times.
5 The women’s-magazine editor on the Internet, feminism and reading the comments.
6 Social media companies are often reluctant to become arbiters of appropriate and inappropriate speech online.
byline document_type
1 By Farhad Manjoo article
2 By The New York Times article
3 By Ben Schott article
4 <NA> multimedia
5 Interview by Susan Dominus article
6 By Jenna Wortham article
headline
1 ThinkUp Helps the Social Network User See the Online Self
2 What We’re Reading: Great Times Reads of 2014 From Our Top Editors
3 2014: The Year in Questions – Quiz Answers
4 The Best of 2014
5 Jane Pratt on Why Writing for Young Women Never Gets Old
6 Trying to Swim in a Sea of Social Media Invective
keywords
1 Social Media+ThinkUp LLC+Facebook Inc+Twitter+Dash, Anil+Trapani, Gina
2 News and News Media+Newspapers+Baquet, Dean+Barry, Ellen+Chira, Susan+Duenes, Steve+Fisher, Ian+Lacey, Marc+Slackman, Michael+New York Times+AFRICA+Delhi (India)+Iraq
3 <NA>
4 <NA>
5 Pratt, Jane+xoJane.com+Women and Girls+Magazines
6 Computers and the Internet+Cyberharassment+Social Media+Facebook Inc+Twitter+Yik Yak Inc
lead_paragraph
1 Anil Dash, a longtime tech entrepreneur and blogger, was recently studying a list of the top words he had used on Twitter over the course of a month during the fall. Mr. Dash has half a million followers on Twitter, and like a lot of people in tech and media circles, he uses the social network to chat with colleagues, to pontificate about technology, politics and pop culture, and to participate in a lot of in-jokes.
2 Get recommendations from New York Times reporters and editors, highlighting great stories from around the web. What We’re Reading emails are sent twice a week. Sign up »
3 1. B – Uber
4 Highlights from the year, as chosen by the editors of The New York Times.
5 The editor talks with Susan Dominus about navigating the pitfalls of online publishing.
6 Over the last few months, I’ve watched friends and colleagues endure endless harassment on Twitter. Strangers have hurled offensive, racist names and gendered insults, relentlessly and with little fear of consequence. I’ve come across blog posts that capture similarly awful experiences.
multimedia
1 images/2015/01/01/technology/personaltech/01state-illo/01state-illo-thumbWide.jpg+... [truncated]
2 images/2014/12/31/business/wwrn-ebola/wwrn-ebola-thumbWide.jpg+... [truncated]
3 images/2014/12/30/opinion/30schott/30schott-thumbWide.jpg+... [truncated]
4 images/2014/12/22/multimedia/the-best-of-2014-1419258019643/the-best-of-2014-1419258019643-thumbWide.jpg+... [truncated]
5 images/2014/12/21/magazine/21talk/21talk-videoSmall-v3.jpg+... [truncated]
6 images/2014/12/14/business/14-BITS/14-BITS-thumbWide.jpg+... [truncated]
news_desk print_page print_section pub_date
1 Business 1 B 2014-12-31 22:31:54
2 <NA> <NA> 2014-12-30 22:06:27
3 OpEd 19 A 2014-12-29 23:20:06
4 Multimedia/Photos <NA> <NA> 2014-12-22 18:25:37
5 Magazine 14 MM 2014-12-19 15:24:37
6 Business 4 BU 2014-12-13 17:37:26
section_name
1 Technology
2 Blogs
3 Opinion
4 Multimedia/Photos
5 Magazine
6 Technology
snippet
1 A service lets a person monitor his or her Facebook or Twitter account, for more awareness of one’s online image.
2 Get recommendations from New York Times reporters and editors, highlighting great stories from around the web. Today, great reads from Dean Baquet, Susan Chira and others.
3 The answers for our ninth annual Op-Ed quiz.
4 Highlights from the year, as chosen by the editors of The New York Times.
5 The women’s-magazine editor on the Internet, feminism and reading the comments.
6 Social media companies are often reluctant to become arbiters of appropriate and inappropriate speech online.
source subsection_name type_of_material
1 The New York Times Personal Tech News
2 The New York Times <NA> News
3 The New York Times <NA> Op-Ed
4 The New York Times <NA> Interactive Feature
5 The New York Times <NA> News
6 The New York Times <NA> News
uri
1 nyt://article/08765e5b-8d12-54dd-be58-39c6d33125c1
2 nyt://article/5e97a537-4e5d-51b4-9571-736913a6e5c4
3 nyt://article/ebad4be5-8e52-5490-a3bb-f5d8f684b902
4 nyt://interactive/26986d5d-2854-5484-86b8-04dbbfea0b27
5 nyt://article/fe201e9b-ea3b-5e9c-bb5a-ef4b0bbd30c4
6 nyt://article/7574b532-e51f-5695-80aa-a5b24ee4e7a2
web_url
1 https://www.nytimes.com/2015/01/01/technology/personaltech/thinkup-helps-the-social-network-user-see-the-online-self.html
2 https://news.blogs.nytimes.com/2014/12/30/what-were-reading-great-times-reads-of-2014-from-our-top-editors/
3 https://www.nytimes.com/2014/12/30/opinion/2014-the-year-in-questions-quiz-answers.html
4 https://www.nytimes.com/interactive/2014/multimedia/the-best-of-2014.html
5 https://www.nytimes.com/2014/12/21/magazine/jane-pratt-on-why-writing-for-young-women-never-gets-old.html
6 https://bits.blogs.nytimes.com/2014/12/13/trying-to-swim-in-a-sea-of-social-media-invective/
word_count
1 1211
2 429
3 588
4 0
5 716
6 1236
Accessing web APIs directly
Wrapper packages such as nytimes provide a convenient way to interact with Web APIs. However, many Web APIs have incomplete wrapper packages, or no wrapper package at all. Fortunately, most Web APIs share a common structure that R can access relatively easily. There are two parts to each Web API:
- The request: this amounts to calling a function that gets sent to a web server.
  - In our nyt_search(q = "gamergate", n = 20, end_date = "20150101") example, q, n, and end_date are arguments to an article search function.
- The response: the web server computes the result of the function call and returns the response.
  - The web server runs a search with the q, n, and end_date arguments to get the search results.
As mentioned earlier, a Web API call differs from a regular function call in that the request is sent over the Internet to a web server, which performs the computation and sends the result back over the Internet to the original computer.
Web API requests
The request for a Web API call is usually encoded through the URL (short for uniform resource locator), the web address associated with the API’s web server. Let’s look at the URL associated with the first nytimes nyt_search example we did. Open the following URL in your browser (you should replace MY_KEY with the API key you were given earlier).
http://api.nytimes.com/svc/search/v2/articlesearch.json?q=gamergate&api-key=MY_KEY
The text you see in the browser is the response data. We’ll talk more about that in a bit. Right now, let’s focus on the structure of the URL. You can see that it has a few parts:
- http:// is the scheme, which tells your browser or program how to communicate with the web server. This will typically be either http: or https:.
- api.nytimes.com is the hostname, which is a name that identifies the web server that will process the request.
- /svc/search/v2/articlesearch.json is the path, which tells the web server what function you would like to call.
- ?q=gamergate&api-key=MY_KEY contains the query parameters, which provide the parameters for the function you would like to call. Note that the query can be thought of as a table, where each row has a key and a value (known as a key-value pair). In this case, the first row has key q and value gamergate, and the second row has key api-key and value MY_KEY. The query parameters are preceded by a ?. Rows in the key-value table are separated by &, and the key and value within each pair are separated by an =.
key | value |
---|---|
q | gamergate |
api-key | MY_KEY |
Typically, each of these URL components will be specified in the API documentation. Sometimes, the scheme, hostname, and path (http://api.nytimes.com/svc/search/v2/articlesearch.json) will be referred to as the endpoint for the API call.
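To make the URL anatomy concrete, the sketch below splits the example URL back into its endpoint and its key-value table. It is a minimal illustration that uses base R string functions only (an assumption made so that it runs without extra packages). The names endpoint and query_table are chosen here for illustration, and MY_KEY remains a placeholder.

```r
# Split a Web API URL into its endpoint and its query parameters (base R only).
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=gamergate&api-key=MY_KEY"

# Everything before the "?" is the endpoint; everything after it is the query.
parts <- strsplit(url, "?", fixed = TRUE)[[1]]
endpoint <- parts[1]
query <- parts[2]

# Pairs are separated by "&"; the key and value within a pair by "=".
pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
query_table <- data.frame(
  key = vapply(pairs, `[`, character(1), 1),
  value = vapply(pairs, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

endpoint
query_table
```

The resulting query_table has one row per key-value pair, matching the table shown above.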
We will use the urltools package to build up a full URL from its parts. Start by creating a string with the endpoint. Then add the parameters one by one using param_set and url_encode:
library(urltools)

url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
url <- param_set(url, "q", url_encode("gamergate"))
url <- param_set(url, "api-key", url_encode(times_key))
url
[1] "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=gamergate&api-key=xy9oy1eczTOTGAFjAfnrmZJO2mpSPvXQ"
Copy and paste the resulting URL into your browser to see what the NY Times response looks like!
You may be wondering why we need to use param_set() and url_encode() instead of writing the full URL by hand. The following exercise will illustrate why we need to be careful.
Pair programming exercise: Work through the two exercises below in pairs (or triples as needed). Whoever has visited more countries in their lifetime will be driver first.
Exercise: Write a function that generalizes our URL construction steps above so that the user can input any search query (q).
- Use your function to create a URL that finds articles related to Ferris Bueller's Day Off (note the apostrophe). What is interesting about how the title appears in the URL?
- Repeat for the query Penn & Teller (make sure you use the punctuation mark &). What do you notice?
Take a look at the Wikipedia page describing percent encoding. Explain how the process works in your own words.
Solution
# Note that this function uses a times_key object that has already been created
# Another choice is to allow input of the API key as an additional argument
create_url <- function(query) {
    url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
    url <- param_set(url, "q", url_encode(query))
    url <- param_set(url, "api-key", url_encode(times_key))
    url
}
create_url(query = "Ferris Bueller's Day Off")
[1] "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=Ferris%20Bueller%27s%20Day%20Off&api-key=xy9oy1eczTOTGAFjAfnrmZJO2mpSPvXQ"
create_url(query = "Penn & Teller")
[1] "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=Penn%20%26%20Teller&api-key=xy9oy1eczTOTGAFjAfnrmZJO2mpSPvXQ"
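The encoded outputs above can be reproduced, roughly, with base R alone. The sketch below uses utils::URLencode() with reserved = TRUE as a stand-in for url_encode() (an assumption for illustration; the two functions are not guaranteed to agree on every character). It also shows why an unencoded & would be misread as a separator between key-value pairs.

```r
query <- "Penn & Teller"

# Percent-encode the value, including reserved characters such as "&".
encoded <- URLencode(query, reserved = TRUE)
encoded

# Decoding recovers the original text.
decoded <- URLdecode(encoded)
decoded

# Without encoding, splitting the query string on "&" yields two pieces,
# so a server would see a second, malformed key-value pair.
naive_query <- paste0("q=", query)
safe_query <- paste0("q=", encoded)
n_naive <- length(strsplit(naive_query, "&", fixed = TRUE)[[1]])
n_safe <- length(strsplit(safe_query, "&", fixed = TRUE)[[1]])
c(naive = n_naive, safe = n_safe)
```

Each encoded character is replaced by a percent sign followed by its two-digit hexadecimal byte value, which is why the space and the ampersand no longer appear literally.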
Exercise: Write out the pseudocode for a function that takes a data frame of arbitrarily many key-value pairs and constructs the URL. Then write the function itself. Example data frame of key-value pairs:
key_val_pairs <- tibble(
    key = c("q", "api-key", "begin_date", "end_date"),
    value = c("economy", "API_KEY", "20230101", "20231001")
)
key_val_pairs
# A tibble: 4 × 2
key value
<chr> <chr>
1 q economy
2 api-key API_KEY
3 begin_date 20230101
4 end_date 20231001
Solution
create_url_from_df <- function(df) {
    url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
    # seq_len() keeps the loop from running when df has zero rows
    for (i in seq_len(nrow(df))) {
        # get the key in the ith row...and the value
        this_key <- df$key[i]
        this_value <- df$value[i]
        url <- param_set(url, this_key, url_encode(this_value))
    }
    url
}
create_url_from_df(df = key_val_pairs)
[1] "http://api.nytimes.com/svc/search/v2/articlesearch.json?q=economy&api-key=API_KEY&begin_date=20230101&end_date=20231001"
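As a point of comparison with the loop above, the same URL can be assembled in a vectorized way. This is only a sketch under a few assumptions: it uses base R's URLencode() instead of url_encode(), a plain data.frame instead of a tibble, and the hypothetical names build_url_vectorized and example_pairs.

```r
build_url_vectorized <- function(df,
                                 endpoint = "http://api.nytimes.com/svc/search/v2/articlesearch.json") {
  # Percent-encode each value, then glue every key to its value with "=".
  encoded_values <- vapply(df$value, URLencode, character(1),
                           reserved = TRUE, USE.NAMES = FALSE)
  pairs <- paste0(df$key, "=", encoded_values)
  # Join the pairs with "&" and attach them to the endpoint after a "?".
  paste0(endpoint, "?", paste(pairs, collapse = "&"))
}

example_pairs <- data.frame(
  key = c("q", "api-key", "begin_date", "end_date"),
  value = c("economy", "API_KEY", "20230101", "20231001"),
  stringsAsFactors = FALSE
)

built_url <- build_url_vectorized(example_pairs)
built_url
```

Unlike param_set(), this version does not check whether a key is already present in the URL, so it is best suited to building a URL from scratch.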
Web API responses
For a deeper dive, consult the following readings:
- A Non-Programmer’s Introduction to JSON
- Getting Started With JSON and jsonlite
- Fetching JSON data from REST APIs
Let’s discuss the structure of the web response, the return value of the Web API function. Web APIs generate string responses. If you visited the earlier New York Times API link in your browser, you would be shown the string response from the New York Times web server:
{"status":"OK","copyright":"Copyright (c) 2023 The New York Times Company. All Rights Reserved.","response":{"docs":[{"abstract":"Who would have guessed that magic’s most recognizable buddy pair would produce the classiest reality show on television?","web_url":"https://www.nytimes.com/2019/11/26/magazine/letter-of-recommendation-penn-teller-fool-us.html","snippet":"Who would have guessed that magic’s most recognizable buddy pair would produce the classiest reality show on television?","lead_paragraph":"“Penn & Teller: Fool Us” is a reality-TV competition shown on the CW, which is a broadcast network, which is something like a streaming service that’s always on. The show was recently renewed for its seventh season. The only other person I know who watches it is a skilled amateur magician and general magic geek who lives in Chicago. For him, the show is a chance to be exposed to some of the world’s greatest magicians and get an insight into their arcane techniques. For me, who doesn’t particularly like magic and has no intention of trying to do it, the show has a different appeal: It makes me a better person.","print_section":"MM","print_page":"24","source":"The New York Times","multimedia":
If you stare very hard at the above response, you may be able to interpret it. However, it would be much easier to interact with the response in some more structured, programmatic way. The vast majority of Web APIs, including the New York Times, use a standard called JSON (JavaScript Object Notation) to take data and encode it as a string.
To understand the structure of JSON, take the NY Times web response in your browser, and copy and paste it into this online JSON formatter. The formatter will add newlines and tabs to make the data more human-readable. You’ll see the following:
{
"status":"OK",
"copyright":"Copyright (c) 2023 The New York Times Company. All Rights Reserved.",
"response":{
"docs":[
# A HUGE piece of data, with one object for each of the result articles
],
"meta":{
"hits":1755,
"offset":0,
"time":51
}
}
}
You’ll notice a few things in the JSON above:
- Strings are enclosed in double quotes, for example "status" and "OK".
- Numbers are written plainly, like 1755 or 51.
- Some data is enclosed in square brackets [ and ]. These data containers can be thought of as R lists.
- Some data is enclosed in curly braces { and }. These data containers are called Objects. An object can be thought of as a single case or observation in a table.
  - The columns or variables for the observation appear as keys on the left (hits, offset, etc.).
  - The values appear after the specific key, separated by a colon (1755 and 0, respectively).
Thus, we can think of the meta object above as:
hits | offset | time |
---|---|---|
1755 | 0 | 51 |
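To see this mapping from JSON to R without calling the API, the sketch below parses a small hand-written JSON string with jsonlite::fromJSON(). The string mimics the shape of the response above, but the two abstracts and word counts are made-up placeholders, not real NYT data.

```r
library(jsonlite)

json_text <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"abstract": "First placeholder abstract", "word_count": 100},
      {"abstract": "Second placeholder abstract", "word_count": 250}
    ],
    "meta": {"hits": 1755, "offset": 0, "time": 51}
  }
}'

parsed <- fromJSON(json_text)

# The top-level object becomes a named list.
parsed$status

# The "meta" object becomes a nested named list of numbers.
parsed$response$meta$hits

# The array of similar objects in "docs" is simplified to a data frame,
# with one row per object and one column per key.
parsed$response$docs
```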
Let’s repeat the NY Times search for “gamergate”, but this time we will perform the Web API call by hand instead of using the nytimes wrapper package. We will use the jsonlite package to retrieve the response from the web server and turn the string response into an R object. The fromJSON function sends our request over the web to the NY Times web server, retrieves the response, and turns it from a JSON-formatted string into R data.
library(jsonlite)
Attaching package: 'jsonlite'
The following object is masked from 'package:purrr':
flatten
# Rebuild the URL
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
url <- param_set(url, "q", url_encode("gamergate"))
url <- param_set(url, "api-key", url_encode(times_key))

# Send the request to the webserver over the Internet and
# retrieve the JSON response. Turn the JSON response into an
# R Object.
gamergate_json <- fromJSON(url)
gamergate_json is a list. A list is a useful structure for storing elements of different types. Data frames are special cases of lists where each list element has the same length (but where the list elements may have different classes).
Lists are a very flexible data structure but can be very confusing because list elements can be lists themselves!
We can explore the structure of a list in two ways:
- Entering View(list_object) in the Console. The triangle buttons on the left allow you to toggle dropdowns to explore list elements.
- Using the str() (structure) function.
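As a small illustration of str(), the sketch below builds a toy nested list (made up here, loosely shaped like an API response) and prints its structure at two depths. The max.level argument limits how many levels of nesting are shown.

```r
toy <- list(
  status = "OK",
  response = list(
    docs = data.frame(abstract = c("a", "b"), word_count = c(100, 250)),
    meta = list(hits = 2, offset = 0)
  )
)

# Show only the top level: the elements "status" and "response".
str(toy, max.level = 1)

# Go one level deeper to reveal "docs" and "meta" inside "response".
str(toy, max.level = 2)

# A data frame is itself a list, with one element per column.
is.list(toy$response$docs)
```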
Exercise: Explore the information in the gamergate_json object using both View() and str(). When using str(), look up the documentation and experiment with the max.level and vec.len arguments to control how the output is displayed. Look back and forth between the View() and str() output to find correspondences in how object structure is displayed.
We can access elements of a list in three ways:
- By position with double square brackets [[:
# This gets the first element of the list
gamergate_json[[1]]
[1] "OK"
- By name with double square brackets [[: (note that list elements are not always named, so this won’t always be possible)
# Accessing by name directly
gamergate_json[["status"]]
[1] "OK"
# Accessing via a variable
which_element <- "status"
gamergate_json[[which_element]]
[1] "OK"
- By name with a dollar sign $: (Helpful tip: For this mode of access, RStudio allows tab completion to fill in the full name)
gamergate_json$status
[1] "OK"
We can retrieve nested elements by sequentially accessing the object keys from the outside in. For example, the meta element would be accessed as follows:
gamergate_json$response$meta
$hits
[1] 145
$offset
[1] 0
$time
[1] 34
Exercise: In the gamergate_json object, retrieve the data associated with:
- the copyright key
- the number of hits (number of search results) within the meta object
- the abstracts and leading paragraphs of the articles found in the search
Solution
gamergate_json$copyright
[1] "Copyright (c) 2024 The New York Times Company. All Rights Reserved."
gamergate_json$response$meta$hits
[1] 145
gamergate_json$response$docs$abstract
[1] "Here’s what you need to know."
[2] "Intel’s decision added to a controversy that has focused attention on the treatment of women in the games business and the power of online mobs."
[3] "The atmosphere has become so toxic that critics and developers are urging big companies in the $70-billion-a-year video game industry to do more to stop it."
[4] "How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me."
[5] "The precursors to Gamergate were disinformation campaigns targeting women of color."
[6] "The legacy of Gamergate."
[7] "The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war."
[8] "If their bids at motherhood fail, they can then regrow their brains."
[9] "A day after SXSW Interactive canceled two video game panels related to the so-called GamerGate movement over threats of violence, two digital media organizations threatened to pull out of the tech conference."
[10] "The future of video games is threatened by the ugly culture around them."
gamergate_json$response$docs$lead_paragraph
[1] "(Want to get this briefing by email? Here’s the sign-up.)"
[2] "For a little more than a month, a firestorm over sexism and journalistic ethics has roiled the video game community, culminating in an orchestrated campaign to pressure companies into pulling their advertisements from game sites. "
[3] "Anita Sarkeesian, a feminist cultural critic, has for months received death and rape threats from opponents of her recent work challenging the stereotypes of women in video games. Bomb threats for her public talks are now routine. One detractor created a game in which players can click their mouse to punch an image of her face."
[4] "How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me."
[5] "The precursors to Gamergate were disinformation campaigns targeting women of color."
[6] "The legacy of Gamergate."
[7] "The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war."
[8] "The Indian jumping ant, Harpegnathos saltator, has many talents. This inch-long arthropod, found in flood plains across India, has a four-inch vertical leap and the ability to take down prey nearly twice its size. If that wasn’t enough, these amazing ants can also adjust the size of their own brains."
[9] "SXSW Interactive, the annual gathering of technology tastemakers and thought leaders in Austin, Tex., is facing a growing backlash over a decision to cancel two panels on video game culture, with two digital media organizations threatening to pull out of the event."
[10] "FOR more than five years, almost every word that I’ve written professionally has been about video games. I used to cover things like presidential campaigns and prison reform. But at some point, video games began to seem as consequential as those subjects, if not more so."
# Both (abstract and leading paragraph) at once
gamergate_json$response$docs[c("abstract", "lead_paragraph")]
abstract
1 Here’s what you need to know.
2 Intel’s decision added to a controversy that has focused attention on the treatment of women in the games business and the power of online mobs.
3 The atmosphere has become so toxic that critics and developers are urging big companies in the $70-billion-a-year video game industry to do more to stop it.
4 How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me.
5 The precursors to Gamergate were disinformation campaigns targeting women of color.
6 The legacy of Gamergate.
7 The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war.
8 If their bids at motherhood fail, they can then regrow their brains.
9 A day after SXSW Interactive canceled two video game panels related to the so-called GamerGate movement over threats of violence, two digital media organizations threatened to pull out of the tech conference.
10 The future of video games is threatened by the ugly culture around them.
lead_paragraph
1 (Want to get this briefing by email? Here’s the sign-up.)
2 For a little more than a month, a firestorm over sexism and journalistic ethics has roiled the video game community, culminating in an orchestrated campaign to pressure companies into pulling their advertisements from game sites.
3 Anita Sarkeesian, a feminist cultural critic, has for months received death and rape threats from opponents of her recent work challenging the stereotypes of women in video games. Bomb threats for her public talks are now routine. One detractor created a game in which players can click their mouse to punch an image of her face.
4 How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me.
5 The precursors to Gamergate were disinformation campaigns targeting women of color.
6 The legacy of Gamergate.
7 The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war.
8 The Indian jumping ant, Harpegnathos saltator, has many talents. This inch-long arthropod, found in flood plains across India, has a four-inch vertical leap and the ability to take down prey nearly twice its size. If that wasn’t enough, these amazing ants can also adjust the size of their own brains.
9 SXSW Interactive, the annual gathering of technology tastemakers and thought leaders in Austin, Tex., is facing a growing backlash over a decision to cancel two panels on video game culture, with two digital media organizations threatening to pull out of the event.
10 FOR more than five years, almost every word that I’ve written professionally has been about video games. I used to cover things like presidential campaigns and prison reform. But at some point, video games began to seem as consequential as those subjects, if not more so.
gamergate_json$response$docs %>%
    select(abstract, lead_paragraph)
abstract
1 Here’s what you need to know.
2 Intel’s decision added to a controversy that has focused attention on the treatment of women in the games business and the power of online mobs.
3 The atmosphere has become so toxic that critics and developers are urging big companies in the $70-billion-a-year video game industry to do more to stop it.
4 How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me.
5 The precursors to Gamergate were disinformation campaigns targeting women of color.
6 The legacy of Gamergate.
7 The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war.
8 If their bids at motherhood fail, they can then regrow their brains.
9 A day after SXSW Interactive canceled two video game panels related to the so-called GamerGate movement over threats of violence, two digital media organizations threatened to pull out of the tech conference.
10 The future of video games is threatened by the ugly culture around them.
lead_paragraph
1 (Want to get this briefing by email? Here’s the sign-up.)
2 For a little more than a month, a firestorm over sexism and journalistic ethics has roiled the video game community, culminating in an orchestrated campaign to pressure companies into pulling their advertisements from game sites.
3 Anita Sarkeesian, a feminist cultural critic, has for months received death and rape threats from opponents of her recent work challenging the stereotypes of women in video games. Bomb threats for her public talks are now routine. One detractor created a game in which players can click their mouse to punch an image of her face.
4 How online mobs harassed the targets of Gamergate, Christine Blasey Ford and me.
5 The precursors to Gamergate were disinformation campaigns targeting women of color.
6 The legacy of Gamergate.
7 The powerful lesson of a 5-year-old harassment campaign: How to wage a post-truth information war.
8 The Indian jumping ant, Harpegnathos saltator, has many talents. This inch-long arthropod, found in flood plains across India, has a four-inch vertical leap and the ability to take down prey nearly twice its size. If that wasn’t enough, these amazing ants can also adjust the size of their own brains.
9 SXSW Interactive, the annual gathering of technology tastemakers and thought leaders in Austin, Tex., is facing a growing backlash over a decision to cancel two panels on video game culture, with two digital media organizations threatening to pull out of the event.
10 FOR more than five years, almost every word that I’ve written professionally has been about video games. I used to cover things like presidential campaigns and prison reform. But at some point, video games began to seem as consequential as those subjects, if not more so.
Exercise: Your own article search
Select your own article search query (any topic of interest to you). You may want to play with NY Times online search or the API web search console to find a query that is interesting, but not overly popular. You can change any part of the query you would like. Your query should have at least 30 matches.
Retrieve data for the first three pages of search results from the article search API, and create a data frame that joins together the docs data frames for the three pages of results. (Read the “Multiple pages of search results” section below to see how to combine multiple pages of results with bind_rows().)
Make a plot of the number of search results over time in your result set (likely by day or month). This will involve some data wrangling. It will be helpful to have the lubridate reference page open.
Multiple pages of search results
Here is some code to generate queries on NY Times articles about the Red Sox. It fetches the first thirty entries in batches of 10.
url <- "http://api.nytimes.com/svc/search/v2/articlesearch.json"
url <- param_set(url, "q", url_encode("Red Sox"))
url <- param_set(url, "api-key", url_encode(times_key))
url <- param_set(url, "page", 0)
Sys.sleep(1)
res1 <- fromJSON(url)

# This pauses for 1 second.
# It is required when knitting to prevent R from issuing too many requests to
# the NYT API at a time. If you don't have it you will get an error that
# says "Too Many Requests (429)"
Sys.sleep(1)
url <- param_set(url, "page", 1)
res2 <- fromJSON(url)

Sys.sleep(1)
url <- param_set(url, "page", 2)
res3 <- fromJSON(url)

docs1 <- res1$response$docs
docs2 <- res2$response$docs
docs3 <- res3$response$docs
Each of these docs variables is a table with ten entries (articles) and the same 18 variables:
colnames(docs1)
colnames(docs2)
colnames(docs3)
Now we want to stack the tables on top of each other to get a single table with 30 rows and 18 variables. We can use:
bind_rows(docs1, docs2, docs3)
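The three near-identical requests above can also be written as a loop. The sketch below shows the pattern offline: fetch_page() is a hypothetical stand-in that fabricates ten placeholder rows per page, where real code would set the page parameter with param_set() and call fromJSON(url)$response$docs. It stacks the pages with base R's do.call(rbind, ...), which plays the same role here as bind_rows().

```r
# Hypothetical stand-in for one API request; returns 10 placeholder rows.
fetch_page <- function(page) {
  data.frame(
    page = page,
    headline = paste("placeholder article", page * 10 + 1:10),
    stringsAsFactors = FALSE
  )
}

# Request pages 0, 1, and 2, pausing between calls to respect rate limits.
pages <- lapply(0:2, function(p) {
  Sys.sleep(0.1)  # shortened for this sketch; the NYT example above uses 1 second
  fetch_page(p)
})

# Stack the per-page data frames into a single table.
all_docs <- do.call(rbind, pages)
dim(all_docs)
```

With dplyr loaded, bind_rows(pages) should give the same stacked table, since bind_rows() also accepts a list of data frames.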
Extra practice: Create your own public API visualization
Browse toddmotto’s list of Public APIs and abhishekbanthia’s list of Public APIs. Select one of the APIs from the list. Here are a few criteria you should consider:
- Use the JSON approach we illustrated above; not all APIs support JSON. (If you want to use an API that does not support JSON, you can check if there is an R wrapper package.)
- Stay away from APIs that require OAuth for Authorization unless you are prepared for extra work before you get data! Most of the large social APIs (Facebook, LinkedIn, Twitter, etc.) require OAuth. toddmotto’s page lists this explicitly, but you’ll need to dig a bit if the API is only on abhishekbanthia’s list.
- You will probably need to explore several different APIs before you find one that works well for your interests and this exercise.
- Beware of the rate limits associated with the API you choose. These determine the maximum number of API calls you can make per second, hour, or day. Though these are not always officially published, you can often find them with a web search (for example, GitHub API rate limit). If you need to slow your program down to stay within the API’s rate limit, insert calls to Sys.sleep(1), as is done in the example above.
- If a wrapper package is available, you may use it, but you should also try to create the request URL and retrieve the JSON data using the techniques we showed earlier, without the wrapper package.
- Visualize the data you collected and describe the results.