# Caching Response Content

You haven’t experienced it yet, but if you get complicated data back from a REST API, it may take you many tries to compose and debug code that processes that data in the way that you want. (See the Nested Data chapter.) It is a good practice, for many reasons, not to keep contacting a REST API to re-request the same data every time you run your program.

To avoid re-requesting the same data, we will use a programming pattern known as caching. It works like this:

1. Before doing some expensive operation (like calling requests.get to get data from a REST API), check whether your cache contains the results that would be generated by that operation.
2. If so, return the cached results.
3. If not, perform the expensive operation and save the results (e.g. the complicated data) in your cache so you won’t have to perform it again the next time.
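
The three steps above can be sketched as a small, self-contained example. Here the cache is a plain dictionary, and `slow_square` stands in for an expensive operation such as a network request (both names are made up for illustration):

```python
cache = {}

def slow_square(n):
    # stands in for an expensive operation, e.g. a call to requests.get
    return n * n

def cached_square(n):
    # step 1: check whether the result is already in the cache
    if n in cache:
        # step 2: return the cached result
        return cache[n]
    # step 3: do the expensive work and save the result for next time
    result = slow_square(n)
    cache[n] = result
    return result
```

The second call to `cached_square(4)` returns immediately from the dictionary, without calling `slow_square` again.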

There are three reasons why caching is a good idea when you are developing software that uses REST APIs:

• It reduces load on the website that is providing you data. It is always nice to be courteous when using other people’s resources. Moreover, some websites impose rate limits: for example, after 15 requests in a 15-minute period, the site may start sending error responses. That will be confusing and annoying for you.
• It will make your program run faster. Connections over the Internet can take a few seconds, or even tens of seconds, if you are requesting a lot of data. It might not seem like much, but debugging is a lot easier when you can make a change in your program, run it, and get an almost instant response.
• It is harder to debug the code that processes complicated data if the content that comes back can change on each run of your code. It’s amazing to be able to write programs that fetch real-time data like airport conditions or the latest tweets from Twitter. But it can be hard to debug that code if you are having problems that only occur for certain tweets (e.g. those in foreign languages). When you encounter problematic data, it’s helpful to save a copy so that you can debug your program against that saved, static copy of the data.

In our implementation of the caching pattern, we will use a python dictionary to store the results of expensive operations (the calls to requests.get()). Behind the scenes, when requests.get() is executed, it takes the url_path (the first argument to the requests.get function) and the parameters dictionary (the params argument to the requests.get function), turns them into a full url, and then fetches data from a website based on that full url. We will use that full url that gets created as a key in our caching dictionary, and the returned text from the call to requests.get() as the associated value.
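
As a rough sketch of what happens behind the scenes, the standard library’s `urllib.parse.urlencode` can combine a base URL and some key-value pairs into the kind of full URL that will serve as our cache key (the FAA URL here is just an example):

```python
from urllib.parse import urlencode

base_url = 'http://services.faa.gov/airport/status/DTW'
params = [('format', 'json')]  # key-value pairs in a fixed order

# join the base URL and the encoded parameters into one full URL;
# this full URL is what we will use as the cache key
full_url = base_url + '?' + urlencode(params)
```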

Note

In the revised version of the function requestURL below, we provide the items from the parameters dictionary as a list of tuples. This allows us to provide them in an order that we control (remember that when we extract the items or keys from a dictionary, they might come out in any order). In this case, in the canonical_order function, we provide a list of the key-value pairs, with the keys in alphabetic order.
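
To see why this matters, here is the effect of sorting on two dictionaries that hold the same key-value pairs (`canonical_order` is restated here, in a compact form, so the example is self-contained):

```python
def canonical_order(d):
    # return the key-value pairs as a list of tuples,
    # sorted alphabetically by key
    return [(k, d[k]) for k in sorted(d.keys())]

d1 = {'appid': 'abc', 'format': 'json'}
d2 = {'format': 'json', 'appid': 'abc'}

# both dictionaries hold the same pairs, so they canonicalize to the
# same list of tuples, and hence will produce the same full URL
```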

The function requestURL can be useful in some other situations as well. Notably, when a call to requests.get() fails and you don’t know why, call requestURL with the same arguments to print out the full URL and see exactly what it is. You can then copy and paste it into a browser, edit the URL and test it there, and thus see what change might be needed in your request parameters. This was discussed in a previous chapter.

The code below implements the caching pattern described above.

```python
import requests

def canonical_order(d):
    # This function accepts a dictionary as input and returns a sorted
    # list of tuples that represent its key-value pairs.
    alphabetized_keys = sorted(d.keys())
    # accumulate key-value pairs, in order of alphabetized keys
    res = []
    for k in alphabetized_keys:
        res.append((k, d[k]))
    return res

def requestURL(baseurl, params={}):
    # This function accepts a URL and a params dictionary as inputs.
    # It builds the same request that requests.get() would build from
    # those inputs, with the params keys in a canonical order (using
    # the function above), and returns the full URL of the data you
    # want to get.
    req = requests.Request(method='GET', url=baseurl, params=canonical_order(params))
    prepped = req.prepare()
    return prepped.url
```

```python
def get_with_caching(base_url, params_diction, cache_diction):
    # This function performs steps 1, 2, and 3 of the caching pattern
    # detailed above.
    full_url = requestURL(base_url, params_diction)
    # step 1
    if full_url in cache_diction:
        # step 2
        return cache_diction[full_url]
    else:
        # step 3
        response = requests.get(base_url, params=params_diction)
        cache_diction[full_url] = response.text
        return response.text
```


Now, the only problem with the code above is that the cache will disappear at the end of the execution of the Python program in which we invoke get_with_caching(). In order to preserve the cache between multiple invocations of our program, we will dump the dictionary to a file and reload it from that file.

The Python module pickle makes it easy to save a dictionary (or almost any other Python object) to a file. (If you’re interested, you can read more about it in the formal Python documentation.)
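
A minimal round trip with pickle looks like this (demo_cache.txt is a made-up filename for this sketch; note that pickled data is binary, so the file must be opened in binary mode):

```python
import pickle

data = {'http://example.com/api?format=json': '{"status": "ok"}'}

# save the dictionary to a file; 'wb' because pickled data is binary
with open('demo_cache.txt', 'wb') as fobj:
    pickle.dump(data, fobj)

# load it back (e.g. in a later run of the program); 'rb' to read binary
with open('demo_cache.txt', 'rb') as fobj:
    restored = pickle.load(fobj)
```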

Note

Data that has been pickled and saved to a file is stored in a specific binary format that makes it easy to unpickle and reuse in a Python program. So it’s important that you don’t edit cached_results.txt (or any file you save pickled data in) in a text editor, because that can lead to problems in the code that relies on the pickled object.

Here’s a version of the above code that uses the pickle module, along with an example of how we could use it with the FAA’s REST API. This is the format for caching that you should always use for getting complex data from a REST API.

Try saving this code in a file and running it multiple times. The first time, you’ll see the logging output telling you the item was retrieved from the FAA; on subsequent runs, it will say that the item was retrieved from the cache. If you want to reset the cache to empty, so that you will not have cached API data saved on your computer, just delete the file “cached_results.txt” from your file system. Or change the variable cache_fname to a different value in the code, which will cause this code to cache your data in a different file. And if you run this code with a different URL, it will save a new key-value pair in your pickled cache dictionary!

```python
import requests
import json
import pickle

cache_fname = "cached_results.txt"
try:
    # pickled data is binary, so open the file in binary mode
    fobj = open(cache_fname, 'rb')
    saved_cache = pickle.load(fobj)
    fobj.close()
except Exception:
    # no saved cache yet (or it couldn't be read): start with an empty one
    saved_cache = {}

def canonical_order(d):
    alphabetized_keys = sorted(d.keys())
    res = []
    for k in alphabetized_keys:
        res.append((k, d[k]))
    return res

def requestURL(baseurl, params={}):
    req = requests.Request(method='GET', url=baseurl, params=canonical_order(params))
    prepped = req.prepare()
    return prepped.url

def get_with_caching(base_url, params_diction, cache_diction, cache_fname):
    full_url = requestURL(base_url, params_diction)
    # step 1
    if full_url in cache_diction:
        # step 2
        print("retrieving cached result for " + full_url)
        return cache_diction[full_url]
    else:
        # step 3
        response = requests.get(base_url, params=params_diction)
        print("adding cached result for " + full_url)
        # add to the cache and save it permanently
        cache_diction[full_url] = response.text
        fobj = open(cache_fname, 'wb')
        pickle.dump(cache_diction, fobj)
        fobj.close()
        return response.text

dest_url = 'http://services.faa.gov/airport/status/DTW'
d = {'format': 'json'}
result_text = get_with_caching(dest_url, d, saved_cache, cache_fname)
print(json.loads(result_text))
```
