Sunday, November 23, 2014

Not your father's price index: the Billion Prices Project

The price of 52 Samsung TVs gathered by the BPP, April 2008 - November 2009 (Cavallo)

In a previous post, I mentioned that the Billion Prices Project (BPP) contradicts the claims of those who believe that the government understates inflation. The BPP crawls major US retailers' websites and scrapes them for price data, compiling an overall US Daily Index that is available on its website. The deviation between this index and the official CPI is minimal, as the above link shows.

The BPP isn't your father's price index, so it shouldn't be viewed as a perfect substitute for the CPI. Use it wisely. What follows are a few details that I've gleaned from several papers on online price indexes, as well as from my correspondence with Roberto Rigobon, one of the project's founders.

The most obvious difference between the two indexes lies in their datasets:

1) Online vs offline: The price data used to generate the CPI is harvested by Bureau of Labor Statistics (BLS) inspectors who trudge through brick-and-mortar retailers. Rigobon and his co-founder Alberto Cavallo get their data by sending out lightning-fast algorithms to scrape the websites of online retailers.

2) Wide vs Narrow: BLS inspectors compile prices on a wide range of consumer goods and services. According to Cavallo, only 60% of the items in the CPI are available online. Tracking service prices online is particularly difficult, since most large retailers' websites sell only goods.

Let's get into some more specifics about what is included in the BPP, because there seems to be some confusion about this in online discussions. Some commentators have claimed that the BPP doesn't include gasoline prices. Rigobon informs me that this is wrong: gas prices are included in the US Daily Index. As for the cost of housing, my understanding is that the BPP does track real estate data, incorporating these prices using the same methodology as the BLS. So any deviation between the BPP and the CPI should not be attributed to the BPP's lack of either gas prices or housing.

Lastly, although service prices are under-represented online, the BPP's US Daily Index does include a number of services. According to Rigobon, the easiest ones to track are things like health insurance, transportation, restaurants, hotels, and haircuts. Others, like the cost of public education, are hard to track. My understanding is that Rigobon and Cavallo may use proprietary methods that calculate service prices by using the prices of various goods as proxies (see here). For instance, this BIS comment on the BPP notes that the price of education can be computed from the prices of textbooks, uniforms, energy, and construction materials, which together represent 75% of the cost of education.
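To make the proxy idea concrete, here is a minimal Python sketch of a share-weighted proxy price for education. The four components come from the BIS comment, but the individual cost shares are invented for illustration. Rescaling the shares so that the untracked 25% of costs is assumed to move with the tracked 75% is also an assumption here, not a documented BPP method.

```python
# Hypothetical cost shares. Per the BIS comment, these four inputs make up
# roughly 75% of the cost of education; the split between them is invented.
COST_SHARES = {
    "textbooks": 0.20,
    "uniforms": 0.10,
    "energy": 0.15,
    "construction_materials": 0.30,
}


def proxy_price_relative(prices_then, prices_now, shares=COST_SHARES):
    """Share-weighted average of component price relatives.

    The shares are rescaled to sum to 1, which assumes (hypothetically)
    that the untracked 25% of costs moves with the tracked 75%.
    """
    total = sum(shares.values())
    return sum(
        (weight / total) * (prices_now[item] / prices_then[item])
        for item, weight in shares.items()
    )


if __name__ == "__main__":
    then = {item: 100.0 for item in COST_SHARES}
    now = {"textbooks": 110.0, "uniforms": 100.0,
           "energy": 120.0, "construction_materials": 100.0}
    # The education "price" rises about 6.7% on these invented numbers.
    print(round(proxy_price_relative(then, now), 3))  # 1.067
```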

3) Often vs rare: The BPP's algorithms trawl retailer websites every day. BLS inspectors stroll through the malls just once each month.

Another big difference is in the publication of the data:

4) Now vs later: The BPP is released to subscribers three days after the data are gathered, and to non-subscribers after ten days. The CPI is reported with a long delay, usually in the second or third week following the month being covered.

The next few differences are a little more technical:

5) Fixed vs Responsive: The two indexes measure quite different consumption baskets. The BLS surveys U.S. households every few years to gather information about their spending habits. It uses this information to construct a fixed, representative basket of goods & services consumed by Americans, and then fills in the price data each month. Because of this survey approach, the CPI basket is slow to adjust to new products. Should a revolutionary device, say a universal mind reader, suddenly become popular, it won't be reflected in the CPI until the next survey.

Think of the BPP as capturing a dynamic, market-determined consumption basket. The BPP basket is composed of whatever goods retailers happen to be selling online that day to meet customer demand. Because retailers constantly update their websites, August 7's basket could differ from August 8's. This means that new goods are quickly incorporated into Rigobon and Cavallo's inflation calculation. In other words, when universal mind readers do catch on, the BPP will incorporate them well before the BLS does.
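The fixed-basket logic behind item 5 can be sketched as a textbook Laspeyres index: price the base-period basket at today's prices and divide by its base-period cost. This is a simplification (the actual CPI is a more elaborate modified-Laspeyres calculation), and the goods, quantities, and prices below are hypothetical. The point is that a good outside the basket, like the mind reader, has no effect on the index whatever happens to its price.

```python
def laspeyres(base_prices, current_prices, base_quantities):
    """Textbook fixed-basket (Laspeyres) index.

    Cost of the base-period basket at current prices divided by its cost at
    base-period prices. Goods outside the basket get no weight at all.
    """
    cost_now = sum(q * current_prices[g] for g, q in base_quantities.items())
    cost_base = sum(q * base_prices[g] for g, q in base_quantities.items())
    return cost_now / cost_base


if __name__ == "__main__":
    # Hypothetical basket fixed at the last expenditure survey.
    basket = {"bread": 10, "tv": 1}
    base = {"bread": 2.0, "tv": 500.0}
    # The mind reader is now on sale, but it is not in the basket.
    today = {"bread": 2.2, "tv": 480.0, "mind_reader": 300.0}
    print(round(laspeyres(base, today, basket), 3))  # 502 / 520, about 0.965
```

Tripling the mind reader's price would leave this index unchanged until the basket is re-surveyed.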

One of the most interesting differences is the difference in methodology:

6) Small vs Large Sample size: The BLS delicately samples offline prices whereas the BPP bulldozes through a large percentage of the entire population of online retailers' prices.

There are millions of goods sold in the US, and it would be cruel to force BLS inspectors to collect prices for all of them. To simplify the calculation, the BLS brain trust chooses individual products to serve as ideal representatives of given product categories. Take dishwashers. To represent that category, they might select the Whirlpool WTD-10 or some such model. A BLS data collector in New York City will go every month to a specific store, say Macy's on West 34th St, and grab that specific model's price. Repeatedly using the same product and location ensures that the New York City dishwasher price index is not corrupted by changes that have little to do with purchasing power. (Alternating between Macy's on West 34th and Nordstrom on Union Square, for instance, might introduce price changes that have little to do with inflation.)

Because their algorithms are whip-fast and don't require salaries, Cavallo and Rigobon can afford to send them to Macy's website each day to gather the price of every single dishwasher. They do the same for each of the major online retailers, say Walmart, Target, and Best Buy. The final assemblage represents something close to the entire population of online US dishwasher prices, every single day!
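One way to summarise a full day's scrape for a single category is to take the geometric mean of the price relatives of every product listed on both days (a Jevons index). The BPP's actual formula isn't public, so treat this choice as an assumption. "Model A" and "Model B" and all prices are hypothetical. The example contrasts a single tracked model with the whole matched population.

```python
import math


def jevons(prices_before, prices_after):
    """Geometric mean of price relatives for products listed on both dates."""
    matched = prices_before.keys() & prices_after.keys()
    if not matched:
        raise ValueError("no products observed on both dates")
    log_sum = sum(math.log(prices_after[p] / prices_before[p]) for p in matched)
    return math.exp(log_sum / len(matched))


if __name__ == "__main__":
    # Hypothetical scrape of one retailer's dishwasher page on two days.
    yesterday = {"Whirlpool WTD-10": 400.0, "Model A": 500.0, "Model B": 600.0}
    today = {"Whirlpool WTD-10": 400.0, "Model A": 550.0, "Model B": 600.0}
    # A single representative model sees no change at all...
    print(today["Whirlpool WTD-10"] / yesterday["Whirlpool WTD-10"])  # 1.0
    # ...while the full population registers Model A's 10% increase.
    print(round(jevons(yesterday, today), 4))  # 1.0323
```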

This segues into the thorny problem of adjusting for quality changes, where the two indexes use different techniques:

7) Statistical vs market-based quality adjustments: As I pointed out, the BLS samples one good to represent a given category rather than canvassing the full range of products within that category. This causes some difficulties in accounting for quality changes when that one good is replaced by another product.

Let's return to the Macy's example. Say Macy's stops stocking the Whirlpool WTD-10. On arriving at Macy's a few weeks later, our flummoxed BLS data collector has to find a replacement in order to keep the dishwasher price category up to date. Let's say she grabs the price of a General Electric XK-400 from across the aisle. The GE is priced $50 higher than the missing Whirlpool was during the inspector's previous visit. The problem is this: how does the BLS determine how much of that $50 increase is due to a change in quality and how much is due to inflation? If the GE is identical to the Whirlpool in every way except that it boasts a turbo wash option, then some portion of the $50 increase reflects the GE's higher quality. But how much?

Because Cavallo and Rigobon's tireless algorithms regularly retrieve the prices of many products in each category, rather than a single monthly representative, they can use the overlap in the data to seamlessly splice in new products. Say the more expensive GE dishwasher is introduced on Macy's website. It is sold on the same page as the existing, cheaper Whirlpool for a few days, after which the Whirlpool is removed. On the day the GE first appears, the BPP ascribes its higher price to its superior quality. Although the GE drives up the average price on Macy's dishwasher page, a Macy's shopper's purchasing power hasn't been altered; rather, the GE's higher price buys correspondingly more 'dishwashing services'. Only on day 2, after the algorithms have retrieved the GE's price a second time, is it allowed to start affecting the index, since any price change from then on is considered to be due to inflation, not quality.
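The overlap splice can be sketched by chaining day-to-day matched comparisons. A product only contributes once it has been observed on two consecutive days, so the GE's $50 premium on its debut day is treated as quality rather than inflation. This is a minimal sketch that assumes a geometric mean of matched price relatives; the BPP's actual method is proprietary, and the prices are hypothetical apart from the $50 gap.

```python
import math


def chained_index(daily_prices):
    """Chain day-to-day matched-model comparisons into a cumulative index.

    daily_prices is a list of {product: price} dicts, one per day. A product
    only enters a comparison when it is listed on two consecutive days, so
    the price level at which a new model debuts is treated as quality.
    """
    index = [1.0]
    for before, after in zip(daily_prices, daily_prices[1:]):
        matched = before.keys() & after.keys()
        if not matched:
            raise ValueError("no overlap between consecutive days")
        mean_log = sum(math.log(after[p] / before[p]) for p in matched) / len(matched)
        index.append(index[-1] * math.exp(mean_log))
    return index


if __name__ == "__main__":
    # Hypothetical daily scrapes of one dishwasher page.
    days = [
        {"Whirlpool WTD-10": 400.0},
        {"Whirlpool WTD-10": 400.0, "GE XK-400": 450.0},  # GE debuts
        {"Whirlpool WTD-10": 400.0, "GE XK-400": 450.0},
        {"GE XK-400": 450.0},                             # Whirlpool removed
        {"GE XK-400": 459.0},                             # GE price up 2%
    ]
    # Simple average price on the page: [400.0, 425.0, 425.0, 450.0, 459.0]
    print([round(sum(d.values()) / len(d), 2) for d in days])
    # Chained index: [1.0, 1.0, 1.0, 1.0, 1.02]
    print([round(v, 4) for v in chained_index(days)])
```

The page's average price climbs from $400 to $450 as the GE replaces the Whirlpool, but the index stays flat until the GE's own price rises on the last day.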

The assumption that the GE's premium is due entirely to quality is based on the idea that market prices are accurate measures of all that is known by producers and consumers about a given set of products.

Because CPI collectors have limited resources and typically collect the price of only one representative dishwasher, the BLS usually can't rely on overlapping model prices to measure quality changes. One method it has developed to compute quality changes is hedonic regression. In brief, a dishwasher is conceptually broken up into a package of characteristics, including its size, time per run, energy efficiency, etc. When Macy's suddenly drops the Whirlpool and adds the GE, the BLS determines which new characteristics have been incorporated in the GE and then uses regression methods to estimate the dollar value of those characteristics.
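Here is a toy version of the hedonic approach: regress dishwasher prices on their characteristics, then strip the estimated value of the GE's turbo wash out of its price before comparing it with the Whirlpool's. The data are invented, and the linear-in-dollars specification is a simplification (actual BLS hedonic models use many more characteristics and are typically specified in log prices). OLS is solved with a small hand-rolled solver to keep the sketch dependency-free.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hedonic_ols(characteristics, prices):
    """OLS of price on an intercept plus each characteristic.

    Solves the normal equations (X'X) beta = X'y and returns
    [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0, *row] for row in characteristics]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, prices)) for i in range(k)]
    return solve(xtx, xty)


def quality_adjusted_change(old_price, new_price, feature_value):
    """Price change after removing the estimated value of the added feature."""
    return (new_price - feature_value) / old_price - 1.0


if __name__ == "__main__":
    # Invented cross-section: (has_turbo_wash, place_settings) -> price.
    features = [(0, 12), (0, 14), (1, 12), (1, 14), (0, 16)]
    prices = [380.0, 410.0, 410.0, 440.0, 440.0]
    intercept, turbo_value, per_setting = hedonic_ols(features, prices)
    print(round(turbo_value, 2))  # 30.0
    # Whirlpool (no turbo) was $400; the otherwise identical GE is $450.
    print(round(quality_adjusted_change(400.0, 450.0, turbo_value), 3))  # 0.05
```

With these made-up numbers, turbo wash is worth about $30, so only $20 of the $50 gap, a 5% rise, counts as inflation.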

So, to sum up: to account for quality changes, the BPP piggybacks on the market's ability to price differences in quality, while the BLS uses econometric methods (among other tools).

Here is a big one, the difference in ownership of the indexes:

8) Private vs public: The CPI is compiled by the BLS and funded by taxpayers, whereas Rigobon and Cavallo have incorporated a private company called PriceStats to compile the BPP US Daily Index along with many other indexes. PriceStats works in partnership with financial giant State Street to distribute the data to paying subscribers.

Which leads into the last major difference:

9) Transparent vs opaque: There is loads of documentation on the CPI. If you have any questions, call up the BLS and a researcher will walk you through it; it's your right as a taxpayer. PriceStats can only reveal so much because its methods are proprietary (although Dr Rigobon was kind enough to answer a number of my questions). I suspect they are hesitant to reveal too much because the retailers whose prices they gather might view such disclosure as threatening. Not so with the CPI.

So those are some of the features of each index. In the case of the BPP, the difficulty of getting public information on its methodology is probably the biggest bug, although the founders are forthcoming on general questions. If national statistics agencies start adopting the BPP's data collection methods, the transparency problem may be solved, since public agencies have no competitive reasons (and fewer legal ones) to withhold information on methodology. There seem to be rumblings in this direction: Statistics New Zealand says that it is in the early stages of a collaboration with PriceStats to develop online price indexes.

For now, the public is lucky to get access to the US Daily Index, even on a 10-day delay. When CPI numbers are reported, the bond market quakes. For hedge funds, getting a hint of the upcoming government inflation print before anyone else is probably worth a lot of money. No doubt that's why they are willing to pay for a subscription to PriceStats' numbers. These funds would probably prefer that the public not be privy to the US Daily Index, since public access reduces the information's value. The amount they'd be willing to pay PriceStats to yank the US Daily Index from the public domain would be a good indicator of the value the public gains from free access to it. It could be a substantial number.


Back to my initial reason for writing about the BPP: the gold bugs (gulp, you thought I'd forgotten about you, right?). Your typical gold bug will sagely mention some esoteric price that has risen at an incredible rate over the last few years, like the price of shiitake mushrooms or a 1982 G.I. Joe Snake Eyes collector's action figure (here is Peter Schiff using the Big Mac). A gold bug is convinced that their preferred data series is sufficiently strong evidence to justify declaring inflation stratospheric and the entire CPI null and void.

What makes gold bugs think that their one or two pet prices are a better measure of the dollar's purchasing power than the BPP US Daily Index? Crickets. That sums up the gold bug response to the BPP's existence. If not crickets, then desperate attempts to change the subject.

Gold bugs don't like to talk about the BPP because they don't want to be dissuaded from their views; they find too much comfort in them. With the BPP continuing to move in line with the CPI, the gold bug community's cognitive dissonance is growing. At some point, the squirm-level will get large enough that they'll have to do something about it. No doubt the easiest route will be to come up with a fiction that discredits the BPP US Daily Index. Well, hey gold bugs, here's a conspiracy theory you can use to save yourselves some painful cognitive dissonance... the Billion Prices Index went offline for a period of time, just when it appeared to be showing a break with the CPI. When it went back online, the two started to converge. Could it be that Rigobon and Cavallo were brought into some FBI dungeon and re-programmed, the BPP moving more in line with the party line after they emerged? Yeah, that's it.


  1. Another good informative post JP.

    On 5: but how does BPP decide on the weighting of goods in the basket? BLS uses consumer expenditure to weight the items. Does BPP just weight them all equally?

    1. Hi Nick, my understanding is that the BPP adopts the CPI's weightings.

  2. Very nice post. But I wonder what you think about the argument advanced by Alchian and Klein that an inflation rate measurement must include asset prices?

    1. I agree that if you really want to measure the purchasing power of a dollar, you need to include asset prices. More specifically, you should measure all transactions, with (financial) assets being an important constituent of that population, as are intermediate goods and used goods.

      Measuring these prices is the difficult part. Also tricky is accounting for quality changes in assets like stocks, a point brought up in the Alchian paper. Are the costs of compiling such an index large enough to justify the expense? Alchian and Klein thought so:

      "...we believe that the marginal cost of improving a price index along these
      lines is less than the marginal gains of improved monetary and fiscal policy consequent to less misleading indicators of inflation"

  3. Thought you might like this. I can't believe I never once realized that Star Wars is a Western. An "oh, duh" moment.

    Btw, any thoughts on XRP--someone painting the tape, or just speculative mania? Also, can you explain coinmarketcap's $450m "market cap value" for XRP? On Ripple Charts, it says $1.4B. I get that it includes all 99B XRP. But the pie chart just shows a few million dollars worth being held on exchanges. Are most of the other 31B XRP "in the wild" being held in user accounts at Ripple Trade?

  4. I'm not a cricket.
    You want to know why they don't diverge?
    First show me exactly how the bpp does it. No black boxes that replicate the bls, ok? That is, not uncoincidentally, just like the global warming scam.