Allen Tran


The US Economy via the Production Graph

Dec 21, 2014

At its heart, any economy is a massive network of agents, often organized into groups known as firms, that interact with each other. A firm buys from one group of firms and sells to others (think of consumers as another kind of firm, one that also supplies labor services) in a massively convoluted way. But because we can't keep track of every agent and connection, we tend to think of the economy as sectors feeding linearly into one another: agriculture and mining goods are processed into final goods by some manufacturing sector.

In line with this, government agencies like the US Census Bureau define a standard set of industries (e.g. NAICS, SIC), aggregating establishments by primary business activity. Although aggregation makes for easier analysis, (1) these aggregate industries often make little sense (all services are arbitrarily lumped together) and, more importantly, (2) information from the links between firms is lost. Maintaining information from these links is important, as one often needs to trace out the impact of an event on other firms (e.g. which sectors are most exposed to the tech bubble bursting?).

Instead of grouping by primary business activity, here is a better approach that directly uses data on firm linkages. Represent the economy as an edge-weighted graph, with units of production as nodes and edge weights measuring the extent of trade between each pair of nodes. Group the nodes into clusters so as to minimize inter-group links and maximize intra-group links. In other words, run spectral clustering on the production/consumption graph.
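One common way to write down this grouping criterion is the ratio cut. This is an assumption for illustration; the post does not say which objective is being optimized. Here w_uv is the symmetric trade weight between nodes u and v, and A_1, ..., A_k are the groups:

```latex
\mathrm{RatioCut}(A_1,\dots,A_k)
  = \frac{1}{2}\sum_{i=1}^{k} \frac{\mathrm{cut}(A_i,\bar{A}_i)}{|A_i|},
\qquad
\mathrm{cut}(A,\bar{A}) = \sum_{u \in A,\; v \notin A} w_{uv}
```

Minimizing this exactly is NP-hard. Relaxing the problem leads to the eigenvectors of the graph Laplacian L = D - W, where W is the weight matrix and D is the diagonal matrix of node degrees d_u = sum over v of w_uv.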

  1. Create a symmetric adjacency matrix from the production/consumption graph (I use a combination of the use and make input-output matrices from the BEA)
  2. Calculate the Laplacian of the graph
  3. Cluster on the eigenvectors corresponding to the smallest eigenvalues above the first, trivial eigenvalue
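The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up data, not the code behind the figure. The assumptions are mine: the flow matrix is symmetrized as F + F^T, the Laplacian is the unnormalized L = D - W, the graph is connected, k - 1 non-trivial eigenvectors are used for k clusters, and the clustering step is a small hand-rolled k-means. All function names and the toy `flows` matrix are hypothetical.

```python
import numpy as np


def symmetric_adjacency(flows):
    """Step 1: symmetrize a directed flow matrix (assumed rule: W = F + F^T)."""
    flows = np.asarray(flows, dtype=float)
    adjacency = flows + flows.T
    np.fill_diagonal(adjacency, 0.0)  # drop within-industry trade
    return adjacency


def spectral_embedding(adjacency, n_vectors):
    """Steps 2-3a: eigenvectors of L = D - W for the smallest non-trivial eigenvalues."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues are returned in ascending order
    # For a connected graph, column 0 is the trivial constant eigenvector; skip it.
    return eigvecs[:, 1:1 + n_vectors]


def kmeans(points, k, n_iter=100):
    """Step 3b: plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_industries(flows, n_clusters):
    adjacency = symmetric_adjacency(flows)
    embedding = spectral_embedding(adjacency, n_clusters - 1)
    return kmeans(embedding, n_clusters)


# Toy data: industries 0-2 trade heavily with each other, as do 3-5,
# with only a trickle of trade between the two groups.
flows = np.array([
    [0, 9, 8, 0, 1, 0],
    [7, 0, 9, 0, 0, 0],
    [8, 6, 0, 1, 0, 0],
    [0, 0, 0, 0, 9, 7],
    [0, 0, 0, 8, 0, 9],
    [0, 1, 0, 9, 8, 0],
])
labels = cluster_industries(flows, n_clusters=2)
print(labels)
```

On this toy matrix the two tightly linked trios end up in separate clusters. With real input-output tables, a normalized Laplacian or a library implementation of spectral clustering may behave better when industries differ greatly in size.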

Figure 1: Industries from clustering on production/consumption networks

The results of aggregating 6-digit industries into 10 clusters are shown above, with colors indicating membership of both row and column industries in a clustered sector (black cells are mismatched industries). The percentage values reflect the degree of overlap between the two industries. There are a few fun things to note about the results.

The one downside of this exercise is that I had to start from 6-digit NAICS industries. Check back soon and I'll have a version using the Commodity Flow Survey, which measures actual shipments between industries in geographic areas. Fun times.