Part 3: Why Language (policy) + Data (structured) = Predictive Analytics

Updated: May 14, 2019


Part 1 of this series explained why and how the language of public policy is profoundly different from the vernacular. The short version: every time policymakers communicate, they do so with a specific intention of taking action and/or signaling the action they want to take. Their words are action. Every time.

Part 2 focused on how natural language processing provides the mechanism to translate words into integers (structured data) representing the type of action conveyed by the words policymakers use. This is not just alternative data; it is a radically new kind of data.

This Part 3 explains how viewing language-based data visualizations delivers predictive analytics regarding public policy. Skip to the bottom for a fun video illustrating in under 2 minutes the operational efficiencies and strategic benefits associated with this technology.

Superforecasting & Nowcasting

People tend to fall into two camps when it comes to assessing public policy. They either believe that policymaking is random and unpredictable or they believe that the only way to anticipate a policy outcome is to be “in the room where it happens” as a lobbyist or a policymaker. In general, capital market participants tend to fall in the first camp and lobbyists tend to fall in the second camp.

Providers of alternative data on public policy tend to feed these perceptions by generating data on campaign contributions, voting records, policymaker relationships, and meeting schedules. The problem is that these approaches all sit on a cynical foundation: they discount or ignore what policymakers actually say and do. The cynical expectation is that a policymaker can be manipulated into taking a decision.

If these approaches were correct, then the capital allocated to the lobbying process would have a higher success rate. And if these approaches were correct, then it would be impossible to anticipate a policy outcome using only publicly available information.

There is a better way to anticipate policy outcomes with accuracy and without dabbling in material non-public information.

Welcome to superforecasters and nowcasting.

It is well known that non-experts can vastly outperform experts with classified information in predicting a wide range of events. The proof comes from a 20-year government study published as a popular book a few years ago (Superforecasting), reviewed here by a Scientific American blog and highlighted again in the upcoming issue of The Atlantic.

The point is not that superforecasters are idiots savants. As Dr. Tetlock made clear in his book, the art and science of generating highly accurate predictions requires three key components: (i) access to a constant stream of relevant data; (ii) an objective mindset that accepts new information; and (iii) daily assessments or course corrections. The Atlantic adds a nuance, hinting that being an expert can become a hindrance because experts are biased towards seeing the world through their expert lens. Non-experts are not wedded to a specific construct and so can shift mental gears nimbly when the facts require.

Superforecasters, then, are much like capital market participants who engage in “nowcasting.” They take current (not historical) data in order to make judgements about likely trajectories over the near term. Capital market participants learn from the beginning to accept market realities (“don’t fight the tape” or “the market is never wrong”) and adjust their positions accordingly.

Let’s be clear that this is all about anticipating an outcome rather than predicting one. A prediction implies that a specific outcome is inevitable regardless of the decisions individuals make. Anticipating an outcome, by contrast, implies that changes in trajectory are still possible. Free will exists and the policy trajectory can change. The goal is to spot the inflection point in the language in order to anticipate the outcome.

Superforecasting using data derived from policy language therefore looks at what policymakers say each day before a final decision (a vote, a communique issuance, etc.) to see what direction they are taking right now. Because policymakers signal publicly their intended action, the language they use is highly reliable for use in supporting policy forecasts.
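One way to picture "spotting the inflection point" is to treat each day's quantified language as a number and look for the day on which the direction of travel reverses. The sketch below is purely illustrative: the daily scores and the function names (daily_changes, inflection_days) are assumptions made for this example, not part of our system.

```python
def daily_changes(series):
    """Day-over-day change in a daily series of language scores."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def inflection_days(series):
    """Return the indexes of days where the direction of change reverses.

    A reversal is a day whose incoming change and outgoing change have
    opposite signs (a local peak or trough). Flat stretches are ignored.
    """
    changes = daily_changes(series)
    turning_points = []
    for i in range(1, len(changes)):
        if changes[i - 1] * changes[i] < 0:
            turning_points.append(i)
    return turning_points


# Hypothetical daily counts of action-oriented statements on one issue.
scores = [1, 2, 3, 2, 1, 2]
print(inflection_days(scores))  # [2, 4]
```

In this made-up series the momentum peaks on day 2 and bottoms out on day 4. A real data stream would be noisier and would likely need smoothing before a reversal could be called with any confidence.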

The analytical process can also be automated using our patented metadata tagging process. Natural language processing – and the translation of words into numbers – makes it possible for every word, statement, report, hearing, and final decision (law, regulation, treaty, communique, etc.) to be read, tagged, and quantified automatically. Automated systems perform these tasks far faster than humans. The output accelerates the human analytical function by enabling a human being to offload to the computer the initial intake and categorization functions.
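To make the idea of translating words into numbers concrete, here is a minimal sketch of lexicon-based tagging. It is not our patented process. The lexicon, the integer codes, and the function name tag_statement are all hypothetical, chosen only to show how a statement can become a small table of counts.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping action phrases to integer action codes.
# The phrases and codes are illustrative assumptions only.
ACTION_LEXICON = {
    "will propose": 1,  # 1 = proposal
    "proposed": 1,
    "voted": 2,         # 2 = vote
    "adopted": 3,       # 3 = final decision
    "signed": 3,
}


def tag_statement(text):
    """Count how often each action code appears in a statement."""
    lowered = text.lower()
    counts = Counter()
    for phrase, code in ACTION_LEXICON.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[code] += len(re.findall(pattern, lowered))
    # Drop codes that never appeared so the result only shows real hits.
    return {code: n for code, n in counts.items() if n > 0}


statement = "The agency proposed a rule; the committee voted and then adopted it."
print(tag_statement(statement))  # {1: 1, 2: 1, 3: 1}
```

A simple phrase match like this would miss context, negation, and speaker identity. It is shown only to illustrate the intake and categorization step that a computer can take off a human analyst's desk.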

Process automation means that current superforecasters can anticipate outcomes more accurately and faster. These benefits accrue even before machine learning and artificial intelligence processes have been deployed. Process automation also means that a larger proportion of people can become superforecasters merely by accessing the data stream on a daily basis.

The resulting predictive analytics take two forms: DIY and Automated. The only difference between the two is the capital available to build the appropriate system.

The DIY Option

In the DIY Option, a superforecaster – and anyone seeking to become a superforecaster – accesses the daily inputs used by the automated system to compile the language quantification. In a DIY Option, the person effectively reads the same original sources as the automated system and connects the dots at the human level.

The system still delivers operational efficiencies and enhanced cognition by providing readers with the tools necessary to conduct information triage. Rather than reading original sources in the order they are released, a person can use the automated system to determine which items are a higher priority to read at any given moment in time. Because the language information is presented first as quantitative data visualizations, analytical objectivity is further enhanced by creating a barrier between the language (which can generate an emotional response) and the reader at the beginning of the analytical process.
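Information triage of this kind can be pictured as a simple ranking exercise. The sketch below assumes each incoming item already carries a numeric action score. The SourceItem class, the action_score field, and the sample titles are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class SourceItem:
    title: str
    action_score: int  # hypothetical count of action-oriented language


def triage(items, top_n=3):
    """Return the top_n items to read first, highest action score first.

    Python's sort is stable, so items with equal scores keep their
    original (release) order.
    """
    ranked = sorted(items, key=lambda item: item.action_score, reverse=True)
    return ranked[:top_n]


inbox = [
    SourceItem("Committee hearing transcript", 4),
    SourceItem("Press release", 1),
    SourceItem("Draft regulation", 9),
    SourceItem("Speech excerpt", 4),
]
for item in triage(inbox, top_n=2):
    print(item.title)
```

With these made-up numbers the draft regulation is read first and the hearing transcript second. The reader still reads the original sources; the ranking only changes the order.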

The Automated Option

The current machine learning and artificial intelligence craze accelerates the prospects for superforecasting, of course. At a basic level, these processes identify correlations and covariances within any given data set faster than humans. Data derived from policymaker language should theoretically deliver superior automated forecasts as well. But the superiority of the forecast in this context is not so much about speed to a conclusion (the outputs) as it is about precision with a highly curated lexicon and a highly curated list of individuals generating the language (the inputs).
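For readers who want to see what "identifying correlations and covariances" means at its most basic, the sketch below computes both for two short daily series using only the Python standard library. The two series are invented for illustration, and the calculation is the textbook sample covariance and Pearson correlation, not a description of any particular production system.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance of two equal-length series (divides by n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    denominator = sqrt(covariance(xs, xs) * covariance(ys, ys))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(xs, ys) / denominator


# Hypothetical daily series: action-language counts and a market measure.
language_counts = [1, 2, 3, 4]
market_measure = [2, 4, 6, 8]
print(round(covariance(language_counts, market_measure), 3))   # 3.333
print(round(correlation(language_counts, market_measure), 3))  # 1.0
```

A machine can run this calculation across thousands of series pairs in moments. The result is only as meaningful as the inputs, which is why the curation of the lexicon and the speaker list matters more than raw speed.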

In other words, automated superforecasting in the policy risk context is not about the system ingesting every blog post, editorial, and polling question. To generate accurate results, automated policy risk superforecasting requires a very specialized dataset focused on the language policymakers themselves are using.


We are excited to be on the frontier of this part of the data revolution. Our patented metadata tagging process focuses like a laser on the language of policy. This means we are generating entirely new data every day that will form the foundation for superior training data (when enough observations have been collected). The system will not have to learn by trial-and-error which lexicon elements matter for public policy formation; we will already have that data. Machine learning and artificial intelligence based on our patented data will initiate their processes from a better, more accurate starting point.

We are making the data available to users on a DIY basis (our Early Adopter Program) because we believe that people will make better decisions regarding their exposure to public policy risk if they can see the data even at this early stage. To see how the system delivers operational efficiencies and strategic benefits, check out our newest video on YouTube: