
AI Training Data for Global Macro -- Text-based Data

  • Writer: BCMstrategy, Inc.
  • Jul 16
  • 3 min read

Generative AI and natural language processing open up entirely new categories of data and analysis for global macro investment strategists: Language Data. Geopolitical risks present first verbally. These words move markets. Significant informational advantages accrue to those who can deploy generative AI technology effectively.


So much language data exists in so many places that choosing the right starting point can be challenging. This post provides some high-level orientation to help global macro strategists configure their optimal language data feeds to power their predictive analytics and generative AI.


Top 4 Text-based Sources of AI Training Data for Global Macro


Teal hexagon with a news icon, titled "News & Social Media." Lists: Institutional Newsfeeds, Fact-checked Journalism, Email newsletters.

News and Social Media

Markets have always chased the news cycle for advance notice of policy shifts that impact securities issuers and trading markets. From the ticker tape and the telephone to the Bloomberg Terminal, the BlackBerry, and institutional news feeds, and now to social media, information moves at the speed of light.


Automated headline and article readers were triggering capital market trading activity long before "agentic AI" became a term.


Accessing the language, however, requires a data mining license.



Blue diagram with a podium icon and text: "Official Sector Action, Written content, Transcripts, Broadcasts." Number 2 at the bottom.

Official Sector Action

Signals regarding potential future policy moves appear in more places than the media.


Technology makes public policy language data more accessible than ever. However, configuring the correct feeds for signal generation requires knowing both where to look for the language and how to listen for the signal. Neither of these activities is straightforward. Transforming that official sector language into machine-readable information is not a trivial exercise.


The award-winning, patented PolicyScope process provides the premier mechanism for accessing fully structured official sector language from a broad range of government sources with a robust ontology designed specifically to align with global macro and other capital market trading needs.


Megaphone icon with blue hexagonal arrow pointing to "Analysis, Opinion" text, listing documents like blogs and filings. Number 3 below.

Analysis and Opinion

Text-based data in this category is always foundational Golden Source data because the publishers always speak for themselves.


However, not all the language is relevant to a specific global macro investment strategy. Not all the language has an impact on market prices.


Adding too much of this category of content also carries a few risks. First, it can dilute the strength of the signal from more direct sources of public policy risk (namely, news and official sector action). Second, it can import direct or indirect bias into the training data. Crafting the right feed requires a fair amount of substantive knowledge regarding the policy process and the portfolio priority.


Purple graphic with gear icon and text: "Structured Text, Machine-readable, Standard format, Expert-led Ontology." Number 4 in a circle below.

Structured Text

Arguably, this category of data did not exist 18 months ago. It transforms disparate language inputs into machine-readable text with a standard format and expert-led ontologies that organize the content with systematically attached metadata tags. It makes text data fungible.
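To make "a standard format with systematically attached metadata tags" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `StructuredRecord` fields, the two-entry `ONTOLOGY`, and the simple keyword matching are hypothetical and do not represent the PolicyScope process or any vendor's actual schema. An expert-led ontology would be far richer than a keyword list.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mini-ontology: tag -> trigger phrases (illustrative only).
ONTOLOGY = {
    "monetary_policy": ("interest rate", "inflation", "central bank"),
    "climate_energy": ("carbon", "renewable", "emissions"),
}


@dataclass
class StructuredRecord:
    """One language input in a single, uniform shape (assumed fields)."""
    source_type: str  # e.g. "news", "official", "opinion"
    published: date
    text: str
    tags: list[str] = field(default_factory=list)


def structure(source_type: str, published: date, text: str) -> StructuredRecord:
    """Attach every ontology tag whose trigger phrases appear in the text."""
    lowered = text.lower()
    tags = sorted(
        tag
        for tag, phrases in ONTOLOGY.items()
        if any(phrase in lowered for phrase in phrases)
    )
    return StructuredRecord(source_type, published, text, tags)


record = structure(
    "official",
    date(2025, 7, 16),
    "The central bank held its interest rate steady.",
)
print(record.tags)  # ['monetary_policy']
```

Because every record shares the same fields regardless of where the language originated, downstream code can treat a speech, a headline, and a blog post interchangeably, which is one way to read "fungible" here.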


Data consumers no longer need to spend time and compute resources gathering disparate language inputs and then spend more time and compute resources structuring the content. They identify the stream of language data important to their global macro or other investment priority (e.g., climate and energy data) and receive a daily datafeed. Data consumers spend more time on higher-value activities, like identifying mispriced or underpriced risks, and less time collecting information.
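As a rough illustration of what consuming such a feed might look like, the sketch below selects one thematic stream from a day's delivery. The JSON Lines layout, the field names (`date`, `source_type`, `tags`, `text`), the sample records, and the `select_stream` helper are all hypothetical; no actual feed format is described in this post.

```python
import json

# Hypothetical daily feed: one JSON object per line (fabricated sample rows).
FEED = """\
{"date": "2025-07-16", "source_type": "official", "tags": ["climate_energy"], "text": "Ministry publishes draft emissions rule."}
{"date": "2025-07-16", "source_type": "news", "tags": ["monetary_policy"], "text": "Central bank holds rates."}
{"date": "2025-07-16", "source_type": "opinion", "tags": ["climate_energy", "monetary_policy"], "text": "Op-ed on green bond purchases."}
"""


def select_stream(feed_text, theme, source_types=None):
    """Return records tagged with `theme`, optionally limited to given source types."""
    selected = []
    for line in feed_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the delivery
        rec = json.loads(line)
        if theme not in rec["tags"]:
            continue
        if source_types is not None and rec["source_type"] not in source_types:
            continue
        selected.append(rec)
    return selected


climate = select_stream(FEED, "climate_energy")
direct_only = select_stream(FEED, "climate_energy", source_types={"news", "official"})
print(len(climate), len(direct_only))  # 2 1
```

The optional `source_types` filter reflects a design choice a strategist might make: keeping analysis and opinion content out of a stream when the goal is to preserve signal from news and official sector action.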


Rapid evolution in language technology and generative AI is generating significantly increased demand for language training data. Innovation and new business models are likely to emerge in the near term to meet that demand. Technological superiority alone will not be sufficient to address market needs, at least with respect to public policy language data important to investment decisions.


Structuring public policy language for the purpose of policy trend projection and charting market reaction functions requires a high degree of knowledge and non-verbal experience. The first step for capital market participants and advocates on the innovation frontier is to begin distinguishing the different kinds of language data so that their generative AI models can be trained correctly.

Infographic illustrating Language Data Inputs: News, Official Action, Opinion, Structured Text. Color-coded hexagons, icons, and text.

BCMstrategy, Inc. generates quantitative time series data and structured language data from public policy using a patented, award-winning process. Designed from the beginning to be used as ML/AI training data to support automated policy trend projection, the data is optimally structured to support deployment into automated research assistant applications powered by Generative AI. The company currently generates data within three thematic verticals: Monetary Policy (macroVS) | Climate/Energy Policy (CRRM3) | Digital Currency Policy (DCVS). More thematic verticals are planned for 2025.


Awards for BCMstrategy, Inc.'s ML/AI training data for renewable energy crypto and monetary policy alternative data

(c) 2025 BCMstrategy, Inc.
