LLM Training Data: Meet POLI

BCMstrategy, Inc.
Jul 31, 2025

Updated: Aug 1, 2025

The generative AI revolution in finance continues to gather momentum. In July alone, leading AI companies such as Anthropic and OpenAI released applications dedicated solely to the finance sector. These tools, trained on classic inputs such as corporate filings and the news cycle, promise to accelerate reliance on automated research.

At BCMstrategy, Inc., we are fond of saying that "the language is the data." In Summer 2025 we are taking the next step: giving enterprise buyers access to the language data generated by our patented process, delivered as .json files with rich metadata tagging. Enterprise buyers can now obtain enterprise-grade LLM training data focused on the public policy activities that move markets.

BCMstrategy, Inc. is delighted to announce that our stored, structured language data is available to enterprise clients effective immediately. Meet POLI: a datafeed of richly tagged .json objects covering key public policy issues. Now your generative AI can read, write, and think like a senior policymaker.

A New Kind of Alternative Data: LLM Training Data

BCMstrategy, Inc. uses award-winning, patented metadata tagging technology to automatically structure a wide range of public policy text. Our principal purpose is to generate quantitative, volume-based golden source data so that portfolio managers and investment advisors can measure risks related to the volume, velocity, and volatility of public policy on par with similar risk measures in the capital markets.

This is language that moves markets and changes people's lives. Capital market AI applications can now use PolicyScope data to train their generative AI models as well as their quantitative risk models.

Expert-Led Ontology: Fewer Training Runs, Higher Accuracy

Our ontology and lexicons have been crafted by subject matter experts. Beyond multiple advanced degrees from prestigious universities, these experts bring knowledge that is potentially more valuable: as former government officials, they have deep experience in how policy language operates in the wild. This expert-crafted knowledge powers our patented, automated metadata tagging process.

Robust Metadata, in JSON

An expert-led ontology translates into rich metadata: quantitative, conceptual, demographic, and lexical. That metadata adds layers of context that provide grounding for modern language processing. Whether you use vector databases, knowledge graphs, or retrieval-augmented generation (RAG) pipelines, PolicyScope data helps your generative AI learn faster and deliver more accurate answers on public policy issues. Because we update the data twice a day, your LLM works with current, objective facts.

Extensible: The Patented Process Applies to News

Our patented process can be applied to a broad range of inputs. If you think a source is important to the public policy process, we can score it. Scoring gives you a consistent measurement across all your input sources and a multi-factor approach to measuring not only public policy risks but also the size, duration, and decay rate of your informational advantage on a per-issue basis.

And since our metadata tags include stock market tickers, every quantitative analysis your AI system performs can also be expressed in terms of tradeable stock market assets.
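To illustrate how ticker tags let policy-language volume map onto tradeable assets, here is a minimal Python sketch that counts how many tagged records mention each ticker. It assumes each record carries its ticker tags in a list under a hypothetical "tickers" field; the field names and sample records are invented for illustration and are not the actual POLI schema.

```python
import json
from collections import Counter

# Hypothetical sample records. The "issue" and "tickers" field names are
# illustrative assumptions, NOT the actual POLI .json schema.
RAW = """
[
  {"issue": "stablecoin regulation", "tickers": ["COIN", "JPM"]},
  {"issue": "stablecoin regulation", "tickers": ["COIN"]},
  {"issue": "bank capital rules", "tickers": ["JPM", "BAC"]}
]
"""


def ticker_volume(records):
    """Count how many tagged records mention each ticker."""
    counts = Counter()
    for rec in records:
        # Use a set so a ticker repeated within one record counts only once;
        # records without a "tickers" field contribute nothing.
        counts.update(set(rec.get("tickers", [])))
    return counts


records = json.loads(RAW)
print(ticker_volume(records))
```

The same per-ticker counts could then feed a conventional quantitative model alongside price data, which is the kind of bridge between language and tradeable assets described above.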

Because the alternative data generated by the PolicyScope process bridges quantitative and language elements, enterprise clients can move it seamlessly through their agentic AI workflows, from text-based generative AI deployments to more traditional predictive analytics built on mathematical calculations.
