Text is extracted from the underlying PDF filings (annual and interim reports), and unwanted elements are identified and removed.
Text is manually labelled using a custom taxonomy to enable nuanced analysis and like-for-like text comparison.
A hybrid human-machine process allows for highly accurate identification of elements, including:
● Management Discussion & Analysis
● Risk Disclosures
● Speaker Information
The data has been designed with the quantitative investor in mind. Key features include:
● Long & Bias-free History
● Point-in-time entity identifiers
● Full versioning of all changes
● Easy access via AWS S3
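To illustrate what point-in-time entity identifiers mean for a backtest, here is a minimal Python sketch. The record layout, the identifier values, and the function name `identifier_as_of` are all hypothetical and chosen for illustration only; the actual delivery format is not described here.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical identifier history for a single entity, sorted by date:
# (effective_from, identifier). This layout is an assumption, not the real schema.
ID_HISTORY = [
    (date(2010, 1, 1), "OLDCO"),
    (date(2016, 6, 1), "NEWCO"),
]

def identifier_as_of(history, as_of):
    """Return the identifier in force on `as_of`, or None if none existed yet."""
    effective_dates = [effective_from for effective_from, _ in history]
    # Count the entries whose effective date is on or before `as_of`.
    i = bisect_right(effective_dates, as_of)
    return history[i - 1][1] if i else None
```

Looking up an entity as of a historical date returns the identifier that was valid at that time rather than the current one, which is what keeps a backtest free of look-ahead bias.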
To set up a trial or backtest the data, please reach out.