Our partner is a private-sector financial software firm. Facing a growing volume of quarterly fund reports processed entirely by hand, they partnered with Dreamix to automate the extraction and transformation of complex financial data. The result: over 90% accuracy, parallel multi-fund processing, and the elimination of a resource-intensive manual workflow.
The Story of our Partner
Our partner is a leading provider of AI-powered cloud software for the alternative investments ecosystem. For over 20 years, the company has partnered with general partners, limited partners, and service providers across private equity, venture capital, hedge funds, real estate, and infrastructure to deliver a comprehensive platform connecting front-to-back office operations in one unified system.
Trusted by thousands of customers worldwide, our partner gives investment teams everything they need to streamline workflows and make more informed decisions.
The Challenge
Each of our partner’s managed funds produces detailed quarterly reports in PDF format containing granular financial data: balance sheets, investment breakdowns, performance metrics, and schedules of investments. Clients rely on this data to stay informed, and delivering it accurately and on time is a core service obligation.
To meet this obligation, the financial service provider maintained a dedicated internal department that manually read through each PDF, extracted the relevant figures, and populated Excel spreadsheets for distribution. The team did an admirable job, but a manual process is inherently slow, resource-intensive, and prone to human error. Given the sensitivity of financial data, they recognised an opportunity to make their data processing faster and more reliable.
The Solution
Discovery and Proof of Concept
We started on-site: we shadowed the manual team, analyzed live PDFs, and gathered stakeholder input. We then captured our partner’s business rules and domain knowledge and delivered an extensive report covering requirements, architecture, benchmarks, and a roadmap.
Building the Pipeline
We built the core solution as a Python-based document processing pipeline. It takes a fund PDF, scans it page by page, identifies the sections that matter (balance sheets and investor schedules), extracts and summarises the data, and writes the output to a structured Excel file.
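As an illustration only, the page-by-page flow described above could be sketched as follows. This is not the production code: the section keywords, the line-item pattern, and every function name are hypothetical, the page text is assumed to have been extracted from the PDF already, and the summarisation and Excel-writing steps are left out.

```python
import re
from collections import defaultdict

# Hypothetical keywords used to recognise the sections that matter.
SECTION_KEYWORDS = {
    "balance_sheet": ("balance sheet", "statement of financial position"),
    "investor_schedule": ("schedule of investments", "investor schedule"),
}

# Assumed line-item shape: a label followed by one amount,
# e.g. "Total assets 1,250,000.00" or "Total liabilities (250,000.00)".
LINE_ITEM = re.compile(r"^(?P<label>.+?)\s+(?P<value>\(?-?\d[\d,]*(?:\.\d+)?\)?)$")


def classify_page(text):
    """Return the section a page belongs to, or None if it is not relevant."""
    lowered = text.lower()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return section
    return None


def parse_amount(raw):
    """Convert '1,250,000.00' or '(250,000.00)' to a float; parentheses mean negative."""
    negative = raw.startswith("(") and raw.endswith(")")
    value = float(raw.strip("()").replace(",", ""))
    return -value if negative else value


def extract_line_items(text):
    """Collect (label, amount) pairs from every line that looks like a line item."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append((match["label"], parse_amount(match["value"])))
    return items


def process_pages(pages):
    """Scan pages in order and group the extracted rows by section."""
    sheets = defaultdict(list)
    for page_number, text in enumerate(pages, start=1):
        section = classify_page(text)
        if section is None:
            continue  # skip pages that hold no section of interest
        for label, value in extract_line_items(text):
            sheets[section].append({"page": page_number, "label": label, "value": value})
    return dict(sheets)
```

Each key of the returned dictionary would map naturally to one worksheet of the output file; a spreadsheet library could write the rows from there.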
Deduplication and Business Rules
Funds often split data across multiple PDFs, so we added intelligent deduplication based on contextual reconciliation: depending on their context, duplicate figures are added together, subtracted, or discarded. A post-processing layer then applies formalized business rules - running validations, ensuring consistency, and auto-filling gaps from surrounding data.
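To make the idea concrete, here is a minimal sketch of reconciliation followed by one business rule. It rests on loud assumptions: the `role` field and its three values are invented for illustration (the real system infers context from the documents), the field names are hypothetical, and the only rule shown is the standard accounting identity that assets equal liabilities plus equity.

```python
BALANCE_KEYS = ("total_assets", "total_liabilities", "total_equity")


def reconcile(records):
    """Merge line items that appear in more than one PDF of the same fund.

    Each record carries a hypothetical 'role' describing its context:
      'partial'  - one part of a figure split across PDFs  -> add
      'reversal' - an amount to be netted off              -> subtract
      'restated' - the same figure repeated elsewhere      -> discard
    The first occurrence of a label is always taken as the base value.
    """
    totals = {}
    for record in records:
        label, value, role = record["label"], record["value"], record["role"]
        if label not in totals:
            totals[label] = value
        elif role == "partial":
            totals[label] += value
        elif role == "reversal":
            totals[label] -= value
        elif role == "restated":
            continue
        else:
            raise ValueError(f"unknown role: {role!r}")
    return totals


def apply_balance_rules(sheet, tolerance=0.01):
    """Fill at most one missing total, then check assets = liabilities + equity."""
    filled = dict(sheet)
    issues = []
    missing = [key for key in BALANCE_KEYS if filled.get(key) is None]
    if len(missing) > 1:
        issues.append("cannot fill gaps: missing " + ", ".join(missing))
        return filled, issues
    if missing == ["total_assets"]:
        filled["total_assets"] = filled["total_liabilities"] + filled["total_equity"]
    elif missing == ["total_liabilities"]:
        filled["total_liabilities"] = filled["total_assets"] - filled["total_equity"]
    elif missing == ["total_equity"]:
        filled["total_equity"] = filled["total_assets"] - filled["total_liabilities"]
    gap = filled["total_assets"] - filled["total_liabilities"] - filled["total_equity"]
    if abs(gap) > tolerance:
        issues.append(f"balance sheet does not balance (difference {gap:.2f})")
    return filled, issues
```

Returning the issues alongside the filled sheet, rather than raising, keeps unresolved cases available for later review.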
Transparency and Evaluation
Our partner is a regulated financial institution, and they were understandably cautious about giving an outside team significant access to their systems and data. We responded by keeping them close throughout the process, with weekly live demos and full visibility into what we were building and why.
We also built an automated evaluation framework into the pipeline itself. Every time the system generates a report, it compares the output against the corresponding historical report and logs an accuracy score. That gave both teams a running, objective measure of how well the system was performing, not just at delivery but continuously.
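A comparison of this kind might look like the sketch below. The scoring definition is an assumption made for illustration: accuracy is taken as the share of cells in the historical report that the generated report reproduces, with a small tolerance for numeric rounding. The function names, the logger name, and the `(sheet, row)` cell keys are hypothetical.

```python
import logging
import math

logger = logging.getLogger("report_evaluation")  # hypothetical logger name


def accuracy_score(generated, historical, abs_tol=0.005):
    """Share of historical cells that the generated report reproduces.

    Both reports are dicts mapping a cell key, e.g. (sheet, row label),
    to a value. Numbers match within abs_tol; other values must be equal.
    A cell missing from the generated report counts as a miss.
    """
    if not historical:
        raise ValueError("historical report is empty")
    matches = 0
    for key, expected in historical.items():
        if key not in generated:
            continue
        actual = generated[key]
        if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
            matched = math.isclose(actual, expected, abs_tol=abs_tol)
        else:
            matched = actual == expected
        matches += matched
    return matches / len(historical)


def evaluate_run(fund_id, generated, historical):
    """Score one generated report against its historical counterpart and log it."""
    score = accuracy_score(generated, historical)
    logger.info("fund=%s accuracy=%.2f%%", fund_id, score * 100)
    return score
```

Logging one score per run is what turns a one-off acceptance check into a continuous measure: a drop in the series points to a regression or to a new report layout.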
The Results
Our work together has resulted in the following outcomes:
- 90%+ accuracy in automated report generation, validated against historical data
- Automation of the data extraction and Excel generation process
- Parallel processing of multiple funds
- Intelligent deduplication across multi-PDF fund submissions
- Post-processing validation enforcing business rules and filling data gaps automatically
- Automated accuracy monitoring built into the pipeline
- All processing hosted entirely within our partner’s secure cloud infrastructure
- Automated flagging and notifications when a data discrepancy cannot be resolved
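For readers curious how parallel processing and flagging can fit together, here is a minimal sketch using Python's standard `concurrent.futures` module. It is an assumption-laden illustration, not the delivered system: `process_fund`, `UnresolvedDiscrepancy`, and the submission format are hypothetical, and the notification step is reduced to collecting flagged funds.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class UnresolvedDiscrepancy(Exception):
    """Hypothetical error raised when a fund's data cannot be reconciled."""


def process_fund(fund_id, pdf_paths):
    """Placeholder for the per-fund pipeline (extract, reconcile, validate)."""
    if not pdf_paths:
        raise UnresolvedDiscrepancy(f"{fund_id}: no PDFs supplied")
    return {"fund": fund_id, "pdf_count": len(pdf_paths)}


def process_funds(submissions, max_workers=4):
    """Run every fund concurrently; collect results and flagged funds separately."""
    results, flagged = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(process_fund, fund_id, paths): fund_id
            for fund_id, paths in submissions.items()
        }
        for future in as_completed(futures):
            fund_id = futures[future]
            try:
                results[fund_id] = future.result()
            except UnresolvedDiscrepancy as exc:
                # One fund failing must not stop the others; flag it for follow-up.
                flagged[fund_id] = str(exc)
    return results, flagged
```

A thread pool keeps the sketch simple; `ProcessPoolExecutor` exposes the same interface and may suit CPU-heavy parsing better.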
