
In many actuarial workflows, consolidating annual claims exports is a manual, error-prone task. I built this R-based pipeline to automate the aggregation of disparate Excel files into structured loss development triangles.

The Objective

The project focuses on creating a reproducible, auditable environment for:

  • ETL: Automatically extracting data from multiple years of “dirty” Excel claims listings.
  • Triangulation: Transforming transactional data into Accident Year vs. Development Period triangles.
  • Projection: Calculating weighted average link ratios to identify loss development trends.
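The triangulation and projection steps above can be sketched in a few lines of tidyverse code. This is a minimal illustration with toy numbers, not the project's actual data; the column names (`accident_year`, `dev`, `paid`) are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy incremental paid claims; the real pipeline reads these from Excel.
claims <- tibble::tribble(
  ~accident_year, ~dev, ~paid,
  2020, 1, 100,  2020, 2, 150,  2020, 3, 170,
  2021, 1, 110,  2021, 2, 160,
  2022, 1, 120
)

# Cumulative triangle: accident years as rows, development periods as columns.
triangle <- claims %>%
  arrange(accident_year, dev) %>%
  group_by(accident_year) %>%
  mutate(cum_paid = cumsum(paid)) %>%
  ungroup() %>%
  select(-paid) %>%
  pivot_wider(names_from = dev, values_from = cum_paid)

# Volume-weighted average link ratio from dev k to k+1:
# f_k = sum(C[i, k+1]) / sum(C[i, k]) over years where both cells exist.
link_ratio <- function(tri, k) {
  both <- !is.na(tri[[k + 1]]) & !is.na(tri[[k + 2]])  # +1 skips the AY column
  sum(tri[[k + 2]][both]) / sum(tri[[k + 1]][both])
}
```

With the toy data, `link_ratio(triangle, 2)` is 420 / 250 = 1.68, the 2-to-3 development factor.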

The Tech Stack

  Component          Tool
  Language           R (tidyverse)
  Data Extraction    readxl, here
  Processing         plyr (ldply)
  Actuarial Methods  Chain ladder / age-to-age factors

Challenges Faced

  • Inconsistent Schemas: Historical claims exports often featured varying column headers and metadata; I had to engineer a flexible mapping function to standardize these during ingestion.
  • Large-Scale Merging: Using traditional loops for hundreds of claims files was inefficient, requiring a shift toward functional programming and plyr for faster execution.
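The schema-standardization idea from the first bullet can be sketched as a lookup table of known header variants applied at read time. The alias table and function name here are hypothetical, not the project's actual mapping.

```r
# Map known header variants (lowercased, trimmed) to canonical column names.
header_aliases <- c(
  "claim no"      = "claim_id",
  "claim number"  = "claim_id",
  "acc yr"        = "accident_year",
  "accident year" = "accident_year",
  "paid to date"  = "paid"
)

# Rename any recognized variant; leave unrecognized headers as-is (lowercased).
standardize_headers <- function(df) {
  raw <- tolower(trimws(names(df)))
  names(df) <- ifelse(raw %in% names(header_aliases),
                      unname(header_aliases[raw]),
                      raw)
  df
}
```

Because the mapping lives in one table, supporting a new year's export usually means adding an alias, not touching the pipeline logic.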

What I Learned

  • Scalable Data Pipelines: I learned how to build a “drop-in” system where new data files automatically update the entire analysis without code changes.
  • Reproducible Reserving: Developing this script reinforced the importance of script-based workflows over spreadsheets to ensure a transparent audit trail for actuarial peer review.
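The "drop-in" pattern described above is essentially a directory scan plus a functional merge: any file that matches the pattern is ingested on the next run, with no code changes. A sketch, assuming the exports live under a `data/claims/` folder (the path and function name are hypothetical):

```r
library(plyr)    # ldply: list in, data frame out
library(readxl)
library(here)

# Every .xlsx dropped into the folder is picked up automatically;
# ldply applies the reader to each path and row-binds the results.
load_claims <- function(dir = here::here("data", "claims")) {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  plyr::ldply(files, function(path) {
    df <- readxl::read_excel(path)
    df$source_file <- basename(path)  # keep provenance for the audit trail
    df
  })
}
```

Tagging each row with its source file is what makes the result auditable: any figure in the triangle can be traced back to the exact export it came from.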

🔗 GitHub Repository
