Automating Loss Triangle Construction in R
In many actuarial workflows, consolidating annual claims exports is a manual, error-prone task. I built this R-based pipeline to automate the aggregation of disparate Excel files into structured loss development triangles.
The Objective
The project focuses on creating a reproducible, auditable environment for:
- ETL: Automatically extracting data from multiple years of “dirty” Excel claims listings.
- Triangulation: Transforming transactional data into Accident Year vs. Development Period triangles.
- Projection: Calculating weighted average link ratios to identify loss development trends.
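The triangulation step above can be sketched in base R. The claims data frame, column names, and values below are illustrative assumptions, not the project's actual schema:

```r
# Sketch: turn transactional claims into a cumulative
# accident-year x development-period triangle.
# (Synthetic data; real column names will differ.)
claims <- data.frame(
  accident_year = c(2019, 2019, 2019, 2020, 2020, 2021),
  dev_period    = c(1, 2, 3, 1, 2, 1),
  paid          = c(100, 50, 25, 120, 60, 130)
)

# Incremental triangle: rows = accident year, cols = development period
incr <- xtabs(paid ~ accident_year + dev_period, data = claims)

# Cumulate across development periods within each accident year
tri <- t(apply(incr, 1, cumsum))

# Future diagonals are not yet observed; blank them out
tri[row(tri) + col(tri) > nrow(tri) + 1] <- NA
print(tri)
```

In the real pipeline the `claims` frame would come from the ingestion step rather than being hard-coded.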
The Tech Stack
| Component | Tool |
|---|---|
| Language | R (tidyverse) |
| Data Extraction | readxl, here |
| Processing | plyr (ldply) |
| Actuarial Methods | Chain Ladder / Age-to-Age |
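The weighted average link ratios in the Chain Ladder / Age-to-Age method are volume-weighted age-to-age factors: for each development step, the sum of the next-period cumulative losses divided by the sum of the current-period losses, over the accident years observed in both columns. A minimal sketch with an illustrative cumulative triangle:

```r
# Illustrative cumulative triangle (NA = not yet observed)
tri <- rbind(
  c(100, 150, 175),
  c(120, 180, NA),
  c(130, NA,  NA)
)

# Volume-weighted age-to-age link ratio for each development step,
# restricted to accident years with both periods observed
link_ratios <- sapply(1:(ncol(tri) - 1), function(j) {
  keep <- !is.na(tri[, j + 1])
  sum(tri[keep, j + 1]) / sum(tri[keep, j])
})
print(link_ratios)  # factors f_1, f_2, ...
```

The first factor here is (150 + 180) / (100 + 120) = 1.5; multiplying these factors out projects each accident year to ultimate.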
Challenges Faced
- Inconsistent Schemas: Historical claims exports often featured varying column headers and metadata; I had to engineer a flexible mapping function to standardize these during ingestion.
- Large-Scale Merging: Using traditional loops for hundreds of claims files was inefficient, requiring a shift toward functional programming with plyr::ldply() for faster execution.
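The flexible mapping idea can be sketched as a small renaming function that matches each file's varying headers against a canonical alias table before the files are row-bound. The alias patterns and column names below are invented for illustration:

```r
# Hypothetical alias table: canonical name -> regex of observed variants
canonical <- c(
  claim_id = "claim_id|claimno|claim_number",
  paid     = "paid|paid_amount|amt_paid",
  acc_year = "acc_year|accident_year|ay"
)

# Rename any matching column to its canonical name
standardize_names <- function(df) {
  for (canon in names(canonical)) {
    hit <- grepl(canonical[[canon]], names(df), ignore.case = TRUE)
    names(df)[hit] <- canon
  }
  df
}

messy <- data.frame(ClaimNo = 1:2, Amt_Paid = c(10, 20), AY = c(2020, 2021))
clean <- standardize_names(messy)
print(names(clean))
```

Once every file passes through this step, the standardized frames can be safely combined into one data set.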
What I Learned
- Scalable Data Pipelines: I learned how to build a “drop-in” system where new data files automatically update the entire analysis without code changes.
- Reproducible Reserving: Developing this script reinforced the importance of script-based workflows over spreadsheets to ensure a transparent audit trail for actuarial peer review.
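The "drop-in" pattern can be sketched as: read everything matching a filename pattern in the data directory and row-bind it, so adding a new annual export requires no code change. CSVs stand in for the Excel files here to keep the example self-contained; with readxl the reader would be read_excel() instead of read.csv(), and the directory path would come from here::here():

```r
# Create a toy data directory with two annual exports
data_dir <- file.path(tempdir(), "claims")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ay = 2020, paid = 100),
          file.path(data_dir, "claims_2020.csv"), row.names = FALSE)
write.csv(data.frame(ay = 2021, paid = 130),
          file.path(data_dir, "claims_2021.csv"), row.names = FALSE)

# Drop-in ingestion: any file matching the pattern is picked up
files <- list.files(data_dir, pattern = "^claims_.*\\.csv$", full.names = TRUE)
all_claims <- do.call(rbind, lapply(files, read.csv))
print(nrow(all_claims))
```

Dropping a claims_2022 file into the directory and re-running the script would refresh the triangles with no edits to the code.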