Auto Liability Loss Triangle Generator
Project Type: Actuarial Data Science | Loss Reserving | P&C Insurance | ETL Pipeline
Tools Used: R, Tidyverse, Plyr, Readxl, Here
Dataset: Simulated Multi-Year Auto Liability Claims Data
Objective
Automate the process of aggregating annual claims exports into a structured format for actuarial analysis. The goal was to build a reproducible R environment that extracts key variables from disparate Excel files to generate loss development triangles and calculate historical age-to-age factors.
Key Features
- Dynamic File Ingestion:
- Utilized the
hereandlist.filespackages to create a flexible file path system that automatically detects and loads new Excel claims listings without manual path updates. - Capable of processing hundreds of files simultaneously, ensuring scalability for large-scale loss runs.
- Utilized the
- Vectorized Data Extraction:
- Engineered a custom function,
extract_needed_excel_data, to selectively pull only necessary columns (file_year,accident_year,paid) from raw exports. - Implemented vectorized operations to avoid the performance overhead of traditional โforโ loops, significantly reducing processing time for large datasets.
- Engineered a custom function,
- Efficient Data Merging:
- Leveraged the
plyrpackage andldply(List-to-Data-Frame) to seamlessly combine distinct annual snapshots into a single, analysis-ready master dataframe.
- Leveraged the
- Actuarial Triangle Construction:
- Developed logic to transform transactional claims data into standard actuarial triangles.
- Calculated weighted average link ratios (Age-to-Age factors) to analyze loss development trends over time.
Technical Highlights
- Functional Programming Approach: Prioritized functional programming and mapping over iterative loops to ensure the pipeline remains performant and readable.
- Data Normalization: Handled inconsistent column names and metadata across different file years to maintain data integrity during the merge process.
- Reproducibility: The entire workflow is scripted in R, allowing for an auditable trail from raw โdirtyโ data to the final reserving output, meeting actuarial peer-review standards.
Usage
To generate the triangles, simply drop new Excel claims listings into the designated source folder and run the master script. The system dynamically updates all calculations based on the most recent file year detected.
๐ View Blog Post: Link
๐ View Code on GitHub: GitHub Link