Project Type: Actuarial Data Science | Loss Reserving | P&C Insurance | ETL Pipeline
Tools Used: R, tidyverse, plyr, readxl, here
Dataset: Simulated Multi-Year Auto Liability Claims Data

Link to Project Documentation

Objective

Automate the process of aggregating annual claims exports into a structured format for actuarial analysis. The goal was to build a reproducible R environment that extracts key variables from disparate Excel files to generate loss development triangles and calculate historical age-to-age factors.

Key Features

  • Dynamic File Ingestion:
    • Combined the here package with base R's list.files() to create a flexible file path system that automatically detects and loads new Excel claims listings without manual path updates.
    • Capable of processing hundreds of files in a single run, ensuring scalability for large-scale loss runs.
  • Vectorized Data Extraction:
    • Engineered a custom function, extract_needed_excel_data, to selectively pull only necessary columns (file_year, accident_year, paid) from raw exports.
    • Implemented vectorized operations to avoid the performance overhead of traditional 'for' loops, significantly reducing processing time for large datasets.
  • Efficient Data Merging:
    • Leveraged ldply() (list-to-data-frame) from the plyr package to combine distinct annual snapshots into a single, analysis-ready master dataframe.
  • Actuarial Triangle Construction:
    • Developed logic to transform transactional claims data into standard actuarial triangles.
    • Calculated weighted average link ratios (Age-to-Age factors) to analyze loss development trends over time.
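The ingestion, extraction, and merge steps above can be sketched roughly as follows. The folder layout (data/claims) and the internals of extract_needed_excel_data are assumptions based on the feature descriptions, not the project's actual source:

```r
library(here)    # project-relative file paths
library(readxl)  # Excel import
library(plyr)    # ldply for list-to-data-frame binding

# Pull only the columns needed for reserving from one annual export.
# The column names follow the project description.
extract_needed_excel_data <- function(path) {
  raw <- readxl::read_excel(path)
  raw[, c("file_year", "accident_year", "paid")]
}

# Detect every Excel claims listing in the source folder -- no manual
# path updates are needed when a new annual export is dropped in.
claim_files <- list.files(here("data", "claims"),
                          pattern = "\\.xlsx$", full.names = TRUE)

# Apply the extractor to each file and stack the results into a single
# analysis-ready master dataframe.
claims <- plyr::ldply(claim_files, extract_needed_excel_data)
```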

Technical Highlights

  • Functional Programming Approach: Prioritized functional programming and mapping over iterative loops to ensure the pipeline remains performant and readable.
  • Data Normalization: Handled inconsistent column names and metadata across different file years to maintain data integrity during the merge process.
  • Reproducibility: The entire workflow is scripted in R, allowing for an auditable trail from raw "dirty" data to the final reserving output, meeting actuarial peer-review standards.
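The column-name normalization described above might look something like this minimal base R sketch; the legacy header names shown are hypothetical examples, not the actual export headers:

```r
# Map legacy/inconsistent export headers to the standard names used
# downstream. The legacy names here are illustrative assumptions.
canonical <- c(acc_yr       = "accident_year",
               AccidentYear = "accident_year",
               paid_to_date = "paid")

# Rename any recognized legacy column to its canonical name, leaving
# already-standard columns untouched.
normalize_names <- function(df) {
  hits <- names(df) %in% names(canonical)
  names(df)[hits] <- canonical[names(df)[hits]]
  df
}
```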

Usage

To generate the triangles, simply drop new Excel claims listings into the designated source folder and run the master script. The system dynamically updates all calculations based on the most recent file year detected.
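The triangle construction and weighted-average link ratios can be illustrated with a self-contained base R sketch. The toy figures and the assumption that paid is cumulative as of each file year are mine, not the project's data:

```r
# Toy cumulative paid losses: one row per accident year / evaluation
# (file) year combination. Figures are illustrative only.
claims <- data.frame(
  accident_year = c(2020, 2020, 2020, 2021, 2021, 2022),
  file_year     = c(2020, 2021, 2022, 2021, 2022, 2022),
  paid          = c(100,  150,  165,  120,  180,  130)
)

# Cross-tabulate into a development triangle; development age is
# file_year - accident_year + 1.
claims$age <- claims$file_year - claims$accident_year + 1
triangle <- xtabs(paid ~ accident_year + age, data = claims)
triangle[triangle == 0] <- NA  # mark unevaluated future cells

# Weighted-average age-to-age (link) ratios: total paid at age j + 1
# divided by total paid at age j, over accident years observed at both.
ata <- sapply(seq_len(ncol(triangle) - 1), function(j) {
  both <- !is.na(triangle[, j]) & !is.na(triangle[, j + 1])
  sum(triangle[both, j + 1]) / sum(triangle[both, j])
})
```

With the toy data above, the 1-to-2 factor is (150 + 180) / (100 + 120) = 1.5 and the 2-to-3 factor is 165 / 150 = 1.1.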

🔗 View Blog Post: Link
🔗 View Code on GitHub: GitHub Link

Updated: