Personal Project

Parcours Analytics: A Self-Hosted Web Analytics Platform

A production-style analytics app built with Django, FastAPI, PostgreSQL, Docker, and AWS

Project Summary

Parcours Analytics is a self-hosted web analytics platform I designed and built to track visitor behavior across websites without depending on a third-party analytics SaaS. It collects browser events through a lightweight JavaScript tracker, validates and buffers incoming traffic through a FastAPI ingestion service, processes events asynchronously with a Python worker, stores raw and derived analytics in PostgreSQL, and presents the results through an authenticated Django dashboard.

The system is intentionally split into distinct services: nginx acts as the public reverse proxy, Django owns users and web property management, one FastAPI service handles metrics ingestion, another FastAPI service serves dashboard query APIs, and a background worker performs enrichment, sessionization, and rollups. This architecture separates user-facing dashboard traffic from write-heavy analytics ingestion and slower background processing.

From an SRE perspective, Parcours is built around production-style concerns: containerized services, isolated database roles, separate application and metrics databases, persistent volumes, AWS-hosted deployment, secrets loaded from AWS Systems Manager Parameter Store, and a file-backed event buffer between ingestion and processing. The project demonstrates full-stack application design, backend data pipeline design, and practical operational thinking for hosting a multi-service web application in AWS.

Why I Built It

I built Parcours Analytics because Google Analytics felt far more complicated than what I actually wanted from a website analytics tool. For many small sites, blogs, and content projects, the important questions are straightforward: how many people visited, where did they come from, what pages did they read, how long did they stay, and what content performed well. Google Analytics can answer those questions, but it often feels like using an enterprise marketing platform when all you need is clear traffic and engagement data.

I also looked at simpler analytics products like Clicky and Fathom, which influenced the direction of the project. I liked the idea of a focused analytics tool with a smaller, more understandable feature set: fast dashboards, useful visitor/session data, referrers, countries, devices, and content performance without requiring a lot of configuration.

The other reason I built it was technical. I wanted a project that was more substantial than a typical CRUD app and closer to a real production system. Web analytics has interesting engineering problems: collecting high-volume browser events, validating tracked properties, buffering ingestion, enriching raw events, assigning sessions, aggregating data, and presenting it back through a dashboard. It gave me a practical way to design a multi-service application and operate it in AWS using patterns I care about as a site reliability engineer.

System Architecture

Parcours Analytics is built as a small multi-service application. I split the system into separate components because an analytics platform combines workloads with different requirements: collecting events should be fast and lightweight, processing events can happen asynchronously, and dashboard queries should be isolated from the ingestion path.

At a high level, the system looks like this:

Tracked Website
-> Parcours JavaScript Tracker
-> nginx
-> Metrics Ingestion API
-> File-backed Event Buffer
-> Background Worker
-> Metrics PostgreSQL Database
-> Dashboard API
-> Django Dashboard

The public entry point is nginx. It acts as the reverse proxy for the application and routes requests to the correct internal service. Normal dashboard traffic goes to Django, tracking events go to the metrics ingestion API, dashboard data requests go to the dashboard API, and generated browser tracking scripts are served as static JavaScript files.
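The routing described above can be pictured as a small prefix table. This is only an illustrative sketch in Python, not the actual nginx configuration: the /metrics path and the tracker script path appear elsewhere in this write-up, while the "/api/" prefix and the upstream names are assumptions made for the example.

```python
# Illustrative routing table. Only "/metrics" and the script path come from
# this write-up; "/api/" and all upstream names are hypothetical.
ROUTES = [
    ("/metrics", "metrics-ingestion-api"),
    ("/livelongandprosper/", "static-tracker-scripts"),
    ("/api/", "dashboard-api"),
]

# Anything that matches no prefix is treated as normal dashboard traffic.
DEFAULT_UPSTREAM = "django"


def pick_upstream(path: str) -> str:
    """Return the upstream for a request path; the first matching prefix wins."""
    for prefix, upstream in ROUTES:
        if path.startswith(prefix):
            return upstream
    return DEFAULT_UPSTREAM
```

For example, pick_upstream("/metrics") selects the ingestion service, while a path such as "/accounts/login/" falls through to Django.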

Django is responsible for the user-facing application. It handles user accounts, authentication, web property management, and the dashboard pages. When a user adds a new website, Django creates a unique property ID and generates a custom tracking script for that property. That script is then served by nginx and installed on the tracked website.

Data Collection Flow

The data collection flow starts when a user adds a website inside the Parcours dashboard. Django creates a new WebProperty record for that site and assigns it a unique property_id. That property_id becomes the key that ties the tracked website, incoming browser events, stored metrics, and dashboard queries together.

When the property is created, Django also generates a custom JavaScript tracking file for it. The script is based on a template, with the property ID and metrics endpoint injected into the final file. nginx serves these generated scripts from a static route, so a tracked site can load a script like:

<script defer src="https://app.useparcours.com/livelongandprosper/<property_id>.js"></script>
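A minimal sketch of the script generation step follows. The real template is not shown in this write-up, so the template body, the function name write_tracker_script, and the assumed property ID format (letters, digits, hyphens, underscores) are all hypothetical. The sketch only shows the idea of injecting two values into a template and writing one file per property.

```python
import json
import re
from pathlib import Path
from string import Template

# Hypothetical tracker template; the real one is not shown in this write-up.
TRACKER_TEMPLATE = Template(
    "(function () {\n"
    "  var PROPERTY_ID = $property_id;\n"
    "  var ENDPOINT = $metrics_endpoint;\n"
    "  // ... event collection would go here ...\n"
    "})();\n"
)

# Assumed ID format. Restricting it keeps the value safe to use as a file name.
PROPERTY_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def write_tracker_script(property_id: str, metrics_endpoint: str, out_dir: Path) -> Path:
    """Render the template for one property and write <property_id>.js."""
    if not PROPERTY_ID_RE.fullmatch(property_id):
        raise ValueError(f"unexpected property id: {property_id!r}")
    out_dir.mkdir(parents=True, exist_ok=True)
    # json.dumps produces quoted, escaped string literals that are valid JavaScript.
    script = TRACKER_TEMPLATE.substitute(
        property_id=json.dumps(property_id),
        metrics_endpoint=json.dumps(metrics_endpoint),
    )
    path = out_dir / f"{property_id}.js"
    path.write_text(script, encoding="utf-8")
    return path
```

Writing the file into a directory that nginx serves statically means the tracked site never hits Django when it loads the script.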

Once installed on a website, the tracker sends events back to Parcours using the browser’s sendBeacon API. It records basic page and engagement events.

Each event includes information such as the property ID, visitor ID, page URL, page title, referrer, browser language, locale, duration on page, scroll depth, and timestamp. The visitor ID is generated in the browser from a lightweight fingerprint using values like user agent, screen size, language, timezone, hardware concurrency, and device memory.
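The event fields listed above could be modeled on the server side roughly as follows. The field names, types, and units are assumptions chosen for illustration; the write-up does not specify the actual payload schema.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class TrackerEvent:
    """One browser event; field names and units are illustrative assumptions."""

    property_id: str
    visitor_id: str
    page_url: str
    page_title: str
    referrer: str
    language: str
    locale: str
    duration_seconds: float
    scroll_depth: float  # percentage of the page scrolled
    timestamp: str       # ISO 8601 string

    @classmethod
    def from_payload(cls, payload: dict) -> "TrackerEvent":
        """Build an event from a decoded JSON body, ignoring unknown keys."""
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in payload]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        return cls(**{n: payload[n] for n in names})
```

Calling asdict(event) turns the object back into a plain dictionary that can be serialized as JSON.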

For WordPress sites, the Parcours plugin adds extra page metadata before loading the tracking script. This includes the page type, author, categories, tags, and WordPress page flags such as whether the page is a home page, single post, page, or archive. That allows the analytics system to report not only on URLs, but also on content structure and topic performance.

Incoming events are sent to the /metrics endpoint. nginx forwards those requests to the FastAPI metrics ingestion service and passes along useful request metadata, including the original IP address and user agent. The ingestion API validates the submitted property_id against the Django database before accepting the event, so random or invalid property IDs are rejected.
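The property check can be sketched as a single lookup. In this example sqlite3 stands in for the Django PostgreSQL database purely so the code runs on its own, and the table name web_property, the column name, and both function names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Django database (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE web_property (property_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO web_property (property_id) VALUES (?)", ("abc123",))
    conn.commit()
    return conn


def property_exists(conn: sqlite3.Connection, property_id: str) -> bool:
    """Return True only if the property ID is registered."""
    row = conn.execute(
        # Parameterized query: the submitted ID is never spliced into the SQL text.
        "SELECT 1 FROM web_property WHERE property_id = ? LIMIT 1",
        (property_id,),
    ).fetchone()
    return row is not None
```

An ingestion handler would call a check like this first and reject the request when it returns False.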

After validation, the ingestion service normalizes and sanitizes the payload. It cleans the referrer URL, bounds scroll depth to a valid percentage, captures the visitor IP address, and records the event timestamp in UTC. Instead of writing directly to PostgreSQL, it writes each accepted event as a JSON file into a shared visitor_data directory.
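A hedged sketch of the normalization step is shown below. The write-up does not say exactly how the referrer is cleaned, so dropping the query string and fragment is an assumption, as are the function names and the output keys "ip" and "received_at".

```python
import math
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def clean_referrer(raw: str) -> str:
    """Keep scheme, host, and path; drop query string and fragment (assumed rule)."""
    if not raw:
        return ""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ""
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def clamp_scroll_depth(value) -> float:
    """Coerce scroll depth to a percentage between 0 and 100."""
    try:
        depth = float(value)
    except (TypeError, ValueError):
        return 0.0
    if math.isnan(depth):
        return 0.0
    return max(0.0, min(100.0, depth))


def sanitize_event(payload: dict, client_ip: str, now: Optional[datetime] = None) -> dict:
    """Return a cleaned copy of the payload with server-side fields attached."""
    now = now or datetime.now(timezone.utc)
    event = dict(payload)
    event["referrer"] = clean_referrer(str(payload.get("referrer") or ""))
    event["scroll_depth"] = clamp_scroll_depth(payload.get("scroll_depth"))
    event["ip"] = client_ip
    # Always store the server-side timestamp in UTC.
    event["received_at"] = now.astimezone(timezone.utc).isoformat()
    return event
```

Taking the timestamp as an optional argument keeps the function easy to test with a fixed clock.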

That file-backed buffer is an intentional design choice. It keeps the public tracking endpoint fast and reduces the amount of synchronous work required during event collection. The ingestion API’s job is simply to validate, clean, and durably stage the event. The heavier work, such as user-agent parsing, GeoIP lookup, bot detection, database inserts, sessionization, and rollups, is handled later by the background worker.
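The hand-off between ingestion and the worker might look like the sketch below. The write-up does not describe how files are written or consumed, so the temp-file-plus-rename technique, the file naming, and the names stage_event and drain_events are assumptions that illustrate one common way to build a directory-based buffer.

```python
import json
import os
import uuid
from pathlib import Path
from typing import Callable


def stage_event(event: dict, buffer_dir: Path) -> Path:
    """Write one event as a JSON file, using a rename so readers never see partial files."""
    buffer_dir.mkdir(parents=True, exist_ok=True)
    name = f"{uuid.uuid4().hex}.json"
    tmp_path = buffer_dir / f".{name}.tmp"
    final_path = buffer_dir / name
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(event, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before publishing the file
    os.replace(tmp_path, final_path)  # atomic within a single filesystem
    return final_path


def drain_events(buffer_dir: Path, handle: Callable[[dict], None]) -> int:
    """Process every staged file once and delete it after the handler succeeds."""
    processed = 0
    # Sorting gives a deterministic order here, not arrival order.
    for path in sorted(buffer_dir.glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            event = json.load(fh)
        handle(event)  # if this raises, the file stays on disk for a retry
        path.unlink()
        processed += 1
    return processed
```

Because a file is deleted only after the handler returns, a worker crash leaves unprocessed events on disk instead of losing them, at the cost of possible reprocessing.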

Event Processing Pipeline

Dashboard And Query Layer

Database Design

AWS And Deployment

Reliability-Oriented Design Choices

WordPress Integration

Security And Privacy Considerations

Technical Challenges

What I Learned

Future Improvements

Technologies

  • Python, Django, FastAPI
  • PostgreSQL
  • JavaScript (browser tracker using sendBeacon)
  • nginx, Docker
  • AWS, including Systems Manager Parameter Store
  • WordPress (plugin integration)

Key Takeaway: Separating event ingestion, background processing, and dashboard queries into their own services keeps the public tracking endpoint fast and isolates write-heavy analytics traffic from user-facing dashboard traffic.