Personal Project

Parcours Analytics: A Web Analytics Platform

A production-style website analytics app built with Django, FastAPI, PostgreSQL, Docker, and AWS

Project Summary

Parcours Analytics is a self-hosted web analytics platform I designed and built to track visitor behavior across websites without relying on a third-party analytics SaaS.

The project combines a lightweight browser tracking script, a multi-service backend, an authenticated dashboard, and a background processing pipeline for turning raw visitor activity into useful analytics. It tracks things like visits, page views, referrers, devices, countries, scroll depth, bounce rate, and content performance.

I built Parcours as both a practical analytics tool and a production-style engineering project. It gave me a chance to design a real event ingestion pipeline, operate multiple containerized services, separate application and analytics workloads, and host the system in AWS.

Why I Wanted to Build This

I built Parcours Analytics because Google Analytics felt far more complicated than what I actually wanted from a website analytics tool. Honestly, I also thought it would be fun to build.

For many small sites, blogs, and content projects, the important questions are straightforward:

How many people visited?
Where did they come from?
What pages did they read?
How long did they stay?
What content performed well?

Google Analytics can answer those questions, but it’s just too complicated to use. It feels like using an enterprise marketing platform when all you need is clear data on your site’s traffic.

I really liked how simpler analytics products like Clicky and Fathom are built. These influenced the direction of the project. One thing I especially appreciated was how fast their dashboards feel.

For Parcours, I wanted the same kind of experience: open the dashboard, get useful numbers immediately, and avoid making the user wait while the system figures out basic traffic data. Computers should do the work for us, not turn simple questions into a loading screen.

I wanted a project that was more substantial than a typical CRUD app and closer to a real production system.

The other reason I built it was technical. Web analytics has interesting engineering problems: collecting high-volume browser events, validating tracked properties, buffering ingestion, enriching raw events, assigning sessions, aggregating data, and presenting it back through a dashboard.

I tried hard not to over-engineer upfront, but I knew that breaking everything out as services now would pay dividends later.

How Parcours Is Put Together

Parcours Analytics is built as a small multi-service application. I split the system into separate components because analytics traffic has different workloads: collecting events should be fast and lightweight, processing events can happen asynchronously, and dashboard queries should be isolated from the ingestion path.

At a high level, the system looks like this:

Parcours JavaScript tracker runs in the tracked website
nginx
Metrics ingestion API
File-backed event buffer
Background worker
Metrics PostgreSQL database
Dashboard API
Django dashboard

The public entry point is nginx. It acts as the reverse proxy for the application and routes requests to the correct internal service. Normal dashboard traffic goes to Django, tracking events from the browser go to the metrics ingestion API, dashboard charts make requests to the dashboard API, and generated browser tracking scripts are served as static JavaScript files.

Django is responsible for the user-facing application. It handles user accounts, authentication, web property management, and the dashboard pages. When a user adds a new website, Django creates a unique property ID and generates a custom tracking script for that property. The user installs the Parcours tracking plugin on their WordPress site, which fetches the JavaScript served by nginx on each WordPress page load.

The metrics ingestion API is a FastAPI service that receives browser events from tracked websites. It validates that the submitted property ID exists, captures request metadata such as IP address and user agent, sanitizes the incoming payload, and writes the event to a file-backed buffer.

I intentionally kept this service lightweight so that the public tracking endpoint does as little work as possible. It just captures the API call and writes the JSON to disk.

A Python worker runs separately in the background. It wakes up at a regular interval to read new JSON event files, enriches the data, writes raw events into the metrics database, assigns events to sessions, and builds derived records used by the dashboard. This includes things like visitor sessions, scroll-depth data, referrer information, device/browser details, country lookup, and content metadata.

The dashboard API is a second FastAPI service focused only on reporting queries. Django renders the authenticated dashboard pages, but the charts and tables call this API for analytics data. The dashboard API validates the user’s Django session, checks that the user owns the requested web property, and then queries the metrics database for visitor counts, referrers, devices, countries, landing pages, visitor journeys, and content performance.

The data layer is split into two PostgreSQL databases. The Django database stores application data such as users, sessions, and web properties. The metrics database stores analytics data such as raw visitor events, processed sessions, scroll events, referrer information, and aggregate reporting tables. This separation keeps application state and analytics workloads from interfering with each other and makes the service boundaries clearer.

In production, the services run as Docker containers on AWS, with nginx in front, persistent volumes for PostgreSQL data, and secrets loaded from AWS Systems Manager Parameter Store. This gave me a deployment model that was close to how I would approach a real internal service: isolated containers, explicit service boundaries, durable storage, and externalized configuration.

Data Collection Flow

The data collection flow starts when a user adds a website inside the Parcours dashboard. Django creates a new WebProperty record for that site and assigns it a unique property_id. That property_id becomes the key that ties the tracked website, incoming browser events, stored metrics, and dashboard queries together.

When the property is created, Django also generates a custom JavaScript tracking file for it. The script is based on a template, with the property ID and metrics endpoint injected into the final file. nginx serves these generated scripts from a static route, so a tracked site can load a script like:

<script defer src="https://app.useparcours.com/livelongandprosper/<property_id>.js"></script>

I’m a huge Star Trek nerd so I had to sprinkle a little bit of Spock in that URL. 🖖🏻
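A minimal sketch of the script-generation step, not the actual Parcours code. The template text, the function names, and the allowed property ID characters are all assumptions made for illustration; the write-up only says that the property ID and metrics endpoint are injected into a template and the result is served by nginx as a static file.

```python
import re
from pathlib import Path
from string import Template

# Hypothetical template: the real tracker template is not shown in this write-up.
TRACKER_TEMPLATE = Template(
    "(function () {\n"
    "  var PROPERTY_ID = '$property_id';\n"
    "  var ENDPOINT = '$metrics_endpoint';\n"
    "  // tracking logic would go here\n"
    "})();\n"
)


def render_tracker(property_id: str, metrics_endpoint: str) -> str:
    """Inject the property ID and metrics endpoint into the template."""
    # Refuse IDs that would be unsafe in a file name or a JS string literal.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", property_id):
        raise ValueError(f"unexpected property id: {property_id!r}")
    return TRACKER_TEMPLATE.substitute(
        property_id=property_id, metrics_endpoint=metrics_endpoint
    )


def write_tracker(static_dir: Path, property_id: str, metrics_endpoint: str) -> Path:
    """Write <property_id>.js into the directory nginx serves as static files."""
    static_dir.mkdir(parents=True, exist_ok=True)
    path = static_dir / f"{property_id}.js"
    path.write_text(render_tracker(property_id, metrics_endpoint), encoding="utf-8")
    return path
```

Generating the file once at property creation means nginx can serve it without touching Django on every page load of the tracked site.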

Once installed on a website, the tracker sends events back to Parcours using the browser’s sendBeacon API. It records basic page and engagement events, including:

page_view when the page loads
page_exit when the visitor leaves
ping after the visitor has stayed on the page for a period of time
scroll depth as the visitor moves through the page

Each event includes information such as the property ID, visitor ID, page URL, page title, referrer, browser language, locale, duration on page, scroll depth, and timestamp. The visitor ID is generated in the browser from a lightweight fingerprint using values like user agent, screen size, language, timezone, hardware concurrency, and device memory.
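The real fingerprint is computed in the browser by the JavaScript tracker, but the idea can be illustrated in Python. This is a sketch under assumptions: the choice of SHA-256, the separator, and the 16-character truncation are mine, not details taken from Parcours.

```python
import hashlib


def visitor_id(
    user_agent: str,
    screen_size: str,
    language: str,
    timezone_name: str,
    hardware_concurrency: int,
    device_memory: float,
) -> str:
    """Derive a stable, opaque visitor ID from a handful of browser traits."""
    parts = [
        user_agent,
        screen_size,
        language,
        timezone_name,
        str(hardware_concurrency),
        str(device_memory),
    ]
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    raw = "|".join(parts)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

The same inputs always produce the same ID, so repeat visits from one browser line up without a cookie. The trade-off is that two visitors with identical traits would collide.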

For WordPress sites, the Parcours plugin adds extra page metadata before loading the tracking script. This includes the page type, author, categories, tags, and WordPress page flags like whether the page is a home page, single post, a page, or an archive. That allows Parcours to report not only on URLs, but also on WordPress-specific content structure and topic performance.

The incoming events are sent to the /metrics endpoint. nginx forwards those requests to the FastAPI metrics ingestion service and passes along useful request metadata, including the original IP address and user agent. The ingestion API validates the submitted property_id against the Django database before accepting the event, so random or invalid property IDs are rejected.

After validation, the ingestion service sanitizes the payload. It cleans the referrer URL, converts the reported scroll depth to a valid percentage, captures the visitor IP address, and records the event timestamp in UTC. The IP address is stored in the database, but it’s only used for a GeoIP lookup to identify the visitor’s country. Instead of writing directly to PostgreSQL, the service writes each accepted event as a JSON file into a shared visitor_data directory.

That file-backed buffer is an intentional design choice. It keeps the public tracking endpoint fast and reduces the amount of synchronous work required during event collection. The ingestion API’s job is simply to validate, clean, and durably stage the event. The heavier work, such as user-agent parsing, GeoIP lookup, bot detection, database inserts, grouping events into visitor sessions, and building rollups, is handled later by the background worker.
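The sanitize-and-stage step can be sketched with the standard library alone. The field names (referrer, scroll_depth, ip, received_at), the helper names, and the write-to-temp-then-rename pattern are assumptions for illustration; the write-up only says that each accepted event is cleaned and written as a JSON file into visitor_data.

```python
import json
import math
import os
import uuid
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def clean_referrer(referrer: str) -> str:
    """Keep scheme, host, and path; drop the query string and fragment."""
    try:
        parts = urlsplit((referrer or "").strip())
    except ValueError:
        return ""
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ""
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def clamp_scroll_depth(value) -> int:
    """Coerce whatever the browser reported into an integer from 0 to 100."""
    try:
        depth = float(value)
    except (TypeError, ValueError):
        return 0
    if math.isnan(depth):
        return 0
    return int(max(0.0, min(100.0, depth)))


def stage_event(event: dict, client_ip: str, buffer_dir: Path) -> Path:
    """Sanitize one event and write it to the buffer as its own JSON file."""
    record = dict(event)
    record["referrer"] = clean_referrer(str(event.get("referrer") or ""))
    record["scroll_depth"] = clamp_scroll_depth(event.get("scroll_depth"))
    record["ip"] = client_ip
    record["received_at"] = datetime.now(timezone.utc).isoformat()

    buffer_dir.mkdir(parents=True, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = buffer_dir / f".{name}.tmp"
    final_path = buffer_dir / f"{name}.json"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
        fh.flush()
        os.fsync(fh.fileno())
    # Rename last so a reader never sees a half-written .json file.
    os.replace(tmp_path, final_path)
    return final_path
```

Writing to a temporary name and renaming is one way to make sure a worker scanning for .json files only ever picks up complete events.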

Event Processing Pipeline

After events are accepted by the metrics ingestion API, they aren’t written directly into the analytics database. Instead, each event is stored as a JSON file in a shared visitor_data directory. A separate Python worker is responsible for turning those staged files into structured analytics data.

The worker runs periodically and processes event files in modification-time order. For each JSON file, it validates that the required fields are present, parses the payload, and extracts optional fields such as scroll depth, page type, author, categories, tags, and WordPress page flags. If the file is malformed or missing required data, the worker leaves it in place so it can be inspected instead of silently dropping the event.
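A rough sketch of the scan-and-validate step. The list of required fields is an assumption, since the write-up does not enumerate them, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional

# Assumed set of required fields; the real list is not given in this write-up.
REQUIRED_FIELDS = ("property_id", "visitor_id", "event_type", "url", "timestamp")


def pending_files(buffer_dir: Path) -> List[Path]:
    """Return staged event files, oldest modification time first."""
    return sorted(buffer_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)


def load_event(path: Path) -> Optional[dict]:
    """Parse one staged file; return None if it is malformed or incomplete.

    The file is never deleted here, so a bad event stays on disk for inspection.
    """
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # JSONDecodeError is a ValueError
        return None
    if not isinstance(payload, dict):
        return None
    if any(payload.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    return payload
```

Optional fields such as scroll depth or WordPress metadata can then be read with payload.get(...) and a default, since their absence is not an error.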

Once an event file is parsed, the worker enriches it before inserting it into PostgreSQL. This enrichment includes:

parsing the user agent into browser and operating system fields
normalizing the referrer into a referrer domain
detecting likely bot traffic
looking up the visitor’s country from their IP address using GeoIP
preserving page metadata such as title, page type, categories, tags, and author
carrying through scroll-depth and duration data
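Two of the enrichment steps above, referrer normalization and bot detection, can be sketched with the standard library. The marker list and the rule of stripping a leading "www." are illustrative assumptions, not Parcours's actual rules. User-agent parsing and GeoIP lookup depend on third-party libraries and data files, so they are left out here.

```python
from urllib.parse import urlsplit

# Illustrative substrings only; a real bot list would be longer and maintained.
BOT_MARKERS = ("bot", "crawler", "spider", "headless")


def referrer_domain(referrer: str) -> str:
    """Reduce a referrer URL to a bare, lowercased domain ('' if unusable)."""
    try:
        host = urlsplit(referrer or "").hostname or ""
    except ValueError:
        return ""
    return host[4:] if host.startswith("www.") else host


def is_likely_bot(user_agent: str) -> bool:
    """Cheap heuristic: empty user agents and known markers count as bots."""
    ua = (user_agent or "").lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)
```

A substring check like this is deliberately crude. It will misclassify some traffic in both directions, which is why the result is best stored as a flag on the raw event instead of being used to drop the event outright.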

The enriched event is then written into the metrics database as the system’s canonical raw event record. It stores the original tracking data alongside derived fields such as browser, operating system, country, referrer domain, bot status, content metadata, duration, and scroll depth. This gives the reporting layer a reliable source of raw analytics data while still preserving enough context for filtering, debugging, and future aggregation.

After a file is successfully written to the database, the worker moves it into an archive_data directory. This gives the system a simple audit trail and a practical recovery path: I can replay processed event files if needed. If a database write fails, the file remains in visitor_data and can be retried on the next worker run.
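The archive-on-success rule is small enough to sketch directly. Here insert_event is a hypothetical callable standing in for the real database write, and process_file is an assumed name.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_file(
    path: Path, archive_dir: Path, insert_event: Callable[[Path], None]
) -> bool:
    """Insert one staged event, then archive its file only if the insert worked."""
    try:
        insert_event(path)
    except Exception:
        # Leave the file where it is so the next worker run retries it.
        return False
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(archive_dir / path.name))
    return True
```

Because the move happens after the insert, a crash between the two steps could lead to the same file being inserted twice on the next run. A unique event ID with a conflict-tolerant insert is one way to guard against that.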

Once raw events are inserted, the worker performs additional processing passes. One major step is session assignment. Since browser events arrive independently, the worker needs to group events into sessions using the visitor ID and a 30-minute inactivity window. Events from the same visitor are stitched into the same session unless the time gap between events exceeds 30 minutes, in which case a new session ID is created.

For example, if a visitor clicks through a few pages, leaves for an hour, and returns to visit more pages, that’s counted as two separate sessions.
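The 30-minute rule can be sketched as a single pass over events sorted by visitor and time. This is an in-memory illustration with assumed field names (visitor_id, timestamp, session_id); the actual worker operates on rows in the metrics database.

```python
import uuid
from datetime import timedelta
from typing import Dict, List, Tuple

SESSION_GAP = timedelta(minutes=30)


def assign_sessions(events: List[dict]) -> List[dict]:
    """Return copies of the events with a session_id added to each.

    Each event needs a 'visitor_id' and a datetime 'timestamp'.
    """
    last_seen: Dict[str, Tuple[object, str]] = {}  # visitor -> (time, session)
    result = []
    for event in sorted(events, key=lambda e: (e["visitor_id"], e["timestamp"])):
        previous = last_seen.get(event["visitor_id"])
        if previous is None or event["timestamp"] - previous[0] > SESSION_GAP:
            session_id = uuid.uuid4().hex  # gap exceeded: start a new session
        else:
            session_id = previous[1]  # still inside the inactivity window
        last_seen[event["visitor_id"]] = (event["timestamp"], session_id)
        result.append({**event, "session_id": session_id})
    return result
```

The gap is measured from the previous event, not from the start of the session, so a visitor who stays active keeps extending the same session.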

The worker then builds higher-level session records from those grouped events. Each session includes start and end time, duration, total events, pageview count, bounce status, entry page, exit page, referrer domain, country, browser, operating system, locale, and entry page type.

This gives the dashboard a clean session-level view without having to recompute sessions from raw events on every request.
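As a sketch of how a session record might be rolled up from its events: the field names are assumptions, and so is the bounce definition used here (a session with at most one page view), since the write-up does not state how Parcours defines a bounce.

```python
from typing import List


def build_session(events: List[dict]) -> dict:
    """Summarize the events of one session into a single record.

    Each event needs 'session_id', 'timestamp' (datetime), 'event_type', 'url'.
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])
    first, last = ordered[0], ordered[-1]
    pageviews = sum(1 for e in ordered if e["event_type"] == "page_view")
    return {
        "session_id": first["session_id"],
        "start_time": first["timestamp"],
        "end_time": last["timestamp"],
        "duration_seconds": int(
            (last["timestamp"] - first["timestamp"]).total_seconds()
        ),
        "total_events": len(ordered),
        "pageview_count": pageviews,
        "is_bounce": pageviews <= 1,  # assumed definition of a bounce
        "entry_page": first["url"],
        "exit_page": last["url"],
    }
```

Attributes like referrer domain, country, and browser would typically be taken from the first event of the session in the same way as entry_page.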

The pipeline also extracts specialized reporting data. For example, scroll-depth events are stored separately so the dashboard can report how far visitors read on each page. The worker also syncs each web property’s timezone into the metrics database, which allows reporting queries to calculate day and hour boundaries in the site owner’s local timezone.

The overall goal of the pipeline is to keep collection simple and fast while moving expensive work into the background. The ingestion API only validates and stages events. The worker handles enrichment, database writes, sessionization, and derived reporting tables. This makes the system easier to operate because failures in processing don’t immediately break event collection, and each stage has a clear responsibility.

How the Dashboard Gets Its Data

The dashboard side of Parcours is split between Django and a separate FastAPI service. Django owns the authenticated web application: user login, signup, password reset, account preferences, web property management, and the HTML dashboard pages. I used existing Django functionality for this because I didn’t want to reinvent the wheel in this area. The dashboard API owns the analytics queries that power the charts and tables.

This split keeps the responsibilities clear. Django is good at user-facing application concerns like sessions, templates, forms, and authentication. The FastAPI dashboard service is focused on reading from the metrics database and returning structured JSON for the dashboard UI.

When a user logs in, Django loads the web properties they own and stores that list in the session. The user can then open an overview dashboard, add a new property, edit a property’s settings, or drill into a detailed dashboard for a specific property.

The detailed dashboard page is rendered by Django, but the analytics data itself is loaded through API calls. Browser-side JavaScript calls the dashboard API through nginx under the /dashboard/... route. nginx forwards those requests to the FastAPI dashboard service.

Before returning analytics data, the dashboard API performs two important checks. First, it validates the Django session cookie by looking up the session in the Django database. Second, it verifies that the requested property_id belongs to the authenticated user. This prevents a user from querying analytics for a property they don’t own, even if they know or guess another property ID.
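The two checks can be sketched as plain functions. To keep the example self-contained, the session store and the property table are modeled as dictionaries; in the real system both live in the Django database, and Django stores session contents as an encoded payload that would have to be decoded first (that step is omitted). The function names and the 401/403 status choices are assumptions.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def authenticated_user_id(
    session_key: str, sessions: Mapping[str, dict], now: datetime
) -> Optional[int]:
    """Return the user ID for a live session, or None if missing or expired."""
    row = sessions.get(session_key)
    if row is None or row["expire_date"] <= now:
        return None
    return row["user_id"]


def authorize(
    session_key: str,
    property_id: str,
    sessions: Mapping[str, dict],
    property_owners: Mapping[str, int],
    now: Optional[datetime] = None,
) -> int:
    """Return an HTTP-style status code: 200, 401, or 403."""
    now = now or datetime.now(timezone.utc)
    user_id = authenticated_user_id(session_key, sessions, now)
    if user_id is None:
        return 401  # no valid session
    if property_owners.get(property_id) != user_id:
        return 403  # unknown property, or owned by someone else
    return 200
```

Treating an unknown property the same as one owned by another user avoids revealing which property IDs exist.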

The dashboard API then queries the metrics database for the requested report. It exposes endpoints for visitor counts, browser breakdowns, device breakdowns, locale and country data, traffic sources, visitor lists, visitor timelines, top referrers, landing pages, content lists, most viewed pages, average time on page, scroll depth, and topic performance by category or tag.

Example response for visitor counts:

{
  "requested_range": "today",
  "requested_data": {
    "total_visitors": 25,
    "total_actions": 25,
    "avg_time_on_site_seconds": 29,
    "avg_bounce_rate": 75.65,
    "total_visitors_change": -11,
    "total_actions_change": -11,
    "avg_time_on_site_seconds_change": 61,
    "avg_bounce_rate_change": -8
  },
  "related_range": "yesterday",
  "related_data": {
    "total_visitors": 28,
    "total_actions": 28,
    "avg_time_on_site_seconds": 79,
    "avg_bounce_rate": 89.29
  },
  "errors": []
}

Most of these queries are built around pre-processed analytics records created by the worker. This means the dashboard doesn’t have to reconstruct every metric from raw browser events on each page load. Some reports use session-level or pre-aggregated data for speed, while other reports can still drill into the underlying raw events when needed.

The dashboard API also handles common reporting concerns like date ranges, comparison periods, filters, time zones, referrer grouping, browser/device grouping, and percentage-change calculations. This keeps the Django templates focused on presentation while the API service owns the reporting logic.
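As an illustration of the percentage-change concern, here is one common definition: relative change against the comparison period, rounded to a whole percent. The exact formula and rounding Parcours applies to each metric are not stated in this write-up, so treat this as an assumption.

```python
from typing import Optional


def percent_change(current: float, previous: float) -> Optional[int]:
    """Relative change from previous to current, as a rounded percentage.

    Returns None when there is no baseline to compare against.
    """
    if previous == 0:
        return None
    return round((current - previous) / previous * 100)
```

Returning None for a zero baseline lets the UI show a dash or "new" instead of a misleading number.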

Database Design

Parcours uses two PostgreSQL databases: one for application data and one for analytics data. I made that split because the two workloads are very different. The application database stores relatively small, transactional data such as users, sessions, and web properties. The metrics database stores higher-volume event data and derived analytics tables.

The Django database is the source of truth for user-facing application state. It stores user accounts, authentication sessions, password reset state, and web property records. The most important application-level table is the web property table, which maps a user to a tracked website and its generated property_id. That property_id is the identifier used throughout the rest of the system.

The metrics database stores the output of the analytics pipeline.

Once an event is accepted by the ingestion API, the worker enriches it with details like browser, operating system, country, referrer, bot status, content metadata, duration, and scroll depth, then stores it for reporting.

The worker also turns those raw events into cleaner session-level records. That way, the dashboard does not have to rebuild every visit from scratch each time it loads. It can work with ready-to-query information like visit duration, pageview count, bounce status, entry page, exit page, referrer, country, browser, operating system, locale, and entry page type.

Some data is stored in a more specialized form for specific reports. Scroll-depth data, for example, is kept in a way that makes it easy to show how far visitors read on each page. Property timezones are also synced into the analytics side of the system so reports like “today,” “yesterday,” and hourly charts line up with the site owner’s local timezone.

The database permissions are split by service. Django owns user and web property data. The ingestion API only needs to check that a property exists. The worker writes processed analytics data. The dashboard API reads analytics data and checks user sessions and property ownership. That keeps each service limited to the access it actually needs.

This design gives Parcours both a raw event history and query-friendly reporting tables. The raw table is useful for debugging, replaying, and building new reports later. The derived tables make the dashboard faster and simpler because common concepts like sessions, scroll events, and hourly totals are already materialized.

AWS and Deployment

Parcours is deployed on AWS as a Dockerized application running on a single EC2 instance. I kept the first production version intentionally simple: one host runs the full stack with Docker Compose, including the web app, APIs, worker, nginx, and PostgreSQL containers.

Even though it currently runs on one instance, the services are separated in a way that makes future scaling more straightforward. The ingestion API, dashboard API, Django app, and worker all run as distinct containers, so they could be moved into ECS and scaled independently later.

That matters most for the analytics pipeline. If traffic increased, I could scale the ingestion API separately from the dashboard, and run additional worker containers to process events faster. The current deployment keeps the operational model simple while still leaving a clear path toward horizontal scaling with ECS.

WordPress Integration

I also built a small WordPress plugin for Parcours so that a site owner can install tracking without manually editing theme files. The plugin adds a settings page where the user enters their Parcours property ID, then injects the correct tracking script into public pages.

The plugin also adds a small JSON metadata block alongside the script. That metadata includes WordPress-specific context such as page type, author, categories, tags, and whether the current page is a post, page, home page, or archive. The browser tracker includes this data with each event, which lets Parcours report on content performance by category, tag, author, and page type instead of only reporting raw URLs.

Security and Privacy Considerations

Parcours ties analytics data to authenticated users and the web properties they own. Dashboard requests are checked before analytics data is returned, and incoming tracking events are validated before they are accepted.

Secrets are kept outside the repository and loaded at runtime from AWS Systems Manager Parameter Store. Service access is also separated so each part of the system only has the permissions it needs.

The ingestion API sanitizes incoming data before it is processed, including referrer URLs and scroll-depth values. Parcours is also focused on first-party analytics rather than sending visitor behavior to a third-party analytics platform.

There is still more I would improve here over time, including IP anonymization, stronger retention controls, rate limiting, and clearer user-facing privacy settings.

Technical Challenges

The hardest technical challenge was getting time-based reporting to feel correct.

Parcours receives events from visitors anywhere in the world, but the dashboard needs to reflect the site owner’s timezone. “Today” should mean today for the person reading the dashboard, not UTC and not the visitor’s local time.

That sounds simple, but it affects almost every report: today, yesterday, hourly charts, last 7 days, and comparison periods. Getting those boundaries right was one of the trickier parts of making the analytics feel trustworthy instead of just technically correct.
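The core of the boundary problem can be sketched with Python's zoneinfo module: compute the local calendar day in the site owner's timezone, then convert its start and end back to UTC for querying. The function name and parameters are hypothetical; the actual Parcours queries are not shown here.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Tuple
from zoneinfo import ZoneInfo


def day_bounds_utc(
    now_utc: datetime, tz_name: str, days_ago: int = 0
) -> Tuple[datetime, datetime]:
    """Return [start, end) of a local calendar day, expressed in UTC.

    days_ago=0 is "today" for the site owner, days_ago=1 is "yesterday".
    """
    tz = ZoneInfo(tz_name)
    local_day = now_utc.astimezone(tz).date() - timedelta(days=days_ago)
    start_local = datetime.combine(local_day, time.min, tzinfo=tz)
    end_local = datetime.combine(local_day + timedelta(days=1), time.min, tzinfo=tz)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)
```

Building both midnights from local dates, instead of adding 24 hours to the start, keeps the range correct on days when daylight saving time makes the local day 23 or 25 hours long.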

What I Learned

This project reinforced how important clear service boundaries are in a production-style application. Keeping ingestion, processing, dashboard rendering, and dashboard queries separate made the system easier to reason about and gave each component a specific failure mode.

I also learned that analytics systems are much more about data modeling than charts. Raw events are useful, but dashboards need sessions, time ranges, rollups, referrer groups, scroll-depth records, and timezone-aware reporting to feel accurate and useful.

The biggest practical lesson was around time. Storing events in UTC is the right foundation, but user-facing reports still need to be calculated in the property owner’s timezone. Getting that right affected the worker, the database design, and the dashboard query layer.

Technologies

AWS and Docker
Python, Django, FastAPI
PostgreSQL

Visit Parcours