Learning Grafana, Loki, and Prometheus: Building Observability Into My Side Projects Before I Needed It

I'm a solo developer working on two Node.js applications: Oggram and Surge. Neither has a massive user base. The decision to implement a proper observability stack was proactive — a deliberate choice to build good habits now rather than scramble to retrofit them later.

In my day job as a support engineer, I work with Grafana regularly. The product I support, a document viewer, ships logs that are designed to be reviewed in Grafana, and I use it to help debug customer issues, trace through log events, and understand what was happening in a user's session when something went wrong. I knew Grafana as a powerful tool. What I didn't know was how to build the pipeline behind it: how data actually gets collected, stored, and made queryable. Implementing Loki and Prometheus on my own projects was my chance to understand the full picture.

This is a write-up of that learning process: what I set up, what I didn't understand at first, and what clicked along the way.

The Stack: Prometheus, Loki, and Grafana

Before diving into implementation, it's worth laying out what each tool actually does, because I had to get clear on this before things started making sense.

Prometheus is a metrics collection system. It operates on a pull model: you configure it with a list of targets, and it periodically scrapes numerical data from each one, such as how many requests your server has handled, how long those requests took, how much memory your process is consuming, or how many errors you've returned. This data is stored as time series (timestamped samples of a number, identified by a metric name and a set of labels) and queried using a language called PromQL.

Loki is a log aggregation system, and it's where my prior Grafana experience mapped most cleanly. In my support work, I was always on the consuming side of Loki, reading logs that had already been collected and shipped. Setting up Loki meant stepping to the other side and understanding how those logs get there in the first place. Loki pairs with an agent called Promtail that runs alongside your application, watches your log files, and ships entries to the Loki server. Critically, Loki doesn't index log content, only the labels attached to each log stream. That keeps the index small and makes Loki lighter to run than full-text indexing solutions like Elasticsearch; the trade-off is that searching log content means scanning the streams you selected.

Grafana is the visualization and alerting layer that sits on top of everything. It can connect to both Prometheus and Loki as data sources and present their data in unified dashboards. This is the part I was already familiar with from work: building panels, writing queries, interpreting results. The real learning curve for me was the collection and storage side of the stack.

The three tools are conceptually a matched set. Prometheus answers "what is my system doing numerically right now?" Loki answers "what was my application saying when that happened?" Grafana answers "show me all of that together, visually, in one place."

Getting Prometheus Running

My first hands-on step was getting Prometheus scraping metrics from my Node.js applications.

The instrumentation side is done through a library called prom-client. You import it into your Node.js application, register metrics with it, and serve its registry from a /metrics endpoint that Prometheus can poll. Out of the box, prom-client gives you a useful set of default metrics with almost no configuration: event loop lag, garbage collection stats, heap memory usage, and more. For a first pass, this alone is meaningful visibility you wouldn't otherwise have.

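const express = require('express'); // assuming an Express app, as app.get/res.set suggest
const client = require('prom-client');

const app = express();

// Register the default Node.js process metrics (event loop lag, GC, heap usage, and more).
client.collectDefaultMetrics();

// Serve the registry in the text format that Prometheus scrapes.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});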

I also defined a few custom metrics specific to each application. For Oggram and Surge, I added an HTTP request duration histogram and a request counter, broken out by route, status code, and method. This is a common pattern and there are middleware packages that handle it automatically, but I wired it up manually at first just to understand what was being recorded and why.

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5]
});
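Defining the histogram doesn't record anything by itself; something has to observe each request. A minimal sketch of how that manual wiring might look as Express-style middleware follows. The helper name createDurationMiddleware and the 'unmatched' placeholder are my own illustrative choices, not part of prom-client; the only prom-client call it relies on is the histogram's startTimer().

```javascript
// Returns Express-style middleware that records request duration into the
// given histogram. Accepts any object exposing startTimer(), such as the
// prom-client Histogram defined above. (Hypothetical helper for illustration.)
function createDurationMiddleware(histogram) {
  return function recordDuration(req, res, next) {
    // startTimer() returns a function that records the elapsed seconds when called.
    const stopTimer = histogram.startTimer();

    // 'finish' fires once the response has been handed off to the OS.
    res.on('finish', () => {
      stopTimer({
        method: req.method,
        // req.route is only set when Express matched a route. Fall back to a
        // fixed placeholder instead of the raw URL to keep label cardinality low.
        route: req.route ? req.route.path : 'unmatched',
        status_code: res.statusCode,
      });
    });

    next();
  };
}
```

It would be registered before the routes, for example with app.use(createDurationMiddleware(httpRequestDuration)). Using the matched route pattern rather than the raw URL matters because every distinct label value creates a new time series.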

On the Prometheus side, configuration lives in a prometheus.yml file. You define scrape jobs that tell Prometheus where to find your /metrics endpoints and how often to poll them. For local development I ran Prometheus as a Docker container, which kept the setup self-contained and easy to iterate on.
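For reference, a prometheus.yml along these lines is roughly what that looks like. Treat it as a sketch: the 15-second interval, the ports, and the host.docker.internal hostname (which lets a container reach the host on Docker Desktop) are assumptions for illustration, not my exact setup.

```yaml
global:
  scrape_interval: 15s          # how often Prometheus polls each target (assumed value)

scrape_configs:
  - job_name: oggram            # becomes the job="oggram" label on every scraped series
    metrics_path: /metrics      # this is also the default
    static_configs:
      - targets: ['host.docker.internal:3000']   # hypothetical host and port

  - job_name: surge
    metrics_path: /metrics
    static_configs:
      - targets: ['host.docker.internal:3001']   # hypothetical host and port
```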

One thing that took me a moment to internalize: Prometheus's pull model means Prometheus reaches out to your app, not the other way around. Coming from a logging background where applications push data somewhere, this felt backwards at first. But it makes sense once you understand it — the collection schedule is owned and controlled centrally, and you can scrape any service that exposes a compatible endpoint without modifying how the service itself behaves.
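It also helped me to see that a "compatible endpoint" is nothing exotic: it's plain text over HTTP. Here is a rough sketch of rendering a counter in the Prometheus text exposition format by hand. In practice prom-client generates this for you; the function names and the http_requests_total metric are hypothetical and only there to show the shape of the output.

```javascript
// Escape a label value for the text exposition format:
// backslash, double quote, and newline must be escaped.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render one counter metric with any number of labelled samples.
// samples: [{ labels: { method: 'GET' }, value: 42 }, ...]
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];

  for (const { labels, value } of samples) {
    const labelText = Object.entries(labels)
      .map(([key, labelValue]) => `${key}="${escapeLabelValue(labelValue)}"`)
      .join(',');
    // Series without labels omit the braces entirely.
    lines.push(labelText ? `${name}{${labelText}} ${value}` : `${name} ${value}`);
  }

  return lines.join('\n') + '\n';
}

// Example (hypothetical metric name):
// renderCounter('http_requests_total', 'Total HTTP requests', [
//   { labels: { method: 'GET', status_code: '200' }, value: 42 },
// ]);
```

Each scrape reads the current value of every series and Prometheus attaches the timestamp itself, which is why the application never has to know where its metrics end up.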

Setting Up Loki and Promtail

This is where my support engineering background created an interesting reversal. In my job, I look at logs in Grafana every week. But I'd never thought much about how they got there. Setting up Loki and Promtail made that pipeline concrete.

Promtail is an agent you run on the same host as your application. Its job is simple: watch log files or other log sources, apply labels, and forward entries to Loki. The Promtail configuration specifies what to watch and how to label it:

scrape_configs:
  - job_name: oggram
    static_configs:
      - targets:
          - localhost
        labels:
          app: oggram
          env: production
          __path__: /var/log/oggram/*.log

  - job_name: surge
    static_configs:
      - targets:
          - localhost
        labels:
          app: surge
          env: production
          __path__: /var/log/surge/*.log

Labels are the key organizational concept in Loki. Every log stream is identified by its set of labels, and those labels are what you use to select streams in a LogQL query; anything finer-grained is a filter over the content of the selected streams. It's a more constrained model than full-text indexing, but for the kind of debugging I do, such as "show me all error logs from Oggram in the last hour" or "find every log line associated with this request ID", it works extremely well.

Before setting this up, I also took the time to standardize the log format across both Oggram and Surge. Both applications now emit structured JSON logs, each line containing a timestamp, log level, a message, and relevant context fields like request ID and route. This wasn't a huge change — Node.js logging libraries like pino or winston make structured logging straightforward — but it made a meaningful difference in how queryable the logs became.
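To make the target format concrete, here is a dependency-free sketch of the kind of line each application emits. It stands in for what pino or winston would produce and is not my actual logger; the function names and the key names timestamp, message, and requestId are assumptions for illustration.

```javascript
// Build one structured log line as a JSON string.
// Core fields are spread last so context fields cannot overwrite them.
function formatLogLine(level, message, context = {}) {
  return JSON.stringify({
    ...context,
    timestamp: new Date().toISOString(),
    level,
    message,
  });
}

// Write one JSON object per line, which is the shape log shippers expect.
function log(level, message, context) {
  process.stdout.write(formatLogLine(level, message, context) + '\n');
}

// Example call (hypothetical values):
// log('error', 'render failed', { requestId: 'abc123', route: '/api/documents' });
```

The important property is one complete JSON object per line with the same core keys in both applications, so the same queries work against either one.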

Unstructured logs in Loki are still searchable, but structured logs unlock the ability to extract fields on the fly using LogQL's JSON parser. Instead of grepping for patterns, you can filter directly on level="error" or route="/api/documents" as if they were native fields. Once you see how much cleaner this is, it's hard to go back.
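As a sketch of what those queries look like, here is a small helper that assembles a LogQL query from stream labels and JSON field filters. The helper itself is hypothetical; the LogQL it produces (a stream selector, the json parser stage, then label filters) is the standard shape, and the label names come from the Promtail config above.

```javascript
// Escape a value for use inside a double-quoted LogQL string.
function escapeLogQLString(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// streamLabels select which streams to read (indexed by Loki).
// fieldFilters apply to fields extracted from each JSON line at query time.
function buildLogQuery(streamLabels, fieldFilters = {}) {
  const selector = Object.entries(streamLabels)
    .map(([key, value]) => `${key}="${escapeLogQLString(value)}"`)
    .join(', ');

  const filters = Object.entries(fieldFilters)
    .map(([key, value]) => ` | ${key}="${escapeLogQLString(value)}"`)
    .join('');

  // The json stage is only needed when filtering on extracted fields.
  const parserStage = filters ? ' | json' : '';

  return `{${selector}}${parserStage}${filters}`;
}

// buildLogQuery({ app: 'oggram', env: 'production' }, { level: 'error' })
// produces: {app="oggram", env="production"} | json | level="error"
```

The split between the two arguments mirrors how Loki works: the selector narrows the search using the index, and everything after the pipe is evaluated against the log lines themselves.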

Building Dashboards in Grafana

Once Prometheus and Loki were both feeding data, Grafana was the most immediately familiar part of the stack for me — and also where my day job experience paid off the most.

In my support work, I read dashboards that other people built. Now I was building them myself, and it's a different skill. You have to make decisions about what matters, what belongs together, and how to arrange information so that someone (even if that someone is just future me) can load a dashboard and immediately understand what's happening.

I built a main overview dashboard for each application showing the metrics I care most about: request volume, error rate, response time distribution, and memory usage. These four together give a good snapshot of application health at a glance. Below those, I have a log panel pulling from Loki that shows recent log entries for that application, with a variable filter for log level so I can quickly narrow to errors or warnings.
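For anyone wanting a starting point, the four overview panels can be driven by PromQL roughly like the queries below, collected here in a small helper. This is a sketch under a few assumptions: the job label matches the scrape job name, "error" means a 5xx status code, request volume is derived from the histogram's _count series rather than a separate counter, and memory uses prom-client's default process_resident_memory_bytes metric. The helper name overviewQueries is hypothetical.

```javascript
// Build the PromQL expressions for one application's overview dashboard.
// job: the Prometheus job label for the app; rateWindow: range for rate().
function overviewQueries(job, rateWindow = '5m') {
  const selector = `job="${job}"`;
  const requestRate =
    `sum(rate(http_request_duration_seconds_count{${selector}}[${rateWindow}]))`;

  return {
    // Requests per second across all routes.
    requestRate,
    // Fraction of requests that returned a 5xx status.
    errorRatio:
      `sum(rate(http_request_duration_seconds_count{${selector},status_code=~"5.."}[${rateWindow}])) / ${requestRate}`,
    // 95th percentile latency estimated from the histogram buckets.
    p95Latency:
      `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{${selector}}[${rateWindow}])))`,
    // Resident memory of the Node.js process.
    memoryBytes: `process_resident_memory_bytes{${selector}}`,
  };
}
```

Because histogram_quantile interpolates between bucket boundaries, the latency panel is only as precise as the buckets chosen when the histogram was defined.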

The part that has been most useful in practice is having metrics and logs in the same dashboard, synchronized on the same time range. I'm still in early stages with these applications, so I haven't had a production incident to debug in anger. But I've used the setup during development to trace through what was happening in the application during test runs, which is already helping me build better intuitions about performance and behavior.

Having a consumer's perspective on Grafana — understanding that whoever is looking at these dashboards needs to quickly orient themselves and find the signal in the noise — informed a lot of my design decisions. Things like clear panel titles, sensible time range defaults, and not cluttering dashboards with every metric just because you can.

What I've Taken Away From This

A few reflections after going through this process:

Understanding the full stack changes how you use the tools. In my job I use Grafana to read logs. Having now built the collection pipeline myself, I understand why the logs are structured the way they are, why certain labels exist, and why some queries are fast and others are slow. That context makes me better at my support work, not just at running my own applications. When I open a customer's logs in Grafana now, I have a much clearer mental model of what I'm actually looking at.

Doing this proactively is a different kind of motivation. There's no crisis forcing me to get this right, which means I had the space to actually understand things rather than just make them work. I'd recommend this approach if you have the opportunity — setting up observability when nothing is on fire means you can learn the concepts properly instead of cargo-culting a configuration you don't really understand.

Structured logging is worth the upfront effort. Getting both applications to emit consistent, structured JSON before sending logs to Loki paid immediate dividends in how queryable everything became. If you're starting a new project, define your log format before you write your first log line.

The stack is approachable for a solo developer. I went into this expecting it to feel heavyweight and complex. The reality is that for one developer running a couple of Node.js applications, the setup is manageable, the documentation is solid, and the tooling has clearly been designed with composability in mind. You don't need a team or dedicated infrastructure experience to get this running — you need an afternoon and a willingness to read the docs.

Oggram and Surge aren't big applications yet. But they'll be better applications for having observability built in from the beginning — and I'll be a better developer and support engineer for understanding how all the pieces fit together.