Michele Mazzucco · 15 min read

Virtual queues from scratch, Part 2: implementing the Lobby Service with Redis

In Part 1, we established the architectural blueprint for our virtual queue. Now, it's time to translate that design into working code. This article kicks off the implementation by bootstrapping our Lobby Service in Python and integrating Redis to manage the queue's state. You will learn how to handle the first user requests and build the foundation for the entire waiting room.


In the first article of this series, we explored the core concepts of virtual queues and made a critical architectural decision: to build our system around a Pull model. This approach, as we discussed, offers greater resilience and efficiency, especially in modern cloud environments.

With our architecture defined, it’s time to write the first lines of code. The heart of our waiting room is the Lobby Service, the entry point for all users. In this post, we’ll bootstrap this service using Python and introduce its most important companion: Redis, which will act as our high-performance, in-memory state manager for the queue.

By the end of this article, you will have:

  1. A running ASGI application for our lobby service.
  2. A Redis client connected and ready to manage data.
  3. The core logic to add a user to the queue and retrieve their status.
  4. A solid, testable foundation for the more advanced features we will build in Part 3.

As mentioned in our tech stack breakdown, we are starting with a minimal ASGI server to keep latency low and using Redis for its speed—two key ingredients for a performant queue.

Let’s dive in and get our lobby up and running.



Part B – The “How”: core queue logic

1. Building the Lobby Service

We expose only three hot-path endpoints, so we can drop the heavy router, middlewares, and exception hierarchy that frameworks like FastAPI carry. A 50-line regex-based router plus msgspec structs is enough:

  • POST /join: issue token, return JoinQueueResponse
  • GET /position/{token}: return PositionResponse
  • GET /qsize: return QsizeResponse

import re
from typing import Awaitable, Callable, Optional

from msgspec.json import Encoder

Handler = Callable[..., Awaitable[None]]

class Router:
    def __init__(self):
        # For just 3 endpoints, a list is faster than a dict
        self.routes = []
        self.encoder = Encoder()
        # pre-encode errors
        self._not_found_body = self.encoder.encode({"detail": "Not found"})
        self._not_found_headers = [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(self._not_found_body)).encode()),
        ]
        # ...

    def route(self, method: str, path_pattern: str):
        """ Register a regex route handler"""
        def decorator(fn: Handler):
            self.routes.append((method.upper(), re.compile(f"^{path_pattern}$"), fn))
            return fn
        return decorator

    async def dispatch(self, scope, receive, send):
        """
        Dispatch the request to the target handler, if found, else return HTTP 404.
        HTTP 404 and 500 error messages are pre-encoded.
        """
        method = scope["method"]
        path = scope["path"]
        for route_method, pattern, handler in self.routes:
            if method == route_method:
                match = pattern.match(path)
                if match:
                    # Error handling omitted for brevity
                    return await handler(scope, receive, send, **match.groupdict())
        return await self._not_found(send)

    async def _not_found(self, send):
        await send(
            { "type": "http.response.start", "status": 404, "headers": self._not_found_headers }
        )
        await send(
            {"type": "http.response.body", "body": self._not_found_body}
        )

    async def respond(
        self, send, status_code, data, extra_headers: Optional[list] = None
    ):
        body = self.encoder.encode(data)
        headers = [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ]

        # Add any extra headers passed in
        if extra_headers:
            headers.extend(extra_headers)

        # HTTP headers and body must be sent separately
        # https://asgi.readthedocs.io/en/latest/specs/www.html
        await send(
            { "type": "http.response.start", "status": status_code, "headers": headers }
        )
        await send(
            { "type": "http.response.body", "body": body }
        )

With the router in place, we can bootstrap our virtual lobby with the following straightforward design:

  1. Join queue (/join): Users are assigned a unique ticket (token) and placed at the back of the queue.
  2. Queue position (/position/<token>): users periodically check their position in the queue, along with an estimate of their waiting time. This also extends the token time-to-live (TTL).
  3. Admission control: requests can be rejected if the estimated waiting time becomes too large, protecting the backend from overload.
  4. Ticket expiry: users may get distracted, change their mind, or face technical issues. Tokens expire after a certain time if users abandon the queue, ensuring accurate queue management.

Note that the ASGI specification requires the response headers and body to be sent as separate messages (see the code snippet above). We also have to implement the lifespan protocol:

from msgspec import Struct
from router import Router

class JoinQueueResponse(Struct):
    id: str
    position: int
    wait: float

class PositionResponse(Struct):
    position: int
    wait: float
    variance: float

class QsizeResponse(Struct):
    len: int

class AsgiLobby:
    def __init__(self):
        self.router: Router = Router()
        # ...

        # Register routes
        self.router.route("POST", "/join")(self.handle_join_queue)
        self.router.route("GET", r"/position/(?P<token>[a-zA-Z0-9\-]+)")(self.handle_position)
        self.router.route("GET", "/qsize")(self.handle_qsize)

    async def __call__(self, scope, receive, send):
        if scope["type"] == "lifespan":
            await self.handle_lifespan(scope, receive, send)
        elif scope["type"] == "http":
            await self.router.dispatch(scope, receive, send)

    async def handle_lifespan(self, scope, receive, send):
        """
        Lifespan protocol https://asgi.readthedocs.io/en/latest/specs/lifespan.html
        """
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                # 1. Compile math code via Numba by triggering the relevant functions
                # 2. Setup Redis connection pools and metrics server
                # 3. Load Lua scripts
                # 4. Initialize lobby state
                # 5. Create background tasks
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                # 1. Stop all tasks
                # 2. Close Redis connection pools
                await send({"type": "lifespan.shutdown.complete"})
                break

    # ... handle_join_queue(), handle_position() and handle_qsize() functions
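
To make the handler shape concrete, here is a minimal sketch of what handle_qsize() could look like. The self.redis attribute and the lobby:pending key are assumptions (the key is introduced in the next section), and the real handler would also add error handling:

    async def handle_qsize(self, scope, receive, send):
        # Hypothetical sketch: the number of active tickets is the size of the
        # pending sorted set (key name and self.redis are assumptions).
        queue_len = await self.redis.zcard("lobby:pending")
        await self.router.respond(send, 200, QsizeResponse(len=queue_len))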

2. Leveraging Redis for managing queue state

Redis excels at fast, atomic operations—perfect for managing our queue state reliably. Here is how we will use Redis in our virtual waiting room:

  • List (lobby:queue): Tracks users waiting in line. Workers use blocking pop operations to efficiently handle requests.
  • Hash (lobby:seqmap): Maps tokens to their sequence numbers, enabling quick position lookups.
  • Sorted Sets:
    • Pending (lobby:pending): Maintains an ordered list of active tickets, allowing precise position checks even when some users leave or timeout.
    • Expiry (lobby:expiry): Manages ticket expiration timestamps, automatically handling abandoned tickets without expensive full-queue scans.
  • Counters (lobby:head, lobby:tail): Keep track of queue progress efficiently and consistently.
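
To make the roles of these keys concrete, here is a hedged sketch of what a single join could do to each structure. The production path runs this logic inside an atomic Lua script (Section 3.1 explains why); the transactional pipeline and the join_sketch helper below are illustrative simplifications:

# Hedged illustration only: the real service runs this as one Lua script.
async def join_sketch(r, token: str, now: float, ttl: float) -> int:
    seq = await r.incr("lobby:tail")               # next sequence number
    pipe = r.pipeline(transaction=True)            # MULTI/EXEC block
    pipe.rpush("lobby:queue", token)               # workers BLPOP from this list
    pipe.hset("lobby:seqmap", token, seq)          # token -> sequence number
    pipe.zadd("lobby:pending", {token: seq})       # ordered view of active tickets
    pipe.zadd("lobby:expiry", {token: now + ttl})  # abandonment deadline
    await pipe.execute()
    return seq                                     # used to derive the position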

While these data structures form the core of our state management, a robust, high-performance implementation requires several key optimizations. To reduce latency on frequent reads, we’ll introduce a local cache. To ensure data consistency during writes, we’ll rely on atomic Lua scripts. And to prevent bottlenecks, we’ll isolate traffic with dedicated connection pools.

The diagram below puts all these pieces together, illustrating the complete flow of a user request through our optimized lobby service. You will find a detailed breakdown of how we implement each of these optimizations in the section that follows.

Figure 1 - Request flow through the Lobby Service, showing its interaction with Redis data structures and key optimizations like local caching, atomic Lua scripts, and isolated connection pools.

💡 Does the Redis variant matter?

For our demo we use Redis 8, but large-scale systems often consider alternatives. Here is what matters for a virtual lobby:

| Feature | Redis 8 | KeyDB | Dragonfly | Valkey |
| --- | --- | --- | --- | --- |
| BLPOP, RPUSH, ZADD, etc. | ✅ | ✅ | ✅ | ✅ |
| Lua scripting / atomic pipelines | ✅ | ✅ | ✅ | ✅ |
| TDIGEST.* (RedisBloom) | ✅ built-in | ⚠️ (RedisBloom module) | ❌ not yet | ❌ not yet |
| Multi-threaded ops | ✅ I/O threading | ✅ (all commands) | ✅ (core architecture) | ✅ I/O threading |
| Performance at 10k+ ops/sec | ✅ | ✅ | ✅ | ✅ |
| Pub/Sub / WebSocket support | ✅ | ✅ | ✅ | ✅ |
| Actively maintained | ✅ (Redis Inc.) | ⚠️ slower/stalled | ✅ very active | ✅ (Linux Foundation) |
| License | AGPL v3 | BSD-3 | BSL 1.1 | BSD-3 |

To summarize:

  • Redis 8.0+: Ideal choice with built-in t-digest support, simplifying percentile computations (we use them for statistical purposes) without additional modules.

  • KeyDB: Offers high performance but lacks native t-digest support; integrating RedisBloom may be complex.

  • Dragonfly: A compelling new engine; if t-digest support lands, it could be ideal for virtual queues that need ultra-low tail latency.

  • Valkey: Potentially viable with module support; ensure necessary modules are available and stable.


3. Implementation notes: optimizing for performance and reliability

Beyond the fundamental data structures, several implementation details are crucial for a robust and performant queue. We can group these into four key areas: ensuring data consistency, accurately estimating wait times, monitoring the system, and managing resources effectively.

3.1. Atomicity and performance in Redis

To maintain data integrity under load, operations must be atomic.

  1. Ensuring atomicity with Lua scripts: Redis pipelines are excellent for performance because they group commands to reduce network round-trips; however, they don’t guarantee atomicity. For operations that must be atomic, like updating shared counters, we use Lua scripts. Redis guarantees a Lua script will execute without interruption, which is akin to a stored procedure in a relational database, where the entire logical unit of work is processed atomically on the server. This ensures consistency while keeping our Redis load low.

  2. Caching Lua scripts: To avoid the overhead of sending the full script on every execution, we use the SCRIPT LOAD command at startup to cache the required scripts on the server. We then invoke the cached versions with EVALSHA, which is much more efficient in high-traffic scenarios (see the sketch after this list).

  3. Local caching for frequently accessed data: To minimize Redis queries under heavy load, we implement a local cache for frequently accessed, relatively stable data such as arrival rates, service rates, or queue size. These values are refreshed periodically (e.g., once per second) by a background task. This approach strikes a balance between data freshness and system performance:

    • It significantly reduces the number of Redis queries, lowering network overhead and Redis server load.
    • It provides near-instant access to these statistics for rapid decision-making in the application layer.
    • The slight delay in data freshness (up to 1 second) is typically negligible for these metrics, especially under high load conditions where large-scale changes are unlikely to occur within sub-second intervals.
    • This trade-off between absolute real-time accuracy and system performance is justified, as the benefits of reduced load on Redis and faster local access outweigh the minimal staleness of data in most scenarios.

    By implementing this local caching strategy, we ensure that our system can handle high traffic volumes efficiently while maintaining a good approximation of the current state, only incurring a small, acceptable delay in reflecting the most recent changes in the monitored metrics.

  4. Balancing write pressure: For metrics that are self-contained, like service times, we can record them and periodically flush them to Redis in batches using a background task. However, data like inter-arrival times requires reading and updating a shared timestamp atomically, making a per-event Lua script the correct choice to preserve correctness:

    -- Lua script updating the interarrival interval statistics
    -- Current time
    local now = tonumber(ARGV[1])
    local last = redis.call("GET", "arr:last_ts")
    if not last or now > tonumber(last) then
        if last then
            local delta = now - tonumber(last)
            -- Statistics used to compute ca2
            redis.call("INCRBYFLOAT", "arr:sum", delta)
            redis.call("INCRBYFLOAT", "arr:sum2", delta * delta)
            redis.call("INCR", "arr:count")
        end
        -- Update timestamp of the last arrival
        redis.call("SET", "arr:last_ts", now)
    end
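
Returning to point 2 above, a rough sketch of how script caching could be wired up with redis-py follows. The CachedScript helper and its NOSCRIPT fallback are illustrative assumptions; redis-py’s built-in register_script() achieves the same effect with less code.

from redis.exceptions import NoScriptError

# Hypothetical helper: load once at startup, then invoke via EVALSHA.
class CachedScript:
    def __init__(self, redis, source: str):
        self.redis = redis
        self.source = source
        self.sha = None

    async def load(self) -> None:
        # SCRIPT LOAD returns the SHA1 digest that EVALSHA expects
        self.sha = await self.redis.script_load(self.source)

    async def run(self, keys=(), args=()):
        try:
            return await self.redis.evalsha(self.sha, len(keys), *keys, *args)
        except NoScriptError:
            # Script cache was flushed (e.g. Redis restart): reload and retry once
            await self.load()
            return await self.redis.evalsha(self.sha, len(keys), *keys, *args)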

3.2. Accurate wait time estimation

  1. Using the right math: The t-digest data structure is excellent for calculating percentiles, but it does not give you the mean. To estimate the average service time, we keep a simple running total in Redis (average = sum / count); see the sketch after this list.
  2. Accounting for traffic variation: Real-world traffic is rarely perfectly uniform. To account for this, we compute the squared coefficient of variation for both service times (cs²) and inter-arrival gaps (ca²) using Redis counters, which lets us approximate wait times more accurately without storing historical samples:
    async def cv2(self, min_samples: int = 1000) -> float:
        # Running sum, sum of squares, and sample count maintained in Redis
        s, s2, c = await self.r.mget(self.sum_key, self.sum2_key, self.count_key)
        c = int(c or 0)
        if c < min_samples:
            return 1.0  # not enough data yet: assume exponential (cv² = 1)
        mean = float(s) / c
        mean_sq = float(s2) / c  # E[X²]
        mean2 = mean**2          # (E[X])²
        variance = mean_sq - mean2
        # Squared coefficient of variation, floored to avoid degenerate values
        return max(variance / mean2, 0.01)
  3. Avoiding early noise: To prevent volatile estimates when the service first starts, we assume standard Markovian traffic (cs²=1, ca²=1) until we have collected at least 1,000 samples.
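
As a small illustration of point 1, a running mean only needs two counters. The svc:sum and svc:count key names below are placeholders rather than the service’s actual keys:

async def record_service_time(r, seconds: float) -> None:
    # Two counters are enough for a running mean (and they reset cheaply).
    pipe = r.pipeline(transaction=True)
    pipe.incrbyfloat("svc:sum", seconds)
    pipe.incr("svc:count")
    await pipe.execute()

async def mean_service_time(r, default: float = 1.0) -> float:
    s, c = await r.mget("svc:sum", "svc:count")
    return float(s) / int(c) if c and int(c) > 0 else default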

3.3. Monitoring and observability

You can’t manage what you can’t see.

  1. Collecting metrics: We run a background task every second that collects key metrics (like the current queue length) and exposes them for scraping by a time-series database such as VictoriaMetrics or Prometheus. This task uses the prometheus_client library to format the data:

    import asyncio

    from redis.asyncio import Redis
    from prometheus_client import CollectorRegistry, Gauge, start_http_server
    
    metrics_reg = CollectorRegistry() # private registry
    queue_len_g = Gauge("lobby_queue_length", "Tokens waiting in Redis list", registry=metrics_reg)
    start_http_server(9100, registry=metrics_reg) # expose the private registry
    
    async def queue_sampler(ctx: AppContext, stop_event: asyncio.Event) -> None:
        lobby: LobbyState = ctx.lobby
        r: Redis = lobby.redis # internal Redis pool
        while not stop_event.is_set():
            queue_len = await r.zcard(ctx.lobby.keys.pending)
            queue_len_g.set(queue_len)
            # ... other metrics
    
            await asyncio.sleep(1)
  2. Handling hot-reloading: When using Uvicorn’s reloader, you must create a private CollectorRegistry (see the code snippet above). If you don’t, each reload will attempt to re-register the same metric, and prometheus_client will raise a “Duplicated timeseries” error.

3.4. Resource management and stability

A mismanaged resource can bring the whole system down.

  1. Isolating Redis connections: A key bottleneck can emerge from Redis connections. A worker performing a blocking list pop (BLPOP) holds a Redis socket open. With just 10 workers, a single connection pool of 10 would be exhausted, bringing the whole system to a halt. To prevent this system-wide stall, we must isolate the connection pools based on their function.

    from typing import NamedTuple
    from redis.retry import Retry
    from redis.backoff import ExponentialBackoff
    from redis.asyncio import BlockingConnectionPool, Redis
    
    class RedisPools(NamedTuple):
        r_user: Redis # Redis pool for /join
        r_internal: Redis # Redis pool for /position
        r_tasks: Redis # Redis pool for background tasks
    
    def create_redis_pools(debug=False) -> RedisPools:
        # retry 3 times, maximum delay 2 seconds
        retry = Retry(ExponentialBackoff(cap=2, base=0.1), 3)
    
        pool_user = BlockingConnectionPool.from_url(
            REDIS_URL,
            max_connections=2 * MAX_WORKERS + 10,
            timeout=1,
            decode_responses=True,
        )
        r_user: Redis = Redis(connection_pool=pool_user, health_check_interval=5, retry=retry)
        # r_internal ...
        # r_tasks ...
    
        return RedisPools(r_user=r_user, r_internal=r_internal, r_tasks=r_tasks)

    The above setup creates three distinct Redis connection pools:

    • The r_user pool is generously sized to handle incoming user traffic without delay.
    • Meanwhile, the r_internal pool is used by our workers, isolating their blocking operations so they can’t affect new users joining the queue. This pool is also used to serve requests hitting the /position endpoint, which is an internal-facing endpoint for users already in the system.
    • Finally, r_tasks handles low-priority background jobs. This separation ensures that a bottleneck in one part of the system doesn’t compromise the others.

    Also, note that each Redis pool is configured with automatic retries on errors and performs health checks every 5 seconds. In production, we would also wrap Redis calls in a Circuit Breaker (a minimal sketch follows this list).

  2. Managing the garbage collector: Python’s cyclic garbage collector can kick in at unpredictable moments and, because it holds the Global Interpreter Lock (GIL) while collecting, briefly pause the asyncio loop. To keep latency smooth, we disable automatic garbage collection and instead run it periodically on a schedule via a background task:

    import asyncio
    import gc

    async def gc_task(stop_event: asyncio.Event, interval: float) -> None:
        gc.disable()  # stop automatic collections; we trigger them ourselves
        try:
            while not stop_event.is_set():
                await asyncio.sleep(interval)
                gc.collect()  # full collection on our own schedule
        finally:
            gc.enable()  # restore default behaviour on shutdown
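
Returning to the Circuit Breaker mentioned at the end of point 1, the pattern can be as simple as a failure counter with a cooldown. The sketch below is a generic, illustrative version (thresholds and names are arbitrary), not the production implementation:

import time

class CircuitBreaker:
    """Generic sketch: open after N consecutive failures, probe again after a cooldown."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def allow(self) -> bool:
        # Closed circuit: let the call through
        if self.opened_at is None:
            return True
        # Open circuit: fail fast until the cooldown expires, then half-open
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

A caller would check allow() before issuing a Redis command, record() the outcome, and fail fast (for example, asking the client to retry later) while the circuit is open.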

4. The API in action: example requests

Here is what the output of our three endpoints looks like.

4.1. A user joins the queue

A POST request to /join places the user in the queue and returns a unique token, the initial position, and the estimated waiting time in the response body.

$ curl -i -X POST http://localhost:8000/join
HTTP/1.1 200 OK
x-poll-after-seconds: 15.57
...

{"id":"bdea74e6-5c19-40c0-9fc3-fef8cc0a7485","position":366,"wait":33.68}

One custom HTTP header, x-poll-after-seconds, is included. This header indicates the maximum amount of time (in seconds) that can elapse before the next poll request is required. Polling before this deadline is fine, but polling after it could result in the token expiring and the user losing their place in the queue.

📌 Note

The details of the “liveness system”—how status updates work and how user abandonment is detected—will be the subject of the next article in this series. We will keep it high-level for now and dive deeper soon!

4.2. A user checks their position

The client polls the /position/{token} endpoint to receive updates. Each response contains the user’s current place in the queue, an updated estimate of the remaining waiting time, and the uncertainty around that estimate. The x-poll-after-seconds header retains its role, informing the client when the next poll should be sent.

For example, after joining at position 366, the user moves forward in the queue and receives updated wait estimates each time. This polling action also acts as a heartbeat, extending the token’s lifetime and maintaining the user’s place in the queue.

$ curl -i http://localhost:8000/position/bdea74e6-5c19-40c0-9fc3-fef8cc0a7485
HTTP/1.1 200 OK
x-poll-after-seconds: 14.33
...

{"position":318,"wait":28.82,"variance":4.45}


$ curl -i http://localhost:8000/position/bdea74e6-5c19-40c0-9fc3-fef8cc0a7485
HTTP/1.1 200 OK
x-poll-after-seconds: 10.89
...

{"position":206,"wait":18.81,"variance":3.7}


$ curl -i http://localhost:8000/position/bdea74e6-5c19-40c0-9fc3-fef8cc0a7485
HTTP/1.1 200 OK
x-poll-after-seconds: 2.19
...

{"position":62,"wait":5.33,"variance":1.96}


$ curl -i http://localhost:8000/position/bdea74e6-5c19-40c0-9fc3-fef8cc0a7485
HTTP/1.1 200 OK
x-poll-after-seconds: 1.91
...

{"position":38,"wait":3.05,"variance":1.46}

As seen above, each subsequent call to /position/{token} shows the user’s progress through the queue:

  1. Position steadily counts down toward the front.
  2. Estimated waiting time (wait) decreases as their turn approaches.
  3. x-poll-after-seconds also drops significantly, reflecting the need for more frequent liveness checks as the user nears the front.

💡 Notes

  • The decreasing x-poll-after-seconds is part of the system’s liveness and abandonment detection mechanisms. We will delve into the details of this behavior in the next article.
  • Clients can use the variance field directly or they can compute a confidence interval. For example, a simple 68% confidence interval can be visualized as wait ± sqrt(variance), showing users a likely range for how long they may have to wait.
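
For instance, a client receiving the second /position response above could derive a rough range like this:

import math

# Values taken from the /position response shown earlier.
wait, variance = 28.82, 4.45
sigma = math.sqrt(variance)  # ~2.11 seconds
print(f"Expected wait: {wait - sigma:.1f}-{wait + sigma:.1f} s")  # ~26.7-30.9 s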

4.3. Checking the total queue size

The /qsize endpoint is a simple monitoring hook that lets us quickly check the total number of users in the queue.

$ curl -i http://localhost:8000/qsize
HTTP/1.1 200 OK
...

{"len":320}

Conclusion: a running Lobby and a look ahead

We have successfully moved from the architectural blueprint of Part 1 to a tangible, running service. We built a lean, high-performance ASGI application from scratch, carefully avoiding framework overhead. We then designed a robust state-management layer using Redis, leveraging its powerful data structures like Lists, Hashes, and Sorted Sets to manage the queue, track users, and handle token expiry.

Finally, we explored critical implementation details for ensuring atomicity with Lua scripts, calculating accurate wait times, and managing resources like connection pools to build a stable and reliable system. You now have a solid, testable foundation for a virtual lobby. The complete source code for this series will be made available in a public GitHub repository in the coming weeks.

Coming up in Part 3…

Our lobby is running, but nobody is being let in yet! To test our virtual queue in isolation, we will simulate the backend services rather than building real ones. This approach allows us to create a range of traffic conditions (like bursty arrivals or heavy-tailed service times) to really put our system to the test.

In the next article, we will implement these simulated backend workers. Each worker will be a Python coroutine that:

  1. Removes a user from the front of the queue using the Redis BLPOP command (blocking if the queue is empty).
  2. Simulates doing work by sleeping for a configurable, random amount of time.
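
As a rough preview (the full version, with admission control and expiry handling, arrives in Part 3), such a worker coroutine could look something like this; the key name follows Section 2, and the mean service time of one second is arbitrary:

import asyncio
import random

async def worker_preview(r, stop_event: asyncio.Event) -> None:
    # Sketch only: pop a token from the front of the queue, then pretend to serve it.
    while not stop_event.is_set():
        item = await r.blpop("lobby:queue", timeout=1)  # block, but wake up to check stop_event
        if item is None:
            continue                                     # nothing arrived within the timeout
        _queue_key, token = item
        await asyncio.sleep(random.expovariate(1.0))     # simulated service time (mean 1 s)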

We will program this logic, tackle dynamic admission control, and dive deeper into handling abandoned tickets and other real-world edge cases. Stay tuned!


