
Images and pools

Most BCDock provisioning time is spent in two places: building VM images and creating pool VMs. Both are amortised: one image build serves many pools; one pool serves many environments. Knowing the model makes the timing (and the cost shape) less surprising.

2-stage image system

A VM image is a captured Azure managed image you can deploy to a fresh VM. We build them in two stages because the second stage depends on Microsoft's BC artifact catalog, which changes monthly, while the first stage doesn't.

Stage 1 - Generic image

Windows + Docker + BcContainerHelper, no BC artifacts cached.

  • Naming: bcdock-vm-base-{timestamp}
  • Build time: ~30-45 minutes
  • Build trigger: when a region has no Generic image yet, or when the existing Generic image is older than roughly a quarter

This image is the substrate for every Versioned image in the same region. It changes rarely.

Stage 2 - Versioned image

Generic image + BC artifacts cached for one specific BC version × country × artifact-type.

  • Naming: bcdock-vm-bc{ver}-{country}-{artifactType}-{timestamp}
  • Build time: ~60-90 minutes
  • Build trigger: when a customer requests a BC version × country combination that has no cached image

The cache contains the BC artifact MSIs and a pre-pulled Docker image, so a container can start in seconds. Without the cache, the equivalent operation takes ~78 minutes (download + install + first-time startup) per env, every time.
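The two naming patterns compose mechanically. A minimal sketch, assuming the timestamp format (the field names come from the bullets above; the example values are illustrative):

```python
from datetime import datetime, timezone

def _timestamp() -> str:
    # Assumed encoding; the real timestamp format isn't documented here.
    return datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")

def generic_image_name() -> str:
    # Stage 1: bcdock-vm-base-{timestamp}
    return f"bcdock-vm-base-{_timestamp()}"

def versioned_image_name(ver: str, country: str, artifact_type: str) -> str:
    # Stage 2: bcdock-vm-bc{ver}-{country}-{artifactType}-{timestamp}
    return f"bcdock-vm-bc{ver}-{country}-{artifact_type}-{_timestamp()}"

print(versioned_image_name("27", "au", "sandbox"))
# e.g. bcdock-vm-bc27-au-sandbox-20260101120000
```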

Why two stages

If we baked everything into one stage, every BC version refresh (~monthly) would force a full Generic rebuild. Splitting them means one Generic image serves 12+ months of Versioned image refreshes.

Unified MT/ST images

Each Versioned image caches both multi-tenant and single-tenant flavours of the BC Docker image. Tenancy (MultiTenant) is a runtime attribute of the pool and environment, not the image - which flavour to start is selected at container-create time.

This was a design simplification (single image instead of one per tenancy flavour). Before it, we had separate MT and ST images per version × country, doubling cache size and image-build pressure.
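As a rough sketch of what "selected at container-create time" means in practice - the option and flavour names here are assumptions, not the real pool-agent API:

```python
from dataclasses import dataclass

@dataclass
class ContainerRequest:
    bc_version: str
    country: str
    artifact_type: str
    multi_tenant: bool  # runtime attribute of the pool/env, not baked into the image

def pick_cached_flavour(req: ContainerRequest) -> str:
    # Both flavours are already cached in the Versioned image; the request's
    # MultiTenant flag decides which one actually gets started.
    return "bc-multitenant" if req.multi_tenant else "bc-singletenant"

print(pick_cached_flavour(ContainerRequest("27", "au", "sandbox", multi_tenant=True)))
# -> bc-multitenant
```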

Pool model

A pool is an Azure VM hosting 2-9 BC containers. Pools are the load-bearing infrastructure unit:

  • Always running. We never stop pools. A stopped pool means a stopped Traefik, which means broken URLs.
  • Hosts a single combination. One pool = one BC version × country × artifact-type. Environments on the pool share the combination; you can't put a v27-au env on a v28-au pool.
  • Has its own DNS. Pool VMs get a stable hostname under *.bcdock.io; environments under that pool get per-env subdomains.
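
For illustration, the DNS shape could be sketched like this; only the *.bcdock.io suffix and the pool/env split come from the list above, the exact hostname patterns are assumptions:

```python
def pool_hostname(pool_name: str) -> str:
    # Each pool VM gets a stable hostname under *.bcdock.io.
    return f"{pool_name}.bcdock.io"

def env_hostname(env_name: str, pool_name: str) -> str:
    # Environments get per-env subdomains beneath their pool's hostname.
    return f"{env_name}.{pool_hostname(pool_name)}"

print(env_hostname("demo-27-au", "pool-au-27"))  # demo-27-au.pool-au-27.bcdock.io
```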

Why 2-9 containers

The lower bound (2) avoids the "whole pool to host one customer's empty env" cost shape - pools are sized so a typical concurrent workload of 2-4 active envs doesn't contend.

The upper bound (9) is empirical - beyond 9 we've seen Traefik routing degrade and disk-IO contention show up under heavy AL compile load. The limit could be tuned upward with bigger pool VM sizes; we've stayed conservative pre-launch.

Pool lifecycle states

  • Creating - Pool VM provisioning, Docker installing, agents deploying
  • Running - Operational; can accept new env allocations
  • Draining - Existing envs continue to run; no new allocations. Used when phasing out a BC version.
  • Failed - Provisioning failed; kept around for forensics, not destroyed automatically
  • Deleting / Deleted - Tearing down

Allocating a new env always uses a Running pool - the autoscaler implicitly excludes other states.
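
A minimal sketch of the state set and the allocation filter, using the state names from the list above (everything else - field names, pool shape - is assumed):

```python
from enum import Enum

class PoolState(Enum):
    CREATING = "Creating"
    RUNNING = "Running"
    DRAINING = "Draining"
    FAILED = "Failed"
    DELETING = "Deleting"
    DELETED = "Deleted"

def allocatable(pools: list[dict]) -> list[dict]:
    # New env allocations only ever target Running pools;
    # every other state is implicitly excluded.
    return [p for p in pools if p["state"] is PoolState.RUNNING]

pools = [{"name": "pool-1", "state": PoolState.RUNNING},
         {"name": "pool-2", "state": PoolState.DRAINING}]
print([p["name"] for p in allocatable(pools)])  # ['pool-1']
```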

Autoscaler

A background service decides:

  • When to create a pool - when env demand exceeds current pool capacity for a combination, or when the customer requests a region/version that has no pool yet
  • When to delete a pool - when a Running pool has been at 0 utilisation for a sustained window and isn't pinned
  • When to drain a pool - when its BC version is being phased out (see ADR-010)

Pools can be pinned to exempt them from autoscaler deletion - useful for staging environments and "I want this exact pool for the next training run" scenarios. Pinning is staff-handled via internal tooling; if you need a pool pinned for a workshop or event, reach out via support@bcdock.io.
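
The per-pool part of that logic (delete vs drain vs keep) can be sketched as below; the field names and idle threshold are assumptions, and the pool-creation decision is demand-driven rather than per-pool, so it isn't shown:

```python
from dataclasses import dataclass

@dataclass
class Pool:
    state: str              # "Running", "Draining", ...
    utilisation: float      # 0.0 means no envs allocated
    idle_hours: float       # time spent at 0 utilisation
    pinned: bool            # staff-set; exempts the pool from deletion
    version_phased_out: bool

IDLE_WINDOW_HOURS = 24.0    # assumed threshold, not a documented value

def autoscaler_decision(pool: Pool) -> str:
    if pool.version_phased_out and pool.state == "Running":
        return "drain"      # phase-out: existing envs keep running, no new allocations
    if (pool.state == "Running" and pool.utilisation == 0.0
            and pool.idle_hours >= IDLE_WINDOW_HOURS and not pool.pinned):
        return "delete"     # sustained zero utilisation and not pinned
    return "keep"

print(autoscaler_decision(Pool("Running", 0.0, 48.0, pinned=True, version_phased_out=False)))
# -> keep (pinned pools are never auto-deleted)
```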

Allocation algorithm

When a customer creates an env:

  1. Check for image - does the requested BC version × country × artifact-type combo have a Ready Versioned image in the requested region? If not, kick off an image build (~78 min) and return a queued env.
  2. Check for pool - is there a Running pool in the right region with capacity for one more env? If yes, allocate. If no, kick off pool create (~20 min) and queue.
  3. Allocate slot - write the env-record row, ask the pool agent to start the container, return.

In the warm-pool case, only step 3 runs - provisioning completes in 1-2 minutes. The other steps amortise across envs.
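
Condensed into a sketch, the three steps look roughly like this; find_ready_image, find_running_pool_with_capacity and the other lookups are hypothetical stand-ins for the real services:

```python
def create_env(region, ver, country, artifact_type, store, agent):
    # Step 1: is there a Ready Versioned image for this combo in this region?
    image = store.find_ready_image(region, ver, country, artifact_type)
    if image is None:
        store.start_image_build(region, ver, country, artifact_type)  # ~78 min
        return store.queue_env(region, ver, country, artifact_type, reason="image-build")

    # Step 2: is there a Running pool in the region with a free slot?
    pool = store.find_running_pool_with_capacity(region, ver, country, artifact_type)
    if pool is None:
        store.start_pool_create(region, image)                        # ~20 min
        return store.queue_env(region, ver, country, artifact_type, reason="pool-create")

    # Step 3: warm-pool path - write the env record, start the container (~1-2 min)
    env = store.write_env_record(pool, ver, country, artifact_type)
    agent.start_container(pool, env)
    return env
```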

Pre-warmed environments

For workloads where the 1-2 minute create is still too slow (training cohort, demo prep), the platform supports pre-warmed environments - created ahead of time on a pool slot, sitting in a pending-assignment state. When a customer requests a matching env, we assign the pre-warmed one to them, bypassing the create wait.

This is a silent optimisation; there's no pre-warmed status in the public API. The customer sees a fast running state and that's it.
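
Conceptually the assignment could look like this sketch; the pending-assignment status and the field names are internal assumptions, since none of this surfaces in the public API:

```python
def assign_prewarmed(request: dict, prewarmed: list[dict]):
    # Prefer a pre-warmed env that matches the requested combination and region.
    for env in prewarmed:
        if (env["status"] == "pending-assignment"
                and env["combo"] == request["combo"]
                and env["region"] == request["region"]):
            env["status"] = "running"          # the customer only ever sees "running"
            env["owner"] = request["customer_id"]
            return env
    return None  # fall back to the normal allocation algorithm above

req = {"combo": ("27", "au", "sandbox"), "region": "australiaeast", "customer_id": "c-1"}
warm = [{"status": "pending-assignment", "combo": ("27", "au", "sandbox"),
         "region": "australiaeast", "owner": None}]
print(assign_prewarmed(req, warm)["status"])   # running
```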

Why this shape matters

Three customer-visible consequences:

  1. First-time-for-a-combo is slow. ~78 minutes of image build, then ~20 minutes of pool create, then ~1-2 minutes of env create. Subsequent envs in that combo take ~1-2 minutes each.
  2. Cross-region resume is supported. The hibernation primitive plus the multi-pool model mean an env hibernated in australiaeast can resume on a westus2 pool - the blob copy is the slow step, not the container start.
  3. Cost is amortised, not per-env. A pool VM costs $X/month regardless of how many envs are on it. Active rate to the customer is shaped to recover that across typical concurrency, plus margin.
