Images and pools¶
Most BCDock provisioning time is spent in two places: building VM images and creating pool VMs. Both are amortised: one image build serves many pools; one pool serves many environments. Knowing the model makes the timing (and the cost shape) less surprising.
2-stage image system¶
A VM image is a captured Azure managed image you can deploy to a fresh VM. We build them in two stages because the second stage depends on Microsoft's BC artifact catalog, which changes monthly, while the first stage doesn't.
Stage 1 - Generic image¶
Windows + Docker + BcContainerHelper, no BC artifacts cached.
- Naming: `bcdock-vm-base-{timestamp}`
- Build time: ~30-45 minutes
- Build trigger: when a region has no Generic image yet, or when its Generic image is more than about a quarter old
This image is the substrate for every Versioned image in the same region. It changes rarely.
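The build trigger reduces to a staleness rule. A minimal sketch in Python, assuming a ~90-day window and UTC timestamps (the real threshold and helper names are internal, not from this page):

```python
from datetime import datetime, timedelta, timezone

GENERIC_MAX_AGE = timedelta(days=90)  # "~quarterly" window; the exact value is an assumption

def generic_image_name(built_at: datetime) -> str:
    # Naming convention from above: bcdock-vm-base-{timestamp}
    return f"bcdock-vm-base-{built_at:%Y%m%d%H%M%S}"

def needs_generic_rebuild(newest_built_at: datetime | None) -> bool:
    """True when the region has no Generic image, or its newest one is stale."""
    if newest_built_at is None:
        return True  # region has no Generic image yet
    return datetime.now(timezone.utc) - newest_built_at > GENERIC_MAX_AGE
```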
Stage 2 - Versioned image¶
Generic image + BC artifacts cached for one specific BC version × country × artifact-type.
- Naming: `bcdock-vm-bc{ver}-{country}-{artifactType}-{timestamp}`
- Build time: ~60-90 minutes
- Build trigger: when a customer requests a BC version × country combination that has no cached image
The cache contains the BC artifact MSIs and a pre-pulled Docker image, so a container can start in seconds. Without the cache, the equivalent work (download + install + first-time startup) takes ~78 minutes per env, every time.
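For concreteness, the Versioned image name is a pure function of the combination plus a build timestamp. A sketch, where the `Combination` dataclass, the example values, and the timestamp format are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Combination:
    bc_version: str     # e.g. "27"
    country: str        # e.g. "au"
    artifact_type: str  # hypothetical example value: "sandbox"

def versioned_image_name(combo: Combination, built_at: datetime) -> str:
    # Convention from above: bcdock-vm-bc{ver}-{country}-{artifactType}-{timestamp}
    return (f"bcdock-vm-bc{combo.bc_version}-{combo.country}"
            f"-{combo.artifact_type}-{built_at:%Y%m%d%H%M%S}")
```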
Why two stages¶
If we baked everything into one stage, every BC version refresh (~monthly) would force a full Generic rebuild. Splitting them means one Generic image serves 12+ months of Versioned image refreshes.
Unified MT/ST images¶
Each Versioned image caches both multi-tenant and single-tenant flavours of the BC Docker image. Tenancy (`MultiTenant`) is a runtime attribute of the pool and environment, not the image - which flavour to start is selected at container-create time.
This was a design simplification (single image instead of one per tenancy flavour). Before it, we had separate MT and ST images per version × country, doubling cache size and image-build pressure.
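A sketch of the container-create decision, assuming the cached flavours are addressed by tag suffix (the `-mt`/`-st` suffixes are hypothetical; only the runtime `MultiTenant` flag comes from this page):

```python
def flavour_to_start(cached_tag: str, multi_tenant: bool) -> str:
    """Pick the MT or ST flavour of the pre-pulled BC Docker image at
    container-create time; the image itself ships both flavours."""
    return f"{cached_tag}-mt" if multi_tenant else f"{cached_tag}-st"

# e.g. flavour_to_start("bc27-au-sandbox", multi_tenant=True) -> "bc27-au-sandbox-mt"
```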
Pool model¶
A pool is an Azure VM hosting 2-9 BC containers. Pools are the load-bearing infrastructure unit:
- Always running. We never stop pools. A stopped pool means a stopped Traefik, which means broken URLs.
- Hosts a single combination. One pool = one BC version × country × artifact-type. Environments on the pool share the combination; you can't put a v27-au env on a v28-au pool.
- Has its own DNS. Pool VMs get a stable hostname under `*.bcdock.io`; environments under that pool get per-env subdomains.
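The "one pool = one combination" rule is easy to express as a compatibility check. A sketch with assumed field names and an assumed hostname layout (only `*.bcdock.io` itself is fixed by this page):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pool:
    name: str           # hypothetical, e.g. "pool-au-0042"
    bc_version: str
    country: str
    artifact_type: str

def can_host(pool: Pool, bc_version: str, country: str, artifact_type: str) -> bool:
    # An env may only land on a pool with the exact same combination.
    return (pool.bc_version, pool.country, pool.artifact_type) == (
        bc_version, country, artifact_type)

def env_hostname(pool: Pool, env_name: str) -> str:
    # Stable pool hostname under *.bcdock.io, per-env subdomain beneath it.
    # The exact label layout is an assumption.
    return f"{env_name}.{pool.name}.bcdock.io"
```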
Why 2-9 containers¶
The lower bound (2) avoids the "whole pool to host one customer's empty env" cost shape - pools are sized so a typical concurrent workload of 2-4 active envs doesn't contend.
The upper bound (9) is empirical - beyond 9 containers we've seen Traefik routing degrade and disk-IO contention show up under heavy AL compile load. The cap could be tuned upward with bigger pool VM sizes; we've stayed conservative pre-launch.
Pool lifecycle states¶
| State | Meaning |
|---|---|
| `Creating` | Pool VM provisioning, Docker installing, agents deploying |
| `Running` | Operational; can accept new env allocations |
| `Draining` | Existing envs continue to run; no new allocations. Used when phasing out a BC version. |
| `Failed` | Provisioning failed; kept around for forensics, not destroyed automatically |
| `Deleting` / `Deleted` | Tearing down |
Allocating a new env always uses a `Running` pool - the autoscaler implicitly excludes the other states.
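That rule reduces to a predicate over the state enum. A sketch, with enum values mirroring the table above:

```python
from enum import Enum

class PoolState(Enum):
    CREATING = "Creating"
    RUNNING = "Running"    # the only state that accepts new env allocations
    DRAINING = "Draining"  # existing envs keep running; no new allocations
    FAILED = "Failed"      # kept around for forensics
    DELETING = "Deleting"
    DELETED = "Deleted"

def accepts_allocations(state: PoolState) -> bool:
    return state is PoolState.RUNNING
```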
Autoscaler¶
A background service decides:
- When to create a pool - when env demand exceeds current pool capacity for a combination, or when a customer requests a region/version that has no pool yet
- When to delete a pool - when a `Running` pool has been at 0 utilisation for a sustained window and isn't pinned
- When to drain a pool - when its BC version is being phased out (see ADR-010)
Pools can be pinned to exempt them from autoscaler deletion - useful for staging environments and "I want this exact pool for the next training run" scenarios. Pinning is staff-handled via internal tooling; if you need a pool pinned for a workshop or event, reach out via support@bcdock.io.
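Put together, the three decisions look roughly like the sketch below. The idle window, field names, and demand signal are assumptions; the real drain policy is governed by ADR-010.

```python
from dataclasses import dataclass

IDLE_DELETE_HOURS = 24.0  # the "sustained window" - the real value is internal

@dataclass
class PoolStats:
    state: str                 # "Running", "Draining", ...
    env_count: int
    capacity: int              # 2-9 containers
    idle_hours: float          # time spent at 0 utilisation
    pinned: bool               # pinned pools are exempt from deletion
    version_phasing_out: bool  # see ADR-010

def pool_action(pool: PoolStats) -> str | None:
    """Per-pool decision: drain, delete, or leave alone."""
    if pool.state == "Running" and pool.version_phasing_out:
        return "drain"
    if (pool.state == "Running" and pool.env_count == 0
            and pool.idle_hours >= IDLE_DELETE_HOURS and not pool.pinned):
        return "delete"
    return None

def needs_new_pool(pools: list[PoolStats], queued_demand: int) -> bool:
    """Per-combination decision: create when demand exceeds Running capacity,
    or when the combination has no pool at all."""
    free = sum(p.capacity - p.env_count for p in pools if p.state == "Running")
    return not pools or queued_demand > free
```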
Allocation algorithm¶
When a customer creates an env:
1. Check for image - does the requested BC version × country × artifact-type combo have a `Ready` versioned image in the requested region? If not, kick off an image build (~78 min) and return a `queued` env.
2. Check for pool - is there a `Running` pool in the right region with capacity for one more env? If yes, allocate. If no, kick off a pool create (~20 min) and queue.
3. Allocate slot - write the env-record row, ask the pool agent to start the container, return.
In the warm-pool case, only step 3 runs - provisioning completes in 1-2 minutes. The other steps amortise across envs.
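A sketch of the three steps against in-memory stand-ins; every structure and field name here is illustrative, and the kicked-off builds are elided:

```python
def allocate_env(env_name: str, combo: str, region: str,
                 ready_images: set, pools: list) -> dict:
    # 1. Check for image: a Ready versioned image cached in this region?
    if (combo, region) not in ready_images:
        # kick off the ~78 min image build (not shown) and queue
        return {"name": env_name, "status": "queued", "waiting_on": "image-build"}

    # 2. Check for pool: a Running pool in-region with a free slot?
    pool = next((p for p in pools
                 if p["state"] == "Running" and p["region"] == region
                 and p["combo"] == combo and p["env_count"] < p["capacity"]), None)
    if pool is None:
        # kick off the ~20 min pool create (not shown) and queue
        return {"name": env_name, "status": "queued", "waiting_on": "pool-create"}

    # 3. Allocate slot: record the env and ask the pool agent to start the
    #    container. In the warm-pool case only this step runs (~1-2 min).
    pool["env_count"] += 1
    return {"name": env_name, "status": "provisioning", "pool": pool["name"]}
```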
Pre-warmed environments¶
For workloads where the 1-2 minute create is still too slow (training cohort, demo prep), the platform supports pre-warmed environments - created ahead of time on a pool slot, sitting in a pending-assignment state. When a customer requests a matching env, we assign the pre-warmed one to them, bypassing the create wait.
This is a silent optimisation; there's no pre-warmed status in the public API. The customer just sees a fast transition to `running`.
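A sketch of the assignment step, assuming an in-memory list of pre-warmed envs; the fields and the matching rule are illustrative:

```python
def assign_prewarmed(prewarmed: list, combo: str, region: str,
                     customer: str) -> dict | None:
    """Hand an already-running, unassigned env to a matching request."""
    for env in prewarmed:
        if (env["combo"] == combo and env["region"] == region
                and env["owner"] is None):
            env["owner"] = customer  # customer just sees a fast `running` state
            return env
    return None  # no match: fall back to the normal allocation path
```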
Why this shape matters¶
Three customer-visible consequences:
- First-time-for-a-combo is slow. ~78 minutes of image build, then ~20 minutes of pool create, then ~1-2 minutes of env create. Subsequent envs in that combo take ~1-2 minutes each.
- Cross-region resume is supported. The hibernation primitive plus the multi-pool model mean an env hibernated in `australiaeast` can resume on a `westus2` pool - the blob copy is the slow step, not the container start.
- Cost is amortised, not per-env. A pool VM costs $X/month regardless of how many envs are on it. The active rate charged to the customer is shaped to recover that across typical concurrency, plus margin.
Read more¶
- Hibernation - how the active/stored split interacts with pool capacity
- URL shape - how DNS and TLS work across pool families
- Reference: env states - the customer-facing lifecycle states