Guidance for measuring and improving HoloVec workloads without hard-coding benchmark claims that may not match your dimension, backend, or retrieval task.

Do Not Optimize Against a Single Toy Metric

For VSA systems, "performance" is a combination of:

  • latency and throughput
  • memory footprint
  • retrieval quality
  • cleanup or factorization quality
  • robustness under bundling load, noise, or distractors

That means bind() timing alone is not enough. A slower model can still be the better system if it reduces cleanup error, supports the right algebra, or holds more structure at the same dimension.
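As a sketch of that tradeoff, the toy below measures cleanup quality rather than raw speed, using MAP-style elementwise binding on random bipolar vectors. This is illustrative NumPy only; `bind`, `rand_vec`, and `cosine` are stand-in names, not the HoloVec API:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 2048

def rand_vec():
    # random bipolar (+1/-1) hypervector
    return rng.choice([-1.0, 1.0], size=dim)

def bind(a, b):
    # MAP-style binding: elementwise multiply (self-inverse)
    return a * b

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

roles = [rand_vec() for _ in range(5)]
fillers = [rand_vec() for _ in range(5)]

# bundle five role-filler pairs by summation
memory = np.sum([bind(r, f) for r, f in zip(roles, fillers)], axis=0)

# unbinding role 0 should point back to filler 0, not a distractor
recovered = bind(memory, roles[0])
hit = cosine(recovered, fillers[0])
miss = cosine(recovered, rand_vec())
print(f"member similarity {hit:.2f}, distractor similarity {miss:.2f}")
```

A model that binds more slowly but keeps this member-versus-distractor gap wider at the same dimension can still be the better system.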

Benchmark the Workload You Actually Care About

Measure separate workloads instead of collapsing everything into one score:

  • primitive ops: random, bind, unbind, bundle, permute, similarity
  • retrieval: nearest neighbor, threshold retrieval, codebook sweeps
  • structure recovery: role-filler unbinding, factorization, sequence cleanup
  • task quality: classification accuracy, reconstruction accuracy, factor recovery, noise tolerance

This matters because model families behave differently:

  • exact-inverse models should be judged on exact recovery and compositional fidelity
  • approximate-inverse models should be judged on cleanup quality and graceful degradation
  • sparse models need overlap or segment-aware retrieval tests
  • non-commutative models need order-sensitive workloads, not just symmetric binding tests
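On the order-sensitivity point, a quick self-check shows why symmetric binding tests say nothing about non-commutative models. Here `bind_perm` is a toy permutation-based binding used as a stand-in for order-sensitive algebras, not a HoloVec model:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 1024
a = rng.choice([-1.0, 1.0], size=dim)
b = rng.choice([-1.0, 1.0], size=dim)

def bind_map(x, y):
    # elementwise multiply: commutative, so argument order is invisible
    return x * y

def bind_perm(x, y):
    # permute one operand before multiplying: order now matters
    return np.roll(x, 1) * y

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

sym_map = cosine(bind_map(a, b), bind_map(b, a))    # ~1.0: order invisible
sym_perm = cosine(bind_perm(a, b), bind_perm(b, a)) # ~0.0: order detected
print(f"commutative: {sym_map:.2f}, order-sensitive: {sym_perm:.2f}")
```

A benchmark that only ever binds symmetric pairs would score both the same and miss exactly what the order-sensitive model is for.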

Backend Selection

Situation                        Recommended Backend   Why
development and release gating   NumPy                 simplest environment, lowest setup cost
GPU-backed batched workloads     PyTorch               accelerator support and familiar tensor ecosystem
repeated compiled workloads      JAX                   JIT can help when the same computation shape repeats

NumPy remains the release-blocking backend. Treat PyTorch and JAX as environment-dependent support paths unless you are explicitly testing them in your deployment environment.
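One way to encode that policy is an environment probe that defaults to NumPy and opts into accelerator backends only when they are importable. `pick_backend` is a hypothetical application-side helper, not part of HoloVec:

```python
import importlib.util

def pick_backend(preferred=("torch", "jax")):
    """Return the first importable accelerator backend, else "numpy".

    Illustrative only: HoloVec's actual backend-selection API may differ.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    # NumPy is the release-blocking default, so it is always the fallback
    return "numpy"

print(pick_backend())
```

This keeps deployment environments without PyTorch or JAX on the fully supported path without any configuration.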

Dimension and Model Tradeoffs

Use dimension as a budget, not a vanity metric.

  • lower dimensions are useful for prototyping and smoke tests
  • mid-range dense dimensions are usually the practical starting point for production
  • large dimensions help only if the retrieval task or bundling load justifies them
  • sparse models trade arithmetic simplicity for memory efficiency and task-specific retrieval

The right dimension depends on:

  • number of bundled items
  • amount of structured composition
  • cleanup strategy
  • distractor count
  • desired false-positive and false-negative rates
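One rough way to turn these factors into a dimension budget is to measure the member-versus-best-distractor margin at a candidate dimension and bundling load. This is a pure-NumPy sketch with illustrative names, not a HoloVec utility:

```python
import numpy as np

def margin(dim, bundled, distractors=1000, seed=0):
    # bundle `bundled` random bipolar vectors, then compare a member's
    # normalized score against the best-scoring random distractor
    rng = np.random.default_rng(seed + dim)
    items = rng.choice([-1.0, 1.0], size=(bundled, dim))
    memory = items.sum(axis=0)
    member = float(items[0] @ memory) / dim
    noise = rng.choice([-1.0, 1.0], size=(distractors, dim))
    worst = float((noise @ memory).max()) / dim
    return member - worst

for dim in (256, 1024, 4096):
    print(dim, round(margin(dim, bundled=20), 3))
```

If the margin at your target bundling and distractor counts is near zero, the dimension is too small for the load; raising it past the point where the margin saturates buys little.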

Avoid Invalid Cross-Model Comparisons

Do not treat all models as if they should win the same benchmark:

  • GHRR and VTB should be tested on order-sensitive or nested structure tasks
  • BSDC and BSDC-SEG should be tested with sparse retrieval or segment-pattern workloads
  • HRR should not be judged by exact-inverse expectations
  • MAP, BSC, and BSDC-SEG are especially relevant for self-inverse cleanup or hardware-flavored use cases

If a benchmark ignores the model's intended algebra, the result is mostly noise.

Batch and Vectorized Paths

Prefer codebook-level or backend-native operations whenever they exist.

from holovec.retrieval import Codebook, ItemStore

store = ItemStore(model).fit(Codebook(items, backend=model.backend))
top_hits = store.query(query, k=10)

This is usually the right baseline before building a custom batched pipeline.
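Under the hood, a codebook-level query is essentially one matrix-vector product plus a top-k selection. The NumPy sketch below shows the shape of that baseline; it is not the ItemStore internals:

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n_items = 1024, 5000

codebook = rng.choice([-1.0, 1.0], size=(n_items, dim))
query = codebook[42] + 0.5 * rng.standard_normal(dim)  # noisy copy of item 42

# one matrix-vector product scores every item at once
scores = codebook @ query
top_k = np.argpartition(scores, -10)[-10:]
top_k = top_k[np.argsort(scores[top_k])[::-1]]
print(top_k[0])  # expected: 42, the noisy source item
```

A Python loop over items would do the same work with far more interpreter overhead, which is why backend-native batching is the right default.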

Measure Locally

Use a tiny harness that warms up the path you care about:

import time


def benchmark(fn, iterations=100):
    # warm up caches, allocators, and any JIT before timing
    for _ in range(10):
        fn()

    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    # mean seconds per call over the timed iterations
    return (time.perf_counter() - start) / iterations


seconds = benchmark(lambda: model.bind(a, b))
print(f"bind(): {seconds * 1_000:.3f} ms")

For JAX, account for compile warmup separately from steady-state timing.

Memory Guidance

  • dense complex or real models scale roughly with dimension times element size
  • sparse models can reduce memory significantly, but only when the task actually benefits from sparsity
  • matrix-valued models can be the right choice for structure even when they cost more per binding
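A back-of-envelope footprint check makes the dense-versus-sparse tradeoff concrete. The storage layouts assumed here (dense arrays sized by dtype, sparse vectors stored as active indices) are illustrative, not HoloVec's actual representations:

```python
import numpy as np

def dense_bytes(dim, dtype):
    # dense hypervector: dimension times element size
    return dim * np.dtype(dtype).itemsize

def sparse_bytes(nnz, index_dtype="int32"):
    # binary sparse vector stored as its active indices only
    return nnz * np.dtype(index_dtype).itemsize

print(dense_bytes(10_000, "float64"))    # 80000 bytes
print(dense_bytes(10_000, "complex128")) # 160000 bytes
print(sparse_bytes(nnz=100))             # 400 bytes
```

The sparse saving only materializes when the task tolerates sparse retrieval; a sparse vector densified for every operation pays both costs.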

VSA.create() validates only the documented factory kwargs. If you want custom precision or other backend-specific array behavior, handle that in application-specific backend code rather than assuming a generic factory passthrough.

Current Documentation Policy

This page intentionally avoids fixed timing tables. Release-facing benchmark numbers should come from the benchmark suite and methodology docs, not from hand-maintained prose.

Use the Benchmark Methodology doc and python -m benchmarks.run ... for reproducible measurements rather than copying an old table into design docs or release notes.

Canonical Examples

See Also