Cleanup strategies project noisy or composite hypervectors back to known items in a codebook.
## Why Cleanup?
After operations like unbinding (especially with approximate models) or bundling, vectors become noisy:
```python
from holovec import VSA

# HRR unbinding is approximate
model = VSA.create('HRR', dim=10000)
a, b = model.random(seed=1), model.random(seed=2)
c = model.bind(a, b)
a_recovered = model.unbind(c, b)
print(model.similarity(a, a_recovered))  # ~0.70 (not 1.0!)
```
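Where that noise comes from can be reproduced in plain NumPy, assuming Plate's standard HRR definitions (circular convolution to bind, correlation via the involution to unbind). The `bind`, `unbind`, and `cosine` helpers below are local to this sketch, not `holovec` API:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10000

# HRR vectors: i.i.d. Gaussian components with variance 1/d
a = rng.normal(0, 1 / np.sqrt(d), d)
b = rng.normal(0, 1 / np.sqrt(d), d)

def bind(x, y):
    # Circular convolution, computed via FFT
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=d)

def unbind(c, y):
    # Circular correlation: convolve with the involution (index-reversed y)
    y_inv = np.concatenate(([y[0]], y[:0:-1]))
    return bind(c, y_inv)

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

c = bind(a, b)
a_rec = unbind(c, b)
print(round(cosine(a, a_rec), 2))  # ~0.71 — the recovered vector is a, plus noise
```

The recovered vector points in roughly the right direction but carries substantial crosstalk noise, which is exactly what cleanup removes.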
Cleanup finds the closest known item:
```python
# With cleanup: cleanup() returns (vector, label, similarity)
cleaned, label, sim = cleanup.cleanup(a_recovered, codebook)
print(model.similarity(a, cleaned))  # ~1.0 (if a is in the codebook)
```
## Available Strategies
| Strategy | Method | Speed | Best For |
|---|---|---|---|
| BruteForce | Check all items | O(n×d) | Small codebooks (<1000) |
| Resonator | Iterative refinement | O(k×d) | Large codebooks, composites |
## BruteForceCleanup

Exhaustively compares the query against every item in the codebook and returns the best match.
### Usage

```python
from holovec import VSA
from holovec.retrieval import Codebook
from holovec.utils.cleanup import BruteForceCleanup

model = VSA.create('HRR', dim=10000)

# Create codebook
items = {f"item_{i}": model.random(seed=i) for i in range(100)}
codebook = Codebook(items, backend=model.backend)

# Create cleanup
cleanup = BruteForceCleanup(model)

# Clean up noisy vector
noisy = items["item_42"] + model.random() * 0.3
cleaned, label, similarity = cleanup.cleanup(noisy, codebook)
print(f"Cleaned to: {label} (sim={similarity:.3f})")  # item_42
```
### Parameters
| Parameter | Default | Description |
|---|---|---|
| threshold | None | Minimum similarity to accept |
### When to Use
- Small codebooks (< 1000 items)
- Need guaranteed best match
- One-time queries (not batched)
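The strategy itself is just a similarity scan plus an argmax. A minimal NumPy sketch of the idea (a standalone illustration with hypothetical names, not the `BruteForceCleanup` implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 100

# Codebook as a matrix of unit-norm row vectors, one per label
labels = [f"item_{i}" for i in range(n)]
book = rng.normal(size=(n, d))
book /= np.linalg.norm(book, axis=1, keepdims=True)

def brute_force_cleanup(query, book, labels, threshold=None):
    q = query / np.linalg.norm(query)
    sims = book @ q                    # O(n*d): one dot product per item
    best = int(np.argmax(sims))
    if threshold is not None and sims[best] < threshold:
        return None, None, float(sims[best])   # no item is close enough
    return book[best], labels[best], float(sims[best])

noisy = book[42] + 0.3 * rng.normal(size=d) / np.sqrt(d)
cleaned, label, sim = brute_force_cleanup(noisy, book, labels)
print(label)  # item_42
```

Because every item is checked, the result is always the true nearest neighbour — the cost is simply linear in codebook size.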
## ResonatorCleanup
Iterative algorithm based on resonator networks (Kymn et al. 2024).
### How It Works

1. Start with initial estimate x⁰
2. Iterate: x^(t+1) = normalize(similarity_matrix × x^t)
3. Converge to attractor (known item)
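That loop can be sketched for a single codebook in NumPy: project the current estimate onto the codebook, weight the items by similarity, and re-superpose. The softmax temperature `beta` and the exact update rule are assumptions for illustration, not the `ResonatorCleanup` internals:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 1000, 50
book = rng.normal(size=(n, d))
book /= np.linalg.norm(book, axis=1, keepdims=True)

def resonator_cleanup(x, book, iterations=10, soft=True, beta=8.0):
    x = x / np.linalg.norm(x)
    for _ in range(iterations):
        sims = book @ x                       # similarity to every item
        if soft:
            w = np.exp(beta * sims)           # soft: similarity-weighted mix
            w /= w.sum()
        else:
            w = np.zeros(n)                   # hard: winner-take-all
            w[np.argmax(sims)] = 1.0
        x_new = book.T @ w                    # re-superpose codebook items
        x_new /= np.linalg.norm(x_new)
        if x @ x_new > 0.999:                 # converged to an attractor
            return x_new
        x = x_new
    return x

noisy = book[7] + 0.5 * rng.normal(size=d) / np.sqrt(d)
cleaned = resonator_cleanup(noisy, book)
print(int(np.argmax(book @ cleaned)))  # 7
```

Each pass sharpens the weight on the true item, so the estimate is pulled onto the nearest codebook attractor within a few iterations.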
### Usage

```python
from holovec.utils.cleanup import ResonatorCleanup

# Create resonator
cleanup = ResonatorCleanup(
    model,
    iterations=10,
    soft=True
)

# Clean up
cleaned, label, similarity = cleanup.cleanup(noisy, codebook)
```
### Parameters

| Parameter | Default | Description |
|---|---|---|
| iterations | 10 | Maximum number of iterations |
| soft | True | Soft (weighted) vs hard (winner-take-all) update |
| convergence_threshold | 0.999 | Similarity between successive estimates at which to stop early |
### Soft vs Hard Mode

Soft resonator (default):
- Smooth updates via weighted average
- Better for noisy input
- More stable convergence

Hard resonator:
- Winner-take-all updates
- Faster convergence
- Can oscillate with noise
```python
# Hard resonator
hard_cleanup = ResonatorCleanup(model, iterations=10, soft=False)
```
### Factorization

Resonators can decompose bound composites:

```python
# c = bind(a, b) where a, b are from codebook
codebook_a = Codebook(...)  # Possible values of a
codebook_b = Codebook(...)  # Possible values of b

# Factorize
a_cleaned, b_cleaned = cleanup.factorize(c, [codebook_a, codebook_b])
```
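To make the factorization idea concrete, here is a classical two-factor resonator over bipolar vectors with elementwise binding, as in the resonator-network literature. This is a deliberate simplification — `holovec`'s `factorize` works with the model's own bind/unbind, and all names below are local to the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 2000, 20

# Two codebooks of random bipolar (+1/-1) vectors
A = rng.choice([-1.0, 1.0], size=(n, d))
B = rng.choice([-1.0, 1.0], size=(n, d))

# Composite: elementwise binding of one item from each codebook
c = A[3] * B[11]

def resonator_factorize(c, A, B, iterations=20):
    # Start each estimate from the superposition of all candidates
    a_hat = np.where(A.sum(axis=0) >= 0, 1.0, -1.0)
    b_hat = np.where(B.sum(axis=0) >= 0, 1.0, -1.0)
    for _ in range(iterations):
        # Unbind the other factor's estimate, then clean up against the codebook
        a_hat = np.sign(A.T @ (A @ (c * b_hat)))
        b_hat = np.sign(B.T @ (B @ (c * a_hat)))
    return int(np.argmax(A @ a_hat)), int(np.argmax(B @ b_hat))

i, j = resonator_factorize(c, A, B)
print(i, j)  # 3 11
```

The search space here is n² = 400 combinations, but the resonator never enumerates them — each factor's estimate is refined against its own codebook while the other estimate cancels its factor out of `c`.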
### When to Use
- Large codebooks (1000+ items)
- Composite vector factorization
- Repeated queries (can precompute)
## Performance Comparison
| Codebook Size | BruteForce | Resonator (10 iter) |
|---|---|---|
| 100 | 0.1 ms | 0.3 ms |
| 1,000 | 1.0 ms | 0.5 ms |
| 10,000 | 10 ms | 0.8 ms |
| 100,000 | 100 ms | 1.5 ms |
The resonator becomes faster for large codebooks: it converges in a handful of iterations, so its per-query cost grows much more slowly with codebook size than the exhaustive scan.
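Absolute numbers depend entirely on hardware and backend, so treat the table above as indicative. The brute-force side of the comparison — linear growth of the O(n×d) scan — is easy to check directly (a rough sketch, not the library's benchmark):

```python
import time
import numpy as np

rng = np.random.default_rng(4)
d = 10000
q = rng.normal(size=d)

times = {}
for n in (100, 1000, 10000):
    book = rng.normal(size=(n, d))
    t0 = time.perf_counter()
    for _ in range(10):
        _ = int(np.argmax(book @ q))          # the O(n*d) brute-force scan
    times[n] = (time.perf_counter() - t0) / 10 * 1000   # mean ms per query
    print(f"n={n:>6}: {times[n]:.2f} ms")
```

On any machine the 10,000-item scan should be markedly slower than the 100-item one, which is the regime where the resonator's near-constant iteration count pays off.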
## Cleanup with Different Models
| Model | Cleanup Needed? | Recommended Strategy |
|---|---|---|
| FHRR | Rarely (exact inverse) | BruteForce if needed |
| GHRR | Rarely (exact inverse) | BruteForce if needed |
| MAP | After bundling | BruteForce |
| HRR | Always | Resonator |
| VTB | Always | Resonator |
| BSC | After bundling | BruteForce |
| BSDC | After operations | BruteForce |
## Example: Complete Cleanup Pipeline

```python
from holovec import VSA
from holovec.retrieval import Codebook
from holovec.utils.cleanup import ResonatorCleanup

# Setup
model = VSA.create('HRR', dim=10000)

# Create word embeddings (note: hash() is only stable within a single process)
words = ["apple", "banana", "cherry", "date", "elderberry"]
word_vectors = Codebook(
    {w: model.random(seed=hash(w)) for w in words},
    backend=model.backend
)

# Create cleanup
cleanup = ResonatorCleanup(model, iterations=20)

# Retrieve with cleanup
def semantic_query(query_word, context_word):
    """Recover the word bound to query_word via unbind + cleanup."""
    q = word_vectors[query_word]
    ctx = word_vectors[context_word]
    # Bind to create association
    associated = model.bind(q, ctx)
    # Unbind with the query to get a noisy copy of the context
    result = model.unbind(associated, q)
    # Clean up to find the closest known word
    cleaned, label, sim = cleanup.cleanup(result, word_vectors)
    return label, sim

# Test
result, sim = semantic_query("apple", "cherry")
print(f"Result: {result} (sim={sim:.3f})")
```
## References
- Kymn, C., et al. (2024). Resonator Networks for Learning Compositional Representations
## See Also
- Retrieval-Overview — Retrieval components
- Model-HRR — Approximate inverse model
- Model-VTB — Another approximate model