Skip to content

aegean.cache

cache

An opt-in, off-by-default persistent cache for expensive analyses.

Some analyses are pure but slow — morphological clustering over the whole vocabulary, dispersion/keyness across a large corpus, big queries. When you opt in, their results are memoised to a local sqlite file keyed on a content fingerprint of the inputs, so re-running the same analysis on the same corpus is instant across runs. Disabled, @memoize is a transparent passthrough — zero overhead and identical behaviour, so the cache never changes a result, only how fast it arrives.

No new dependency: sqlite3 and pickle are stdlib, and the cache lives under the same user cache dir as the fetched data (PYAEGEAN_CACHE to relocate).

import aegean
aegean.cache.enable()                 # opt in (or set PYAEGEAN_ANALYSIS_CACHE=1)
aegean.analysis.dispersions(corpus)   # computed once, then served from disk
aegean.cache.stats()                  # {'enabled': True, 'entries': 1, 'path': …}
aegean.cache.clear()                  # wipe it

Security note. Values are stored with pickle in your own cache directory (same trust boundary as pip/mypy/pytest caches); a stale or corrupt entry is treated as a miss and recomputed, and the cache key embeds a format + per-function version so a code change never deserialises against a changed class. Only enable it for caches you control.

DiskCache

DiskCache(path: str | Path)

A sqlite-backed key→value store. Values are pickled; unpicklable values are silently not cached, and unreadable rows are treated as misses.

enable

enable(path: str | Path | None = None) -> DiskCache

Turn the cache on (idempotent), at path or the default cache file.

disable

disable() -> None

Turn the cache off; subsequent memoised calls compute directly.

is_enabled

is_enabled() -> bool

Whether the analysis cache is currently active.

clear

clear() -> None

Remove every cached entry (no-op if the cache is disabled).

stats

stats() -> dict[str, Any]

{'enabled', 'path', 'entries'} for the active cache.

memoize

memoize(*, version: str = '1') -> Callable[[F], F]

Decorator: persist a pure function's result when the cache is enabled.

A transparent passthrough while disabled. When enabled, the result is keyed on the function identity, version, and a content fingerprint of the arguments; arguments that can't be fingerprinted (no cache_key() and not a JSON scalar/list/dict) make the call compute directly rather than error. Bump version when the function's logic changes.