aegean.cache¶
cache ¶
An opt-in, off-by-default persistent cache for expensive analyses.
Some analyses are pure but slow: morphological clustering over the whole
vocabulary, dispersion/keyness across a large corpus, big queries. When you opt
in, their results are memoised to a local sqlite file keyed on a content
fingerprint of the inputs, so re-running the same analysis on the same corpus is
served from disk rather than recomputed, including in later runs. While the
cache is disabled, @memoize is a transparent passthrough with zero overhead and
identical behaviour, so the cache never changes a result, only how fast it
arrives.
No new dependency: sqlite3 and pickle are stdlib, and the cache lives under the
same user cache dir as the fetched data (set PYAEGEAN_CACHE to relocate it).
import aegean
aegean.cache.enable() # opt in (or set PYAEGEAN_ANALYSIS_CACHE=1)
aegean.analysis.dispersions(corpus) # computed once, then served from disk
aegean.cache.stats() # {'enabled': True, 'entries': 1, 'path': …}
aegean.cache.clear() # wipe it
Security note. Values are stored with pickle in your own cache
directory (same trust boundary as pip/mypy/pytest caches). Loading a pickle can
execute arbitrary code, so only enable the cache with a cache file you control.
A stale or corrupt entry is treated as a miss and recomputed. The cache key
embeds a format version and a per-function version, so after a version bump,
entries written by the old code are no longer looked up and are never
deserialised against a changed class.
DiskCache ¶
A sqlite-backed key→value store. Values are pickled; unpicklable values are silently not cached, and unreadable rows are treated as misses.
enable ¶
enable(path: str | Path | None = None) -> DiskCache
Turn the cache on, at path or, if path is None, at the default cache file. Idempotent: calling it again is safe.
memoize ¶
Decorator: persist a pure function's result when the cache is enabled.
A transparent passthrough while disabled. When enabled, the result is keyed
on the function identity, version, and a content fingerprint of the
arguments. Arguments that can't be fingerprinted (no cache_key() method and not
a JSON scalar/list/dict) cause the call to be computed directly, uncached, rather than raising an error.
Bump version when the function's logic changes.
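The decorator's behaviour can be sketched roughly as below. Everything here is a toy under stated assumptions: `toy_memoize`, `enable_toy_cache`, and `fingerprint` are hypothetical names, a dict stands in for the sqlite file, the decorator-factory call shape `toy_memoize(version=...)` is a guess, and canonical JSON plus SHA-256 is one possible way to fingerprint arguments.

```python
import functools
import hashlib
import json

_store = None  # hypothetical module state: None means the cache is disabled


def enable_toy_cache() -> None:
    """Switch the toy cache on, using an in-memory dict as the store."""
    global _store
    if _store is None:
        _store = {}


def fingerprint(value) -> str:
    """Use cache_key() if the value has one, else canonical JSON."""
    if hasattr(value, "cache_key"):
        return str(value.cache_key())
    return json.dumps(value, sort_keys=True)  # TypeError for non-JSON values


def toy_memoize(version: int = 1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if _store is None:
                return func(*args, **kwargs)  # disabled: plain passthrough
            try:
                parts = [fingerprint(a) for a in args]
                parts += [f"{k}={fingerprint(v)}" for k, v in sorted(kwargs.items())]
            except (TypeError, ValueError):
                return func(*args, **kwargs)  # cannot fingerprint: compute directly
            raw = "|".join([func.__module__, func.__qualname__, str(version), *parts])
            key = hashlib.sha256(raw.encode("utf-8")).hexdigest()
            if key not in _store:
                _store[key] = func(*args, **kwargs)
            return _store[key]
        return wrapper
    return decorator


calls = []  # records every real computation


@toy_memoize(version=1)
def total(values):
    calls.append(values)
    return sum(values)


total([1, 2, 3])     # cache disabled: computed
total([1, 2, 3])     # cache disabled: computed again
enable_toy_cache()
total([1, 2, 3])     # cache enabled: computed and stored
total([1, 2, 3])     # cache enabled: served from the store
total({1, 2, 3})     # a set is not JSON: computed directly, not stored
```

Including `version` in the key is what makes a bump invalidate earlier results: the old entries stay in the store but no longer match any lookup.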