aegean.ai¶

ai ¶

Multi-provider AI layer — grounded, exploratory-labeled.

Providers: Anthropic (default), OpenAI, xAI Grok, Google Gemini — each an optional extra, lazily imported. Capabilities: translate, gloss, decipher_hypotheses, nlp_assist, ask, summarize. Every generative output is an ExploratoryResult with provenance and an unverified flag.

from aegean import ai
client = ai.get_client("anthropic")          # needs pyaegean[anthropic] + a key
result = ai.translate("μῆνιν ἄειδε θεά", client=client)
print(result.labeled())                       # carries the EXPLORATORY tag

ResponseCache ¶

ResponseCache(path: str | Path | None = None)

Get/set completions by content hash, optionally persisted to JSON.

AIError ¶

Bases: RuntimeError

Base class for AI-layer errors.

ExploratoryResult `dataclass` ¶

ExploratoryResult(text: str, kind: str, provider: str, model: str, prompt_version: str, grounding: tuple[GroundingItem, ...] = (), exploratory: bool = True, data: Any = None)

A generative result, explicitly labeled exploratory and provenanced.

grounding is the structured corpus/lexicon/analysis evidence fed to the model (each a GroundingItem with a source and a ref). Use labeled when surfacing to a user so the caveat travels with the text, trace to audit which local facts grounded the output, and data (when set by a structured capability) for the parsed JSON payload.

labeled ¶

labeled() -> str

The text prefixed with an unmistakable exploratory provenance tag.

trace ¶

trace() -> str

A human-readable provenance trace: the generative step and the local, non-generative evidence that grounded it, grouped by source.

Makes the exploratory result auditable — every grounding line names the source (corpus, lexicon, analysis step) and the ref it came from, so a reader can check the output against the facts it was given rather than taking it on trust.

LLMClient ¶

LLMClient(model: str | None = None, *, api_key: str | None = None, cache: ResponseCache | None = None)

Bases: ABC

Abstract provider client. Subclasses implement _complete.

complete ¶

complete(prompt: str, *, system: str | None = None, max_tokens: int = 1024) -> LLMResponse

A cached single-turn completion (cache is keyed on provider/model/ system/prompt so re-asking is free and deterministic).

LLMResponse `dataclass` ¶

LLMResponse(text: str, provider: str, model: str, raw: Any = None)

A raw completion from a provider.

MissingAPIKey ¶

Bases: AIError

Raised when no API key is available for a provider.

ProviderNotInstalled ¶

Bases: AIError

Raised when a provider's optional SDK isn't installed.

UnknownProvider ¶

Bases: AIError

Raised for an unregistered provider id.

CaseResult `dataclass` ¶

CaseResult(name: str, used: tuple[str, ...], missing: tuple[str, ...], fabricated: tuple[str, ...], groundedness: float, clean: bool, text: str)

The scored outcome of one case.

EvalReport `dataclass` ¶

EvalReport(cases: tuple[CaseResult, ...], groundedness: float, fabrication_rate: float, n: int = 0)

Aggregate over a case set: mean groundedness and the fabrication rate (fraction of cases where any must_avoid appeared).

GroundingCase `dataclass` ¶

GroundingCase(name: str, prompt: str, grounding: tuple[str | GroundingItem, ...] = (), must_use: tuple[str, ...] = (), must_avoid: tuple[str, ...] = (), kind: str = 'ask', note: str = '')

One eval case: a prompt, the evidence to feed, and the facts a faithful answer should use / must not fabricate.

kind picks the capability (ask / decipher / gloss / summarize / translate). must_use are strings a grounded answer should reference; must_avoid are strings that, if present, signal the model went beyond (or against) its evidence.

GroundingItem `dataclass` ¶

GroundingItem(content: str, source: str = 'custom', ref: str = '')

One piece of grounding evidence and its provenance.

content is what the model sees; source is the provenance category (e.g. "corpus:lineara", "lexicon:LSJ", "lemmatizer", "transliteration", "analysis:cooccurrence"); ref is the specific locator it concerns (a word, lemma, or document id). Stringifies to content so it drops into the prompt like a plain evidence line.

ask ¶

ask(question: str, *, grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Answer a question over corpus/commentary grounding.

decipher_hypotheses ¶

decipher_hypotheses(text: str, *, grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Offer decipherment hypotheses for an undeciphered (Linear A) sequence, each tied to cited corpus evidence. Strictly exploratory.

extract ¶

extract(text: str, *, instruction: str = 'Extract the structured data from the following.', schema: Mapping[str, str] | str | None = None, grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Ask for structured (JSON) output and parse it into result.data so the AI layer can feed a pipeline or database.

schema describes the wanted shape — a mapping of field → description (rendered as a field list) or a free-form shape string — and is appended to instruction. The model is told to return JSON only; the response is parsed leniently (parse_json). result.data is the parsed value (or None if the model didn't return parseable JSON — result.text always has the raw response). Still exploratory and grounded like every capability.

r = extract("KN Fp 1: OLE S 1", schema={"commodity": "ideogram", ... "amount": "number"}, client=client) # doctest: +SKIP r.data # doctest: +SKIP {'commodity': 'OLE', 'amount': 1}

gloss ¶

gloss(text: str, *, source: str = 'Ancient Greek', grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Produce an interlinear, word-by-word gloss of the source text.

nlp_assist ¶

nlp_assist(text: str, *, task: str = 'lemma and POS disambiguation', grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Ask the model to disambiguate an NLP analysis (lemma/POS/parse) where the rule-based pipeline is uncertain.

parse_json ¶

parse_json(text: str) -> Any | None

Best-effort parse of a JSON value from a model response. Returns None (never raises) when nothing parseable is found.

Tolerant of the ways models wrap JSON: a `json fenced block, or prose around a bare object/array. Tries the fenced content, then the whole string, then the outermost{...}/[...]`` slice.

summarize ¶

summarize(text: str, *, grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Summarize a corpus excerpt or commentary.

translate ¶

translate(text: str, *, source: str = 'Ancient Greek', target: str = 'English', grounding: Grounding = (), client: LLMClient | None = None) -> ExploratoryResult

Translate source text, grounded in optional lexicon/corpus evidence.

get_client ¶

get_client(provider: str = 'anthropic', *, model: str | None = None, api_key: str | None = None, cache: ResponseCache | None = None) -> LLMClient

Construct a client for provider (default Anthropic). Importing aegean.ai registers all built-in providers.

register_provider ¶

register_provider(cls: type[LLMClient]) -> type[LLMClient]

Register an LLMClient subclass under its provider name (each adapter calls this).

run_eval ¶

run_eval(cases: Sequence[GroundingCase], client: LLMClient) -> EvalReport

Run each case through its capability with client and aggregate.

Needs a working LLMClient (a provider with a key, or a stub). Returns an EvalReport with mean groundedness and the fabrication rate — the AI layer's analogue of the lemmatizer's held-out accuracy.

score_text ¶

score_text(text: str, case: GroundingCase) -> CaseResult

Score one answer against a case (case-insensitive substring containment).

list_providers ¶

list_providers() -> list[str]

The sorted names of registered providers, e.g. ['anthropic', 'gemini', 'grok', 'openai'].

as_item ¶

as_item(x: str | GroundingItem) -> GroundingItem

Coerce a string or GroundingItem to a GroundingItem (strings become source="custom").

cooccurrence_evidence ¶

cooccurrence_evidence(corpus: object, word: str, *, limit: int = 12) -> list[GroundingItem]

Grounding for an undeciphered-script query: the words that most often share a document with word. Source analysis:cooccurrence, ref=word. Empty if word co-occurs with nothing.

corpus_context ¶

corpus_context(corpus: object, *, limit: int = 20) -> list[GroundingItem]

A small grounding context from a corpus: its most frequent words.

Kept deliberately small — this is seed grounding, not retrieval. Accepts any object exposing word_frequencies() (e.g. aegean.Corpus); the source is tagged corpus:<script_id> so the trace names the corpus.

evidence_block ¶

evidence_block(evidence: Iterable[str | GroundingItem]) -> str

Render grounding evidence as a compact, labeled bullet list (or empty).

Only the content reaches the prompt — provenance is for the trace, not the model — so the wording stays stable across GroundingItem and plain strings.

lexicon_evidence ¶

lexicon_evidence(words: Iterable[str], *, limit: int = 20) -> list[GroundingItem]

Grounding from the active LSJ lexicon: a short gloss per word that has an entry. Returns nothing if the lexicon isn't loaded (greek.use_lsj()) — grounding is best-effort, never a hard dependency. Source lexicon:LSJ.

wrap_untrusted ¶

wrap_untrusted(text: str, label: str = 'SOURCE') -> str

Delimit untrusted source text with an explicit do-not-follow note.

aegean.ai¶

ai ¶

ResponseCache ¶

AIError ¶

ExploratoryResult dataclass ¶

labeled ¶

trace ¶

LLMClient ¶

complete ¶

LLMResponse dataclass ¶

MissingAPIKey ¶

ProviderNotInstalled ¶

UnknownProvider ¶

CaseResult dataclass ¶

EvalReport dataclass ¶

GroundingCase dataclass ¶

GroundingItem dataclass ¶

ask ¶

decipher_hypotheses ¶

extract ¶

gloss ¶

nlp_assist ¶

parse_json ¶

summarize ¶

translate ¶

get_client ¶

register_provider ¶

run_eval ¶

score_text ¶

list_providers ¶

as_item ¶

cooccurrence_evidence ¶

corpus_context ¶

evidence_block ¶

lexicon_evidence ¶

wrap_untrusted ¶

ExploratoryResult `dataclass` ¶

LLMResponse `dataclass` ¶

CaseResult `dataclass` ¶

EvalReport `dataclass` ¶

GroundingCase `dataclass` ¶

GroundingItem `dataclass` ¶