Matter-route inference: focus areas as retrieval constraints for official sources

Part one defined source-route inference: a pasted URL is an observation, not the official record. Horizon tests routes, accepts only safe and useful representations, then binds each change to a route, a content unit, and a decision record. That addresses the first failure mode: the system should not confuse a public URL with the best official path to the record.

The second failure mode starts one layer later. A user does not only ask, “What changed at this official source?” The user asks, “Does this change fall inside my work?”

A focus area therefore cannot stay a string. “MiCA,” “AI,” or “AML” does not name a retrieval object. It names a rough legal interest. Official change arrives through legal acts, technical standards, register rows, Q&A pages, forms, PDFs, feeds, and national authority pages. A focus area must route through those objects before a model writes an alert.

Horizon treats the focus area as a matter route. A matter route binds six things: legal scope, source owner, jurisdiction, source class, change unit, and delivery audience. The user sees a simple matter. The system sees the official sources and event types that belong to that matter. The private part is how Horizon scores sources, probes failures, orders parser fallbacks, and tests edge cases.
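The six bindings can be pictured as one typed record. The sketch below is illustrative only: the class name, field names, enum values, and the example matter are assumptions made for this article, not Horizon's internal schema.

```python
from dataclasses import dataclass
from enum import Enum


class SourceClass(Enum):
    # Hypothetical source classes, named after the classes discussed in this article.
    LEGAL_STORE = "legal_store"
    REGISTER = "register"
    PDF_DIRECTORY = "pdf_directory"
    FEED = "feed"
    WEB_PAGE = "web_page"


@dataclass(frozen=True)
class MatterRoute:
    matter_id: str                 # stable identifier for the matter (assumed)
    legal_scope: str               # what the matter covers in legal terms
    source_owner: str              # institution that owns the official source
    jurisdiction: str
    source_class: SourceClass
    change_unit: str               # e.g. a register row, a document, a feed entry
    delivery_audience: frozenset   # who should receive alerts for this matter


route = MatterRoute(
    matter_id="mica-casp-authorisation",
    legal_scope="MiCA CASP authorisation",
    source_owner="ESMA",
    jurisdiction="EU",
    source_class=SourceClass.REGISTER,
    change_unit="register_row",
    delivery_audience=frozenset({"team:licensing"}),
)
```

Freezing the record keeps a route hashable, so it can serve as a key when events are later grouped by route.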

The topic-label failure

A topic label fails in two directions. It is too wide for retrieval and too narrow for change detection.

It is too wide because “Digital Assets” can mean MiCA Level 1, ESMA Level 2 and Level 3 measures, EBA ART and EMT work, DAC8 tax reporting, the crypto travel rule, tokenized securities, and national CASP authorisation. A single label cannot say which official channel owns the next change.

It is too narrow because the official text may not use the user’s label. An EBA page can speak about ARTs, EMTs, reserve assets, own funds, supervisory colleges, and non-EU currency use under MiCAR. A pure text match on “crypto” or “MiCA” would miss part of the legal signal or flood the user with noise. EBA’s own MiCA page separates ART and EMT authorisation from technical standards, with status filters for draft RTS, ITS, Commission adoption, Official Journal publication, and translation state.

The failure is sharper for registers. ESMA’s interim MiCA register is not prose. It has five CSV files: white papers, ART issuers, EMT issuers, authorised CASPs, and non-compliant entities. ESMA says it publishes the latest version at weekly intervals. A topic label cannot tell the system whether a row addition, row removal, status change, or withdrawn authorisation matters.

The scientific claim is narrow: focus-area retrieval works when the system routes through official source objects and typed change units before model classification. It fails when the system waits for a full diff, then asks a model whether the text “looks like” the user’s topic.

Matter route, not keyword query

A matter route is a typed retrieval path. It does not start with the phrase the user typed. It starts with the official object that can prove the change.

A legal text route can carry an ELI or CELEX identifier. A register route can carry a row key. A PDF route can carry a document hash, page number, and heading. A feed route can carry an entry ID. A source-owner route can carry ESMA, EBA, AMLA, the Commission, or a national authority. These route parts matter because legal change is not only semantic. It is institutional.

The EU Publications Office’s Cellar shows why this matters. Cellar exposes EU publication metadata through structured identifiers, REST, SPARQL, RSS and Atom, linked data, and multilingual machine-readable metadata. It also represents publications through work, expression, manifestation, and item layers. A legal source system should use those layers where they exist, not flatten the source into a visible web page.

Matter routing adds the user boundary. Source-route inference asks, “Which route gives the best official record?” Matter-route inference asks, “Which subscribed matter owns the change produced by that route?” The first question controls retrieval quality. The second controls relevance.

Horizon separates those two questions. One official source can serve several matters. One matter can use several official sources. One source event can match several matters. That last rule matters: an ESA publication may touch MiCA, DORA, AML, and capital markets in one document. A forced single label would erase part of the change.
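The many-to-many rule can be sketched as a match that returns a set of matters. This is a minimal illustration: the types, the matter IDs, and the assumption that regime tags were already attached to the event upstream are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEvent:
    event_id: str
    source_owner: str
    regime_tags: frozenset  # assumed to be attached by an earlier step


@dataclass(frozen=True)
class Matter:
    matter_id: str
    source_owners: frozenset
    regimes: frozenset


def match_matters(event, matters):
    """Return every matter that can claim the event, never just the first one."""
    return {
        m.matter_id
        for m in matters
        if event.source_owner in m.source_owners and event.regime_tags & m.regimes
    }


matters = [
    Matter("mica", frozenset({"ESMA", "EBA"}), frozenset({"MICA"})),
    Matter("dora", frozenset({"ESMA", "EBA", "EIOPA"}), frozenset({"DORA"})),
    Matter("aml", frozenset({"AMLA"}), frozenset({"AML"})),
]
event = SourceEvent("evt-001", "ESMA", frozenset({"MICA", "DORA"}))
matched = match_matters(event, matters)
```

Here `matched` holds both `mica` and `dora`; a single-label design would have to drop one of them.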

Event first, model second

The model should not discover the event from raw official text when a better unit exists. It should receive the event after the retrieval layer has named it.

For a register, the event is a row-level delta: entity added, entity removed, status changed, field changed. For a PDF index, the event is a document added, document removed, or same-link document replaced. For a legal record, the event may be a new consolidated text, corrigendum, amendment, or new linked act. For a Q&A page, it may be a new question, changed answer, or new topic tag.

ESMA’s MiCA register illustrates the point. Its five CSV files carry different legal objects. A white-paper row does not mean the same thing as a non-compliant-entity row. A withdrawn authorisation does not mean the same thing as a newly listed CASP. ESMA also provides field descriptions for the register files. A good retrieval system should parse the fields, compare row identities, and then write the alert from the typed delta.
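A row-level delta of this kind can be sketched with the standard library alone. The column names below (`entity_id`, `name`, `status`) are placeholders chosen for the example, not ESMA's published field names, and the sample rows are invented.

```python
import csv
import io


def load_rows(csv_text, key_field):
    """Index CSV rows by their row key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def diff_register(old_text, new_text, key_field, status_field="status"):
    """Compare two register snapshots and emit typed row-level events."""
    old = load_rows(old_text, key_field)
    new = load_rows(new_text, key_field)
    events = []
    for key in sorted(new.keys() - old.keys()):
        events.append({"type": "entity_added", "key": key})
    for key in sorted(old.keys() - new.keys()):
        events.append({"type": "entity_removed", "key": key})
    for key in sorted(old.keys() & new.keys()):
        # Collect (old, new) pairs for every field whose value differs.
        changed = {
            f: (old[key].get(f), new[key].get(f))
            for f in new[key]
            if old[key].get(f) != new[key].get(f)
        }
        if changed:
            kind = "status_changed" if status_field in changed else "field_changed"
            events.append({"type": kind, "key": key, "fields": changed})
    return events


OLD = "entity_id,name,status\nA1,Alpha,authorised\nB2,Beta,authorised\n"
NEW = "entity_id,name,status\nA1,Alpha,withdrawn\nC3,Gamma,authorised\n"
events = diff_register(OLD, NEW, key_field="entity_id")
```

The model then receives `events`, not two raw CSV files, so a withdrawn authorisation arrives already named as a status change on a specific row.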

This ordering changes the model’s job. The model is not the retriever. It is the late classifier and writer. It receives a source event, a matter route, and an output contract. It decides whether the event sits inside the matter boundary, then writes a short alert tied to the source unit.

The matter ledger

A focus area becomes useful when it leaves a ledger trail. Horizon’s public source-route work already names four ledgers: route, representation, change, and decision. Matter routing adds a fifth ledger: scope.

The scope ledger records why the event belongs, or does not belong, to a user’s selected matter. It records the matter identity, official source owner, jurisdiction, source class, event type, and model decision. It also records negative decisions. That negative record matters. A silent system should be able to say whether nothing changed, the route failed, extraction failed, the event was outside scope, or the model returned a rejected answer.

The ledger must store sets, not only scalars. A single event may match several matters. A single user may receive the event through a direct subscription and a team subscription. A source may belong to a parent matter and a child matter. The scientific unit is therefore not “one label per alert.” It is “one source event with zero or more matter matches and zero or more deliveries.”
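One way to picture a scope-ledger entry is a record that carries an explicit outcome and a set of matched matters. The sketch below is an assumption-laden illustration: the outcome names mirror the cases listed above, while the field names and route IDs are invented.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    MATCHED = "matched"
    NO_CHANGE = "no_change"
    ROUTE_FAILED = "route_failed"
    EXTRACTION_FAILED = "extraction_failed"
    OUT_OF_SCOPE = "out_of_scope"
    MODEL_REJECTED = "model_rejected"


@dataclass(frozen=True)
class ScopeEntry:
    route_id: str
    source_owner: str
    jurisdiction: str
    source_class: str
    outcome: Outcome
    event_id: Optional[str] = None         # None when no event was produced
    event_type: Optional[str] = None
    matched_matters: frozenset = frozenset()  # a set, not a single label


def silence_report(entries):
    """Count the reasons why checks produced no alert."""
    return Counter(e.outcome.value for e in entries if e.outcome is not Outcome.MATCHED)


ledger = [
    ScopeEntry("route-esma-casp", "ESMA", "EU", "register", Outcome.MATCHED,
               "evt-1", "entity_added", frozenset({"mica", "licensing"})),
    ScopeEntry("route-eba-rts", "EBA", "EU", "web_page", Outcome.NO_CHANGE),
    ScopeEntry("route-esma-news", "ESMA", "EU", "feed", Outcome.OUT_OF_SCOPE,
               "evt-2", "speech_published"),
]
report = silence_report(ledger)
```

With negative outcomes stored next to positive ones, "no alert today" becomes a query over the ledger instead of a guess.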

This design prevents two common errors. The system does not scrape the same official source once per subscriber. It also does not send the same user the same alert twice when the user qualifies through more than one route. The internal method for deduplication can stay private. The public principle is simple: retrieval, matter match, and delivery are separate records.
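The public principle can be illustrated without touching the private method. The sketch below is a generic approach, not Horizon's deduplication logic: it keys deliveries by event and user, and keeps every qualifying subscription as a reason. The tuple layout and IDs are assumptions.

```python
def plan_deliveries(event_id, matched_matters, subscriptions):
    """Plan at most one delivery per (event, user).

    subscriptions: iterable of (user_id, matter_id, via) tuples, where `via`
    names the subscription path, such as "direct" or a team ID.
    """
    deliveries = {}
    for user_id, matter_id, via in subscriptions:
        if matter_id in matched_matters:
            # The dict key deduplicates; the set records why the user qualifies.
            deliveries.setdefault((event_id, user_id), set()).add((matter_id, via))
    return deliveries


subscriptions = [
    ("u1", "mica", "direct"),
    ("u1", "mica", "team:licensing"),
    ("u2", "dora", "direct"),
    ("u3", "mica", "direct"),
]
plan = plan_deliveries("evt-1", {"mica"}, subscriptions)
```

User `u1` qualifies twice but appears once in `plan`, with both subscription paths preserved for the ledger.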

Official source classes change the retrieval test

A matter route needs a source class. Without it, the system cannot know what kind of proof to seek.

A legal-store source needs identifier checks. A PDF directory needs link-set comparison and file hashing. A register needs row keys and field-level deltas. A feed needs item identity and pagination handling. A rendered web page needs content markers, not only a successful browser load. A webhook or user-pasted URL needs safety checks before any fetch.
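For the PDF directory class, link-set comparison plus file hashing can be sketched as follows. The URLs and byte strings are invented, and a real system would hash the fetched file bodies; this is one possible shape of the check, not a description of Horizon's parser.

```python
import hashlib


def snapshot(docs):
    """Map each linked document URL to the SHA-256 digest of its bytes."""
    return {url: hashlib.sha256(body).hexdigest() for url, body in docs.items()}


def diff_pdf_index(old, new):
    """Emit typed events from two snapshots of the same directory page."""
    events = []
    for url in sorted(new.keys() - old.keys()):
        events.append(("document_added", url))
    for url in sorted(old.keys() - new.keys()):
        events.append(("document_removed", url))
    for url in sorted(old.keys() & new.keys()):
        if old[url] != new[url]:
            # Same link, different bytes: the file was replaced in place.
            events.append(("document_replaced", url))
    return events


old = snapshot({"/docs/a.pdf": b"version 1", "/docs/b.pdf": b"annex"})
new = snapshot({"/docs/a.pdf": b"version 2", "/docs/c.pdf": b"new guidance"})
pdf_events = diff_pdf_index(old, new)
```

The same-link replacement is the case a plain link diff misses, which is why the hash sits next to the link set.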

Security belongs inside retrieval, not after it. Server-side request forgery is a case where an application fetches a remote resource without proper validation of a user-supplied URL. Application-layer controls against it include validating input; enforcing URL scheme, port, and destination through a positive allow list; not returning raw responses; disabling redirects; and handling DNS rebinding and time-of-check/time-of-use races.
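A pre-fetch check along those lines can be sketched with the standard library. The allow list, the function names, and the injectable `resolver` parameter are assumptions made for the example. The check covers scheme, host, port, and resolved address only; disabling redirects and pinning the connection to the validated address belong to the fetch step and are not shown.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.esma.europa.eu", "www.eba.europa.eu"}  # illustrative allow list
ALLOWED_PORTS = {443}


def resolve(host):
    """Resolve a host name to the set of addresses it currently maps to."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def check_url(url, resolver=resolve):
    """Return (allowed, reason) for a user-supplied URL before any fetch."""
    try:
        parts = urlsplit(url)
        port = parts.port if parts.port is not None else 443
    except ValueError:
        return False, "malformed"
    if parts.scheme != "https":
        return False, "scheme"
    if parts.username or parts.password:
        return False, "userinfo"
    if parts.hostname not in ALLOWED_HOSTS:
        return False, "host"
    if port not in ALLOWED_PORTS:
        return False, "port"
    for addr in resolver(parts.hostname):
        # Reject private, loopback, link-local and other non-public addresses.
        if not ipaddress.ip_address(addr).is_global:
            return False, "address"
    return True, "ok"
```

Resolving once and then fetching by name would reopen the rebinding window, so a production fetcher would connect to the address that passed this check.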

Horizon applies the same rule to focus-area retrieval. A source is not accepted only because it belongs to a matter. The route must pass safety checks, content checks, parser checks, and source-owner checks. A matter route that points to a blocked page, a cookie shell, a wrong-language stub, or a broken PDF does not become a trusted baseline.

Structured decisions

A focus-area decision should not be a free paragraph. It should be a typed object.

OpenAI’s Structured Outputs bind a model response to a supplied JSON Schema, which guards against omitted required keys and invalid enum values. Every field in the schema must be marked required; a value that may be absent is expressed as null.

For matter routing, that means the alert object should contain the noise decision, matched matter IDs, official source event type, regime tags, urgency, action flag, deadline if present, affected entities, and confidence. The exact field names are implementation detail. The research point is that the model returns a decision record, not loose prose.
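A decision schema with those fields might look like the sketch below. Every field name and enum value is hypothetical, since the exact names are implementation detail. The helper only checks two properties the schema needs for strict use, namely that every property is required and that extra keys are forbidden; it does not call any API.

```python
DECISION_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "is_noise": {"type": "boolean"},
        "matched_matter_ids": {"type": "array", "items": {"type": "string"}},
        "event_type": {
            "type": "string",
            "enum": ["entity_added", "entity_removed", "status_changed",
                     "document_added", "document_replaced", "legal_text_changed"],
        },
        "regime_tags": {
            "type": "array",
            "items": {"type": "string", "enum": ["MICA", "DORA", "AML"]},
        },
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
        "action_required": {"type": "boolean"},
        # Optional value: still required as a key, but allowed to be null.
        "deadline": {"type": ["string", "null"]},
        "affected_entities": {"type": "array", "items": {"type": "string"}},
        "confidence": {"type": "number"},
    },
}
# Mark every property as required, as described above.
DECISION_SCHEMA["required"] = list(DECISION_SCHEMA["properties"])


def strict_ready(schema):
    """True when all properties are required and extra keys are rejected."""
    return (
        schema.get("additionalProperties") is False
        and set(schema.get("required", [])) == set(schema["properties"])
    )
```

Closed enums for event type and regime tags are what turn the model's answer into something a router can branch on.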

A strict decision object also blocks vocabulary drift. Without it, the model may emit MiCA/crypto, MiCA, crypto-assets, or CASP rules as if they were interchangeable. A matter-route system uses one canonical term set for machine routing and lets the user interface show the official legal name.
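Where free-text labels still enter the system, for example from older records, one generic way to hold a single term set is an alias table that maps to a canonical tag and rejects anything unknown. The alias list and tag names below are invented for illustration; only the official name of MiCA, Regulation (EU) 2023/1114, is taken as fact.

```python
ALIASES = {
    "mica": "MICA",
    "micar": "MICA",
    "mica/crypto": "MICA",
    "crypto-assets": "MICA",
    "casp rules": "MICA",
}

DISPLAY_NAMES = {
    "MICA": "Regulation (EU) 2023/1114 (MiCA)",
}


def canonicalize(label):
    """Map a loose label to its canonical routing tag, or fail loudly."""
    key = label.strip().lower()
    if key not in ALIASES:
        raise ValueError(f"unknown regime label: {label!r}")
    return ALIASES[key]


tags = {canonicalize(x) for x in ["MiCA/crypto", "MiCA", "crypto-assets", "CASP rules"]}
```

All four variants collapse to one routing tag, while the interface can look up the legal name in `DISPLAY_NAMES`.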

Testing the matter route

A matter-route system cannot be tested by reading a handful of finished alerts. A clean paragraph can hide a missed event. A correct event can still receive the wrong matter. A correct matter can still reach the wrong audience.

The eval set must test the route in pieces. The first test asks whether the source parser found the event. The second asks whether the event matched the right matter set. The third asks whether the model wrote a valid decision object. The fourth asks whether delivery deduplication sent one alert per intended recipient.
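Scoring each stage separately can be sketched as below. The case layout, with `event_types`, `matters`, `decision_valid`, and `recipients`, is a hypothetical format for this example, not Horizon's eval schema.

```python
STAGES = ("parse", "match", "decision", "delivery")


def score_case(expected, actual):
    """Score one eval case stage by stage instead of judging the final alert."""
    return {
        "parse": actual["event_types"] == expected["event_types"],
        "match": set(actual["matters"]) == set(expected["matters"]),
        "decision": actual["decision_valid"] is True,
        # Exactly one alert per intended recipient: duplicates fail this stage.
        "delivery": sorted(actual["recipients"]) == sorted(set(expected["recipients"])),
    }


def first_failure(result):
    """Name the earliest stage that failed, or None if all passed."""
    return next((stage for stage in STAGES if not result[stage]), None)


expected = {
    "event_types": ["status_changed"],
    "matters": ["mica"],
    "recipients": ["u1", "u3"],
}
actual = {
    "event_types": ["status_changed"],
    "matters": ["mica"],
    "decision_valid": True,
    "recipients": ["u1", "u1", "u3"],  # u1 received the alert twice
}
result = score_case(expected, actual)
```

In this case the alert text could look flawless while `first_failure(result)` still points at `delivery`.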

OpenAI’s eval guidance warns against “vibe-based evals” and calls for a defined objective, a dataset, metrics, repeated runs, and continuing evals as the system changes. It also notes that eval design should cover production-like data, domain-specific examples, edge cases, and adversarial cases.

For Horizon, the test set must include official change units, not only natural-language snippets. It should contain register row additions, withdrawn entries, PDF replacements, linked-document additions, legal-text changes, non-English authority text, and negative controls: reordered menus, cookie banners, stale caches, and speeches with no operative change. The private eval examples should stay private because they expose the system’s weak spots and source priorities.

Why this is not ordinary RAG

Matter-route inference is not the same as broad retrieval over an indexed library. A vector store can help answer questions over a known corpus. It does not by itself decide whether an ESMA CSV row, an EBA RTS status change, a Commission page update, and a national authority PDF belong to the same user matter.

The hard step is before semantic search. The system must decide which official channel owns the change, what unit changed, which matter routes can claim that unit, and whether the user should receive it. Semantic matching can support that path. It should not replace source ownership, source class, event identity, and jurisdiction.
