HTTP Caching

When grammars are supplied via URI, the grammar service fetches their content over HTTP. In high-throughput environments where the same grammars are requested hundreds of times per second, fetching the full content on every request is prohibitively expensive. To address this, the grammar service implements a multi-layered caching strategy designed to minimize network traffic and reduce grammar load times.

This article describes the steps the service takes when fetching a grammar by URI, and how caching is applied at each stage.


Step-by-Step: What Happens When a Grammar URI Is Fetched

1. The grammar service receives a load request containing a grammar URI

The URI may be an HTTP or HTTPS URL pointing to the grammar content on a remote server. For example:

http://grammars.example.com/en-us/navigation.grxml

2. The URI is parsed for embedded parameters

Before fetching, the service checks the URI for query parameters. Any key-value pairs after the ? are extracted and applied as request settings. Parameter names are case-insensitive (normalized to uppercase internally).

The parameter that controls HTTP caching behavior is:

  • Cache-Control — standard HTTP cache control directives (e.g., no-cache, max-age=300)

Other query parameters such as Fetch-Timeout (network timeout) and SSL_VerifyPeer (certificate validation) are also recognized, but these affect the connection rather than caching.

For example, a URI like:

http://grammars.example.com/grammar.grxml?Cache-Control=max-age%3D600

would set a max-age of 600 seconds (10 minutes) for that grammar's cache freshness window. Note that because = separates keys from values in query strings, the = between max-age and its value must be percent-encoded as %3D so that it is read as part of the Cache-Control value rather than as a separator (percent-encoding is defined in RFC 3986).
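As an illustration of the encoding rule above, the following minimal Python sketch builds such a URI with the standard library. The helper name build_grammar_uri is hypothetical and not part of the grammar service; only the Cache-Control parameter name and the example URL come from this article.

```python
from urllib.parse import quote


def build_grammar_uri(base_url: str, cache_control: str) -> str:
    """Append a percent-encoded Cache-Control query parameter to a grammar URL.

    Hypothetical helper for illustration; assumes base_url has no query string.
    """
    # safe="" makes quote() encode "=" as %3D; "-" and digits are left as-is.
    encoded = quote(cache_control, safe="")
    return f"{base_url}?Cache-Control={encoded}"


uri = build_grammar_uri("http://grammars.example.com/grammar.grxml", "max-age=600")
print(uri)
# http://grammars.example.com/grammar.grxml?Cache-Control=max-age%3D600
```

Encoding the value in code rather than by hand avoids accidentally leaving a bare = inside the directive.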

The full URL (including the original query string), the Cache-Control value, and the other extracted parameters (Fetch-Timeout, SSL_VerifyPeer) are all passed to the HTTP handler. Cache-Control drives caching behavior, while the remaining parameters are applied to the connection itself.
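The parameter extraction described in this step could look roughly like the Python sketch below. It is not the service's actual code: the function names, the split into two dictionaries, and the Fetch-Timeout value used in the example are assumptions made for illustration. Only the parameter names and the uppercase normalization come from this article.

```python
from urllib.parse import parse_qsl, urlsplit

# Parameter names from this article, in their uppercase-normalized form.
CACHE_KEYS = {"CACHE-CONTROL"}
CONNECTION_KEYS = {"FETCH-TIMEOUT", "SSL_VERIFYPEER"}


def extract_settings(uri: str):
    """Return (cache_settings, connection_settings) parsed from the query string."""
    cache, connection = {}, {}
    # parse_qsl splits each pair on the first "=" and percent-decodes the value.
    for name, value in parse_qsl(urlsplit(uri).query, keep_blank_values=True):
        key = name.upper()  # parameter names are case-insensitive
        if key in CACHE_KEYS:
            cache[key] = value
        elif key in CONNECTION_KEYS:
            connection[key] = value
    return cache, connection


def parse_max_age(cache_control: str):
    """Return the max-age value in seconds, or None if no valid max-age is present."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


# The Fetch-Timeout value below is made up; this article does not specify its format.
cache, connection = extract_settings(
    "http://grammars.example.com/grammar.grxml"
    "?cache-control=max-age%3D600&Fetch-Timeout=5000"
)
print(cache)       # {'CACHE-CONTROL': 'max-age=600'}
print(connection)  # {'FETCH-TIMEOUT': '5000'}
print(parse_max_age(cache["CACHE-CONTROL"]))  # 600
```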

3. The HTTP handler checks its in-memory cache

The service maintains an in-memory cache of previously fetched URLs. When a fetch is requested, the handler first checks whether it already has a cached response for that exact URL (including query string).

If the URL is found in cache and the entry is still fresh, the cached content is returned immediately. No network request is made. This is the fastest path and produces zero network overhead.

Freshness is determined as follows:

  • If a max-age value was provided via the Cache-Control parameter, the entry is considered fresh if it was cached less than max-age seconds ago.
  • If no max-age was specified, the service uses a default cache lifetime of 300 seconds (5 minutes). This default can be adjusted via the HTTP_HANDLER_SETTINGS__DEFAULT_CACHE_LIFETIME_SECONDS environment variable.
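The freshness rule above can be expressed as a short Python sketch. The function names and the choice to pass the current time as an argument are assumptions for illustration; the environment variable name and the 300-second default are taken from this article.

```python
import os
import time

ENV_VAR = "HTTP_HANDLER_SETTINGS__DEFAULT_CACHE_LIFETIME_SECONDS"


def default_lifetime() -> float:
    """Default cache lifetime in seconds: 300 unless the environment overrides it."""
    return float(os.environ.get(ENV_VAR, "300"))


def is_fresh(cached_at: float, max_age=None, now=None) -> bool:
    """Return True if an entry cached at `cached_at` is still within its lifetime.

    `max_age` is the value supplied with the request (not by the remote server);
    when it is None, the default lifetime applies.
    """
    lifetime = max_age if max_age is not None else default_lifetime()
    now = time.time() if now is None else now
    # Fresh means cached strictly less than `lifetime` seconds ago.
    return (now - cached_at) < lifetime
```

For example, with max_age=600 an entry cached 599 seconds ago is still fresh, while one cached exactly 600 seconds ago is not.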

4. If the cache entry has expired, the service attempts a conditional request

When the cache entry exists but is no longer fresh, the service does not immediately download the full grammar. Instead, if the original response included an ETag header, the service sends a conditional HTTP GET request with an If-None-Match header containing the cached ETag value. If no ETag was stored with the entry, a conditional request is not possible and the service falls back to a full download, as in step 5.

  • If the server responds with 304 Not Modified, the grammar content has not changed. The service reuses the cached content, resets the cache timer, and returns the data without transferring the grammar body. This is significantly more efficient than a full download, especially for large grammars.
  • If the server responds with 200 OK, the grammar content has changed. The service downloads the new content, updates the cache with the new body, headers, and ETag, and returns the fresh data.
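The two outcomes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's implementation: CacheEntry, revalidate, and the injected http_get callable are hypothetical names, and raising an error for any other status code is a placeholder, since this article does not describe how other responses are handled.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Dict[str, str], bytes]


@dataclass
class CacheEntry:
    body: bytes
    headers: Dict[str, str]
    etag: Optional[str]
    cached_at: float


def revalidate(
    url: str,
    entry: CacheEntry,
    http_get: Callable[[str, Dict[str, str]], Response],
    now: Optional[float] = None,
) -> CacheEntry:
    """Revalidate an expired entry using its ETag, if it has one."""
    now = time.time() if now is None else now
    # Without a stored ETag the request is unconditional (a full download).
    request_headers = {"If-None-Match": entry.etag} if entry.etag else {}
    status, headers, body = http_get(url, request_headers)
    if status == 304:
        entry.cached_at = now  # content unchanged: reset the cache timer
        return entry
    if status == 200:
        # Content changed: replace body, headers, and ETag.
        # Assumes the transport reports the header under the exact key "ETag".
        return CacheEntry(body, headers, headers.get("ETag"), now)
    # Assumption: other statuses are treated as errors in this sketch.
    raise RuntimeError(f"unexpected status {status} for {url}")
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library and makes both branches easy to exercise with a stub.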

5. If there is no cache entry at all, a full HTTP GET is performed

For the first request to a given URL (or after the service restarts), the full grammar content is downloaded. The response body, headers, and ETag (if present) are stored in the in-memory cache for subsequent requests.

6. The fetched content is returned to the grammar engine for compilation

After the HTTP handler returns the grammar content, the grammar engine processes it normally — parsing, compiling, and caching the compiled result in its own separate grammar cache for use by ASR and SISR operations.


Summary of Caching Behavior

| Scenario | Network Cost | What Happens |
|---|---|---|
| URL in cache, entry is fresh | None | Cached content returned directly |
| URL in cache, entry expired, server supports ETag | Minimal (headers only if 304) | Conditional request; body skipped if unchanged |
| URL in cache, entry expired, no ETag support | Full download | New content fetched and cached |
| URL not in cache | Full download | Content fetched, cached for future requests |
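The four scenarios in this summary reduce to a small decision function. The Python sketch below is illustrative only; FetchPlan and plan_fetch are hypothetical names, and the three boolean inputs stand in for the cache lookup, the freshness check, and the presence of a stored ETag.

```python
from enum import Enum


class FetchPlan(Enum):
    USE_CACHE = "no network request"
    CONDITIONAL_GET = "conditional request with If-None-Match"
    FULL_GET = "full download"


def plan_fetch(in_cache: bool, fresh: bool, has_etag: bool) -> FetchPlan:
    """Map the cache state for a URL to the kind of request that follows."""
    if not in_cache:
        return FetchPlan.FULL_GET
    if fresh:
        return FetchPlan.USE_CACHE
    # Entry exists but has expired: revalidate only if an ETag was stored.
    return FetchPlan.CONDITIONAL_GET if has_etag else FetchPlan.FULL_GET


print(plan_fetch(in_cache=True, fresh=False, has_etag=True).value)
# conditional request with If-None-Match
```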

Configuration Reference

| Setting | Where Set | Default | Description |
|---|---|---|---|
| Cache-Control | URI query parameter or session settings | (empty) | Standard cache control directives. max-age=N sets the cache freshness window in seconds. no-cache bypasses the cache. |
| HTTP_HANDLER_SETTINGS__DEFAULT_CACHE_LIFETIME_SECONDS | Environment variable | 300 (5 minutes) | Default freshness lifetime when no max-age is specified in the request |

Important Notes

Cache keys include the full URL. Two URLs that differ only in query parameters (e.g., ?v=1 vs ?v=2) are treated as separate cache entries.

The max-age value used for freshness decisions comes from the request configuration, not from the remote server's response headers. This is by design: the grammar service's caching layer enforces predictable caching behavior in environments where the remote server may not provide cache directives. To control freshness, set max-age via the grammar URI query parameters or session-level configuration.

ETag support on the grammar server is strongly recommended. When the remote server includes ETag headers in its responses, the grammar service can validate cached content with minimal network overhead (a conditional request returning only headers). Without ETag support, every cache expiration results in a full re-download of the grammar content. For high-throughput deployments with large grammars, this difference can significantly impact load times and overall throughput.

The default 5-minute cache lifetime is a conservative baseline. For environments with stable grammars that change infrequently, increasing HTTP_HANDLER_SETTINGS__DEFAULT_CACHE_LIFETIME_SECONDS can further reduce network traffic. For environments where grammars change frequently, setting an explicit max-age via query parameters gives fine-grained control per grammar.

