GraphQL Optimization: Batching and Combining Requests
GraphQL introduces a declarative data layer that promises to speed the development of frontends. Much as relational databases separated the logical schema from the physical schema, opening up a new world of data independence and access optimization, GraphQL provides data independence between frontend data consumption and backend data retrieval. This capability allows the GraphQL engine to have a holistic view of the data needs of an entire application that may be co-developed over time by multiple programmers.
We have identified opportunities for optimizing GraphQL and suggested several techniques for minimizing backend requests when an application is using GraphQL: we've explored the N+1 problem, tackled deduplication and reuse techniques, and examined the role of caching and prefetching fields in optimizing your GraphQL system. Here we'll take a look at techniques to batch and combine requests.
Variations of these techniques have been used previously throughout the hardware and software stack. While a few of them can be implemented effectively by bespoke resolvers, such an approach is limited in how much it can optimize, because a resolver does not have full visibility into the data model.
Batching
Batching is the ability to take a number of individual backend requests (typically after deduplication) and send them as a single request to the backend. Consider the following query:
{
greene: authors(name:"Greene") { name birthplace email}
huxley: authors(name:"Huxley") { name birthplace email}
orwell: authors(name:"Orwell") { name birthplace email}
}
A single backend request to return details for all three authors simultaneously would save multiple requests (in this case two) to the same backend. To leverage this optimization the backend must be able to support multi-valued parameters. Fortunately, this type of capability is fairly straightforward with SQL queries, REST calls and GraphQL endpoints as the following examples demonstrate.
SQL databases: The SQL used to define the API that returns information for a single author can easily be rewritten to use an IN list or a temporary join table or, less elegantly, submitted as multiple SQL statements in the same client request.
REST calls: REST APIs can support multi-valued parameters by simply repeating the query parameter, such as .../authors?name=Greene&name=Huxley&name=Orwell, or by providing a different endpoint that accepts a list of names in a POST body instead of a path element such as author/<name>.
GraphQL endpoint: for GraphQL, we can simply include multiple top-level field selections in the operation, such as:
{
batch001: authors(name: "Greene") { name birthplace email }
batch002: authors(name: "Huxley") { name birthplace email }
batch003: authors(name: "Orwell") { name birthplace email }
}
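As a rough illustration of the REST and GraphQL cases, the following Python sketch builds the widened requests from a list of names. The function names, the `batchNNN` alias scheme and the default field list are assumptions made for this example, not part of any particular GraphQL server.

```python
import json
from urllib.parse import urlencode


def build_batched_query(names, fields=("name", "birthplace", "email")):
    """Build one GraphQL operation with an aliased `authors` selection per name.

    Returns the operation text and a map from each alias back to the
    requested name, so results can later be matched to their requests.
    (Hypothetical helper for illustration.)
    """
    selection = " ".join(fields)
    alias_to_name = {}
    lines = ["{"]
    for i, name in enumerate(names, start=1):
        alias = f"batch{i:03d}"
        alias_to_name[alias] = name
        # json.dumps produces a double-quoted, escaped string literal,
        # which is adequate for simple names like the ones used here.
        lines.append(
            f"  {alias}: authors(name: {json.dumps(name)}) {{ {selection} }}"
        )
    lines.append("}")
    return "\n".join(lines), alias_to_name


def build_batched_rest_query_string(names):
    """Repeat the `name` query parameter once per requested author."""
    # doseq=True expands a list value into repeated key=value pairs.
    return urlencode({"name": list(names)}, doseq=True)


query, alias_to_name = build_batched_query(["Greene", "Huxley", "Orwell"])
print(query)
print(build_batched_rest_query_string(["Greene", "Huxley", "Orwell"]))
```

Keeping the alias-to-name map alongside the generated operation is one simple way to route each aliased result back to the request that produced it.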
A less obvious requirement for the backend is that the response to the widened request must preserve the mapping from the requested objects to their results, so the GraphQL server can associate each returned result with its request parameter. In our example, we return name in the result type because each returned record needs to be mapped to the author it was requested for, and the GraphQL server can then use this mapping to build its result.
In SQL this is easy, as the request parameters can be added to the rows returned in the results, but this may not be readily available in other APIs. For example, a weather REST API may take lat/long coordinates as input but, instead of returning the lat/long it was given, return the lat/long of a weather station or grid point. There is precedent for formulating such responses, introduced to support aggregation in XML and JSON.
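The SQL case can be sketched with Python's built-in `sqlite3` module. The table layout, the sample rows (with placeholder email addresses) and the function name `fetch_authors_batched` are assumptions for illustration only. The point is that the selected `name` column doubles as the key that maps each row back to the request that asked for it.

```python
import sqlite3

# Illustrative in-memory table; the schema and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (name TEXT, birthplace TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [
        ("Greene", "Berkhamsted", "greene@example.com"),
        ("Huxley", "Godalming", "huxley@example.com"),
        ("Orwell", "Motihari", "orwell@example.com"),
    ],
)


def fetch_authors_batched(conn, names):
    """Resolve several single-author lookups with one IN-list query."""
    names = list(names)
    if not names:
        return {}
    # One "?" placeholder per requested name keeps the query parameterized.
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        "SELECT name, birthplace, email FROM authors "
        f"WHERE name IN ({placeholders})",
        names,
    ).fetchall()
    # Group rows by the request parameter so every original request,
    # including one with no match, gets its own result list.
    by_name = {name: [] for name in names}
    for name, birthplace, email in rows:
        by_name[name].append(
            {"name": name, "birthplace": birthplace, "email": email}
        )
    return by_name


results = fetch_authors_batched(conn, ["Greene", "Huxley", "Wells"])
print(results["Greene"])
print(results["Wells"])  # no matching row, so an empty list
```

A backend like the weather API described above would need some other correlation key, since the value it returns does not match the value it was given.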
Combining
As its name suggests, combining merges requests from different levels of a query into a single request to the backend. This requires that the GraphQL server understand which requests go to the same backend and can therefore be combined into a single request.
Consider, in our running example, how the books field of the Author type might be resolved:
type Author {
…
books: [Book]
@materializer(
query: "booksByAuthor"
arguments: [{name: "auth_id", field: "id"}]
)
}
The @materializer directive tells us that the books for a given author are those that satisfy the booksByAuthor query, with the auth_id of the query matching the author’s id.
The GraphQL operation:
{
author(id: 1) {id name books {title}}
}
This operation would naively be satisfied by first requesting id and name from the author backend, followed by a request for books with the given auth_id from the books backend. If both backends were databases, this would result in the following sequence of database queries:
1. SELECT name, id FROM authors WHERE id = 1
2. SELECT title FROM books WHERE auth_id = 1
If both of these requests went to the same database, we could combine them into one:
SELECT A.id, A.name, B.title
FROM authors A, books B
WHERE A.id = 1 AND B.auth_id = A.id
While this kind of combined request is easily possible with a SQL database and can be supported by some REST APIs, it is not supported by all REST APIs, so each backend's capabilities need to be checked before requests are combined. For example, getting the pinned tweets for a Twitter user along with their details is possible in one call, but other APIs will require additional endpoints.
Similar to batching, when such requests are combined, the GraphQL server needs to be able to unpack the response into the required object field structure.
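To make the unpacking step concrete, here is a minimal Python sketch using the built-in `sqlite3` module. It assumes both tables live in one database and that the foreign-key column is named `auth_id`. The sample rows and the function name `fetch_author_with_books` are illustrative. It also uses a LEFT JOIN rather than the comma join shown earlier, a choice made here so that an author with no books still produces a row.

```python
import sqlite3

# Illustrative schema and rows; both tables share one database by assumption.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (title TEXT, auth_id INTEGER);
    INSERT INTO authors VALUES (1, 'Orwell');
    INSERT INTO authors VALUES (2, 'Huxley');
    INSERT INTO books VALUES ('Animal Farm', 1);
    INSERT INTO books VALUES ('Homage to Catalonia', 1);
    """
)


def fetch_author_with_books(conn, author_id):
    """Answer `author(id) { id name books { title } }` with one SQL query."""
    rows = conn.execute(
        """
        SELECT A.id, A.name, B.title
        FROM authors A LEFT JOIN books B ON B.auth_id = A.id
        WHERE A.id = ?
        ORDER BY B.title
        """,
        (author_id,),
    ).fetchall()
    if not rows:
        return None  # no such author
    # The author columns repeat on every row; take them from the first one.
    author = {"id": rows[0][0], "name": rows[0][1], "books": []}
    for _id, _name, title in rows:
        # A NULL title comes from the LEFT JOIN when the author has no books.
        if title is not None:
            author["books"].append({"title": title})
    return {"author": author}


print(fetch_author_with_books(conn, 1))
print(fetch_author_with_books(conn, 2))
```

The flat join result repeats the author columns on every row, so the server has to fold those rows back into one author object with a nested list of books.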
Conclusion
At StepZen, we are adding a unique, declarative way to build and run GraphQL APIs accessing REST, database and GraphQL backends. This declarative approach to the schema definition language (SDL) gives us more context, such as the relationships of fields to backends, their types and capabilities. This visibility increases the opportunities to optimize. Furthermore, we can implement these optimizations behind the scenes without burdening the schema developer or the backend services. The schema developer simply describes the data and the linkages, and StepZen does the rest.
We are just scratching the surface of the potential optimizations and data independence we can provide with GraphQL. Just as SQL optimization evolved from flexible index definitions and simple predicate pushdown to cost-based join optimization and query rewrite engines, we believe GraphQL optimization will evolve with the needs and opportunities the data independence layer provides.
Feedback & questions
If you jumped in here at part 4, here's what we explored in previous posts:
- GraphQL Optimization: It’s More Than N+1
- GraphQL Optimization: Deduplication & Reuse
- GraphQL Optimization: Caching & Prefetching Fields
As you may have guessed, we love to talk about performance :-) If you have any questions or feedback or want to discuss a performance challenge, we’d love to connect. Drop us a note via this page or drop in to our Community Discord.