Nesting GitHub’s API in your GraphQL Schema
GraphQL is great. Every GraphQL endpoint agrees to speak the same language of `{data, errors}`, and that makes communication between servers easy.
Now suppose two public APIs both speak GraphQL; what advantages can we leverage? Packages like graphql-tools make it easy to merge and stitch schemas together, which lets separate teams build out their own subschemas and combine them via join and union functionality.
But what about 3rd party schemas that have their own authentication, rate limiting, and errors? For example, take a look at GitHub’s GraphQL API. Wouldn’t it be great if you could nest GitHub’s endpoint in your own schema?
In a single GraphQL query, you could get the user’s name from your application as well as their bio from GitHub:
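Something like this (a sketch; the `githubApi` field name and the nesting shape are illustrative of how the library exposes the endpoint):

```graphql
query DashboardUser {
  viewer {
    name          # resolved by your own schema
    githubApi {
      query {
        viewer {
          bio     # resolved by GitHub's GraphQL API
        }
      }
    }
  }
}
```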
Of course, this is only the beginning. A client could update your app and GitHub without any extra backend logic. Just good ol' GraphQL:
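For example, a single operation might look like this (again a sketch; `updateTeamName` is a hypothetical mutation in your own app, and the nesting shape is illustrative):

```graphql
mutation UpdateEverywhere($teamId: ID!, $name: String!, $status: String!) {
  updateTeamName(teamId: $teamId, name: $name) {    # your app
    team { id name }
  }
  githubApi {
    mutation {
      changeUserStatus(input: {message: $status}) { # GitHub
        status { message }
      }
    }
  }
}
```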
With nest-graphql-endpoint, this is finally possible.
The Business Case for Nested Endpoints
Building a successful SaaS today means meeting customers where they are — integrating against the tools they already use. I’ve built integrations for Slack and Atlassian, but when it came to building a GitHub integration, I noticed I was re-creating a lot of the logic that GitHub’s API already had built-in.
For example, our planning poker meeting fetches all the team's stories from GitHub, provides a fun, immersive way to score each story, and exports the scores back out to GitHub. Without nesting GitHub's schema, I made my own `GitHubIntegration` object that had a `repos` field. That field had a custom `resolve` function that fetched the repos from GitHub using a handwritten GraphQL string. This wasn't great for a few reasons:
- It’s extra code that I have to maintain
- The query doesn't expressively show front end devs which parts come from GitHub
- Without a dataloader, multiple queries can cause multiple fetches to GitHub
- Even with a dataloader, multiple fetches to GitHub are unavoidable unless every query is identical (thus overfetching)
What I needed was a way to batch all the fragments going to GitHub, merge them into a single network request, and then parse the response into their corresponding fragments again. Why batching? Because in the real world, queries can get pretty large & often repeat themselves:
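As a sketch of the problem, imagine two components that each bring their own fragment of GitHub data:

```graphql
# Fragment from component A
query { githubApi { query { viewer { id login } } } }

# Fragment from component B
query { githubApi { query { viewer { id bio } } } }

# What should actually go over the wire to GitHub: one merged query
query { viewer { id login bio } }
```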
How it Works: nest-graphql-endpoint
For all of this complexity, I wanted a way to nest endpoints in my schema with just a single line of code:
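Roughly like this (a sketch of the library's usage; treat the import path and the exact option names as assumptions based on its README):

```ts
import nestGitHubEndpoint from 'nest-graphql-endpoint/lib/nestGitHubEndpoint'

const {schema} = nestGitHubEndpoint({
  parentSchema,               // your existing GraphQLSchema
  parentType: 'User',         // the type to extend
  fieldName: 'githubApi',     // the nested endpoint field
  // Fetch whatever the endpoint needs, e.g. the user's access token
  resolveEndpointContext: async (source: any) => ({
    accessToken: source.githubAccessToken // hypothetical field on your User
  }),
  prefix: '_xGitHub'          // namespace for GitHub's type names
})
```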
That's it! Your `User` object now has a `githubApi` field that includes queries, mutations, and errors. `resolveEndpointContext` lets you fetch and provide the keys necessary to access the endpoint.
Behind the scenes, here’s how it works:
- It fetches the GitHub schema & prefixes all the `__typename` fields so you can write your query without worrying about naming conflicts (see the sketch after this list)
- It collects all the fragments inside the `githubApi` objects & merges them into a single query
- It prunes unused variables, variable definitions, & fragments
- It un-prefixes the `__typename` fields so GitHub understands the query
- In the event of a name conflict, it aliases fields before the request is fetched
- It de-aliases the response, re-applies the `__typename` prefix, and filters the errors by path
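For instance (assuming a prefix of `_xGitHub`), GitHub's `Repository` type shows up in your schema under a prefixed name, so it can't collide with a `Repository` type of your own:

```graphql
# In your merged schema
type _xGitHubRepository {
  id: ID!
  nameWithOwner: String!
}

# The prefix is stripped again before the query is sent to GitHub.
```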
Let’s see how it’s built.
Building the GitHub Schema
GitHub offers a great package called @octokit/graphql-schema. It provides a GitHub schema that is guaranteed to be up to date, so I don't need to asynchronously fetch the introspection schema. Then, I use graphql-tools' `wrapSchema` to rename the types with my prefix. `wrapSchema` internally adds a proxying resolver that calls their `delegateToSchema` function. Since we're handling all the fetching ourselves, we can overwrite that resolver with the default GraphQL resolver:
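A minimal sketch of that wrapping step (the helper name is mine; `RenameTypes` and `createProxyingResolver` are real graphql-tools options):

```ts
import {buildSchema, defaultFieldResolver, GraphQLSchema} from 'graphql'
import {wrapSchema, RenameTypes} from '@graphql-tools/wrap'
import {schema as githubSchemaDef} from '@octokit/graphql-schema'

// Build GitHub's schema locally from the package's bundled IDL
const githubSchema = buildSchema(githubSchemaDef.idl)

// Hypothetical helper: prefix every GitHub type name and skip the
// delegateToSchema machinery, since we do our own fetching
const wrapGitHubSchema = (schema: GraphQLSchema, prefix: string) =>
  wrapSchema({
    schema,
    transforms: [new RenameTypes((name) => `${prefix}${name}`)],
    createProxyingResolver: () => defaultFieldResolver
  })
```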
Finally, we use graphql-tools to merge our wrapped schema into our parent schema. That gives us an object extension that looks like this:
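The resulting extension looks roughly like this (an SDL sketch; the type names assume the `_xGitHub` prefix from above):

```graphql
extend type User {
  githubApi: _xGitHubApi
}

type _xGitHubApi {
  errors: [_xGitHubError!]
  query: _xGitHubQuery
  mutation: _xGitHubMutation
}
```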
Note that `errors` is its own field, even though it will be populated by the response of `query` or `mutation`. By design, GraphQL makes it seemingly impossible for 1 field to populate the response of another. To get around this, the `errors` field returns a promise & exposes the `resolve` callback to the other operations by mutating the `source`. You can call it hacky, but I think it's pretty clean.
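In code, the trick looks something like this (a minimal sketch; the real library also deals with resolver ordering, mutations, and timeouts):

```ts
import type {ExecutionResult} from 'graphql'

// Hypothetical stand-in for the batched fetch to GitHub
declare const fetchGitHub: (source: any, context: unknown) => Promise<ExecutionResult>

// The errors field returns a pending promise and stashes its resolve
// callback on the source; whichever operation actually talks to GitHub
// fulfills it once the response arrives.
const apiResolvers = {
  errors: (source: any) =>
    new Promise((resolve) => {
      source.resolveErrors = resolve
    }),
  query: async (source: any, _args: unknown, context: unknown) => {
    const {data, errors} = await fetchGitHub(source, context)
    source.resolveErrors?.(errors ?? null)
    return data
  }
}
```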
Batching the GraphQL Fragments
Before the days of Urql, Apollo, and Relay Modern, I wrote a bad GraphQL client cache called Cashay. While the project didn’t go anywhere, it taught me a bunch about the GraphQL AST. For example, traversing an AST is painful, but GraphQL has a node visitor function built-in!
Let’s suppose our query has a bunch of variables, but only some of them need to be sent to GitHub. How do we figure out which variables to prune? It’s as simple as:
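For example, here is a sketch of variable pruning with graphql-js's built-in `visit` (my illustration, not the library's exact code):

```ts
import {visit, DocumentNode} from 'graphql'

const pruneUnusedVariables = (doc: DocumentNode): DocumentNode => {
  // Pass 1: record every variable that is actually used
  const used = new Set<string>()
  visit(doc, {
    VariableDefinition: () => false, // a definition is not a usage; skip its subtree
    Variable(node) {
      used.add(node.name.value)
    }
  })
  // Pass 2: returning null from a visitor deletes the node
  return visit(doc, {
    VariableDefinition: (node) =>
      used.has(node.variable.name.value) ? undefined : null
  })
}
```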
This same pattern can be repeated for fragments, variable definitions (the chunk that looks like `$foo: String!, $bar: ID`), and even the `__typename` fields that we prefixed. Once each fragment is refactored into a standalone query, it is time to batch as many together as possible and merge them.
To accomplish the batching, we use a dataloader with the caching functionality turned off. Why no cache? Each request will have a query that is a little different than the others. For example, one fragment might ask for `viewer {id}` while the other asks for `viewer {id, bio}`. We want to merge those together, and if we cached based on the key, then they'd be kept separate.
That said, we want to reuse the same dataloader for the entire execution, so we keep all the dataloaders in a `WeakMap` where the key is the `context`, because a new `context` is created for each call to `GraphQL.execute`. Using a `WeakMap` prevents a memory leak: as soon as `GraphQL.execute` no longer references the `context`, the context can be garbage collected, and its entry in the `WeakMap` goes with it.
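A sketch of that bookkeeping (the names are mine; `cache: false` is a real DataLoader option):

```ts
import DataLoader from 'dataloader'
import type {DocumentNode, ExecutionResult} from 'graphql'

type GitHubRequest = {document: DocumentNode; variables: Record<string, unknown>}

// Hypothetical batch function: merges all queued requests into one
// network request, then fans the response back out per request
declare const batchAndFetch: (
  requests: readonly GitHubRequest[]
) => Promise<ExecutionResult[]>

// One loader per execution, keyed by the context object. WeakMap keys
// don't pin their entries, so the loader is collectible as soon as the
// context is.
const loaders = new WeakMap<object, DataLoader<GitHubRequest, ExecutionResult>>()

const getLoader = (context: object) => {
  let loader = loaders.get(context)
  if (!loader) {
    loader = new DataLoader(batchAndFetch, {cache: false})
    loaders.set(context, loader)
  }
  return loader
}
```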
Once a single tick has passed, dataloader calls its batch function with an array of queries & variables that we can merge together. First, we merge all the fields that have unique names. Then, if two fields share a name, we compare all their children. If the two fields are different, we alias one of them:
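Here's the idea in miniature (an illustrative query, not taken from the library's tests):

```graphql
#Before: two fragments each want a different repository
# Fragment A:  repository(owner: "acme", name: "web") { id }
# Fragment B:  repository(owner: "acme", name: "api") { id }

# After merging: same field name, different arguments, so the
# second occurrence gets an alias
query {
  repository(owner: "acme", name: "web") {
    id
  }
  repository_2: repository(owner: "acme", name: "api") {
    id
  }
}
```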
This strategy allows us to batch an endless number of fragments together into one network request. We just have to keep a list of the aliases we added so the final response looks just like the `#Before` request.
Handling the GraphQL Endpoint Response
When GitHub responds, it might not be a GraphQL object. GitHub could be down, the gateway could take too long to respond, or, if the auth token was invalid, it might just send `{message}`. To handle those cases, the executor wraps the fetch with a timeout, & if the response doesn't look like a `GraphQLExecutionResult`, it will coerce it into one.
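A sketch of that defensive wrapper (the endpoint URL is GitHub's real one; everything else, including the 10-second timeout, is illustrative):

```ts
import {GraphQLError, ExecutionResult} from 'graphql'

const fetchWithTimeout = async (
  body: string,
  accessToken: string
): Promise<ExecutionResult> => {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), 10_000)
  try {
    const res = await fetch('https://api.github.com/graphql', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json'
      },
      body,
      signal: controller.signal
    })
    const payload = await res.json()
    // Coerce anything that isn't {data, errors},
    // e.g. GitHub's {message: 'Bad credentials'}
    return 'data' in payload || 'errors' in payload
      ? payload
      : {errors: [new GraphQLError(payload.message ?? 'Unexpected response')]}
  } catch (e) {
    return {errors: [new GraphQLError(e instanceof Error ? e.message : 'Fetch failed')]}
  } finally {
    clearTimeout(timer)
  }
}
```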
Once we have something in the shape of `{data, errors}`, all that's left to do is create one response object for each fragment. That means de-aliasing the fields that we renamed, re-applying the `__typename` prefix, and filtering errors by fragment. Filtering errors is easy because most errors have a `path` that shows where the error occurred. For example, if the path is `['viewer', 'repository_2']`, then we know the error should only appear in the 2nd fragment.
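A sketch of that last filter (illustrative; the real bookkeeping maps aliases back to their source fragments):

```ts
import type {GraphQLError} from 'graphql'

// Keep an error for a given fragment if any segment of its path
// matches a field or alias that the fragment selected
const errorsForFragment = (
  errors: readonly GraphQLError[] | undefined,
  fragmentFields: Set<string> // e.g. new Set(['repository_2'])
) =>
  (errors ?? []).filter((error) =>
    (error.path ?? []).some(
      (segment) => typeof segment === 'string' && fragmentFields.has(segment)
    )
  )
```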
Conclusion
Nesting GraphQL endpoints is the next step in GraphQL’s world domination. In a future where every service uses a GraphQL schema, integrations will be a breeze to implement and require a bunch less code. Sound like fun? We’re hiring. Let’s build cool stuff together.