Nesting GitHub’s API in your GraphQL Schema
GraphQL is great. Every GraphQL endpoint agrees to speak the same language of `{data, errors}`, and that makes communication between servers easy.
Now suppose two public APIs both speak GraphQL; what advantages can we leverage? Packages like graphql-tools make it easy to merge and stitch schemas together, which lets separate teams build out their own subschemas and combine them via join and union functionality.
But what about 3rd party schemas that have their own authentication, rate limiting, and errors? For example, take a look at GitHub’s GraphQL API. Wouldn’t it be great if you could nest GitHub’s endpoint in your own schema?
In a single GraphQL query, you could get the user’s name from your application as well as their bio from GitHub:
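Something like this (a sketch; the `githubApi` field name and the nesting shape are illustrative of how the library exposes the endpoint):

```graphql
query DashboardUser {
  viewer {
    name          # resolved by your own schema
    githubApi {
      query {
        viewer {
          bio     # resolved by GitHub's GraphQL API
        }
      }
    }
  }
}
```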
Of course, this is only the beginning. A client could update your app and GitHub without any extra backend logic. Just good ol' GraphQL:
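For example, a single operation might look like this (again a sketch; `updateTeamName` is a hypothetical mutation in your own app, and the nesting shape is illustrative):

```graphql
mutation UpdateEverywhere($teamId: ID!, $name: String!, $status: String!) {
  updateTeamName(teamId: $teamId, name: $name) {    # your app
    team { id name }
  }
  githubApi {
    mutation {
      changeUserStatus(input: {message: $status}) { # GitHub
        status { message }
      }
    }
  }
}
```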
With nest-graphql-endpoint, this is finally possible.
The Business Case for Nested Endpoints
Building a successful SaaS today means meeting customers where they are — integrating against the tools they already use. I’ve built integrations for Slack and Atlassian, but when it came to building a GitHub integration, I noticed I was re-creating a lot of the logic that GitHub’s API already had built-in.
For example, our planning poker meeting fetches all the team's stories from GitHub, provides a fun, immersive way to score each story, and exports the scores back out to GitHub. Without nesting GitHub's schema, I made my own `GitHubIntegration` object that had a `repos` field. That field had a custom `resolve` function that fetched the repos from GitHub using a handwritten GraphQL string. This wasn't great for a few reasons:
- It’s extra code that I have to maintain
- The query doesn't expressively show front end devs which parts come from GitHub
- Without a dataloader, multiple queries can cause multiple fetches to GitHub
- Even with a dataloader, multiple fetches to GitHub are unavoidable unless every query is identical (thus overfetching)
What I needed was a way to batch all the fragments going to GitHub, merge them into a single network request, and then parse the response into their corresponding fragments again. Why batching? Because in the real world, queries can get pretty large & often repeat themselves:
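As a sketch of the problem, imagine two components that each bring their own fragment of GitHub data:

```graphql
# Fragment from component A
query { githubApi { query { viewer { id login } } } }

# Fragment from component B
query { githubApi { query { viewer { id bio } } } }

# What should actually go over the wire to GitHub: one merged query
query { viewer { id login bio } }
```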
How it Works: nest-graphql-endpoint
For all of this complexity, I wanted a way to nest endpoints in my schema with just a single line of code:
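Roughly like this (a sketch of the library's usage; treat the import path and the exact option names as assumptions based on its README):

```ts
import nestGitHubEndpoint from 'nest-graphql-endpoint/lib/nestGitHubEndpoint'

const {schema} = nestGitHubEndpoint({
  parentSchema,               // your existing GraphQLSchema
  parentType: 'User',         // the type to extend
  fieldName: 'githubApi',     // the nested endpoint field
  // Fetch whatever the endpoint needs, e.g. the user's access token
  resolveEndpointContext: async (source: any) => ({
    accessToken: source.githubAccessToken // hypothetical field on your User
  }),
  prefix: '_xGitHub'          // namespace for GitHub's type names
})
```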
That's it! Your `User` object now has a `githubApi` field that includes queries, mutations, and errors. `resolveEndpointContext` lets you fetch and provide the keys necessary to access the endpoint.
Behind the scenes, here’s how it works:
- It fetches the GitHub schema & prefixes all the `__typename` fields so you can write your query without worrying about naming conflicts (see the sketch after this list)
- It collects all the fragments inside the `githubApi` objects & merges them into a single query
- It prunes unused variables, variable definitions, & fragments
- It un-prefixes the `__typename` fields so GitHub understands the query
- In the event of a name conflict, it aliases fields before the request is fetched
- It de-aliases the response, re-applies the `__typename` prefix, and filters the errors by path
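For instance (assuming a prefix of `_xGitHub`), GitHub's `Repository` type shows up in your schema under a prefixed name, so it can't collide with a `Repository` type of your own:

```graphql
# In your merged schema
type _xGitHubRepository {
  id: ID!
  nameWithOwner: String!
}

# The prefix is stripped again before the query is sent to GitHub.
```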
Let’s see how it’s built.
Building the GitHub Schema
GitHub offers a great package called @octokit/graphql-schema. It provides a GitHub schema that is guaranteed to be up to date, so I don't need to asynchronously fetch the introspection schema. Then, I use graphql-tools' `wrapSchema` to rename the types with my prefix. `wrapSchema` internally adds a proxying resolver that calls their `delegateToSchema` function. Since we're handling all the fetching ourselves, we can overwrite that resolver with the default GraphQL resolver:
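A minimal sketch of that wrapping step (the helper name is mine; `RenameTypes` and `createProxyingResolver` are real graphql-tools options):

```ts
import {buildSchema, defaultFieldResolver, GraphQLSchema} from 'graphql'
import {wrapSchema, RenameTypes} from '@graphql-tools/wrap'
import {schema as githubSchemaDef} from '@octokit/graphql-schema'

// Build GitHub's schema locally from the package's bundled IDL
const githubSchema = buildSchema(githubSchemaDef.idl)

// Hypothetical helper: prefix every GitHub type name and skip the
// delegateToSchema machinery, since we do our own fetching
const wrapGitHubSchema = (schema: GraphQLSchema, prefix: string) =>
  wrapSchema({
    schema,
    transforms: [new RenameTypes((name) => `${prefix}${name}`)],
    createProxyingResolver: () => defaultFieldResolver
  })
```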
Finally, we use graphql-tools to merge our wrapped schema into our parent schema. That gives us an object extension that looks like this:
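The resulting extension looks roughly like this (an SDL sketch; the type names assume the `_xGitHub` prefix from above):

```graphql
extend type User {
  githubApi: _xGitHubApi
}

type _xGitHubApi {
  errors: [_xGitHubError!]
  query: _xGitHubQuery
  mutation: _xGitHubMutation
}
```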
Note that `errors` is its own field, even though it will be populated by the response of `query` or `mutation`. By design, GraphQL makes it seemingly impossible for 1 field to populate the response of another. To get around this, the `errors` field returns a promise & exposes the `resolve` callback to the other operations by mutating the `source`. You can call it hacky, but I think it's pretty clean.
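In code, the trick looks something like this (a minimal sketch; the real library also deals with resolver ordering, mutations, and timeouts):

```ts
import type {ExecutionResult} from 'graphql'

// Hypothetical stand-in for the batched fetch to GitHub
declare const fetchGitHub: (source: any, context: unknown) => Promise<ExecutionResult>

// The errors field returns a pending promise and stashes its resolve
// callback on the source; whichever operation actually talks to GitHub
// fulfills it once the response arrives.
const apiResolvers = {
  errors: (source: any) =>
    new Promise((resolve) => {
      source.resolveErrors = resolve
    }),
  query: async (source: any, _args: unknown, context: unknown) => {
    const {data, errors} = await fetchGitHub(source, context)
    source.resolveErrors?.(errors ?? null)
    return data
  }
}
```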
Batching the GraphQL Fragments
Before the days of Urql, Apollo, and Relay Modern, I wrote a bad GraphQL client cache called Cashay. While the project didn’t go anywhere, it taught me a bunch about the GraphQL AST. For example, traversing an AST is painful, but GraphQL has a node visitor function built-in!
Let’s suppose our query has a bunch of variables, but only some of them need to be sent to GitHub. How do we figure out which variables to prune? It’s as simple as:
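For example, here is a sketch of variable pruning with graphql-js's built-in `visit` (my illustration, not the library's exact code):

```ts
import {visit, DocumentNode} from 'graphql'

const pruneUnusedVariables = (doc: DocumentNode): DocumentNode => {
  // Pass 1: record every variable that is actually used
  const used = new Set<string>()
  visit(doc, {
    VariableDefinition: () => false, // a definition is not a usage; skip its subtree
    Variable(node) {
      used.add(node.name.value)
    }
  })
  // Pass 2: returning null from a visitor deletes the node
  return visit(doc, {
    VariableDefinition: (node) =>
      used.has(node.variable.name.value) ? undefined : null
  })
}
```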
This same pattern can be repeated for fragments, variable definitions (the chunk that looks like `$foo: String!, $bar: ID`), and even the `__typename` fields that we prefixed. Once each fragment is refactored into a standalone query, it is time to batch as many together as possible and merge them.
To accomplish the batching, we use a dataloader with the caching functionality turned off. Why no cache? Each request will have a query that is a little different than the others. For example, one fragment might ask for `viewer {id}` while the other asks for `viewer {id, bio}`. We want to merge those together, and if we cached based on the key, then they'd be kept separate.
That said, we want to reuse the same dataloader for the entire execution, so we keep all the dataloaders in a `WeakMap` where the key is the `context`, because a new `context` is created for each call to `GraphQL.execute`. Using a `WeakMap` prevents a memory leak: as soon as `GraphQL.execute` no longer references the `context`, the context can be garbage collected, and its entry in the `WeakMap` goes with it.
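A sketch of that bookkeeping (the names are mine; `cache: false` is a real DataLoader option):

```ts
import DataLoader from 'dataloader'
import type {DocumentNode, ExecutionResult} from 'graphql'

type GitHubRequest = {document: DocumentNode; variables: Record<string, unknown>}

// Hypothetical batch function: merges all queued requests into one
// network request, then fans the response back out per request
declare const batchAndFetch: (
  requests: readonly GitHubRequest[]
) => Promise<ExecutionResult[]>

// One loader per execution, keyed by the context object. WeakMap keys
// don't pin their entries, so the loader is collectible as soon as the
// context is.
const loaders = new WeakMap<object, DataLoader<GitHubRequest, ExecutionResult>>()

const getLoader = (context: object) => {
  let loader = loaders.get(context)
  if (!loader) {
    loader = new DataLoader(batchAndFetch, {cache: false})
    loaders.set(context, loader)
  }
  return loader
}
```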
Once a single tick has passed, dataloader calls its batch function with an array of queries & variables that we can merge together. First, we merge all the fields that have unique names. Then, if two fields share a name, we compare all their children. If the two fields are different, we alias one of them:
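Here's the idea in miniature (an illustrative query, not taken from the library's tests):

```graphql
#Before: two fragments each want a different repository
# Fragment A:  repository(owner: "acme", name: "web") { id }
# Fragment B:  repository(owner: "acme", name: "api") { id }

# After merging: same field name, different arguments, so the
# second occurrence gets an alias
query {
  repository(owner: "acme", name: "web") {
    id
  }
  repository_2: repository(owner: "acme", name: "api") {
    id
  }
}
```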
This strategy allows us to batch an endless number of fragments together into one network request. We just have to keep a list of the aliases we added so the final response looks just like the `#Before` request.
Handling the GraphQL Endpoint Response
When GitHub responds, it might not be a GraphQL object. GitHub could be down, the gateway could take too long to respond, or, if the auth token was invalid, it might just send `{message}`. To handle those cases, the executor wraps the fetch with a timeout, & if the response doesn't look like a `GraphQLExecutionResult`, it will coerce it into one.
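A sketch of that defensive wrapper (the endpoint URL is GitHub's real one; everything else, including the 10-second timeout, is illustrative):

```ts
import {GraphQLError, ExecutionResult} from 'graphql'

const fetchWithTimeout = async (
  body: string,
  accessToken: string
): Promise<ExecutionResult> => {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), 10_000)
  try {
    const res = await fetch('https://api.github.com/graphql', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json'
      },
      body,
      signal: controller.signal
    })
    const payload = await res.json()
    // Coerce anything that isn't {data, errors},
    // e.g. GitHub's {message: 'Bad credentials'}
    return 'data' in payload || 'errors' in payload
      ? payload
      : {errors: [new GraphQLError(payload.message ?? 'Unexpected response')]}
  } catch (e) {
    return {errors: [new GraphQLError(e instanceof Error ? e.message : 'Fetch failed')]}
  } finally {
    clearTimeout(timer)
  }
}
```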
Once we have something in the shape of `{data, errors}`, all that's left to do is create one response object for each fragment. That means de-aliasing the fields that we renamed, re-applying the `__typename` prefix, and filtering errors by fragment. Filtering errors is easy because most errors have a `path` that shows where the error occurred. For example, if the path is `['viewer', 'repository_2']`, then we know the error should only appear in the 2nd fragment.
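A sketch of that last filter (illustrative; the real bookkeeping maps aliases back to their source fragments):

```ts
import type {GraphQLError} from 'graphql'

// Keep an error for a given fragment if any segment of its path
// matches a field or alias that the fragment selected
const errorsForFragment = (
  errors: readonly GraphQLError[] | undefined,
  fragmentFields: Set<string> // e.g. new Set(['repository_2'])
) =>
  (errors ?? []).filter((error) =>
    (error.path ?? []).some(
      (segment) => typeof segment === 'string' && fragmentFields.has(segment)
    )
  )
```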
Conclusion
Nesting GraphQL endpoints is the next step in GraphQL’s world domination. In a future where every service uses a GraphQL schema, integrations will be a breeze to implement and require a bunch less code. Sound like fun? We’re hiring. Let’s build cool stuff together.