January 8, 2019 · 12 min read

Service Discovery in a Microservices Architecture with Travis Scheponik

Travis Scheponik is a Master Software Engineer, currently with Capital One. His previous decade of experience has included work in finance, government, education, and a series of multi-industry startups. His areas of expertise include legacy system conversion, cybersecurity, and application performance, as well as recent projects in service discovery, a topic we delve into in detail in this episode.

Travis Scheponik

Master Software Engineer at Capital One

Travis Scheponik is currently employed by Capital One as a Master Software Engineer where he contributes to the authorizations platform. His background includes working for the Department of Defense, education technology startups, and finance. He is currently pursuing a Ph.D. at UMBC (University of Maryland Baltimore County) in Computer Science where his primary research focus is cybersecurity. He believes that complex fields such as cybersecurity and software engineering need to focus on “hand washing” techniques, such that the basics are universally agreed upon.


David:

Travis, welcome! Thanks for joining us. We’d love your two-minute story for the audience.

Travis:

Thanks a lot for having me today, David. I’m currently at Capital One. I’ve been there for a little bit over a year. I have extensive experience in the Department of Defense national security arena.

David:

Travis, you and I spoke off-mic about service discovery in your work at Capital One. We can use that as a starting point.

Travis:

Service discovery, very briefly, is just a way that developers and product owners, for example, can find new and exciting code implementations so they can build that new functionality.

Everybody builds out their own little microservice that does “I want to query this customer database and get some small pieces of information about the customer.”

At a certain point, you’ll have an entire collection of microservices that are all independent that way. They can scale independently and nobody worries about things crashing or anything of that nature.

Then, from the service discovery standpoint, you’d want to be able to stitch those things together in a meaningful way; and the way you do that is through service discovery techniques.

This can be as simple as when you launch your microservice, you have some header information that goes to what I’ll call the “mother ship of sorts.”

What that will do is say, “Hey, Service X, I can now capture this piece of customer data and anybody else who wants to use this piece of customer data, just go right here to our registry and, from there, you can start stitching things together.”

That’s kind of at a very high level and very broadly touching on service discovery and the whole registry concept.
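Show note: as a rough illustration of the registry idea described above, here is a minimal in-memory sketch in Python. The names `Registry`, `ServiceRecord`, `register`, and `discover`, the `customer-profile` capability, and the example URL are all hypothetical, invented for this illustration; nothing here reflects any system discussed in the episode. A production registry would be a networked service with health checks and entry expiry.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """What a service announces about itself at launch (hypothetical shape)."""
    name: str
    url: str
    provides: tuple[str, ...]  # capabilities this service offers


class Registry:
    """A toy, in-memory stand-in for the "mother ship" registry."""

    def __init__(self) -> None:
        self._by_capability: dict[str, list[ServiceRecord]] = {}

    def register(self, record: ServiceRecord) -> None:
        # Index the service under every capability it says it provides.
        for capability in record.provides:
            self._by_capability.setdefault(capability, []).append(record)

    def discover(self, capability: str) -> list[ServiceRecord]:
        # Return a copy so callers cannot mutate the registry's own list.
        return list(self._by_capability.get(capability, []))


registry = Registry()

# On launch, "Service X" announces the piece of customer data it can serve.
registry.register(
    ServiceRecord(
        name="service-x",
        url="http://service-x.internal.example",  # assumed address
        provides=("customer-profile",),
    )
)

# Another team looks up who can provide that data before building its own.
for record in registry.discover("customer-profile"):
    print(record.name, record.url)
```

The lookup is keyed by capability rather than by service name, which matches the "anybody who wants this piece of customer data, go right here" framing; keying by name would be an equally reasonable choice.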

David:

This appeals to me because I come from an enterprise knowledge management background, and this is very similar to that type of idea. We know someone throughout this organization had an idea, probably developed an intellectual property, and maybe delivered a solution already that we’re now going to pay money to rediscover; or we want to resell something to somebody else; we want to solve an existing problem.

But how do we know whether or not something existed? And if something did exist already, was it the same problem? How do you search the library of all the things that you can do? And once you find it, is that actually the same thing that I want to do? Also, am I allowed to even do that or should I build my own?

It’s sort of like this internal build-or-buy kind of discussion. I imagine that there are all kinds of human and organizational problems that you need to deal with in this domain.

Travis:

You’re absolutely correct, David, about the whole tension around service discovery. I’ve worked with many engineers over my time, as part of teams and as a consultant to other teams. I find that we all believe that we can do slightly better or, in some cases, much better than our predecessors.

So, a lot of times, people want to go and boil the ocean and rebuild everything from the ground up because the microservice that’s available doesn’t solve this very niche problem that can’t be generalized. And that seems to be the line of thinking that happens a lot with these microservice architectures and registries, in general.

What I find is that most teams have to say, “1.0 provides 90% of the functionality.” Now they have to decide: either they build a Version 1.1 and become the team that takes ownership of this new and exciting version, with the entire support tail that comes with it, or they build something completely new in a bubble and say, “We’re the only team that needs to build this.”

So, now, we’ll have duplicate code, duplicate capability. And when the business comes in and changes their mind, we now have this situation where I have to patch multiple microservices.

You start getting into these walled garden-style scenarios where teams become so entrenched in saying, “Well, we’ve done it better but we don’t want to share with anybody because, then, we have to support this at this enterprise level.”

There is a way to make that less difficult for teams that believe, “This microservice doesn’t solve my problem.”

I found teams that will publish services to a registry. If they also provide the source code, whether that’s a GitHub link or Bitbucket or whatever, it will enable these other teams to say, “Hey, instead of us taking on this whole ownership thing, let’s help this team move forward and start solving this new business problem.”

So they get into the whole developer culture and engineering culture where we can say, “Here, I’ll give you a pull request. I don’t want to own this problem set. You own this problem set. I’m willing to write all the code; I just don’t want the enterprise governance that comes with running these things.”

In those situations, it may actually be a case where the way the governance is rolled out has rough edges that could be smoothed over. And if those edges were smoothed over, maybe more teams would be willing to engage in this kind of “Let’s share everything. Let’s all take on the shared experience of managing enterprise-level systems and enable our partners, whether they’re internal partners or external partners, to be successful.”

David:

We’re hooked up in the open source community and when you see the discussions around open source, often, it’s around “Wow, it’s so difficult to be an owner because everybody wants you to sort of merge their PR and everybody wants you to do all the things and handle the support and all that.”

And that’s really where the friction happens. You actually see project-owner burnout because they can’t distribute the ownership burden. It really isn’t about the code in all those cases. It’s actually about the ancillary support that is really the human problems around the code.

Travis:

Yes, I absolutely agree.

The code is so beautiful all by itself. Then, the users get a hold of it or the other teams get a hold of it. And then, it just falls apart.

And much like any open source community in some of my experience ─ for example, there was one case where we were trying to get FIPS compliance because we were on some government system. But this open source project was kind of hosted out of Europe and they were like, “We don’t care about FIPS.” And I was like, “It would be super if we could use this.”

And they were like, “Well, just use something else or just monkey patch this RubyGem.” And I was like, “Yes, but once you patch, I can’t get the next version of Ruby and I’m in this constant cycle.”

It’s all about trying to show that you’d want to be a part of the community. You’d want to try to make it as easy on the “maintainer” or the “product owner” because you are right. Eventually, those people will leave. They’ll find new projects. Nobody is going to want to pick those things up and keep running with them.

As they start talking with the previous product owner, they’ll say, “What happened here? How did we get here?”

With any project no matter how large or small, the hand-off piece can also impact the way some of these other governance-related issues creep in.

David:

Microservices are largely supposed to solve the organizational abstraction: being able to work on one piece and not have to release your entire monolith. In, let’s say, traditional C++ sort of object-oriented programming, you would have this concept of a master class that you would then be able to extend and not break the master.

Why don’t services adhere to ─ and is there any infrastructure or architecture to think about services in that way where I can rely upon a master architecture of a grand service but, then, do some extensions on my own without having this problem?

Travis:

From the early to mid-2000s, we learned a whole lot of lessons that services are great but they shouldn’t be of code when we push them out.

In my experience with microservices, what you actually find is that duplicated code is completely okay because in the master-child class relationship that you were describing, what we find is we have these libraries that get passed around. The problem, then, moves to “Now I have this master library that I have to kind of patch and then the distribution problem goes up a level.”

In the microservice way, the current trend seems to be “Oh, let’s just fix this 60-line deployment instead of trying to get this shared library.”

So I think the trend that I’ve seen is “Yes, let’s go for code duplication.” I’ve seen that libraries really come into play when you’re talking about very specific problems.

Security libraries, for example: There are very few groups that should be out there rolling their own crypto or security packages. And those are things that should be shared.

As I’ve said, now you get to the point where you avoid code duplication at the cost of flexibility and replacement of microservices because now what you can do instead of having to worry about this whole deprecation chain on a library where the master pattern gets applied across is say, “Ruby is no longer a language that we care about. Hand this off to some new team. Get some new features. Now, it’s in whatever the language of the day happens to be.”

David:

You really are still down to the classic challenge of “Do I centralize or do I decentralize? What are the tradeoffs one way or the other?”

You continue to pay that organizational bill. If I decentralize greatly, then I deal with duplication and siloing. If I centralize greatly, then I deal with sort of the challenge of the code monolith. It doesn’t ever seem to go away there. It’s just sort of a pendulum balance kind of thing based on the technology that is currently available and in which version it supports better.

Does it resonate? It sounds like that’s the kind of situation you get in.

Travis:

When I first started, everything was client server, client server forms. Then, it was “Oh, push as much to the client as you possibly can because clients are so powerful now; we could not envision a time when we’d have to do server-side computing again.”

Now, we have all of these cloud vendors out there who say, “Oh, now shove everything back to the server. We’ll do all the computing.” The cycle I’ve seen come back up now is the whole client-server model just in general.

Going back to something we were talking about previously, the server, in this case, would be that service registry and then all the clients ─ dumb clients, if you will ─ will now connect to this and say, “Oh, here, use this small aspect that.”

I try to conceptualize it almost like if you had a ping pong ball and you were moving the ping pong ball. That’s the current packet that you’re tracing as it goes between all of these different clients and, eventually, they get back into this overseer registry.

I think you’re absolutely right. The current technology says, “Server-side computing is essentially free. Storage is free. So let’s load everything on the server.”

Now, we’re coming back to the client-server model. I think we’ll see this cycle in another five years where “Oh, who would envision having a server in the cloud? Now, shove that all on this mobile phone that has a hundred gigs of RAM and the server will get to it when it gets to it.”

David:

Maybe there is this sort of grand conversion where every cell phone is really just an unlimited node on the cloud.

You’re experiencing and talking about these things from a discoverability standpoint in huge, multi-team enterprise environments. Yet these same things are going to happen to any organization, probably once you have five or six developers and maybe even break into sub-teams.

And, certainly, if you’re in a microservices architecture from the standpoint of rolling out a scalable cloud app, this stuff needs to be thought about early on to avoid becoming a total technical debt disaster that can’t scale and can’t roll out quickly. It’s not just a large-company problem.

Travis:

Ultimately, you will hit that point of either team scale or codebase scale or service scale where you need this kind of centralized location where other teammates can pull information.

But it reminds me of whenever in school I would have to write a paper. It’s like if you write the paper first and then do the outline later, of course, it’s easy to say, “Of course, this is what we needed. Look how my outline matches my end result.”

But in the startup space where I’ve had some experiences in the past, you don’t want to be overbearing and miss that key time-to-market aspect.

Sure, it’s great. We have this awesome service discoverability, this whole governance thing. We’ve made negative a hundred dollars this week. We might be in trouble. The judgment that has to be applied is that once you get out of that team mode, once you start getting to more than two teams or three teams, then it’s like, “Well, let’s start trying to solve these communication problems.”

As we know, it scales superlinearly. It’s not just one to one. I forget how that grows exactly but it’s not as clean as one would hope.

David:

It’s the communication channels measurement: multiply the number of people by N minus one, or something like that.

Travis:

Exactly!

David:

We’d have to look that up for the show notes but I believe that’s the calculation. It becomes this constellation. Basically, you don’t yell across the garage anymore, “I need that thing.” And every function of the business faces that, so it’s neat as a business thinker to see some of that technology just keep running into the same things. It’s not going to solve all your problems and we still have to act like people and collaborate.
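Show note: the count usually quoted for this is the number of distinct pairs among n people, n(n-1)/2, which grows quadratically rather than one to one. The short Python sketch below works a few team sizes through that formula; the function name `communication_channels` is our own, and the sketch assumes every person may need to talk to every other person.

```python
def communication_channels(people: int) -> int:
    """Number of distinct person-to-person channels among `people` people."""
    if people < 0:
        raise ValueError("people must be non-negative")
    # Each of the n people can pair with n - 1 others; halve it so that
    # the pair (A, B) is not counted again as (B, A).
    return people * (people - 1) // 2


for team_size in (2, 3, 6, 12):
    print(team_size, "people ->", communication_channels(team_size), "channels")
# 2 people -> 1 channels
# 3 people -> 3 channels
# 6 people -> 15 channels
# 12 people -> 66 channels
```

Doubling a team from six to twelve people more than quadruples the channels, from 15 to 66, which is the growth the conversation is pointing at.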

Travis, any finishing thoughts for our audience of professional freelancers? I know you’ve been in the game yourself.

Travis:

The one thing I can offer in the whole umbrella that we’ve been chatting under is that it’s all about speed to market. If you can be fast and stable, you’re going to have super value add. It will be rough at times but, eventually, you’ll make it through the journey.

That’s how I always perceive these kinds of difficult problems.