Pragmatic Microservice Design: keep calm, and ship code with Daniel Knight
In its purest form, the microservices architectural pattern tells us to “dream small” while designing services, which is all well and good, but how small is still useful?
Pragmatic microservice design helps draw some useful boundaries, and keeps our eyes on shipping product.
In this episode we talk to Director of Engineering Daniel Knight about breaking down business problems into known objectives in the business and technical domain, and taking the focus off trying to solve problems we don’t yet understand.
In that way, product teams can focus on the WHAT while engineering teams can focus on the HOW. It’s all about balance.
Ledge: Daniel, it’s good to have you here. Thanks for joining us today.
Daniel: Thanks, Ledge! I’m happy to be here.
Ledge: Would you mind giving a two- to three-minute intro of yourself and your work just so the audience can get to know you.
Daniel: Sure! I work for a company called “Red Ventures.” I work in the Austin office. My job is to be the director of engineering for our Austin platform team which we call the “cards platform team” because our team manages all the financial products for the cards division of Red Ventures.
I don’t know if you’re familiar with Red Ventures but we have a ton of divisions, cards being one of the more significant pieces.
So my day to day really is about mentoring and coaching developers, doing architecture reviews, doing code reviews as well as giving developers and engineers opportunities to grow and work with architecture that they might normally not work with.
As a team, we’ve decided on a few core principles, one of which is standards-compliant architecture. We can get into more details about that if you would like.
We also have adopted microservice architecture and we’ve kind of taken a pragmatic approach to that. We think we’re really on to something great there.
Ledge: Please walk us through those steps. It would really be interesting to hear about each of those areas because I know that your career has evolved from software engineer to architect to leadership so you probably have a lot of interesting opinions and experiences around each of those to get to that pragmatic approach. I think that’s a great place to go.
Daniel: Yes. If you read a lot on microservice architecture, I think your imagination can run pretty wild with it. A lot of people do what I like to call “dream small” which is dream as small as you possibly can.
I do that, too, but I also grapple with the reality that I have to deliver a product at the end of the day.
So how small can I dream?
One of the things that the team does is we really take a pragmatic approach to microservice architecture in that we try to solve the problems that we know we have. We don’t try to solve the problems we don’t know we have.
Some of those problems are that ─ and some of your listeners may disagree ─ we don’t know what scale we need. So do we out of the gate try to optimize for 20,000 TPS?
I would say that in our approach, being more pragmatic, we wouldn’t try to build for 20,000 TPS up front. What we would do is try to solve the problem and then see what scale we need.
We also approach the problem domain a little uniquely in that we really look at what we know about the problem domain before we make any decisions on the architecture. Every developer needs to understand and be able to talk and really problem-solve within the problem domain.
Then, we kind of look at responsibilities within there. What are some key functionalities inside that problem domain?
If we can identify some key functionalities and we can get creative and try to make it self-contained, if we can make it decoupled, then we’ll do that; that becomes a microservice for us.
To give an example, when we were asked to basically take a monolith application that was responsible for a pretty massive ETL project ─ ETL being extract, transform, and load ─ we really looked at it and said, “What are the responsibilities within that problem that we have to really get right?”
So we decided to take on the extraction side and make that into a microservice.
Since we have to do a lot of different interfaces with our external partners, we needed to be able to translate those files as well into a common data format. And then, we had to load them into our system.
Those seem like very natural breakpoints for our microservice architecture. Again, we’re trying to be pragmatic so we’re trying to solve the problems we know in that we know that we have to pull the data; we know we have to translate it; we know we have to store it in our database.
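As a rough illustration of those three breakpoints, and not the team's actual code, the sketch below models extract, translate, and load as separate Java interfaces with in-memory stand-ins. Every name in it (`CardRecord`, `Extractor`, `Translator`, `Loader`, the sample CSV lines) is hypothetical and chosen only for illustration; in a real deployment each stage would sit behind its own service boundary rather than in one process.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical common data format that every partner file is translated into.
class CardRecord {
    final String partner;
    final String product;
    final double apr;

    CardRecord(String partner, String product, double apr) {
        this.partner = partner;
        this.product = product;
        this.apr = apr;
    }

    @Override
    public String toString() {
        return partner + " | " + product + " | " + apr;
    }
}

// One interface per known responsibility: pull the data, translate it, store it.
interface Extractor { List<String> extract(); }
interface Translator { CardRecord translate(String rawLine); }
interface Loader { void load(List<CardRecord> records); }

// Stand-in for one partner feed; hard-coded lines replace FTP, email, or an API call.
class InMemoryCsvExtractor implements Extractor {
    @Override
    public List<String> extract() {
        return Arrays.asList("acme,Rewards Card,19.99", "acme,Travel Card,21.49");
    }
}

// Translates one raw CSV line into the common format.
class CsvTranslator implements Translator {
    @Override
    public CardRecord translate(String rawLine) {
        String[] fields = rawLine.split(",");
        return new CardRecord(fields[0].trim(), fields[1].trim(),
                Double.parseDouble(fields[2].trim()));
    }
}

// Stand-in for the database: records are kept in a list.
class InMemoryLoader implements Loader {
    final List<CardRecord> table = new ArrayList<>();

    @Override
    public void load(List<CardRecord> records) {
        table.addAll(records);
    }
}

class EtlSketch {
    // Wires the three stages together; each stage only sees the previous stage's output.
    static void run(Extractor extractor, Translator translator, Loader loader) {
        List<CardRecord> records = extractor.extract().stream()
                .map(translator::translate)
                .collect(Collectors.toList());
        loader.load(records);
    }

    public static void main(String[] args) {
        InMemoryLoader loader = new InMemoryLoader();
        run(new InMemoryCsvExtractor(), new CsvTranslator(), loader);
        loader.table.forEach(System.out::println);
    }
}
```

Because each stage depends only on an interface, a stage that later turns out to be a hot spot can be split further or moved behind a queue without touching the other two.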
Ledge: I hear from a lot of clients that when they’re working in microservices, the way to design them from a domain-driven design perspective is that you’ve got to have a lot of subject-matter expertise to break down the particular vertical subject matter: financial or health care or industrial.
I just wonder if you see that at all because you guys have it broken down in kind of a horizontal nature, which would be along your process line ─ your E, your T, and then your L.
Do your developers need to also have that hybrid of knowing about financial and consumer data or whatever the domain is and then also knowing about the discipline of the technology stuff as well?
Daniel: That’s really a good question. We have some really great product managers who own that part for us. We really rely on them to be top partners with us in that as we solve the technology problems, we rely on them to really translate those business needs to us.
Eventually, all the developers on this particular project have become domain experts. They didn’t start out that way, but we try to boil the problem down to some really concrete solutions in that we don’t expect them to be domain experts to begin with, nor do we really have the time.
Where we’re really expecting that expertise to be is with the product management.
Ledge: That’s fantastic! So you have a trusted product function that can do that for you ─ that’s huge. A lot of organizations really wish they had that. So you have a tight relationship there between the functions. That’s another topic that’s come up a lot as I talk to engineering leaders.
What’s that blurry line now between product and engineering and how those things are kind of converging and yet they are different disciplines and you have different ways to think about organizing that organization, too?
Daniel: Yes. The best advice I can probably give anybody is to become friends with your product managers or whatever you call them. Learn to brainstorm with them.
Now, I am very fortunate to work with some very talented and very awesome product managers and they come from an engineering background ─ at least, most of them do.
But the one thing that I have in common with all of them is that we all play our parts; we all play our roles. Their job is to ensure that we’re building the right things and that they’re going to be valuable. And in order to know whether or not something is going to deliver value, we have to trust them. We have to trust that they’ve done the research, they’ve talked to customers, and they’ve talked to stakeholders.
That really does free up engineering to really focus on the technical excellence of what we’re delivering. And as I’ve said, you cannot be successful as an engineer if you don’t understand the problem domain or what it is that we’re doing. But that will come in time if you have a solid product organization.
Ledge: Yes, absolutely! So you have a strong delineation between the “what” and the “how”; and they’re going to trust you to say, “We’re not going to dictate to you what the technical specifications are. We’re just going to tell you exactly what we want from a business and objective results perspective.”
I love that.
Daniel: And it’s a conversation. I expect to hear them having arguments. I expect to hear them talking over each other because, in the end, it shows that they’re both passionate and they’re both doing their jobs. And, as I’ve said, the product org doesn’t come to us.
I really like how you put that. They’re going to come to us with the “what” and we’re going to come up with the “how.” As I’ve said, eventually, the developers will be domain experts but I don’t expect that off the bat.
Ledge: I took you off on the product path there. I’m curious where you’re going on the engineering path.
Daniel: What I was talking about is ─ as we were breaking down that problem ─ it was a big problem and I could go into a little more specifics about all the nuances of it.
Just from a high-level architectural perspective, we decided that we were going to go ahead and start with those three microservices: one microservice to fetch data; another to translate the data; another to load the data into our database.
And then, as we got in there, we realized that we had hot spots in our architecture. We had things that we were doing a lot and we had things that we were doing not as much as we thought we would. So we decided that it would be good to break those things out further.
At that point, we also saw a need for messaging. We like messaging a lot because it lets us do different work and it lets us kind of get some durability within our architecture.
So, at some point, you’re going to decide. Once you solve the problems you know, you have new problems that are going to come up; and this is where pragmatic microservice architecture really comes from. You’re going to solve some problems and new problems are going to come up. You’re going to have hot spots in your architecture. You’re going to be doing some things more than you thought you were going to be doing them or you’re going to be doing things less than you expected.
So it’s really about how you can then break the microservices down even smaller and get better scale that way.
Ledge: Yes, I get it. You’re not trying to predict the future of what the requirements are going to be. You don’t know so you’re trying to create a reasonably well-delineated and abstracted version of the future and then see what actually happens based on that. So it’s much more experimental and lean in nature.
Daniel: I want to go back to something I said very early on which was that, at the end of the day, we have to ship product. And so, by doing this approach, we get something good. It may not be perfect yet but we get something good. We get it out there.
We’re now in a position to learn, like what you were talking about. And some of those learnings are what I referenced: You’re going to have areas where you could break out functionality.
Let’s take the ETL example again and look at the extract method. Currently, we extract data from five or six different types of sources, whether that be email, FTP, an API call and so on. We have inbound and we have to go outbound. We found that there were some of those protocols that we were doing more often. So it made sense to break them out into their own services.
And with messaging tying everything together, it gave us the ability to really scale that because we were able to subscribe and use different AMQP exchange types to really accomplish that.
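To make the idea of exchange types concrete, here is a minimal in-memory sketch in Java of two AMQP 0-9-1 exchange types: a direct exchange, which delivers only to queues bound with a key equal to the routing key, and a fanout exchange, which delivers to every bound queue regardless of key. It is a toy model and not a real broker client; the `Exchange` class, the queue names, and the routing keys such as `extract.ftp` are all assumptions made for illustration, and the interview does not say which exchange types the team actually uses.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// In-memory stand-in for a broker exchange (hypothetical; not an AMQP client library).
class Exchange {
    enum Type { DIRECT, FANOUT }

    private final Type type;
    // binding key -> queues bound with that key
    private final Map<String, List<Queue<String>>> bindings = new LinkedHashMap<>();

    Exchange(Type type) {
        this.type = type;
    }

    void bind(Queue<String> queue, String bindingKey) {
        bindings.computeIfAbsent(bindingKey, k -> new ArrayList<>()).add(queue);
    }

    void publish(String routingKey, String message) {
        // Identity-based set so a queue receives the message once even with several bindings.
        Set<Queue<String>> targets = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Map.Entry<String, List<Queue<String>>> binding : bindings.entrySet()) {
            // DIRECT requires an exact key match; FANOUT ignores the routing key.
            if (type == Type.FANOUT || binding.getKey().equals(routingKey)) {
                targets.addAll(binding.getValue());
            }
        }
        for (Queue<String> queue : targets) {
            queue.add(message);
        }
    }
}

class MessagingSketch {
    public static void main(String[] args) {
        Queue<String> ftpExtractor = new ArrayDeque<>();
        Queue<String> emailExtractor = new ArrayDeque<>();

        // Direct exchange: each protocol-specific extractor gets only its own jobs.
        Exchange jobs = new Exchange(Exchange.Type.DIRECT);
        jobs.bind(ftpExtractor, "extract.ftp");
        jobs.bind(emailExtractor, "extract.email");
        jobs.publish("extract.ftp", "fetch partner-a feed");

        // Fanout exchange: every subscriber sees every event.
        Exchange events = new Exchange(Exchange.Type.FANOUT);
        events.bind(ftpExtractor, "");
        events.bind(emailExtractor, "");
        events.publish("", "config reloaded");

        System.out.println("ftp queue:   " + ftpExtractor);
        System.out.println("email queue: " + emailExtractor);
    }
}
```

Routing on a per-protocol key is one way a frequently used protocol can get its own consumer, and be scaled on its own, without the publisher knowing who is listening.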
We’re pretty stable now and we’re pretty happy with it. And I will say that if somebody is coming in and they’re microservice experts, they’re probably going to look at one of our services and go, “This is too big.”
And I understand that. But from our point of view, it’s as big as it needs to be in order for us to have shipped the value; and, at the end of the day, that was what was important to us.
But we did it with excellence as well. Our code is maintainable. With messaging, we get durability between the services that we normally wouldn’t have if we were just going directly between the services.
Ledge: I’m curious. Talk to me about the specifics, if you can, of the stack. How are you utilizing the cloud or each of the stack components? Just for the listeners to know, any different tool sets and messaging? What’s the best practice? Solving a big ETL problem is not an uncommon situation right now for tons of different industries. So what do the tech stacks look like?
Daniel: First, I don’t know if I’m the best person to tell anybody what the best practices are. I will say that my stuff works but we’ll leave it to history to determine whether or not it’s the best practice.
We primarily run a Java shop and it’s not that we’re all Java developers; it’s just that, for us, that was the language that got us the most value very early on and it was easy to find developers and it was very easy to find talented developers.
But, primarily, we run Java Spring Boot for all of our APIs. We run all of those in Docker containers on an ECS cluster in AWS.
One of the reasons why we love Docker so much is that it shortens our feedback loop, which is very important to us. We can iterate much faster using Docker because of the way that we’re able to package that stuff up and get it deployed across ECS very quickly.
We use New Relic and Datadog for different reasons, but New Relic gets us tons of telemetry data. It tells us how our applications are performing. We have New Relic deployed across all our applications and across all of our infrastructure in AWS.
We get tons of data and it’s there for the taking. We can see where we’ve got hot spots and where we probably could afford to go and take a second look.
Ledge: How do you handle CI/CD?
Daniel: This is one of those areas where we didn’t want to get wrapped up in the tool. We are huge fans of Travis CI because of how easy it is to use. I know some people out there are Jenkins for life. I understand that. But, at the end of the day from a leadership perspective, I want something someone can plug into and be successful very quickly. In my experience, that wasn’t Jenkins.
So Travis CI fits the bill. It also allows us to deploy directly to our Artifactory instance, which is our way of making our Docker images and our applications portable and reusable across all different types of environments, whether they be local environments, staging environments, or production environments.
Ledge: Here’s a final question I ask all the technology leaders that we talk to. We’re in the business of finding and vetting the very most senior freelance engineers, the ones who are just excellent: senior, A+, unicorn ─ use your favorite metaphor. We have a pretty strong heuristic for doing that and kind of a system that we’ve developed over the years, but we feel that we can always learn more.
So my question to the guest on every episode is “What are your heuristics to know and hire the A+ players from the market when you’re assessing them to add to your team?”
Daniel: I try to look beyond their qualifications. I’m really looking, one, for culture. That’s the number one thing that I as well as most of the people at my company look for. We put a lot of focus on that.
A long time ago, I learned ─ and I’m sure a lot of people have heard this and I don’t know how many people this would be new to ─ that you can teach skills but you can’t teach culture. And so, we try to look for culture.
What type of things are we looking for in culture?
I think that might be your next question. We’re looking for people who are passionate or who are willing to learn and who are willing to learn from anybody.
I learn every day from junior engineers. I learn every day from the CTO of my company.
Just because you’re at a certain place in your career doesn’t mean that you’re immune from learning. Learning is very important. In fact, at Red Ventures, we have an entire department that’s run extremely professionally and extremely well that is all about running seminars and classes and getting external speakers, and it’s called the “Learning and Development Department.”
If you’re a Red Ventures employee, you have a world-class educational system in the company available to you which, I think, is fantastic.
So learning is a huge part of what we look for ─ people willing to learn, people willing to teach and make the people around them better.
And then, we’re just looking for creativity and “Can they think outside the box when it comes to a solution?”
Going back to my example, I know that doing pragmatic architecture and doing pragmatic microservice architecture, sometimes, can make people feel like they’re doing it wrong and I understand that, too.
But we really want people to get to a point where they can think creatively in that way and say, “You know what, at the end of the day, it’s all about delivering value. How can we create it? How can we get to learn and how can we get to production faster, even if it means that we have to do things creatively, and do so without sacrificing technical excellence and without sacrificing the future of our applications?” It’s very easy to just push a prototype into production, and people might think that’s value.
I would say that that’s not what I’m talking about. I’m talking more about working fast and working pragmatically but also doing so with a sense of excellence and a sense of professionalism.
In my mind, there is no excuse to ship broken code. I don’t know if I should have said that last part, though.
Ledge: Daniel, thanks for the insights. I appreciate the opportunity to have you on and good luck with the continued pragmatism.
Daniel: Thank you.