Avoid The Fog of Development Through Continuous Learning with Ravi Lachhman of Cisco
It only takes a few minutes of talking to Ravi Lachhman before you realize, “This dude seriously loves engineering.” That personality serves him well as a Technical Evangelist at Cisco company AppDynamics. Ravi’s perspectives on production outages, operating what you build, and “Netflixian” organizational design all come back to one root idea: keep learning, and keep growing. In this episode Ravi and Ledge talk engineering flavors, purpose-built tools, customer empathy, and much more.
Ledge: Hey, Ravi, thanks for joining us.
Ravi: Hey, Ledge, thanks for having me. I’m super excited to be talking today at the Frontier Podcast.
Ledge: Very fun to have you! Can you give a two- or three-minute intro of yourself and your work, your background for the listeners?
Ravi: Absolutely! I’m currently at a Cisco company called “AppDynamics.” I’m a technical evangelist. It’s really a sweet job. From my background looking back over the last ten years, I kind of rose through the ranks of engineering. I was a principal engineer at two startups. I really enjoyed building distributed systems.
I started out in the federal sector and kind of moved around to the commercial sector and the startup scene.
Most of my background has been in Java development. My passion is big web scale. I like making applications that impact people but I kind of stuck around the Java ecosystem.
It seemed the ecosystem and the technologies that we use change. It’s something that I’m very passionate about; actually, it’s a lot of DevOps practices and reliability practices. It was kind of shoehorned in for me to get involved in these practices.
Looking through the course of my career, I used to be confined to my laptop with one application server. Now, there’s a term that Netflix calls a “full cycle developer” ─ you operate what you build ─ which a lot of my development counterparts are going through right now. The expectations for us are increasing; the platforms are more disparate and more purpose built than ever before.
Really, that’s how I kind of got into reliability and DevOps engineering. I was forced to by employers in the past.
But a lot to learn! The big benefit of being a technologist is you’re always learning. So it’s something I’m very passionate about.
Ledge: Just having jumped into a new year and having done a bunch of episodes and lots of interviews with leaders, I’ve noticed that there is this sort of convergence of the engineering and product functions; that’s one trend that I’m seeing.
Then, if you think of this broad horizontal line that is the whole cycle of engineering ─ the move left, move right; security is moving left; QA is moving right.
Ultimately, that line ends with the customer. And then, there’s this big push for engineers to have this customer empathy and even work on the support side. I’m just looking at the convergence and compression of that product functions or customer line and the expectation that sort of our software engineers can do everything.
Does that resonate with what’s happening in the field from your perspective?
Ravi: I saw that happening even when I was in university. I was a co-op at a large software firm. And for a customer to get a hold of an engineering team was like a month-long process. I would not know who bought our software; it was quite expensive.
And there were three levels of support to get to the engineering team. Level Four, you’re at the engineering level at this point. But it was a three- or four-week roadmap dealing with support and maintenance before you get to the engineering team.
Now, in the previous firm I worked at, our customers could Slack me directly. They know my name.
And so, it’s put a more human face on the software but it also makes it difficult to correlate the data.
Say I’m a single contributor or I’m on a team of people, it’s just really correlating what’s going on.
Sometimes, if we’re running an Agile or Scrum Team, we do plan for bugs. So we do X amount of story points. Or they might not be story points but we have X amount of time per sprint for bugs if something is an application.
But as a pure software engineer or someone who has been focused on development, I don’t like bugs, especially if I didn’t write them. It’s always like a dumpster fire. We have to look at multiple systems. We might have a monitoring solution or a logging solution or a different solution.
What’s actually going on?
It shifts focus away from the features that were being built.
But going back to the harder question, yes, there’s a lot more expectation for engineers but especially developers. It’s a very Netflixian model. You operate what you build.
And so, who better to have knowledge about the service than the folks who write it? And then, also because of a lot of DevOps practices, the team lines are getting really blurred. It’s all hands on deck, sometimes.
Ledge: Off-mike, we were talking about “Maybe this isn’t the right thing always.” I think everybody wants to be ─ Netflix is thrown around a lot or the way Facebook runs engineering, we’ve got to do that or whatever it is.
But I believe there are different flavors that work in different types of situations. I wonder if there is some demarcation that you think about in the technical evangelist seat.
What’s the right way to do it? It probably depends, right?
Ravi: It’s definitely to diagnose the patient. There’s not always one right way to do something.
Going back to the Netflix, we want to be Facebook; we want to be Netflix.
They are excellent bars of engineering but the issue is if you compare a typical enterprise versus Netflix, Netflix has maybe a handful of concerns at massive scale.
Let’s take a bank or an insurance company. They have thousands of concerns at varying scales. And so, you’re already spread a lot differently than having very straightforward concerns.
Having an operate-what-you-build type of model, it might not work for everybody just because ─ going back to a survey and something that resonates really well for the Gun.io listeners, take Stack Overflow, one of my favorite services, and their annual insights.
They just recently published their 2018 annual insights. Comparing ’16, ’17, and ’18, the numbers are actually increasing. It’s kind of counterintuitive because everyone, the community at large, is getting better.
The average time for an engineer to become useful is pushing three months.
For me to produce, let’s say, viable code that goes to production or produce viable features ─ or you work on a team. I get parachuted in to a new customer. I need to learn the team dynamics. I need to learn the environment. I need to learn the build process. And that’s about three months; and that number has been shifting from one to three to three plus if you look at the survey year to year.
And there’s definitely a reason for that. It’s the platforms that you have to interact with. They’re all becoming more specialized just because of the scale we deal with.
If you look at the bloom of the Cloud Native Computing Foundation, just a host of projects, the projects are becoming more granular in their features and functionality because of the scale that we’re dealing with ─ and so, more purpose-built technology; the platforms are becoming more disparate; the number of platforms and tools you have to deal with is increasing, not decreasing.
And that amount of time to learn multiple tools takes a long time. Definitely, there are pros and cons of doing them both. For other institutions that are not Netflix, it’s extremely challenging to do that.
Ledge: Yes. We have clients come in all the time and they’ve got this list of fifteen different technologies that are all over the cycle ─ frontend, backend, DevOps, CI/CD, QA, you’ve got to write their test in this package; you’ve got to do this and that and then demanding a hundred percent match for the engineer who is also full-time available who has exactly that combination of things in a growing pool of tools. It becomes more and more difficult and it takes you away, I think, from some of the idea of ─
Every engineer will probably tell you, “Hey, I’m smart. I can pick things up fast.” Clients typically don’t want to hear that but, at some point, that’s going to crash and burn because you’re going to have to learn on the dollar because you’re not going to get fifteen perfect things matching.
So we kind of have to ask people, “Hey, which things really matter the most and where are you willing to have some flex on that rack because, otherwise, you’re never going to find that unicorn rock star Ninja, whatever you’re looking for.”
Ravi: Absolutely! Let’s take a step further. When I would hire people, most of the time, I would hire people for a full-time role. I call it the “70/30 percent rule.” Right. on it depending on the person.
It’s all about interest. No one is going to be perfect. Everybody’s skills are different. We are in a little more beneficial age because of open source. If you take it ten or fifteen years ago, for me to get skills in, let’s say, IBM WebSphere, I’d have to be working for IBM or a large company that bought those licenses.
I couldn’t pull them by myself.
With the bloom of open source, I can pick up a lot of technologies free from the community. I’ve learned them myself.
But going back to the point that there are so many moving parts out there, it’s a two-part thing. Can we teach somebody something? Is it going to be interesting for the engineer?
There are a lot of smart people. Most people are pretty good.
Do they know enough not to get stressed out? The main thing is, is the job fulfilling?
Without learning, they’re not being fulfilled. If I did the same thing every day, I would be very unfulfilled. I’d get another job.
Anything above 40% that they don’t know, they might struggle with the job a little bit but if they’re getting skills, if they’re bringing a creative approach, especially folks who are doing something for the first time, their approach might be more creative. I’ve done it hundreds of times.
I’m pretty rigid. I’m not going to change like this is how I always did it.
I used to make fun of people who do that but after you’ve done it so many times, you turn to that person. And so, they have a fresh pair of eyes and that’s why it’s always good to get fresh eyes.
Ledge: Our listeners always tell me that we love, if not dumpster fire, sort of big audacious failures that led to learnings. I wonder if from your vast résumé there you can think of any really excellent crash and burn, rise from the ashes learning experience that you’d like to share.
Ravi: I had a lot of production outages. Actually, I had a lot, to be frank. That’s why I’m not on production anymore ─ the level of stress is not there.
We’re people. When humans interact with a system, we only know ─ it’s a term I like to call “foggy development.” It comes from the Fog of War. It’s a military term. It has to do with situational awareness.
Basically, how it translates to technology is like the change that you make, you probably don’t have situational awareness along the lines to say, “Oh, what is this change going to impact maybe one iteration or two iterations down?”
In our technology world, it’s going back to all the platforms. As a product owner or a previous product owner, I might own less than ten endpoints for a platform, like shopping carts ─ those are the big endpoints but I own less than ten. And the application of the platform itself for us to buy something or integrate something has hundreds.
And so, really knowing where you fit into the picture, you might not really know that. My greatest outage: I actually corrupted a database.
The thing is, they pretty much lost a day of work because we were rolling out the next day. It was a big release. We retooled the platform. We changed the database provider. We changed app server. It was a big bang for the buck. We were going for gold.
And so, the one thing I learned is that if you can programmatically corrupt something, you can programmatically uncorrupt something.
I had a more senior person tell me that.
“Hey, man, you know what, we had a lot of people fixing our database. There are ways for us to undo it but if you inserted it like this, you can definitely delete it like this.”
That was pretty interesting. You learn a lot.
Ledge: Business users love it when hundreds of people are sitting idle at their desks waiting for you to uncorrupt their database.
Ravi: No, they do not.
Ledge: They’re watching the dollars tick up like the national debt clock. I always wonder what it’s like. Now, we’re also ubiquitously dependent on these upstream and downstream types of apps. You run your business on other people’s stuff who are running their business on other people’s stuff who are running their business on other people’s stuff.
And then, every once in a while, an entire region of AWS goes down and we all go, “Well, we can’t do anything. Let’s actually take a breath now.”
I think of it as a microservice architecture. You know, we’re all really building on somebody else’s stuff. In the same way that that’s enabled startups and rapid-growth environments, it’s also made us dependent upon the fates and acts of God out there in the universe, like if there’s an earthquake or whatever.
There’s no actual complete reliability scenario and I’m sure enterprise clients run into stuff like that all the time.
Ravi: Absolutely! The big outage years ago was S3. AWS had an outage and pretty much the Internet stopped working just because folks are so reliant on that storage for their sites or for digital properties or omnichannel experiences.
Even Amazon themselves have written several services to be cyclically dependent on S3. Even their dashboard wasn’t available because it was dependent on S3.
It was painful but you’re right. Going back to another practice, it’s called “SRE” ─ Site Reliability Engineering. The adage is true. Don’t put all your eggs in one basket.
But that’s expensive. Having a hybrid cloud strategy and actually pulling it off is quite expensive. Having the ability to move workloads around, having workloads that are not tightly coupled to a particular vendor infrastructure is extremely difficult.
But a lot of what’s going on right now, like containerization and other kinds of orchestration platforms, is really making that more ubiquitous.
But taking that back to our original conversation, there’s more tooling you have to learn. From my humble beginnings as a Java developer, now I have to learn Docker or I have to learn Mesos or I have to learn an orchestrator. I have to learn how to set up our hybrid cloud so my workload can be transferred from Point A to Point B without losing any data, and that’s difficult. That’s really pushing what a Java developer would be able to do.
That’s why the foggy development comes back.
Does anyone have a good idea what the entire system looks like?
Ledge: No doubt. The orders of magnitude more abstraction. It’s not that long ago that those of us with some age in the industry could remember writing on bare metal. I mean, now, you’re sort of abstracting out to ─ first, it was cloud, and then it was ─ now, it’s containers and now, they’re serverless and on and on and on.
And before you know it, there’s a hundred and sixty-five different AWS services you need to know how to use and thousands and thousands of pages of documentation.
I think our abilities have become so complex and interesting we could do so many things but the educational burden of staying relevant and staying a professional certainly has increased at the same orders of magnitude.
And I wonder at what level the abstraction starts to become a burden. And there might be some compression or some consolidation in the space.
Ravi: Absolutely! I’m pretty fortunate with the job I have to attend a lot of industry events. I was at re:Invent and I was at a bunch of events last year. And it’s surprising to hear when people do things kind of counter to what the events taught you about.
I was at a DevOps space event in Washington, D.C.
You run into a lot of people whose tag just says “Government.” They’re not going to tell you their names; it just says “Government” so they probably work for the intelligence community.
They were talking about organization. I was working at a particular container orchestration company and this one person came up, and started talking to us.
“We want everything on bare metal because we want speed.” And it’s like, “Yes. You’re the first person who comes up and talks about ─” “We’re really concerned about the abstractions slowing us down,” just because of the amount of speed and scale that they require.
Even the money wasn’t really a problem for them because they put everything on bare metal but speed was a big concern for them.
You’re right. The old computer science principles apply here: the more abstraction you have, the more fire power you need.
I’ve worked with bare metal but then I might have a host operating system, a guest operating system, a container orchestrating system. And then, even my container orchestrator might run a serverless infrastructure.
And so, the layers just keep adding on. You’re trading in speed and ubiquity for performance. But that’s the march we take.
Ledge: Yes. You’ve got to run your sort of containerized microservice, the entire thing, on your local machine; and you’ve got your MacBook. All of a sudden, 64 gigs of RAM just to load your eight containers. At some point, it becomes like “Wait a second, what happened to the client server or the cloud?” You’re just running a local copy of the cloud in order to even deploy something.
We run into a lot of cases like that where even a superpowered engineering machine gets crushed by just setting up the local environment.
Ravi: It’s a very common trend. This is so funny. When I was in school, I was an engineer, being a co-op, probably fourteen or fifteen years ago, and my machines were always being crushed. They never stopped being crushed. Just the amount of stuff I’m able to do on them keeps growing, but consistently they’re always crushed.
Ledge: I’d like to program on a SPARC terminal.
Ravi: Back in my day, we had five hundred and twelve megabytes ─
Ledge: Right. And nobody ever knew what you do with 8K of RAM.
Ravi: We had less than a gig to start with. Now, when we hit that gig barrier it’s like “Wow, two gigabytes!”
Ledge: Ravi, for the last question, maybe give some advice to our freelance engineers out there who are trying to make an impact ─ large company, small company. What do you think are the keys to success as a technologist in the field now?
Ravi: Absolutely! It’s always to keep learning. I might be more prone to this. I might have changed jobs a little bit quicker than some of my contemporaries because I just like getting more experience. I always enjoy hearing how other people do things.
I never like being the smartest person in the room because I want to learn. I want to learn from somebody else. That’s always the greatest way to learn.
For the listeners and for the folks who are part of the Gun.io community, they’re doing the right thing. They’re expanding their horizons by being able to learn very quickly from other folks, other industries.
The domain experience is invaluable. Let’s say, if I had a year at Gun, I might look at maybe two or three projects, four to six months a pop. I might say, “You know what, I want to learn the insurance industry right now. Let me go learn the valuable knowledge an insurance company has” or “Let me go to a healthcare company” or “Let me go to a media company.”
Learning those domains is invaluable. It’s like being a kid in a candy store. All of those particular industries are looking for folks to help fill a gap. And definitely don’t forget to learn about the business domain that you’re in because it will come back and help you out.
Being a freelancer, that’s perfect. You can come in and come out and learn a lot.
Ledge: Great insights! Thanks, Ravi! It’s good to have you on, man.
Ravi: Thanks, Ledge. Awesome to be here!