Machine Learning in Healthcare: Interoperability via Machine-to-Machine Data Architecture with Jim Nasr
Jim Nasr is Vice President of Technology and Innovation at Synchrogenix, where he spearheads strategy and implementation of emerging technologies such as large-scale blockchain and machine learning in healthcare and the life sciences. He previously served as Chief Software Architect for the CDC, the Centers for Disease Control and Prevention. In this episode, Jim walks us through the critical requirements of data interoperability and explains why designing for machine-to-machine data transfer is the most critical of architectural decisions.
Ledge: Jim, welcome! Good to have you here!
Jim: Thank you, Ledge. Thank you for the invitation.
Ledge: Fantastic! Why don’t you just give your two-, three-minute story for the audience, and then we’ll jump into some more discussion around different topics?
Jim: I’m really a technologist by profession. I’ve been doing large-scale software development for at least a couple of decades now. I had my own company for quite some time, about 12 years or so, working largely with large-scale unstructured data and unstructured content. I did quite a bit of work with the government.
After a while, I decided that it was a good time to open up to new opportunities, and that and some other things led me, after a little bit of time, to work at the CDC, the Centers for Disease Control and Prevention, right here in Atlanta where I’m based.
I started as an entrepreneur in residence because of my private-sector and technology background. After a little while, it became evident that the job at hand was really to revisit how software was developed.
We were in the business of public safety and of providing information flow for things that are very important to all of us humans, such as infectious diseases and outbreaks, preventing disasters, and things like that.
In many ways, the systems that were there were not designed for that, for Internet scale and real-time throughput. Fixing that was really my agenda there. It led me down the path of blockchain as well, and then to public health and some of the use cases there.
And then, at the beginning of this year, I joined a company in the pharmaceutical and life sciences space called Certara, very much with the intention of building a large-scale set of open technologies, including distributed ledger technologies where it makes sense. The goal is to address the very large problems in life sciences and pharma around interoperability and around software that meets real market needs with quick turnaround, and to provide a set of open protocols and open technologies that we can mix and match.
So my work is around this technology, or this platform, called “OpenPharma,” very much with the intention of building small functional components that can be mixed and matched to make up a bigger set of applications.
Ledge: Quite a collection of activities there! You can definitely see the arc of the career going to where you’re solving major problems using more modern architecture, patterns, and approaches, as you’ve said, at Internet scale.
What have you learned along the way that’s applicable to a company of any size that’s trying to approach and deal with these massive interoperability and data concerns?
Jim: That’s a good question. There are a few key patterns that come out. One is, from inception, think machine readable; think machine-to-machine interaction.
If you think like that, it’ll guide you significantly and you’ll, hopefully, avoid some pitfalls, particularly in terms of how you architect data, because, ultimately, if you’re not thinking like that at this stage, you’re not building for scale. I think we’re very much in the world of machine to machine, software to software, API to API, like one medical device talking to another medical device. Thinking like that will certainly clarify some design concepts in your mind.
The second is really regardless of interoperability. It has to do with usability: we are very much in the world of consumer-grade technology. Without doubt, technology needs to have a very intuitive user experience, and it needs to work with the things that we consider natural elements of that consumer experience. I think that’s another problem that I’ve seen largely in the health and life sciences space.
There, technology really has not been designed with the consumer in mind, with the actual user in mind. It’s not consumer grade. As a result, that leads to a lot of adoption problems.
I would say that these are some key things. To me, this idea of decoupled architecture, decoupled software, is also very important. So you build software that is inherently, or really by design, decoupled from other software and from the underlying infrastructure.
So if you wanted to run it on Amazon Web Services, you could. If you wanted to run it internally, you could. It’s really this concept of software as a theme park, where you have an API that acts as the physical turnstile you go through, almost your “wristband for the day.” And then you’re able to traverse different playgrounds and different experiences.
You’re still in the same theme park, but it’s not just one experience. There are different experiences. You can pick and choose where you want to go, and you can have different software, different applications. They can be composed very quickly from singular components that work independently.
To me, those are important concepts in terms of software design.
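As a rough illustration of the decoupling Jim describes, here is a minimal Python sketch in which a small component depends only on a storage interface, so the same code could run against an in-memory store, a local directory, or (with another backend) a cloud service. All names here (`RecordStore`, `InMemoryStore`, `LocalFileStore`, `register_sample`) are hypothetical and chosen for illustration; none of them come from OpenPharma.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class RecordStore(Protocol):
    """Hypothetical storage interface; components depend only on this."""

    def put(self, key: str, record: dict) -> None: ...

    def get(self, key: str) -> dict: ...


class InMemoryStore:
    """Backend that keeps serialized records in a dictionary."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, record: dict) -> None:
        self._data[key] = json.dumps(record)

    def get(self, key: str) -> dict:
        return json.loads(self._data[key])


class LocalFileStore:
    """Backend that writes one JSON file per record under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, record: dict) -> None:
        (self._root / f"{key}.json").write_text(json.dumps(record))

    def get(self, key: str) -> dict:
        return json.loads((self._root / f"{key}.json").read_text())


def register_sample(store: RecordStore, sample_id: str, reading: float) -> dict:
    """A small component that knows nothing about where the data lives."""
    record = {"sample_id": sample_id, "reading": reading}
    store.put(sample_id, record)
    return store.get(sample_id)


if __name__ == "__main__":
    print(register_sample(InMemoryStore(), "s1", 4.2))
    with tempfile.TemporaryDirectory() as tmp:
        print(register_sample(LocalFileStore(Path(tmp) / "store"), "s2", 1.5))
```

Swapping the backend does not require touching `register_sample`, which is the property that lets independent components be recomposed quickly.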
Ledge: What do you think about when you’re developing a big open standard or attempting to develop an open standard?
One of the classic jokes about standards is “My standard is always better than your standard,” and then there is discoverability.
How do you serve all the needs of the audience and try to take all the stakeholders into account? What is it like trying to develop an open standard from a political standpoint and making sure all the stakeholders are taken care of?
Jim: That is really a good question. In my mind, I differentiate between standards and interoperability. I’m a big believer in interoperability. Again, all you’ve got to do is look at the Internet and see what has worked over the course of time and what has worked at scale.
What has worked is the OpenAPI Spec. I think that has worked. There’s clear evidence at a very large scale that it works. So I’m a believer of that.
Conversely, I’m not really a big believer in many data standards, because I think data standards take a long time to adopt, and your demands may also change significantly, and at times rapidly.
As an example, we hear a lot about the Internet of Things and taking data, in the world of healthcare, from wearables or various nanosensors, things like these.
Very few data standards are around to deal with that kind of data. It’s not your traditional data. And if you’re going to take a long time to build the standards and get all the vendors and various systems to support them, you may never get there.
To me, when it comes to data standards, I’m much more about taking very large-scale data in different forms, and I think we have the technology now, and using, essentially, dynamic indexing, schema-on-read development, sophisticated search, and API-secured data in different forms, as opposed to getting a group of people in a room to spend two years coming up with very well-defined and strongly typed standards and then trying to convince everyone to adopt them.
I just think that’s not really going to work. That’s not the right approach.
Ledge: Right. It doesn’t move fast enough.
Jim: No, it doesn’t. I’m a big fan of data linking and, like I mentioned, dynamic indexing. You can have some rule engines and key metadata and things like that which you’re trying to enforce, but not standards for every single element of data, because I don’t think that’s a market-relevant approach. I think it takes too much time.
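To make the idea of dynamic indexing with schema-on-read more concrete, here is a minimal Python sketch. It stores heterogeneous JSON records exactly as they arrive and builds an index over whatever fields each record happens to contain, so nothing has to be agreed in advance. The class name `DynamicIndex` and the sample field names are assumptions made for illustration, not something described in the interview.

```python
import json
from collections import defaultdict


class DynamicIndex:
    """Keeps records as-is and indexes every scalar field seen at ingest."""

    def __init__(self) -> None:
        self.records: list[dict] = []
        # (field name, JSON-encoded value) -> positions of matching records
        self.index: dict[tuple[str, str], set[int]] = defaultdict(set)

    def ingest(self, raw_json: str) -> int:
        """Parse one JSON object and index its scalar fields; no schema required."""
        record = json.loads(raw_json)
        position = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            if not isinstance(value, (dict, list)):  # skip nested structures
                self.index[(field, json.dumps(value))].add(position)
        return position

    def find(self, field: str, value) -> list[dict]:
        """Return records whose field equals value, in ingest order."""
        positions = self.index.get((field, json.dumps(value)), set())
        return [self.records[i] for i in sorted(positions)]


if __name__ == "__main__":
    idx = DynamicIndex()
    idx.ingest('{"device": "wearable-1", "heart_rate": 72}')
    idx.ingest('{"device": "sensor-9", "glucose_mg_dl": 95, "patient": "p1"}')
    idx.ingest('{"device": "wearable-1", "steps": 4000}')
    print(idx.find("device", "wearable-1"))
```

The three records share no common shape beyond being JSON objects, yet all of them are searchable as soon as they are ingested. A production system would use a search engine or document database for this; the sketch only shows the principle.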
Ledge: From an interoperability standpoint, if we’re not using a strongly typed standard, does it become incumbent on every player in the field to consume the data, do some ETL on it, and transact it in their own middleware? Doesn’t that add a ubiquitous cost to doing business in the field, the in and out and translation of one format to another?
Jim: It does. That was my point at the beginning about really thinking in terms of machine-readable data. If you transform the data as part of the process of taking it in, and your format for interaction is, let’s say, JSON, as an example, so it’s machine readable, that makes it far easier to deal with. If at every step you have to go and convert or transform proprietary data and recreate things like middleware accumulators, that’s a very difficult and complex approach.
So I think part of this is not just the technology that we create, but it’s also some education around data interoperability. And this is what we do with our OpenPharma partners. We tell them, “Look, we don’t care what your source is, CSV or you may have a PDF, whatever, but the form that we want to be interacting with is machine readable.”
Part of the process for them is to get the data into that machine-readable form because, from there, it’s relatively easy for us to transform it to different contexts, to do machine learning, to do natural language processing, all kinds of other things.
But if you have to build proprietary converters and data transformation services and things like this, we would never get out of that, and that becomes a really onerous process.
There’s definitely a cost to be paid but, ultimately, if you think big picture, having machine-readable data really is the answer. There’s no doubt that the phase is gone where humans can deal with data at the level that we are at, given not only the volume but also the pace at which it’s created.
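As a small illustration of getting source data into a machine-readable form once, at the boundary, here is a Python sketch that turns CSV text into a JSON array of records using only the standard library. The function name `csv_to_json_records` and the column names in the sample are hypothetical; the interview does not describe a specific conversion step.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects.

    Values are kept as strings; type conversion is left to the consumer.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [dict(row) for row in reader]
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = "subject_id,dose_mg\nS001,50\nS002,75\n"
    print(csv_to_json_records(sample))
```

Once the data is in this form, downstream steps such as search, machine learning, or natural language processing can all read the same JSON rather than each needing its own converter.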
Ledge: That’s interesting. So it’s incumbent upon the provider to provide it in a machine-readable context and form that would allow other people to consume it. Even if they’re not following a standard, they are, in fact, following a standard, at least JSON or something that can be consumed at scale.
Let me shift gears. I like to ask all our guests, “How do you evaluate what makes someone a really excellent senior software engineer? What are heuristics necessary to be a high-performing engineer in the current market context?”
Jim: That’s a really good question. I’m a big believer in what I think are fundamental characteristics regardless of what specific function you’re doing. In the world of software architecture, I think being curious is really important because it’s very easy to drink your own Kool-Aid and think what worked five years ago is going to work now.
There are many examples but database architecture is one that’s close to my world; and, again, what worked ten years ago is obsolete now.
Even something like big data: big data was a thing. Many people now, especially in the context of machine learning, would question where big data fits in. There are other ways of going about that.
So I think being curious, being really able to kind of absorb what’s going on around you and apply it as it’s relevant is really important.
The other thing is that you do need to have a good amount of depth technically. It is a broad job, but if you’re just broad at a PowerPoint level, it’s nowhere near good enough because you can’t really make informed decisions and you can’t lead a team. I think you have to have some depth.
There are many areas you can have depth in but, certainly, some of the core principles of software development are among them. In my world, certainly, microservices architecture is very important.
As a software architect, you need to be able to communicate not just with people who share your own kind of background but really across an organization, to digest information but also to relay it. You have to practice some basic psychology. You have to have some basic ability to affect behavior, which is more than just various software development methodologies and design patterns or whatever.
I think being able to communicate clearly to other people, particularly people who are either the users or the sponsors of what you’re doing, is really important. Otherwise, you really don’t move the needle and you wind up doing things that are interesting but are not really important.
Ledge: I imagine you’re someone who has dealt with a lot of engineers all across the spectrum and has had to bring in all types of personalities, and that psychology is super important there.
Jim, thank you so much for the insights today.
Jim: I appreciate your time.