AI could be your new legal assistant with Lars Mahler of LegalSifter
Ever had to parse through an absolutely awful contract? How about a 60-page MSA from one of those big company procurement departments? One of the promises of AI is that the happy robots will start taking away the drudgery of tasks like parsing a 4-page IP transfer clause.
In this episode, Ledge sits down with Lars Mahler, Chief Science Officer at LegalSifter. Spoiler alert: they already have AIs to do this! As a bonus, Lars lays down one of the best machine learning analogies I’ve heard.
Ledge: Could you state your full name, please?
Lars: Lars Peter Mahler.
Ledge: Hey, Lars, thanks for joining us. It’s really cool to have you on.
Lars: Ledge, thanks for having me. It’s great to be here.
Ledge: Can you give a two- to three-minute background story of yourself and your work? I’ve read it and you have covered a lot of ground. This is going to be interesting.
Lars: I’m the chief science officer at LegalSifter. I’m actually a co-founder as well. I started LegalSifter just over six years ago while I was finishing up my grad program at Carnegie Mellon University.
At LegalSifter, I oversee all of our AI and data science activities, and that includes some platform development as well as a team of lawyers and data scientists who build our models.
Ledge: You’re a co-founder and a data scientist and, off-mike, you even talked about wearing the sales hat. Put this together for us. AI is so hyped yet so active and prevalent. Break it down a little bit. Tell me about the market.
Lars: You’re right. AI is prevalent. It’s kind of in a hype cycle.
It’s actually a really fun space to be in because there is that sort of hype and excitement. And, to some extent, it’s a little bit overblown. People think about these super intelligent machines ─ you know, we’re not there yet.
The reality is, really, a lot of these AI applications, at the core, are productivity tools, which sometimes doesn’t sound sexy. But they’re actually really sexy productivity tools.
So it’s fun. During the day to day, sometimes, I put my sales hat on with potential clients, and then I might shift gears and meet with my data scientists and get deep into the numbers.
Ledge: Talk about what the tool does because I was not familiar with it until we connected with LegalSifter. I read the homepage and I was like, oh, I so need this. Just talk through that value prop just a little bit, and then we can get into “How do you actually make that happen?”
Lars: LegalSifter is an application for when you are negotiating a contract and you’ve received the contract from the other party. It’s on their paper, you’ve never seen it before, it’s sixty pages long, and it’s a real pain in the butt to read.
And you can upload it into LegalSifter. We’re going to automatically scan and analyze all the terms and conditions. We’re going to let you know which terms and conditions we’ve found to things that you care about. We’re going to let you know which terms and conditions are missing. These are also things where you’ve specified in the application, “Hey, let me know if this is missing because I really care about it.”
Thirdly, when we find the terms and conditions that you care about, we’re going to inject help text that has guidance either from your own lawyer, your in-house counsel, or our network of experts.
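The review flow Lars describes (find the clauses you care about, flag the ones that are missing, and attach guidance to each hit) can be sketched in a few lines. Everything here is hypothetical: the clause names, the `review` function, and the help text are illustrative stand-ins, not LegalSifter’s actual API.

```python
# Hypothetical sketch: compare the clause types detected in a contract
# against the clause types a user has configured with help text.

def review(detected: dict, watched: dict) -> dict:
    """detected maps clause type -> matched contract text; watched maps
    clause type -> guidance from the user's lawyer or an expert."""
    found = {}
    missing = []
    for clause, help_text in watched.items():
        if clause in detected:
            # Attach the configured guidance to the matched language.
            found[clause] = {"text": detected[clause], "help": help_text}
        else:
            # The user asked to be told when this clause is absent.
            missing.append(clause)
    return {"found": found, "missing": missing}

detected = {"governing_law": "This Agreement is governed by the laws of Delaware."}
watched = {
    "governing_law": "Confirm the venue works for you.",
    "limitation_of_liability": "Ask for a mutual cap.",
}
report = review(detected, watched)
```

The three behaviors from the conversation map directly onto the output: detected clauses, missing clauses, and injected help text.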
Ledge: That sounds incredible. That actually gets to the AIs that I didn’t even know existed, and I talk to a lot of people. Just talk about this technology because that almost sounds like bleeding edge impossible to me.
Lars: It’s fun. It’s cutting edge, not impossible, but it is bleeding edge. Do you want to know how it works?
Ledge: Yes, I absolutely want to know how it works. A lot of people will probably know a lot of the terminology and I think we also have audience members who say, “I have got to get into this. I want to learn where to even start.”
So walk us through ─ I guess, there’s probably a bunch of NLP in there. There’s certainly some learning of some sort. What is the architecture and the layout of such a thing?
Lars: I think about it in three main chunks. The first layer is the chunk that’s taking a file from a user, be it a PDF, a Word document, a JPEG, or whatever. It’s taking that and converting it into text.
There are different things that we have going on there. There’s OCR if it’s a PDF or image file. If it’s a Word document, there are different things that we do to kind of exploit the structure of that document.
The first layer is really all about translating the input document into a standardized text file. The second high-level process is really the domain of natural language processing or NLP. Here, you’re taking that long string of characters in the text file and turning them into chunks of text that have meaning and attributes.
So the things we might do would be, first of all, chopping it up into sentences, then chopping the sentences up into tokens or words; and then, for each of those tokens, you’re trying to figure out what part of speech this token is. Is it a noun? A verb? An adjective?
So you’re doing a lot of things at the NLP layer to basically take what was previously just a raw set of characters, breaking it into meaningful units, and then sort of adding metadata to each of those chunks of text.
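As a rough illustration of that NLP layer (a toy sketch, not LegalSifter’s pipeline; real systems use trained models for each step), here are the chop-into-sentences, chop-into-tokens, and tag-each-token steps, with a hard-coded lexicon standing in for a real part-of-speech tagger:

```python
import re

def sentences(text):
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def tokens(sentence):
    # Naive tokenizer: words and punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", sentence)

# Tiny stand-in lexicon; a real tagger is a trained statistical model.
LEXICON = {"agreement": "NOUN", "governs": "VERB", "binding": "ADJ"}

def tag(token):
    return LEXICON.get(token.lower(), "UNK")

text = "This Agreement governs the parties. It is binding."
# Raw characters -> sentences -> tokens -> (token, metadata) pairs.
tagged = [[(t, tag(t)) for t in tokens(s)] for s in sentences(text)]
```

The output is exactly what Lars describes: meaningful units of text, each annotated with metadata.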
The third and final layer is, once you’ve gotten all of this text and you’ve chunked it out in a nice way, then you’re in the world of machine learning. Now, you’re taking all those words, doing what’s called “feature extraction” or “feature engineering,” and basically crunching all that information down into numbers. And that black box that you get at the end, that is the AI algorithm.
Ledge: Many, many moons ago, I would have to write these gigantic scripts that would suck up near binary feed data and try to turn it into some kind of a structured data source ─ ETL, I guess, if you will ─ and that sounds very familiar to me on the top. In fact, a lot of folks I’ve talked to are sort of like “Hey, when you’re getting started with this, 80% of the work is just figuring out ‘How do I take this data and tokenize it and make it useful?’”
And then, there’s the part where you’re training algorithms. You need to have a great deal of input ─ so many contracts, and business rules, and feedback.
Is that true? Is that how the mechanism works?
Lars: That’s exactly right. What you’re talking about, that sort of data munging or data wrangling, is a huge part of data science, in general, but especially with NLP. Anytime you’re dealing with text, it’s messy. There are lots of things you have to do and you can think of it as a very complicated type of ETL.
To your second point, yes, once you get to the world of machine learning ─ you’ve taken this document which was a string of characters; now, you’ve converted it into chunks and you can think of each of those chunks as a row in a spreadsheet and the columns in the spreadsheet would be what we call “features.”
And so, if you’re training an algorithm, you could have a hundred features, a thousand features, a hundred thousand features. It’s the world’s biggest spreadsheet that nobody wants to look at and you’re going to feed that into an algorithm and then it’s going to take all that and crunch it down to meaningful numbers.
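One simple version of that “world’s biggest spreadsheet” is a bag-of-words matrix: each chunk of text becomes a row, each vocabulary word becomes a feature column, and the cells are counts. This is a minimal sketch of the idea, not how LegalSifter actually builds its features:

```python
def bag_of_words(chunks):
    # Build the vocabulary: one feature column per distinct word.
    vocab = sorted({w for chunk in chunks for w in chunk.lower().split()})
    # Each chunk of text becomes one row of counts over the vocabulary.
    rows = [[chunk.lower().split().count(w) for w in vocab] for chunk in chunks]
    return vocab, rows

chunks = ["governing law of delaware", "law of the state"]
vocab, rows = bag_of_words(chunks)
```

With a real corpus the vocabulary runs to the hundred-thousand-feature scale Lars mentions, which is why nobody wants to look at the spreadsheet directly.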
Ledge: So it’s essentially really about taking each token and applying a huge number of tags to it. It’s not unlike what would be your tag cloud on a much more advanced scale.
Lars: Yes, that’s a great way to think about it.
Ledge: This is super interesting to me because no one has actually broken it down that way.
Lars: I love this stuff.
Ledge: I don’t think anybody really sort of understands that. It’s far less frightening and sort of abstract when you break it down that way. And you really start to appreciate the compute power and the contribution that that makes, because we are just able to do things that we were doing anyway but at massive scale and with millions of dimensions instead of four or five or six, or the worst spreadsheet you’ve ever worked on [inaudible 0:08:54.9] or Column AB. And this time, you’re talking about tons and tons of those metaphorical columns.
So how does meaning and learning result from the algorithm? What’s on the other side after you’ve run algorithms against all that?
Lars: What happens with machine learning is, let’s say, you’ve got this really nasty spreadsheet. It’s a million rows long and a million columns wide. It’s just super nasty. You’ve now crunched it all down into numbers. And then, when you have that spreadsheet with, let’s say, a million columns, there’s one additional column, your million-and-first column, that has your labels of what it’s trying to learn.
For us, what we’re trying to learn is “Hey, is this a governing law clause or not?”
For a credit card company, that very final column might be “Was this transaction fraudulent or not?” If you’re detecting whether an image is cancerous or not, the last thing might be “Is this image cancerous or not?”
That final column is on the other side of the algorithm and what the algorithm is trying to do is figure out “When I take all these million columns and run them through the black box, how can I set the coefficients so that they do a good job of predicting that final column?”
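That coefficient-setting step can be illustrated with the simplest possible learner, a perceptron: one coefficient per feature column gets nudged until the weighted sum predicts the label column. This toy is my stand-in for whatever real algorithm sits inside the black box:

```python
def train_perceptron(rows, labels, epochs=20, lr=0.1):
    # One coefficient per feature column, plus a bias term.
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            # Nudge each coefficient toward predicting the label column.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Two feature columns per row; the final label column says whether the
# row is, say, a governing law clause (1) or not (0).
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]  # here the label depends only on the first feature
w, b = train_perceptron(rows, labels)
```

After training, the learned coefficients reproduce the label column, which is exactly the “do a good job of predicting that final column” criterion Lars describes.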
Ledge: And so, that final column is not only the label but it’s also a probability then?
Lars: It can be. It depends on the type of algorithm you’re using but, usually, it’s a label. If you’re talking about classification, usually, it’s a label ─ dog versus cat, red versus yellow. For many classification problems, that’s what’s in that column.
Ledge: And what does it do if it’s not sure?
Lars: Usually, when you’ve got that labeled column, either a human has made a labeling decision, so the human said, “This is a dog” or “This is a cat,” or you have that data from some other source where you’re relatively confident that it’s right.
But, to be honest, there are a lot of times when the labels are wrong. The data is not always clean. So that’s why, a lot of times, you have to have a lot of data so that when you have noise or dirty labels, the good ones outweigh the bad.
Ledge: That’s really interesting. Prior to our record here, you were talking about the market for AI developing around this distinction of narrow AI, and I’ve read about that. I would love to get your comments on that.
And then, I guess, the holy grail that everybody is waiting for is this general AI. This seems to be a really tactical way to view the market and the accessibility and usefulness of AI at this point.
Lars: Yes. You can think of AI in sort of two buckets. Let’s start with general AI. Five years ago, when most people thought of AI, you would think of HAL from 2001: A Space Odyssey or some kind of superhuman robot with superhuman intelligence. That is what some people are working towards. Whether we should or shouldn’t be working towards that is kind of TBD. We don’t know if it will be friendly or unfriendly.
But that’s general AI. It’s AI that can do many different things and be as intelligent as a human or maybe even more intelligent. So super intelligent AI is beyond human.
That’s still many years into the future. I don’t know if it’s twenty years or fifty years or a hundred years. It kind of depends on breakthroughs in different fields.
What you see today in the market is exclusively what we call “narrow AI” and this is where you have artificial intelligence techniques being used to solve a very specific task or problem.
So, for us, the task that we are solving with LegalSifter is: given a contract, can we understand what each of the sentences in the contract is saying?
And we’re very good at that and getting even better every week.
But we’re never going to be able to take that AI and have it make a cup of coffee or drive a car or do anything else. It does one thing really well but it can’t do anything else.
Similarly, Uber is building a self-driving car, but their self-driving car will not be able to understand legal contracts.
Ledge: We kind of hope not.
Lars: So what you’re seeing in the market today ─ and there’s really a gold rush going on ─ is many, many different narrow AI applications. And they’re each focused on problems that people have. They’re productivity tools, but they’re productivity tools at a higher level than what we’ve seen in the past. They’re productivity tools that help predict things that humans might have a hard time predicting, make judgments that might take people time combing through text, and quickly find things that might take lawyers weeks or years to find.
So, in a way, they’re just productivity tools but, in a way, they’re really super duper powerful productivity tools.
Ledge: Of course, you got me thinking there that I really would want my lawyer to buy this. At a good six hundred dollars an hour, I would really like to reduce that time burden.
Lars: By the way, a lot of our clients are lawyers and they’re using this for that purpose.
Ledge: Good! God bless them. You should put a directory on your site of “Lawyers who use us.”
I’m curious. When you look at the narrow AI you described as a gold rush, and the after-the-fact economic analysis of any gold rush that the only people who got rich were the ones selling shovels, how do you guys think about that?
Lars: I’ve thought about that so many times, actually. Google, AWS, and Microsoft are definitely selling shovels, so if you’re using any of their cloud platforms, they have machine learning tools; they’ve got annotation tools, which are very important in the world of machine learning. They are definitely selling their shovels.
There are other companies out there that are also similarly trying to help automate the process of building ML or AI applications. It’s hard to know which ones are going to work and which ones are not going to work.
I think not every AI company will succeed, but I think a large number of them will succeed because the problems that they’re solving are real problems. We had the algorithms for it before, but we didn’t have the compute power. So this is the first time when we’re able to actually help lawyers read faster or think better or write more clearly.
Ledge: That duality kind of switches over time. It’s almost like your new version of client server. At what point does the compute engine power outrun our thinking of great algorithms?
I know we have AI sort of coming up with AIs now. Is that the necessary next step and then quantum ─ what does that flipflop look like in this space?
Lars: To be honest, I don’t know. That’s a great question. I think AI creating AI is very interesting. I think it’s definitely within the realm of possibility.
Generative AI, AI that’s creating things like art work or music or things like that, to me, is really super interesting. That’s still in the R&D stage right now but it’s definitely within the realm of possibility.
I don’t have a good sense of when we might see AI creating AI.
The most intensive number crunching that we have right now is usually around deep learning; and, right now, Google and AWS have solutions that are really powerful that can help you crunch these numbers really quickly.
So it may be that our compute power is going to be more than what our algorithms need within the next year or two.
Ledge: We live in an interesting time where one hopes that it doesn’t become general Terminator style.
The ability to really augment the human condition ─ it just opens up all kinds of opportunities in health care and neuroscience and just on and on with so many fascinating conversations. And I always think that each of the narrow applications is sort of out there on your scatter chart and as that bubble for each gets bigger and bigger, obviously, they start to overlap and, eventually, the whole chart is colored; and that’s really your event horizon for general AI.
And you’re right. The time frame is the biggest question. If you’re reading about the singularity, it’s only six to seven years away.
Lars: To be honest, I think that in the next ten years or so, what we’re going to see is most routine cognitive tasks, the things that people do again and again that require some brainwork and are kind of drudgery, I think you’re going to see most of those automated or accelerated with AI so that AI is kind of your partner helping you do that faster.
In the legal space, what that means is lawyers are doing less groundwork and thinking more. I think it’s probably the same in the medical space and probably every other space.
So for a while, it’s going to be great. It’s going to be everybody kind of focusing on the stuff they’d rather focus on.
And then, beyond ten or twenty years, when general AI starts to become a thing, then, it gets really unpredictable ─ hopefully good and exciting, though.
Ledge: Yes, I’ll just have it trade my stocks for me and just sit back, get a little sun, and drink my smoothie.
I’m always interested ─ and the audience consistently loves this ─ in the massive audacious failures that really turn into these great learning opportunities from which you grow successes.
I don’t know if you have any good stories that come to mind of “Wow, I really wish we had known X at the time.”
Lars: Man, you know, I’m racking my brain. There are many small things. I’m kind of coming up empty. It’s not that we haven’t had mistakes. We’ve had many, but they are all kind of tiny and just incremental ─
Ledge: ─ narrow learning opportunities and not general.
Lars: Yes, exactly!
Ledge: Everybody has got a great story on how they brought production down for every client on the DevOps side.
Lars: This is a little bit operational, but a lot of our learnings ─ we’ve had a few breakthroughs algorithmically where we’ve tried some different algorithms and learned, “Oh, man, we can get a step increase by using this algorithm or this feature.”
Those were exciting but a lot of our real gains have been from tweaking what we call “annotation.” I don’t know if you’re familiar with annotation or not but underneath every AI application, usually, there’s a team of humans having to label data and that process is called “annotation.”
And so, we’ve really turned that into a factory so that when we identify “Hey, we want to build a new sifter or a new model,” we’ve got a whole team and we can just build it really quickly with high quality and in a very repeatable way.
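One common quality control in an annotation factory like this (an assumption on my part; Lars doesn’t describe their internal process) is to have several humans label the same example and keep the majority answer, so the occasional bad label Lars mentioned earlier gets outvoted:

```python
from collections import Counter

def majority_label(annotations):
    """Given one example's labels from several annotators, keep the
    most common one; noise gets outvoted when most labels agree."""
    return Counter(annotations).most_common(1)[0][0]

# Three annotators per sentence; one mislabels the second example.
raw = [
    ["governing_law", "governing_law", "governing_law"],
    ["indemnity", "governing_law", "indemnity"],
]
clean = [majority_label(a) for a in raw]
```

This is the small-scale version of the point from earlier in the conversation: with enough redundant data, the good labels outweigh the bad.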
Ledge: It reminds me of the promise of a Mechanical Turk ─ humans doing necessary tasks that require human brains, but almost like an API. Any thoughts on that? Is that the kind of thing that develops around the edges of every business?
Lars: That’s exactly what we have. We have our own internal Mechanical Turk. We don’t outsource it. We’ve got to keep the data inside but we essentially have that going.
I think most large companies or serious AI companies either have their own internal team doing that, or they’re outsourcing it to Mechanical Turk or other vendors.
Ledge: It’s a fascinating thing that all this bleeding edge technology comes down to the brute force of humans.
Lars: You’ve got to feed the machine.
Ledge: Yes. And I think people misunderstand that sometimes. We get a lot of inbound entrepreneurs who are like “I want to build AI to do X.”
And we’ve learned through the trial and error of facilitation to ask, “Where is your training data going to come from?” because they don’t have a data set large enough to even make any assumptions there. I think that’s the piece that nobody wants to talk about.
So, on your front end, you’ve got the huge quasi-ETL data ingestion; and then, on the back, you really need to have an amazing amount of this tagging and classification, whatever the terms are; and somewhere in there, I guess, you get to actually write an algorithm. Maybe your lucky data scientists do that part while everybody else is still feeding the machine.
Lars: That is the fun part, the algorithms and feature engineering. But we’ve got to be all over the place. We use data science in the annotation process, too. So we use it everywhere to kind of accelerate the process.
Ledge: We’re in the business of finding and vetting and certifying A+ very super Ninja unicorn engineers and we’ve been doing that a long time and we have good processes around it. I always love to ask about talent vetting and evaluation for every tech lead that I interview.
I’ll even pivot the question a little bit to you. Are there meaningfully different heuristics between hiring a data scientist and a software engineer?
Lars: I think there are two differences. To me, in general, for everybody, I want somebody who has years of real world experience; somebody who understands best practices but maybe doesn’t follow them dogmatically, who understands why they are the best practices and when to use them and when to discard them; and then somebody who has great communication skills, who can understand the business context and use that to influence the way they write code, but who also, when they hit barriers or decision points in the code, can clearly communicate that back to other people so that others can make a decision of “Yes, we want to do it this way” or “We want to do it that way.”
I think that applies to everybody. Also, I want somebody who is really good but humble.
That’s for everybody. I think the only two things I’d add on for an AI engineer or a data-science person would be, first of all, a good data intuition. A lot of times, you’ll have people who come out of a grad program and they really understand the algorithms and they really understand the math but they don’t have a lot of real world experience like cleaning data, munging it, twisting and pivoting it, and knowing when it’s wrong.
My ideal person would not only have that AI background but they’d have years of database and ETL experience, so that when something smells wrong, they can smell it.
Similarly, somebody with a good machine learning intuition, because, sometimes, you’ll get somebody building a model who can’t hit the performance they want, and they always resort to the most fun thing, which is either feature engineering or saying, “Hey, let’s throw some deep learning at it,” because deep learning is the newest, flashiest thing.
So my most productive ML folks have been people who have a good intuition about “Okay, well, what’s the real problem here? Maybe we can solve it a different way.”
I’d say the data intuition and ML intuition.
Ledge: I love that. Great insights! I can tell you that, universally ─ I’ve asked this question a hundred times ─ there’s a quick gloss over experience and actual technical skill and straight to communication and problem solving.
Intuition is a new one and that discernment to understand the real context in which the work matters. That’s a great insight. I love that.
Lars, it’s really cool to have you on. I totally appreciate your time and it’s been really instructive.
Lars: Thanks a lot. It has been great.