Building enterprise AI products with Spellbook

Hey readers, this is the series in which I interview people at the forefront of building AI products for enterprises. Through these interviews, I hope to share the hard-won lessons these folks have learned from large-scale deployments. If you enjoy this interview, who should we interview next?

In this interview, I speak with Spellbook’s Scott Stevenson (CEO and co-founder) about building AI products. Spellbook uses AI to review and draft contracts, and the company has always been at the forefront of applying the latest models to legal work, for example by launching the first agentic product for lawyers, Spellbook Associate. Spellbook is used by over 2,600 law firms, professional services firms, and in-house teams, including Addleshaw Goddard (Global Law 200), Nestlé (Fortune 100), and BDO (top 5 auditing firm). Spellbook recently raised a $20 million Series A from Inovia Capital and strategic investor Thomson Reuters.

  1. Embrace the “Skepticism Window”: Capitalize on the period when a new AI technology faces skepticism but shows promise. Scott’s experience with agentic AI mirrors the early days of GPT models, suggesting that this skepticism often precedes rapid adoption and improvement.

  2. Focus on Hard Sub-Problems: In complex AI systems like Spellbook Associate, solving difficult sub-problems (e.g., manipulating a 100-page legal document) is crucial before tackling higher-level tasks. This approach ensures a solid foundation for more advanced features.

  3. Leverage Existing Workflows: Spellbook’s success partly comes from integrating with lawyers’ familiar processes, like track changes. This reduces adoption friction and addresses potential concerns about AI errors.

  4. Sequence Feature Expansion: Start with core workflows and gradually broaden functionality. This strategy allows for mastering specific use cases before expanding, as seen in Spellbook’s evolution from single to multi-document workflows.

  5. Ride the Foundation Model Wave: Build on top of rapidly improving foundation models to benefit from their advancements. Scott emphasizes the importance of capturing upside from new model releases like GPT-4 and o1.

  6. Value-Based Pricing in High-Stakes Industries: In fields like law, where professional time is extremely valuable, flat-fee pricing can be more appealing than usage-based models. This approach simplifies billing and emphasizes the product’s value proposition.

  7. Balance AI Aggressiveness with Cost Control: Encourage aggressive AI usage in development while monitoring for excessive resource consumption. Scott mentions allowing developers to use AI freely but intervening when necessary to prevent runaway costs.

Kenn: Welcome Scott! Maybe to start, I’d love to hear the history of Spellbook leading up to the launch of Spellbook Associate. Along that journey, I’m curious how the problems you’re trying to solve have changed and, correspondingly, how the products you built changed as well over time.

Scott: Sure, I’ll start with the problem, which has always been the same. We technically started over five years ago. There are three co-founders, and we all had our own stories. I was an engineer, and I had my own small business. One day I got a legal bill that took half the cash out of our bank account. That was a moment for me where I thought, “Wow, this is a huge problem.” Everyone has that experience of getting their first legal bill and thinking, “Holy cow, what did I just pay for?”

We set out to solve the inefficiency of legal work, that sense that the value-to-cost equation doesn’t feel good. My co-founder Daniel was a lawyer, so he saw the problem from the other side. He went to law school for years, racked up student loans, and then graduated and started practicing. He realized he actually hated the work, which involved many nights working until midnight with ten Word documents on the screen, trying to make sense of them and copy-pasting between them all.

That’s the problem we’re trying to solve: the legal document drudgery required primarily in business and transactional work. This includes contracts, employment agreements, setting up companies and minute books, doing VC transactions, and all the other tasks that create so much friction. That problem has always been the same.

When we started, generative AI wasn’t around. We started with a templating tool, basically a glorified templating tool where a lawyer could build a template for an employment agreement and then reuse it for multiple clients. We sold that to around 100 firms, and we actually had over a hundred landing pages. We literally had over a hundred variations of our messaging and tweaks to our product, angling it in different ways to sell it. We even had one version that sold directly to businesses.

It did okay but never really worked that well because legal work is ultimately unstructured. Lawyers can’t easily put their work into a template. Everything is pretty bespoke a lot of the time, and often a lawyer is working on someone else’s paper, someone else’s contract. You’re almost never working on a brand new contract.

As an engineer, I saw GitHub Copilot, and this was probably late 2021 or early 2022. I thought, “Whoa, this thing is amazing.” The early version of GitHub Copilot would auto-complete your code based on what you’d written. What amazed me about it is that you could use it in any situation without any setup. You didn’t have to create templates in advance or anything like that. It just worked out of the box every time, and it lived in your existing workflow. For us, that was an “Aha!” moment. I thought, “This is what lawyers need.”

We launched the public version 1 at the end of summer 2022. This was before ChatGPT, and before any other generative AI company had launched a product for lawyers. Within three months, we had more revenue than in our previous three years. It was just this explosive moment. We had 30,000 lawyers on our waitlist.

Kenn: It’s really interesting to hear the journey. I’m curious: when you launched v1, was it mostly about generating or completing the next bit of text, similar to what GitHub Copilot does, or about making edits? Since then you’ve evolved it to include benchmarks and to integrate more broadly into the lawyer’s workflow. Was that just a function of what your customers were telling you to build, or how much of it was driven by the foundation models becoming smarter?

Scott: Yeah, so I think a bunch of things have driven our new features. Let me share my screen. I think it’s helpful for you to see. So yeah, we started with autocomplete. Then we had a bunch of spells that would find issues for you, and issue spotting became our second thing. Here’s an example of our negotiate feature: I can give it an agreement, and it will tell me things I might want to improve in the agreement. It can even personalize suggestions, so it learns over time the sort of things I like to negotiate for.

This became possible with GPT-4. This sort of fine-grained document analysis wasn’t really doable, or at least wasn’t very good, with GPT-3. It was something our customers really wanted. We just listen to our customers, see what they want, and yeah, the models advanced very quickly.

We do leverage a lot of off-the-shelf models. We do many things internally as well, but our general attitude is that foundation models are moving so fast that most startups should build on top of them most of the time, so that they capture the upside when something like GPT-4 is launched.

Kenn: What led to the creation of Associate? You already have a successful product that sits inside Microsoft Word. Why build Associate, which automates much more of the work? Is that something you saw as the natural evolution of lawyers’ work? Or is it something you’ve heard lawyers want? That would be interesting too, because it reduces their billable hours.

Scott: Since we launched Spellbook, the very first request we got was, “Can it work on multiple documents?” So even in 2022, we had customers asking. Spellbook works in Word, really on a single document at a time. We basically pushed single-document workflows as far as they could go. We built out a ton of things you can do: summarize changes, draft new clauses, draft full documents. We built out as much as we thought we could at the single-document level.

There are all these multi-document workflows that lawyers want to be able to do, too. You can’t really do that in the Word sidebar, so the big motivation for us is getting to a multi-document workspace.

I think also we just have faith that agentic AI is the next big thing. It feels so similar to what GPT-3 felt like, and GPT-2 even, where everyone was skeptical at first. And then all of a sudden, it actually works really well. Agents feel like they’ve gone through that exact same thing: immense skepticism, a lot of “Oh, your examples are cherry-picked,” and then, “Oh, actually, it’s working 80% of the time. Oh, actually, it’s working 90% of the time. Oh, actually, it’s working 95% of the time.”

We like to lean into that skepticism when something is new, shows promise, and people aren’t really ready to accept it yet. I think that’s when you can be early. Those are all the signs of being early to a market. For us, we try to be really fast. Our goal is to be two years ahead of the market and two years ahead of all of our competitors. Moving fast is one of our competitive advantages as a startup.

Kenn: When I saw the Associate product, it looked like you’re building in the ability to break a task or request down into subtasks and then execute each one clearly. That matters in a multi-document setting: my wife would ask a junior associate to update a company’s name, but the name is spread across many docs, and they often miss one. That’s really painful. So I really like how you set up the Associate product.

Since it’s in early access, I’m curious whether there have been any surprises in how your users use it, and what pitfalls you’ve learned about so far from being ahead of the market.

Scott: I think the main thing we’ve learned is that users want consistency. Yeah, it’s cool to be able to do anything, but users want a sort of consistency. Focusing on a couple of really specific, popular use cases has been really good for our user experience.

I think of it more as sequencing. You should nail your core workflows first, and once you have those nailed, you broaden and broaden. I don’t think there’s one set point. It’s really a sequence: start by making sure you have really solid use cases out of the gate, and then expand from there.

Another thing I think about is, if you think about Spellbook Associate, there are really kind of two layers of planning. There’s the top level of, “Oh, I’m going to look at this document,” or “I’m going to revise that document.” And then there’s once it gets to a document, actually conducting all the changes it needs to make. That second part of actually going to the document and making the changes is still an immensely hard problem that agents help solve.

We have found it’s really good to focus on these hard steps. It’s almost like there’s an agent within an agent, and there’s no point having a high-level agent if you can’t do those steps really well. The hard step for us is taking a long, 100-page legal document plus instructions and manipulating the document accordingly. That’s really, really hard.
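To make the “agent within an agent” idea concrete, here is a minimal Python sketch built around Kenn’s company-rename example. It is an illustration under stated assumptions, not Spellbook’s implementation: the names DocumentTask, plan, edit_document, and run_rename are hypothetical, the top layer simply picks documents that mention the old name, and a plain string replace stands in for the genuinely hard inner step of editing a long legal document.

```python
from dataclasses import dataclass


@dataclass
class DocumentTask:
    """One unit of work chosen by the high-level planner (hypothetical)."""
    doc_name: str
    old: str
    new: str


def plan(documents: dict[str, str], old: str, new: str) -> list[DocumentTask]:
    """Top layer: decide which documents need any work at all."""
    return [
        DocumentTask(name, old, new)
        for name, text in documents.items()
        if old in text
    ]


def edit_document(text: str, task: DocumentTask) -> tuple[str, int]:
    """Inner layer: carry out the instruction inside a single document.

    A plain string replace stands in for the hard step of manipulating a
    long legal document. Returns the new text and the number of
    occurrences changed.
    """
    count = text.count(task.old)
    return text.replace(task.old, task.new), count


def run_rename(
    documents: dict[str, str], old: str, new: str
) -> dict[str, tuple[str, int]]:
    """Run the planner, then the per-document editor, without mutating the input."""
    results: dict[str, tuple[str, int]] = {}
    for task in plan(documents, old, new):
        results[task.doc_name] = edit_document(documents[task.doc_name], task)
    return results
```

Returning a per-document count of changes is one simple way to catch the “missed one doc” failure Kenn describes: a document the planner skipped, or one with zero changes, is easy to spot.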

Kenn: I want to shift gears a little to user perception and adoption. I think Spellbook is one of the most widely adopted legal tech AI tools, and lawyers are typically the most sensitive to errors because of the liability that comes with their job. Even a simple typo in a date is material when you draft contracts. How have you seen them adopt these tools? Is error risk a real concern for your users, and how do you mitigate it?

Scott: It hasn’t been a problem for us, and that’s because all of our workflows are designed to use track changes and to have this user step where they’re accepting or rejecting a change. The same with Associate – it’s changing the docs, but it’s using track changes. You still have to go through and accept or reject the changes.

We never alter anything without showing the diff. When someone saw our Associate video on Twitter, they said, “Oh, this is what makes agents make sense. This is also why Cursor works, and this is also why Devin works, because there actually is a really good, known UX already for reviewing and accepting changes.”

That’s why this works. Lawyers already do this all day: they go through changes and accept or reject them. And that’s what developers do all day: they go through pull requests and accept or reject them, or review their own changes. That’s what makes it work well, and it really puts the onus on the user to check what they’re reviewing. Because of that, we’ve never had any issues or worries about it.
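The “never alter without a diff” principle can be sketched as a small data structure: the AI only proposes changes, and nothing touches the text until a person has accepted or rejected every one. This is a minimal Python sketch under stated assumptions, not how Spellbook or Word track changes are implemented; ProposedChange and apply_reviewed are hypothetical names, changes are modeled as character-offset spans, and the spans are assumed not to overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedChange:
    """A tracked change suggested by the AI (hypothetical model)."""
    start: int                       # offset where the original span begins
    end: int                         # offset where it ends (exclusive)
    replacement: str                 # text the AI proposes instead
    accepted: Optional[bool] = None  # None means the user hasn't reviewed it yet


def apply_reviewed(text: str, changes: list[ProposedChange]) -> str:
    """Apply only the changes a user accepted; refuse while any are unreviewed.

    Assumes the proposed spans do not overlap.
    """
    pending = [c for c in changes if c.accepted is None]
    if pending:
        raise ValueError(f"{len(pending)} change(s) still awaiting review")
    # Work right to left so applying one change doesn't shift the offsets
    # of changes that appear earlier in the text.
    for change in sorted(changes, key=lambda c: c.start, reverse=True):
        if change.accepted:
            text = text[: change.start] + change.replacement + text[change.end :]
    return text
```

Raising an error while any change is unreviewed mirrors the workflow Scott describes: the human review step is mandatory, not optional.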

Kenn: Interesting. Now that you mentioned track changes, it sounds exactly like committing code to a repo. You have the lineage, you have the diff. It mirrors a lot with engineering. How does this apply to the Associate product? Does the user have to go through each doc and approve the changes? Is that the workflow you’re envisioning for Associate?

Scott: We actually have two versions. We have one that lives on your desktop and opens up Word and uses that Spellbook UI that I showed you, and you have a sidebar. Then we have another fully browser-based cloud version that just does everything in the background and hands you back a doc. Then you basically have to open the doc and go through it normally.

We’re getting feedback on whether users prefer the pure cloud experience or want it on their desktop, talking to Word. There are a bunch of trade-offs there that we have to think about. But both workflows still require someone to go in and accept the changes.

Kenn: I know you have a hard stop, so to be sensitive to your time, I want to shift to the last question, which is not often talked about (and feel free to keep it confidential if you don’t want to discuss it): how do you think about pricing and cost for Associate? It’s been a top concern, at least in my day job at Smartsheet. We’re always worried about AI costs blowing up. How do you price a product like Associate, and how do you think about its costs?

Scott: It’s a great question, and it’s very difficult because the costs are so dynamic. We’ve decided to go with basically a flat fee. We have a basic product and a premium tier that has more features. We actually don’t do any usage-based billing at all today. We might at some point.

We just try to make sure we’re taking margins and usage into account. If you think about the value of a lawyer’s time, lawyers bill somewhere in the range of $400 to $1,000 an hour. So it doesn’t take much time saved for us to make up our cost for them. We can make a pretty good case with flat-fee billing.

We think about usage billing all the time, but it’s just hard because things are so dynamic. Generally, we tell our dev team to use AI very aggressively and not to worry about cost. That’s been totally fine. There have been one or two moments when I’ve said, “Hey, you’re auto-running this huge job every time someone opens a doc; we can’t do that.” I do monitor the charts. But generally, the margins have been good.

That’s simplified things for us and for our customers. It is something we think about, too. It’s really hard because the costs are so variable. But our goal is to provide so much value to our customers that hopefully, our value-add is enough that we don’t have to worry about it too much.

Kenn: Okay, that sounds good. That’s a great point: lawyers’ hourly rates are so high that the value is high. Scott, I super appreciate the time.

Scott: Amazing, great. Thanks so much, Kenn. It was great to chat.

Thank you for reading! If you enjoyed this interview, who should we interview next? And if you haven’t subscribed, subscribe below to get the latest posts.
