Inside Claude Code From the Engineers Who Built It
At Every, the team credits Claude Code with transforming the way they work. They now ship to codebases they barely know, each new feature makes the next easier to build, and even non-technical teammates confidently use the terminal. To explore how this happened, AI & I host Dan Shipper invited Claude Code’s creators—Cat Wu (@_catwu) and Boris Cherny (@bcherny) from Anthropic AI—to discuss what they’ve learned from building one of the most beloved AI engineering tools in the world. This episode is a must-watch for anyone—technical or not—who wants to understand how to use Claude Code like the people who built it. If you found this episode interesting, please like, subscribe, comment, and share. Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It’s usually only for paying subscribers, but you can get it here for free. To hear more from Dan Shipper: Subscribe to Every: https://every.to/subscribe Follow him on X: https://twitter.com/danshipper Build your first AI-powered app at ai.studio/build. Timestamps: 00:00:00 - Start 00:01:26 - Introduction 00:02:25 - Claude Code’s origin story 00:07:03 - How Anthropic dogfoods Claude Code 00:14:06 - Boris and Cat’s favorite slash commands 00:15:49 - How Boris uses Claude Code to plan feature development 00:21:53 - Everything Anthropic has learned about using sub-agents well 00:26:16 - Use Claude Code to turn past code into leverage 00:33:14 - The product decisions for building an agent that’s simple and powerful 00:36:38 - Making Claude Code accessible to the non-technical user 00:45:12 - The next form factor for coding with AI Links to resources mentioned in the episode: - Cat Wu: https://x.com/_catwu - Boris Cherny: https://x.com/bcherny - Claude Code: https://www.claude.com/product/claude-code
- Published Oct 29, 2025
- Uploaded Jun 12, 2026
- File type: Podcast
- Source: share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:00] What made it work really well is that Claude Code has access to everything that an engineer does at the terminal. Everything you can do, Claude Code can do. There's nothing in between. There's actually an increasing number of people internally at Anthropic that are using a lot of credits, spending over $1,000 every month. We see this power user behavior. This is something that they teach in YC. If you can solve your own problem, it's much more likely you're solving the problem for others. There's this really old idea in product called latent demand. [00:30] other use cases it wasn't really designed for. And you build for that because you kind of know there's demand for it. Do you think the CLI is the final form factor? Are we going to be using Claude Code in the CLI primarily in a year or in three years? Or is there something else that's better? [00:42] *music* [00:56] AMAR: This podcast is sponsored by Google. Hey, folks, I'm Amar, product and design lead at Google DeepMind. We just launched a revamped vibe coding experience in AI Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever. Just describe your app, and Gemini will automatically wire up the right models and APIs for you. [01:17] And if you need a spark, hit "I'm Feeling Lucky" and we'll help you get started. Head to ai.studio/build to create your first app. [01:27] Boris, thank you so much for being here. [01:29] Thanks for having us. Yeah. So for people who don't know you, you are the creators of Claude Code. Thank you very much from the bottom of my heart. I love Claude Code.
[01:39] Thank you. [01:40] That's amazing to hear. That's what we love to hear. [01:44] Okay, I think the place I want to start is when I first used it, there was like this moment. I think it was around when Sonnet 3.7 came out where I used it and I was like, holy shit. This is a completely new paradigm. It's a completely new way of thinking about code. And the big difference was you went all the way and just eliminated the text editor from [02:08] And you're just like, all you do is talk to the terminal. And that's it. And previous paradigms of AI programming, [02:18] previous harnesses have been like, you have a text editor, and you have the AI on the side, or it's the tab complete. So take me through like that [02:27] decision process, that process of architecting this new paradigm. [02:33] How do you think about that? Yeah, I think the most important thing is it was not intentional at all. [02:39] We sort of ended up with it. So at the time when I joined Anthropic, we were still on different teams. There was this predecessor to Claude Code. It was called Clyde, like C-L-I-D-E. And it was this research project. It took like a minute to start up. It was this kind of like really heavy Python thing. It had to run a bunch of indexing and stuff. And when I joined, I wanted to ship my first PR, [03:03] and I hand wrote it like a noob. I didn't know about any of these tools. Thank you for admitting that on the podcast.
[03:14] I didn't know any better. And then I put up this PR, and Adam Wolf, who was the engineering manager for our team for a while, he was my ramp-up buddy, and he just rejected the PR. And he was like, you wrote this by hand? What are you doing? Use Clyde. Because he was also hacking a lot on Clyde at the time. And so I tried Clyde. I gave it the description of the task, [03:33] one shot at this thing. [03:35] And this was like, you know, Sonnet 3.5. [03:38] So I still had to fix the thing even for this kind of basic task. And the harness was super old. So it took like five minutes to churn this thing out and just took forever. But it worked. And I was just mind blown that this was even possible. And that just kind of got the gears turning. Maybe you don't actually need an IDE. Yeah. [03:58] And then later on, I was prototyping using the Anthropic API. And the easiest way to do that was just building a little app in the terminal because that way I didn't have to build a UI or anything. And I started just making a little chat app. And then I just started thinking maybe we could do something a little bit like Clyde. So let me build like a little Clyde. And it actually ended up being a lot more useful than that without a lot of work. [04:28] to give the model tools, they just started using tools. And it was this insane moment. Like the model just wants to use tools. Like we gave it bash and it just started using bash, writing AppleScript to like automate stuff [04:39] in response to questions. And I was like, this is just the craziest thing. I've never seen anything like this. Because at the time I had only used IDEs with like, you know, like text editing, a little like one line autocomplete, multi-line autocomplete, whatever.
[04:52] So that's where this came from. It was this kind of convergence of like prototyping, but also kind of seeing what's possible in kind of like a very rough way. [05:01] And this thing ended up being surprisingly useful. And I think it was the same for us. [05:07] I think for me, it was like kind of Sonnet 4, Opus 4. That's where that magic moment was. Where I was like, oh my God, this thing works. [05:14] That's interesting. So like tell me about that tool moment, because I think that is one of the special things about Claude Code: it just writes bash and it's really good at it. And I think with a lot of previous agent architectures, or even for anyone building agents today, your first instinct might be, okay, we're going to give it a find-file tool and then we're going to give it an open-file tool, and you build all these like custom wrappers for, you know, [05:40] all the different actions you might want the agent to take, but [05:44] Claude Code just uses Bash and it's like really good at it. So how do you think about, um, [05:48] How do you think about what you learned from that? [05:51] - Yeah, I think we're at this point right now where Claude Code actually has a bunch of tools. I think it's like a dozen or something like this. We actually add and remove tools most weeks. So this changes pretty often. But today there actually is a search, there's a tool for searching. And we do this for two reasons. One is the UX, so we can show the result a little bit nicer to the user 'cause there's still a human in the loop right now for most tasks. [06:14] And the second one is for permissions. So if you say in your Claude Code settings.json, this file you cannot read, we have to kind of enforce this. We enforce it for bash, but we can do it a little bit more efficiently for permissions
[06:26] if we have a specific search tool. But definitely we want to unship tools and keep it simple for the model. Like last week or two weeks ago, we unshipped the LS tool. [06:37] Because in the past we needed it, but then we actually built a way to enforce this kind of permission system for Bash. [06:44] So in bash if [06:46] we know that you're not allowed to read a particular directory, Claude's not allowed to ls that directory. And because we can enforce that consistently, we don't need this tool anymore. And this is nice because it's a little less choice for Claude, a little less stuff in context. Got it. And how do you guys split responsibility on the team? [07:02] So [07:02] I would say Boris sets the technical direction and has been the product visionary for a lot of the features that we've come out with. I see myself as more of like a supporting role, from making sure that our pricing and packaging resonates with our users to making sure that we're shepherding our features across the launch process. [07:32] communicating that to our end users. And there's definitely some new initiatives that we're working on. I would say historically, a lot of Claude Code has been built bottoms up, like Boris and a lot of the core team members have just had these great ideas for to-do lists, sub-agents, hooks, like all these are bottoms up. [07:53] As we think about expanding to more services and bringing Claude Code to more places, I think a lot of those are more like, all right, let's talk to customers. Let's bring engineers into those conversations and prioritize those services and knock them out. Got it. What is ant fooding? Oh, ant fooding is... Oh, ant fooding? Oh, it means dog fooding.
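As a rough illustration of the permission point above (a denied path being enforced for Bash commands such as `ls`, which is what made a separate LS tool unnecessary), here is a minimal Python sketch. The `DENIED_PATHS` list and the `is_path_denied` and `bash_command_allowed` helpers are hypothetical names invented for this example. This is not how Claude Code implements its checks, and real shell parsing is considerably harder than the naive argument scan shown here.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical deny list, standing in for rules a user might configure.
DENIED_PATHS = [PurePosixPath("/repo/secrets")]


def is_path_denied(path: str) -> bool:
    """Return True if `path` is a denied path or lives underneath one."""
    candidate = PurePosixPath(path)
    return any(
        candidate == denied or denied in candidate.parents
        for denied in DENIED_PATHS
    )


def bash_command_allowed(command: str) -> bool:
    """Naive check: reject a command if any argument names a denied path.

    Assumption: arguments are plain absolute paths. Real commands can hide
    paths behind globs, variables, relative segments, or subshells, all of
    which this sketch ignores.
    """
    args = shlex.split(command)[1:]  # skip the program name
    return not any(
        is_path_denied(arg) for arg in args if not arg.startswith("-")
    )
```

With one consistent check like this applied to every shell command, `ls /repo/secrets` and `cat /repo/secrets/key.pem` are both refused by the same rule, so no per-operation tool is needed to enforce it.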
[08:17] Anthropic ants. I got it. [08:23] And so ant fooding is our version of dog fooding. Internally, over, I think, 70 or 80% of ants, technical Anthropic employees, use Claude Code every day. And so every time we are thinking about a new feature, we push it out to people internally and we get so much feedback. We have a feedback channel. I think we get a post every five minutes. And so you get really quick signal [08:47] on whether people like it, whether it's buggy, or whether [08:51] it's not good and we should unship it. You can tell. You can tell that someone that is building stuff is using it all the time to build it, [09:01] because like its ergonomics just make sense if you're trying to build stuff, and that only happens if you're like [09:07] ant fooding. Yeah. And I think that that's a really interesting paradigm for building new stuff, like that sort of bottoms up. I make something for myself. Tell me about that. [09:20] Yeah, and Cat is also so humble. I think Cat has a really big role in the product direction also. It comes from everyone on the team. And these specific examples, this actually came from everyone on the team. To-do lists and subagents, that was Sid. Hooks, Dixon shipped that. Plugins, Daisy shipped that. So everyone on the team, these ideas come from everyone. [09:40] And so I think for us, like we build this core agent loop and this kind of core experience. And then everyone on the team uses the product all the time. And so everyone outside the team uses the product all the time. And so there's just all these chances to build things that serve these needs. Like, for example, like bash mode, you know, like the exclamation mark and you can type in bash commands. This was just like many months ago. I was using Claude Code and I was going back and forth between two terminals and just thought it was kind of annoying.
[10:09] I asked Claude to kind of think of ideas. It thought of this like exclamation mark bash mode. And then I was like, great, make it pink and then ship it. [10:17] It just did it. And like, that's the thing that's still kind of persisted. And, you know, now you see kind of others also kind of catching on to that. That's funny. I actually didn't know that. And that's extremely useful because I always have to open up a new tab to like run any bash commands. So you just do an exclamation point and then it just like runs it directly instead of filtering it through all the Claude stuff. Yeah, and Claude Code sees the full output too. Interesting. [10:39] Perfect. So anything you see in the Claude Code view, Claude Code also sees. Okay, that's really interesting. And this is kind of a UX thing that we're thinking about. Like in the past tools were built for engineers, but now it's [10:49] equal parts engineers and model. And so like as an engineer, you can see the output, but it's actually quite useful for the model also. And this is part of the philosophy also, like everything is dual use. So for example, the model can also call slash commands. So like, you know, I have a slash command for slash commit, where I run through kind of a few different steps, like diffing and generating a reasonable commit message and this kind of stuff. I run it manually, but also Claude can run this for me. And this is pretty useful, because we get to share this logic, we get [11:19] We both get to use it. Yeah. What are the differences in designing tools that are dual use from designing tools that are used by one or the other? [11:30] Surprisingly, it's the same. Okay. So far. [11:34] Yeah, I sort of feel like this kind of elegant design for humans translates really well to the models. So you're just thinking about what would make sense to you and the model generally...
[11:45] It makes sense to the model too, if it makes sense to you. [11:47] Yeah, I think one of the really cool things about Claude Code being a terminal UI, and what made it work really well, is that Claude Code has access to everything that an engineer does at the terminal. And I think when it comes to whether the tool should be dual use or not, I think making them dual use actually makes the tools a lot easier to understand. It just means that, okay, everything you can do, Claude Code can do, there's nothing in between. Yeah. [12:12] That's interesting. Yeah, there are a couple of those decisions. So... [12:18] No, no code editor. It's in the terminal so it has access to your files, um, [12:24] And it's on your computer versus like in the cloud and a virtual machine. So you get like repeated [12:32] you get to use it in a repeated way where you can build up your CLAUDE.md file, or build slash commands and all that kind of stuff, where it becomes very composable [12:42] and extensible from a very simple [12:46] starting point, and I'm curious about how you think about, [12:50] you know, for people who are thinking about, okay, I want to build an agent, I want to build, probably not Claude Code, but like something else, how you get that simple package that [13:00] then can extend and be really powerful over time. [13:03] Hmm. [13:04] I'm [13:05] For me, I'd start by just thinking about it like developing any kind of product where you have to solve the problem for yourself before you can solve it for others. And this is something that they teach in YC: you have to start with yourself. So if you can solve your own problem, it's much more likely you're solving the problem for others. And I think for coding, starting locally is the reasonable thing. And now we have Claude Code on the web, so you can also use it with a virtual machine. And you can use it in a remote setting. And this is super useful when you're on the go.
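The dual-use idea discussed above (a slash command like commit that the engineer can run by hand and that the model can also call) can be sketched as a single command table with two entry points. Everything in this Python example is hypothetical: the `COMMANDS` registry, the `command` decorator, and the two handler functions are illustrative names, not Claude Code internals, and the `commit` body is a placeholder.

```python
from typing import Callable, Dict

# Hypothetical registry: one table of commands, two ways to invoke them.
COMMANDS: Dict[str, Callable[[str], str]] = {}


def command(name: str):
    """Register a function under a slash-command name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        COMMANDS[name] = fn
        return fn
    return register


@command("commit")
def commit(args: str) -> str:
    # Placeholder body; a real command would inspect a diff and write a message.
    return f"would commit with message: {args or '(generated)'}"


def handle_user_input(line: str) -> str:
    """Human path: the user types '/commit fix typo' at the prompt."""
    name, _, args = line.lstrip("/").partition(" ")
    return COMMANDS[name](args)


def handle_model_tool_call(call: dict) -> str:
    """Model path: the model emits a structured call to the same command."""
    return COMMANDS[call["name"]](call.get("args", ""))
```

Because both paths dispatch into the same function, the logic is written once and shared, which is the property the speakers describe as making dual-use tools easier to understand.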
[13:35] your phone. We started proving this out kind of a step at a time, where you can do @claude in GitHub. And I use this every day. On the way to work, I'm at a red light, and I probably shouldn't be doing this, but I'm on GitHub at a red light, and then I'm like, "@claude, fix this issue," or whatever. And so it's just really useful to be able to control it from your phone. And this kind of proves out this experience. I don't know if this necessarily makes sense for every kind of use case. For coding, I think starting local is right. [14:03] I don't know if this is true for everything though. [14:05] Got it. What are the slash commands you guys use? [14:08] Slash PR commit. [14:10] Yeah. Yeah, I think the PR commit slash command makes it a lot faster for Claude to know exactly what bash commands to run in order to make a commit. And what does the PR commit slash command do for people who aren't familiar? [14:26] Oh, it just tells it exactly how to make a commit. Okay. And you can say, okay, these are the three bash commands that need to be run. Got it. And what's pretty cool is also we have this kind of templating system built into slash commands. So we actually run the bash commands ahead of time, [14:43] they're embedded into the slash command, [14:46] and you can also pre-allow certain tool invocations. So for that slash command, we say allow, you know, git commit, git push, gh pr, and so you don't get asked for permission [14:57] after you run this slash command, because we have like a permission-based security system. And then also it uses Haiku, which is pretty cool. [15:04] So it's kind of a cheaper model and faster. Yeah, and for me, I use like commit, commit PR, feature dev, we use a lot, so like Sid created this one. It's kind of cool, it kind of like walks you through step by step
[15:17] building something. So we prompt Claude, like, first ask me what exactly I want, like, build the specification. And then, you know, kind of like build like a detailed plan and then make a to-do list, walk through step by step. So it's kind of like more structured feature development. Yeah. [15:33] And then I think the last one that we probably use a lot, so we use like security review for all of our PRs and then also code review. [15:39] So like Claude does all of our code review internally at Anthropic. You know, there's still a human approving it, but Claude does kind of the first step in code review. That's just a slash code review slash command. [15:49] Got it. Yeah, what are the things... I would love to go deeper into, like, the... [15:53] how do you make a good plan? So the sort of the feature dev thing, because I think there's a lot of like little tricks that I'm starting to find or people at Every are starting to find that work. And I'm curious, like what [16:04] What are the things that we're missing? So, for example, one unintuitive step of the, you know, plan development process is: [16:14] Even if I don't exactly know what the thing that needs to be built is, I just have like a little sentence in my mind, like I want feature X. I have Claude just like implement it, [16:23] just without giving it anything else. And I see what it does. And that helps me understand like, okay, here's actually what I mean, because it made all these different mistakes or it did something that I didn't expect that might be good. And then I use that, like the learning from the sort of throwaway development. [16:40] I just clear it out and then that helps me write a better plan spec for the actual feature development, which is something that you would never do before because it'd be too expensive to just like YOLO send in an engineer on a feature that you hadn't actually spec'd out.
But because you have Claude going through your code base and doing stuff, you can like learn stuff from it that helps inform the actual plan that you make.
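The slash-command mechanics described a little earlier (bash commands run ahead of time with their output embedded in the prompt, certain tool invocations pre-allowed, and a cheaper model chosen for the command) can be modeled roughly as follows. This is a Python sketch under stated assumptions: the `SlashCommand` dataclass, its field names, the `run_shell` helper, and the `"small-fast-model"` placeholder are all hypothetical and do not reflect Claude Code's actual command file format or implementation. The only details taken from the conversation are the three pre-allowed prefixes (`git commit`, `git push`, `gh pr`).

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def run_shell(command: str) -> str:
    """Run a shell command and return its stdout, stripped."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


@dataclass
class SlashCommand:
    prompt_template: str  # uses {name} placeholders for command output
    context_commands: Dict[str, str] = field(default_factory=dict)
    allowed_prefixes: List[str] = field(default_factory=list)
    model: str = "small-fast-model"  # placeholder, not a real model ID

    def render(self, runner: Callable[[str], str] = run_shell) -> str:
        """Run the context commands up front and embed their output."""
        context = {name: runner(cmd) for name, cmd in self.context_commands.items()}
        return self.prompt_template.format(**context)

    def is_preapproved(self, command: str) -> bool:
        """True if the command matches a pre-allowed prefix (no prompt needed)."""
        return any(
            command == prefix or command.startswith(prefix + " ")
            for prefix in self.allowed_prefixes
        )


commit_pr = SlashCommand(
    prompt_template="Status:\n{status}\n\nCommit, push, and open a PR.",
    context_commands={"status": "git status --short"},
    allowed_prefixes=["git commit", "git push", "gh pr"],
)
```

Calling `commit_pr.render()` inside a Git repository would embed the current status into the prompt before the model sees it, and `commit_pr.is_preapproved("git push origin main")` would let that command through without a permission prompt while anything outside the three prefixes would still be asked about.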
[17:01] Yeah, I feel maybe I can start and I'm curious how you use it too. I think there's like a few different modes maybe for me, like one is prototyping mode. So like traditional engineering prototyping, you want to kind of build the simplest possible thing that touches all the systems just so you can kind of get a vague sense of like, what are the systems? There's unknowns and just to kind of trace through everything. [17:31] checkpoint and then try again. [17:34] I think there's also... [17:36] maybe two other kinds of tasks. So one is just things that Claude can one-shot. And I feel pretty confident it can do it. So I'll just tell it and then I'll just go to a different tab and I'll shift tab to auto accept and then just go do something else or go to another one of my Claudes and tend to that while it does this. [17:52] But also there's this kind of like harder feature development. So these are, you know, things that maybe in the past would have taken like a few hours of engineering time. And for this, usually I'll shift tab into plan mode and then align on the plan first before it even writes any code. [18:06] And I think what's really hard about this is the boundary changes with every model in kind of a surprising way. Where the newer models, they're more intelligent. So the boundary of what you need plan mode for got pushed out like a little bit. Like before you used to need to plan, now you don't. [18:22] And I think this is a general trend of like stuff that used to be scaffolding: with a more advanced model, it gets pushed into the model itself, and the model kind of tends to subsume everything over time.
[18:32] Yeah, how do you think about like building an agent harness [18:37] that isn't just going to, like, you're not spending a bunch of time building stuff that is just going to be subsumed into the model in three months when the new Claude comes out? Like, yeah, how do you know what to build versus what to just say, well, [18:51] It doesn't work quite yet, but next time it's going to work. So we're not going to spend time on it. [18:55] I think we build most things that we think would improve Claude Code's capabilities, even if that means we'll have to get rid of it in three months. If anything, we hope that we will get rid of it in three months. [19:06] Thank you. [19:07] I think for now, we just want to offer the most premium experience possible. And so we're not too worried about throwaway work. [19:15] Interesting. [19:16] Yeah, and an example of this is something like even like plan mode itself. I think we'll probably unship it at some point when Claude can just figure out from your intent that you probably want to plan first. Or, you know, for example, I just deleted like 2000 tokens or something from the system prompt yesterday. Just because, like, Sonnet 4.5 doesn't need it anymore. [19:34] But Opus, Opus 4.1 did need it. - What about in the case where [19:40] The latest frontier model doesn't need it, but you're trying to figure out how to make it more efficient because you have so many users that maybe you're not going to use Opus or Sonnet 4.5 for everything. Maybe you're going to use Haiku. So there's a tradeoff between having a more... [19:56] elaborate harness for Haiku versus just like not spending time on it, using Sonnet, eating the cost, and working on more frontier type stuff.
[20:04] In general, we've positioned Claude Code to be a very premium offering. So our North Star is making sure that it works incredibly well with the absolutely most powerful model we have, which is Sonnet 4.5 right now. We are investigating how to make it work really well for like future generations of smaller models, but it's not the top priority for us. Okay. What do you think about the [20:29] You know, one thing that I notice is [20:32] We get models often, and thank you very much for this, we get models a lot before they come out and it's our job to kind of figure out is it any good, and [20:41] Over the last six months, [20:43] When I'm testing Claude, for example, in the Claude app, [20:48] With a new frontier model, it's actually very hard to tell whether it's better immediately. But it's really easy to tell in Claude Code because the harness matters a lot for the performance that you get out of the model. And you guys have the benefit of building Claude Code inside of Anthropic. So there's a much tighter integration between the fundamental model training and the harness that you're building. And they seem to really impact each other. [21:18] work internally and what are the benefits you get from having that tight integration. [21:24] Yeah, I think the biggest thing is like researchers just use this. And so, you know, as they see what's working and what's not, they can improve stuff. [21:31] We do a lot of evals and things like that to communicate back and forth and understand where exactly the model's at.
[21:38] Um, [21:40] But yeah, there's this frontier where you need to give the model a hard enough task to really push the limit of the model. And if you don't do this, then all models are going to look equal. But if you give it a pretty hard task, you can tell the difference. [21:53] What sub-agents do you use? [21:54] I have a few. I have like a planner sub-agent that I use. I have a code review sub-agent. Code review is actually something where sometimes I use a sub-agent, sometimes I use a slash command. So usually in CI it's a slash command, but in synchronous use I use a sub-agent for the same thing. [22:08] Um... [22:11] It's a good question. Yeah, maybe it's like a matter of taste. Yeah, I don't know. I don't know. I think it's maybe when you're running synchronously, it's kind of nice to fork off the context window a little bit because all the stuff that's going on in the code review, it's not relevant to what I'm doing next. But in CI, it just doesn't matter. Are you ever spawning like 10 subagents at once? [22:33] And for what? [22:34] For me, I do it mostly for like big migrations. Okay. That's like the big thing. Actually, we have, so this like code review slash command that we use, there's a bunch of subagents there. And so one of the steps is like find all the issues. And so there's one subagent that's like checking for CLAUDE.md compliance. There's another subagent that's looking through Git history to see what's going on. Another subagent that's looking for kind of obvious bugs. And then we do this like kind of deduping quality step after. So they find a bunch of stuff. A lot of these are false positives. And so then we spawn like five more subagents. [23:04] These are all just like checking for false positives. And in the end, the result is awesome. It finds like all the real issues without the false issues. That's great. I actually do that. So one of my non-technical Claude Code use cases is expense filing. So like, I'm in SF right now. 
So like I have all these expenses. And so I built this little Claude project that, or in Claude Code, that
[23:25] It uses one of these finance APIs to just download all my credit card transactions. And then it decides these are probably the expenses that I'm going to have to file. And then I have two subagents, one that represents me and one that represents the company. And they do battle to figure out what's the proper actual set of expenses. [23:47] It's like an auditor subagent and a pro-Dan subagent. So, yeah, that kind of thing [23:55] or... [23:56] pattern seems to be like an interesting one. Yeah, yeah, it's cool. I feel like when sub-agents were first becoming a thing, actually what inspired us, there was like a Reddit thread a while back, where someone made sub-agents for like, there was like a front-end dev and a back-end dev and like a, I think like a designer. Testing dev. Testing dev. Like there was like a PM sub-agent. And this is like, you know, it's cute. Like it feels like a little maybe too anthropomorphic. Maybe there's something to this, but I think like the value is actually like the uncorrelated context windows. We have these two context windows that don't know about each other. [24:25] And this is kind of interesting. And you tend to get better results this way. [24:29] What about you, Ari? Do you have any interesting sub-agents you use? So I've been tinkering with one that is really good at front-end testing. So it uses Playwright to see, all right, what are all the errors that are client-side, and pull them in and try to test more steps of the app. [24:48] It's not totally there yet, but I'm seeing signs of life and I think it's the kind of thing that we could potentially bundle in one of our plugins marketplaces.
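The code review flow described earlier (several subagents each looking for issues, a dedupe step, then a second group of subagents checking for false positives) can be sketched as a fan-out followed by a vote. In this Python example the "subagents" are plain functions standing in for independent model calls with separate context windows. The `review` function, the `Finder` and `Verifier` type aliases, and the strict-majority rule are assumptions made for illustration; the conversation does not say how findings are actually confirmed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Finder = Callable[[str], List[str]]    # diff -> candidate issues
Verifier = Callable[[str, str], bool]  # (diff, issue) -> does it look real?


def review(diff: str, finders: List[Finder], verifiers: List[Verifier]) -> List[str]:
    """Fan out finders, dedupe their findings, keep majority-confirmed ones."""
    # Each finder runs independently, mimicking uncorrelated context windows.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda find: find(diff), finders))

    # Dedupe while preserving first-seen order.
    candidates = list(dict.fromkeys(issue for batch in batches for issue in batch))

    confirmed = []
    for issue in candidates:
        votes = sum(verify(diff, issue) for verify in verifiers)
        if votes * 2 > len(verifiers):  # strict majority (an assumed rule)
            confirmed.append(issue)
    return confirmed
```

In practice each finder and verifier would wrap a separate agent invocation. The point of the sketch is only the shape: independent passes that do not share context, followed by an independent check that filters what the first pass reported.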
[24:59] Yeah, definitely. I've used something like that just with Puppeteer and just like watching it [25:05] build something and then open up the browser and then be like, oh, I need to change this. It's like, this is like, oh my God. Yeah, it's really cool. It's really cool. I think we're starting to see the beginnings of this like massive, like multi-massive subagents. I don't know what to call this, like swarms or something. There's a bunch of people. There's actually an increasing number of people internally at Anthropic that are using like a lot of credits every month. Like, you know, like spending like over a thousand bucks every month. And this percent of people is growing actually pretty fast. [25:35] And so what they're doing is like framework A to framework B. There's like the main agent, and it makes a big to-do list for everything. [25:42] And then just kind of map-reduce over a bunch of sub-agents. So you instruct Claude, like, yeah, start 10 agents and then just go like, you know, 10 at a time and just migrate all the stuff over. That's interesting. [25:53] What would be a concrete example of the kind of migration that you're talking about? [25:56] I think the most classic is like lint rules. So there's like, you know, there's some kind of lint rule you're rolling out. There's no autofixer because it's like, you know, like AST analysis can't really. It's kind of too simplistic for it. [26:07] I think other stuff is like framework migrations. Like we just migrated from like one testing framework to a different one. That's a pretty common one where it's super easy to verify the output. [26:15] One of the things I found... [26:17] is, and this is both for projects inside of Every and then just open source projects, it's like if you're someone building a product and you want to build a feature that's been done before, so maybe like an example that people might need to implement a bunch is like memory. How do you do memory? Um...
[26:36] because we have a bunch of different products internally, you can just like spawn Claude subagents to be like, how do these three other products do it? And there's like possibility for just like tacit code sharing where you don't need to like have an API or you don't need to like ask anyone. You can just be like, [26:48] How do we do this already? And then use the best practices to [26:53] build your own. And you can also do that with open source because there's like tons of open source projects where people are like, you know, they've been working on memory for like a year and it's like really, really good. And you'd be like, what are the patterns that people have figured out and which ones do I want to implement? [27:23] copy the relevant parts. Yeah. Have you found any use for like log files of, okay, you know, here's the full history of like how I implemented it, and like is that important to give to Claude, and how are you implementing that or making it useful for it? [27:41] Some people swear by it. There are some people at Anthropic where for every task they do, they tell Claude Code to write a diary entry in a specific format that just documents like, what did it do? What did it try? Why didn't it work? And then they even have these agents that like look over the past memory and synthesize it into observations. I think this is like the starting, budding thing. [28:04] There's something interesting here that we could productize, but it's a new emerging pattern that we're seeing that works well. I think the hard thing about one-shotting memory from just one transcript is that it's hard to know how relevant a specific instruction is to all future tasks. Like our canonical example is, if I say make the button pink, I don't want you to remember to make all buttons pink in the future. And so I think synthesizing the memory from a lot of logs
[28:33] is a way to find these patterns more consistently. It seems like you probably need, like, there's some things where you're going to know... [28:43] you'll be able to summarize, like synthesize or summarize, in this sort of top-down way. Like, this will be useful later, and you'll know the right level of abstraction at which it might be useful. But then there's also a lot of stuff where it's like, [28:56] you know, any given commit log, like make the button pink, could be useful for... [29:03] kind of an infinite number of different reasons that you're not going to know beforehand. So you also need the model to be able to look up all similar past commits and [29:14] surface that at the right time. [29:16] Is that something that you're also thinking about? [29:18] Yeah, I think there could be something like that. Maybe one way to see it is this kind of traditional memory storage work, like Memex kind of stuff, where you just want to put all the information into the system and then it's kind of a retrieval problem after that. [29:36] Yeah, I think as the model also gets smarter, I've seen it start to naturally do this, also with Sonnet 4.5, [29:42] where if it's stuck on something, it'll just naturally start looking, like we talked about before, using bash spontaneously, to just look through git history and be like, oh, okay, yeah, this is kind of an interesting way to do it. [29:53] Yeah. One of the things that we were talking about before we started recording, one of the things that we're doing inside of Every, I feel like it has really changed the way that we do engineering, because everyone is Claude Code-pilled, like CLI-pilled. And we have this engineering paradigm that we call compounding engineering where...
[30:13] In normal engineering, every feature you add makes it harder to add the next feature. And in compounding engineering, your goal is to make the next feature easier to build from the feature that you just added. And the way that we do that is we try to codify all the learnings from everything that we've done to build the feature. So how did we make the plan and what parts of the plan needed to be changed? Or when we started testing it, what issues did we find? What are the things that we missed? [30:43] the slash command so that the next time when someone does something like this, it catches it and that makes it easier. And that's why for me, for example, I can hop into one of our codebases and start [30:53] being productive even though I don't know anything about how the code works, because we have this built-up memory system of all the stuff that we've learned as we've implemented stuff. [31:05] But we've had to build that ourselves. I'm curious, are you working on that kind of loop so Claude Code does that automatically? [31:14] Yeah, we're starting to think about it. It's funny, we heard the same thing from Fiona. She just joined the team and she's our manager. She hasn't coded in like 10 years, something like that. [31:26] And she was landing PRs on her first day. [31:29] And she was like, yeah, not only did I kind of forget how to code and Claude Code made it super easy to just get back into it, but also I didn't need to ramp up on any context because I kind of knew all this. And I think a lot of it is about, like,
[31:43] When people put up pull requests for Claude Code itself, and I think our customers tell us that they do similar stuff pretty often, [31:51] if you see a mistake, [31:52] I'll just be like, @claude, add this to CLAUDE.md, [31:56] so that the next time it just knows this automatically. And you can kind of instill this memory in a variety of ways. So you can say, @claude, add it to CLAUDE.md. You can also say, @claude, write a test. You know, that's an easy way to make sure this doesn't regress. And I don't feel bad asking anyone to write tests anymore, right? It's just super easy. And I think probably close to 100% of our tests are just written by Claude. And if they're bad, we just won't commit them. And then the good ones stay committed. [32:22] And then also I think lint rules are a big one. [32:26] We actually have a bunch of internal lint rules. Claude writes 100% of these. And this is mostly just, like, @claude in a PR, write this lint rule. [32:34] And yeah, there's sort of this problem right now about, like, how do you do this automatically? And I think generally how Cat and I think about it is we see this power user behavior, [32:44] and the first step is how do you enable that by making the product hackable, [32:48] so the best users can figure out how to do this cool new thing. Right. But then really the hard work starts of, like, how do you take this and bring it to everyone else? And for me, I count myself in the everyone-else bucket. Like, you know, I don't really know how to use Vim. I don't have this crazy tmux setup. I have a pretty vanilla setup. So if you can... [33:07] make a feature that I'll use, it's a pretty good indicator that other kind of average engineers will use it. [33:12] That is interesting. Tell me about that, 'cause that's something I think about all the time is,
[33:17] Yeah. [33:18] Making something that is extensible and flexible enough that power users can find novel ways to use it that you would not have even dreamed of, but that's also simple enough that anyone can use it and be productive with it. And you can kind of pull what the power users find back into the basic experience. How do you think about making those design and product decisions so that you enable that? [33:39] In general, we think that every environment is a little bit different from the others, and so it's really important that every part of our system is extensible. So everything from your status line to adding your own slash commands through to hooks, which let you [33:55] insert a bit of determinism at pretty much any step in Claude Code. So we think these are [34:00] like the basic building blocks that we give to every engineer that they can play with. [34:06] Plugins was actually built by Daisy on our team. And this is our attempt to make it a lot easier for the average user like us to bring these slash commands and hooks into our workflows. [34:30] like slash commands and just let you write one command in Claude Code to pull that in for yourself. [34:36] Hmm. [34:37] There's this really old idea in product called latent demand, which I think is probably the main way that I personally think about product and think about what to build next. It's a super simple idea. You build a product in a way that is hackable, that is kind of open-ended enough that people can abuse it for other use cases it wasn't really designed for. Then you see how people abuse it, and then you build for that, because you kind of know there's demand for it. And like...
[35:01] When I was at Meta, this is how we built all the big products. I think almost every single big product had this nugget of latent demand in it. For example, something like Facebook Dating, it came from this idea that when we looked at who looks at people's profiles, I think 60% of views were between people of opposite gender, so kind of like the traditional setup, [35:22] who were not friends with each other. And so it's like, oh man, okay, maybe if we launch a dating product we can kind of harness this demand that exists. That's interesting. And for, you know, Marketplace it was pretty similar. I think it was like 40 percent of posts in Facebook Groups at the time were buy/sell posts. And so it's like, okay, people are trying to use this product to buy and sell. If we just build a product around it, that's probably going to work. And so we think about it kind of similarly, but also we have the luxury of building for developers. [35:52] As a user of our own product, it makes it so fun to build and use this thing. And so, yeah, like Cat said, we just build the right extension points, we see how people use it, and that kind of tells us what to build next. [36:02] Like, for example, we got all these user requests where people were like, dude, Claude Code is asking me for all these permissions. And I'm out here getting coffee. I don't know that it's asking me for permissions. How can I just get it to, like, ping me on Slack? And so we built hooks. Dixon built hooks so that people could get pinged on Slack. And you could get pinged on Slack for anything that you want to get pinged on Slack for. And so it was very much like... [36:29] people really wanted the ability to do something. We didn't want to build the integration ourselves. And so
[36:35] We exposed hooks for people to do that. The thing that makes me think of is you recently released, you kind of moved or rebranded how you talk about Claude Code to be this more general-purpose agent SDK. Was that driven by some latent demand where you sort of saw there's a more general-purpose use case for what you built? [36:57] We realized that, similar to how you were talking about using Claude Code for things outside of coding, we saw this happen a lot. We get a ton of stories of people who are using Claude Code to help them write a blog and manage all the data inputs and take a first pass in their own tone. We find people building email assistants on this. [37:27] agent that can just go on for an infinite amount of time as long as you give it a concrete task and it's able to fetch the right underlying data. So one of the things I was working on was I wanted to look at all the companies in the world and how many engineers they had and to create a ranking. And this is something that Claude Code can do even though it's [37:46] not a traditional coding use case. So I realized that the underlying primitives were really general. [37:51] As long as you have an agent loop that can continue running for a long period of time, and you're able to access the internet and write code and run code, [38:03] pretty much you can... [38:04] If you squint, you can kind of build anything on it. And I think at the point where we rebranded it from the Claude Code SDK to the Claude Agent SDK, there were already many thousands of companies using this thing. And a lot of those use cases were not about coding. So it's both internally and externally that we kind of saw that. It's like health assistants, financial analysts, legal assistants. It was pretty broad. Yeah. What are the coolest ones?
[38:33] I feel like, actually, you had Noah Brier on the podcast recently. I thought the Obsidian kind of mind-mapping, note-keeping use case is really cool. It's funny. It's insane how many people use it for this. Yeah. This particular combination. I think some other coding or kind of coding-adjacent use cases that are kind of cool: we have this issue tracker for Claude Code. The team's just constantly underwater trying to keep up with all the issues coming in. There's just so many. And so Claude dedupes the issues and it automatically [39:03] it. It also does first-pass resolution. So usually when there's an issue, it'll proactively put up a PR internally. And this is a new thing that Inigo on the team built. Um, [39:13] so this is pretty cool. There's also, like, on-call and kind of collecting signals from other places, like getting Sentry logs and getting logs from BigQuery and kind of collating all this. Plus, it's really good at doing this because it's all just bash in the end. And so these are all kind of these internal use cases that I saw. [39:31] So when it's collating logs or deduping issues, is that like you have Claudes continually running in the background? And is that [39:40] something that you're building for? [39:42] For that particular one, it gets triggered whenever a new issue is filed. So it runs once, but it can choose to run for as long as it needs. Got it. [39:51] What about the idea of Claudes always running? [39:53] Ooh, proactive Claudes. I think it's definitely where we want to get to. I would say right now, we're very focused on making Claude Code incredibly reliable for individual tasks. And if you think about
[40:07] Like if you think about multi-line autocomplete and then single-turn agents, and then now we're working on Claude Code that can complete tasks. I feel like if you trace this curve, eventually you go to even higher levels of abstraction, like even more complicated tasks. And then hopefully [40:24] the next step after that is a lot more proactivity. So just understanding what your team's goals are, what your goals are, being able to say, hey, [40:32] I think you probably want to try this feature, and here's a first pass at the code, and here are the assumptions I made, and are these correct? [40:38] I can't wait. And I think probably right after that is Claude is now your manager. [40:48] That's not in the plan. So everyone on the team was super excited that we were talking today and they gave me a bunch of questions and I want to make sure I hit all the questions. [41:01] Oh, here's a good one. Why did you choose agentic RAG over vector search in your architecture? [41:08] still relevant? So actually, initially we did use vector embeddings, and [41:16] they're just really tricky to maintain because you have to continuously re-index the code, and they might get out of date, and you have local changes, so those need to make it in. And then as we thought about what it feels like for an external enterprise to adopt it, we realized that this exposes a lot more surface area and security risk. We also found that actually Claude Code is really good, and Claude models are really good, at agentic search.
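The maintenance argument above is easier to see with a toy tool in hand. What follows is a minimal sketch, not Claude Code's actual implementation: `grep_repo` is a hypothetical helper standing in for the kind of grep-like tool an agent can call repeatedly, refining the pattern as it learns more. The point it illustrates is that nothing is indexed, so there is nothing to re-index: each call reads the working tree as it is at that moment, local edits included.

```python
import os
import re
from typing import Iterator, Tuple


def grep_repo(root: str, pattern: str, max_hits: int = 50) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative path, line number, line text) for lines matching a regex.

    Hypothetical helper for illustration. There is no index: every call scans
    the files currently on disk, so uncommitted local changes show up at once.
    """
    regex = re.compile(pattern)
    hits = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield os.path.relpath(path, root), number, line.rstrip("\n")
                            hits += 1
                            if hits >= max_hits:
                                return
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
```

An agent would typically call something like this several times in a row (a broad pattern first, then narrower ones based on what came back), which is the "agentic" part; a vector index would answer in one shot but has to be kept in sync with every edit.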
[41:46] agentic search and it's just a much cleaner deployment story. That's really interesting. [41:51] If you do want to bring semantic search to Claude Code, you can do so via an MCP tool. So if you want to manage your own index and expose an MCP tool that lets Claude Code call that, that would work. What do you think are the top MCPs to use with Claude Code? [42:07] Ooh, Puppeteer and Playwright are pretty high up there. Definitely, yeah. Sentry has a really good one. Asana has a really good one. [42:15] Do you think that there are... [42:18] any power user tips that you see from people inside of Anthropic or, you know, other people inside of organizations that are big Claude Code power users, that people don't know about but they should? [42:34] Um, one thing that Claude Code doesn't naturally like to do, but that I personally find very useful, is ask questions. You know, if you're brainstorming with a thought partner, a collaborator, usually you do ask questions back and forth to each other. And so this is one of the things that I like to do, especially in plan mode. I'll just tell Claude Code, like, hey, we're just brainstorming this thing, please ask me questions if there's anything you're [43:04] want you to ask questions, and it'll do it. And I think that actually helps you arrive at a better answer. [43:09] There's also so many tips that we can share. I think there's a few really common mistakes I see people make. One is, like you said, not using plan mode enough. This is just super important. And I think this is people that are kind of new to agentic coding. They kind of assume this thing can do anything, and it can't. It's not that good today. And it's going to get better.
[43:29] But today it can one-shot some tasks. It can't one-shot most things. And so you kind of have to understand the limits, and you have to understand where you get in the loop. And so something like plan mode, it can 2x, 3x success rates pretty easily if you land on the plan first. [43:43] Other stuff that I've seen power users do really well is at companies that have really big deployments of Claude Code. And now, you know, luckily, there's a lot of these companies, so we can kind of learn from them. Having a settings.json that you check into the codebase is really important, because you can use this to pre-allow certain commands so you don't get permission-prompted every time, and also to block certain commands. Let's say you don't want web fetch or whatever. And this way, as an engineer, I don't get prompted, and I can check this in and share it with the whole team. [44:13] I get around that by just using --dangerously-skip-permissions. Yeah, we kind of have this, but, you know, we don't recommend it. It's a model, you know, it can do weird stuff. [44:25] I think another kind of cool use case that we've seen is people using stop hooks for interesting stuff. So the stop hook runs whenever the turn is complete. So, like, it did some tool calls back and forth, whatever, and it's done and it returns control back to the user. Then we run the stop hook. So you can define a stop hook that's like, if the tests don't pass, return the text "keep going". [44:49] And essentially, you can just make the model keep going until the thing is done. And this is just insane when you combine it with the SDK and this kind of programmatic usage. You know, this is a stochastic thing. It's a non-deterministic thing. But with scaffolding, you can get these deterministic outcomes.
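The "if the tests don't pass, keep going" hook described above can be sketched in a few lines. This is a hedged illustration, not code from the Claude Code team. It assumes the hook contract as I understand it from the Claude Code hooks documentation: the hook command receives a JSON payload on stdin (including a `stop_hook_active` flag when the agent is already continuing because of a stop hook) and can print `{"decision": "block", "reason": ...}` to stop the turn from ending. Check those field names against the current docs. The `pytest -q` command and the `allow_repeat` switch are assumptions of this sketch.

```python
import json
import subprocess
from typing import Callable, Optional, Sequence


def run_tests(command: Sequence[str] = ("pytest", "-q")) -> bool:
    """Return True when the project's test command (assumed: pytest) exits with 0."""
    return subprocess.run(list(command), capture_output=True).returncode == 0


def stop_hook_response(
    hook_input: dict,
    tests_pass: Callable[[], bool],
    allow_repeat: bool = True,
) -> Optional[dict]:
    """Return None to let the turn end, or a "block" decision to keep the agent working."""
    # Assumed input field: true when the agent is already continuing because of
    # an earlier stop hook. With allow_repeat=False the agent gets one extra
    # pass at most, which guards against looping forever on an unfixable test.
    if hook_input.get("stop_hook_active") and not allow_repeat:
        return None
    if tests_pass():
        return None
    return {"decision": "block", "reason": "The tests do not pass yet. Keep going."}


def handle(
    stdin_text: str,
    tests_pass: Callable[[], bool] = run_tests,
    allow_repeat: bool = True,
) -> str:
    """Map the hook's stdin payload to the text the hook script should print."""
    response = stop_hook_response(json.loads(stdin_text or "{}"), tests_pass, allow_repeat)
    return json.dumps(response) if response else ""
```

A hook script would print `handle(sys.stdin.read())` and exit with status 0, and would be registered as a `Stop` hook command in the project's settings file. Keeping the decision in a pure function means the deterministic part of the scaffolding can be unit-tested without running an agent.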
[45:05] So you guys started this sort of CLI paradigm shift. Um... [45:12] Do you think the CLI is the final form factor? Are we going to be using Claude Code in the CLI primarily in a year or in three years? Or is there something else that's better? [45:21] I mean, it's not the final form factor, but we are very focused on making sure the CLI is the most intelligent that we can make it, and that it's as customizable as possible. [45:32] You can talk about the next form factors. [45:35] Yeah, I mean, Cat's asking me to talk about it because no one knows. This stuff's just moving so fast, right? No one knows what these form factors are. [45:45] Like right now, I think our team is in experimentation mode. So we have the CLI, then we came out with the IDE extension. Now we have a new IDE extension that's like a GUI. It's a little more accessible. We have @claude in GitHub, so you can just @claude it anywhere. Now there's Claude on web and on mobile, so you can use it in any of these places. And we're just in experimentation mode. So we're trying to figure out what's next. I think if we kind of zoom out and see where this stuff is headed, [46:13] I think one of the big trends is longer periods of autonomy. And so with every model, we kind of time how long the model can just keep going and do tasks autonomously and just, you know, in dangerous mode in a container, keep auto-compacting until the task is done. And now we're on the order of double-digit hours. I think the last model is like 30 hours. [46:32] Something like this. And, you know, the next model is going to be days. And as you think about kind of parallelizing models,
[46:38] there's kind of a bunch of problems that come out of this. So one is... [46:42] what is the container this thing runs in? Because you don't want to have to close your laptop. I have that right now because I'm doing a lot of DSPy. I don't know, I've only read it, but D-S-P-Y, DSPy, prompt optimization, and it's on my laptop, and it's like, I don't want to close it. I'm in the way, no, with my laptop open because I'm like, I don't want to close it. Yeah, that's right. Yeah, we've visited companies before, like customers. Everyone's just walking around with their Claude Codes. [47:09] Is this running? [47:12] from this mode. And then I also think pretty soon we're going to be in this mode of, like, Claudes monitoring Claudes. [47:17] And I don't know what the right form factor for this is, because as a human, you need to be able to inspect this and see what's going on. But also it needs to be Claude-optimized, where you're optimizing for bandwidth in the Claude-to-Claude communication. [47:30] So my prediction is... [47:34] terminal is not the final form factor. My prediction is there's going to be a few more form factors in the coming months, you know, maybe like a year or something like that. And it's going to keep changing very quickly. [47:45] What do you think about, you know, I teach a lot of Claude Code to a lot of Every subscribers. Thank you. You're welcome. Doing your work for you.
[47:57] And I think, like, one of the big things is just [48:01] the terminal is intimidating. [48:03] And just being on a call with subscribers being like, here's how you open the terminal, and you're allowed to do this even if you're non-technical. It's like a big deal. How do you think about that? Yeah, one of the people on our marketing team started using Claude Code because she was writing some content that touched on Claude Code. And I was like, you should really experience it. And she got like 30 pop-ups on her screen where she had to accept various permissions because she'd never used a terminal before. So I completely see eye to eye [48:32] with you on that. It's definitely hard for non-engineers. And there's even some engineers we've found who aren't fully comfortable with working day-to-day in the terminal. Our VS Code GUI extension is our first step in that direction, because you don't have to think about the terminal at all. It's like a traditional interface with a bunch of buttons. We are working on more [48:55] graphical interfaces. So Claude Code on the web [48:58] is a GUI. I think that actually might be a good starting point for people who are less technical. Yeah. [49:03] There was this magic moment maybe a few months ago where I walked into the office, and some of the data scientists at Anthropic sit right next to the Claude Code team. [49:14] And the data scientists just had Claude Code running on their computers. And I was like, what is this? Like, how did you figure this out? I think Brandon was the first one to do it. And he was like, oh, yeah, I just installed it. Like, I work on this product, so I should use it. And I was like, oh my God. So he, like, he figured out how do you like use a terminal and those? Yeah, it's like, you know, he hasn't really done this kind of workflow before. Obviously, like, very technical. So I think now we're starting to see all these kind of code-adjacent
[49:41] functions, people use Claude Code. And yeah, it's kind of interesting. From a latent demand point of view, these are people hacking the product. So there's demand to use it for this. And so we want to make it a little bit easier with more accessible interfaces. But at the same time, for Claude Code, we're laser-focused on building the best product for the best engineers. And so we're focused on software engineering and we want to make this really good, but we want to make it a thing that other people can hack. [50:09] Sometimes Claude Code will write code that's a bit verbose, but you can just tell it to simplify it, and it does a really good job. Interesting. And so, how and when are you doing that? So, you're using a slash command, or you're... [50:23] I just say it. I just say simplify it. Sometimes you're like, hey, this should be a one-line change, and it'll write five lines, and you're like, simplify it. And it understands immediately what you mean and it'll fix it. Yeah, I think a lot of people on our team do that too. [50:36] That's interesting. If you're saying that all the time, why not then [50:42] you know, push that into a slash command or the harness or something like that to make it just happen automatically? [50:49] We do have instructions for this in the CLAUDE.md. I think it impacts such a low percentage of conversations that we don't want it to over-rotate in the other direction. And then the reason why not a slash command is because you actually don't need that much context. I think a slash command is really good for situations where you would otherwise need to write two, three lines. But even for plan mode, you actually
[51:14] can use a few words, but sometimes... [51:18] it actually takes two or three lines to capture the entirety of what you want in plan mode. For simplify it, you can just write "simplify it" and it gets it. Yeah, that makes sense. [51:27] Cool. Yeah. Okay. Now we can, um, that's interesting. Yeah. But the stuff, you know, it still feels just so early. Yeah. You know, like we were talking before the recording about kind of where are we on the adoption curve, and it's still the Gaussian curve or whatever. It just feels like we're, you know, in the first 10% still. The stuff is going to change so fast. It's going to keep changing. [51:55] Even when I talk to researchers outside of Anthropic who've used Claude Code, they also get stuck on things like this, like not realizing that they can just tell the LLM to simplify it. And I think that just goes to show that even people who are working in this industry don't always realize that you can just talk to the model. That's the thing, I think that there's this underlying expectation that using AI shouldn't have to be a skill. [52:19] Like, because it just does whatever you say. And you're like, well, I mean... [52:24] whatever you say is going to matter for what it does. So if you can say things better, it's going to do better. [52:30] Yeah, I mean, it changes with every model, though. That's the hard part. Like, you know, prompt engineer was a job, and now famously it's not a job anymore. And there's going to be more jobs that are then not jobs anymore. These kind of little micro-skills that you have to learn to use this thing. And as the model gets better, it can just interpret it better.
[52:48] But I think that's also, for us, part of this kind of humility that we have to have building a product like this, that we just really don't know what's next. [52:55] And we're just trying to figure it out kind of along with everyone else. We're just here for the ride. And that's why it's cool that you're building it for yourself. Because I think that's the best way to know that. And this is what we do too, it's like, [53:07] you're sort of living in the future, you're using it all the time, and it's pretty clear what's missing. You're like, I just want this thing, and you can just do the next thing, rather than being like, hmm, let me ask some enterprise product manager at some gigantic company, what kind of AI feature do you want? And they're like, I don't know, put a little chatbot on the side of my IDE, and you're like, okay. [53:28] Yeah, this is the luxurious thing about building dev tools, right? You're your own customer. I think it's also really a unique thing about AI, because [53:37] it sort of reset the game board for all software. So, [53:43] you know, [53:45] we have Cora, this email assistant. And we have Sparkle, which organizes your files. And it's like anything that you want to use on your computer, if you're building it with AI, there's a good chance that it hasn't been done before, because the [53:58] whole landscape has been reset. And so it's a uniquely exciting time to build stuff for yourself. [54:04] Totally. I think it totally opens the playing field too. It's like, [54:08] any individual can now build an app to fill their need and then distribute it to everyone else. Yeah. It's really cool. [54:15] I've been prototyping all these random pet projects.
[54:19] Um... [54:20] I just moved into a new apartment and it's empty, and so I've been building this shopping advisor assistant on the Claude Agent SDK, because who has time to read all the reviews and look at all the options and find their pricing, and everything's really hard to discover. And so it just asks me a bunch of questions and I tell it what I want, and it shows me a bunch of, yeah, exactly, it shows me a bunch of photos, like different sofas and options and what people say online, and then I tell it what I don't like, and it literally feels [54:50] like working with a shopping assistant. [54:53] It's been really cool. That's really cool. I also have my little email response agent that drafts responses for me. But I don't use email that much, so... [55:03] Oh, and I knew it wasn't you responding. [55:07] That's why it's seven days delayed. [55:11] The agent's just doing a very thorough job. [55:15] The Agent SDK's cool though. [55:17] Yeah, it always just feels amazing how much we're able to build with such a small team. [55:24] Yeah, so I feel like the other thing that's really cool is that I think people are just shifting their mindset from docs to demos. Like internally, our currency is actually demos. It's like, [55:34] you want people to be excited about your thing? Show us 15 seconds of what it can do. And we find that everyone on the team now has this kind of indoctrinated in them. Demo culture, for sure. And I think that's better because there's a lot of things that you might have in your head that,
[55:51] if you're a great writer maybe you could figure out how to explain it but it's just even then it's just really hard to explain but if someone can see it they like get it immediately and i think that's happening for product building but it's also happening for [56:02] like all sorts of other types of creative endeavors, like making a movie, for example, like you had to [56:08] pitch it. But now you can just be like, I made the Sora video and like, you know, check that you can kind of see like the glimmer of the thing you're trying to make for very cheap. And so that means you don't have to spend time convincing people as much. You're gonna be like, here, I made it. [56:22] And also as a builder, you can just make it and then make it again and then make it again until you're happy. I feel like the flip side is you used to make a doc or whiteboard something. I would draw stuff in Sketch or Figma or whatever, and now we'll just build it until I like how it feels. And it's just so easy to get that feeling out of it now. I think it's like you could see it visually before or you could describe it in words, but it's like you could never get the vibe. And now the vibe is really easy. [56:52] plan mode like three times. Yeah. [56:55] Because of this. Like you built it and then you threw it out and rebuilt it and then threw it out and rebuilt it? Yeah, or like to do's, like Sid built the original version, like also like three or four, he built like three or four prototypes. And then I prototype maybe like 20 versions after that, like in like a day. [57:10] I think this is like a lot of pretty much everything we released. There was at least a few prototypes behind it. How do you like [57:18] keep track of and carry forward the things you learn from prototype to prototype. And especially if it's like, you know, some one person is prototyping it and then you're like, I'm going to take it over. I'm going to do 20 more. Like, how do you
[57:29] how do you maximize what you get out of that? [57:32] You know, there's maybe a few elements of it. One is the style guide. So there's some elements of style that we discover. And I think a lot of this is, like, building for the terminal, we're kind of discovering a new design language for the terminal and kind of building it as we go. And I think some of this you can codify in a style guide. So this is our CLAUDE.md. But then there's this other part of it that's kind of product sense, where [57:56] I don't think the model totally gets it yet. [57:59] And I think maybe we should be trying to find ways to teach the model this kind of product sense about, this works and this doesn't. Because in product, you want to solve the person's problem in the simplest way possible and then delete everything else that's not that and just get everything out of the way. So you align the product to the intent [58:17] as cleanly as possible, and maybe the model doesn't totally get that yet. [58:21] Yeah, it's never... [58:23] It doesn't really feel what it's like to use Claude Code. Like, the model doesn't use Claude Code. [58:29] Yeah. [58:31] Yeah. And so I think, you know, when Claude Code can test itself and it can kind of use itself, [58:37] we do this when developing, and it can see UI bugs and things like that. [58:42] I don't know. Maybe we should just try prompting it, though. [58:45] Honestly, a lot of the stuff is as simple as that. Like, [58:48] when there's some new idea, usually you just prompt it, and often it just works. Maybe we should just try that. [58:54] A lot of the prototypes are actually the UX interactions. And so I think once we discover a new UX interaction like shift-tab for auto-accept, I think Boris figured out.
[59:09] That was Igor, actually. Oh, Igor. Yeah, we went back and forth. Then new things can fit into that. [59:14] We did, like, doing prototypes for like a week. [59:18] Yeah, Shift+Tab felt really nice. And then [59:21] the current plan mode iteration uses Shift+Tab, because it's actually just another way to tell the model how agentic it should be. And so I think [59:33] as more features use the same interaction, you form a stronger mental model for what should go where. Yeah. Thinking, I think, is another really good one. This was before we released Claude Code, or maybe it was... the first thinking model was 3.7, I forget what the first one was. But yeah, it was able to think, and we were brainstorming: how do we toggle thinking? And then someone was just like, "What if you just ask the model to think in natural language, and it knows how to think?" And we're like, okay, sweet, let's do that. [1:00:03] And so we did that for a while, and then we realized that people were accidentally toggling it. So they were like, "don't think." And then the model's like, "Oh, I should think," and it just started thinking. And so we had to kind of [1:00:14] tune it so that "don't think" didn't trigger it. But then it still wasn't obvious, so we made a UX improvement to highlight the... Yeah, I saw that. Yeah, and that was so fun, and it felt really magical. When you type ultrathink, it's like a rainbow. Yeah. And then with Sonnet 4.5, we actually find a really, really big performance improvement when you turn on extended thinking. And so we made it really easy to toggle it, because sometimes you want it and sometimes you don't. For a really simple task, you don't
[1:00:44] want it to think for like five minutes. You want it to just do the thing. And so we used Tab as the interaction to toggle it, and then we unshipped a bunch of the thinking words. [1:00:51] Although I think we kept ultrathink just for sentimental reasons. It was such a cool UX. [1:00:57] Interesting. Do you think there's some... [1:01:01] Thank you. [1:01:02] There's some new metric that's about what you deleted. And I think programmers have always felt that deleting a bunch of code feels really good. But there's something about... [1:01:11] Because you can build stuff so fast, it becomes more important to also delete stuff. [1:01:18] I think my favorite kind of diff to see is a red diff. This is the best. Whenever I see one, I'm like, yeah, bring it on, another one, another one. But it's hard because anything you ship, people are using it, and so you've got to keep people happy. And so I think generally our principle is if we unship something, we need to ship something even better that people can take advantage of, that kind of matches that intent [1:01:40] even better. And yeah, I think this is kind of back to how do you measure Claude Code and its impact. This is something every company, every customer asks us about. Internally at Anthropic, I think we doubled in size since January or something like that, but productivity per engineer has increased almost 70 percent in that time, [1:01:59] measured by... I think we actually measured it in a few ways, but PRs are the simplest one and the main one. [1:02:06] But like you said, this doesn't capture the full extent of it. Because a lot of this is making it easier to prototype, making it easier to try new things, making it easier to... These are things that you never would have tried because they're way below the cut line. You're launching a feature and there's this kind of wish list of stuff. Now you just do all of it because it's so easy. 
You just wouldn't have done it. So yeah, it's really hard to talk about it. And then there's this flip side of it where more code is written, so you have to delete more code. You have to code review more carefully and automate code review as much as you can.
[1:02:35] There's also an interesting new product management challenge, because you can ship so much that [1:02:41] it ends up not feeling as cohesive, because you could just add a button here and a tab there and a little thing here. It's much easier to build a product that has all the features you want but doesn't have any sort of organizing principle, because you're just shipping lots of stuff all the time. [1:02:56] I think we try to be pretty disciplined about this, making sure that [1:03:00] all the abstractions are really easy to understand for someone, even if they just hear the name of the feature. We have this principle that I believe Boris brought to the team that I really like, where we don't want a new-user experience. Everything should be so intuitive that you just drop in and it just works. And I think that's set the bar really high for making sure every feature is really intuitive. How do you do that with a conversational UI, where there are no buttons [1:03:30] and knobs and it's just a blank text box to start? How do you think about making it intuitive? [1:03:36] There are a lot of little things that we do. We teach people that they can [1:03:41] use the question mark to see tips. We show tips as Claude Code is working. We have the changelog on the side. We tell you about, oh, there's a new model that's out. Or we show you at the bottom, we have a notification section for thinking. I think there are just subtle ways in which we tell users about features. I think the other thing that's really important is to just make sure that all the primitives are very clearly defined.
[1:04:09] Like, hooks have a common meaning in the developer ecosystem. Plugins have a very common meaning in the developer ecosystem. And it's just making sure that what we build matches what [1:04:20] the average developer would immediately think of when they hear that. There's also this progressive disclosure thing. [1:04:27] Like, you know, anytime in Claude Code, when you run it, you can hit Ctrl+O to see the full raw transcript, the same thing the model sees. And we don't show you this until it's actually relevant. So when there's a tool result that's collapsed, then we'll say use Ctrl+O to see it. So we don't want to put too much complexity on you at the start, because this thing can do, you know, anything. [1:04:48] I think there's this other kind of new principle, which we've just started exploring, which is that the model teaches you how to use the thing. [1:04:57] It kind of knows to look up its own documentation to tell you about it. But we can also go even deeper. For example, slash commands are a thing that people can use, [1:05:05] but the model can also call slash commands, and maybe you see the model calling it. [1:05:08] And then you'll be like, oh, yeah, I guess I can do that, too. Yeah, yeah, yeah, yeah. [1:05:11] Interesting. How has it changed? When you first started doing this, [1:05:16] Claude Code was this sort of singular thing, a singular way of thinking about using AI through a CLI. Other people had stuff like this, but it felt like a shift. And now there's a whole landscape where everyone is going CLI, CLI, CLI. How has that changed how you think about building, how it feels to build, and how are you dealing with the pressure of the race that you're in? I think for me, imitation is the greatest flattery. So it's, you know,
[1:05:46] is building inspired by this. And I think this is ultimately the goal: to inspire people to build this next thing for this just incredible technology that's coming. And that's just really exciting. Personally, I don't really use a lot of other tools. So usually when something new comes out, I'll maybe just try it to get a vibe. [1:06:04] But otherwise, I think we're pretty focused on just solving problems that we have and our customers have, and kind of building the next thing. [1:06:12] Cool. [1:06:15] Sweet. [1:06:16] Um... [1:06:17] I love this part of the interview too. Did we answer all of your team's questions? Oh, did we get through all my team's questions? Let's see. [1:06:26] I think we did. [1:06:29] Uh. [1:06:32] I'm curious also how you would answer the unshipping question. [1:06:35] Because also if you're doing this kind of AI-driven development, you ship a lot. You have a small team, so it's a lot of operational load. The reason I ask that is because I don't think we do a good job of that. [1:06:45] Um... [1:06:47] And... [1:06:49] I have this feeling that some of the products [1:06:52] are a little bit messy because of that. [1:06:55] And I think particularly for Cora, [1:06:59] um... [1:07:01] there's just a big product surface area and it can do a lot of different things. Like, we have an email assistant, so you can ask it, you know, "Tell me about the trip I'm taking," and it'll go through all your emails and summarize the trip. Or we have this feature where it automatically archives any email that you don't need to respond to immediately. And then twice a day you get a brief that summarizes all the stuff that you probably need to see but don't need to actually do anything with, and you just scroll through it and you're done.
[1:07:30] And there's just all this... [1:07:33] There's all this complexity around, you know, for example, [1:07:36] how are emails categorized? So now we have a whole view of... we have all these categorization rules and you can order them and whatever, but [1:07:44] it's just complicated and hard to communicate. And I want to retain [1:07:52] all the power and flexibility, but also you can't look at a screen and be like, "I have no idea what's going on. This is way too complicated." So... [1:07:59] That's, I'm just like... [1:08:00] I'm processing all that stuff. So the kind of deletion and unshipping idea feels like an interesting [1:08:09] cultural principle that we haven't really explored. [1:08:13] Yeah, it's really hard. I think there's a social cost to it too, where you kind of don't want to be the person who tells your coworker to unship their thing. It's definitely tricky. It's more than just the code. I definitely ran into this at Instagram, honestly. Because I think Facebook does a terrible job at unshipping. And we had this problem where every time we... I think even unshipping Pokes was [1:08:35] really spicy, because there's a bunch of these old-timers. They're like, "No, Pokes, you're never going to take it away." But if you look at the data, no one really uses it anymore. [1:08:43] But for sentimental reasons, they were kind of tied to it. And so at Facebook, nothing ever really got unshipped. It always got moved to a secondary place, you know, like an overflow menu somewhere that no one looks at, like a graveyard. And I think Instagram was just very principled. There was a very strong product and design point of view that was like, if this thing isn't used by half of people, you know, 50% of WAU or whatever, we're just going to delete it and deal with it. And then we'll figure out some next thing that's used by more people.
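The Instagram rule of thumb described here (if a feature isn't used by about half of people, delete it) can be sketched as a simple usage-share check. This is only an illustration under stated assumptions: the `FeatureUsage` type, the `unship_candidates` function, the choice of weekly active users as the denominator, and all the numbers are hypothetical, not anything the speakers describe as an actual tool or dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureUsage:
    """Hypothetical record: how many distinct users touched a feature this week."""
    name: str
    weekly_users: int


def unship_candidates(features, weekly_active_users, min_share=0.5):
    """Return names of features whose share of active users is below min_share.

    min_share=0.5 mirrors the rough "half of people" bar from the conversation;
    the real cutoff and the exact metric are assumptions.
    """
    if weekly_active_users <= 0:
        raise ValueError("weekly_active_users must be positive")
    return [
        f.name
        for f in features
        if f.weekly_users / weekly_active_users < min_share
    ]


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    usage = [
        FeatureUsage("feed", 900),
        FeatureUsage("pokes", 40),
        FeatureUsage("search", 500),
    ]
    print(unship_candidates(usage, weekly_active_users=1000))
```

A check like this only surfaces candidates. As the conversation points out, the harder part is the social cost of telling a coworker to unship their feature, plus shipping something better that serves the same intent.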
[1:09:11] I love it. Well, thank you. This is amazing. I'm really glad I got to talk to you, and keep building. [1:09:17] Thank you for having us. Yeah, thanks. [1:09:19] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. [1:09:49] you on the edge of your seat, [1:09:51] craving more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.