I started writing this as a LinkedIn post but it got a little too long so I decided to publish it here instead.
Imagine this: Your newly hired engineer implements a new feature. The feature works but now a bunch of tests are failing.
When you ask about it, the engineer says: “Oh yeah I saw the failing tests. The feature is working though so we can leave them for now, sound good? ☺️”
What would you do?
This is exactly what Kiro.dev did:
Read on for my full experience using Amazon’s latest Agentic Coding tool.
What I set out to build
Amazon released Kiro.dev a couple of weeks ago and I was lucky enough to get in early on the public beta to try it out. Amazon describes Kiro as such:
Kiro helps you do your best work by bringing structure to AI coding with spec-driven development.
Wonderful! Just what Vibe Coding has been missing!
With my hands on the public beta, I needed an idea that was both useful and achievable within a few days — and I had just the thing: a CLI tool to autonomously create Substack drafts of my weekly special articles.
In fact, the latest edition was posted this morning and I used this tool to draft it.
At a high level, here’s what the tool needs to do:
Call the OpenAI API with a custom prompt I crafted with the help of the o3 reasoning model
The target model should be o4-mini (faster and cheaper via the API) with Web Search tool enabled
Once the response comes back, the tool automates a web browser (Chrome in my case) and creates a new draft in my Substack account (but doesn’t post it)
Finally, the tool outputs the draft link to the terminal so I can come back later to tweak the format and content before posting it.
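To make the first two steps concrete, here is a minimal sketch of the request such a tool might send. This is not the code Kiro generated: `buildDraftRequest` and the prompt text are hypothetical, and the `web_search_preview` tool type is an assumption about how the Responses API names its built-in web search tool, so check the current API reference before relying on it.

```typescript
// Hypothetical shape of the request body; only the fields this sketch needs.
interface ResponsesRequest {
  model: string;
  input: string;
  tools: { type: string }[];
}

// Builds the body for a POST to the /responses endpoint.
function buildDraftRequest(prompt: string): ResponsesRequest {
  return {
    model: "o4-mini",
    input: prompt,
    // Assumption: "web_search_preview" enables the built-in web search tool.
    tools: [{ type: "web_search_preview" }],
  };
}

// Sending it would be an ordinary authenticated POST, for example:
// await fetch("https://api.openai.com/v1/responses", {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
//   },
//   body: JSON.stringify(buildDraftRequest(prompt)),
// });

const exampleRequest = buildDraftRequest("Draft this week's special article.");
console.log(JSON.stringify(exampleRequest, null, 2));
```

Keeping the payload in a small pure function makes it easy to pin the model and tool choice with a unit test, separately from the browser automation.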
It was time to get my hands dirty.
Specs
When you start Kiro, you’ll notice you have two modes: “Vibe” and “Spec”. I went with “Spec” since that’s what I was interested in testing.
In Spec mode, you’re encouraged to describe your features in natural language and let Kiro guide you through 3 distinct phases: Requirements, Design and Tasks.
Requirements
After a few turns in the chat, Kiro created a comprehensive requirements doc that I could review, adjust and validate:
I was quite impressed by how detailed it was, and made only minor adjustments. Once you’re happy with it, you’re ready to move to the design phase.
Design
The Design document reminds me of many Architecture Decision Records (ADRs) I’ve created and reviewed over the years. It’s very detailed (I’m only showing a snippet above) and covers things like application interfaces, testing strategies and configuration management.
Again, after minor tweaks, I was ready for the tasks phase.
Tasks
Finally, we have tasks. It’s not too far-fetched to think that these could easily end up in Jira, Trello or Linear:
I particularly like how each task references the requirements it relates to. It definitely helps Kiro keep track of things, but as an engineer I also appreciate the clarity and added context.
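One practical benefit of linking tasks to requirements is that you can check coverage mechanically. The sketch below is purely illustrative: the `Task` shape, the requirement IDs and `uncoveredRequirements` are hypothetical and do not reflect Kiro's actual file format.

```typescript
// Hypothetical task record: each task lists the requirement IDs it addresses.
interface Task {
  id: string;
  title: string;
  requirements: string[];
}

// Returns the requirement IDs that no task references.
function uncoveredRequirements(allRequirements: string[], tasks: Task[]): string[] {
  const covered = new Set<string>();
  for (const task of tasks) {
    for (const req of task.requirements) {
      covered.add(req);
    }
  }
  return allRequirements.filter((req) => !covered.has(req));
}

const exampleTasks: Task[] = [
  { id: "1", title: "Call the OpenAI API", requirements: ["1.1", "1.2"] },
  { id: "2", title: "Create the Substack draft", requirements: ["2.1"] },
];

console.log(uncoveredRequirements(["1.1", "1.2", "2.1", "3.1"], exampleTasks)); // [ '3.1' ]
```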
With these key phases out of the way, it was time to unleash Kiro in autopilot mode.
The Good
I was impressed by Kiro. It really does put an emphasis on good software engineering principles, with the idea that you should be able to publish what you built to a production environment. Overall, it worked:
Key highlights:
Clarifying requirements and design choices before jumping into implementation ensures you and Kiro are on the same page. This also forms the basis of guardrails you can get Kiro to follow
The task list is updated as Kiro works through each task
The autopilot flow feels in line with what I would have done myself: constantly running/writing tests as it updates/implements features
Each time Kiro needs to run a command, it first asks for permission. You can then trust the command so you can be more hands off. The cool thing is that you can trust the full command or just a wildcard version. This is important as I’d feel comfortable trusting `npm test *` but I would not trust `npm *`.
When the chat history grows too large, it automatically offers to summarise it and start a new session. For this small application, I didn’t notice any significant loss in context.
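To see why the scope of a trusted wildcard matters, here is a minimal sketch of prefix-style matching. The `isTrusted` function is hypothetical and assumes a trailing ` *` means "this command plus any arguments"; Kiro's real matching rules may differ.

```typescript
// Hypothetical trust check: a pattern ending in " *" trusts the command
// before it with any arguments; any other pattern must match exactly.
function isTrusted(command: string, trustedPatterns: string[]): boolean {
  return trustedPatterns.some((pattern) => {
    if (pattern.endsWith(" *")) {
      const prefix = pattern.slice(0, -2);
      return command === prefix || command.startsWith(prefix + " ");
    }
    return command === pattern;
  });
}

const narrowTrust = ["npm test *"];
const broadTrust = ["npm *"];

console.log(isTrusted("npm test -- --watch", narrowTrust)); // true
console.log(isTrusted("npm publish", narrowTrust)); // false
console.log(isTrusted("npm publish", broadTrust)); // true
```

Under these assumptions, the narrow pattern lets the agent run tests freely while still asking before anything like `npm publish`, which the broad pattern would wave through.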
The Less Good
I mean, this is a simple tool. Does it need 675 tests? Absolutely not! If I had implemented this all myself, I would have focused on the highest value tests only. But comparing this to what normally happens with vibe coded apps (i.e.: no tests), it’s still a win. In addition:
Being a public beta, I ran into tons of rate limiting issues with the provided model (Claude Sonnet 4). And no, Kiro doesn’t yet support a Bring Your Own Model (BYOM) mode. This meant the experiment took way longer than it should have.
Refactoring is an area that needs improvement. While Kiro writes tests as you implement features, refactoring would often leave a bunch of failing tests (which, as shown at the beginning of this article, Kiro would refuse to fix!)
Instruction following is hit and miss. On one of my refactoring tasks, I asked Kiro to rename a few keywords across the documentation, source code and test files. It did that, but also decided to switch my API implementation from the OpenAI `/responses` API to the older `/chat/completions` API without my ever telling it to do so.
Similarly, as shown earlier, it can be a bit lazy about fixing the tests. I found that appealing to team spirit (e.g. “I can’t check this in while broken! It’ll impact the team!”) did the trick.
Conclusion
Kiro is a major step in the right direction. Spec-driven development can gradually build the context needed to guide the models into doing the right thing more often than not.
It’s also worth noting that Kiro isn’t the only tool doing this. Kilocode has a similar focus on the different phases of development and, best of all, is entirely open-source. I recommend you check it out for a different take.
If you’ve stayed with me until now, it’s only fair I share the code that Kiro built for me. Enjoy!