Kubernetes Podcast from Google: Episode 230 - Observability & Engineering Management, with Charity Majors

#230 July 9, 2024

Observability & Engineering Management, with Charity Majors

Hosts: Abdel Sghiouar, Kaslin Fields

Charity Majors is the co-founder and CTO of honeycomb.io. She pioneered the concept of modern Observability, drawing on her years of experience building and managing massive distributed systems at Parse (acquired by Facebook), then subsequently at Facebook, and at Linden Lab building Second Life. She is the co-author of Observability Engineering and Database Reliability Engineering (O’Reilly). She loves free speech, free software and single malt scotch.

News of the week

Links from the interview

Honeycomb.io
O’Reilly Book: Observability Engineering
O’Reilly Book: Database Reliability Engineering
Charity’s blog site: charity.wtf
Charity Blog: Questionable Advice: “My boss says we don’t need any engineering managers. Is he right?”
Daniel H. Pink book: “Drive: The Surprising Truth About What Motivates Us”
- In which, “He examines the three elements of true motivation—autonomy, mastery, and purpose-and offers smart and surprising techniques for putting these into action in a unique book that will change how we think and transform how we live.”
Charity blog on Stack Overflow: “Generative AI is not going to build your engineering team for you”
- In which she talks about how the tech industry is an apprenticeship industry.
Charity Majors in the Google Cloud Next 2024 Developer Keynote
honeycomb.io blog: “How Time Series Databases Work—And Where They Don’t” by Alex Vondrak
honeycomb.io blog: “Why Observability Requires a Distributed Column Store” by Alex Vondrak

Links from the post-interview chat

Transcript


KASLIN FIELDS: Hello and welcome to the Kubernetes Podcast from Google. I'm your host, Kaslin Fields.

ABDEL SGHIOUAR: And I am Abdel Sghiouar.

[MUSIC PLAYING]

KASLIN FIELDS: This week, we talk with Charity Majors. Charity is the co-founder and CTO of honeycomb.io. She pioneered the concept of modern observability. We dive into observability, engineering management, and blogging. But first, let's get to the news.

ABDEL SGHIOUAR: Vitess is a CNCF graduated project for deploying, scaling, and managing large clusters of database instances. On June 27, the community announced the release of Vitess 20, along with version 2.13 of the Kubernetes operator. Version 20 focuses on usability and maturity of existing features and continues to build on the solid foundation of scalability and performance established in previous versions.

KASLIN FIELDS: On June 20, Anthropic released Claude 3.5 Sonnet, their first release in the Claude 3.5 model family. The model costs $3 per million input tokens and $15 per million output tokens with a 200,000-token context window.

ABDEL SGHIOUAR: The call for proposals for KubeCon and CloudNativeCon India is now open until August 25. This is the first time KubeCon is coming to India. The event will be held in Delhi on December 11 and 12. If you are interested in speaking, make sure you get those submissions in.

KASLIN FIELDS: Azure announced support of the Open Container Initiative, or OCI, version 1.1.0 specification in Azure Container Registry. This enables ACR users to establish relationships between container images and artifacts; package, store, and distribute generic non-container artifacts such as OCI artifacts in ACR; and to discover and query artifact relationships.

ABDEL SGHIOUAR: VMware Tanzu announced Version 2.7 Greenplum. This release, announced on June 26, is designed to elevate data analytics and AI-driven initiatives. 2.7 introduced a variety of performance enhancements, optimization capabilities, ecosystem extensions, and more. The Tanzu team also announced a public beta for GenAI on Tanzu platform for Cloud Foundry.

KASLIN FIELDS: On June 20, Adobe published an end user journey report showcasing their contributions to the Cloud Native Computing Foundation, or CNCF, ecosystem. The report highlights their involvement in over 46 projects since 2015 and how this work has strengthened their engineering teams. And that's the news.

[MUSIC PLAYING]

KASLIN FIELDS: I am honored and absolutely delighted today to be speaking with Charity Majors. Charity is the Co-Founder and CTO of honeycomb.io. She pioneered the concept of modern observability, drawing on her years of experience building and managing massive distributed systems at Parse, which was acquired by Facebook, then, subsequently, at Facebook and at Linden Lab building "Second Life."

She is the co-author of "Observability Engineering" and "Database Reliability Engineering" by O'Reilly. And she loves free speech, free software, and single-malt scotch. Welcome to the show, Charity.

CHARITY MAJORS: Thank you for having me. That was a mouthful.

KASLIN FIELDS: Yeah.

CHARITY MAJORS: I've never heard it read out loud. I'm like, oof, wow. It should probably be shorter.

KASLIN FIELDS: Right? That's an adventure with bios. It's not like I have to read the whole bio, but I kind of want to because it's kind of awesome.

[LAUGHTER]

CHARITY MAJORS: Aww, thank you.

KASLIN FIELDS: So I am so excited to have you on the show today. I've been a big fan for a while, following you on social media. And something that I particularly love that you do are your blog posts. And a lot of your blogs focus on topics of both observability, which, obviously, you do a lot in that space--

CHARITY MAJORS: Do a lot, yeah.

KASLIN FIELDS: --and also engineering management. So today, I want to talk about both of those a bit. And I think we'll start with engineering management.

CHARITY MAJORS: Cool.

KASLIN FIELDS: So I know a lot of folks out there are probably like, I don't want to become a manager. I don't care about engineering management advice. But my philosophy on management-related content has always been, whether I go a management path or not, I'm always going to have a manager. So I hope everyone out there can find some benefit in learning a little bit about engineering management from you today, because I certainly do every time I read your stuff.

CHARITY MAJORS: I feel like management has this-- it comes with this baggage. Like, we have all these associations with management that come with hierarchy and being told what to do and being managed. Like, it's kind of a big word, right? And a lot of our assumptions come from the industrial days, where it was like, the eight-hour workday comes in the industrial time. And a lot of these concepts just map weirdly onto modern stuff.

I think that everyone should care about management, because if you care about engineering, you care about doing it well. And I think management is not the same as leadership because leadership is something that, as you become more senior in your career, everyone needs to be a leader. It is so tied up with who you are as a person. The kind of leader you become is so entwined with, who do you want to be when you grow up?

And I feel like approaching this from a more holistic sense of like, how do we do this really well? And then, like, what are the roles that need to happen? I think a lot of people never saw them being-- themselves as managers, but then they care about these things, and then a moment comes up when, oops, they're the best person for the job, or oops, they're the only person for the job. Somebody's got to do it.

As someone with a pretty anarchic streak-- I have never had any respect for my managers. I feel like anyone above me in the org chart is just somebody I get to beat up on. But it's taken me a while to come around to an idea of this where it's like, it's not about authority. It's not being told what to do. It's about, how do systems work?

And if you look at systems theory, hierarchy emerges in nature everywhere where you have self-organizing systems. And in a hierarchy, you look down for function and up for meaning. And it's the only way that you can scale an organization, because you can't have everyone talking to everyone. It just doesn't work.

And when you think of modularizing your code, you think about how much more efficient it is and how high-bandwidth your connection is to other stuff inside the modules, well, you have this abstraction. And I think that fits really well when you're talking about how teams function. You have a team with a mission and a charter, and your manager is the nervous system that connects you to other teams and charters.

So I feel like with a creative profession like engineering, it's actually really unhealthy to put a lot of hierarchy and authority onto this role when it's not one. You can envision the org chart as flipped over. It's more like, if you're a manager, it's your job to support. I've just been talking, talking. I can keep talking about this without stopping, so I'll pause for breath. TL;DR, you're right. Everyone should care about management, whether or not they are one or plan on being one.

KASLIN FIELDS: Exactly. I feel strongly about this as well, so we can really go in on this. But one thing that you said there that I really want to call out as an easy tidbit for folks to take away today, you look down for function and you look up for meaning. I think that'll resonate with a lot of people.

CHARITY MAJORS: Isn't that marvelous? I came to this topic when a friend of mine had this terrible experience as an engineering leader. He was hired as a VP to join this company and he had, like, 30 engineers under him and no managers. And his boss was like, well, we don't need managers. We're a startup. Everyone can report to you. Nobody needs-- you know? And he was just like, oh! And it was this existential crisis. Like, how do you-- about something that's so fundamental?

And that's why I wrote that blog post about, why do people need managers? And there's a really complicated answer that goes into systems theory and all this stuff. And then there's the really simple answer, which is that everybody uses engineering managers. It's a pattern that we figured out that works. You can hire people who know what to do with managers. You can read books and training.

It's like, why would you reinvent something this profound and complicated if you don't have to? If it's not your core function as a company, you should use boring technology and boring social systems.

KASLIN FIELDS: I think the concept of looking up for meaning also says a lot about how individual contributors can work with their managers. I think what you're saying there, too, is managers are-- they should be a support system for the engineers who are doing a lot of the functional work.

And so a lot of the job of management should be making sure that the engineers have the right opportunities to grow and also-- but also connecting that to the business, that you're working on the right things that are going to help the business grow. So I think there's a lot in that both for managers, but also for individual reports, of how do you manage up?

CHARITY MAJORS: Yeah. When I was young, it wasn't cool to care about the business. We were all like, ooh, we care about just the technology. But what is it that Daniel Pink said, that "What we all want out of our work is autonomy, mastery, and meaning." And what does your work mean if people don't use it? What does your work mean if it's not helping the company grow? And I feel like this is something that we're growing up as an industry and realizing that, oh, it's really cool to care about the business.

KASLIN FIELDS: Yeah. And I think that touches on a topic that I totally did not expect to go into today, but I think finding meaning in your work in some way, shape, or form-- hopefully, your managers can help you do that. But if you're not doing that, I think that just leads directly to burnout. So it's very important.

CHARITY MAJORS: Oh, yeah. Oh, my god, the times in my life that I have gotten close to burnout have not been the times when I was working hardest. They have been some of the times when I was working least. But--

KASLIN FIELDS: Yes.

CHARITY MAJORS: --what unified those experiences was that I didn't feel like what I was working on really made a difference. Or it shouldn't need to be done. Or something like-- or people were like, oh, you're amazing. And I'm like, I'm working an hour a day, this is not OK. Those are the times when I was like, is this industry for me?

I mean, the thing about working intensely is that some of the best times in my life at work have been some of the hardest times where I've worked the most. But it's because I was so invested in what was going on. And I feel like the thing about burnout-- and this goes back to hierarchy and authority-- is if the motivation is coming from someone else telling you what you have to do, it's burnout city.

But if it's coming from you and you're like, I'm invested, this is part of me, I care about this, then it's such a good sign. Yes, you need to pace yourself. Yes, you need to take breaks. But you're on fire, you know?

KASLIN FIELDS: And that is such a core concept that impacts everyone in engineering positions today. So maybe that is a good place to ask for advice on. If there are folks out there in the audience today who are engineering managers or ICs, what advice do you have for, perhaps, engineering managers and helping folks find a way to feel some ownership or feel more invested in their work?

CHARITY MAJORS: It's hard to give this advice kind of unilaterally because it changes so much. The first five to seven years of your career, you're just going to be doing what other people tell you a lot, and that's OK. They have your best interests at heart. It takes that long to forge a grown-up software engineer. We're an apprenticeship industry. But after that, yeah, you need to care about what you're doing.

I guess the biggest piece of advice that I would give all technical people is the role of management is changing really fast, and the field of engineering is changing really fast. And I think that it's not a good thing to get too attached to your identity as a manager or an engineer, because I think the strongest engineering managers are never more than four or five years away from writing code themselves. They need to go back to the well and freshen up those skills.

And I think the strongest super senior ICs that I've met have all been ones who spent time in management, because so many of these skills are sociotechnical. They're grounded in technology, but they build on top of that with people and human and organization and business skills. Thinking of yourself as a technologist is, I think, the key to a really long and fulfilling career.

I see too many managers who, like-- especially when you're young and you get "promoted," quote, unquote, to manager, and it's so flattering because you spent your whole life looking up to these people and feeling like you're being told what to do, and finally, it's your turn to tell people what to do, and then [MUTED] it, it doesn't actually work that way.

[LAUGHTER]

But they become managers really early and they get really attached to that. And the industry changes. It's really hard to have a 10-, 20-year career as an engineering manager without becoming one of those people who is just so detached from what's going on that you can't do your best for people and that they don't respect you as much as they would someone who really lives and breathes and walks their problems.

So yeah, that's my best advice. Think of yourself as a technologist. Think of your career as a long one. The best opportunities in tech have always come opportunistically for me. They're rarely the ones who I'm like, I want to be that in 10 years.

KASLIN FIELDS: Yeah.

CHARITY MAJORS: You just try to preserve optionality. You try and put yourself in interesting places with interesting people and who knows what's going to happen?

KASLIN FIELDS: I love that. And maybe that's a good transition, then, into talking about observability. So you've been involved in the whole concept of observability as it's evolved in the modern era. I bet that was one of those opportunistic moments for you, wasn't it?

CHARITY MAJORS: I did. And I was laughing because I kind of specialize in rage-driven development. I mean, I hate databases. And I've always ended up being the one running the databases. I hate monitoring, I hate graphs, I hate all this stuff. [LAUGHS] And somehow, these are the two books I've ended up writing, is databases and observability.

But hate is just another form of passion. It's like, apathy is, I guess, the opposite of love or hate. Love and hate are very tightly entwined. But yeah, observability wasn't really on the scene when Christine and I started Honeycomb in 2016. And in fact, we borrowed the term from control systems theory, where the definition of observability is, how can you understand the inner workings of a system just by observing its outputs?

And we read that and we were like, ah, classic light bulb moment. Just like, oh, my god, this is what we're trying to do! And it's been interesting. For the first few years, we were like, OK, here's observability, and you have to have high cardinality and high dimensionality and explorability and wide events and all this stuff.

And then around 2019, 2020, it started getting some attention. Then, all of a sudden, everybody in the world is like, we do observability, too! And so now, all of your monitoring and logging and tracing and APM and RUM, everybody's like, we do observability, which threw us for a loop. These days, I guess I like to think of observability as a property of complex systems, just like maintainability, reliability, performance, which puts the emphasis on the system rather than the tools. And I really like that.

But it leaves-- it begs a big question, which is that there's a really big step function in different power, ease of use, cost, everything between what I think of as the observability 1.0 world and the 2.0 world. And 1.0 world is one where you've got three pillars. You've got metrics, logs, and traces.

And you're probably paying to store your data in a RUM tool, in a profiling tool, in a logging tool, in a metrics tool, in a dashboarding tool, in all these different tools. And you, the engineer, sit in the middle and what are you doing? Well, you're eyeballing shapes on graphs and going, that's probably the same as that. Or you're copy-pasting IDs around. You're like, well, this log line, I hope I captured this trace, and you paste it in.

And it's expensive. It's unwieldy. People can get really good at it, but I feel like this model has come to the end of its natural lifecycle. Observability 2.0 tools are ones where you have a single source of truth, arbitrarily wide structured data blobs with no indexing, no schemas, no having to predefine metrics. It's just anything, any data type you might want, you just throw it in.

And you emit these wide structured log events, either one per request per service or one per span. And then you put them in a backend that lets you explore. So it's a lot more like a business intelligence tool, where you can slice and dice, and you can zoom in, and you can zoom out, and you can-- because data is made valuable by context.

And the problem with the 1.0 world is that all your context is scattered around and you can't connect one piece to another. And in the 2.0 world, it's all connected, so it's just a data type. You don't have to worry about, oh, is this tagged, too much cardinality? Oh, is this number a counter or-- you just throw it in and then you just use it just as a data struct.

And it's so simple. What are we supposed to do for breaking changes? The semantic versioning says, oh, you should only do a breaking change when it's like a backwards-incompatible breaking change. And I think the one source of truth versus many sources of truth is definitely a breaking change.

But it means that your cost doesn't go up 7x your request traffic. It means your cost goes up as you get more value out of your tool. That's one of the biggest problems today, is as costs go up, the value that people are getting out of the tools is going down. [LAUGHS]

So it's changed a lot. But I think what we're starting to see a movement picking up steam where it's like, OK, the metric has been the source of truth that we've been building tools on for 40 years now. It was built for a world when hardware was incredibly expensive and there wasn't a lot of data. And the metric is a powerful tool for summarizing vast quantities of data. It doesn't help you explore your systems. It doesn't help you understand your systems or your code at all.
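As an illustration of the wide structured events described in this answer, here is a minimal sketch of emitting one event per request per service. The service name, field names, and values are all hypothetical, and printing a JSON line stands in for whatever transport a real backend would use; this is not Honeycomb's API.

```python
import json
import time
import uuid


def handle_request(user_id: str, endpoint: str) -> dict:
    """Build ONE wide event for a single request handled by a single service."""
    start = time.monotonic()
    event = {
        "timestamp": time.time(),
        "service": "checkout",          # hypothetical service name
        "trace_id": uuid.uuid4().hex,   # lets events from other services be joined later
        "endpoint": endpoint,
        "user_id": user_id,             # high-cardinality fields are just fields
        "build_id": "2024-07-09.1",     # hypothetical build identifier
        "feature_flags": ["new_cart"],  # hypothetical flag name
    }

    # Keep adding context to the same event while the request is processed.
    # No schema or metric is predefined; any key can be attached.
    event["cart_items"] = 3
    event["db_query_count"] = 2
    event["status_code"] = 200

    event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
    return event


def emit(event: dict) -> str:
    """Serialize the event as one JSON line (stand-in for sending it to a backend)."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


if __name__ == "__main__":
    emit(handle_request("user-42", "/cart/checkout"))
```

The point of the sketch is that the request's context travels together in one record, so nothing has to be correlated by eye or by pasting IDs between tools afterwards.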

KASLIN FIELDS: I love the idea of having the system define what observability means. I don't know if that's a good way of saying it, but I'm actually thinking of this in terms of systems that are not technology systems. The first things that came to mind, actually, for me were, like, banking, honestly, trying to understand the different accounts that I have, and how much is in these accounts, and how are they invested, and all of those kinds of things.

That is something that I really want observability on. I'm actually working on a project right now to create an app to-- I hadn't thought about it this way, but to implement observability in my closet, because I have too many clothes and I don't know what clothes I have. And in both of these real-world cases, what I'm trying to do is answer questions about a system. And that's what you're trying to do here, too, in the technology observability sense.

CHARITY MAJORS: Yeah, absolutely. There are so many sociotechnical ripple effects and ramifications that come from this one small thing. Is it one source of truth or many? But one of my favorites is that, historically, in the observability 1.0 world, the debuggers of last resort, the people who understand the system best, are always the people who have been there the longest. They've been through the most outages. They remember writing the most code.

And it's depressing, I think, to join a team and know you can never catch up with the people who know it best. And that's just not true in an observability 2.0 world. In the observability 2.0 world, you don't have to make all these intuitive leaps and guesses and memories of past outages. The data is in the tool. The context is there. You can ask the questions you want to know.

And so the best debuggers are the people who are the most curious, the people who look at their changes every day, the people who are used to using the tools and used to taking a couple of minutes at the end of their lunch break and teasing apart what's going on. Honestly-- so I am not an early adopter. When I was at Facebook, I was firmly in the grumpy old man lane, because I knew how to use our Nagios and Ganglia tools. Like, I was a whiz at them.

And it wasn't until a couple of our engineers started feeding some data sets into a tool there called Scuba. And the whippersnappers started debugging problems faster than I could. I didn't think that was possible. I'm like, how are you doing this? I was professionally humiliated, you know?

But it's really exciting. It's really exciting when people who are new to the codebase can pick it up and see things and understand things that even the people who have been working on it for 10 years couldn't see or understand using the old tools.

KASLIN FIELDS: I don't know how you phrased it earlier, but you had a short little tidbit about context really being the value in observability.

CHARITY MAJORS: Data is made valuable by context. A data island is not valuable. The more connected it is to other things, the more valuable it is.

KASLIN FIELDS: That, I feel like, is really at the core of all of this. So it's not just about having all of the data, like you were saying. The problem with traditional observability tools is that you have all the data, and the data just keeps getting bigger and bigger and bigger, but you can't understand it because you don't have the context. So what you're aiming for--

CHARITY MAJORS: Exactly.

KASLIN FIELDS: --is a way to have all of the data with its context so that you can answer questions more easily and--

CHARITY MAJORS: Exactly.

KASLIN FIELDS: --develop that understanding even if you don't have it innately.

CHARITY MAJORS: Most of our tools are built on the metric, and the metric is literally a number. All of its context was stripped away at write time. You can never get it back. That's why you're over here eyeballing the shape on this dashboard and that shape on that dashboard going, they're probably the same ones. Nothing connects them.

You have no way of answering that. You threw away that data when you gathered it. You're like, this metric is over here. That metric is over there. Never the twain shall meet. You can only guess. And it's just wild to me that it's 2024 and we're still storing all this incredibly expensive data that doesn't have any context.
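The loss of context described above can be sketched in a few lines. This is a toy comparison under assumed data: the events, field names, and both helper functions are hypothetical. One function keeps only a counter chosen in advance, the other keeps raw events and groups them by whatever fields are asked for later.

```python
from collections import Counter

# Hypothetical raw events, one per request.
EVENTS = [
    {"endpoint": "/cart", "platform": "android", "build_id": "b42", "status": 500},
    {"endpoint": "/cart", "platform": "android", "build_id": "b42", "status": 500},
    {"endpoint": "/cart", "platform": "ios", "build_id": "b41", "status": 200},
    {"endpoint": "/cart", "platform": "android", "build_id": "b41", "status": 200},
    {"endpoint": "/home", "platform": "ios", "build_id": "b41", "status": 200},
    {"endpoint": "/home", "platform": "android", "build_id": "b42", "status": 200},
]


def write_time_metric(events):
    """Aggregate while writing: only an errors-per-endpoint counter survives."""
    errors_by_endpoint = Counter()
    for e in events:
        if e["status"] >= 500:
            errors_by_endpoint[e["endpoint"]] += 1
    # platform and build_id are gone; no later query can bring them back.
    return errors_by_endpoint


def read_time_count(events, group_by, where=lambda e: True):
    """Aggregate while reading: group stored raw events by any fields."""
    counts = Counter()
    for e in events:
        if where(e):
            counts[tuple(e[field] for field in group_by)] += 1
    return counts


if __name__ == "__main__":
    print(write_time_metric(EVENTS))
    print(read_time_count(EVENTS, ["platform", "build_id"],
                          where=lambda e: e["status"] >= 500))
```

With only the counter, the most that can be said is that `/cart` had two errors. With the raw events, the same two errors can be grouped by platform and build after the fact, which is the kind of question a pre-aggregated metric cannot answer.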

KASLIN FIELDS: Yes. I actually was just having a conversation with some engineers who work on Kubernetes-- open source mainly-- about, how do you do that? Kubernetes logs are at so many different levels because we've got the application, you've got the container, you've got the system of Kubernetes itself.

And so you have all of this complexity and you have logs at the different levels. And if you have something that goes wrong and you get the error at the top level, it often doesn't tell you anything. And you have to dive down into the deeper levels to find it. So having that connection of, this is where that came from, and having more context, is how you can actually debug things.

CHARITY MAJORS: Exactly.

KASLIN FIELDS: Cool. So this is what you're aiming for with Honeycomb. Can you tell me a little bit about how Honeycomb addresses this challenge?

CHARITY MAJORS: Yeah. Honeycomb was built to spec for what we call observability 2.0. We're big fans of OpenTelemetry. But honestly, we're agnostic about how the data gets bundled and shows up. When it shows up at our APIs, we accept structured data. And we recommend that people-- I mean, tracing is-- I think tracing has gotten a bad rap because the first few generations of tracing tools were so hostile to users, so let's put it that way.

Every place I know of that rolled out a tracing tool, two or three years later, you don't have a team where every engineer uses the tracing tool. You have a tool where the people who set up the tracing tool use the tracing tool. And the engineers who need traces go to those people and ask them for help when they need a trace. They're just absurdly walled off and difficult.

But Honeycomb accepts arbitrarily wide structured data blobs, and we store the raw events. In a 1.0 world, you did your aggregation at write time. You made your decisions about which questions you were going to be able to ask when you wrote the data out, and you can't really ask new ones. With Honeycomb, we store these raw wide events, and then we do the aggregation at read time. So you can ask anything. You can be like-- and not only that, but you can ask the machine to do it for you.

So one of my favorite things is called BubbleUp, which is just this thing-- because we've stored all these raw events, any graph, any dashboard, any SLO that you have, if you see something, you're like, huh, I wonder what that is, you draw a little bubble around it, and we compute for all the dimensions inside the bubble versus the baseline outside the bubble and sort and lift them. So you're like, what's this spike?

And then immediately, you can see, oh, inside the spike are-- they're only events going from Android devices using this version from this language-- using this language pack, using this build ID, using this application ID, using these feature flags. Because so much of debugging is, here's the thing I care about. Why? Why do I care about it? What's different about this thing?

And this is where I feel like the ways we've debugged in the past have relied on you knowing what the answer is before you can ask the question. It's very search-first. You have to know what string to search for, what metrics to look for. You have to know what you're looking for before you can find it, which is just so ass-backwards. But anyone can go, I care about that.

Why? And when it comes to debugging, the debugging workflow should usually be either, here's my SLO, it's burning down faster than it should. So tell me, what's different about these events that are violating my SLO versus the baseline? Oh, now I immediately know what to go fix. That's scenario 1.

And scenario 2 is-- see, another thing about 1.0 versus 2.0 is that observability 1.0 was very much about debugging, fixing problems, operating your code. And 2.0 is like the substrate of observability-driven development.

It's like, it's what your development is based on. It's what allows you to form these really tight feedback loops where you're instrumenting your code as you go, and you deploy it and it's there, and you're like, OK, looking at the lens of the instrumentation I just wrote, what's different? Is it doing what I expected it to do? Does anything else look weird? You don't know if anything's wrong or not. But that point right there is your best chance to ever find a problem.
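The "what's different about this thing?" comparison described in this answer can be approximated with a small sketch. This is not Honeycomb's BubbleUp implementation; it is a simplified illustration under assumed data, with hypothetical function and field names, that ranks each field/value pair by how much more common it is in the selected events than in the baseline.

```python
from collections import Counter


def value_shares(events, field):
    """Fraction of events that carry each value of `field`."""
    counts = Counter(e.get(field) for e in events)
    total = len(events) or 1
    return {value: n / total for value, n in counts.items()}


def compare_selection(selected, baseline, fields):
    """Rank (difference, field, value) rows, most over-represented in `selected` first."""
    rows = []
    for field in fields:
        inside = value_shares(selected, field)
        outside = value_shares(baseline, field)
        for value, share in inside.items():
            rows.append((share - outside.get(value, 0.0), field, value))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical events inside the "bubble" (for example, the slow or failing ones).
    selected = [
        {"platform": "android", "build_id": "b42", "region": "us"},
        {"platform": "android", "build_id": "b42", "region": "eu"},
    ]
    # Hypothetical events outside the bubble.
    baseline = [
        {"platform": "android", "build_id": "b41", "region": "us"},
        {"platform": "ios", "build_id": "b41", "region": "eu"},
        {"platform": "ios", "build_id": "b41", "region": "us"},
        {"platform": "android", "build_id": "b41", "region": "eu"},
    ]
    for diff, field, value in compare_selection(
        selected, baseline, ["platform", "build_id", "region"]
    ):
        print(f"{field}={value}: {diff:+.2f}")
```

In this toy data the build ID stands out most, the platform less so, and the region not at all. A comparison like this only works because the raw events keep every dimension; it needs no prior guess about which string or metric to search for.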

KASLIN FIELDS: And that's kind of what you did in the Google Cloud Next keynote. You gave an example of this. And I think, also, on social media-- I've seen you post a few times, usually a retweet of someone who's like, hey, this happened. And it was observability-driven development. Those are very interesting situations.

CHARITY MAJORS: Yeah. Because the cost of finding and fixing bugs goes up exponentially from the moment that they're written. If you don't find it shortly after you've deployed it, it's probably not going to be you who finds the problem. It's going to be a user or a customer or another engineer at some point down the line, months, years, never.

And so finding that feedback loop and making it fast, it just-- and it's what makes engineering fun, is having this constant conversation with your code, just being able to move fast with so much confidence. The whole debate about, should we deploy on Fridays or not, that speaks of so much fear to me, just this fear of, so often, we deploy things and we have no idea if they work or not. We don't know what's going to page us. We are not confident that this worked.

And of course they aren't confident, because all they have are aggregates and random exemplars. They can't actually see what they deployed and how it's different from before they deployed. Of course they're not confident. So I think it's really just like-- it's a sea change in how we think about building and shipping our code.

KASLIN FIELDS: So it feels like this observability 2.0 concept is really about being an engineer who is just fascinated with your code, and you're excited to learn about how it works when it's out there in the field. And if you have the ability, the tooling, where you can do something and then just go and ask questions about it, I feel like everyone's asking those questions, but you usually can't see the answers.

CHARITY MAJORS: Exactly. Not only that, but you're used to getting punished for your curiosity. I remember when we put a new engineer on call and they start out curious, they're like, what is this? What is this? And you're just like, oh, grasshopper, don't pick up that rock.

[LAUGHTER]

It's going to take you on a long and winding road. It's going to be five days later, you're not going to have gotten anything done. Just don't even look at it, you know? Well, that sucks.

KASLIN FIELDS: And you're going to end up in all of these weird places and it's just going to be pain and suffering.

CHARITY MAJORS: Because it's so true that right now, everyone's distributed systems are broken in so many ways. Like, so many ways. One of the most entertaining things is like, we'll often do these POCs with prospects. And without fail, we'll be pairing with them, writing instrumentation or something. And our sales engineers will start going, well, what's that? Are you about-- is something wrong over here? Five minutes later, someone gets paged and they're like, how did you do that?

Or sometimes, they'll be like-- they're like, ah, we have to stop and fix all these things. And our sales engineers would just be like, oh, no, we're never going to get through the POC if we don't just get through this. They've been broken for a long time, trust me. It's just that you never had the ability to see it. And it's so satisfying. It's so satisfying, and it's so nice to feel confident about your work.

KASLIN FIELDS: This is making me very curious how databases factor into your book, and I need a book club to tell me, go read that book, because I actually have a meeting later today about databases. So I feel like this is going to come back into that.

CHARITY MAJORS: So I'll send you a couple links after this, because someone who worked for us briefly wrote two beautiful-- like, my mom and dad would understand these pieces, I think-- just essays about database internals and how they roll up to observability. And they're just such fun reads. You'll love them.

KASLIN FIELDS: I'm very interested. So we'll make sure to include those in the show notes.

CHARITY MAJORS: Yes.

KASLIN FIELDS: I look forward to it. And so I think we should start wrapping things up. This is so fascinating, and I feel like it's just opening a door. Observability, I'm realizing, is what I want from so many of the systems in my life.

CHARITY MAJORS: Yes!

KASLIN FIELDS: So I just want to go learn about observability-- this 2.0 concept because it's just about being fascinated and observing your systems, learning from your systems.

CHARITY MAJORS: It's a reasonable request. Everyone wants to understand their lives. Yes.

KASLIN FIELDS: Very interesting. I'm excited to dive into that. But one last thing that I want to ask you about, since I am such a big fan of your blog. I have noticed that any time I attend a CoffeeOps event, which, for anyone listening who is not familiar with CoffeeOps, it's this meetup series. It's a global concept where--

CHARITY MAJORS: It's super fun.

KASLIN FIELDS: Yeah. Folks get together and you sit in round tables, usually, and you have Post-it notes, and everybody writes ideas of things they want to talk about on Post-it notes. And then everybody votes. You have a certain number of votes. And so you talk about the top topics for five minutes, and then folks decide if they want to move on to the next topic or not.

Anyway, every time I go to one of these, I bring up blogging, and everybody wants to talk about it. So as someone whose blogging I really admire, do you have any advice for folks out there who want to write blogs?

CHARITY MAJORS: Sure. I have a goal for myself of writing one post a month, and I have never actually achieved it.

KASLIN FIELDS: Can relate.

CHARITY MAJORS: But goals, goals are important. So one easy tip is, if you ever write a talk, write a blog post out of it, too. The reach of blogs is actually farther than even the best talks. They're indexed. People will link them. And if you've already done the work to pull together the material, it's so-- like, any time you write a blog post that's good, honestly, you can convert it into a talk. It's just really easy.

Number two, I think people get all up in their own heads about, is this original or not? Has anyone ever talked about this? Is anyone going to be interested in this? If you're interested in it, write about it, because it's the interest and the passion that comes across. Nobody really wants-- nobody really cares, oh, has this been said before? The best messages are the ones that get said over and over and over because they affect lots and lots of people. So it doesn't matter.

By chiming in, you are saying, yes, this matters to me, too. And that means you're contributing to the growth and the distribution of things that you care about. And don't feel like you have to be the voice of authority. You're the voice of the authority of your experience. And if you've had an experience that is sticking with you, it will probably resonate. It doesn't have to resonate with everyone. It can resonate with other people like you.

Some of my favorite talks and writings are about subjects that I know very well. It's just really fun to see a new, fresh take. It's fun to see someone experience it for the first time. These are perennial topics. People are always learning these things. And so think less about what you should do and think about what does interest you and what does stick with you and write about those things. Also, shorter is better than longer. Advice that I never take. [LAUGHS]

KASLIN FIELDS: I will admit that your posts are long, but every time-- and I have a lot of trouble reading long blog posts, but I never have trouble reading yours because they are just so interesting. The passion really comes across.

And I think that's a theme throughout our whole conversation today, is that fascination and passion, whether we're talking about engineering management and doing your job as an engineer, you need to have that passion and that enthusiasm for it. Observability is about feeding that passion and that fascination with the work that you're doing. And blogging is about sharing that fascination and passion with the world.

CHARITY MAJORS: Yeah. I love that. Thank you, Kaslin.

KASLIN FIELDS: That's always my advice for folks who want to give talks or write blogs or anything, honestly. The best content that you create is going to be about something that you just really care about. It doesn't matter what it is. Doesn't matter if it's been done a million times. If you care about it, then it'll be interesting.

CHARITY MAJORS: Definitely. Agreed.

KASLIN FIELDS: Wonderful. Thank you so much, Charity. As I said at the beginning, it really is an honor to talk to you, because I just love your work. I love what you're doing. Can't wait to see more of it.

CHARITY MAJORS: Thank you so much for having me. This was really fun.

KASLIN FIELDS: Yeah. Thank you for being on.

ABDEL SGHIOUAR: Well, thank you, Kaslin, for that interview.

KASLIN FIELDS: And welcome back, Abdel. You've been just all over the place lately.

ABDEL SGHIOUAR: Yes, yes. So many things happening in Europe, it's crazy.

KASLIN FIELDS: You even gave a keynote, right?

ABDEL SGHIOUAR: Yes. I did a keynote in front of 1,300 people. OK, let's not-- hold our horses. I did two minutes in a 50-minute keynote in front of 1,300 people and it was mostly a demo.

KASLIN FIELDS: Still counts. Still counts.

ABDEL SGHIOUAR: Still counts, yes. That was I/O Connect in Berlin. That was pretty cool.

KASLIN FIELDS: That's awesome. And then you've had a bunch of other events lately, too, right? It's just the season.

ABDEL SGHIOUAR: Yeah. I did KCD Munich, where I am actually recording right now. I did a keynote in that one. Well, shared the keynote with one of the organizers called Max. And yeah, a bunch of events, a bunch of workshops. And this was supposed to be my last trip. I was supposed to be back home tomorrow and just relax for the summer. But no, I am going to Berlin, because there is an event [? to ?] sponsor in Berlin and they need somebody.

KASLIN FIELDS: Nice.

ABDEL SGHIOUAR: And then I'll be home.

KASLIN FIELDS: [LAUGHS] Well, I'm glad that we've been able to do all of the things that we need to do for the podcast while you've been busy traveling.

ABDEL SGHIOUAR: Yes. Just for the people who listen to this podcast, we do this on the road-- well, I do this on the road.

KASLIN FIELDS: Yeah, he does this on the road. I don't do that. [LAUGHS]

ABDEL SGHIOUAR: Yeah. Kaslin-- each time Kaslin talks to me, I'm in a different hotel, background, random place, Paris, whatever.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: I'm always carrying my microphone with me.

KASLIN FIELDS: Yeah. That's intense. I don't want to have to carry my microphone around. [LAUGHS]

ABDEL SGHIOUAR: I mean, honestly, it's very cool. A bunch of those events that I have done are KCD events. I've done three of them. I've done the Zurich one, I've done the Barcelona one, and I've done the one in Munich. And usually, we meet people who listen to the podcast. Just literally today, I was talking to three or four people who are active listeners. A lot of good feedback, a lot of good interactions. So I'm glad that we get to do this because then you meet the community.

KASLIN FIELDS: Yeah. I have been considering trying to run a Kubernetes Community Days in Seattle, but that's a lot of work. Running events is a lot of work, so we'll see if that ever--

ABDEL SGHIOUAR: That is a lot of work. Well, you do DevOps Days Seattle, right?

KASLIN FIELDS: Yeah. I don't know if I'll do it next year. It was a heavy lift this year, so I might define the role and let someone else handle it for a year or two. We'll see.

ABDEL SGHIOUAR: It is a lot of work. It is a lot of work. And they paused, actually, KCDs for 2024, and they're not accepting any more in 2025 because they're changing the rules. So they're updating the rules. And I think in the upcoming rules-- or it's already existing. If you want to organize a KCD, you have to have either-- one of the organizers has to be either an end member-- so works for a company which is an end member, CNCF end member, or be CNCF ambassador.

KASLIN FIELDS: Interesting. I'll have to keep an eye out for that change. And the reason that I wanted to talk about all of the events that you're doing, aside from the fact that they're awesome and you're awesome, is that you haven't been able to listen to the episode yet, so I'll have to tell you about it.

ABDEL SGHIOUAR: Yes, please. I will listen to it tomorrow, probably, on the way back home. What did you talk with Charity about?

KASLIN FIELDS: I was so excited to get to talk with Charity. I feel like maybe people will be a bit embarrassed by me being embarrassed by getting to talk with Charity. [LAUGHS]

ABDEL SGHIOUAR: Imposter syndrome?

KASLIN FIELDS: Yeah, a little bit. She is amazing. So the first thing that really drove me toward Charity's content was her blogs about engineering management. My philosophy, as I mentioned at the beginning of the interview, is that regardless of what you want your career path to be, learning a little bit about management is probably always going to be beneficial, because whether you go a management path yourself or not, you're always going to have a manager.

And so we talked a good bit about how it's useful for anyone to learn a little bit about engineering management. And we covered a couple of different topics. I know one that I didn't expect for us to talk about, actually, was burnout, because we talked about how an engineering manager's job should be to, of course, support their team. And a large part of that is trying to help them find ways to do work that they care about.

If you're doing work that you don't care about, that you don't have any kind of passion in at all, it's a lot easier to burn out. And so engineering managers have a very important job to do in helping their team do work that they're interested in, grow in ways that they're interested in, that also benefit the business and kind of helping everyone avoid burnout while supporting the business.

ABDEL SGHIOUAR: Yeah. And I guess, also, to your point, even if you are just an individual contributor, understanding how management works would probably help you even have a conversation with your manager--

KASLIN FIELDS: Exactly.

ABDEL SGHIOUAR: --in the sense that you can know how to ask for the right support, know how to ask for the right things, settle the expectations in the correct way.

KASLIN FIELDS: Yeah. I think I mentioned it in the interview, too, but one of my favorite zines by Julia Evans-- love Julia Evans' zines-- is "Help! I have a manager!"

And when I mentor students who are moving into the technology industry upon graduation or nearing the completion of our mentorship period, I will try to get them a copy of that zine, because-- I think that's one of the hardest things, when you come into a working environment, especially. Even if you've worked in other industries or something before, having a tech engineering manager, I feel like, can be a little intimidating. And so I think it's really useful advice that both Charity and Julia Evans do.

ABDEL SGHIOUAR: Yeah. And I think we can also add to that [? Aja, ?] your manager. She is also publishing quite a lot of writing about this topic.

KASLIN FIELDS: Yeah. I'll grab some of those blog posts and put them in the show notes. She's done some really useful ones, and not just about engineering management, either, kind of engineering culture. There's one that I always think about by [? Aja ?] that is talking about the different ways that people approach problems and using a toaster as an example. Do you want to know how the toaster works or do you just want to toast toast? [LAUGHS]

ABDEL SGHIOUAR: That's a very good analogy.

KASLIN FIELDS: [? Aja ?] does some great content.

ABDEL SGHIOUAR: Yeah. Yeah, she's awesome also.

KASLIN FIELDS: And then we also talked about, of course, observability because honeycomb.io is an observability tool based on Charity's experience as an engineer, which, we talked a little bit about her experience, and about the concept of observability 2.0. And what I really got out of that was that observability 2.0 is about enabling you to answer the questions that you want to answer.

So instead of being just a whole bunch of logs and metrics that are really hard to sort through and parse, and you can only answer the questions that those logs and things are designed to answer, collecting things-- a lot of things-- in such a way that they are very easy to use to answer those questions. And that's kind of the concept behind Honeycomb and the concept behind observability 2.0 as a concept, which is something that I've heard around in observability spaces of these observability tools are designed for the concept of observability 2.0.
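
To make the contrast above concrete, here is a minimal sketch of asking ad hoc questions of per-request events rather than of pre-aggregated metrics. It is an illustration only, not Honeycomb's API or data model: the event fields (`build_id`, `endpoint`, `status`, `duration_ms`), the sample values, and the `error_rate_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical "wide" events: one record per request, with many fields kept
# together so a question can be chosen after the fact.
events = [
    {"build_id": "v41", "endpoint": "/export", "status": 200, "duration_ms": 120},
    {"build_id": "v41", "endpoint": "/export", "status": 200, "duration_ms": 135},
    {"build_id": "v42", "endpoint": "/export", "status": 500, "duration_ms": 900},
    {"build_id": "v42", "endpoint": "/export", "status": 200, "duration_ms": 140},
    {"build_id": "v42", "endpoint": "/home", "status": 200, "duration_ms": 30},
]


def error_rate_by(events, field):
    """Group events by any field and return the share of 5xx responses per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        key = event[field]
        totals[key] += 1
        if event["status"] >= 500:
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


# The same raw data answers two different questions without new instrumentation.
by_build = error_rate_by(events, "build_id")     # did the new build change anything?
by_endpoint = error_rate_by(events, "endpoint")  # which endpoint is affected?
```

Because the raw events are kept, the grouping field is picked at query time. A pre-aggregated error counter could answer only the one question it was designed for.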

ABDEL SGHIOUAR: Yeah. And also, Honeycomb is one of the companies that have been around for a while. So they are a really well-established, mature, known provider.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: So I've heard about them. I think the first time I heard about them, I was in an SRE Conference in Dublin, like, 10 years ago or something. Yeah. They've been around. I see observability as-- you remember, we talk about this all the time. We say when we go to KubeCon. Every KubeCon has like that one topic--

KASLIN FIELDS: Yep.

ABDEL SGHIOUAR: --that is, like, the thing. And of course, 2024 Paris was observability. That was like the major one.

KASLIN FIELDS: Observability has been up there a couple of times, for sure.

ABDEL SGHIOUAR: Yeah. I feel like now there's even more and more of it even in the smaller events that I do. And there is also a lot of observability for AI and ML that started to bubble up as a topic, because that's a different type of workload, and they probably require different types of, how do you observe this thing and how do you look at it? So I think it's a pretty cool topic.

KASLIN FIELDS: Yeah. What I've been hearing from observability folks, mostly, about AI workloads is that it's mostly like observing any other type of workload, maybe with a little bit more focus on making sure that you're using your resources efficiently because they're so resource-intensive.

ABDEL SGHIOUAR: Well, yes and no in the sense that, yes, it's a regular workload, so your typical metrics are relevant, but when you are talking about an LLM, input and output tokens are important. And when you load the model into a GPU, the performance of the GPU also is important. So there's some extra bits and pieces.

I don't know if you saw-- and this is not about observability, what I'm about to talk about. But there was actually a doc shared with the dev mailing list for Kubernetes-- so dev@kubernetes.io-- where they are proposing a-- it's a proposal. And it's a proposal for building a gateway for LLMs. It's literally called LLM Gateways.

KASLIN FIELDS: Interesting. No, I haven't read that yet.

ABDEL SGHIOUAR: And so it's part of the Gateway API, but it will be a specific gateway for LLMs because LLMs have those different ways of managing traffic into an LLM. So it's not-- well, it's HTTP traffic, but it's-- those input tokens are important, and they are designing something very, very specific for open LLMs. If you go to the dev@kubernetes.io mailing list, you will be able to find the doc there.

KASLIN FIELDS: Interesting, yeah. I had not thought about monitoring the input and output tokens of an LLM as part of observability. It's definitely part of quality control if you're using any kind of AI component or feature in your applications, because we all know how AIs are right now. You've got to make sure that they're saying something at least around the right lines. It might look right, but hallucinations and all of that.

So quality control is very important. And so monitoring both what you're putting into the LLM and what you're getting out of it for quality, and then having tools to rerun it or add additional context or things like that, is definitely part of the app design I think that's going into AI apps, or apps with AI features, right now. But I hadn't thought about that as part of observability, but you could characterize it that way. Certainly.

ABDEL SGHIOUAR: Yeah. I mean, just, very simple example would be scaling. Like, if you need to scale based on the tokens or based on how many users, because your input traffic into the LLM doesn't look like typical HTTP or gRPC traffic.

KASLIN FIELDS: Interesting.

ABDEL SGHIOUAR: Yes, it's an HTTP call, but the number of tokens in that HTTP call is important, and you might need to scale based on that. So it's just like there is a bunch of little bits and pieces that make LLMs special. They're not different. They're just special.
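
As a rough illustration of the scaling point above, the sketch below applies the proportional rule that the Kubernetes Horizontal Pod Autoscaler documents (desired replicas = ceil(current replicas x current metric / target metric)) to the same traffic twice: once with requests per second as the metric and once with tokens per second. The traffic sample, both targets, and the `desired_replicas` helper are assumptions made up for this example. They do not come from the episode or from the LLM gateway proposal it mentions.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA-style proportional rule: ceil(current_replicas * current / target), at least 1."""
    return max(1, math.ceil(current_replicas * current_value / target_value))


# Hypothetical one-second window of LLM traffic: (input_tokens, output_tokens) per request.
requests = [(4000, 800), (6000, 1200), (3500, 500)]
current = 2  # replicas currently serving the model

# Per-replica load, measured two ways.
requests_per_replica = len(requests) / current
tokens_per_replica = sum(i + o for i, o in requests) / current

# Hypothetical per-replica targets.
TARGET_REQUESTS_PER_SEC = 5.0
TARGET_TOKENS_PER_SEC = 2000.0

by_requests = desired_replicas(current, requests_per_replica, TARGET_REQUESTS_PER_SEC)
by_tokens = desired_replicas(current, tokens_per_replica, TARGET_TOKENS_PER_SEC)

print(f"scale by request rate: {by_requests} replica(s)")
print(f"scale by token rate:   {by_tokens} replica(s)")
```

With these made-up numbers, three requests look like light HTTP traffic, yet they carry 16,000 tokens, so the two metrics lead to very different replica counts.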

KASLIN FIELDS: Interesting. We didn't really talk about that much in the episode. Now I kind of wish that I had asked Charity about that. [LAUGHS]

ABDEL SGHIOUAR: Or probably for the good, I think our--

KASLIN FIELDS: Probably for the best, yes.

ABDEL SGHIOUAR: --listeners don't really want to, yes. [LAUGHS] Exactly.

KASLIN FIELDS: There's plenty of more interesting topics about observability in general.

ABDEL SGHIOUAR: So actually, the funny thing, when I did I/O Connect events-- so there was, of course, quite a lot of AI and ML. By far, the largest stage at the event was the AI one, and there was one stage for cloud, one stage for web, and one stage for mobile, which were smaller. The next day, we had the GDE, the Google Developer Experts forum. So there's a summit for GDEs in Berlin.

And I had a talk there about GKE. And I showed up and I was like, OK, we're just going to talk about GKE. We're not going to talk about AI/ML. And I could see the relief in people's eyes.

[LAUGHTER]

KASLIN FIELDS: There's a lot of that, for sure.

ABDEL SGHIOUAR: Exactly, exactly.

KASLIN FIELDS: Yeah.

ABDEL SGHIOUAR: Cool. Awesome.

KASLIN FIELDS: And then the last thing that I asked her about, to close things up, was advice for blogging. [LAUGHS]

ABDEL SGHIOUAR: Oh, cool. That would be nice.

KASLIN FIELDS: And what I kind of liked about her response was that it wasn't like, oh, you have to do this and this in order to be successful at blogging. You have to make sure that you blog every week, or you have to follow these rules about the structure of your blog, or anything like that. It was just like, write about things that you're passionate about, which is often my advice to folks who are submitting talks to conferences. Another piece of advice from her was, if you do a talk at a conference, just turn that into a blog post.

ABDEL SGHIOUAR: Yeah. Repurpose.

KASLIN FIELDS: And the best content is always content that you're excited about. So it can run long. A lot of her blog posts are really long, I must say, but they are fantastic. And so I read them in their entirety even though I usually don't do that. So it's all about the passion. If you write about something that you're passionate about, the rest of the rules can be a little flexible.

ABDEL SGHIOUAR: Nice. Awesome.

KASLIN FIELDS: Yeah. Thanks for joining me, Abdel.

ABDEL SGHIOUAR: Thank you.

KASLIN FIELDS: I hope you get a break soon.

ABDEL SGHIOUAR: I hope so, yes, as well.

[LAUGHTER]

And thank you for listening to us. Thank you for sticking until the end.

KASLIN FIELDS: Thank you. That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at @KubernetesPod or reach us by email at <kubernetespodcast@google.com>.

You can also check out the website at kubernetespodcast.com, where you'll find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.

[MUSIC PLAYING]
