Notes on Google SRE Experience

Simplify Complexity
Aug 7, 2022

Google created the Site Reliability Engineer (SRE) role. The idea of software engineering for operations is an exciting approach. My observations after a few months of SRE work are as follows:

SRE does a lot of development

Unlike in a typical ops role, development projects are a normal part of your work. Each quarter's planning includes a couple of medium-sized development projects. These could involve creating a new system flow or improving configs, rollouts, security, scalability, etc.

Oncall is the first priority

SREs are mostly the first responders to critical problems. In the first few months, SREs learn to perform oncall duties once they understand the systems. Oncall tasks include triaging a problem, mitigating it, and finding a solution, either yourself or by asking the devs.

System understanding is vital for survival

SREs need to learn a lot of components, how they interact, and how things can go wrong. This takes a good memory, an excellent way to organize information, and hours of work to understand how the systems interact.

The Good, Bad, and Ugly of SRE

I liked the engineering excellence of the systems, their scale, and their complexity. SRE work as a whole is well structured. There is a clear path to progress in your career. Lots of brilliant engineers work in SRE, and they motivate you.

The bad part is running systems without knowing everything about them. There are a lot of unknowns.

The ugly part is memorizing so many details. It may get easier after a few more months of work, but the number of things to keep at the top of your head is pretty overwhelming.

Conclusion

SRE is not DevOps. Your skills in systems engineering and a knack for digging deep into a problem come in handy and are used often. It is a different career track, though. If you love running things and are not afraid of a production outage, you might love the SRE job.
