Whenever engineering support steps in, the problem gets solved fast or the issue gets acknowledged. However, over the last 15 months, ~50% of our GCP support cases have ended in frustration with no resolution. We have more success finding the problem in the corresponding GitHub repo and opening an issue there.
We have the expensive off-the-shelf support option (I think 450/seat/month) with a 1h response time.
In most cases, we spend more time going back and forth with support than it would take us to figure it out ourselves. I'm talking about issues that span weeks, with tens of hours spent. We end up restating the original problem whenever the engineer changes (i.e. the new support engineer doesn't bother reading the actual case).
We've had:
P1s where the support engineer told us we'd get an update the following day, only for us to figure out that there was a breaking release on their side that exactly matched our description.
While investigating a load balancer issue, the support engineer looked at the LB logs, saw a ton of logs coming from penetration scans (e.g. GET /phpmyadmin), and suggested that the solution was to open up those addresses.
The cynical part of me expects that this case was handled so well because a) the support people found the issue fascinating and fun to work on, and b) the post-mortem on it would make an excellent blog post.
On the flip side, it's encouraging that they have people somewhere in the support chain who are capable enough to read Linux kernel code and submit fixes upstream.
Author of the article here. I only thought of the possibility of making a blog post after the case was closed and I started telling my colleagues about it, and realized I would have loved to read about this.
The case was indeed fun to work on, but the main reason it had such a fast and happy resolution was that the customer was very responsive and very cooperative.
I cannot speak for every Technical Solutions Engineer, but I can tell you that I have no particular interest in simply closing a ticket: I want to go down the rabbit hole and solve technical issues, and I know many of my colleagues feel the same.
I am also far from being the most senior or skilled TSE in Google Cloud Support; I just wrote an article about one of the most interesting cases I had.
I'm inspired by how much you seem to know about the details of computer networking. Is that required knowledge to become a Google Tech Support person, or are you just above average among your peers in that respect?
Also, I wonder how you learned all this (that is, I'm asking for recommendations of a few books/resources for learning), if you don't mind sharing. Thanks in advance!
I don't have deep knowledge of the details of computer networks; there is a team of TSEs who deal with network cases and know more than me. But the whole point of troubleshooting is not knowing what is wrong, but being able to find what is wrong. To do that you need a good foundation, which you can build by studying how networks and Linux systems work (someone here posted some titles) and through experience (I have some grey hair myself). But every time you troubleshoot something you end up touching something you don't know, and that's where you learn something new that you might use next time. For example, I didn't know about dropwatch; a colleague suggested it to me.
During the interview process at Google we don't expect candidates to be able to get to this level of depth, but we try to hire candidates that could, over time and depending on their skill set, potentially reach a similar level of depth and ability to troubleshoot cases.
I'm one of the TSEs who handle networking cases. True to what was said, I was hired with very little networking background, but plenty of development and hardware experience.
I've since taken up the mantle of handling most of the cases dealing with Interconnects and VPNs. I enjoy it too!
Thank you! This is encouraging. My background is mostly in Python programming and SQL. But in an alternate universe, I wish I were a network ninja like you guys, and I will definitely check out Google TSE jobs when I can look for new jobs (currently hoping to get my green card done). If there's a book or two that has been most useful to you in being an efficient TSE, please feel free to share. Have a restful weekend!
Thank you for the reply! My background is mostly in Python programming and SQL. But in an alternate universe, I wish I were a network ninja like you. If there's a book or two that has been most useful to you in being an efficient TSE, please feel free to share. Have a restful weekend!
The front-line support is the same as anywhere else. But Google Cloud has really really good second and third line support, if the first tier can't figure it out. And in many cases, it'll get escalated directly to the implementing engineers.
In my experience, Google Cloud is better than most organizations about escalating hard issues up the chain. Admittedly, this happened at a company with substantial spend, and I can't say one way or another whether a smaller player would get the same quality of support.
If you're spending that much money (this isn't GCP specific -- this is any cloud) you should be establishing a 1-3 year min-commit contract, and in practice, this will get negotiated through the CFO. This will get you massive discounts -- 20-30% under list price, in exchange for spending $X million/year over Y years.
It will also get you a dedicated sales rep and sales team, and they will absolutely crack the whip on internal teams to get issues resolved. At those spends, you can almost get an in-house support team of PSOs to bounce problems off of.
Google only gives you good service if they respect you as engineers. We'd say stupid stuff and get the cold shoulder, and then later we'd find some cool bug with encrypted VPNs dropping packets (with no monitoring in GCP, only our tcpdump from various places) and get some very skilled network engineers looking at the data and making code changes. They still muted us for long periods of time while talking amongst themselves, but did deliver.
I don’t really think they care about you as an engineer. I had some “cool” problems which they totally neglected for long periods of time, and the general vibe has been “you need to prove to us this is our fault”. I think the real reason is that org incentives aren’t set up to make infra teams happy.
Contract-level commits will generally work on top of Committed Use Discounts, not instead of them (but obviously this comes down to your own SKU-by-SKU negotiation).
Even on platinum plans they are terrible in my experience. Endless pinballing of tickets and trying to blame the issue on the customer. We had two-day outages several times at my previous gig because Google support refused to acknowledge the problems.