Solution to the specifically mentioned problem: Don't use string-based errors, use sentinel errors [1].
More generally: Don't produce code where consumers of your API are the least bit inclined to rely on non-technical strings. Instead use first-level language constructs like predefined error values, types or even constants that contain the non-technical string so that API consumers can compare the return value against the constant instead of hard-coding the contained string themselves.
Hyrum's Law is definitely a thing, but its effects can be mitigated.
[1]: https://thomas-guettler.de/go/wrapping-and-sentinel-errors
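For anyone who hasn't seen the pattern, a minimal sketch of a sentinel error. `ErrNoMoreTea`, `brew` and `isOutOfTea` are names made up for illustration, not from any real package:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoMoreTea is a hypothetical sentinel: callers compare against the
// value, never against its message.
var ErrNoMoreTea = errors.New("no more tea available")

// brew wraps the sentinel with %w so extra context can be added
// without breaking callers that match on the value.
func brew(cups int) error {
	if cups > 2 {
		return fmt.Errorf("brew %d cups: %w", cups, ErrNoMoreTea)
	}
	return nil
}

// isOutOfTea matches the sentinel anywhere in the wrap chain.
func isOutOfTea(err error) bool {
	return errors.Is(err, ErrNoMoreTea)
}

func main() {
	err := brew(3)
	fmt.Println(err)             // message includes the added context
	fmt.Println(isOutOfTea(err)) // still matches the sentinel
}
```

The message can be reworded or wrapped later and `isOutOfTea` keeps working, which is exactly what a string compare cannot promise.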
The frustrating thing is that the error in question already has a dedicated, exported type -- Grafana (the top-level culprit in the linked search) should be using `errors.As` with a `*http.MaxBytesError` target rather than doing a string compare.
The whole point of Hyrum's Law is that it doesn't matter how well you design your API: no matter what, people will depend on its behavior rather than its contract.
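Roughly what the type-based check looks like, as a sketch. `bodyTooLarge` and `readLimited` are hypothetical helpers; `http.MaxBytesReader` and `http.MaxBytesError` are the real stdlib names (the error type needs Go 1.19 or newer):

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bodyTooLarge reports whether err came from http.MaxBytesReader
// hitting its limit, and if so what the limit was.
func bodyTooLarge(err error) (limit int64, ok bool) {
	var mbe *http.MaxBytesError
	if errors.As(err, &mbe) {
		return mbe.Limit, true
	}
	return 0, false
}

// readLimited reads body through a MaxBytesReader and returns the
// read error, if any. The recorder stands in for a real ResponseWriter.
func readLimited(body string, limit int64) error {
	w := httptest.NewRecorder()
	rc := io.NopCloser(strings.NewReader(body))
	_, err := io.ReadAll(http.MaxBytesReader(w, rc, limit))
	return err
}

func main() {
	err := readLimited("hello world", 4)
	if limit, ok := bodyTooLarge(err); ok {
		fmt.Println("body exceeded limit of", limit, "bytes")
	}
}
```

No error text appears anywhere in the check, so the stdlib would be free to reword the message.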
But it looks like that until 3 years ago, this string comparison was the only way to do it. https://github.com/golang/go/pull/49359/files
Good catch. So in a sense this isn't really Hyrum's Law (which would be more appropriate to things like the Sim City / Windows 3.x UAF bug described in a sibling comment); it's more like, if people need to do something, and you don't give people an explicit way to do it, they'll find an implicit way, and then you're stuck supporting whatever that happened to be.
Early Go lacked lots of features such as errors.As. It was and still is sometimes idiomatic to generate Go because it is so featureless and writing it is often a chore. So it is very much about how well you design your API.
In your example, the onus is on the consumer not the provider. I could still be writing code that checks if `err.Error() == "no more tea available."`. I agree, I shouldn't do that, but nothing is preventing me from doing that. Additionally, errors.Is is a relatively recent addition to Go, so at the time people wrote checks like this, it was just easier to compare the literal string. But as an API provider in Go, you cannot prevent your consumers from checking the return values of .Error().
Unfortunately true. The Go maintainers might not agree with me on this, but I think in this case consumers have to learn the hard way. Go tries to always be backwards compatible, but I don't think that trying to be backwards compatible with incorrect usage is ever the right choice.
Code that checks raw error strings is just plain bad and should be exempt from Go’s backwards compatibility guarantees. There is almost never an excuse for it, especially in stdlib.
Using string error comparisons was the only way to do this a few years ago; and Go has a backwards compatibility promise.
Honestly, this is so much worse than "catch". It's what a "catch" would look like in "C".
It might look worse than catch, but it's much more predictable and less goto-y.
goto was only bad when used to save code and jump indiscriminately. To handle errors is no problem at all.
yes, yes, yes! see the Linux Kernel for plenty of such good and readable uses of go-to, considered useful: "on error, jump there in the cleanup sequence ..."
Hah, I wrote the crypto/rsa comments. We take Hyrum's Law (and backwards compatibility [1]) extremely seriously in Go. Here are a couple more examples:
- We randomly read an extra byte from random streams in various GenerateKey functions (which are not marked like the ones in OP) with MaybeReadByte [2] to avoid having our algorithm locked in
- Just yesterday someone reported that a private ECDSA key with a nil public key used to work, and now it doesn't, so we probably have to make it work again [3]
- Iterating over a map uses a randomized order to avoid exposing the internals
- The output of rand.Rand is considered part of the compatibility promise, so we had to go to great lengths to improve it [4]
- We discuss all the time what commitments to make in docs and what behaviors to disclaim, knowing we can never change something documented and probably something that's not explicitly documented as "this may change" [6]
[1]: https://go.dev/doc/go1compat
[2]: https://pkg.go.dev/crypto/internal/randutil#MaybeReadByte
[3]: https://go.dev/issue/70468
[4]: https://go.dev/blog/randv2
[5]: https://go.dev/blog/chacha8rand
[6]: https://go-review.googlesource.com/c/go/+/598336/comment/5d6...
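To make the MaybeReadByte idea concrete, here is a sketch of the technique, not the actual stdlib code (which lives in an internal package). `maybeReadByte`, `countingReader` and `consumed` are illustrative names, and `math/rand` is just one convenient way to get the coin flip:

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
	"strings"
)

// maybeReadByte consumes one byte from r about half the time, so
// callers cannot rely on exactly how many bytes a function draws
// from its random source.
func maybeReadByte(r io.Reader) {
	if rand.Intn(2) == 0 {
		return
	}
	var buf [1]byte
	r.Read(buf[:]) // result deliberately ignored: the byte is thrown away
}

// countingReader records how many bytes were read through it.
type countingReader struct {
	r io.Reader
	n int
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += n
	return n, err
}

// consumed calls maybeReadByte the given number of times and returns
// how many bytes were actually taken from the stream.
func consumed(trials int) int {
	cr := &countingReader{r: strings.NewReader(strings.Repeat("x", trials))}
	for i := 0; i < trials; i++ {
		maybeReadByte(cr)
	}
	return cr.n
}

func main() {
	fmt.Println("bytes consumed in 1000 calls:", consumed(1000))
}
```

Anyone feeding a deterministic reader into a function that does this gets different results from run to run, so they find out immediately that the output is not something to pin in a test.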
The nil key case really makes me wonder how sane it is to support these cases. You will be forced to lug this broken behavior with you forever, like the infamous A20 line (https://en.wikipedia.org/wiki/A20_line).
> You will be forced to lug this broken behavior with you forever
Yep, welcome to my life.
As a user of your code this is true, and I'm very grateful indeed that you take this approach.
I would add as a slight caveat that to benefit from this policy, users absolutely must read the release notes on major go versions before upgrading. We recently didn't, and we were burnt somewhat by the change to disallow negative serial numbers in the x509 parser without enabling the new feature flag. Completely our fault and not yours, but I add the caveat nevertheless.
We have gotten a liiiiittle more liberal ever since we introduced the new GODEBUG feature flag mechanism.
I've been meaning to write a "how to safely update Go" post for a while, because the GODEBUG mechanism is very powerful but not well-known and we could build a bit of tooling around it.
In short, you can upgrade your toolchain without changing the go.mod version, and these things will keep working like they did, and set a metric every time the behavior would have changed, but didn't. (Here's where we could build a bit of tooling to check that metric in prod/tests/CLIs more easily.) Then you can update the go.mod version, which updates the default set of GODEBUGs, and if anything breaks, try reverting GODEBUGs one by one.
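A rough sketch of what checking that metric could look like, assuming the counters are the ones exposed through `runtime/metrics` under the `/godebug/non-default-behavior/` prefix (my reading of the GODEBUG documentation). `nonDefaultGodebugEvents` is a made-up helper name:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// Assumed naming scheme: /godebug/non-default-behavior/<setting>:events
const godebugPrefix = "/godebug/non-default-behavior/"

// nonDefaultGodebugEvents returns, per GODEBUG setting, how often the
// program took the old (non-default) behavior. Settings that were
// never hit are omitted.
func nonDefaultGodebugEvents() map[string]uint64 {
	var samples []metrics.Sample
	for _, d := range metrics.All() {
		if strings.HasPrefix(d.Name, godebugPrefix) && d.Kind == metrics.KindUint64 {
			samples = append(samples, metrics.Sample{Name: d.Name})
		}
	}
	metrics.Read(samples)

	out := make(map[string]uint64)
	for _, s := range samples {
		if s.Value.Kind() != metrics.KindUint64 {
			continue
		}
		if n := s.Value.Uint64(); n > 0 {
			name := strings.TrimPrefix(s.Name, godebugPrefix)
			out[strings.TrimSuffix(name, ":events")] = n
		}
	}
	return out
}

func main() {
	// With no old behavior exercised this prints nothing. Running
	// under something like GODEBUG=x509negativeserial=1 and parsing an
	// affected certificate should make an entry show up (assuming that
	// setting reports a counter).
	for name, n := range nonDefaultGodebugEvents() {
		fmt.Printf("%s: %d\n", name, n)
	}
}
```

Dumping this at the end of a test run or on a debug endpoint would tell you which settings you still depend on before you bump the go.mod version.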
That sounds good.
Breaking changes in major version updates is a completely normal thing in most software and we usually check for it. Ironically the only reason we weren't previously bothering in go is that the maintainers were historically so hyper-focused on absolute backwards compatibility that there were never any breaking changes!
From the Go architects, that comment reads like a capitulation!
In other parts of the language they have tried to help programmers avoid depending on incidental behaviour. e.g.
When you iterate over a map in Go, they added code to make the iteration order random: https://stackoverflow.com/questions/55925822/why-are-iterati...
The select statement will choose a channel at random if more than one is ready: https://go.dev/tour/concurrency/5
Therefore an unhinged solution would be to have error messages in several languages, and pick one to return at random.
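The practical consequence of the map case, as a small sketch: if you need a stable order you have to ask for one explicitly. `sortedKeys` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order. The order that
// `range` visits the map in is unspecified, so the sort is what makes
// the result deterministic.
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	stock := map[string]int{"oolong": 2, "assam": 5, "sencha": 1}
	for _, k := range sortedKeys(stock) {
		fmt.Println(k, stock[k])
	}
}
```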
An interesting topic is how to fight Hyrum's law. A possibility is to add randomness in things you don't want people to rely on. If I remember well, this is what the QUIC protocol does. Some fields are unused in the current version, but required by the specification to be set to random values, not null bytes, so that routers don't start relying on them to identify the packets.
EDIT.
I think I found the source: https://www.rfc-editor.org/rfc/rfc9000#section-17.2.1
> The value in the Unused field is set to an arbitrary value by the server. Clients MUST ignore the value of this field. [...] Note that other versions of QUIC might not make a similar recommendation.
I think they call it "greasing", to prevent "ossification".
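A sketch of what greasing that field could look like for the first byte of a Version Negotiation packet, going by the layout in the linked section (one header-form bit followed by seven unused bits). The function names are made up, and the RFC has further recommendations for this byte that the sketch ignores:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// headerFormLong is the top bit of the first byte; it is the only bit
// of this byte that carries meaning in the sketch.
const headerFormLong = 0x80

// greasedFirstByte sets the form bit and fills the seven unused bits
// with random data, so middleboxes never see a constant they could
// start matching on.
func greasedFirstByte() (byte, error) {
	var b [1]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return headerFormLong | (b[0] & 0x7f), nil
}

// isLongHeader is the receiving side: it looks at the form bit only
// and ignores the unused bits, as the spec requires of clients.
func isLongHeader(first byte) bool {
	return first&headerFormLong != 0
}

func main() {
	for i := 0; i < 3; i++ {
		b, err := greasedFirstByte()
		if err != nil {
			panic(err)
		}
		fmt.Printf("%08b long=%v\n", b, isLongHeader(b))
	}
}
```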
This is a good example of "stringly typed" software. Golang designers did not want exceptions (still have them with panic/recover), but untyped errors are evil. On the other hand, how would one process typed errors without pattern matching? Because "catch" in most languages is a [rudimentary] pattern matching.
https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...
Go has typed errors, it just didn't use it in this case.
Nobody teaches people to use them. There is no analog to "catch most specific exceptions" culture in other languages.
It has typed errors, except every function that returns an error returns the 'error' interface, which gives you no information on the set of errors you might have.
In other statically typed languages, you can do things like 'match err' and have the compiler tell you if you handled all the variants. In java you can `try { x } catch (SomeTypedException)` and have the compiler tell you if you missed any checked exceptions.
In go, you have to read the recursive call stack of the entire function you called to know if a certain error type is returned.
Can 'pgx.Connect' return an `io.EOF` error? Can it return a "tls: unknown certificate authority" (unexported string only error)?
The only way to know is to recursively read every line of code `pgx.Connect` calls and take note of every returned error.
In other languages, it's part of the type-signature.
Go doesn't have _useful_ typed errors since idiomatically they're type-erased into 'error' the second they're returned up from any method.
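A small sketch of the erasure problem. `NotFoundError` and `lookup` are invented for illustration. Nothing in the signature of `lookup` says which concrete types can come back, and once the error is wrapped a plain type switch stops seeing it, so the caller has to already know to try `errors.As`:

```go
package main

import (
	"errors"
	"fmt"
)

// NotFoundError is a hypothetical typed error.
type NotFoundError struct{ Key string }

func (e *NotFoundError) Error() string { return "key not found: " + e.Key }

// lookup returns plain `error`; the concrete type is invisible to the
// compiler and to anyone reading only the signature.
func lookup(key string) error {
	return fmt.Errorf("lookup: %w", &NotFoundError{Key: key})
}

// missingKeyBySwitch inspects only the outermost error value.
func missingKeyBySwitch(err error) (string, bool) {
	switch e := err.(type) {
	case *NotFoundError:
		return e.Key, true
	}
	return "", false
}

// missingKeyByAs walks the wrap chain looking for the type.
func missingKeyByAs(err error) (string, bool) {
	var nf *NotFoundError
	if errors.As(err, &nf) {
		return nf.Key, true
	}
	return "", false
}

func main() {
	err := lookup("user:42")
	_, viaSwitch := missingKeyBySwitch(err)
	key, viaAs := missingKeyByAs(err)
	fmt.Println(viaSwitch, viaAs, key)
}
```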
Exceptions in Python and C are the same. The idea with these is, either you know exactly what error to expect to handle and recover it, or you just treat it as a general error and retry, drop the result, propagate the error up, or log and abort. None of those require understanding the error.
Should an unexpected error propagate from deep down in your call stack to your current call site, do you really think that error should be handled at this specific call-site?
Nope, exceptions in Python are not the same. There are a lot of standard exceptions
https://docs.python.org/3/library/exceptions.html#concrete-e...
and a standard for the exception type hierarchy
https://github.com/psycopg/psycopg/blob/d38cf7798b0c602ff43d...
https://peps.python.org/pep-0249/#exceptions
Also in most languages "catch Exception:" (or similar expression) is considered a bad style. People are taught to catch specific exceptions. Nothing like that happens in Go.
The consumer didn't, but the error in the example is typed, it's called `MaxBytesError`.
Matching the underlying type when using an interface never feels natural and is definitely the more foreign part of Go's syntax to people who are not super proficient with it. Thus, they fall back on what they know - string comparison.
When I clicked on the link to codebases relying on the specific error string, I was expecting to see random side projects. Wasn't expecting to see Grafana and Caddy on the list.
Never underestimate the mediocrity of known large codebases, lol.
(just kidding, they're not mediocre, but they're not infallible or perfect either)
In Docker's error response for `docker rmi`, the fifteenth word is "container" and the sixteenth is the container ID.
Should this not be handled by checking "resp.status == 413" ?
Weren’t there a couple of anecdotes where Windows couldn’t fix a bug because some popular game (maybe SimCity?) depended on it, so the devs hardcoded a SimCity check inside Windows and made the bug happen if it was running?
It was not a bug in windows, it was a bug in SimCity: it would UAF some memory, but the Windows 3.x allocator did not unmap / clear that memory so it worked.
Windows 95 changed that, and so one of the compatibility shims it got is that the allocator had a 3.x adjacent mode, which would be turned on when running SimCity (and probably other similarly misbehaving software as well).
Nowadays this is formalised in the compatibility engine (dating back to windows do), which can enable special modes or compatibility shims for applications (windows admins trying to run legacy or unmaintained applications can manage the application of compatibility modes via the “compatibility administrator”).
Still a pretty good example of having to support something which is definitely not part of the official spec.
Had it been open source, they could have just fixed the software instead
Fixing the upstream would not have updated it on the millions of machines running it, which is what they wanted to not break.
https://www.joelonsoftware.com/2000/05/24/strategy-letter-ii...
Jon Ross, who wrote the original version of SimCity for Windows 3.x, told me that he accidentally left a bug in SimCity where he read memory that he had just freed. Yep. It worked fine on Windows 3.x, because the memory never went anywhere. Here’s the amazing part: On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the memory allocator in a special mode that doesn’t free memory right away. That’s the kind of obsession with backward compatibility that made people willing to upgrade to Windows 95.
Corollary: uptime is part of the de facto spec being relied on.
One of the SRE practices is breaking your service on purpose to bring the actual service level closer to what is promised and supported.
As another commenter pointed out, this is to a point what Go does as well; for example, map iteration is randomised so no implementation will rely on insertion order.
another one, you pay me below market rate and you get below market rate code
Related XKCD: https://xkcd.com/1172/
This is why we have semantic versioning.
It's like an inverted game of cat and mouse
1 - Lang/OS/Lib developer puts out a quirky or buggy API (or even just an ok API)
2 - Developers rely on a quirky, weird or unexpected side effect because it's easier/more obvious or it just works this way due to a bug
3 - Original developer can't fix it because it would break compatibility
4 - GOTO 1
Immediately reminded of this: https://externals.io/message/126011, an ongoing conversation in php-internals about removing a quirky/buggy behavior from PHP where, at the very end (at least as of this comment), someone jumps in and says "yep, it's useful, please keep it"
And this isn't even quirky/buggy, it's just the string representation of an error. That said, Go took a while to improve its core error mechanisms and add utilities for matching errors by type instead of its string representation.
Sure... but this is why we have sem versioning and release notes. It's always nice to try and support all users but sometimes you just need to ship breaking changes...
While in principle you're correct, Go the language is very dedicated to backwards and forwards compatibility; while there's been talk of a Go 2 for a long time now, they're not eager to go there and if they do, they intend to make the transition low impact.
That said, I'd say this is an excellent candidate to deprecate or warn about now, and to make impossible in a version 2. Then again, how would you even stop this? A string representation of an error is common in any language, you need it to log things.
I think at best there will be a static analysis rule (in e.g. go vet) that tries to figure out if any matching is done on the string representation of an error.
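For what it's worth, a purely syntactic version of such a rule is short. This is a sketch with `go/ast`, not an existing vet check: it has no type information, so it flags any zero-argument `.Error()` call compared to a string literal with `==` or `!=`, and it misses things like `strings.Contains`. All names are made up:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is the kind of code the rule is meant to catch.
const sample = `package p

func f(err error) bool {
	if err != nil && err.Error() == "http: request body too large" {
		return true
	}
	return false
}
`

// isErrorCall reports whether e looks like x.Error() with no arguments.
func isErrorCall(e ast.Expr) bool {
	call, ok := e.(*ast.CallExpr)
	if !ok || len(call.Args) != 0 {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	return ok && sel.Sel.Name == "Error"
}

// isStringLit reports whether e is a string literal.
func isStringLit(e ast.Expr) bool {
	lit, ok := e.(*ast.BasicLit)
	return ok && lit.Kind == token.STRING
}

// findErrorStringCompares returns the position of every comparison
// between an Error() call and a string literal in src.
func findErrorStringCompares(src string) ([]token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []token.Position
	ast.Inspect(file, func(n ast.Node) bool {
		bin, ok := n.(*ast.BinaryExpr)
		if !ok || (bin.Op != token.EQL && bin.Op != token.NEQ) {
			return true
		}
		if (isErrorCall(bin.X) && isStringLit(bin.Y)) ||
			(isStringLit(bin.X) && isErrorCall(bin.Y)) {
			found = append(found, fset.Position(bin.Pos()))
		}
		return true
	})
	return found, nil
}

func main() {
	positions, err := findErrorStringCompares(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range positions {
		fmt.Printf("%s: comparing an error message to a string literal\n", p)
	}
}
```

A real check would want type information to confirm the receiver implements `error`, which is where most of the work would be.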
> I think at best there will be a static analysis rule (in e.g. go vet) that tries to figure out if any matching is done on the string representation of an error.
First they'd need to export the errors the stdlib returns https://news.ycombinator.com/item?id=41507714
I wouldn't hold my breath on that one.