By far the largest issue with errno is that we don't record where inside the kernel the error gets set (or "raised" if this was a non-C language). We had a real customer case recently where a write call was returning ENOSPC, even though the filesystem did not seem to have run out of space, and searching for the place where that error got raised was a multi-week journey.
In Linux it'd be tough to implement this because errors are usually raised as a side effect of returning some negative value, but also because you have code like:
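A hedged, self-contained illustration of the pattern being described (the function names here are made up, this is not real kernel code): the -ENOSPC originates deep in a call chain and every caller just forwards the negative value, so there is no single spot to instrument.

```c
#include <errno.h>
#include <stdio.h>

/* One of many possible places where -ENOSPC could originate. */
static int reserve_blocks(long free_blocks, long needed)
{
    if (free_blocks < needed)
        return -ENOSPC;
    return 0;
}

/* Callers just forward whatever negative value they received. */
static int do_write(long free_blocks, long needed)
{
    int err = reserve_blocks(free_blocks, needed);
    if (err)
        return err;
    return 0;
}

int main(void)
{
    int err = do_write(0, 8);   /* simulate a "full" filesystem */
    if (err < 0)
        printf("write failed with error %d, and nothing recorded where it was raised\n", -err);
    return 0;
}
```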
But instrumenting every function that returns a negative int would be impossible (and wrong). And there are also cases where the error is saved in (eg) a bottom half and returned in the next available system call.

Yeah, I've had to kernel debug my way through this several times. It sucks greatly.
I don't think this behaviour is "peculiar" as the author says it is; why does the error number matter if you know the call succeeded? GetLastError() on Windows works similarly, although with the additional weird caveat that (undocumentedly) some functions may set it to 0 on success.
The system call wrappers could all have explicitly set errno to 0 on success, but they didn't.
Because it's plainly unnecessary. It'd be a waste today, and even more so on a PDP-11 in the 1970s.
I agree with you, it's not so peculiar as one might think. (Disclaimer: been writing software for Unix since before POSIX...)
This design choice reflects the POSIX philosophy of minimizing overhead and maximizing flexibility for system programming. Frequent calls to write(), for example, would be slowed by having to reset errno on every call in addition to checking write()'s return value - especially in cases where a lot of write()s are queued.
Or .. a library function like fopen() might internally call multiple system calls (e.g., open(), stat(), etc.). If one of these calls sets errno but the overall fopen() operation succeeds, the library doesn’t need to clear errno. For instance, fopen() might try to open a file in one mode, fail with an errno of EACCES (permission denied), then retry in another mode and succeed. The final errno value is irrelevant since the call succeeded.
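A minimal sketch of that kind of retry (not glibc's actual fopen() implementation): a helper that tries one mode, fails with EACCES, retries in another, and succeeds - leaving a stale errno behind even though the overall call worked.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns an fd, or -1 on failure. On success, errno may still hold the
 * EACCES from the first attempt - the caller must judge by the fd alone. */
int open_preferring_rw(const char *path)
{
    int fd = open(path, O_RDWR);      /* may fail and set errno = EACCES */
    if (fd == -1)
        fd = open(path, O_RDONLY);    /* retry read-only; success leaves errno untouched */
    return fd;
}
```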
This mechanism minimizes overhead by not requiring errno to be reset on success.
It allows flexible, efficient implementations of library and system calls and encourages robust programming practices by mandating return-value checks before inspecting errno.
It supports complex, retry-based logic in libraries without unnecessary state management - and it preserves potentially useful debugging information.
You only care about errno when you know an actual error occurred. Until then, ignore it.
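The idiom in code form, as a minimal sketch: errno is consulted only after the return value has already said the call failed.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

ssize_t checked_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n == -1)                                          /* check the return value first */
        fprintf(stderr, "write: %s\n", strerror(errno));  /* only now is errno meaningful */
    return n;
}
```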
This is similar to other systems-level things that can occur in such environments, for example when setting a hard Reset-Reason or Fail-Reason register in static/non-volatile memory somewhere, for later assessment.
IMHO, the thing that's most peculiar about this is that folks these days think of it as weird/quaint - when in fact, it makes a lot of sense if you think about it.
Does Valgrind give a warning when you do check errno after a successful system call?
Valgrind doesn't specifically (although it might catch side-effects of this bug), but Coverity would, under most conditions (such as having MISRA or CERT checkers enabled), catch it.
If it's important to you to find these cases, use clang's static analyzer to perform deep source code analysis via scan-build. See also cppcheck and pc-lint, or PVS-Studio, each of which has means by which you can catch this error.
I mean, it definitely sucks. It's cute if you look at it as a product of its time but it makes no sense to do things this way anymore.
I don't agree with you at all that 'it sucks'. I think it makes perfect sense and works as intended.
Plus, we're talking about POSIX here. You don't have a time machine. Shall we argue about just how much POSIX software is out there, working perfectly fine with this technique?
Sure, in New Fangled Language De Jour™, return as many tuples as your heart desires.
But don't expect POSIX to play along ..
> Plus, we're talking about POSIX here. You don't have a time machine. Shall we argue about just how much POSIX software is out there, working perfectly fine with this technique?
I'd argue this is in spite of choosing a path that makes maintainable software more difficult than it needs to be. Constraints change over time, and the thought process that made this practice rational no longer coheres with modern-day constraints. Maintaining software is now (much, much) more expensive than the performance minutiae that led to this cost.
> Sure, in New Fangled Language De Jour™, return as many tuples as your heart desires.
This isn't related to language at all—C may not have tuples, but structs are an equivalent.
It sucks because people miss error codes, or they overwrite them, or misunderstand what they mean. You read the blog post right? That’s what I’m talking about. I deal with that all the time (maybe more than the average person, to be fair, but still it’s a lot).
I don’t really blame the people in 1970 for coming up with this design but it’s 2025 now; we can agree that it has problems. Tape recorders were also a neat idea but I can record a thousand times more on my phone now, often at higher quality. By modern standards, they suck.
I think it would have been better if they had designed it so that the error from the kernel came in a separate register. That would mean you didn't have to use a signed int for the return value. The issue is that one register is now sort of ambiguous: it either returns the thing you want or the error, but these are separate types. If you had them in separate registers you would have the natural type of the thing you are interested in without having to convert it. This would, however, force you to first check the value in the error register before using the value in the return register, but that makes more sense to me than the opposite.
A whole separate register?
That is quite expensive. Obviously you need to physically add the register to the chip.
After that the real work comes. You need to change your ISA to make the register addressable by machine code. The PDP-11 had 8 general-purpose registers, so it used 3 bits everywhere to address a register; now we sometimes need 4. Many opcodes can work on 2 registers, so we need 8 out of 16 bits to address both where before we only needed 6. Also, the PDP-11 had a fixed 16-bit instruction encoding, so either we change to 18-bit instructions or make more radical changes to the ISA.
This quickly spirals into significant amounts of work versus encoding results and error values into the same register.
Classic worse is better example.
> A whole separate register?
There are quite a few registers (in all the ISAs I'm familiar with) that are defined as not preserved across calls; kernels already have to wipe them in order to avoid leaking kernel-specific data to userland, one of them could easily hold additional information.
EDIT: additionally, it's been a long time since the register names we're familiar with in an ISA actually matched the physical registers in a chip.
It is distinctly odd to watch people in the 2020s laboriously explaining how difficult all this stuff would be, when the reality was that the register scarcity that prompted this sort of double-duty in 1979 was already going away in mass-market computers in 1982.
By 1983, operating system vendors designing their APIs ab initio were already making APIs that just used separate registers for error and result returns. Sinclair QDOS was one well-known example. MS-DOS version 2 might have done things the PDP-11 way, but by the time of MS-DOS version 4 people were already inventing INT calls that used multiple registers to return things. OS/2 was always returning a separate error value in 1987. Windows NT's native API has always been returning a separate NTSTATUS, not doubled up with anything else, since the 1990s.
Imho, this is an area where the limitations of C shine through.
Some kernels return error status as a CPU flag or otherwise separately from the returned value. But that's very hard to use in C, so the typical convention for a syscall wrapper is to return a non-negative number for success and -error for failure; if negative numbers are valid return values, you've got to do something else.
Hard‽ Returning small POD structures in register pairs has been a feature of C compiler calling conventions for over 3 decades, and was around in C compilers back in the 16-bit era.
* https://jdebp.uk/FGA/function-calling-conventions.html#Watca...
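A minimal sketch of what returning a small POD structure from a syscall-style wrapper could look like - the names are made up and this is not how libc actually wraps system calls; under common calling conventions a struct this small comes back in a register pair.

```c
#include <errno.h>
#include <unistd.h>

struct sys_result {
    long value;   /* meaningful only when err == 0 */
    int  err;     /* 0 on success, otherwise an errno-style code */
};

static struct sys_result sys_write(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n == -1)
        return (struct sys_result){ .value = 0, .err = errno };
    return (struct sys_result){ .value = n, .err = 0 };
}
```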
UNIX is 50 years old.
Something worth mentioning would have been those libc calls where the only way to tell if a return value of 0 is an error is to check errno. And of course, as the article says, errno is only set on error, so you need to set it to 0 before making that libc call.
I think strtol was one such function, but there were others.
getservbyname is also one of those functions; it returns NULL both on error and to signal that it couldn't find what you looked for.
strtol() isn't actually such a case - it sets errno on range errors but returns 0 for the valid input "0"; the classic examples are getpriority() and ptrace(), which can legitimately return -1 on success.
It is such a case, it's just not about zero. If the input is too large to be represented in a signed long, strtol(3) returns LONG_MAX and sets errno to ERANGE. However, if the input was the string form of LONG_MAX, it returns LONG_MAX and doesn't set errno to anything. In fact my strtol(3) manpage explicitly states "This function does not modify errno on success".
Thus, to distinguish between an overflow and a legitimate maximum value, you need to set errno to 0 before calling it, because something else you called previously may have already set it to ERANGE.
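A minimal sketch of that idiom:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 and stores the value on success, -1 on overflow or bad input. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                     /* clear any stale ERANGE from earlier calls */
    long v = strtol(s, &end, 10);
    if (errno == ERANGE)
        return -1;                 /* genuine overflow/underflow, even if v == LONG_MAX */
    if (end == s)
        return -1;                 /* no digits at all */
    *out = v;
    return 0;
}
```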
Further gnarliness, hopefully long past relevance:
https://cr.yp.to/docs/connect.html
You might think so. (-:
* https://github.com/jdebp/nosh/blob/trunk/source/socket_conne...
kqueue() can apparently return the error right in the data of the kevent, but I'm still using poll() so cannot confirm; whilst I can confirm that kqueue/kevent is alas not as truly consistent as one might expect. (Someone recently tried to move FreeBSD devd to kqueue, and hit various problems of FreeBSD devices that are still, even in version 14, not yet kqueue-ready.)
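For what it's worth, a hedged sketch of what "the error right in the data of the kevent" seems to refer to, going by the kqueue(2) documentation rather than first-hand use: a failed changelist entry can come back as an event with EV_ERROR set in flags and an errno-style code in data.

```c
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <string.h>

static void report_kevent_errors(const struct kevent *evs, int n)
{
    for (int i = 0; i < n; i++) {
        if ((evs[i].flags & EV_ERROR) && evs[i].data != 0)
            fprintf(stderr, "kevent on ident %lu failed: %s\n",
                    (unsigned long)evs[i].ident, strerror((int)evs[i].data));
    }
}
```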
I had hoped to find an elegant solution to this issue in modern async I/O libs. Those I have found simply ignore the error and forward it to the user.
For the kids just learning about software design today, remember that the way things are now is not necessarily ideal, and you do not have to go along with it. You can color outside the lines.
One of the great misfortunes of traditional software [and network] design is a lack of visibility throughout the stack. The author here talks about "multiple return values", which is to find out multiple pieces of information from some other piece of code. But that code calls other code, and that other code calls more, all with its own information that might be useful for you to know.
Good software design is cohesive and loosely coupled. That means you should not know, or depend on, the internal workings and variables of some other component. But at the same time, when problems happen, it is useful to know what happened in some other component, maybe even 3 components down the line. In particular, you can usually determine the cause of failures by examining just the inputs and the outputs of a function. Examine the inputs and outputs of every function in the system, and that's enough to identify or recreate most bugs. (i/o, system resource pressure, and network interruption are the last bits of info you need, but harder to gather)
But I'm not aware of this capability (examining the input and output of components multiple levels away from the current code) existing as a software design pattern. Within one component, sure, but outside it? If you load a different component into yours, maybe it exposes attributes and whatnot to you. But what about the components that component uses? And what if we're leaving the immediate computing environment? I still want to know what was going on further down the line.
Such a solution exists within systems-of-systems, as in distributed tracing. But only to an omnipotent observer in a faraway land. I want my code to know what happened elsewhere, if only to report a more accurate error message than "500 internal server error". I can count at least 20 times in the past 6 months that I have encountered a web app whose frontend literally did nothing when I clicked a button, and only upon opening up the browser's inspection tools did I see a backend API returning an actual error message. But an equal number of times, just "500 error" or similar. I want to see "the add-user api call failed because you do not have permissions to add users", or "the server that tried to process this request ran out of disk space", and I want to press a button that automatically composes an e-mail to the company with a bug report that includes all the details. Can you build that today with existing software design?
Sure, if you spend 120 hours building the distributed tracing and observability infrastructure (or pay Datadog a quarter million for it), and 80 more hours to train the devs how to use it. But we shouldn't need infrastructure. The software can carry relevant data to-and-fro; let it carry more than just "errno".
Why didn't they mention threads?
Oh gosh, that's interesting. I bet that complicates using errno. Or is errno somehow copied into a local variable?
errno is in thread-local storage (TLS)
Notably, the POSIX Threads API itself (i.e. pthread_ routines) returns errors directly rather than through errno.
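The contrast in code: pthread_* calls return the error code directly, and nothing goes through errno.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void lock_or_report(pthread_mutex_t *m)
{
    int err = pthread_mutex_lock(m);   /* returns 0 on success or an errno-style code */
    if (err != 0)
        fprintf(stderr, "pthread_mutex_lock: %s\n", strerror(err));
}
```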
It was a common design of new operating system APIs, not encumbered by PDP-11 compatibility, in the 1980s.
* https://tty0.social/@JdeBP/114816928464571239
Even some of the later augmentations in MS/PC/DR-DOS did things like return an error code in AX and the result in (say) CX instead of using AX and CF.
Yes. It is too bad that they didn't use a similar solution for the current working directory. Chdir() is process-wide, not thread local :(
`openat` has basically solved that since 2.6.16 (which came out in 2006). There are still some uncommon APIs that have been slow to gain `at` variants, but there's usually a workaround (for example, `getxattrat` and family were only added in 6.13 (this January), but can be implemented in terms of `openat` + `fgetxattr`)
> can be implemented in terms of `openat` + `fgetxattr`
Except for symlinks. `fgetxattr` requires a file opened for read or write, but symlinks can only be opened as `O_PATH`.
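A hedged sketch of that workaround on Linux - the name getxattr_at is made up, and as noted it does not cover symlinks, which can only be opened O_PATH.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

ssize_t getxattr_at(int dirfd, const char *name, const char *attr,
                    void *buf, size_t size)
{
    int fd = openat(dirfd, name, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;                      /* errno already set by openat */
    ssize_t n = fgetxattr(fd, attr, buf, size);
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;                /* don't let close() clobber fgetxattr()'s errno */
    return n;
}
```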
This is good because the current directory is conceptually process-wide for the user as well. If your program isn't a shell or doesn't perform a similar function, then you probably should not change the working directory at all.
If you need thread-specific local paths just use one of the *at() variants that let you explicitly specify a directory handle that your path is relative to.
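A minimal sketch of that approach - resolve paths against an explicit directory fd instead of ever touching the process-wide CWD; the helper name is made up.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open relpath relative to dir without touching the process-wide CWD. */
int open_relative_to(const char *dir, const char *relpath)
{
    int dirfd = open(dir, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
    if (dirfd == -1)
        return -1;
    int fd = openat(dirfd, relpath, O_RDONLY | O_CLOEXEC);
    int saved_errno = errno;
    close(dirfd);                      /* the path was resolved against dirfd, not the CWD */
    errno = saved_errno;
    return fd;
}
```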