remram 17 hours ago

Rust has the "non_exhaustive" attribute that lets you declare that an enum might get more variants in the future. In practice that means that when you match on such an enum, you have to add a default case. It's like an "Other" variant in the enum, except you can't reference it directly; you handle it via the default case.

IIRC a secret 'other' variant (or '__non_exhaustive' or something) is actually how we did things before non_exhaustive was introduced.

  • kibwen 16 hours ago

    Note that the stance of the OP here is broadly in agreement with what Rust does. His main objection is this:

    > The word “other” means “not mentioned elsewhere”, so the presence of an Other logically implies that the enumeration is exhaustive.

    In Rust, because all enums are exhaustive by default and exhaustive matching is enforced by the compiler, there is no risk of this sort of confusion. And as for his proposed solution:

    > Just document that the enumeration is open-ended

    The non_exhaustive attribute is effectively compiler-enforced documentation; users now cannot forget to treat the enum as open-ended.

    Of course, adding non_exhaustive to Rust was not without its own detractors; its usage for any given enum fundamentally means shifting power away from library consumers (who lose the ability to guarantee exhaustive matching) and towards library authors (who gain the ability to evolve their API without causing guaranteed compilation errors in all of their users (which some users desire!)). As such, the guidance is that it should be used sparingly, mostly for things like error types. But that's an argument against open-ended enums in general, not against the mechanisms we use to achieve them (which, as you say, were already possible in Rust via hacks).

    • tyre 16 hours ago

      Maybe there should be a compiler option or function to assert that a match is exhaustive. If the match does not handle a defined case, it blows up.

      • aecsocket 16 hours ago

        Rust already asserts that a match is exhaustive at compile time - if you don't include a branch for each option, it will fail to compile. This extends to integer range matching and string matching as well.

        It's just that with #[non_exhaustive], you must specify a default branch (`_ => { .. }`), even if you've already explicitly matched on all the values. The idea being that you've written code which matches on all the values which exist right now, but the library author is free to add new variants without breaking your code - since it's now your responsibility as a user of the library to handle the default case.
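
        A minimal sketch (with a hypothetical `Direction` enum) of what this looks like from a downstream crate:

          // In the library crate:
          #[non_exhaustive]
          pub enum Direction { North, South, East, West }

          // In a downstream crate (assuming `dir: Direction`): every current
          // variant is listed, yet the compiler still demands a wildcard arm
          // because of #[non_exhaustive].
          let label = match dir {
              Direction::North => "N",
              Direction::South => "S",
              Direction::East => "E",
              Direction::West => "W",
              _ => "?",
          };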

        • ffminus 12 hours ago

          Library users can force a compile error when new variants get added, using a lint from rustc. It's "allow" by default, so it's opt-in.

          https://doc.rust-lang.org/rustc/lints/listing/allowed-by-def...

          • WiSaGaN 9 hours ago

            Does this require nightly? If so, #[warn(clippy::wildcard_enum_match_arm)] will do the same thing with no need for nightly, via clippy instead of rustc natively.

          • codetrotter 9 hours ago

            That's pretty neat. I still don't completely understand why #[non_exhaustive] is so desirable in the first place though.

            Let's say I am using a crate called zoo-bar. Let's say this crate is not using non-exhaustive.

            In my code where I use this crate I do:

              let my_workplace = zoo_bar::ZooBar::new();
              
              let mut animal_pens_iter = my_workplace.hungry_animals.iter();
              
              while let Some(ap) = animal_pens_iter.next() {
                  match ap {
                      zoo_bar::AnimalPen::Tigers => {
                          me.go_feed_tigers(&mut raw_meat_that_tigers_like_stock).await?;
                      }
                      zoo_bar::AnimalPen::Elephants => {
                          me.go_feed_elephants(&mut peanut_stock).await?;
                      }
                  }
              }
            
            I update or upgrade the zoo-bar dependency and there's a new variant of the AnimalPen enum called Monkeys.

            Great! I get a compile error and I update my code to feed the monkeys.

              diff --git a/src/main.rs b/src/main.rs
              index 202c10c..425d649 100644
              --- a/src/main.rs
              +++ b/src/main.rs
              @@ -10,5 +10,8 @@
                         zoo_bar::AnimalPen::Elephants => {
                             me.go_feed_elephants(&mut peanut_stock).await?;
                         }
              +          zoo_bar::AnimalPen::Monkeys => {
              +              me.go_feed_monkeys(&mut banana_stock).await?;
              +          }
                     }
                 }
            
            
            Now let's say instead that the AnimalPen enum was marked non-exhaustive.

            So I'm forced to have a default match arm. In this alternate universe I start off with:

              let my_workplace = zoo_bar::ZooBar::new();
            
              let mut animal_pens_iter = my_workplace.hungry_animals.iter();
            
              while let Some(ap) = animal_pens_iter.next() {
                match ap {
                  zoo_bar::AnimalPen::Tigers => {
                    me.go_feed_tigers(&mut raw_meat_that_tigers_like_stock).await?;
                  }
                  zoo_bar::AnimalPen::Elephants => {
                    me.go_feed_elephants(&mut peanut_stock).await?;
                  }
                  _ => {
                    eprintln!("Whoops! I sure hope someone notices this default match in the logs and goes and updates the code.");
                  }
                }
              }
            
            When the monkeys are added, and I update or upgrade the dependency on zoo-bar, I don't notice the warning in the logs right away after we deploy to prod, because the logs contain so many things that no one can read everything.

            One week passes and then we have a monkey starving incident at work.

            After careful review we realize that it was due to the default match arm and we forgot to update our program.

            So we learn from the terrible catastrophe with the monkeys and I update my code using the attributes from your link.

              diff --git a/src/main.rs b/src/main.rs
              index e01fcd1..aab0112 100644
              --- a/wp/src/main.rs
              +++ b/wp/src/main.rs
              @@ -1,3 +1,5 @@
              +#![feature(non_exhaustive_omitted_patterns_lint)]
              +
               use std::error::Error;
               
               #[tokio::main]
              @@ -11,6 +13,7 @@ async fn main() -> anyhow::Result<()> {
                 let mut animal_pens_iter = my_workplace.hungry_animals.iter();
               
                 while let Some(ap) = animal_pens_iter.next() {
              +    #[warn(non_exhaustive_omitted_patterns)]
                   match ap {
                     zoo_bar::AnimalPen::Tigers => {
                       me.go_feed_tigers(&mut raw_meat_that_tigers_like_stock).await?;
              @@ -18,8 +21,12 @@ async fn main() -> anyhow::Result<()> {
                     zoo_bar::AnimalPen::Elephants => {
                       me.go_feed_elephants(&mut peanut_stock).await?;
                     }
              +      zoo_bar::AnimalPen::Monkeys => {
              +        // Our monkeys died before we started using proper attributes. If they are hungry it means they have turned into zombies :O
              +        me.alert_authorities_about_potential_outbreak_of_zombie_monkeys().await?;
              +      }
                     _ => {
              -        eprintln!("Whoops! I sure hope someone notices this default match in the logs and goes and updates the code.");
              +        unreachable!("We have an attribute that is supposed to tell us if there were any unmatched new variants.");
                     }
                   }
                 }
            
            And next time we update or upgrade the crate version to latest, another new variant exists, but thanks to your tip we get a lint warning and we happily update our code so that we won't have more starving animals.

              diff --git a/wp/src/main.rs b/wp/src/main.rs
              index aab0112..4fc4041 100644
              --- a/wp/src/main.rs
              +++ b/wp/src/main.rs
              @@ -25,6 +25,9 @@ async fn main() -> anyhow::Result<()> {
                       // Our monkeys died before we started using proper attributes. If they are hungry it means they have turned into zombies :O
                       me.alert_authorities_about_potential_outbreak_of_zombie_monkeys().await?;
                     }
              +      zoo_bar::AnimalPen::Capybaras => {
              +        me.go_feed_capybaras(&mut whatever_the_heck_capybaras_eat_stock).await?;
              +      }
                     _ => {
                       unreachable!("We have an attribute that is supposed to tell us if there were any unmatched new variants.");
                     }
            
            But what was the advantage of marking the enum as #[non_exhaustive] in the first place?
            • kelnos 7 hours ago

              Consider a bit of a different case. I run a service that exposes an API, and some fields in some response bodies are enums. I've published a Rust client for the API for my customers to use, and (among other things) it has something like this:

                  #[derive(serde::Serialize, serde::Deserialize)]
                  pub enum SomeEnum {
                      AValue,
                      BValue,
                  }
              
              My customers use that and all is well. But I want to add a new enum value, CValue. I can't require that all my customers update their version of my Rust client before I add it; that would be unreasonable.

              So I add it, and what happens? Well, now whenever my customers make that API call, instead of getting some API object back, they get a deserialization error, because that enum's Deserialize impl doesn't know how to handle "CValue". Maybe some customer wasn't even using that field in the returned API object, but now I've broken their code.

              Adding #[non_exhaustive] means I at least won't break my customers' code when I add a new enum value.
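
              #[non_exhaustive] takes care of the compile-time breakage; the wire-format side still needs a catch-all when decoding. A minimal hand-rolled sketch (the `Unrecognized` variant and this impl are illustrative, not the actual client code):

                  use serde::Deserialize;

                  #[derive(Debug)]
                  #[non_exhaustive]
                  pub enum SomeEnum {
                      AValue,
                      BValue,
                      Unrecognized(String), // hypothetical catch-all for unknown wire values
                  }

                  impl<'de> Deserialize<'de> for SomeEnum {
                      fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
                          // Decode via a plain string so a new value like "CValue"
                          // doesn't fail the whole response.
                          let s = String::deserialize(d)?;
                          Ok(match s.as_str() {
                              "AValue" => SomeEnum::AValue,
                              "BValue" => SomeEnum::BValue,
                              _ => SomeEnum::Unrecognized(s),
                          })
                      }
                  }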

            • ffminus 8 hours ago

              It lets you have a middle ground, with the decision of when breaking happens left up to library users. Without non_exhaustive, all consumers always get your second scenario. With non_exhaustive, individual zoos get to pick their own policy of when/if animals should starve.

              Each option has its place, it depends on context. Does the creator of the type want/need strictness from all their consumers, or can this call be left up to each consumer to make? The lint puts strictness back on the table as an opt-in for individual users.

        • tialaramex 10 hours ago

          Importantly #[non_exhaustive] applies to your users but not you. In the defining crate we can write exhaustive matches and those work - the rationale is that we defined this type, so we should know how to do this properly. Our users however must assume they don't know if it has been extended in a newer version.

          #[non_exhaustive] is most popular on the enum itself, but it is also permitted on published struct types (it means we promise these published fields will exist, but maybe we will add more, and thus change the size of the structure overall) and on individual enum variants (it means the inner details of that variant may change: you can still pattern-match it, but we might add more fields and your matches must cope).
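
          A rough sketch of those placements (hypothetical types):

            #[non_exhaustive]
            pub struct Config {     // we may add fields: downstream code cannot
                pub verbose: bool,  // use struct literals, and patterns need `..`
            }

            pub enum Event {
                Key(char),
                #[non_exhaustive]
                Resize { width: u16, height: u16 }, // this variant may gain fields
            }

            // Downstream, matching the non_exhaustive variant must leave room
            // for new fields:
            //   Event::Resize { width, height, .. } => ...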

          • 3836293648 8 hours ago

            Wait, what. I thought it existed for FFI purposes, whether that's with C or network protocols. The defining crate getting away with it undermines this.

            • tialaramex an hour ago

              No. If you mean "I don't know", that's not non_exhaustive; that's "I don't know".

              For a network protocol or C FFI you probably want a primitive integer type, not any of Rust's fancier types such as enum, because while you might believe this byte should have one of six values, 0x01 through 0x06, maybe somebody decided the top bit is a flag now, so 0x83 is "the same" as 0x03 but with a flag set.

              Trying to unsafely transmute arbitrary blobs of data into a Rust type is likely to end in tears; this attribute does not fix that.

  • kpcyrd an hour ago

    It's still a gotcha in Rust, I've seen code like:

      #[non_exhaustive]
      pub enum Protocol {
        Tcp,
        Udp,
        Other(u16),
      }
    
    It allows you to still match on the unrecognized case (like `Protocol::Other(1)`, which is nice), but an additional enum variant may eliminate that case, if our enum gets extended to:

      #[non_exhaustive]
      pub enum Protocol {
        Tcp,
        Udp,
        Icmp,
        Other(u16),
      }
    
    Even though we can add additional variants in a semver-nonbreaking way due to `#[non_exhaustive]`, other people's code may now be silently broken until they've changed `Protocol::Other(1)` to `Protocol::Icmp`.

    Having had this in the back of my head for quite some time, I think instead of an `Other` case there should be two methods: one returns an `Option<Protocol>`, and the other returns the `u16` representation. Unless there's a match on one of your expected cases, your default branch would inspect the raw numeric type, which keeps working even if that case is later added to the enum.
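
    A rough sketch of that design (hypothetical names, and assuming the `Other` case is dropped from `Protocol`):

      pub struct RawProtocol(pub u16);

      impl RawProtocol {
          /// The recognized variant, if any. Adding `Icmp` later shrinks the
          /// `None` space but cannot silently bypass existing match arms.
          pub fn protocol(&self) -> Option<Protocol> {
              match self.0 {
                  6 => Some(Protocol::Tcp),  // IANA protocol numbers
                  17 => Some(Protocol::Udp),
                  _ => None,
              }
          }

          /// The raw wire value, always available as a fallback.
          pub fn raw(&self) -> u16 {
              self.0
          }
      }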

  • sunshowers 16 hours ago

    There is currently a missing middle ground in stable Rust, which is to lint on a missing variant rather than fail compilation. There's an unstable option for it, but it would be very useful for non-exhaustive enums where consumers care about matching against every known variant.

    You can practically use it today by gating on a nightly-only cfg flag. See https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c... and https://github.com/guppy-rs/guppy/blob/fa61210b67bea233de52c...

    • eru 13 hours ago

      Couldn't clippy do that for you?

      • sunshowers 13 hours ago

        Not at the moment. The unstable lint is implemented in rustc directly, not in clippy, though I guess it could move to clippy in the future.

  • rendaw 10 hours ago

    I absolutely _hate_ this. Since you're forced to add a default case, if a new variant that you need to actively handle is added in the future, it won't turn into a compile error _or_ surface as a runtime error.

    I think half of it is developers presuming to know users' needs and making decisions for them (users can make that decision by themselves, using the default case!) but also a logic-defying fear of build breakage, to the point that I've seen developers turn other compile errors into runtime errors in order to avoid "breaking changes".

    • bobbylarrybobby 9 hours ago

      I agree, this is the one place where upstream crates should be allowed to make breaking changes for downstream users. As a consumer of another crate’s enum, it's easy enough to opt into “never break my code” by just adding default cases, but I'd like to have to opt into that so that I'm notified when new variants are added upstream. Maybe this should even be a Cargo.toml setting — when an upstream crate is marked non-exhaustive, the downstream consumer gets to choose: require me to add default cases (and don't mark them as dead code), or let me exhaustively match anyway, knowing my match statement might break in the future.

    • michaeljsmith 10 hours ago

      Not sure about Rust, but TypeScript lets you keep the default handling but still flag a compile error when a new value is added (the first is useful e.g. if a separate component is updated and starts sending new values).

      https://stackoverflow.com/a/39419171/974188

  • hchja 16 hours ago

    This is why language syntax is so important.

    Swift allows a ‘default’ case in switch statements, which is similar to Other, but you should use it with caution.

    It’s better to not use it unless you’re 110% sure that there will not be additional enum cases added in the future.

    Otherwise, in Swift, when you add an additional enum case, the code that uses the enum will not compile until you handle the new case at each respective call site.

    • layer8 15 hours ago

      The better solution is to have two different “default” cases in the language, one that expresses handling “future” values (values that aren’t currently defined), and one that expresses “the rest of the currently defined values”. The “future” case wouldn’t be considered for exhaustiveness checks.

      • mayoff 14 hours ago

        Swift allows an enum to be marked `@frozen`, which is an API (and ABI) stability guarantee that the enum will never gain more cases. Apple uses this quite sparingly in their APIs.

        Swift also has two versions of a `default` case in switch statements, like you described. It has regular `default` and it has `@unknown default`. The `@unknown default` case is specifically for use with non-frozen enums, and gives a warning if you haven't handled all known cases.

        So with `@unknown default`, the compiler tells you if you haven't been exhaustive (vs. the current API), but doesn't complain that your `@unknown default` case is unreachable.

        • layer8 14 hours ago

          Ah, thanks, I wasn’t aware of these two “default” variants in Swift.

      • SkiFire13 15 hours ago

        What would the "future" default case actually do though? When you're in the past there's no value for it, and the moment you get to the future the values will become part of the "present" and will still not fall under the "future" case. You would need some kind of versioning support in the enum itself, but that's a much bigger change.

        • layer8 14 hours ago

          “Future” values only become defined (“present” in your sense) at compile-time, but may occur before that at runtime. Note that this mostly presumes a language with separate compilation, or situations like coding against a remote-API spec, where the server may deploy a newer version but your client remains unchanged. Once you compile against the new spec, you’d get errors/warnings about the new, not explicitly handled values, but your existing binary would nevertheless handle those values under the “future” case.

          The issue with traditional “default” cases is that they shadow warnings/errors about unhandled cases, but you’d still want to have some form of default case for forward compatibility.

          • eru 13 hours ago

            > “Future” values only become defined (“present” in your sense) at compile-time, but may occur before that at runtime. Note that this mostly presumes a language with separate compilation, [...]

            Separate compilation is a technical implementation detail that shouldn't have an impact on semantics. Especially since LTO (link time optimisation) is becoming more and more common; 'thin' LTO is essentially free in Rust at least in terms of extra build time. LTO blurs the lines between separate compilation units.

            On the flip side, Rust can use multiple codegen units even for the same crate, thus introducing separate compilation where a naive approach, like in classic C, would only use a single one.

            • layer8 13 hours ago

              Separate compilation is relevant, because it means the version of the interface you compile against may not be the same version you run against. This is fine if the newer version is compatible with the older version. And for the present discussion, we consider an added enum value to not constitute a compatibility break. Nevertheless, it means that the client code can now receive a value that it couldn’t receive before. And it’s useful to be able to define a case distinction for such unknown future values, while at the same time having the compiler check that all currently defined values have been duly considered.

              In other words, you want to ensure that you have the most appropriate behavior for whatever values are currently known, and a fallback behavior for the future values that by definition you can’t possibly know at the present time. Of course, this is more or less only practical in languages where the interface version you compile against is only updated deliberately, while the implementation version at runtime can be any newer compatible version.

kstenerud 5 hours ago

I use the "other" technique when it's necessary for the user to be able to mix in their own:

    enum WidgetFlavor
    {
        Vanilla,
        Chocolate,
        Strawberry,
        Other=10000,
    };

Now users can add their own (and are also responsible for making sure it works in all APIs):

    enum CustomWidgetFlavor
    {
        RockyRoad=Other,
        GroovyGrape,
        Cola,
    };

And now you can amend the enum without breaking the client:

    enum WidgetFlavor
    {
        Vanilla,
        Chocolate,
        Strawberry,
        Mint,
        Other=10000,
    };

  • qingcharles 4 hours ago

    It's code like this that ends in a terminal choco-banana shake hang:

    http://www.technofileonline.com/texts/chocobanana.gif

    • fingerlocks 4 hours ago

      What is the context here? Is this just a silly nonsense tech support page/meme or an actual product from the late 90s?

      • sandblast 3 hours ago

        Is the context not clear for you from the information that the article applies to "DreamWorks Interactive, Someone's in the Kitchen, version 1.0" and other clues that it's a game?

qbane 2 hours ago

How about putting Other at the top? You can convince yourself that the value zero (or one if you like) is reserved for unknown values.

  • Cthulhu_ 20 minutes ago

    That's the Go approach, where every variable is zero-valued by default, so it makes sense for the first (zero) enum value to be a 'none' or 'other' or 'unknown' value.

    (note that Go doesn't have enums as a language feature, but you can use its const declaration to create enum-like constants)

  • shakna an hour ago

    This is what I tend to do. Because 0 is "default", it means "unspecified" in a lot of my API designs.

zdw 17 hours ago

I wonder how this aligns with the protobuf best practice of having the first value be UNSPECIFIED:

https://protobuf.dev/best-practices/dos-donts/#unspecified-e...

  • bocahtie 17 hours ago

    When the deserializing half of the protobuf definitions encounters an unknown value, it gets deserialized as the zero value. When that client updates, it will then be able to deserialize the new value appropriately (in this case, "Mint"). The advice on that page also specifies to not make the value semantically meaningful, which I take to mean you should never set it to that value explicitly.

    • chen_dev 14 hours ago

      > it gets deserialized as the zero value

      It’s more complicated:

      https://protobuf.dev/programming-guides/enum/

      >> What happens when a program parses binary data that contains field 1 with the value 2?

      >- Open enums will parse the value 2 and store it directly in the field. Accessor will report the field as being set and will return something that represents 2.

      >- Closed enums will parse the value 2 and store it in the message’s unknown field set. Accessors will report the field as being unset and will return the enum’s default value.

      • vitus 9 hours ago

        Ugh. I hate how we (Google) launched proto editions.

        It used to be that we broadly had two sets of semantics (modulo additional customizations): proto2 and proto3. Proto editions was supposed to unify the two versions, but instead now we have the option to mix and match all of the quirks of each of the versions.

        And, to make matters worse, you also have language-dependent implementations that don't conform to the spec (in fact, very few implementations are conformant). C++ and Java treat everything imported by a proto2 file as closed; C#, Golang, and JS treat everything as open.

        I don't see a path forward for removing these custom deprecated field features, or else we'd have already begun that effort during the initial adoption of editions.

    • dwattttt 13 hours ago

      > The advice on that page also specifies to not make the value semantically meaningful, which I take to mean to never set it to that value explicitly.

      I've taken to coding my C enums with the first value being "Invalid", indicating it is never intended to be created. If one is encountered, it's a bug.

  • jmole 17 hours ago

    The example code added “other” as the last option, which was the source of the problems he described.

    This doesn’t happen when you make the first value in the enum unknown/unspecified

    • plorkyeran 17 hours ago

      No, the problem described in the article is entirely unrelated to where in the enum the Other option is located. There is a different problem where keeping the Other option at the end of the enum changes the value of Other, but that is not the problem that the article is about.

      • jmole 16 hours ago

        Well it simplifies the logic considerably - if you see an enum value you don’t recognize (mint), you treat it as uninitialized (0).

        So any future new flavor will be read back as ‘0’ in older versions.

  • seeknotfind 17 hours ago

    This is the same as a null pointer. The requirement is very deeply tied to protobuf because it is used in large distributed systems that always need to handle version mismatch, and this advice doesn't necessarily apply to API design in general.

    • eddd-ddde 17 hours ago

      Even in the simplest web apps you can encounter version mismatch when a client requests a response from a server that just updated.

      • seeknotfind 15 hours ago

        This implies an API where the server has a single shared implementation. Imagine for instance that the server implements a shim for each version of the interface, then there isn't a need for the null in the API. Imagine another alternative, that the same API never adds a field, but you add a new method which takes the new type. Imagine yet again an API where you are able to version the clients in lockstep. So, it's a decision about how the API is used and evolves that recommends the API encoding or having a null default. However in a different environment or with different practices, you can avoid the null. Of course the reason to avoid the null is so that you can statically enforce this value is provided in new clients, though this also assumes your client language is typed. So in the end, protobuf teaches us, but it's not always the best in every situation.

      • hansvm 16 hours ago

        Hence the advice to make that situation not happen. Update the client and server to support both versions and prefer the new one, then update both to not support the old version. With load balancers and other real-world problems you might have to break that down into 4 coordinated steps.

        • Joker_vD 15 hours ago

          That only really works if you control the clients, or can force them to update.

          • LoganDark 14 hours ago

            > or can force them to update.

            I've used a few clients that completely lock me out for every tiniest minor version update. Very top-tier annoying imho.

            • eru 13 hours ago

              But it does make the authors' jobs easier.

  • MarkMarine 7 hours ago

    I don’t mind the zero value for the proto enums, it makes sense, but I require the conversion into my internal logic to exclude this “unknown”, and to error during the conversion if it fails.

    I’ve seen engineers carry those unknowns or unspecifieds through to the business logic, and that always made my face flush red with anger.

    • fmbb 6 hours ago

      Why the anger?

      If you are consuming data from some other system you have no power over what to require from users. You will have data points with unknown properties.

      Say you are tracking sign ups in some other system, and they collect the users’ browser in the process, and you want to see conversion rate per browser. If the browser could not be identified, you prefer it to say ”other” instead of ”unknown”?

      I think I prefer the protobuf best practices way: you have a 0 ”unknown”/”unset” value, and you enumerate the rest with a unique name (and number). The enum can be expanded in the future so your code must be prepared for unknown enumerated values tagged with the new (future for your code) number. They are all unique, you just don’t yet know the name of some of the enum values.

      You can choose to not consume them until your code is updated with a more recent schema. Or you can reconcile later, annotating with the name if you need it.

      Now personally, I would not pick an enum for any set of things that is not closed when you are designing. But I’m starting to think that such sets hardly exist in the real world. Humans redefine everything over time.

    • crabbone 23 minutes ago

      I wrote my own Protobuf implementation (well, with some changes). Ditching the default values was one of the changes I made. I don't see any reason to have that. But I don't think that Protobuf is a reasonable or even decent protocol in general. It has a lot of nonsense and bad planning. Having default values is probably not in the ten worst things about Protobuf.

  • beart 17 hours ago

    "Unspecified" is semantically different from "other". The former is more like a default value whereas the latter is actually "specified, but not one of these listed options".

    • hamandcheese 16 hours ago

      Standard practice in protobuf is to never assign semantic meaning to the default value. I think some linters enforce that enum 0 is named "unknown" which is actually more semantically correct than "other" or "unspecified".

NoboruWataya 17 hours ago

> Just document that the enumeration is open-ended, and programs should treat any unrecognized values as if they were “Other”.

Possibly just showing my lack of knowledge here but are open-ended enumerations a common thing? I always thought the whole point of an enum is that it is closed-ended?

  • sd9 17 hours ago

    I’ve worked on systems where the set of enum values was fixed at any particular point in time, but could change over time as business requirements changed.

    For instance, we had an enum that represented a sport that we supported. Initially we supported some sports (say FOOTBALL and ICE_HOCKEY), and over time we added support for other sports, so the enum had to be expanded.

    Unfortunately this always required the entire estate to be redeployed. Thankfully this didn’t happen often.

    At great expense, we eventually converted this and other enums to “open-ended” enums (essentially Strings with a bit more structure around them, so that you could operate on them as if they were “real” enums). This made upgrades significantly easier.

    Now, whether those things should have been enums in the first place is open for debate. But that decision had been made long before I joined the team.

    Another example is gender. Initially an enum might represent MALE, FEMALE, UNKNOWN. But over time you might decide you have need for other values: PREFER_NOT_TO_SAY, OTHER, etc.

  • hansvm 16 hours ago

    It's common when mixing many executables over time.

    I prefer to interpret those as an optional/nullable _closed_ enum (or, situationally, a parse error) if I have to switch on them and let ordinary language conventions guide my code rather than having to understand some sort of pseudo-null without language support.

    In something like A/B tests it's not uncommon to have something that's effectively runtime reflection on enum fields too. Your code has one or more enums of experiments you support. The UI for scaling up and down is aware of all of those. Those two executables have to be kept in sync somehow. A common solution is for the UI to treat everything as strings with weights attached and for the parsers/serializers in your application code to handle that via some scheme or another (usually handling it poorly when people scale up experiments that no longer exist in your code). The UI though is definitely open-ended as it interprets that enum data, and the only question is how it's represented internally.

  • furyofantares 12 hours ago

    This is not really the case mentioned (not API design), but I somewhat often have an enum that is likely to be added to, but rarely (lots of code will have been written in the meantime) and I would like to update all the sites using it, or at least review them. Typically it looks something like this:

        enum WidgetFlavor
        {
            Vanilla,
            Chocolate,
            Strawberry,
        
            NumWidgetFlavors
        };
    
    And then wherever I have switch(widgetFlavor), I include static_assert(NumWidgetFlavors == 3), so the assertion fires whenever a flavor is added. A bit jealous of Rust's exhaustive enums/matches.

  • int_19h 16 hours ago

    Both are valid depending on what you're modelling.

    As far as programming languages go, all enums are explicitly open-ended in C, C++, and C#, at least, because casting an integer (of the underlying type) to enum is a valid operation.

    • jay_kyburz 16 hours ago

      My pet hate is when folks start doing math on enums or assuming ranges of values within an enum have meaning.

      • DonHopkins 13 hours ago

        Like pesky Hex<=>Decimal conversion with the gap between the numbers and the letters, and upper/lower case letters too.

    • eru 13 hours ago

      Yeah, C, C++ (and C#) aren't very good at modelling data structures.

  • fweimer 15 hours ago

    Enumerations are open-ended in C and C++. They are just integer types with some extra support for defining constants (although later C++ versions give more control over the available operations).

  • gauge_field 14 hours ago

    One case where I made use of this is an enumeration of uarch types for different hardware, read from the host machine. The list is closed-ended until a new CPU with a new uarch appears, which takes a long time. So it is open-ended, but with very low velocity of change: ideal for an enum (for a very long time), but you still need to support changes to the list of variants without breaking semver.

  • XorNot 16 hours ago

    The first time you have to add a new schema value, you'll realise you needed "unknown" or similar - because during an upgrade your old systems need a way to deal with new values (or during a rollback you need to handle new entries in the database).

    • sitkack 14 hours ago

      Your comment is the only one in the entire discussion that mentions "schema". Having an "other" in a schema is a way to ensure you can run versions n and n+1 at the same time.

      It is data model design, of which API design is a subset.

      You can only ever avoid having an other if 1) your schema is fixed and 2) if it is total over the universe of values.

  • tbrownaw 16 hours ago

    Does a foreign key count as an enum type?

oytis 2 hours ago

Worth noting that in C and C++, an enum-typed variable holding a value not in the enum is UB. Had some funny bugs because of that.

jffhn 5 hours ago

>"programs should treat any unrecognized values as if they were “Other”"

Having such an "Other" value does not prevent you from considering the enum open-ended, and it greatly simplifies all the code that has to deal with potentially invalid or unknown values (no need for a validity flag or null).

That's probably why in the DIS (Distributed Interactive Simulation) standard, which defines many enums, they all start with OTHER, which has the value zero.
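
A minimal sketch of that convention (using the DIS-style Force Id values, where OTHER is zero): decoding collapses unrecognized wire values into OTHER, so downstream code needs no validity flag or null.

    #[derive(Clone, Copy, Debug, PartialEq)]
    enum ForceId { Other = 0, Friendly = 1, Opposing = 2, Neutral = 3 }

    fn decode(raw: u8) -> ForceId {
        match raw {
            1 => ForceId::Friendly,
            2 => ForceId::Opposing,
            3 => ForceId::Neutral,
            _ => ForceId::Other, // genuinely "other" and future values land here
        }
    }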

In STANAGs (NATO standards), the value zero is used for NO_STATEMENT, which can also be used when the actual value is in the enum but you can't or don't need to indicate it.

I remember an "architecture astronaut" who claimed that NO_STATEMENT was not a domain value, and removed it from all the enums in his application. That did not last long.

That also reminds me of Philippe Kahn (Borland) having, in some presentation, the ellipse extend the circle to add a radius. A scientist said he would do it the other way around, and Kahn replied: "This is exactly the difference between research and industry".

  • ivan_gammel 4 hours ago

    >That also reminds me of Philippe Kahn (Borland) having, in some presentation, the ellipse extend the circle to add a radius. A scientist said he would do it the other way around, and Kahn replied: "This is exactly the difference between research and industry".

    My favorite interview question on the OOP topic. It can be correct either way, or both can be wrong, so the good answer would be "it depends". When developers rush to give a specific answer, they do not demonstrate due attention to the domain, and it may mean they will assume a thousand other falsehoods from those articles on GitHub.

dataflow 13 hours ago

I think there are multiple concerns here, and they need to be analyzed separately -- they don't converge to the same solution:

- Naming: "Other" should probably be called "Unrecognized" in these situations. Then users understand that members may not be mutually exclusive.

- ABI: If you need ABI compatibility, the constraint you have is "don't change the meanings of values or members", which is somewhat stronger. The practical implication is that if you do need to have an Other value, its value should be something out of range of possible future values.

- Protocol updates: If you can atomically update all the places where the enum is used, then there's no inherent need to avoid Other values. Instead, you can use compile-time techniques (exhaustive switch statements, compiler warnings, temporarily removing the Other member, grep, clang-query, etc.) to find and update the usage sites at compile time. This requires being a little disciplined in how you use the enum during development, but it's doable.

- Distributed code: If you don't have control over all the code that might use your enum, then you must avoid an Other value, unless you can somehow ensure out-of-band that users have updated their code.

jasonkester 6 hours ago

This got me wondering what I actually do in practice. I think it's this:

  const KnownFlavors = {
    Vanilla: "Vanilla",
    Chocolate: "Chocolate",
    Strawberry: "Strawberry"
  }

Then, use a string to hold the actual value.

  doug.favoriteFlavor = KnownFlavors.Chocolate;
  cindy.favoriteFlavor = "Mint"

  case KnownFlavors.Chocolate:

Expand your list of known flavors whenever you like; your system will still always hold valid data. You get all the benefits of typo-proofing your code, switching on an enum, etc., without having to pile on any wackiness to fool your compiler or keep the data normalized.

It acknowledges the reality that a non-exhaustive enum isn’t really an enum. It’s just a list of things that people might type into that field.

  • Boldened15 5 hours ago

    Sorry I don't get the example, are both code blocks meant to be client-side code?

    > It acknowledges the reality that a non-exhaustive enum isn’t really an enum. It’s just a list of things that people might type into that field.

    I would say the opposite, the kinds of enums that map a case to a few hardcoded branches (SUCCESS, NETWORK_ERROR, API_ERROR) are often an approximation of algebraic data types which Rust implements as enums [0] but not most languages or data formats. Since often using those will require something like a `nullthrows($response->getNetworkError())` once you've matched the enum case.

    The kind of enum that's just a string whitelist, like flavors or colors, which you can freely pass around and store, likely converting it into a human-readable string or RGB values in one or two utils, is the classic kind of enum to me.

    [0] https://doc.rust-lang.org/std/keyword.enum.html

  • PartiallyTyped 4 hours ago

    The way we do this with the AWS SDK in Rust is by leveraging #[non_exhaustive] and matching an `other @ _` pattern. This is forward compatible, and allows us to do something like `other @ _ if other.name() == "foo"` for known cases without upgrading, either down the road or if the user uses an older version than our API.

layer8 15 hours ago

Slight counterpoint: Unless there is some guarantee that the respective enum type will never ever be extended with a new value, each and every case distinction on an enum value needs to consider the case of receiving an unexpected value (like Mint in the example). When case distinctions do adhere to that principle, then the problem described doesn’t arise.

On the other hand, if the above principle is adhered to as it should, then there is also little benefit in having an Other value. One minor conceivable benefit is that intermediate code can map unsupported values to Other in order to simplify logic in lower-level code. But I agree that it’s usually better to not have it.

A somewhat related topic that comes to mind is error codes. There is a common pattern, used for example by the HTTP status codes, where error codes are organized into categories by using different prefixes. For example in a five-digit error code scheme, the first three digits might indicate the category (e.g. 123 for “authentication errors”), and the remaining two digits represent a more specific error condition in that category. In that setup, the all-zeros code in each category represents a generic error for that category (i.e. 12300 would be “generic authentication error”).

When implementing code that detects a new error situation not covered by the existing specific error codes, the implementer has now the choice of either introducing a new error code (e.g. 12366 — this is analogous to adding a new enum value), which has to be documented and maybe its message text be localized, or else using the generic error code of the appropriate category.

In any case, when error-processing code receives an unknown — maybe newly assigned — error code, they can still map it according to the category. For example, if the above 12366 is unknown, it can be handled like 12300 (e.g. for the purpose of mapping it to a corresponding error message). This is quite similar to the case of having an Other enum value, but with a better justification.
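
A small sketch of that fallback (hypothetical codes): an unknown specific code degrades to the all-zeros generic code of its category before giving up entirely.

    fn known_message(code: u32) -> Option<&'static str> {
        match code {
            12300 => Some("generic authentication error"),
            12301 => Some("invalid credentials"),
            _ => None,
        }
    }

    fn message_for(code: u32) -> &'static str {
        known_message(code)
            // e.g. a newly assigned 12366 falls back to its category's 12300
            .or_else(|| known_message(code / 100 * 100))
            .unwrap_or("unknown error")
    }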

KPGv2 11 hours ago

> Rust has the "non_exhaustive" attribute that lets you declare that an enum might get more variants in the future.

Is there a reason, aside from documentation, that this is ever desirable? I rarely program in Rust, but why would this ever be useful in practice, outside of documentation? (Seems like code-as-documentation gone awry when your code is doing nothing but making a statement about future code possibilities)

  • LegionMammal978 11 hours ago

    Normally, when you match on the value of an enum, Rust forces you to either add a case for every possible variant, or add a default arm "_ => ..." that acts as a 'none of the above' case. This is called exhaustiveness checking [0].

    When you add #[non_exhaustive] to an enum, the compiler says to external users, "You're no longer allowed to just match every existing variant. You must always have a default 'none of the above' case when you're matching on this enum."

    This lets you add more variants in the future without breaking the API for existing users, since they all have a 'none of the above' case for the new variants to fall into.

    [0] https://doc.rust-lang.org/book/ch06-02-match.html#matches-ar...

  • jeroenhd 10 hours ago

    If your library processes data from another language, you'll probably need to deal with the possibility that the library returns open ended enums.

    I believe I've also seen this declaration for generated bindings for a JSON API that promises backwards compatibility for calls and basic functionality at least. Future versions may include more options, but the code will still compile fine against the older API.

    I don't think it's a great tool to use everywhere, but there are edge cases where Rust's demand for exhaustive matches conflicts with the non-Rust world, and that's where stuff like this becomes hard to avoid.

esafak 16 hours ago

Just add a free-form text field to hold the other value, and revise your enum as necessary, while migrating the data.

  • AceJohnny2 16 hours ago

    I can't even tell if you're trolling.

akamoonknight 15 hours ago

One of the tactics I end up using in Verilog, for better or worse, is to define enums with a '0 value (repeating 0s for the width of the variable) and a '1 value (repeating 1s for the width of the variable).

'0 stays "null"-like (e.g. INVALID), and '1 (which would be 0xFF in an 8-bit byte, for instance) becomes "something, but I'm not sure what" (e.g. UNKNOWN).

It definitely has the same issues as referenced when needing to grow the variable, and the times where it's useful aren't super common, but I do feel like the general concept of an unknown-but-not-invalid value can help with tracking down errors in processing chains. You definitely do need to "beware" with enums, though.

coin 17 hours ago

Just call it "unknown" or "unspecified" or better yet use an optional to hold the enum.

  • 101011 16 hours ago

    This ended up being the preferred pattern we moved into.

    If, like us, you were passing the object between two applications, the owning API would serialize the enum value as a String value, then we had a client helper method that would parse the string value into an Optional enum value.

    If the original service started transferring a new String value between services, it wouldn't break any downstream clients, because the clients would just end up with an empty Optional.
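
    In Rust terms, the client-side parse would look something like this sketch (hypothetical `Flavor` enum; the original was presumably Java's Optional):

      fn parse_flavor(s: &str) -> Option<Flavor> {
          match s {
              "VANILLA" => Some(Flavor::Vanilla),
              "CHOCOLATE" => Some(Flavor::Chocolate),
              // A brand-new value from an upgraded service shows up as
              // None instead of a deserialization error.
              _ => None,
          }
      }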

    • janci 16 hours ago

      How does that work when you need to distinguish between "no value provided" and "a value that is not in the list"? In some applications they have different semantics.

o11c 17 hours ago

The approach in the link is fine for consumers, but for producers you really do need some way of saying "create a value that's not one of the known values". Still, there's nothing that says this needs to be pretty.

moomin 4 hours ago

Also Microsoft: your enum should have an explicit Unknown entry with value 0.

sgondala_ycapp 14 hours ago

Random tidbit: we use an LLM to identify document types, using an enum to give it the list of options.

Initially, we didn’t include an "Other" category, which led the LLM to force-fit documents into existing types even when they didn’t belong. Obviously this wasn't the LLM's fault.

We realized the mistake and added "Other". This significantly improved output accuracy!

bob1029 15 hours ago

Making things into enums that shouldn't be enums is a fun trap to fall into. Much of the time what you really want is a complex type so that you can communicate these additional facts. In this case I'd do something like:

  class Widget 
  { 
    WidgetFlavor Flavor; //Undefined, Vanilla, Chocolate, Strawberry
    string? OtherFlavor;
  }
This is easy to work with from a consumer standpoint because if you have a deviant flavor to specify, you don't bother setting the Flavor member to anything at all. You just set OtherFlavor. Fewer moving pieces == less chance for bad times.

The first (default) member in an enum should generally be something approximating "Undefined". This also makes working with serializers and databases easier.

  • IshKebab 14 hours ago

    This is not a good design. You've introduced representable invalid states (Flavor=Vanilla, Other flavor="DarkChocolate").

    At the least you want this...

      enum Flavor {
        Chocolate,
        Banana,
        Strawberry,
        Other(String),
      }
    
    But that's not right either. What you really want is

      #[non_exhaustive]
      enum Flavor {
        Chocolate,
        Banana,
        Strawberry,
      }
    
      impl ToString for Flavor ...
  • msy 14 hours ago

    Until you get a

      Widget.OtherFlavor = 'Vanilla'
  • ryanschaefer 14 hours ago

    > Fewer moving pieces == less chance for bad times.

    Is this not a case for explicitly specifying all flavors? Other flavor has essentially introduced infinite moving pieces.

sylware 2 hours ago

As another example: Vulkan made the mistake of using enums in its API.

Now they must make sure each one is a signed 32-bit value on both 32-bit and 64-bit systems, i.e. check the compiler's behavior. If you check the code, they always add 0x7fffffff as the last enum value to "force" the compiler and to tell developers (who have enough experience) "hey, this is a signed 32-bit value"... whoopsie!

We should bite the bullet: remove the enums from Vulkan and use the appropriate primitive type for each platform ABI (not API...), so the fix should be transparent, as it would not break the ABI. But all the code generators using the Khronos XML specifications, and the static source code, would have to be modified in one shot to stay consistent. That is no small feat.

[NOTE: enum is one of those things which should be removed from the "legacy profile" of C (like tons of keywords, integer promotion, implicit casts, etc).]

hello12343214 11 hours ago

Good idea. I appreciate that he thought through future compatibility with old versions.

vadim_phystech 15 hours ago

...since the set of all possible unspecified behavior is much greater, and denser, than one would initially feel and assume, an "Other" type in an API can cause lots of possible bad outcomes and success-breaking points. "Other" is the first place to look for vulnerabilities, for attack vectors, because the spirit of UB the Terrible lurks there! The spirit of UB feeds upon the juices of the "Other" omnimorphic (fel) type! A foul, formless "ANY" type! Debauchery and disharmony! Decay and reducing heteromorphisms! Decomposition, descriptive semantic matrix rank reduction, richness degradation, devolution... impoverishment... scarcity pressure increase...

</shutting_the_fuck_up_my_wetware_machine_whispering_kek>

_3u10 15 hours ago

I usually use Unknown / Other as 0.

1oooqooq 16 hours ago

jr: add other option

sr: omit other option

illuminated: add other option in front end only and alert when the backend crashes.

DonHopkins 14 hours ago

https://en.wikipedia.org/wiki/Tony_Hoare#Research_and_career

>Speaking at a software conference in 2009, Tony Hoare apologized for inventing the null reference, his "Billion Dollar Mistake":

>"I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years." -Tony Hoare

Anders Hejlsberg brilliantly points out how JavaScript doubled the cost of that mistake:

>"My favorite is always the Billion-Dollar Mistake of having null in the language. And since JavaScript has both null and undefined, it's the Two-Billion-Dollar Mistake." -Anders Hejlsberg

>"It is by far the most problematic part of language design. And it's a single value that -- ha ha ha ha -- that if only that wasn't there, imagine all the problems we wouldn't have, right? If type systems were designed that way. And some type systems are, and some type systems are getting there, but boy, trying to retrofit that on top of a type system that has null in the first place is quite an undertaking." -Anders Hejlsberg

The JavaScript Equality Table shows how Brendan Eich simply doesn't understand equality for either data types or human beings and their right to freely choose who they love and marry:

https://dorey.github.io/JavaScript-Equality-Table/

Do any languages implement the full Rumsfeld Awareness–Understanding Matrix Agnoiology, quadrupling the cost?

Why stop at null, when you can have both null and undefined? Throw in unknown, and you've got a hat trick, a holy trinity of nihilistic ignorance, nothingness, and void! The Rumsfeld Awareness–Understanding Matrix Agnoiology breaks knowledge down into known knows, plus the three different types of unknowns:

https://en.wikipedia.org/wiki/There_are_unknown_unknowns

>"Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tends to be the difficult ones." -Donald Rumsfeld

1) Known knowns: These are the things we know that we know. They represent the clear, confirmed knowledge that can be easily communicated and utilized in decision-making.

2) Known unknowns: These are the things we know we do not know. This category acknowledges the presence of uncertainties or gaps in our knowledge that are recognized and can be specifically identified.

3) Unknown knowns: Things we are not aware of but do understand or know implicitly

4) Unknown unknowns: These are the things we do not know we do not know. This category represents unforeseen challenges and surprises, indicating a deeper level of ignorance where we are unaware of our lack of knowledge.

https://en.wikipedia.org/wiki/Agnoiology

>Agnoiology (from the Greek ἀγνοέω, meaning ignorance) is the theoretical study of the quality and conditions of ignorance, and in particular of what can truly be considered "unknowable" (as distinct from "unknown"). The term was coined by James Frederick Ferrier, in his Institutes of Metaphysic (1854), as a foil to the theory of knowledge, or epistemology.

I don't know if you know, but Microsoft COM hinges on the IUnknown interface. Microsoft COM's IUnknown interface takes the Rumsfeldian principle to heart: it doesn't assume what an object is but provides a structured way to query for knowledge (or interfaces). In a way, it models known unknowns, since a caller knows that an interface might exist but must explicitly ask if it does.

Then there's Schulz's Known Nothing Nesiology, representing the existential conclusion of all this: when knowledge itself is questioned, where does that leave us? Right back at JavaScript's Equality Table, which remains an unfathomable unknown unknown to Brendan Eich and his well known but knowingly ignorant War on Equality.

https://www.youtube.com/watch?v=HblPucwN-m0

Nescience vs. Ignorance (on semantics and moral accountability):

https://cognitive-liberty.online/nescience-vs-ignorance/

>From a psycholinguistic vantage point, the term “ignorance” and the term “nescience” have very different semantic connotations. The term ignorance is more generally more widely colloquially utilized than the term nescience and it is often wrongly used in contexts where the word nescience would be appropriate. “Ignorance” is associated with “the act of ignoring”. Per contrast, “nescience” means “to not know” (viz., Latin prefix ne = not, and the verb scire = “to know”; cf. the etymology of the word “science”/prescience).

>As Mark Passio points out, the important underlying question which can be derived from this semantic distinction pertains to whether our individual and global problems are caused by “ignorance” or “nescience”? That is, “ignoring” or “not knowing”? It seems clear that it is the later. We know about the truth but we actively ignore it for the most part. Currently people have all the necessary information available (literally at their fingertips). Ignoring the facts is a decision, an irrational decision, and people can be held accountable for this decision. Nescience, on the other hand, acquits from accountability (i.e., someone cannot be held accountable when he/she for not knowing something but for ignoring something). Quasi-Freudian suppression plays a pivotal role in this scenario. Suppression is very costly in energetic terms. The energy and effort which is used for suppression lacks elsewhere (cf. prefrontal executive control is based on limited cognitive resources). The suppression of truth through the act of active ignoring thus has negative implications on multiple levels – on the individual and the societal level, the cognitive and the political, the psychological and the physiological.

Brendan: While we can measure the economic consequences of your culpably ignorant mistakes of both bad programming language design and marriage inequality in billions of dollars, the emotional, social, and moral costs of the latter -- like diminished human dignity and the perpetuation of discrimination -- are, by their very nature, priceless.

Ultimately, these deeper impacts underscore that the fight for marriage equality, defending against the offensive uninvited invasion of your War on Equality into other people's marriages, is about much more than economics; it’s about ensuring fairness, respect, and equality for all members of society.

khana 4 hours ago

[dead]

hchja 16 hours ago

[dead]