branko_d a day ago

From the article:

> What you hope happens is that Postgres will recursively inline every view, merge them all together into a gigaquery, and shuffle the predicates up and down to maximally optimize the query. That was the conceit. And of course that never happens.

The author seems to be describing a Postgres-specific problem. Oracle, SQL Server (and probably other "enterprise" databases as well) are pretty good at inlining views and then optimizing the whole resulting gigaquery as if the views were not there in the first place.

> The central problem is that views have to be written for the general case, and then you filter on the view’s output. Views can’t take parameters.

SQL Server supports inline table-valued functions, which are exactly that - parametrized and inlinable views.

Unfortunately, they are not as composable as they could be. For example, a function returning a table and a function accepting the same type of table as input cannot be directly composed - you'd first need to store the first function's result in a temporary table or table variable. But despite the limitations, these functions are extremely useful for constructing complex queries that reuse logic.
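
For readers who haven't seen them, a minimal sketch of an inline TVF in T-SQL (table and column names invented):

  CREATE FUNCTION dbo.OrdersSince (@cutoff date)
  RETURNS TABLE
  AS RETURN
  (
      SELECT o.order_id, o.customer_id, o.total
      FROM dbo.orders AS o
      WHERE o.order_date >= @cutoff
  );
  -- The optimizer can expand the function inline into the calling query:
  SELECT * FROM dbo.OrdersSince('2024-01-01') WHERE total > 100;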

  • bob1029 a day ago

    Oracle, MSSQL, DB2, et al. are essentially like magic when it comes to making hopeless piles of SQL run quickly. This is because these engines have all ostensibly targeted the business people more than the tech people over the decades.

    Features like dynamically optimizing queries as they return results [0,1,2] aren't something you chase if your audience is willing to rewrite the universe at the first sign of incongruity.

    [0] https://learn.microsoft.com/en-us/sql/relational-databases/p...

    [1] https://www.ibm.com/docs/en/i/7.5?topic=overview-adaptive-qu...

    [2] https://apex.oracle.com/pls/apex/features/r/dbfeatures/featu...

  • SigmundA 18 hours ago

    I mentioned this down thread: MSSQL and Oracle cache query plans, which allows their optimizers to take their time, and I imagine that helps with doing more sophisticated optimization.

    PG always re-optimizes on every call unless you manually prepare the statement, and even that only lasts for the session. So PG's optimizer is designed to be very fast, not very thorough.

    • natmaka 10 hours ago

      > PG's optimizer is designed to be very fast not very thorough

      Some parameters weigh on this policy: geqo_threshold, geqo_effort, constraint_exclusion, from_collapse_limit, join_collapse_limit, jit_above_cost, enable_partition*...

      As heavy query optimization usually slows simple queries down, this is a matter of context.
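
      For example, one might let the planner search harder for a session before a heavy reporting query (values are illustrative):

          -- Consider more join orders exhaustively before falling back to GEQO
          SET join_collapse_limit = 16;
          SET from_collapse_limit = 16;
          SET geqo_threshold = 20;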

willvarfar a day ago

The article talks about how CTEs are efficiency minefields and how Postgres doesn't do a good job of pushing down predicates, for example.

TBH, the solution is to improve the Postgres engine in this regard?

For example, in Bigquery I often use CTEs and see predictable speedups that are explained by predicate pushdown when slapping a WHERE clause on the user of a CTE etc. Bigquery has plenty of performance gotchas but I don't think CTEs are one of them; they seem to be transparent to the optimiser.

Something else that Bigquery supports is user-defined 'table' functions. These are like the 'functors' the article creates.
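
A rough sketch of one (dataset and table names invented):

  CREATE OR REPLACE TABLE FUNCTION mydataset.visits_since(cutoff DATE) AS (
    SELECT *
    FROM mydataset.web_logs
    WHERE visit_date >= cutoff
  );
  -- then:
  SELECT * FROM mydataset.visits_since(DATE '2024-01-01');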

However, in practice they aren't very widely used, and I think this hints that they aren't as useful as the article is hoping for.

The thing I _really_ want is scoping. If only functions could be described like CTE WITH blocks, e.g.

   WITH visits AS (
      SELECT ...
      FROM web_logs
      WHERE ...
   ),
   foo(bar INT, baz STRING) AS (
      ... function body here
   )
   SELECT ..., foo(a, b), ...
   FROM visits
   WHERE ...
Apart from the tidiness of this, it allows inlining of functions into the SQL statement body rather than defining them as separate statements. Often various tooling, e.g. DBT, makes it really hard to work with separate function definitions, and being able to inline logic would make things possible.

Bigquery also supports user-defined aggregate functions. Although there is rarely the need to stray from the standard builtin aggregate functions, when you need them they are brilliant.

(Anyone working _on_ Bigquery reading? If so, please ping me, I have loads of feedback on what works well and where the user pain and toil is ;) )

  • zetalyrae a day ago

    This suffers from the "sufficiently smart compiler" problem. The query planner that can do what I mean with maximal efficiency is always just over the horizon. There's always yet another query that can't be optimized automatically.

    • LikesPwsh a day ago

      It's not a problem in MSSQL, so solving the fat head of problems is clearly possible.

      The escape hatch in MSSQL for the long tail is materialising an intermediate result to a temp table.
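
      Roughly like this (tables invented for illustration):

          -- Materialize the hairy intermediate step once...
          SELECT o.customer_id, SUM(o.total) AS total_spend
          INTO #spend
          FROM dbo.orders AS o
          GROUP BY o.customer_id;
          -- ...then the rest is planned against a small table with fresh statistics.
          SELECT c.name, s.total_spend
          FROM #spend AS s
          JOIN dbo.customers AS c ON c.customer_id = s.customer_id;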

      • SigmundA 18 hours ago

        One significant difference between PG and MSSQL is that MSSQL caches query plans, allowing its optimizer to be much more sophisticated and slower.

        PG re-optimizes every query over and over unless you manually do a prepared statement, and that only lasts for the session it's prepared on. Therefore it's very important that the optimizer not take much time before execution.
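
        For reference, the manual dance looks something like this (hypothetical table):

            -- Plan once for this session...
            PREPARE recent_orders (date) AS
                SELECT * FROM orders WHERE order_date >= $1;
            -- ...then reuse it; after several executions PG may switch to a cached generic plan.
            EXECUTE recent_orders('2024-01-01');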

        • nextaccountic 15 hours ago

          > PG re-optimizes every query over and over unless you manually do a prepared statement, and that only lasts for the session it's prepared on.

          But you really should be using at least a query builder that does the prepared statement thing under the hood, if only because writing dynamic queries with string concatenation sucks.

          • SigmundA 15 hours ago

            Parameterized sql is a good thing regardless of plan caching, but it also helps with systems that do plan caching.

            That the client must explicitly prepare the statement and the preparation is tied to the connection is pretty clunky and still means the statement must be replanned for each separate connection.

            Also, since the PG devs assume prepared statements are not the norm, they seem to avoid making the optimizer take its time, shooting for a fast optimizer rather than the fastest query.

            DBs like MSSQL and Oracle concern themselves much less with how long it takes to optimize, since repetitive queries will all use the same plan automatically. Plan caching also allows interesting features like saving and loading plans from different systems, and freezing plans so they don't all of a sudden go bad in production. And those systems also support hints, unlike PG.

yujzgzc a day ago

I work on a project that does tons of SQL based data transformations (thousands). A lot of it isn't backed by unit tests. Some of our people with more of a C++ background have got the idea that this is crazy, every bit of business logic should be encapsulated into a function just like in every other civilized language, so that it could be unit tested. They built a framework like this for SQL and implemented quite a few jobs with it.

The problem is most of the SQL-first devs hate it and would rather write untested code, or rely on alternative test automation techniques like bringing up a local DB instance with fake data, than have to refit their business logic into pure functions. The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.

  • fifilura a day ago

    I can definitely sympathise with the SQL developers here. (I have worked extensively both with SQL and C++).

    There is a trade off between unit tests and having control and overview of what you are writing.

    Also, very often it is much more about verifying that the input data is clean, and if that becomes more cumbersome, you may have lost the ability to debug the biggest source of errors.

    And of course sometimes it hits a nerve when the "real engineers" come and dictate exactly how you should do things. Reference also to Joel Spolsky's architecture astronauts.

    • NortySpock a day ago

      dbt v1.8 added a feature for unit-testing a view or CTAS statement (a natural extension of dbt, since it was designed to assist with moving and renaming the inputs to a view).

      I haven't tried it yet but I look forward to doing so. Already dbt macro expansions with tests have provided me way more test coverage than I ever had in the pre-dbt days.

      It's a footnote in the article, but it seemed like a natural half-step forward in the direction indicated by the article.

  • thyristan a day ago

    Pure functions in the context of databases are a rarity. A database is meant to persist data while upholding certain consistency, integrity and isolation guarantees. This basically means that most database-using SQL functions can never be pure, because having side effects is their whole purpose. What's more, integrity and isolation are basically untestable, especially with unit testing.

    So imho the DB people are totally right: to test any SQL, you need database tables with data in them.

  • svilen_dobrev a day ago

    A long time ago I had a similar need... I did not want to directly churn SQL for hundreds of small filters. So I wrote an expression parser and translator, wrote everything as simple Python functions, then translated those into SQL automatically. Both the funcs and the translator were heavily tested. Eventually that parser also grew a few other "renderers" - an evaluator, a documenter and what not.

    https://github.com/svilendobrev/svd_util/blob/master/expr.py

  • smartmic a day ago

    A lot of enterprise software tools that I know of are not tested as your C++ professional developers would expect. Especially when this software is not from the original software vendor, but is assembled/configured/enhanced by third party vendors or consultants. But of course even such software is "tested", albeit by the end user over a long period of use, unstructured, haphazardly, and in most cases unsatisfactorily. The problem is particularly acute if the industry is not deeply rooted in software development.

  • slt2021 15 hours ago

    SQL relies on the RDBMS's type system, just like C++ devs rely on C++'s OOP.

    The problem lies with lax/weak table design that allows too-lax data (all-string nullable fields, for example), instead of strict types like int, float and varchar plus foreign key constraints etc.
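
    To sketch the contrast (a made-up example):

        -- Lax: everything is a nullable string, so anything goes
        CREATE TABLE orders_lax (
            id text,
            amount text,
            customer text
        );
        -- Strict: the engine itself rejects bad rows (assumes a customers table)
        CREATE TABLE orders_strict (
            id bigint PRIMARY KEY,
            amount numeric(12,2) NOT NULL CHECK (amount >= 0),
            customer_id bigint NOT NULL REFERENCES customers (id)
        );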

    You don't need unit tests if your table design is clean, because SQL operators are deterministic and are tested by the RDBMS vendor.

    You just need integration tests, dry-run tests, and periodic data integrity checks at the system boundaries where data is ingested/egressed.

    SQL represents operations on sets of data and their relations, so you need not a unit test on a single row but a test suite on your entire dataset (a backup copy or synthetic copy).

  • cbsmith 17 hours ago

    > The truth is that it's just not a way of working that comes naturally to people who write a lot of SQL for a living.

    I don't know... Refactoring Databases is pretty old now. Plenty of DBAs can handle this style and even prefer it. I think this is more of a cultural problem in the small than in the large.

  • marcosdumay 21 hours ago

    Hum...

    You know that SQL is completely pure up to the point where you specify a data alteration command, right?

    Your framework is ditching queries, which are completely pure as enforced by the engine, and insists on using maybe-pure functions that all the engines have problems with.

geysersam a day ago

Similar functionality already exists in some SQL implementations, notably DuckDB:

  create table macro author_books() as   
  (
    select
        a.name as author,
        b.title,
    from
        a
    inner join
        b on b.author_id = a.author_id
  );
is the same as the example in the article and it is called like:

  with a as (...),
       b as (...)
       from author_books();

https://duckdb.org/docs/sql/statements/create_macro.html#tab...
  • _dark_matter_ a day ago

    I really dislike leading off with "create table". My brain is unable to read it except as DDL.

  • typ a day ago

    DuckDB macros don't support instantiating a table macro using the `query_table` function. So it's not possible to expand a table macro like `foo(bar(baz()))`, making their use limited.

    • chrisjc 16 hours ago

      Perhaps I don't fully understand what you're saying

          CREATE TABLE person (name VARCHAR, age BIGINT);
          INSERT INTO person VALUES ('Alice', 37), ('Ana', 35), ('Bob', 41), ('Bea', 25);
          CREATE OR REPLACE MACRO my_query_table(t) AS TABLE (SELECT * FROM query_table(t));
          SELECT * FROM my_query_table('person');
      
      Or do you mean that you can't use `query_table` with a macro identifier in place of the table identifier as input (after all, where would you put the macro args)? Perhaps using `query(...)` instead would work?

          CREATE OR REPLACE MACRO query_macro(m, i) AS TABLE (SELECT * FROM query(CONCAT('SELECT * FROM ', m::STRING, $$($$, i::STRING, $$)$$)));
          SELECT * FROM query_macro('my_query_table', 'person');
      
      Or maybe through some other kind of gymnastics https://duckdb.org/2024/03/01/sql-gymnastics.html
      • typ 8 hours ago

        Here is a contrived example:

            CREATE MACRO foo(bar, baz) AS TABLE
            WITH
                base AS (
                    FROM query_table(bar) t
                ),
                refined AS (
                    FROM base
                    JOIN query_table(baz) u ON base.id = u.id
                )
            SELECT * FROM refined;
        
        I want to test many user-defined combinations of (bar, baz). How can I pass a macro instance to query_table like

            FROM (
                (FROM foo(mybar1(some_arg1, some_arg2), mybaz101(500)))
                UNION ALL
                (FROM foo(mybar2(some_arg3, some_arg4), mybaz201(300)))
                UNION ALL
                ...
            )
            SELECT *;
        
        Your second example using the 'query_macro' looks like something I was looking for. But it doesn't seem to generalize to an arbitrary number of arguments.
        • geysersam 5 hours ago

            FROM (
                  (with
                    baz as (from baz101(500)),
                    bar as (from bar1(arg1, arg2))
                    from foo
                  )
                  UNION ALL
                  (with
                    baz as (from baz201(300)),
                    bar as (from bar2(arg3, arg4))
                    from foo
                  )
                  UNION ALL
                  ...
              )
              SELECT *;
          
          When you pass table arguments to a macro, don't pass them as a parameter to the macro, instead make a subquery and give it the name that `foo` expects.
          • typ 3 hours ago

            Not as terse as I imagined it could be, but I really didn't expect that it could be done this way. Thanks!

    • geysersam 12 hours ago

      It's possible. You just have to use CTEs:

         with arg_to_bar as (from baz()),
              arg_to_foo as (from bar())
         from foo()
      
      Where `arg_to_bar` is a table name used internally in `bar`.
      • typ 7 hours ago

        I clarified what I meant by an example in the other reply.

  • krembo a day ago

    Isn't this an abstraction of functions?

    • tankenmate a day ago

      It is an abstraction; whether it is an abstraction of functions is arguable.

      I once built a web app where we had a library of CTEs that we used to manage all the layers of indirection in the data. It was performant and made it much easier for users to search and manipulate the data, especially for ad hoc data analysis.

  • jeltz a day ago

    How do these differ from SQL functions in PostgreSQL? They look like the same thing.

    • geysersam 12 hours ago

      Yes, but they are transparent to the optimizer, so the argument against them in the blog article doesn't apply.
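
      For comparison, a rough PostgreSQL analogue of the macro above. The planner can inline a LANGUAGE sql function like this, but only under certain conditions (a single SELECT, not volatile, etc.):

          CREATE FUNCTION author_books()
          RETURNS TABLE (author text, title text)
          LANGUAGE sql STABLE
          AS $$
              SELECT a.name, b.title
              FROM a
              JOIN b ON b.author_id = a.author_id
          $$;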

  • n_plus_1_acc a day ago

    How does it differ from a view?

    • geysersam a day ago

      It can take arguments. I didn't use any though so it's not a good example

feoren a day ago

SQL cannot be "fixed". It's too broken, too ubiquitous, too customized, too optimized. Will these functors integrate with the thousands of optimizations modern database systems rely on?

This post has the format: "SQL is extremely terribly awful at writing business logic. But if people implemented this one half-baked, unrealistic idea, then it would suck a little bit less!" The first part is spot-on: SQL is a horrible language for writing business logic. So: stop. Stop doing that.

Stop writing business logic in SQL!

Imagine you joined a new team, and they said: our program uses a bunch of XML files as its data format, and that's why most of our business logic is implemented in XPath. Just sit there and ponder that situation for a little bit. This is what you all sound like to me. Stop it!

> Testing the simplest query requires building up a massive object graph, just to test an infinitesimal slice of it.

Yes, correct, because SQL is awful at representing business logic, and foreign key constraints are business logic! The fact that an employee has one supervisor who is also an employee (FK Employee.SupervisorID -> Employee.ID) is a business constraint. (And one that can easily change, by the way.) All these database engineers who can't pull themselves away from their stored procedures will argue with me to their graves about foreign key constraints, but they're generally terrible. They're a business constraint that just happens to be supported by the database engine, but they're still a business constraint. Stop putting your business logic in your persistence layer.

We've been getting better at building programming languages that actually support software engineering every year. Use one of those. SQL is an output of one of your program layers. Just like XPath would be an output of your program. Compose SQL in an actual real programming language that isn't horrible instead of doing any of this.

  • wvenable a day ago

    > Yes, correct, because SQL is awful at representing business logic, and foreign key constraints are business logic!

    This strikes me as an unnecessary all-or-nothing argument. I would agree that you shouldn't put all your business logic in the database, but that doesn't mean you shouldn't put any business logic in your database. If foreign key constraints are business logic then so are table definitions and column types. It's not awful for those things, as it's literally built right in.

    My advice is to ensure, as best as possible, that invalid states are impossible to represent. If you can't have an address without being attached to a client, make that impossible to represent in the database. It's one less thing you'll ever have to worry about.
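
    E.g., for the address example, a sketch with a hypothetical schema:

        CREATE TABLE clients (
            id bigint PRIMARY KEY,
            name text NOT NULL
        );
        -- An address cannot be represented without a client
        CREATE TABLE addresses (
            id bigint PRIMARY KEY,
            client_id bigint NOT NULL REFERENCES clients (id) ON DELETE CASCADE,
            line1 text NOT NULL
        );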

    • feoren a day ago

      > This strikes me as a unnecessary all-or-nothing argument

      I completely understand why it comes across that way. HN comment length is limited and the current practice is a comfy local optimum, so any one change away from it is going to feel bad, and it's hard to get across my full theory of database design in comment form. I would not recommend most existing teams simply stop using foreign keys, unless they also adopted a lot of other changes that coexist with it. Still, there is a better world for database design, and that world probably does not use foreign key constraints.

      > I would agree that you shouldn't put all your business logic in the database but that doesn't mean you shouldn't put any business logic in your database.

      I actually would (lightly) argue that it does mean you shouldn't put any business logic in your database, as I find that a highly coherent design philosophy leads to highly coherent code. The most important question to me is how the system can handle change over time. Foreign keys are one part of a larger set of "design principles" that everyone seems to take for granted and that lead to a system that cannot handle change.

      > If foreign key constraints are business logic then so are table definitions and column types.

      I disagree on this one. Table definitions and column types are a "level below" business logic, as evidenced by the fact that you could probably refactor a lot of schemas for better/different performance characteristics without changing any of the actual business logic that drives them. The business logic doesn't (shouldn't) care whether you've used a bitfield or a bunch of boolean columns or a child set-inclusion table to represent a bunch of yes/no facts about some entity. They're "persistence logic", and the database is exactly the right place for that.

      > My advice is to ensure, as best as possible, that invalid states are impossible to represent.

      The problem is that an "invalid state" is highly dependent on your perspective. A state that is invalid to use in one part of your codebase is perfectly valid in other parts. The ubiquity of "draft", "draft final", and "final" shows that humans want to represent "invalid states" all the time! Aren't you glad your IDE lets you represent invalid code? Okay, fine, that data is not ready for the annual submittal, but does that mean I'm really not allowed to save it at all?

      And these "invalid states" are extremely volatile as your focus expands around your domain. All contact info must have a primary contact method -- well, unless they're just "contact stubs". All facilities need a facility manager -- well, unless they're 3rd party facilities, or storage facilities, or in Region 9, or start with the letter 'P'.

      Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"

      But why? Why does it change everything? Why does every little change to our database schemas always seem to propagate to the entire system? It's because we're designing databases wrong. And foreign keys are a (small) part of that wrongness.

      • barnabee a day ago

        > The problem is that an "invalid state" is highly dependent on your perspective. A state that is invalid to use in one part of your codebase is perfectly valid in other parts. The ubiquity of "draft", "draft final", and "final" shows that humans want to represent "invalid states" all the time! Aren't you glad your IDE lets you represent invalid code? Okay, fine, that data is not ready for the annual submittal, but does that mean I'm really not allowed to save it at all?

        It's perfectly possible to define a schema that can represent draft or incomplete data, syntactically invalid code (as a VARCHAR/TEXT), etc. while still also enforcing that a complete and valid domain object must be available when the record is in a non-draft state.
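
        Something like this sketch, say (hypothetical table):

            CREATE TABLE documents (
                id bigint PRIMARY KEY,
                status text NOT NULL CHECK (status IN ('draft', 'final')),
                body text,  -- may be NULL while drafting
                CHECK (status = 'draft' OR body IS NOT NULL)
            );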

        > Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"

        Schema updates and migrations should not be this difficult in any reasonably designed system.

      • galbar 20 hours ago

        For whatever it is worth, I have reached the same conclusion and I have been building systems like you describe for the last few years.

        Recently I changed jobs. In the new team they love their ORM with all the foreign keys and direct mapping between business logic data objects and the database schema.

        Needless to say, it is a mess. Despite all the automated tooling they have around migrations that "should make it easy" to change things, the reality is that the whole application depends on what the database schemas look like and migrating the existing representation to a new one would break everything.

        This has become an elephant in the room that nobody talks about and everyone works around.

      • wvenable 18 hours ago

        > Haven't you been 300 hours into the development of a system when suddenly someone springs on you, "Oh yeah, Bob has two bosses. Why should that be a problem?" And you're screaming internally, "Of course it's a problem! You assured me six months ago that everyone only has one supervisor, and now you're telling me this one guy has two!? This changes EVERYTHING!"

        I have had this issue and it does change everything -- all the systems that depend on that data, the calculations involved, the user interface. The change in the database schema is actually relatively minor compared to the rest of the code. But with an enforced schema there's never a case where it will be wrong -- the code will either work all the way one way or the other, but never both at the same time.

        Neither of the examples provided has anything to do with foreign key relationships. If Bob can have two bosses, that's changing a column into a relation. That's going to have an effect beyond the database. If the facilities manager is optional, then that's changing a column to be nullable. Again, most of the work after that change is outside of the database.

        Even if drafts or temporary invalid states exist, I'm never going to want orphaned data. You said table definitions and column types are a "level below" business logic, but the table definition is directly tied to what you're trying to store and how you're storing it. I think I would simply argue that foreign key definitions are also part of that "level below". It might be that not all facilities need a facility manager, but if one is specified it had better exist, and you had better not be able to remove it without affecting the facility.

        I have a bug right now where in some race-condition situation the database is catching the issue with the constraints. I haven't found the bug yet (it's low priority) but at least it's not filling my database with invalid data -- it is just crashing that operation.

        • rangerelf 17 hours ago

          > I have a bug right now where in some race-condition situation the database is catching the issue with the constraints. I haven't found the bug yet (it's low priority) but at least it's not filling my database with invalid data -- it is just crashing that operation.

          This right here. The thing that all the proponents of NoSQL and "this one trick will turn SQL hell into SQL heaven" don't seem to take into account is that a properly designed schema will save your ass when the front end(s) have bugs. And yes, properly designing a schema is work, up-front work that needs to happen before writing the first line of code (or SQL).

          Thanks for posting this, at least I know I'm not the only one that feels this way.

  • liontwist a day ago

    What is business logic? What specifically is SQL bad at?

    I’m not sold by your foreign key example. Constraints are great. I guess you can do that in Python? Good luck getting ACID right.

    As soon as your data stops resembling a tree and starts looking like a graph, then all your “business logic” looks a lot like operations on relations and you are back to where you started.

  • dagss a day ago

    I agree that SQL has problems as a language.

    I disagree that we should be happy about moving data out of the storage layer into the backend in cases where it does not make sense performance-wise.

    The problem is that doing some processing in SQL has, in some cases, huge performance advantages over moving the data to the backend.

    If your business logic requires you to take a million rows from table A, apply some rule, and output a million rows into table B -- then piping all that data through a backend is a bad solution performance-wise. I reject the idea that all that extra IO (plus extra overhead for the database) is worth it just to use another programming language than SQL.
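
    That is, one set-based statement inside the engine (tables and "rule" invented for illustration):

        INSERT INTO table_b (id, amount)
        SELECT id, amount * 1.1  -- "the rule"
        FROM table_a
        WHERE status = 'active';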

    What we need is better alternatives that execute at the storage layer. Probably starting with proper languages that compile to SQL, like we saw with JavaScript.

  • magicalhippo a day ago

    Agree on not putting your business logic in stored procs, but a hard no on getting rid of all constraints.

    I consider it defense in depth. Sure your application should absolutely ensure order lines aren't referencing a non-existing order, but bugs happen and I prefer the database returning an error to the database getting full of inconsistent data.

  • branko_d a day ago

    I don't understand what you are trying to say about foreign keys. Are you advocating moving FKs from database to application?

    If yes, you are opening yourself up to race conditions that will only become visible in a concurrent environment. I had the distinct misfortune of working with an ERP system based on Oracle which didn't use database-level FKs. Predictably, their customers would occasionally complain about "dangling" data, which, to my knowledge, has never been properly resolved.

    • thyristan a day ago

      It is possible to move foreign keys and other constraints and consistency checks into the application. But that comes at a huge cost: Either you need to be generous with your use of transactions to ensure that no invalid state is ever written, and your application has to know and use those transactions. Which is something application developers in my experience often fail at. Or you have to accept that broken inconsistent data is a fact of life that can happen at any time, and will increase in times of high throughput and concurrency (where the business part of the company will care most about things running smoothly). Also, you will limit yourself to exactly one application using the database, because otherwise you will have to duplicate your constraint logic in any application precisely.

      There is also a cost to doing constraints and consistency in SQL: the language is somewhat limited in what you can easily represent (anything is possible, but only some things are easy). And the handling of violations still needs to be implemented in the application, which often necessitates at least some duplication and fiddling, e.g. to get friendly and useful error messages.

      So imho, yes, SQL has its problems, but it is still the lesser evil in almost all cases.

      • feoren a day ago

        > Or you have to accept that broken inconsistent data is a fact of life that can happen at any time

        You have to accept this anyway. Data can be broken and inconsistent in so many different ways that you cannot protect against every possible way. People will type the wrong thing in sometimes, in ways you never expected. Someone pasting text into the wrong field is just as "broken and inconsistent" as an ID field pointing to a nonexistent entity. How important those errors are to the business depends on how important those columns are, not whether they're an ID vs. text. And as another person pointed out, data always ends up split/replicated among multiple systems anyway.

        > Also, you will limit yourself to exactly one application using the database, because otherwise you will have to duplicate your constraint logic in any application precisely.

        Layer your architecture? Expose an API? Microservices? There are all sorts of ways around this problem.

        Yes, it makes it harder for people to just log into your database and run their own SQL queries. Sometimes that may be politically hard to sell. But I heard this same argument in college when private variables were introduced: wait, but, what if someone wants to directly manipulate private variables of class members? That is not a virtue. It's perfectly reasonable to assert that some internals are simply private implementation details. You are reserving your right to change those details in the future as the application changes. You are separating the details of your persistence layer from the public interface that you're exposing. That is a virtue.

        • vouwfietsman 16 hours ago

          I am not sure what your point is, as you have not yet described it fully. Do you mean to suggest that you just drop the C of ACID? What is there to gain, exactly, besides the freedom that is like the freedom of a blind man crossing the freeway?

          Or does the C just interfere with achieving the mellow developer nirvana?

    • threatofrain a day ago

      It's inevitable that some data will be split between more than one service / database. I'm not saying we should have data whimsically split up, but foreign key constraint logic must eventually come to live in the app layer too.

  • default-kramer a day ago

    > Compose SQL in an actual real programming language that isn't horrible instead of doing any of this.

    Got any recommendations? That's a serious question, I've tried many (and built my own) and some are okay but none feel remotely good enough when you consider how much progress we've made basically everywhere else in the wider world of software engineering. Relational databases are so good; they deserve better than SQL!

    • efromvt a day ago

      Shameless plug if you've tried a range of alternatives: I've tried building a version (Trilogy) targeted at analytics, and I'd be curious how it stacks up against your experience with other attempts at this - you can read more about the composition/reuse story here: https://trilogydata.dev/blog/composability_reuse.html

      • default-kramer 16 hours ago

        I looked at it when it hit HN a while ago. Looks nice, but not exactly what I want because

        > Joins and source tables are automatically resolved in Trilogy. You won't ever explicitly specify one in your query; you're declaring what you want, not how to get it. The responsibility of how to get it is delegated to the semantic model.

        I don't fully understand how the semantic model is created (is this documented anywhere?), but I don't think I would enjoy a hard separation between the query layer and the semantic layer regardless.

        I would prefer a continuum of "how much indirection do I really want" to be available to the user. My own exploration of this topic is https://docs.racket-lang.org/plisqin/index.html and you can compare section 2 (Using define-schema) to section 7.2 (Plisqin Desugared) if you want to know what I mean about a continuum.

        Unrelated, the article we are commenting on has inspired me such that I think I have an answer to the big type system questions that eluded me when I decided to put Plisqin on the shelf. Maybe time to pick it up again... but probably not.

    • HdS84 15 hours ago

      C# LINQ comes close.

      It basically allows you to express queries independently of the target data store. E.g. it's entirely possible for the same LINQ query to target PG, MSSQL Server, objects in local memory or RavenDB (NoSql). Quality of the generated query depends on the translating library ofc.

      Syntax is very nice and much more aligned with programming than SQL. E.g.: someData.Where(x => x.Id == "someId").OrderBy(x => x.LastModified).Take(10).ToListAsync()

      Note that the query is not executed until ToListAsync (or another materializing call) is invoked. This allows for composability.

      There is an alternative way of writing it that more closely resembles SQL, but I've rarely seen it used.

    • creamyhorror 18 hours ago

      https://kysely.dev/ (an SQL query builder for TypeScript) is good, although I'm not sure it lives up to your expectations. Instead of an ORM, it lets you build the underlying SQL queries efficiently.

    • feoren a day ago

      > Got any recommendations?

      Find a good ORM that allows you to keep things at the "set logic" level for as long as possible and has a good internal query representation. Avoid or turn off any fancier features of the ORM like "navigation properties" or "implicit subtyping" or automatically detecting changes or anything that happens "magically" in the background: prefer simplicity and being explicit about everything. For C#, this is EntityFramework with most of its features turned off -- compose queries by joining IQueryables together, and only drop down into in-memory iteration when you have to.

      If that's not available, I'd probably look for a library that presents a structured representation of the SQL abstract syntax tree (probably a subset) as an object, along with a translation to the various dialects. Pass those ASTs around and compose them together. I'd probably slowly build abstractions on top of this that eventually look similar to what I described in the first paragraph.

    • jeltz a day ago

      I have also tried many and so far none have even managed to be as good as the kludge that is SQL.

  • tacone a day ago

    > Imagine you joined a new team, and they said: our program uses a bunch of XML files as its data format, and that's why most of our business logic is implemented in XPath. Just sit there and ponder that situation for a little bit.

    Been there 20 years ago! It sucks at unimaginable levels!

  • dicytea a day ago

    I agree with most of this, but I don't understand why you'd treat FKs as business logic. IMO, it's clearly a part of the data structure. Like, say if you have a domain that looks like:

        type User = {
          username: string;
          posts: Post[];
        }
    
        type Post = {
          content: string;
        }
    
    (for the DDD-brained, assume User is an aggregate and Post is a value object here)

    The natural way to represent this in the database is:

        CREATE TABLE users (
            username text PRIMARY KEY
        );
    
        CREATE TABLE posts (
            author text NOT NULL REFERENCES users (username),
            content text NOT NULL
        );
    
    I just don't see why you'd do it in any other way.
    • feoren a day ago

      <BossVoice>

      Hey dicytea, thanks for getting that blogging app up. Our users love it. Hey, I was talking to John, and he said he's got some guest columns he wants you to throw up there. Just some blog posts from thinkers he likes that he thinks would be a good fit for our site. Can you throw up some of those?

      What do you mean they need to be users of the site? They're just some people whose posts we want to feature. No, they shouldn't need to log in. What are you talking about? What does that have to do with it?

      </BossVoice>

      Oops, turns out the rule "all blog post authors are users of the site" was actually just a volatile business rule after all.

      • lelanthran a day ago

        Yes, and isn't it wonderful that you get an error message when you try to change a business rule, forcing you to properly encode the new rule instead?

        A lot of these types of scenarios are missing the fact that, without these enforcements in the database, sooner or later a developer is going to make a change that violates an existing business rule without realising that they just broke a rule!

        Rules change. We know this. What is valuable is being told that some new rule conflicts with an existing rule.

        If you don't enforce the business rules in the database, how do you know when a new rule conflicts with some existing rule?

  • geysersam a day ago

    I think the fact that these functions/functors/macros already exist in a few different varieties in mainstream database systems presents a challenge to your argument.

    SQL can be fixed, has been fixed, and is an underutilized tool, in my personal opinion.

    • feoren a day ago

      > SQL can be fixed, has been fixed

      It has? Can you point me to an example?

      • geysersam 12 hours ago

        I meant in the sense that the additions suggested by the blog post author already exist in multiple sql implementations.

  • FridgeSeal a day ago

    I think you’re throwing the baby out with the bath water here a bit.

    I agree that stored procs are rarely, if ever, the solution.

    I do however think that having something that enforces certain constraints across all your data, and forces you to handle schema evolutions reasonably over time is really powerful. Database implementations are (mostly) full of hard-fought correctness decisions, and I don’t think most devs are in a position to “whip out” an implementation that cares for a number of these on-demand.

    We had the whole NoSQL thing; we saw what happened when we threw most of this stuff away.

    Is SQL good enough? For many problems, yeah. Is it _really_ good enough? Nah, probably not; we could do a lot better. But I think a better design looks more like a better SQL-like language and less like a full programming language.

  • halflife a day ago

    100% completely agree.

    You just can’t scale SQL code. If you have a small app, then sure. But then I don’t know why you’d invest so much in testing. But if you need to scale your code to more than 3 people, SQL should only be your persistent state store. You probably already have a BL layer in another, more manageable programming language, so use that.

    In my team I set a rule to remove all SQL functions and views, and move all the logic to our typescript backend. That resulted in code which is much easier to test, and is version controlled.

    • agluszak a day ago

      And what are the results of your decision in terms of performance?

      • halflife a day ago

        The DB is rarely our bottleneck. Our reads are very simple, mostly paged sets from a table with a 1-level join. When we need complex abstractions or transformations, instead of using functions or views, we listen to table changes, and with a TS transformation layer we save a new row in a specific table for the abstraction. That way the BL is handled in TS, and performed asynchronously in writes instead of inline in reads. The only downside is that the transformed data is not real time.

        That mechanism allowed us to scale our data very easily without any performance hits.

      • cyberax a day ago

        These days? It doesn't really matter. For most purposes, you can just hold the entire important dataset in RAM.

        For example, a humongous company with 500,000 employees (Amazon) can trivially hold, on a fairly low-end server, the database of all the personnel, all their salary payments, all their employment history, badge photos, and office access logs.

    • baq a day ago

      Yeah that's what people who don't have much data do with good results. If you can do that, it's the right choice.

      Once you have data you move code to data and not data to code. It isn't a decision you get to make. You only decide how you do it.

      • halflife a day ago

        I’m sorry, but we have a lot of data. We just implemented the right procedures to consume it

        • baq a day ago

          If you can do processing outside of the database, you don't have a lot of data; that's basically the definition of a lot of data. Don't be sorry, try not to have more data.

          • nejsjsjsbsb a day ago

            That's an odd rule of thumb. You're going to do some processing outside of the DB, e.g. transformations, views. Then you have the event streaming and aggregations world. Then different types of DBs. Distributed DBs, where you could do processing on each node in code, for example.

            • baq a day ago

              I think we agree. Transformations and views bring code to data (if you are willing to consider them code, which I am). Then when you say 'Distributed DBs where you could do processing on each node in code for example' it also sounds like exactly what I have in mind - code brought to data, not data to code.

              • nejsjsjsbsb 11 hours ago

                Ah, I see - you mean network-wise. I was thinking domain-wise.

  • gregwebs a day ago

    It seems like you are operating in an environment where a single code library can implement business logic and mediate all DB write access?

    Can someone update the db directly with another programming language or do they have to go through an API?

    The value of business logic in the DB is that all clients must obey it.

  • zetalyrae 14 hours ago

    If I'm at the point of dropping foreign keys, why not just switch to DynamoDB?

  • hobs a day ago

    I don't know why you are getting downvoted, and I generally don't like a lot of business logic in the persistence layer, but representing the things you absolutely don't want another part of your app doing via FKs is... pretty normal.

    If you are arguing it spreads your business logic into multiple places, that's fair, but so do... a lot of things. Databases with decent FKs and constraints have a tendency to outlive the apps built atop them, and communicating these constraints in long-lived fashion can be... good, actually.

    • feoren a day ago

      > representing the things you absolutely dont want another part of your app doing via FKs is...pretty normal

      "Normal" has never been compelling in a world where most software sucks and most projects overrun their budgets.

      > If you are arguing it spreads out your business logic into multiple places, that's fair, but so does... a lot of things

      Stop doing those things too!

      > Databases with decent fks and constraints have a tendency to outlive the apps that build atop them

      This is oft repeated, but I think there are some flaws with it:

      1. I'd ask the question: is that because those apps sucked? It's possible to have a mediocre database and then an even worse app on top of it; in that case it wouldn't be surprising if the app itself died first.

      2. Did the next app use the existing schema as-is, or did it adapt/migrate the schema to a different form that worked for that next app? There's a big difference between saying "this data turned out to be useful for other purposes" (of course!) and "this data representation turned out to be so perfect and elegant that we kept using it"

      3. Do you have an overly narrow definition of "apps" here? I'd argue if six teams are all doing their own things with the database -- some writing code, some building views, some hooking into it with Excel and PowerBI -- then I'm not overly interested in the fate of the "OG app" that the database was built for. If that one dies, you still have five other "apps" using that database. We're not talking about a schema sticking around because it's so useful, we're talking about an entrenched, locked-in tech that nobody can change because it's too costly. That's a predictable symptom of a bad representation! That's exactly what happens with bad code! Good systems can handle change in isolated parts without breaking things for everybody else. Database schemas are so entangled with themselves, and SQL is so bereft of proper abstraction, that they can't change easily once they have many users.

      • hobs 21 hours ago

        To your questions:

        1. No, it's because they were so useful they expanded beyond their original containers.

        2. The cluster of apps that then lived on top of the schema (the four times I have worked on versions of this) mostly just extended things occasionally when they needed to, but oddly mostly did not, and just read and wrote to the same schema; this required more coordination and slower development release cycles, but that was not really a problem for these companies.

        3. Somewhat fair, but the real world and real shippable software (which you seem interested in) has a lot of shared concerns, a lot of trouble when you ETL or use APIs for everything instead of just... here's the data. Short term that seems entirely reasonable instead of creating contracts everywhere and spending the time and effort understanding more and more of the changes introduced into smartly decoupled systems.

        In a nutshell you're right: SQL makes it easy to mix a bunch of concerns and ship them and let them grow organically. Which I would argue almost any software structure does, ESPECIALLY with data, except now you don't even get to know where all the bodies are buried.

        Edit: for instance, I have worked on/converted many useful databases that were created in the 1970s, before I was born. Can you imagine a stateful application that you could easily understand and utilize all the data that's been ingested on for 50+ years?

        • feoren 17 hours ago

          > I have worked on/converted many useful databases that were created in the 1970s, before I was born

          "Converted" makes it sound like you found a database with lots of useful data and then developed a different schema for it that was useful for your own needs. This is kinda my point: when people say the database outlives its application, they mean the data itself. That's somewhat irrelevant to the discussion of how to properly architect database schemas.

          Did you find relational database schemas from the 1970s that made you go "ahh, this is perfect for what I need!"?

          > Can you imagine a stateful application that you could easily understand and utilize all the data that's been ingested on for 50+ years?

          I don't think I understand this question. Yes, I can certainly imagine an application that remains understandable, relevant, and maintainable for 50+ years. I hate the current expectation that apps and code have shelf-lives in the single-digit years. I hate the term "legacy code". These are immature attitudes from the "move fast and break things" eternal startup crowd whose only goal is a profitable exit. That is not "engineering", it's apathetic planned obsolescence, and everyone who participates in planned obsolescence should be ashamed of themselves. Code that could reasonably last 50+ years should be your default if you call yourself a "software engineer". Otherwise your title should be "YOLO coder-bro".

          But isn't that the opposite of letting every random team have unfettered access to the internal private details of a database and entrenching every little design decision under a mountain of unmaintainable SQL? "Growing organically" sounds great until you realize you're just describing cancer. It took life trillions of tries before "growing organically" ended up with something half-passing as kinda-elegant if you squint. Do you want bridge engineers to let the design of a suspension bridge "grow organically"? Will that bridge last 50+ years?

sixdimensional 20 hours ago

I found the argument that it’s impossible to test SQL (and some other hard statements in this article) a bit too absolute.

Testing SQL is not impossible – SQL unit testing is a thing. Identify a unit of SQL, such as a query, function, or stored procedure; mock the necessary inputs and any state that is modified; and check the result of the unit just as you would in any unit testing framework (e.g. arrange, act, assert). Perhaps it's a little difficult, or different to wrap our heads around (and many do not take the time to see how this can work), but it's not impossible.
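
A minimal arrange/act/assert sketch in plain SQL, using a transaction you can roll back (table and assertion invented):

  BEGIN;
  -- Arrange: seed known rows
  CREATE TEMP TABLE orders (id int, total numeric);
  INSERT INTO orders VALUES (1, 50), (2, 150);
  -- Act + assert: run the unit under test (here, a query) and check its result
  SELECT CASE WHEN (SELECT count(*) FROM orders WHERE total > 100) = 1
              THEN 'pass' ELSE 'fail' END AS big_orders_test;
  ROLLBACK;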

Also, as others have already pointed out, depending on the implementation of the SQL engine itself, different engines can have very different behaviors, some of which make composable SQL possible already today (e.g. table valued functions, CTEs that are inlined, stored procedures, etc).

And for what it’s worth, if you look at a lot of the data processing engines we use today (such as Apache Spark), they really solve the problem of being able to compose and modularize code and SQL together, through data frames or things like language integrated query, integrations of languages that compile down to code that can run and be orchestrated on common execution engines, and so forth.

throwaway201606 a day ago

I understand the primary premise about the difficulty with testing SQL and fully agree with it.

I do have a question though - while I understand how functors can help make the problem easier to tackle, I am not sure I fully understand how functors are different from a similar existing tool - stored procedures.

Some DB flavors:

- can take tables as arguments to stored procedures

- can return tables

- also offer the additional benefit of being able to run almost all flavors of SQL commands (DDL, DML, DQL, DCL, TCL) in those stored procedures

Netezza stored procedures, for example, can do what you describe here:

https://www.ibm.com/docs/en/netezza?topic=nsg-return-result-...

As can SQL Server & Oracle (which both return cursors, which are just ordered tables):

https://learn.microsoft.com/en-us/sql/relational-databases/s...

https://docs.oracle.com/cd/B28359_01/appdev.111/b28843/tdddg...

Am I missing something fundamental here? How are functors different from stored procedures? To me, they seem to be just a sub-class / sub-type of stored procedures.

  • default-kramer a day ago

    The goal is that the composable parts get woven into an efficient, planner-friendly query. Stored procedures completely undermine that unless something very exciting has happened since last I checked (SQL Server, but probably applies to all of them). You will likely end up in a "row by agonizing row" situation.

    (Okay, maybe you can find a few cases where stored procs work decently, but in general both composability and performance will be much worse than the proposed functors.)

    • throwaway201606 a day ago

      OK, this I understand, that is a good insight - cursors are row-processing based so it's gonna be slow.

      I think Netezza, SQL Server and Oracle all use cursor-based processing "by default", so this makes a lot of sense. I suspect they all have bulk operation capability, but I can't immediately think of how I would have worked bulk processing in a way that maps to this article - maybe something like analytic functions (windowing, partitioning etc.), which are definitely not row by row.

      Having said that, the examples I see for actual testing in the article are DQL/DML, so they would be multiple-row processing by default. Yes, the functor definition/creation is a DDL process, but it is a "do once and reuse the definition" thing (like, as the author correctly observes, a view, which is the point of functors), and the functor in use would just be DML. In which case, functors go back to looking like stored procedures...

      I also understood composability as being built into SQL already - for example, in Oracle, packages allow composability of stored procedures; triggers, sequences etc. allow composability of DML; and views allow composability of queries and tables - which the author points out in the article.

      With functors, DDL, DML, DQL, DCL and TCL would still be the only command tools available unless a new command language was invented for SQL testing - let's call that something like DTL (Data Test Language), with a whole new bunch of associated SQL keywords, capabilities and functionality built right into the core of the DB engine and optimized for what functors are trying to achieve.

      • gregw2 a day ago

        Regarding "can't immediately think of how I would have worked bulk processing in a way that maps to this article" - think of how to map composability...

        I believe stored procedures where you construct dynamic SQL and execute the result can basically provide the composability/performance described, with bulk non-row-based logic. If you keep it simple it can work OK.
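
        A sketch of what I mean in T-SQL (names invented):

            DECLARE @sql nvarchar(max) = N'
                SELECT customer_id, SUM(total) AS total_spend
                FROM dbo.orders
                WHERE order_date >= @cutoff
                GROUP BY customer_id;';
            -- One set-based statement, composed dynamically, planned as a whole
            EXEC sp_executesql @sql, N'@cutoff date', @cutoff = '2024-01-01';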

  • andyferris a day ago

    They seem somewhat like stored procedures, but not stored? As in a query can contain a functor in it and then use it immediately. I didn't see those `create functor` statements as anything other than ephemeral - or am I wrong?

    EDIT: also stored procs that use imperative logic and cursors can be quite a bit slower than queries that achieve the same logic - the capability here is purposefully a subset of that and is just to help build standard SQL queries that can go through the standard query planner.

    • throwaway201606 a day ago

      I think they have to be long-lived, else they cannot make sense for performant testing. I.e. they are created as DB objects, using DDL, in the same way tables, views, functions etc. are made.

      They can certainly be created at test run time, but that would slow things down a lot - you would essentially be creating a ton of objects every time you run the test, which means having a setup that tests whether they exist, takes them down if they do, or fixes them if they don't match spec (e.g. column and data type changes etc.)

      The more I think about this, the more complicated I realize it would be to manage this dynamically:

      You essentially have to build a test harness environment that figures out your testing elements dynamically from your data environment (with some kind of parameterization engine and data set to tell it what to look for so as to "make functors and run them" - e.g. all PKs or FKs, all columns starting with a certain prefix, or all columns of a certain data type), gets the most up-to-date definitions of those elements from system tables, and uses that data to create or update or drop functor objects... wow, OK, this is getting complicated, I am going to stop now before I see the void.

agentultra a day ago

Ah, but "functor" is a name from category theory and algebraic topology. I had read along thinking that this would come up in the conclusion! Instead it seems it's not even mentioned in the sources for the name.

There’s potentially something like this in the “esqueleto” package for Haskell. But it has some limitations. It composes on syntactic fragments.

Although speaking of functors, the category theory ones, maybe categorical databases would fit the bill: https://categoricaldata.net/

RyanHamilton a day ago

He got so close to the right answer but went the wrong direction. "Imagine a programming language without functions." Imagine SQL was instead based on an actual programming language, with variables and functions. That would solve all the problems you mention. Kdb+ and Dyalog already knew this 20+ years ago. I wish someone would recreate this in an open source library. Now, with the Arrow in-memory format, the best hope may be that all languages will be able to act on shared-memory data frames, bringing tables to every language within the database.

  • zetalyrae 14 hours ago

    > Imagine SQL was instead based on an actual programming language, with variables and functions.

    This is what the entire article is about. That paragraph is meant to illustrate the problem of SQL through an analogy.

naiquevin 17 hours ago

I've implemented a CLI tool[1] that tries to address similar problems with SQL as mentioned in the article: testing, business logic and re-usability. The approach is different though - you write SQL and pgTAP tests using jinja templates and the tool generates

1. SQL queries that can be used by the application code using libs such as yesql, hugsql etc. and,

2. pgTAP tests, ensuring that they test the exact same SQL queries that are actually run by your application

I haven't used it in a /real/ production setting though, so I'd still call it an experiment. But I have been using it in a personal project with a handful of users (friends and family). Most of the business logic is implemented in SQL so the queries are fairly intricate (CTEs, sub queries, multiple joins). Occasionally, I push some features/improvements/tweaks and I am able to do so with high confidence, which I believe is mainly thanks to pgTAP tests. I am quite happy with it so far although that's just my experience. Haven't tried it on a large project with many contributors.

[1]: https://github.com/naiquevin/tapestry
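
For context, a pgTAP test is itself just SQL, which is what makes generating both the query and its test from one template workable. A minimal hand-written example (the table and expected count are made up):

  BEGIN;
  SELECT plan(1);
  SELECT results_eq(
      'SELECT count(*)::int FROM active_users',
      ARRAY[3],
      'active_users returns the three seeded rows'
  );
  SELECT * FROM finish();
  ROLLBACK;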

yujzgzc a day ago

There are a lot more composability problems with SQL.

As an example, can you write a function that you can invoke on a column in a select statement and will compute the cumulative distribution function of that column?

The expression to do this is enormous "SUM(column) OVER (ORDER BY column ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / SUM(column) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)" and there is no choice but to inline it. I think it would be reasonable to expect users to be able to describe this and then write simply "CDF(column)". The only systems that do this I believe are text processing macros on top of SQL...
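
Formatted out, the gap is stark. A sketch with made-up names (payments, amount) - the second form is hypothetical, since no mainstream engine lets users define such a macro in SQL itself:

  -- What you must write today, inlined at every use site:
  SELECT amount,
         SUM(amount) OVER (ORDER BY amount
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
         / SUM(amount) OVER (ROWS BETWEEN UNBOUNDED PRECEDING
                             AND UNBOUNDED FOLLOWING) AS cdf
  FROM payments;

  -- What a reusable, community-shareable definition would permit:
  SELECT amount, CDF(amount) FROM payments;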

  • reshlo a day ago

    `CUME_DIST() OVER (ORDER BY column)`

    https://modern-sql.com/caniuse/cume_dist

    • yujzgzc 19 hours ago

      My point is that in most other languages, this kind of functionality can easily be factored into libraries available to the community. SQL doesn't have the composability, so every single one of these features has to be added directly into the language. Libraries and package tooling are practically non-existent in SQL land.

r1b a day ago

Nodding along furiously. One area where this comes up is when you want to export some piece of logic that effectively “returns a table”.

For example, I work with a source of data that requires a particularly hairy JOIN strategy. The logic is the same for any table with the same “shape”, but no one can reuse it without copy / paste.

  • virtualwhys a day ago

    See Esqueleto in Haskell, Slick and Quill in Scala, probably LINQ or some variant on .NET in C#/F#.

    All support building up arbitrarily complex queries based on statically typed query fragments.

    String fragment "composition" is like banging rocks together.

noisy_boy a day ago

Stop trying to fix SQL. Stop trying to make it do things that belong with a full fledged unit-testable language. It is mainly a way to view, filter and aggregate data - nothing more. Having dealt with untestable thousand-line stored procedures (each of which is a copy of a similar stored procedure with subtle variations), just don't put yourself in a world of pain by doing overcomplicated things at the database end.

And when you want to query data, use a view so that a) you don't need to deal with the idiosyncrasies of your ORM deciding to do weird/suboptimal joins, b) if tomorrow you have to make database-side changes or optimize your view, all you need to care about is keeping the select clause the same, and c) you can just query/profile the view and be assured that what you see is what your application will see too (in general). But don't keep nesting them like mad and expect the database to do magic.
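
A small sketch of point (b), with invented names - the view body can be rewritten or optimized freely as long as the select clause it exposes stays the same:

  CREATE VIEW active_customers AS
  SELECT c.id, c.name, c.signup_date
  FROM customers c
  WHERE c.status = 'active';

  -- The body behind this definition can change; callers keep issuing:
  SELECT id, name, signup_date FROM active_customers;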

  • asqueella a day ago

    There's a need to do complex data processing (not sure if it falls under your "view, filter and aggregate data - nothing more"), and it's an open question whether "full fledged" languages (presumably with an API, like pandas, ibis or spark), improved SQL, or another language focused on data transformation will be a better fit for this task.

  • paulddraper a day ago

    > It is mainly a way to view, filter and aggregate data

    It looks like it's viewing, filtering and aggregating data to me.

    • noisy_boy 19 hours ago

      It is basically describing a wishlist where you can pass arbitrary sql as the source for your "main" sql, via valid sql itself, without resorting to dynamic sql generation - presumably with support for type checks etc. Basically a level above a custom sql generator.

      Notwithstanding the fact that no database supports the described approach, this combined with the flexibility of sql itself could give rise to many convoluted setups. Someone builds arbitrary nesting on top of this or does complex conditional constructs - suddenly you have doubts about what the actual sql produced/executed will be. It is not like you can write a unit test and attach a debugger to the sql to see what's going on within the nested levels. Redirecting to temporary tables and print-based debugging are only available for procedural sql, which is not the target here.

      What I meant by "mainly a way to view, filter and aggregate data" is that we should keep those core actions as simple as possible. Move the complexity to the application logic written in whatever language - then at least you'll be able to test/troubleshoot it much better and gain from the existing mature approaches/tooling to do such things. Support for such things is grossly underdeveloped for sql, and it gets hairy real fast when it comes to building layers with sql.

iforgot22 a day ago

I prefer to keep the SQL itself as simple as possible. I've gotten by OK by composing SQL in my outer code as needed. Despite relying crazy heavily on SQL, that's rarely needed. And the outer code has tests.

It'd just be nice if Postgres could loosen the syntax in a few annoying places, like allowing a trailing comma in "WITH q1 AS (...), q2 AS (...)". Or if something could be done to avoid needing the "WHERE true AND" trick.
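
For readers who haven't seen it, the trick looks like this (predicates invented): anchoring on a constant lets the generating code append every optional condition in the same "AND ..." shape, even when none or all of them apply.

  SELECT * FROM orders
  WHERE true
    AND status = 'open'   -- appended only when the caller filters on status
    AND total > 100;      -- appended only when the caller filters on total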

default-kramer a day ago

Very nice. Adding this one to my list of "things other people wrote that nicely elucidate a shortcoming of SQL".

> Why would this be useful? Because SQL tables are global variables. By vanishing global variables, we automatically make every query fully testable.

And even if you don't care about testability, certainly you can appreciate that global variables are bad for composability.

maxulite a day ago

> Imagine a programming language without functions... This would be tiresome. But that's what SQL is

SQL limitations encourage creative solutions. I once implemented a simple "functional" framework as a virtual module in SQLite.

Some background. SQLite's virtual tables were always "functional". To make a virtual table act as a function, you basically require that a particular constraint from the WHERE clause is present; otherwise you report a failure. Having this, you may create for example a table of all primes, but with two columns: the prime value (obviously) and the count, where the count column is mandatory (so the query showing the first 10 primes is 'select value from primes where count=10'). Later SQLite added table-valued syntax and hidden columns, which allow the syntax 'select value from primes(10)'. But what finally made me implement a functional framework were eponymous virtual tables. Usually a virtual table must be present in the schema, but eponymous tables don't require this and exist just by virtue of the corresponding module being registered.

So the framework uses all this machinery, and a simple table is used for storing "functions". The table has two columns: query name and query body. The body may contain numbered SQLite parameters (?1, ?2, ?3). Each such row will automatically create a virtual table with the name from the query-name column, while all numbered parameters in the body are converted into "input" parameters of this table (no need to parse manually; SQLite takes care of this thanks to its API). Sure, the bodies can use other queries from this library; SQLite handles this nicely.
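
A rough sketch of what the storage table and its use might look like (names invented; the C-side module that registers each row as an eponymous virtual table is omitted):

  CREATE TABLE stored_queries (
      name TEXT PRIMARY KEY,   -- becomes the eponymous virtual table's name
      body TEXT NOT NULL       -- may reference ?1, ?2, ... as inputs
  );

  INSERT INTO stored_queries VALUES
      ('first_primes', 'SELECT value FROM primes WHERE count = ?1');

  -- Once the module has registered the row, callers can write:
  SELECT value FROM first_primes(10);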

alexandrehtrb 18 hours ago

SQL Server has table-valued parameters; they are close to the functors described in the article, with the difference that TVPs define the whole table structure instead of a subset of columns, as proposed. I liked the functors idea, to be honest - I've always wondered why this doesn't exist in SQL.

https://learn.microsoft.com/en-us/sql/relational-databases/t...
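
A minimal T-SQL sketch of the feature (type and function names invented):

  -- The parameter's full structure must be declared up front as a type:
  CREATE TYPE BookList AS TABLE (title nvarchar(200));
  GO
  CREATE FUNCTION dbo.CountTitles (@books BookList READONLY)
  RETURNS TABLE
  AS RETURN (SELECT COUNT(*) AS n FROM @books);
  GO
  DECLARE @b BookList;
  INSERT INTO @b VALUES (N'Dune'), (N'Hyperion');
  SELECT * FROM dbo.CountTitles(@b);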

alex_duf a day ago

The one thing I do not like about this approach is that the functor is being created as a part of the schema.

In other words, the schema carries the logic as well as the data. And my experience with mixing data and logic is that it's a bad idea, especially if your team size > 1. You always end up having maintenance difficulties due to the fact that your app has part of the logic, and now your DB does too. So where's the bug? In the app? In the DB? Usually it's a mixture of the two. Is it possible to do it properly? Yes, probably, but I wouldn't recommend it.

So counter-proposal: we use functors, but they are just another way of executing SQL requests. And at this point there's really no difference from creating the table without the foreign key checks.

dgan a day ago

I was thinking about this exact problem, one year ago, and even started to sketch out what a better "meta-sql" would look like. However, i did lose myself in the weeds.

Notably :

- the metasql has to be embedded in the host language, the way normal sql is. This means it also has to handle in/out-bound parameters from the host. I guess this could be handled by a codegen step, where the functions of the host language are generated based on the metasql. The alternative is to provide a runtime for each host language... so that option is out

- i wanted a way to compose "column sets". This is because you might want, for example, to indicate that the query returns "all the columns in this table, minus this and that", or "the union of this column set with that column set". This feature looks like adding another meta-language on top of the metasql already lol. I stopped at that point because i couldn't find a graceful/nice-looking solution, let alone implement it. (Some engines do ship a fragment of this - see the sketch below.)
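
For what it's worth, a couple of engines already ship the "minus this and that" fragment; the union/composition of column sets remains unsolved (table and column names invented):

  -- BigQuery: all of t's columns except two
  SELECT * EXCEPT (created_at, updated_at) FROM t;

  -- DuckDB spelling of the same idea
  SELECT * EXCLUDE (created_at, updated_at) FROM t;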

  • mike_hearn a day ago

    Sounds a bit like the Active Record/Repository pattern? The "meta sql" in that case is a natural language DSL, e.g.

        @Repository interface Employees : CrudRepository<Employee, Integer> {
            fun findBySalaryGreaterThanForUpdate(salary: Int): List<Employee>
    
            data class NameAndSalary(val name: String, val salary: Int)
            fun findByDepartment(department: String): List<NameAndSalary>
        }
    
    The compiler reads the method name and writes the equivalent SQL query at compile time. The custom data class acts as a projection. This sort of thing can be found in Micronaut Data, or Spring or similar.

  • gregw2 a day ago

    I implemented a data-access permissions layer in SQL stored procs, with set-based logic, allow/deny and include/exclude functionality, and some 'security rules composability'.

    Implementing it definitely took some of the hardest thinking at that job. But it's possible.

brikym a day ago

This is what Kusto Query Language looks like:

  StormEvents
  | summarize 
      TotalStormsInState = count(),
      StormsWithCropDamage = countif(DamageCrops > 0)
      by State
  | extend PercentWithCropDamage = 
      round((todouble(StormsWithCropDamage) / TotalStormsInState * 100), 2)
  | sort by StormsWithCropDamage

I find it to be much more readable. With SQL, CTEs are as close as you can get.
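
For comparison, a rough SQL rendering of the same query (Postgres syntax; column names taken from the sample, semantics approximate):

  SELECT State,
         COUNT(*) AS TotalStormsInState,
         COUNT(*) FILTER (WHERE DamageCrops > 0) AS StormsWithCropDamage,
         ROUND(COUNT(*) FILTER (WHERE DamageCrops > 0)::numeric
               / COUNT(*) * 100, 2) AS PercentWithCropDamage
  FROM StormEvents
  GROUP BY State
  ORDER BY StormsWithCropDamage;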

  • __thecoldwine__ a day ago

    Kusto is fine, but joins are quite horrendous there. To be frank, I don't really get the complaints about "SQL is not readable". Most of the time SQL is the most compact, clear and to-the-purpose way to fetch the data (given the data has been modeled in a manner suitable for querying).

    Luckily, most of the tables for KQL in Azure are flat and not complex, and that is partially what makes Kusto so compelling.

efromvt a day ago

In trying to tackle this in the past, I've attempted to break down the things that SQL does well (and it does a lot well, I really like SQL) and how they could be individually more composable/reusable.

Semantic Information:

- Express relationships between tabular assets [PKs, FKs, Joins]

- Express relationships within tabular assets [row relations]

Transformational Expressiveness:

- The general syntax - functions, aggregations, subqueries, grouping, etc. - I think this is more of what functors get at, though the table-valued cases blur it to some degree

SQL has some very nice tools for managing semantic layer abstractions, such as views and stored procedures, though they can suffer from change-management, opaqueness, and performance issues. I agree with some of the sentiments expressed in comments that enforced PKs are downstream of the semantic/business logic - they're more like tests than the place where you want your true business definitions to live. Ideally they'd be created automatically from your semantic definitions.

Functions are reasonable transformational expressiveness tools, but suffer from rarely being defined inline so it's hard to 'get' a statement that heavily uses them. Temporary functions are a really nice middle ground here.

In an ideal world, I think you can separate out composition across both dimensions:

- a semantic model, which I think requires decoupling from physical tables to some degree - views are an okay implementation, but I think DBT/others have shown the desire for this to exist outside the database layer to promote a better development lifecycle.

- a functional reuse model, which is also ultimately more about the development you can associate with it and the iteration and experience. I think having this not be coupled to the database itself but defined at the caller/sql layer is important to make it maintainable - some degree of imports/reuse would be helpful here.

Obligatory plug: I've written about experimenting with solving both: https://trilogydata.dev/blog/composability_reuse.html

george_ciobanu 5 hours ago

I built a 100% composable SQL at www.human.software

brianhama a day ago

MS-SQL would seem to already support everything you’ve suggested.

froderick a day ago

I already have functors, effectively, by writing sql queries in clojure with honeysql. The clojure data structures make behavior composition from reusable sql snippets easy, and the testing of sql snippets is just based on literals as the article describes. Works really well.

  • nateroling a day ago

    I’ve done the same with SQLAlchemy in Python and SQLKata in C#.

    Sadly the whole idea of composable query builders seems to have fallen out of fashion.

    • jaza a day ago

      If composable query builders have fallen out of fashion, that's news to me. I've used SQLAlchemy in Python fairly heavily over the years, it's gotta be one of the best out there, I plan to continue using it for many more years. And I've recently gotten my feet wet with Ecto in Elixir, I'm quite impressed with it too.

beart a day ago

On the testing aspect - the tSQLt framework (https://tsqlt.org/) is an attempt at unit testing within MSSQL. It works okayish for what it is. I've at least found it to be nice when testing inputs for SQL functions and procedures, as well as some business rules within tables and views.

A few IDEs provide integrations for tSQLt: Management Studio with the Redgate extension, DataGrip, maybe others.

tie-in a day ago

While I appreciate the functional approach outlined in the article, there are alternative and, in my opinion, more practical ways to write composable and functional SQL queries embedded in a host programming language using query builders. For example: https://news.ycombinator.com/item?id=42265668

summerlight a day ago

Isn't this a table-valued function? IIRC the SQL standard still doesn't have it, but it's an almost universally supported extension across vendors.

  • zetalyrae a day ago

    At least in Postgres, table-valued functions can't take tables as arguments, only scalars. That's the main difference: functors can not just return tables, but take tables satisfying some interface as arguments.

    https://www.postgresql.org/docs/7.3/xfunc-tablefunctions.htm...

    I thought I had written a footnote or appendix about this but I guess I forgot.
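
    For concreteness, the shape Postgres does allow - scalars in, a set of rows out (a sketch; books(title, price) is assumed):

        -- Fine today: scalars in, a set of rows out.
        CREATE FUNCTION cheap_books(max_price numeric)
        RETURNS TABLE (title text, price numeric)
        LANGUAGE sql AS $$
            SELECT b.title, b.price FROM books b WHERE b.price <= max_price;
        $$;

        SELECT * FROM cheap_books(10);

        -- Not expressible: a parameter meaning "any relation with a price column".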

    • dagss a day ago

      MSSQL can take tables as arguments if they are table variables declared to be of a certain table type. But that restriction limits their use a lot.

zapkyeskrill a day ago

If you're creating fake tables for your solution, couldn't you then also create a fake table books(title) to satisfy the first problem statement?

cess11 a day ago

I think the Ecto query DSL is much nicer.

But I'm also fine with writing SQL by hand and don't really feel an attraction to these kinds of more complicated things. I find that automatically pulling EXPLAIN output and schema into doc blocks is more useful.