Sorry, wasn't clear enough, tenant ID = customer ID = workspace ID. There is often a benefit though to materializing that ID onto all child tables. If you take a salesforce.com example, every customer has an "orgID", from there have accounts and accounts have opportunities. It's not common for people to put the "orgID" on the opportunity from a data modeling perspective, but when it comes to sharding it gives you a big win.
If you have that tenant discriminator on all tables then it's easy to route it to the appropriate physical and logical shard right away vs. having to do some DB join first in your request to figure out where it goes.
I’ve stumbled across this in databases I’ve inherited before and while I appreciated that I could easily do filtering and stuff without joining across tables (and based on the comments here, more easily shard when things scale), it seems to denormalize the data.
Is there a specific term for the trade off of denormalized vs more easily query/shard/etc?
> Is there a specific term for the trade off of denormalized vs more easily query/shard/etc?
No, it's pretty much just called that. A good DBA will be able to strike the right balance between normalization and performance using their intuition and experience, which is one of the reasons they're often paid very well despite working in relatively ancient ecosystems.
Generally, the downside of denormalizing is that you risk inconsistency, and some extra storage. The inherent assumption in this case is that the customer ID (or whatever the key is) doesn't change for any associated table, and in that world, inconsistency is not a big risk. So yea, you store some extra data but if you've designed your ids well, it's not that big of a cost.
If you have the customer-ID in the data it basically represents the "owner" of the data. Typically the data of one owner does not refer to the data of other owners. Also the ownership of given data probably never changes so updating it is not a problem.
I wonder if this could or should be a built-in feature in databases. It is "meta-data" meaning data about data, who owns it.
If you have that tenant discriminator on all tables then it's easy to route it to the appropriate physical and logical shard right away vs. having to do some DB join first in your request to figure out where it goes.