Not to mention that perfectly normalizing a database always incurs join overhead that limits horizontal scalability. In fact, denormalization is required to achieve scale (with a trade-off).

I’m not sure how formal verification would’ve prevented this issue. In my experience, it’s unusual to have to specify a database name in the query, so how could formal verification have covered this outcome?

The recommendation that the query needed DISTINCT and LIMIT doesn’t make sense. Don’t forget that the incoming data was different (r0 and default did not return exactly the same rows, which is why the config files more than doubled in size), so DISTINCT would have produced an unpredictable blend of the two sources, matching neither result and hiding the double-database read altogether. Secondly, LIMIT only makes sense in conjunction with a failure circuit breaker (if LIMIT rows come back, fail the query instead of shipping a truncated result). When does it make business-logic sense to LIMIT this particular query’s result? And would the authors have known how to set the LIMIT so it didn’t exceed the configuration file consumers’ limits?
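
To make the circuit-breaker point concrete, here’s a rough Python sketch. The cap value, the table name, and the run_query callable are all made up for illustration, not taken from the article:

    # Sketch of "LIMIT plus a circuit breaker": over-fetch one row past the
    # hard cap and treat hitting it as a failure instead of silently truncating.
    # HARD_CAP, feature_metadata, and run_query are hypothetical.

    HARD_CAP = 200  # whatever the config-file consumers can actually handle

    def fetch_features(run_query):
        # run_query: any callable that takes SQL text and returns a list of rows.
        rows = run_query(
            "SELECT name, type FROM feature_metadata "
            f"LIMIT {HARD_CAP + 1}"  # one extra row so overflow is detectable
        )
        if len(rows) > HARD_CAP:
            # Circuit breaker: fail loudly instead of emitting a truncated or
            # oversized config file downstream.
            raise RuntimeError(f"feature query returned more than {HARD_CAP} rows")
        return rows

Without the breaker, LIMIT just hides the problem by handing consumers an arbitrary subset of rows.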

The article says:

> “You can’t reliably catch that with more tests or rollouts or flags. You prevent it by construction—through analytical design.”

That’s the big design up front fallacy. Of course you can catch it reliably with more tests, and limit the damage with flags and rollouts. There’s zero guarantee that the analytical design would’ve caught this up front.



> Not to mention that perfectly normalizing a database always incurs join overhead that limits horizontal scalability. In fact, denormalization is required to achieve scale (with a trade-off).

This is just not true, at least not in general. Inserting into a normalized design is usually faster, thanks to smaller indexes, fewer of them, and more rows fitting per page.
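
To put rough numbers on the rows-per-page point (the row widths and the 16 KB page size below are illustrative assumptions, not measurements):

    # Back-of-the-envelope "rows per page" comparison with made-up row widths.

    PAGE_BYTES = 16 * 1024

    # Denormalized: each order row repeats a ~220-byte customer name/address blob.
    denormalized_row = 40 + 220
    # Normalized: the order row carries only an 8-byte customer_id foreign key.
    normalized_row = 40 + 8

    print(PAGE_BYTES // denormalized_row)  # 63 order rows per page
    print(PAGE_BYTES // normalized_row)    # 341 order rows per page

Fewer bytes per row means fewer pages touched per insert and smaller secondary indexes to maintain.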



