
The table data in a database is the canonical form. You can delete the transaction logs and only temporarily lose some reliability. It is very common to delete transaction logs once they are no longer needed. When databases are backed up, they either dump the logical data or take a snapshot of the data. They can then take a stream of transaction logs for syncing or backup until the next checkpoint.

I'm pretty sure journaled filesystems recycle the journal. There are log-structured filesystems, but they aren't used much beyond low-level flash.



Sorry, this is mistaking the operational for the fundamental.

If a transaction log is replayed, then an identical set of relations will be obtained. Ergo, the log is the prime form of the database.

It’s that simple.
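
A toy sketch of that claim, assuming a made-up key-value log format (the `put`/`delete` record shape and the `replay` function are illustrative only, not any real database's log format):

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical log record: (operation, key, value); value is None for deletes.
LogRecord = Tuple[str, str, Optional[int]]


def replay(log: Iterable[LogRecord]) -> Dict[str, int]:
    """Rebuild the table from scratch by applying every record in order."""
    table: Dict[str, int] = {}
    for op, key, value in log:
        if op == "put":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


log = [
    ("put", "alice", 100),
    ("put", "bob", 50),
    ("put", "alice", 70),
    ("delete", "bob", None),
]

# Replaying the same log twice yields the same table contents.
first = replay(log)
second = replay(log)
```

The table here is purely derived state: throw it away and `replay(log)` gives it back, as long as the log is complete and fully ordered.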


At a very abstract level, maybe. But it's common not to log changes that can trivially be rolled back, like insertions into a table that was created or truncated within the transaction. Of course, such optimizations are incompatible with log-based replication. So the statement should probably be, “in a system with log-based replication, the log is authoritative, and the tables are just an optimization”. This framing also avoids ambiguities because a transaction log may not be fully serialized, and might not fully determine table contents.


At work we need to distribute daily changes to a dataset, so we have a series of daily deltas. If a new client is brought up, they need to apply all the deltas to get the current dataset.

This is time-consuming, so we optimized it by creating "base versions" every month. So a client only needs to download the latest base version and then apply the deltas since then...
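
Roughly, as a sketch: the dict-based dataset, the delta format (`None` meaning "key removed"), and the names `apply_delta` and `bootstrap` are all assumptions for illustration, not our actual formats.

```python
from typing import Dict, List, Optional

# Hypothetical formats: a dataset is a dict, and a daily delta maps each
# changed key to its new value, with None meaning "key was removed".
Dataset = Dict[str, str]
Delta = Dict[str, Optional[str]]


def apply_delta(data: Dataset, delta: Delta) -> Dataset:
    """Return a new dataset with one day's changes applied."""
    result = dict(data)
    for key, value in delta.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


def bootstrap(base: Dataset, deltas_since_base: List[Delta]) -> Dataset:
    """Bring a new client up to date from a base version plus later deltas."""
    current = base
    for delta in deltas_since_base:
        current = apply_delta(current, delta)
    return current


deltas = [{"a": "1"}, {"b": "2"}, {"a": None}, {"c": "3"}]

# Slow path: a new client applies every delta since the beginning.
from_scratch = bootstrap({}, deltas)

# Fast path: a base version built from the first two deltas, then the rest.
base_version = bootstrap({}, deltas[:2])
from_base = bootstrap(base_version, deltas[2:])
```

Both paths end up with the same dataset; the base version just saves the client from replaying the older deltas.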


Which is what accountants call "closing the books". Once all ledgers have been reconciled, old ledgers can be archived and you go forward from the last closing.

Forensic accounting, incidentally, is when something went badly wrong and outside accountants have to go back through the old ledgers, and maybe old invoices and payments and reconstruct the books. FTX had to do that after the bankruptcy to find out where the money went and where it was supposed to go.


The transaction log maintained from time 0 would be equivalent but too expensive to store compared to the tables.


If you relax your constraint to "retain logs for the past N days", you can accumulate the logs from T=0 to T=(today - N) into tables and still benefit from having snapshots from that cutoff onwards.
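
Something like this, as a sketch only: the `(day, key, value)` entry shape and the `compact` function are hypothetical, and the log is assumed to be ordered by day.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical log entry: (day, key, new_value). Assumed ordered by day.
Entry = Tuple[date, str, int]


def compact(
    snapshot: Dict[str, int],
    log: List[Entry],
    today: date,
    retain_days: int,
) -> Tuple[Dict[str, int], List[Entry]]:
    """Fold entries older than the retention window into the snapshot.

    Returns the updated snapshot and the log entries that are kept.
    """
    cutoff = today - timedelta(days=retain_days)
    folded = dict(snapshot)
    kept: List[Entry] = []
    for day, key, value in log:
        if day < cutoff:
            folded[key] = value  # absorbed into the table
        else:
            kept.append((day, key, value))  # still within the window
    return folded, kept


log = [
    (date(2024, 1, 1), "x", 1),
    (date(2024, 1, 5), "x", 2),
    (date(2024, 1, 8), "y", 9),
]
snapshot, recent = compact({}, log, today=date(2024, 1, 10), retain_days=3)
```

Entries before the cutoff are collapsed into the snapshot, so only the last N days of history remain replayable.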


On the contrary, I’ve known plenty of sites that keep their logs.

Often written to tape, for obvious reasons.


Conversely, given a database, you can't (in general) reconstruct the specific transaction log that resulted in it. You can reconstruct some log, but it's not uniquely defined and is missing a lot of potentially relevant information.
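
To illustrate with a toy key-value log (the record format and function names are made up for this example):

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log record: (operation, key, value); value is None for deletes.
LogRecord = Tuple[str, str, Optional[int]]


def replay(log: List[LogRecord]) -> Dict[str, int]:
    """Apply the records in order to an empty table."""
    table: Dict[str, int] = {}
    for op, key, value in log:
        if op == "put":
            table[key] = value
        elif op == "delete":
            table.pop(key, None)
    return table


def some_log_for(table: Dict[str, int]) -> List[LogRecord]:
    """Produce one log that yields this table; the real history is lost."""
    return [("put", key, value) for key, value in sorted(table.items())]


# Two different histories...
log_a: List[LogRecord] = [("put", "x", 1), ("put", "x", 2), ("put", "y", 5)]
log_b: List[LogRecord] = [
    ("put", "y", 5),
    ("put", "z", 9),
    ("delete", "z", None),
    ("put", "x", 2),
]

# ...that end in the same table.
table_a = replay(log_a)
table_b = replay(log_b)
reconstructed = some_log_for(table_a)
```

`reconstructed` replays to the same table, but it matches neither `log_a` nor `log_b`: the overwritten value of `x` and the short-lived `z` are gone.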



