I still didn’t get why they built this; there’s a better explanation of the feature set in the FAQ comparison with Parquet: https://docs.arcticdb.io/latest/faq/
> How does ArcticDB differ from Apache Parquet?
> Both ArcticDB and Parquet enable the storage of columnar data without requiring additional infrastructure.
> ArcticDB however uses a custom storage format that means it offers the following functionality over Parquet:
> Versioned modifications ("time travel") - ArcticDB is bitemporal.
> Timeseries indexes. ArcticDB is a timeseries database and as such is optimised for slicing and dicing timeseries data containing billions of rows.
> Data discovery - ArcticDB is built for teams. Data is structured into libraries and symbols rather than raw filepaths.
> Support for streaming data. ArcticDB is a fully functional streaming/tick database, enabling the storage of both batch and streaming data.
> Support for "dynamic schemas" - ArcticDB supports datasets with changing schemas (column sets) over time.
> Support for automatic data deduplication.
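To make the "time travel" bullet concrete, here is a toy sketch of the idea in plain Python: every write to a symbol appends an immutable version, and reads can pin an old version number. This is just an illustration of the concept, not ArcticDB's actual implementation or API (the class and method names here are made up).

```python
# Toy illustration of versioned writes ("time travel").
# NOT ArcticDB's implementation -- names are hypothetical.

class VersionedStore:
    def __init__(self):
        self._versions = {}  # symbol -> list of immutable snapshots

    def write(self, symbol, data):
        """Append a new version of `symbol`; return its version number."""
        self._versions.setdefault(symbol, []).append(dict(data))
        return len(self._versions[symbol]) - 1

    def read(self, symbol, as_of=None):
        """Read the latest version, or a pinned historical one."""
        versions = self._versions[symbol]
        return versions[-1] if as_of is None else versions[as_of]

store = VersionedStore()
store.write("AAPL", {"2024-01-02": 185.6})                        # version 0
store.write("AAPL", {"2024-01-02": 185.6, "2024-01-03": 184.2})   # version 1

assert store.read("AAPL", as_of=0) == {"2024-01-02": 185.6}       # old view intact
assert "2024-01-03" in store.read("AAPL")                         # latest view
```

The point is that a versioned write never mutates the old snapshot in place, so any earlier view of the data stays readable.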
The other answer I was looking for was why not kdb, since this is a hedge fund.
I think people are getting a little tired of being held to ransom by kdb/q and kdb consultants. Even if you have 'oil barrels' of money, eventually it becomes annoying enough to look elsewhere.
Their idea was to build something themselves, along the lines of pandas plus a series/columnar database. There are other competitors to kdb/q, but they are not as entrenched and may not be a perfect fit. These guys cooked up a closer fit for their own systems than, say, ClickHouse and other tools.
It had to be a very close fit to what they do, as kdb/q is pretty damned good at some things in finance. Maybe there is not enough money in the highly specialised areas it does very well at for other people to come in with something new.
It would be a huge mistake to think sql is a replacement for Q.
Given how prominent a selling point this is, I cannot recall ever using it in ArcticDB (5+ years of use). It's (financial) time series data; the past doesn't change.
Back-adjusting futures contracts or back-adjusting for dividends/splits come to mind as I write this, but I would just reprocess those from the raw "as of date" data if needed.
For pricing data, sure, but for things like fundamentals or economics, where estimates are revised over time, one way to store the data is with a versioning feature. It allows for point-in-time (PIT) data without much overhead, in theory.
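The PIT idea for revisable series can be sketched like this: store each revision with the date it became known, and have queries replay only what was knowable at a given moment (which is what avoids lookahead bias in research). The function name and the GDP revision figures below are illustrative, not from any real dataset or library.

```python
# Hedged sketch of point-in-time (PIT) reads for revisable data,
# e.g. an economic estimate that gets revised after first release.
# `pit_read` and the revision values are hypothetical.
from datetime import date

# (publication_date, value) revisions for some quarterly estimate
revisions = [
    (date(2024, 4, 25), 1.6),  # advance estimate
    (date(2024, 5, 30), 1.3),  # second estimate
    (date(2024, 6, 27), 1.4),  # third estimate
]

def pit_read(revisions, as_of):
    """Return the latest value already published on `as_of`, else None."""
    known = [value for published, value in revisions if published <= as_of]
    return known[-1] if known else None

assert pit_read(revisions, date(2024, 5, 1)) == 1.6   # only the advance estimate existed
assert pit_read(revisions, date(2024, 7, 1)) == 1.4   # all revisions visible
assert pit_read(revisions, date(2024, 1, 1)) is None  # nothing published yet
```

A versioned store gives you this behaviour for free by pinning a version per publication, rather than maintaining an explicit revision log per field.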
And actually, pricing can be revised as well, though it is much less common.
That said, versioning is not the only way to handle these kinds of things.
Versioning can also be useful as an audit trail for data transformation. Though again, these could be stored in another way as well.
100% makes sense, it depends what you're looking for.
There are however lots of time series that do change in finance, e.g. valuations, estimates, alt-data (an obvious one is weather predictions). The time-travel feature can be super useful even when external data isn't changing: as an audit log, and as a way to see how your all-time evaluations (say, backtests) have changed.
I’ve never seen it used for backtests personally. Generally backtested results are saved as a batch of variants following some naming convention as different symbols.
You would use some convention for naming and parametrising backtests, 'different' backtests would get stored separately. But once you start updating backtests, running them in a loop with changing data, that's when the time-travel feature starts to be useful.
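The two conventions being contrasted above can be sketched side by side: distinct backtests as separately named symbols, versus repeated runs of one backtest stored as versions under a single name. Everything here (the naming scheme, the run payloads) is made up for illustration.

```python
# Sketch of the two conventions discussed above; all names are hypothetical.

# Convention 1: each backtest variant is its own symbol, parameters in the name.
symbols = [f"backtest/momentum/lookback={lb}" for lb in (20, 60, 120)]

# Convention 2: one backtest re-run in a loop; each run appends a version,
# so earlier runs stay retrievable by version number ("time travel").
runs = []
for seed in (1, 2, 3):
    runs.append({"seed": seed, "pnl": 100 * seed})

assert symbols[1].endswith("lookback=60")  # variant addressed by name
assert runs[0]["pnl"] == 100               # first run still readable after re-runs
```

Convention 1 is enough while backtests are immutable; versioning starts paying off once the same backtest is re-run against changing data.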