If you want a real self-contained, serverless, zero-config JSON document store, try UnQLite or the new SQLite JSON extension. I've written about both of them on my blog if you're curious:
There are several use cases where this is a sure-fire way of shooting yourself in the foot:
* if you have many records (say, more than a couple of hundred), the file system has a lot of work to do and the whole thing becomes sluggish
* if you want to query the data by content, there's nothing that gives you sublinear search capability here
* it's not easy to modify data under this scheme. If you add that functionality, you'll face the familiar choice between race conditions and added complexity. That said, if you never modify the data, you can also drop the store and just use the data itself (encrypted, if you need that) in place of the key.
As an alternative to this, consider each process appending to a file and keeping filename+offset as the identifier for a particular record. This solves at least the "too many files" problem.
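The append-log idea above can be sketched in a few lines of bash. This is a hypothetical illustration, not part of jsonlite: the record identifier is simply `file:offset:length`, and reading a record back is a seek-and-read.

```shell
log="records.log"

append_record() {
  # Offset is the current size of the log (0 if it doesn't exist yet).
  local json="$1" offset=0
  if [ -f "$log" ]; then offset=$(( $(wc -c < "$log") )); fi
  printf '%s\n' "$json" >> "$log"
  # The identifier is file:offset:length -- enough to read the record back.
  echo "$log:$offset:${#json}"
}

read_record() {
  local file offset length
  IFS=: read -r file offset length <<< "$1"
  dd if="$file" bs=1 skip="$offset" count="$length" 2>/dev/null
}

id=$(append_record '{"name":"alice"}')
read_record "$id"   # prints {"name":"alice"}
```

Since writers only ever append, concurrent readers never see a record change under them; the remaining coordination problem is serializing the appends themselves.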
Or, if you only care about reading a static collection, put your JSON (or some moral equivalent, e.g. msgpack) into a CDB database: http://cr.yp.to/cdb.html
Next step up: use LevelDB, or KyotoCabinet/KyotoTycoon to organize the storage.
Whether a large number of files becomes sluggish really depends on your file system. In any case, a common technique is to break a large set of files into subfolders, which usually solves this problem reasonably well.
As for updating, flock[0] solves this issue on operating systems which support it.
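As a sketch of the flock approach (file and lock names are made up for illustration): `flock(1)` from util-linux takes an exclusive lock on a lock file, and combining it with write-to-temp-then-rename keeps concurrent readers from ever seeing a half-written document.

```shell
doc="mydoc.json"

update_doc() {
  (
    # Block until we hold an exclusive lock on fd 9.
    flock -x 9
    # Write to a temp file, then rename: the replacement is atomic,
    # so readers see either the old document or the new one, never a mix.
    tmp="$doc.tmp.$$"
    printf '%s\n' "$1" > "$tmp"
    mv "$tmp" "$doc"
  ) 9> "$doc.lock"
}

update_doc '{"counter": 1}'
cat "$doc"   # prints {"counter": 1}
```

The lock is released automatically when the subshell exits and fd 9 is closed, so there's no unlock step to forget.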
Usenet news and maildir are cases where current operating systems already have to cope with that kind of load, so it's definitely possible.
The question is whether this can be useful without becoming a partial, bug-ridden reimplementation of a NoSQL database, given that we already have NoSQL databases that fit the bill and carry lower maintenance costs than a spit-and-glue solution.
ReiserFS (v3) is a great small filesystem that's fantastic with lots of small files (and has coped gracefully with power outages on my laptop for the last 15 years). I've had tons of issues with ext3/4 (running out of extents, slow performance on lots of small files), btrfs (running out of metadata space while I still had hundreds of GB free?!), and xfs (great at everything except lots of tiny files and power loss). ReiserFS even supported reliable shrinking and growing on LVM.
It's too bad no one supports it anymore: the founder is in prison, and the only other people who seem able to maintain it appear focused on the Reiser4 pipe dream instead of supporting great, reliable technology that had most of its bugs worked out long ago.
> if you have many records (i.e. more than a couple hundred), the file system will have a lot of work to do and the whole thing becomes sluggish
On OS X, a simple test that creates, reads, and deletes 1,000 documents is no problem. The bigger concern is hitting the filesystem's inode limit or the maximum number of files in a single directory. That can be worked around by "sharding" into sub-directories based on the first character of the UUID.
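The sharding workaround might look something like this (a hypothetical sketch, not jsonlite's actual layout): key each document's path on a short prefix of its UUID, so no single directory grows unbounded.

```shell
store="data"

path_for() {
  # Shard on the first two hex characters of the UUID:
  # at most 256 sub-directories, each holding a slice of the documents.
  echo "$store/${1:0:2}/$1.json"
}

put_doc() {
  local path
  path=$(path_for "$1")
  mkdir -p "$(dirname "$path")"
  printf '%s\n' "$2" > "$path"
}

get_doc() {
  cat "$(path_for "$1")"
}

put_doc "3f2a1c9e-aaaa-4bbb-8ccc-1234567890ab" '{"k":"v"}'
get_doc "3f2a1c9e-aaaa-4bbb-8ccc-1234567890ab"   # prints {"k":"v"}
```

Because UUIDs are effectively uniform, a two-character prefix spreads documents evenly; a second nesting level (first two characters, then the next two) buys another factor of 256 if needed.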
I've been wanting something like SQLite for JSON for a while now. Or alternately, something like Mongo but without the server process.
The challenge is the query system: finding documents again. This jsonlite doesn't have one (yet), other than retrieving documents by UUID. There's been some work to make jq usable as a library, which seems like a good basis for JSON queries: https://github.com/stedolan/jq/wiki/C-API:-libjq
EJDB [0] is pretty good and seems to be actively developed again after a period of inactivity. It uses BSON rather than JSON directly but close enough, and the queries are modelled after Mongo too.
I made a C++14 wrapper [1] a while ago which will probably still work unless EJDB itself had some drastic API changes. (I also have a C++14/1y BSON/JSON library [2] that's handy for working with EJDB, but is a bit of a playground for template metaprogramming so compile times will explode with certain functionality).
The main problem with EJDB is that it's not crash-tolerant, so you need signal handlers to attempt a graceful flush/close on a global handle.
I took a look at EJDB recently and thought it looked like a pretty neat project. One potential gotcha is that it is built on TokyoCabinet which AFAIK is no longer maintained. Looks like they have plans to build a v2, though.
I commented above, but UnQLite is basically SQLite for JSON documents. If you're brave, SQLite has a new JSON extension as well, which looks pretty awesome.
Honestly, I wouldn't be able to use this unless it had, at least, a native C library. Being bash-reliant, though quick and simple, makes it much harder to use from anything native (shelling out is generally a BadTime™). I don't mean to discourage you: using JSON as a human-readable data store is something I've definitely done before, so it's a good idea; I just differ in my implementation needs.
Personally, I've used jansson[1] for this kind of thing in the past since it makes working with JSON in C an absolute breeze.
We have native implementations for iOS, Android (Java), Windows (C#), and play well with the Apache CouchDB ecosystem, so you can also use projects like PouchDB.
I wish I had time to write the code it would take to add p2p sync to this project. It might not be hard considering the PouchDB source already has implementations of most of the algorithms in JavaScript.
For all those who are curious, I'm building something similar - but it has realtime sync (like Firebase) and offline support: http://github.com/amark/gun . I'm glad others are working on this problem too!
I actually used a system back in 1997 that used exactly this data store system but with a proprietary encoding rather than JSON. It worked really well in a single-user single-process capacity but we had to adjust the block size of the filesystem so we didn't run out of inodes quickly.
The only killer was when some muppet put it on an NFS share when they were trying to be clever and get it working for 5 users. Every failure mode possible turned up at once.
Edit: also, to keep directory lookups cheap, it used nested prefix-based directories: /a/b/c/1/abc112341982389129382, for example.
The vast majority of people who use SQLite use it because they have flexible queries they want to run. Just storing structures to disk is so simple that it's usually easier to integrate it directly into your codebase.
Just guessing, but it looks like jsonlite relies on the uniqueness property of UUIDs to 'guarantee' uniqueness in what amounts to its primary-key index, the index of ids.
User-defined keys would need to be checked against all existing keys before inserting, to ensure uniqueness. Not impossible, but potentially much slower.
* http://charlesleifer.com/blog/introduction-to-the-fast-new-u...
* http://charlesleifer.com/blog/using-the-sqlite-json-extensio...