If you want a real self-contained, serverless, zero-config JSON document store, try UnQLite or the new SQLite JSON extension. I've written about both of them on my blog if you're curious:
There are several use cases where this is a sure-fire way of shooting yourself in the foot:
* if you have many records (say, more than a couple of hundred), the file system has a lot of work to do and the whole thing becomes sluggish
* if you want to query the data by content, there's nothing that gives you sublinear search capability here
* it's not easy to modify data under this scheme. If you add that functionality, you'll face the familiar choice between race conditions and added complexity. That said, if you never modify the data, you can also drop the store and just use the data itself (encrypted, if you need that) in place of the key.
As an alternative to this, consider each process appending to a file and keeping filename+offset as the identifier for a particular record. This solves at least the "too many files" problem.
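The append-log idea above can be sketched in a few lines of bash. This is a hypothetical illustration, not part of jsonlite: the record identifier is simply `file:offset:length`, and reading a record back is a seek-and-read.

```shell
log="records.log"

append_record() {
  # Offset is the current size of the log (0 if it doesn't exist yet).
  local json="$1" offset=0
  if [ -f "$log" ]; then offset=$(( $(wc -c < "$log") )); fi
  printf '%s\n' "$json" >> "$log"
  # The identifier is file:offset:length -- enough to read the record back.
  echo "$log:$offset:${#json}"
}

read_record() {
  local file offset length
  IFS=: read -r file offset length <<< "$1"
  dd if="$file" bs=1 skip="$offset" count="$length" 2>/dev/null
}

id=$(append_record '{"name":"alice"}')
read_record "$id"   # prints {"name":"alice"}
```

Since writers only ever append, concurrent readers never see a record change under them; the remaining coordination problem is serializing the appends themselves.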
Or, if you only care about reading a static collection, put your JSON (or some moral equivalent, e.g. msgpack) into a CDB database: http://cr.yp.to/cdb.html
Next step up: use LevelDB, or KyotoCabinet/KyotoTycoon to organize the storage.
Whether a large number of files becomes sluggish really depends on your file system. In any case, a common technique is to break a large set of files into subfolders, which usually solves this problem reasonably well.
As for updating, flock[0] solves this issue on operating systems which support it.
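As a sketch of the flock approach (file and lock names are made up for illustration): `flock(1)` from util-linux takes an exclusive lock on a lock file, and combining it with write-to-temp-then-rename keeps concurrent readers from ever seeing a half-written document.

```shell
doc="mydoc.json"

update_doc() {
  (
    # Block until we hold an exclusive lock on fd 9.
    flock -x 9
    # Write to a temp file, then rename: the replacement is atomic,
    # so readers see either the old document or the new one, never a mix.
    tmp="$doc.tmp.$$"
    printf '%s\n' "$1" > "$tmp"
    mv "$tmp" "$doc"
  ) 9> "$doc.lock"
}

update_doc '{"counter": 1}'
cat "$doc"   # prints {"counter": 1}
```

The lock is released automatically when the subshell exits and fd 9 is closed, so there's no unlock step to forget.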
Usenet news and maildir are cases where current operating systems already have to cope with that kind of load, so it's definitely possible.
The question is whether this can be useful without becoming a partial, bug-ridden reimplementation of a NoSQL database, given that we already have NoSQL databases that fit the bill and carry lower maintenance costs than a spit-and-glue solution.
ReiserFS (v3) is a great small filesystem that's fantastic with lots of small files (and has coped gracefully with power outages on my laptop for the last 15 years). I've had tons of issues with ext3/4 (running out of extents, slow performance on lots of small files), btrfs (running out of metadata space while I still had hundreds of GB free?!), and xfs (great at everything except lots of tiny files and power loss). ReiserFS even supported reliable shrinking and growing on LVM.
It's too bad no one supports it anymore: the founder is in prison, and the only other people who seem able to maintain it appear focused on the Reiser4 pipe dream instead of supporting great, reliable technology that had most of its bugs worked out long ago.
> if you have many records (i.e. more than a couple hundred), the file system will have a lot of work to do and the whole thing becomes sluggish
On OS X, a simple test that creates, reads, and deletes 1,000 documents is no problem. The bigger concern is hitting the filesystem's inode limit or the maximum number of files in a single directory. That can be worked around by "sharding" into sub-directories based on the first character of the UUID.
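The sharding workaround might look something like this (a hypothetical sketch, not jsonlite's actual layout): key each document's path on a short prefix of its UUID, so no single directory grows unbounded.

```shell
store="data"

path_for() {
  # Shard on the first two hex characters of the UUID:
  # at most 256 sub-directories, each holding a slice of the documents.
  echo "$store/${1:0:2}/$1.json"
}

put_doc() {
  local path
  path=$(path_for "$1")
  mkdir -p "$(dirname "$path")"
  printf '%s\n' "$2" > "$path"
}

get_doc() {
  cat "$(path_for "$1")"
}

put_doc "3f2a1c9e-aaaa-4bbb-8ccc-1234567890ab" '{"k":"v"}'
get_doc "3f2a1c9e-aaaa-4bbb-8ccc-1234567890ab"   # prints {"k":"v"}
```

Because UUIDs are effectively uniform, a two-character prefix spreads documents evenly; a second nesting level (first two characters, then the next two) buys another factor of 256 if needed.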
I've been wanting something like SQLite for JSON for a while now. Or alternately, something like Mongo but without the server process.
The challenge is the query system: finding documents again. This jsonlite doesn't have one (yet), other than retrieving documents by UUID. There's been some work to make jq usable as a library, which seems like a good basis for JSON queries: https://github.com/stedolan/jq/wiki/C-API:-libjq
EJDB [0] is pretty good and seems to be actively developed again after a period of inactivity. It uses BSON rather than JSON directly but close enough, and the queries are modelled after Mongo too.
I made a C++14 wrapper [1] a while ago which will probably still work unless EJDB itself had some drastic API changes. (I also have a C++14/1y BSON/JSON library [2] that's handy for working with EJDB, but is a bit of a playground for template metaprogramming so compile times will explode with certain functionality).
The main problem with EJDB is that it's not crash-tolerant, so you need signal handlers to attempt a graceful flush/close on a global handle.
I took a look at EJDB recently and thought it looked like a pretty neat project. One potential gotcha is that it is built on TokyoCabinet which AFAIK is no longer maintained. Looks like they have plans to build a v2, though.
I commented above, but UnQLite is basically SQLite for JSON documents. If you're brave, SQLite has a new JSON extension as well, which looks pretty awesome.
Honestly, I wouldn't be able to use this unless it had, at least, a native C library. Being bash-reliant, though quick and simple, makes it much harder to use from anything native (shelling out is generally a BadTime™). I don't mean to discourage you: using JSON as a human-readable data store is something I've definitely done before, so it's a good idea; I just differ in my implementation needs.
Personally, I've used jansson[1] for this kind of thing in the past since it makes working with JSON in C an absolute breeze.
We have native implementations for iOS, Android (Java), Windows (C#), and play well with the Apache CouchDB ecosystem, so you can also use projects like PouchDB.
I wish I had time to write the code it would take to add p2p sync to this project. It might not be hard considering the PouchDB source already has implementations of most of the algorithms in JavaScript.
For all those who are curious, I'm building something similar - but it has realtime sync (like Firebase) and offline support: http://github.com/amark/gun . I'm glad others are working on this problem too!
I actually used a system back in 1997 that used exactly this data store system but with a proprietary encoding rather than JSON. It worked really well in a single-user single-process capacity but we had to adjust the block size of the filesystem so we didn't run out of inodes quickly.
The only killer was when some muppet put it on an NFS share when they were trying to be clever and get it working for 5 users. Every failure mode possible turned up at once.
Edit: also, to keep directory lookups cheap, it used nested prefix-based directories: /a/b/c/1/abc112341982389129382, for example.
The vast majority of people who use SQLite use it because they have flexible queries they want to run. Just storing structures to disk is so simple that it's usually easier to integrate it directly into your codebase.
Just guessing, but it looks like jsonlite relies on the uniqueness property of UUIDs to 'guarantee' uniqueness in what amounts to its primary-key index, the index of ids.
User-defined keys would need to be checked against all existing keys before inserting, to ensure uniqueness. Not impossible, but potentially much slower.
* http://charlesleifer.com/blog/introduction-to-the-fast-new-u...
* http://charlesleifer.com/blog/using-the-sqlite-json-extensio...