the difference between comments and news is that a thread (the collection of comments for a given news item) is a hash made of sub-hashes: the top-level hash maps ID -> comment_hash, and the comment hash just contains the different fields.
With these two levels, storing the first level as a Redis hash and the second as JSON leads to very good memory usage, since a small hash is internally stored as a linear array. We can do that because we know most news items will have just a few tens of comments. If there are more, the hash will transparently turn into a real hash table: more space, but worth it for the rare cases when this is needed.
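The two-level layout described here can be sketched with a plain Python dict standing in for the Redis hash (the key names and comment fields below are hypothetical illustrations, not Lamernews' actual schema):

```python
import json

# First level: one Redis hash per thread, mapping comment ID -> JSON blob.
# Modeled here as a plain dict; against Redis this would be HSET/HGET
# on a key such as "thread:1234".
thread = {}

def add_comment(comment_id, fields):
    # Second level: the comment's fields serialized as a single JSON
    # string, so the outer hash stays small enough for the compact
    # linear-array encoding.
    thread[comment_id] = json.dumps(fields)

def get_comment(comment_id):
    return json.loads(thread[comment_id])

add_comment("1", {"user": "antirez", "body": "hello", "up": 1})
add_comment("2", {"user": "guest", "body": "hi back", "up": 0})
```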
A news item, instead, is just a collection of fields. There is no outer object of reasonable size, like "the hash of all news objects" (it would be too big), so there is no gain in using this approach there.
What is good about storing the sub-objects of hashes as JSON objects is that Redis unstable just got JSON support in Lua scripts, so it will also be able to manipulate these objects by sending Redis small Lua scripts.
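Server-side, such a script would be written in Lua using the cjson library bundled with Redis scripting; the read-modify-write cycle it performs can be sketched client-side in Python (the `up` field is just an example, not a real Lamernews field):

```python
import json

# A server-side Lua script would do roughly the equivalent of:
#   local c = cjson.decode(redis.call('HGET', KEYS[1], ARGV[1]))
#   c.up = c.up + 1
#   redis.call('HSET', KEYS[1], ARGV[1], cjson.encode(c))
# Below, the same cycle with a dict standing in for the Redis hash:
def upvote(thread, comment_id):
    comment = json.loads(thread[comment_id])   # decode the JSON blob
    comment["up"] = comment.get("up", 0) + 1   # mutate one field
    thread[comment_id] = json.dumps(comment)   # write it back

t = {"1": json.dumps({"user": "antirez", "body": "hello", "up": 1})}
upvote(t, "1")
```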
Many thanks for the reply, that does clarify the issue.
So when sub-objects are only a few dozen per collection (per parent object) on average, storing them as JSON blobs allows Redis to "inline" them, yielding good memory usage. But for large numbers of items per parent, Redis can't inline them so you might as well represent them as a hash, which can never be inlined. Is this a fair summary?
I guess that ideally, all objects would always be represented in the same way from a client perspective, and the database engine would decide which internal format is best suited for storage, maybe using hints provided by the client, and handle any necessary (de)serialization as an implementation detail. That said, I know Redis is still a young project and I suppose this is something you guys are thinking of improving long-term anyway.
What Antirez is referring to is the fact that when a Redis hash is fairly small, it is stored as an array that is scanned linearly (worse time complexity, but lower memory usage), and is only "upgraded" to a true hash table when there are many key/value pairs. (This is mostly an implementation detail, however - from an API standpoint, you interact with hashes the same regardless of what internal representation they are using.)
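For reference, the point at which Redis abandons the compact encoding is configurable. The directive names have changed across versions (hash-max-zipmap-* in early 2.x, later hash-max-ziplist-*, and hash-max-listpack-* in Redis 7); the values shown are the usual defaults:

```
# redis.conf: hashes with at most this many fields, where every field
# and value fits within this many bytes, keep the compact linear encoding.
hash-max-ziplist-entries 128
hash-max-ziplist-value 64
```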
In both cases, Lamernews still stores comment IDs as the hash keys and the JSON-serialized structures as the hash values, antirez was just commenting on a memory-saving feature of Redis for small hashes.
Thanks for chiming in, but now I'm confused. Antirez said:
With these two levels, storing the first level as a Redis hash and the second as JSON leads to very good memory usage, since a small hash is internally stored as a linear array.
I'm reading this to say that memory usage wouldn't be as good if the second level was stored as a hash instead of a JSON blob.
My specific question was: why store the comment as a JSON blob as opposed to a hash? I think Antirez's answer was, in a nutshell: because a hash of hashes can't be stored as an array. For news items it doesn't matter because the hash couldn't be stored as an array anyway, due to the (anticipated) large number of news items.
While the time complexity is worse, absolute performance might be superior, due both to skipping the hashing arithmetic and to fitting more data in the CPU cache (thanks to that lower memory usage).
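The transparent upgrade discussed in this thread can be sketched as a toy container that keeps key/value pairs in a flat list and switches to a real dict once it grows past a threshold (the threshold of 8 is arbitrary; Redis' real limit is configurable and higher by default):

```python
class SmallHash:
    """Toy model of Redis' small-hash optimization: a linear array
    first, a real hash table once the entry count passes a threshold."""
    THRESHOLD = 8  # arbitrary for illustration

    def __init__(self):
        self.pairs = []    # compact encoding: flat list of (key, value)
        self.table = None  # becomes a dict after the upgrade

    def set(self, key, value):
        if self.table is not None:
            self.table[key] = value
            return
        for i, (k, _) in enumerate(self.pairs):
            if k == key:
                self.pairs[i] = (key, value)
                return
        self.pairs.append((key, value))
        if len(self.pairs) > self.THRESHOLD:
            # Transparent upgrade: same API, different representation.
            self.table = dict(self.pairs)
            self.pairs = []

    def get(self, key):
        if self.table is not None:
            return self.table[key]
        for k, v in self.pairs:  # linear scan: O(n), but cache-friendly
            if k == key:
                return v
        raise KeyError(key)
```

Callers use `set`/`get` identically before and after the upgrade, which is the point: the representation is an implementation detail.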
I hope this clarifies the issue.