Hello julian, the difference between comments and news is that a thread (a colle...

julian37 · on Oct 20, 2011

Hi Salvatore,

many thanks for the reply, that does clarify the issue.

So when sub-objects are only a few dozen per collection (per parent object) on average, storing them as JSON blobs allows Redis to "inline" them, yielding good memory usage. But for large numbers of items per parent, Redis can't inline them so you might as well represent them as a hash, which can never be inlined. Is this a fair summary?

I guess that ideally, all objects would always be represented in the same way from a client perspective, and the database engine would decide which internal format is best suited for storage, maybe using hints provided by the client, and handle any necessary (de)serialization as an implementation detail. That said, I know Redis is still a young project and I suppose this is something you guys are thinking to improve long-term anyway.

Cheers!

LeafStorm · on Oct 20, 2011

What Antirez is referring to is the fact that when a Redis hash is fairly small, it is stored as an array that is scanned linearly (worse time complexity, but lower memory usage), and is only "upgraded" to a true hash table when there are many key/value pairs. (This is mostly an implementation detail, however - from an API standpoint, you interact with hashes the same regardless of what internal representation they are using.)

In both cases, Lamernews still stores comment IDs as the hash keys and the JSON-serialized structures as the hash values, antirez was just commenting on a memory-saving feature of Redis for small hashes.

julian37 · on Oct 21, 2011

Thanks for chiming in, but now I'm confused. Antirez said:

  When there are this two levels, storing the first level as a Redis hash, and the second as JSON leads to very good memory usage performances, it is internally stored as a linear array.

I'm reading this to say that memory usage wouldn't be as good if the second level was stored as a hash instead of a JSON blob.

My specific question was: why store the comment as a JSON blob as opposed to a hash? I think Antirez answer was, in a nutshell: because a hash of hashes can't be stored as an array. For news items it doesn't matter because the hash couldn't be stored as an array anyway due to the (anticipated) large number of news items.

Are you saying this interpretation is wrong?

nknight · on Oct 21, 2011

> worse time complexity, but lower memory usage

While the time complexity is worse, absolute performance might be superior, due to both skipping the hashing arithmetic, and (due to that lower memory usage) fitting more data in the CPU cache.

white_devil · on Oct 21, 2011

> Don't use templates, they suck.

I'm curious, why do templates suck?