Introducing BERT and BERT-RPC: GitHub's new serialization and RPC protocol (github.com/blog)
57 points by mojombo on Oct 21, 2009 | 17 comments


Any benchmarks compared with Thrift and Protobuf? BTW, Hadoop's new Avro project is yet another serialization project that's designed to be dynamic language friendly.

I personally find Thrift IDL trivial to manage, and it is probably the only sane way to support a dozen languages (statically and dynamically typed) in a type-safe and efficient way.

BERT-RPC is simple, but what if you find your server too slow and it needs to be rewritten in another language? I think the Thrift approach makes that simpler.


This is just silly:

I just can't. "I find the entire concept behind IDLs and code generation abhorrent. Coming from a background in dynamic languages and automated testing, these ideas just seem silly."

The point of IDLs is to provide a stable, exact, portable and succinct definition of an inter-application protocol. In defining the protocol explicitly in a language neutral way, it is easier to ensure conformance and correctness of implementation.

As a side effect, it does make code generation easy, allows one to optionally apply static typing to ensure that messages are always correctly formed, and ensures a _DRY_ approach across the board.

You could just as easily implement poorly defined interfaces using Protobuf or Thrift -- nobody says that you actually have to use an IDL, and the serialization format doesn't require it as long as you keep the messages self-describing. Moreover, a lot of effort has gone into making Protobuf (and even Thrift) as efficient as possible -- yet another serialization format is wholly unnecessary.

[Edit] Google even outlines one way to implement self-describing messages using the existing protobuf standard in the project documentation: http://code.google.com/apis/protocolbuffers/docs/techniques....

Other methods include simply encoding fields as tuples of (name, value) -- protobuf includes value types in the field encoding.
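The (name, value) encoding mentioned above can be sketched in a few lines of Python. This is an illustration of the idea only, not protobuf's actual wire format; the field names and values are invented for the example.

```python
# Minimal sketch of a self-describing message: each field travels as a
# (name, value) pair, so the receiver needs no IDL-generated class.

def encode(fields):
    """Flatten a dict of fields into a sorted list of (name, value) tuples."""
    return [(name, value) for name, value in sorted(fields.items())]

def decode(pairs, known_fields):
    """Rebuild a dict, silently skipping field names we don't know."""
    return {name: value for name, value in pairs if name in known_fields}

msg = encode({"id": 42, "login": "mojombo", "beta_flag": True})
# A client that only knows "id" and "login" still decodes cleanly:
print(decode(msg, {"id", "login"}))  # {'id': 42, 'login': 'mojombo'}
```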


Agreed. There's also something else: in a well implemented IDL-based framework, you automatically avoid part of the version compatibility issue.

Imagine that you add fields/services to a class, without breaking the API. As long as you don't change the ids on your definition, clients with code generated for an old version of the class should still be able to call its services, without serialization/deserialization errors.
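The stable-id compatibility argument above can be made concrete with a small sketch. This is illustrative pseudocode in Python, not Thrift's or protobuf's real encoding: messages carry (id, value) pairs, and each side maps ids to names via its own generated schema, so a client on an old schema simply skips ids it doesn't know.

```python
# Why stable field ids preserve compatibility (illustrative only).
V1_SCHEMA = {1: "id", 2: "login"}              # old client's view
V2_SCHEMA = {1: "id", 2: "login", 3: "email"}  # server added field 3

def serialize(record, schema):
    """Turn named fields into (field_id, value) pairs for the wire."""
    name_to_id = {name: fid for fid, name in schema.items()}
    return [(name_to_id[name], value) for name, value in record.items()]

def deserialize(wire, schema):
    """Unknown ids are skipped instead of raising, so an old client
    can still read messages produced by a newer server."""
    return {schema[fid]: value for fid, value in wire if fid in schema}

wire = serialize({"id": 7, "login": "x", "email": "x@y.z"}, V2_SCHEMA)
print(deserialize(wire, V1_SCHEMA))  # {'id': 7, 'login': 'x'}
```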


IDL is best if it can be generated directly from the service code (using appropriate and syntactically valid markers in the service code to specify what to expose) or if the IDL can be used to generate an interface layer for the service which automatically attaches itself to the correct implementation methods.

Otherwise, you've got to maintain the IDL separately from the service implementation, and keep the two in sync manually. That's a pain, and the bigger your service and/or the more people are involved with developing it the worse the pain gets.

I have three concerns about BERT:

1) It looks like every method in your service code is automatically exposed. That could turn into a security problem. I prefer to explicitly indicate which methods to expose.

2) It looks like your function names have to match the function names exposed by the service. That's fine as a default, but could become a problem over time as the service evolves. Sometimes you want to rewrite the guts without breaking API backward compatibility.

3) I'm not sure how complex the RPC part is; HTTP is plenty capable of sending binary requests and returning binary responses on its own.


I think the code generation part is a lot worse than the IDL part.


It makes usage quite a bit cleaner for some target languages, but it's also optional.


Is this intended to be put out for others to use? My reaction to it as an internal protocol is "OK, that's interesting" (I have a streaming JSON message protocol of my own), but putting it out for others to use clogs up an awfully full space (you'll note I'm not linking you to a "library" since my protocol isn't worth releasing).

One issue that leaps to mind (based on the fact my streaming JSON protocol also travels through Erlang as it happens) is what you do with dicts? Erlang's native representation for dicts is extremely hostile as a term to send to other languages:

    Erlang R13B01 (erts-5.7.2) [source] ...etc

    Eshell V5.7.2  (abort with ^G)
    1> dict:from_list([{"abc", "def"}, {"efg", "hij"}]).
    {dict,2,16,16,8,80,48,
          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
          {{[],
            [["efg",104,105,106]],
            [],[],[],
            [["abc",100,101,102]],
            [],[],[],[],[],[],[],[],[],[]}}}
That's a hash with two entries, abc -> def and efg -> hij. Note that's the internal representation; dumping out a perl hash gives you a semantic-level representation but under the hood it too has dirty nasty bits like that.

If that's not easy, that's a big price to pay for moving binaries around, in general. In specific cases it could be fine.


Why would you send an actual dict instead of a proplist? Every serialization protocol I am aware of does not send the actual internal dict/hash/map representation but instead flattens it down to what is effectively a list of key-value pairs that may or may not be tagged by the protocol as something to be reconstituted as a dict at the remote end. Looking at the spec it appears that this is what BERT does as well.
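The flattening described above can be sketched in Python, with tuples standing in for Erlang tuples. As I read the BERT spec, a dict travels as a tagged tuple along the lines of {bert, dict, [{K, V}, ...]} rather than as Erlang's internal dict record; the exact tag layout here is my paraphrase, so treat it as a sketch.

```python
# Sketch: flatten a dict to a tagged list of key/value pairs for the
# wire, and reconstitute it on the remote end. Python tuples model
# Erlang tuples; the ("bert", "dict", ...) tagging follows my reading
# of the BERT spec and may not match it byte-for-byte.

def dict_to_term(d):
    """Flatten a dict into a tagged proplist-style term."""
    return ("bert", "dict", [(k, v) for k, v in d.items()])

def term_to_dict(term):
    """Reconstitute a dict from the tagged term."""
    tag, kind, pairs = term
    if (tag, kind) != ("bert", "dict"):
        raise ValueError("not a tagged dict term")
    return dict(pairs)

term = dict_to_term({"abc": "def", "efg": "hij"})
print(term_to_dict(term))  # {'abc': 'def', 'efg': 'hij'}
```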


Because the result in Erlang will not be a dict; it will be a list of key/value pairs. That's not the same.

Doing that automatically means you have to do something else to label it as a key/value list meant to be turned back into a dict.

These are not insurmountable problems, just issues that have to be dealt with. Any problem in Erlang can be solved with another layer of record indirection, but those don't come for free.

I must not have looked closely enough to find the source; I did try.



I thoroughly enjoyed the justification given for inventing something new despite the existence of XML / JSON / Thrift etc.


Why not REST? What's wrong with HTTP?


Less overhead. Sometimes it just makes sense to use a binary format on top of a binary channel.


Because not everything in the world maps well onto request-response, and trying to put stateful bi-directional communication on top of HTTP is an ugly hack.


While that's true, it seems that a function call maps precisely onto request-response: call a function, get back a return value. Statefulness isn't a problem for REST or HTTP either, because it's the communication protocol that's stateless, not the resources. For example, websites are REST services, the pages on the website are the resources, and they certainly exist: that's their state. What's stateless is that it doesn't matter what order you retrieve the pages; you'll always get the same page for a given URL. The only exception is when the request is intended to change the state of some resource, such as this thread changing state when I submit the comment I'm writing.

For a RESTful function call service, each function would be a resource, and the functions would be required not to have side-effects (e.g., they don't change the state of the service). That's often good design for any API, not just RPC-type services. Again, the exception would be function calls whose purpose is to modify the service state in some way.
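The function-as-resource mapping above can be sketched as building a GET URL per call: a side-effect-free call becomes a plain, cacheable, repeatable request. The host, path, and function names here are invented for illustration.

```python
# Hypothetical sketch: map a side-effect-free function call onto a GET
# request URL, treating the function as the resource and the arguments
# as query parameters.
from urllib.parse import urlencode

def call_url(base, func, **args):
    """Build the GET URL for a function call (sorted for a stable URL)."""
    return "%s/%s?%s" % (base, func, urlencode(sorted(args.items())))

print(call_url("http://api.example.com", "add", x=1, y=2))
# http://api.example.com/add?x=1&y=2
```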

When the service state is intended to be different for each user of the service, HTTP has a solution for that too: a cookie which contains a token identifying the session. Any state changes can be associated with the token.

Now you mention bi-directional. If you mean that either side of the interaction can initiate an exchange, then you're right; HTTP is no good for that. Not many RPC services work that way though; it's more complex to design. It's much more common and simpler to have client/server interactions where the client initiates every exchange, and the server listens and responds. You can do a lot with that model.


> It's much more common and simpler to have client/server interactions

I can’t agree with such a blanket statement. It completely depends on your goals and application logic. All kinds of protocols/applications are designed to be bi-directional: multiplayer games (think MUDs, FICS, etc.), IRC/Jabber/other chat, VoIP, push email, message queues, multi-user document editors, telnet/SSH, remote device monitoring apps, etc. etc. etc.

Just because such apps aren't so often seen on the web, where their function would have to be hacked on top of HTTP, doesn't mean they aren't common, in general.

Running your multi-player game protocol over HTTP, using cookies to record the session, and polling for updates, just so you can say you’re being RESTful (and "simpler"?) is a terrible terrible idea, because if the semantics of your app stay the same, you basically have equally complex logic, now with several layers of indirection and overhead tossed in for no reason.

> For a RESTful function call service, each function would be a resource, and the functions would be required not to have side-effects

Okay, I want my function call to be "I just captured your queen with my rook, and you'd better find out about it because it's now your turn". How does that work exactly?


I didn't say that everything was a client/server type of interaction; I said that client/server interactions are more common than bi-directional client/client interactions. You've provided a list of client/client protocols, which is great, but I still think client/server is much more common.

For client/client, you're right: HTTP is not appropriate. If you can stomach the XML, Jabber (aka XMPP, right?) looked like a good general-purpose bi-directional protocol, and if I'm not mistaken that's what Google Wave is using. If you don't want to pay the XML parsing/serializing overhead, other protocols are available, and maybe that's why BERT-RPC was created. My original point was simply that, depending upon the requirements, the existing HTTP protocol might be sufficient for the kind of interaction needed, and it's supported by a rich infrastructure.



