Gaia will produce quite a lot of data, approximately 200 TB over 5 years according to this Wikipedia article (http://en.wikipedia.org/wiki/Gaia_(spacecraft)), though that figure feels a bit hand-wavy. Even if we assume an order-of-magnitude error and it's more like 2 PB, that is still far, far less than the 15 PB (images plus metadata) that the Large Synoptic Survey Telescope will produce over 10 years. And yes, the LSST is driving a large number of distributed computing research projects because of its unique processing requirements.
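For a very rough sense of scale, here's a back-of-the-envelope comparison (in Python) of the average data rates implied by those volumes; note the 2 PB Gaia case is just my own order-of-magnitude fudge, not a published figure:

    # Back-of-the-envelope average data rates from the volumes quoted above.
    # The 2 PB Gaia case is my own fudge factor, not a published number.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    gaia_quoted = 200e12 / (5 * SECONDS_PER_YEAR)   # 200 TB over 5 years
    gaia_fudged = 2e15 / (5 * SECONDS_PER_YEAR)     # 2 PB over 5 years
    lsst        = 15e15 / (10 * SECONDS_PER_YEAR)   # 15 PB over 10 years

    print(f"Gaia (quoted): {gaia_quoted / 1e6:5.1f} MB/s")
    print(f"Gaia (fudged): {gaia_fudged / 1e6:5.1f} MB/s")
    print(f"LSST:          {lsst / 1e6:5.1f} MB/s")

Even with the generous fudge, Gaia averages out to roughly 13 MB/s against LSST's roughly 48 MB/s; at the quoted 200 TB it's more like a factor of 35-40 below.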
I’d be interested to know the ratio of the L0 data size to the L2/3/4 data[0] size – in other words, how many downlinked bits of sensor readout are processed per bit of useful science data? Of course that won’t be a hard number; I’m just curious to get a sense of the general scale on which the pipeline reduces pixels to physical parameters.
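If anyone has even rough per-level volumes, the calculation I have in mind is just the overall reduction ratio of the pipeline; for example, with purely made-up placeholder numbers:

    # Purely hypothetical placeholder volumes -- not real Gaia figures.
    l0_bytes = 100e12   # total downlinked raw sensor readout (placeholder)
    l2_bytes = 1e12     # final derived science products (placeholder)

    print(f"~{l0_bytes / l2_bytes:.0f}:1 reduction from raw pixels to parameters")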