Hacker News

Are events and webhooks mutually exclusive? How about a combination of both: events for consuming at leisure, webhooks for notification of new events. This allows instant notification of new events but allows for the benefits outlined in the article.
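A minimal sketch of that combination, with invented names: the webhook carries no data and merely prompts the consumer to re-sync from a cursor against a hypothetical `/events` log.

```python
# Sketch of "webhooks for notification, /events as source of truth".
# All names here are hypothetical; the webhook ping carries no payload,
# it only tells the consumer to poll the event log from its cursor.

event_log = []          # provider side: append-only log, cursor = list index
consumer_cursor = 0     # consumer side: last position already processed
processed = []

def publish(event):
    """Provider appends to the log, then fires a data-free webhook ping."""
    event_log.append(event)
    on_webhook_ping()   # in reality: an HTTP POST to the consumer's URL

def on_webhook_ping():
    """Consumer ignores the ping itself and re-syncs from its cursor."""
    global consumer_cursor
    new_events = event_log[consumer_cursor:]   # GET /events?cursor=...
    processed.extend(new_events)
    consumer_cursor = len(event_log)

publish({"type": "order.created", "id": 1})
publish({"type": "order.paid", "id": 1})
```

Because the consumer always re-reads from its own cursor, a lost or duplicated ping costs nothing: the next ping (or a periodic poll) picks up the same events exactly once.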


What about supporting fast lookup of the event endpoint, so it can be queried more frequently?

I think that a combo of webhooks / events is nice, but "what scope do we cut?" is an important question. Unfortunately, it feels like the events part is cut, when I'd argue that events is significantly more important.

Webhooks are flashier from a PM perspective because they are perceived as more real-time, but polling is just as good in practice.

Polling is also completely in your control, you will get an event within X seconds of it going live. That isn't true for webhooks, where a vendor may have delays on their outbound pipeline.


The article advocates for long-polling.


Yea, you're right. I am reading the advocacy as "if you need real-time, then support long-polling."

I see the value in this, but I actually disagree with the article in terms of that being the best solution. Long-polling is significantly different than polling with a cursor offset and returning data, so you wouldn't shoe-horn that into an existing endpoint.
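To illustrate the difference being claimed here (a hypothetical sketch, not any particular API): a plain cursor poll returns immediately, possibly empty, while a long-poll handler holds the request open until data arrives or a timeout expires.

```python
import queue
import threading

# Hypothetical contrast between plain cursor polling and long-polling.
# A plain poll answers at once; a long-poll blocks until an event shows
# up or a deadline passes, which is a different server-side shape.

events = queue.Queue()

def plain_poll():
    """Like GET /events?cursor=... — returns at once, empty if nothing new."""
    got = []
    while True:
        try:
            got.append(events.get_nowait())
        except queue.Empty:
            return got

def long_poll(timeout=2.0):
    """Like GET /events?cursor=...&wait=2 — holds the request open."""
    try:
        return [events.get(timeout=timeout)]
    except queue.Empty:
        return []

assert plain_poll() == []                       # nothing yet: instant empty reply
threading.Timer(0.1, events.put, args=("evt-1",)).start()
result = long_poll()                            # blocks ~0.1s, then returns
```

The blocking `get(timeout=...)` is what has to be shoe-horned into the server: each waiting client pins a request open, which a fire-and-return cursor endpoint never does.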


Couldn't keeping a request open indefinitely open the system up to the potential of DoS attacks though? Correct me if I'm wrong, but isn't it kind of expensive to keep HTTP requests open for an indeterminate amount of time, especially if the system in question is servicing many of these requests concurrently?


I think that's what the author was getting at, after reading through the whole article. The idea isn't to get rid of webhooks, but provide an endpoint that can be used when webhooks won't necessarily work.


Very similar to how I built my previous application.

1) /events for the source of truth (i.e. cursor-based logs)

2) websockets for "nice to have" real-time updates as a way to hint the clients to refetch what's new


Yeah... I'd go so far as to argue that this is the only architecture that should ever be considered, since having only one half of the solution is clearly wrong.


This is the way to go, and I'd love to see more APIs with a robust events endpoint for polling & reconciliation. Deletes are especially hard to reconcile with many APIs, since they aren't queryable and you have to individually check whether every ID still exists. Shopify, I'm looking at you.
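One way deletes become reconcilable is if the event log records them explicitly as tombstones. A hedged sketch with invented field names:

```python
# Hypothetical sketch: if the /events log records deletions explicitly,
# a polling consumer can reconcile them by replaying the log, instead of
# probing every known ID against the API one by one.

events = [
    {"seq": 1, "type": "created", "id": "prod_1"},
    {"seq": 2, "type": "created", "id": "prod_2"},
    {"seq": 3, "type": "deleted", "id": "prod_1"},  # tombstone entry
]

def reconcile(local, log, cursor):
    """Replay log entries after `cursor` into the local mirror (a set of IDs)."""
    for e in log:
        if e["seq"] <= cursor:
            continue                  # already seen on a previous poll
        if e["type"] == "deleted":
            local.discard(e["id"])    # tombstone: drop the local copy
        else:
            local.add(e["id"])
        cursor = e["seq"]
    return local, cursor

mirror, cursor = reconcile(set(), events, 0)
```

Without the `deleted` entry in the log, the consumer would have to enumerate every locally known ID and check each one for a 404.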


Yes to the combination of both. I worked on architecture and was responsible for large-scale systems at Google. Reliable giant-scale systems do both event subscription and polling, often at the same time, with idempotency guarantees.


Sorry if I'm daft, could you/someone explain why one would want to use both at the same time for the same system?

One thing that makes sense: if you go down, use polling to catch up at your own pace. But that isn't really at the same time. When/why does it make sense to do both simultaneously?


There is an inherent speed / reliability tradeoff that is extremely difficult to solve inside one message bus. When you get to truly large systems with a lot of nines of reliability, it starts to make sense to use two systems:

1. A fast system that delivers messages very quickly but is not always partition-tolerant or available

2. A slower, partition-tolerant system with high availability but also higher latency (i.e. a database)

The author goes through this in the very first section. Webhook events will eventually start getting lost often enough for the developer to think about a backup mechanism.

Long-polling works if you have a lot of memory on your database frontend. Most shared databases want none of your long-running requests occupying their memory, which is better used for caches.

Even if your message bus has the ability to store and re-deliver events, you might want to limit this ability (by assigning a low TTL). Consider that the consumer microservice enters and recovers from an outage. In the meantime, the producer's events will accumulate in the message service. At the same time, the consumer often doesn't need to consume each individual event but rather some "end state" of an entity or a document. If all the lost events were re-delivered, the consumers wouldn't be able to handle them and would enter an outage again. This is where deliberately decreasing the reliability of the message bus and relying on polling lets the service recover automatically.
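The recovery argument above can be made concrete with a toy calculation (all numbers and names invented): replaying a backlog costs one unit of work per buffered event, while polling the end state costs one unit per distinct entity, and both land on the same final state.

```python
# Hedged sketch of the post-outage tradeoff: replaying every buffered event
# is O(events), polling each entity's current state is O(entities), yet both
# converge to the same end state. Entity names and counts are invented.

backlog = [("doc_1", v) for v in range(1000)] + [("doc_2", v) for v in range(500)]

# Naive replay: the recovering consumer processes all 1500 buffered events.
replayed = 0
state_replay = {}
for entity, version in backlog:
    state_replay[entity] = version    # only the last version per entity matters
    replayed += 1

# End-state polling: one fetch per distinct entity the consumer knows about.
latest = {}
for entity, version in backlog:
    latest[entity] = version          # what the provider's store already holds

polled = 0
state_poll = {}
for entity, version in latest.items():
    state_poll[entity] = version      # like GET /entities/{id}
    polled += 1
```

Here the replay path does 1500 units of work and the polling path does 2, for an identical result, which is why letting stale events expire and reconciling by poll can be the safer recovery mode.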

There are other reasons, of course. The author is absolutely correct in their statement, though: whenever a system is implemented using hooks / messages, its developers always end up supplementing it with polling.


What's the point of implementing webhooks once you implemented long polling for the /events endpoint?


I'd argue against long/persistent polling. Webhooks allow for zero resource usage until a message needs to be delivered.


> Webhooks allow for zero resource usage until a message needs to be delivered.

Doesn't that only work in the case where the server treats each webhook delivery as ephemeral? If you're keeping a queue to allow reliable / repeatable delivery, that's definitely not "zero resource usage", right?


On the sender side, sure. On the receiver side? You have to have a service listening 24/7.


I don't think the original comment meant long polling (i.e. keeping the connection alive), they meant periodically call the endpoint to check for events.


The article advocates for long polling of endpoints.



