Regular Twitch Neurons (RTN) - running wherever there's capacity at $0.01 / 1k neurons
Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons
Neurons are a way to measure AI output that always scales down to zero. To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.
Who came up with this? This is ridiculous. I understand the underlying issues but would still prefer a metric like seconds of utilization multiplied by the size of the worker.
Besides this, the "expected pricing" doesn't actually give expected prices, just the pricing model. I have the feeling this is not going to be competitive with platforms like Vast.ai.
Depending on how many tokens a typical response uses, pricing will vary wildly, but a rough estimate puts the fast tier as more expensive than ChatGPT 3.5 and the cheap tier as way cheaper.
Quality will likely be heaps worse than ChatGPT 3.5, given it's Llama 7B.
It's about $0.096 per 100 fast chat responses.
It's about $0.0076 per 100 slow chat responses.
ChatGPT 3.5 with 50 tokens input, 50 tokens output works out to about $0.02 per 100 responses.
If the LLM responses are 500 tokens in and 500 tokens out, then you get about $0.2 per 100 responses.
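The figures above can be rederived from the announced neuron rates. This is a rough sketch: the responses-per-neuron ratio comes from the announcement, while the GPT-3.5 token prices ($0.0015/1k input, $0.002/1k output) are an assumption based on OpenAI's published rates at the time.

```python
# Cost math for the neuron pricing above.
# Announcement: ~130 LLM responses per 1,000 neurons,
# RTN = $0.01 / 1k neurons, FTN = $0.125 / 1k neurons.
# GPT-3.5 token prices below are assumed, not from the announcement.
RESPONSES_PER_1K_NEURONS = 130

def neuron_cost_per_100_responses(rate_per_1k_neurons: float) -> float:
    neurons_per_response = 1000 / RESPONSES_PER_1K_NEURONS  # ~7.7 neurons each
    return 100 * neurons_per_response / 1000 * rate_per_1k_neurons

def gpt35_cost_per_100_responses(tokens_in: int, tokens_out: int) -> float:
    # Assumed gpt-3.5-turbo rates: $0.0015/1k input tokens, $0.002/1k output.
    return 100 * (tokens_in * 0.0015 + tokens_out * 0.002) / 1000

print(f"fast (FTN):      ${neuron_cost_per_100_responses(0.125):.4f}")   # ~ $0.096
print(f"slow (RTN):      ${neuron_cost_per_100_responses(0.01):.4f}")    # ~ $0.0077
print(f"GPT-3.5 50/50:   ${gpt35_cost_per_100_responses(50, 50):.4f}")   # ~ $0.0175
print(f"GPT-3.5 500/500: ${gpt35_cost_per_100_responses(500, 500):.3f}") # ~ $0.175
```

Note the fast/slow ratio is exactly the 12.5x ratio of the two neuron rates, since both tiers meter the same neuron count per response.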
I presume people will flock to the cheap version when they can't afford ChatGPT 3.5's price, even at the cost of quality.
On the other hand, if it reflects their costs, I'm very happy to have an option that is 100x cheaper, rather than a more strategic one that raises the lower price by 10x.
My entire App Store Server Notifications handling for iOS apps runs on Cloudflare Workers. I open-sourced the code a while ago (https://github.com/workerforce/store-sentry) but it hasn't gained much traction.
There isn't much about pricing, but this fragment suggests it will be economical mostly for light use cases.
>“Currently, customers are paying for a lot of idle compute in the form of virtual machines and GPUs that go unused,”
I'm definitely looking forward to having a lot more competition in the "pay as you go LLM AI" space. Especially services that use models you can download and run on your own hardware once a good use case has been developed.
This looks much like Replicate (https://replicate.com/). Has anyone tried it? How's the experience with cold starts?
We have models that are crucial but do not require dedicated hosting. We are looking for an AWS Lambda type of service, but for a fine-tuned llama2-13b. Any suggestions? Would try out Cloudflare AI too.
The problem with all existing pay-as-you-go vendors is that the overall price is exceptionally high if you use any decent amount of compute. That, and cold starts.
Funnily enough, it's often cheaper, and far better in quality and latency, to pay for a full server.
Cloudflare AI and Replicate are great for running off-the-shelf models, but anything custom is going to incur a 10+ minute cold start.
For running custom fine-tuned models on serverless, you could look into https://beam.cloud, which is optimized for serving custom models with extremely fast cold starts (I'm a little biased since I work there, but the numbers don't lie).
Is a few milliseconds of latency really a problem for current LLMs? They are already so slow that users are used to waiting tens of seconds for a response anyway. I feel like until the actual latency of LLMs improves to sub-second, this is not a product worth the price.
I haven't used ChatGPT or others, but Bard seems to answer within 1-2s in my experience. Your point remains, but are most LLMs really much slower than Bard?
GPT-4 is the slowest IMHO. I use Claude 2 for most of my non-coding needs; it's more creative and writes better. GPT-4 is better at tasks, technical work, and code.
Claude 2 is very fast too...
But they're also offering more than just LLMs: image models too. Generation sometimes takes 190 seconds or more on playgroundai.com, about 40 seconds on leonardo.ai, and about the same on tensor.art.
I'm trying to get an AI Etsy store off the ground, and faster gen times would be greatly appreciated.