Regular Twitch Neurons (RTN) - running wherever there's capacity at $0.01 / 1k neurons
Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons
Neurons are a way to measure AI output that always scales down to zero. To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.
Who came up with this? This is ridiculous. I understand the underlying issues but would still prefer a metric like seconds of utilization multiplied by the size of the worker.
Besides this, the "expected pricing" doesn't actually give expected prices, just the pricing model. I have the feeling this is not going to be competitive with platforms like Vast.ai.
Depending on how many tokens a typical response uses, pricing will vary wildly, but a rough estimate puts the fast tier as more expensive than ChatGPT 3.5 and the cheap tier as way cheaper.
Quality will likely be heaps worse than ChatGPT 3.5, given it's Llama 7B.
It's about $0.096 per 100 fast chat responses.
It's about $0.0076 per 100 slow chat responses.
ChatGPT 3.5 with 50 tokens input, 50 tokens output works out to about $0.02 per 100 responses.
If the LLM responses are 500 tokens in and 500 tokens out, then you get about $0.2 per 100 responses.
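The figures above can be rederived from the announced neuron rates. This is a rough sketch: the responses-per-neuron ratio comes from the announcement, while the GPT-3.5 token prices ($0.0015/1k input, $0.002/1k output) are an assumption based on OpenAI's published rates at the time.

```python
# Cost math for the neuron pricing above.
# Announcement: ~130 LLM responses per 1,000 neurons,
# RTN = $0.01 / 1k neurons, FTN = $0.125 / 1k neurons.
# GPT-3.5 token prices below are assumed, not from the announcement.
RESPONSES_PER_1K_NEURONS = 130

def neuron_cost_per_100_responses(rate_per_1k_neurons: float) -> float:
    neurons_per_response = 1000 / RESPONSES_PER_1K_NEURONS  # ~7.7 neurons each
    return 100 * neurons_per_response / 1000 * rate_per_1k_neurons

def gpt35_cost_per_100_responses(tokens_in: int, tokens_out: int) -> float:
    # Assumed gpt-3.5-turbo rates: $0.0015/1k input tokens, $0.002/1k output.
    return 100 * (tokens_in * 0.0015 + tokens_out * 0.002) / 1000

print(f"fast (FTN):      ${neuron_cost_per_100_responses(0.125):.4f}")   # ~ $0.096
print(f"slow (RTN):      ${neuron_cost_per_100_responses(0.01):.4f}")    # ~ $0.0077
print(f"GPT-3.5 50/50:   ${gpt35_cost_per_100_responses(50, 50):.4f}")   # ~ $0.0175
print(f"GPT-3.5 500/500: ${gpt35_cost_per_100_responses(500, 500):.3f}") # ~ $0.175
```

Note the fast/slow ratio is exactly the 12.5x ratio of the two neuron rates, since both tiers meter the same neuron count per response.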
I presume people will flock to the cheap version when they can't afford ChatGPT 3.5's price, even at the cost of quality.
On the other hand, if it reflects their costs, I'm very happy to have an option that is 100x cheaper, rather than a more strategic one that raises the lower price by 10x.
My entire App Store Server Notifications handling for iOS apps runs on Cloudflare Workers. I open-sourced the code a while ago (https://github.com/workerforce/store-sentry) but it hasn't gained much traction.
There isn't much about pricing, but this fragment suggests it will be economical mostly for light use cases.
>“Currently, customers are paying for a lot of idle compute in the form of virtual machines and GPUs that go unused,”
I'm definitely looking forward to having a lot more competition in the "pay as you go LLM AI" space. Especially services that use models you can download and run on your own hardware once a good use case has been developed.
This looks much like Replicate (https://replicate.com/). Has anyone tried it? How's the experience with cold starts?
We have models that are crucial but do not require dedicated hosting. We are looking for an AWS Lambda type of service, but for a fine-tuned llama2-13b. Any suggestions? Would try out Cloudflare AI too.
The problem with all existing pay-as-you-go vendors is that the overall price is exceptionally high if you use any decent amount of compute. That, and cold starts.
Funnily enough, it's often cheaper, and far better in quality and latency, to pay for a full server.
Cloudflare AI and Replicate are great for running off-the-shelf models, but anything custom is going to incur a 10+ minute cold start.
For running custom fine-tuned models on serverless, you could look into https://beam.cloud, which is optimized for serving custom models with extremely fast cold starts (I'm a little biased since I work there, but the numbers don't lie).
Is a few milliseconds of latency really a problem for current LLMs? They are already so slow that users are used to waiting tens of seconds for a response anyway. I feel like until the actual latency of LLMs improves to sub-second, this is not a product worth the price.
I haven't used ChatGPT or others, but Bard seems to answer within 1-2s in my experience. Your point remains, but are most LLMs really much slower than Bard?
GPT-4 is the slowest IMHO. I use Claude 2 for most of my non-coding needs; it's more creative and writes better. GPT-4 is better at tasks, technical work, and code.
Claude 2 is very fast too...
But they're also offering more than just LLMs: image models too. Generation sometimes takes 190 seconds or more on playgroundai.com, about 40 seconds on leonardo.ai, and about the same on tensor.art.
I'm trying to get an AI Etsy store off the ground, and faster gen times would be greatly appreciated.