gardnr's comments

This is a 30B parameter MoE with 3B active parameters and is the successor to their previous 7B omni model. [1]

You can expect this model to have similar performance to the non-omni version. [2]

There aren't many open-weights omni models so I consider this a big deal. I would use this model to replace the keyboard and monitor in an application while doing the heavy lifting with other tech behind the scenes. There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens while working through to a final answer.

1. https://huggingface.co/Qwen/Qwen2.5-Omni-7B

2. https://artificialanalysis.ai/models/qwen3-30b-a3b-instruct


This is a stack of models:

- 650M Audio Encoder

- 540M Vision Encoder

- 30B-A3B LLM

- 3B-A0.3B Audio LLM

- 80M Transformer / 200M ConvNet audio-token-to-waveform decoder
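
Back-of-the-envelope from those numbers (assuming the encoders and the token-to-waveform decoder are dense and always active, which is my guess rather than something stated in the release):

    # Rough parameter tally from the breakdown above, values in billions.
    # Assumption: the encoders and the audio decoder run on every step.
    total_b  = 0.65 + 0.54 + 30 + 3 + 0.08 + 0.20   # ~34.5B parameters on disk
    active_b = 0.65 + 0.54 +  3 + 0.3 + 0.28        # ~4.8B parameters active per step
    print(round(total_b, 2), round(active_b, 2))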

This is a closed-weight update to their Qwen3-Omni model. They had a previous open-weight release, Qwen/Qwen3-Omni-30B-A3B-Instruct, and a closed version, Qwen3-Omni-Flash.

You basically can't use this model right now since none of the open source inference frameworks have the model fully implemented. It works in transformers, but it's extremely slow.
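
For anyone curious, here is roughly what "works in transformers" looks like. This is only a sketch modeled on the model card of the earlier open-weight Qwen/Qwen3-Omni-30B-A3B-Instruct release; the class names, the speaker argument, and the (text, audio) return from generate() are taken from that card, and I'm assuming any newer checkpoint would expose the same interface if it lands on the Hub:

    # Sketch only: mirrors the usage shown for Qwen/Qwen3-Omni-30B-A3B-Instruct.
    # Exact class names / return types may differ for a newer checkpoint.
    import soundfile as sf
    from transformers import Qwen3OmniMoeForConditionalGeneration, Qwen3OmniMoeProcessor

    model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # swap in the new repo if/when it appears
    processor = Qwen3OmniMoeProcessor.from_pretrained(model_id)
    model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    conversation = [
        {"role": "user", "content": [{"type": "text", "text": "Introduce yourself in one sentence."}]}
    ]
    text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
    inputs = processor(text=text, return_tensors="pt", padding=True).to(model.device)

    # generate() returns text tokens plus (optionally) a waveform from the talker head;
    # the prompt tokens are still included in text_ids here.
    text_ids, audio = model.generate(**inputs, speaker="Ethan", max_new_tokens=256)
    print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
    if audio is not None:
        sf.write("reply.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)

That's the slow path: until the audio/vision stack is implemented in the faster serving frameworks, generation crawls along in plain transformers.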



No... that website is not helpful. If you take it at face value, it is claiming that the previous Qwen3-Omni-Flash wasn't open either, but that seems wrong? It is very common for these blog posts to get published before the model weights are uploaded.

The previous -Flash weights are closed. They do have weights for the original model, which is slightly behind in performance: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct

Based on things I had read over the past several months, Qwen3-Flash seemed to just be a weird marketing term for the Qwen3-Omni-30B-A3B series, not a different model. If they are not the same, then that is interesting/confusing.

It is an in-house closed weight model for their own chat platform, mentioned in Section 5 of the original paper: https://arxiv.org/pdf/2509.17765

I've seen it in their online materials too but can't seem to find it now.


I can't find the weights for this new version anywhere. I checked modelscope and huggingface. It looks like they may have extended the context window to 200K+ tokens but I can't find the actual weights.

They link to https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86... from the blog post, but this seems to redirect to their main space on HF, so maybe they haven't made the model public yet?

> There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens while working through to a final answer.

Last I checked (months ago), Claude used to do this.


Haha, you could hear how its mind thinks, maybe by putting a lot of reverb on the thinking tokens or some other effect…

I don't think the Flash model discussed in the article is 30B.

Their benchmark table shows it beating Qwen3-235B-A22B

Does "Flash" in the name of a Qwen model indicate a model-as-a-service and not open weights?


Flash is a closed-weight version of https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct (it is 30B, but with additional training on top of the open-weight release). They deploy the Flash version on Qwen's own chat.

Thanks

Was it being closed-weight obvious to you from the article? I'm trying to understand why I was confused. I hadn't seen the "Flash" designation before.

Also 30B models can beat a semi-recent 235B with just some additional training?


They had a Flash variant released alongside the original open weight release. It is also mentioned in Section 5 of the paper: https://arxiv.org/pdf/2509.17765

For the evals, it's probably just trained on a lot of benchmark-adjacent datasets compared to the 235B model. A similar thing happened with another model today: https://x.com/NousResearch/status/1998536543565127968 (a 30B model trained specifically to do well in maths gets near-SOTA scores).


> This is a 30B parameter MoE with 3B active parameters

Where are you finding that info? Not saying you're wrong; just saying that I didn't see that specified anywhere in the linked page, or on their HF.


The link[1] at the top of their article to HuggingFace goes to some models named Qwen3-Omni-30B-A3B that were last updated in September. None of them have "Flash" in the name.

The benchmark table shows this Flash model beating their Qwen3-235B-A22B. I don't see how that is possible if it is a 30B-A3B model.

I don't see a mention of a parameter count anywhere in the article. Do you? This may not be an open weights model.

This article feels a bit deceptive

1: https://huggingface.co/collections/Qwen/qwen3-omni


I love lightweight distros. QNX had a "free as in beer" distro that fit on a floppy, with Xwindows and modem drivers. After years of wrangling with Slackware CDs, it was pretty wild to boot into a fully functional system from a floppy.

> QNX had a "free as in beer" distro that fit on a floppy, with Xwindows and modem drivers.

I don’t think that had the X Window System. https://web.archive.org/web/19991128112050/http://www.qnx.co... and https://marc.info/?l=freebsd-chat&m=103030933111004 confirm that. It ran the Photon microGUI Windowing System (https://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx....)



I never understood how that QNX desktop didn't catch on instantly; it was amazing!

Licensing, and QNX missed a consumer launch window by around 17 years.

Some businesses stick with markets they know, as non-retail customer revenue is less volatile. If you enter the consumer markets, there are always 30k irrational competitors (likely with 1000X the capital) that will go bankrupt trying to undercut the market.

It is a decision all CEOs must make eventually. Best of luck =3

"The Rules for Rulers: How All Leaders Stay in Power"

https://www.youtube.com/watch?v=rStL7niR7gs


This also underscores my explanation for the “worse is better” phenomenon: worse is free.

Stuff that is better designed and implemented usually costs money and comes with more restrictive licenses. It’s written by serious professionals later in their careers working full time on the project, and these are people who need to earn a living. Their employers also have to win them in a competitive market for talent. So the result is not and cannot be free (as in beer).

But free stuff spreads faster. It’s low friction. People adopt it because of license concerns, cost, avoiding lock-in, etc., and so it wins long term.

Yes I’m kinda dissing the whole free Unix thing here. Unix is actually a minimal lowest common denominator OS with a lot of serious warts that we barely even see anymore because it’s so ubiquitous. We’ve stopped even imagining anything else. There were whole directions in systems research that were abandoned, though aspects live on usually in languages and runtimes like Java, Go, WASM, and the CLR.

Also note that the inverse is not true. I’m not saying that paid is always better. What I’m saying is that worse is free, better was usually paid, but some crap was also paid. But very little better stuff was free.


There is also the option, for well-written professional software, where the strategy is to grab as much market share as they can by allowing the proliferation of their product to lock up market/mindshare, and relegate the $ enforcement to later - successfully used by MS Windows for the longest time, and by Photoshop.

Conversely, I remember Maya (or Autodesk) used to have a bounty program for whoever would turn in people using unlicensed/cracked versions of their product. Meanwhile, Blender (coming from a commercial past) kept its free nature and has consistently grown in popularity and quality without any such overtures.

Of course, nowadays with SaaS everything gets segmented into weird verticals and revenue upsells are across the board, with the first hit usually also being free.


As a business, dealing with Microsoft and Oracle is not a clean transactional sale.

They turned into legal-services firms along the way, and stopped real software development/risk at some point in 2004.

These firms have been selling the same product for decades. Yet once they get their hooks into a business, few survive the incurred variable costs of the 3000lb mosquito. =3


The only reason FOSS sometimes works is that the replication cost is almost $0.

In *nix, most users had a rational self-interest to improve the platform. "All software is terrible, but some of it is useful." =3


Speaking as someone who used QNX back then: they didn't target end users, but embedded and real-time users.

They were expensive too. You had to pay for each device driver you used.


Because it's not free, and their aim was at developers and the embedded space. How many people have even heard of QNX?

That famous QNX boot disk was the first thing I thought of when reading the title as well.

Me too! And the GUI was only a 40KB distribution and was waaaaaay better than Windows 3.0!

And incredibly responsive compared to the operating systems of even today. Imagine that: 30 years of progress to end up behind where we were. Human input should always run at the highest priority in the system, not the lowest.

That ended with Win9x. It was the last OS where the mouse and keyboard inputs were processed as hardware interrupts.

Yeah, but what can you do with free QNX? With Tiny Core, you can install many packages. What packages exist for QNX?

> Yeah, but what can you do with free QNX?

« QNX DEMO disk

Extending possibilities and adding undocumented features »

http://qnx.puslapiai.lt/qnxdemo/qnx_demo_disk.htm


Location: NZ / USA

Remote: Yes

Willing to relocate: Yes

Technologies: Mainly web SaaS (TypeScript/Python) with a focus on AI, embeddings, LLMs and Voice Agents.

Résumé/CV: https://docs.google.com/document/d/1mcK18EVVkjQ-KadXu0VW6Ovx...

Email: gardner@bickford.ai


Full length: https://www.youtube.com/watch?v=WXuK6gekU1Y

They do a great job capturing the "Move 37" moment: https://youtu.be/WXuK6gekU1Y?t=2993


Telegram, which ostensibly claimed to provide e2e but really only did in very specific circumstances? My right wing uncle is still bitter about that. Then there's the rolling over the founder did after getting pulled up by Interpol.

I'm not sure why anyone would trust Telegram.


Exactly what "rolling over" would that be?

Maybe you don't believe Durov's statement[0] about it. But is there any actual evidence anywhere that they've ever violated the secrecy of non-e2e private groups or messages for anyone? I've yet to find any.

[0] https://t.me/durov/342


Oh, I agree. It's just that they mentioned things like Twitter DMs and Messenger and forgot Telegram...

That wasn’t the original question though. Twitter and Messenger are also untrustworthy. Telegram’s message export is very good compared to all the other options.

He says the current models generalize dramatically worse than people: https://youtu.be/aR20FWCCjAs?t=1501

Then, he starts to talk about the other ideas but his lawyers / investors prevent him from going into detail: https://youtu.be/aR20FWCCjAs?t=1939

The worrisome thing is that he openly talks about whether to release AGI to the public. So, there could be a world in which some superpower has access to wildly different tech than the public.

To take Hinton's analogy of AGI to extraterrestrial intelligence, this would be akin to a government having made contact but withholding the discovery and the technology from the public: https://youtu.be/e1Hf-o1SzL4?t=30

It's a wild time to be alive.


It’s also weird to think that if there is extraterrestrial contact, it will most definitely happen in the specific land mass known as the United States and only the US government will be collecting said technology and hiding it. Out of the entire planet, contact is possible only in the USA.


I'm not sure if you're jabbing at the concept of American supremacy, or Hinton's idea, or my position. I don't live in the USA right now, but I am happy to participate in conversation. That's why I am here.

Can you unpack your ideas a bit more?


I thought about the huge pile of hard drives in Utah this morning. The TLAs in the USA have a metric shit ton of data that _should_ not be used but _could_ be used.

Even still, we need evolutions in model architecture to get to the next level. Data is not enough.


They are going to be used.


A lot of what's on that pile of hard drives is ciphertext waiting for cryptographically relevant quantum computing to arrive.

LLMs can't do jack shit with ciphertext (sans key).


The gains were on benchmarks. Ilya describes why this is a red herring here: https://youtu.be/aR20FWCCjAs?t=286


Gemini 3 is a huge jump. I can't imagine how anyone who uses the models all the time wouldn't feel this.


What does it do that Opus doesn't do?

I like Ilya's points, but it's also clearly progress, and we can't just write it off because we like another narrative.


This is an AI generated article based on real interviews.

Watch the original Sutskever interview: https://www.youtube.com/watch?v=aR20FWCCjAs

And LeCun: https://www.youtube.com/watch?v=4__gg83s_Do


Not to say you are wrong, but how do you know it's AI-generated rather than written by Sorca Marian? To me, phrases like "Models look god-tier on paper" look more human than AI, as a) "god-tier" never came up in the interview and b) it's brief and doesn't waffle on.


If it was AI-generated, I had no difficulty with it; it was certainly on par with typical surface-level journalistic summaries, and vastly better than losing 2 hours of my life to watching some video interviews... :) AI as we know it may not be real intelligence, but it certainly has valid uses.


I get a lot from seeing the person talk vs reading a summary. I have gone back and watched a lot of interviews and talks with Ilya. In hindsight, it is easy to hear the future ideas in his words at the time.

That said, I use AI summaries for a lot of stuff that I don't really care about. For me, this topic is important enough to spend two hours of my life on, soaking up every detail.

As for being on par with typical surface-level journalism: I think we might be further into the dead internet than most people realize: https://en.wikipedia.org/wiki/Dead_Internet_theory


> I get a lot from seeing the person talk vs reading a summary

And some people simply have the opposite preference. There's lots of situations where sound is simply not an option. Some people are hearing impaired. Some people are visually impaired, and will definitely not get much from watching the person speak. Some ESL people have a hard time with spoken English. Even some native English speakers have a hard time with certain accents. Some people only have 5 minutes to spare instead of 50.

All of those problems are solved by the written word, which is why it hasn't gone away yet, even though we have amazing video capabilities.

You can have a preference without randomly labeling everything you don't like as AI slop.


How does it compare and contrast to DNSCrypt?

https://github.com/DNSCrypt

