Eager's comments | Hacker News

Open weights is one thing, but we don't even have that with OpenAI, at least.

Even then, open weights is like me checking in a .exe and acting surprised when people look at me funny.

I'm definitely in the camp that wants all the artefacts provided, along with a fully reproducible build and test environment, for anyone who wants to retrace the steps.

Whatever 'open' means, I don't think it is eight shell companies, no weights provided, and closely guarded secrets around how RLHF, alignment, and safety testing are carried out.

In fact, you would think that being 'open' about at least alignment and safety testing procedures would be the least one could expect.

I do understand that revealing these things may disclose zero-day exploits to bad actors, but on the other hand, being open to inspection is how things get fixed, and I've never been a fan of security through obscurity.


I can't tell if you are joking.

It seems like you are implying that the reason we do things is to make money, or at least that it is the main motivating factor for you, or that you believe it is for most people.

I started working casually in a factory when I was about 13, and by 14 I was putting in enough hours for my technical apprenticeship.

During those years I did get paid, and I was thankful for it, but I definitely wasn't doing it for the money, and I was still going to school and then to college in parallel.

I'm sure things are different these days. It just makes me sad to think personal progress is somehow conflated with earning money.

With regard to the article: I think, given what we know about social networks and young minds, it's already been shown to be a toxic mix. Getting kids on the capitalistic treadmill so young seems extremely cynical, and it really does concern me.


> It seems like you are implying that the reason we do things is to make money, or at least that it is the main motivating factor for you, or that you believe it is for most people.

Being able to make money, especially as a kid, would've been the ultimate compliment or appreciation to me. Anyone can say kind words like "good job", but if I make money, then what I do is demonstrably valuable to other people. It's obviously especially amazing if it is something I am passionate about and skilled at. I neither believe nor disbelieve this about most people; I'm just talking about how I would've felt as a kid, and certainly how many of my friends would have felt. It would've done a lot for my self-confidence.

> With regard to the article: I think, given what we know about social networks and young minds, it's already been shown to be a toxic mix. Getting kids on the capitalistic treadmill so young seems extremely cynical, and it really does concern me.

I think there are actually much worse things to worry about. If you want to worry about child exploitation, then worry about kids who are forced into factory conditions making shoes, not about kids who are able to use their creativity to provide value because they love doing it.


A few months ago I made a neat little Linux utility.

It was a drop-in replacement shim for an arbitrary executable that would pretend to be the original when invoked, fork off the original, and hook up to its stdout and stderr.

The error output was then fed to a custom GPT assistant that knew which program the errors came from. That assistant was tasked with turning the original errors into a friendly, human-readable form. The assistant's output was then sent out on the shim's stderr.

It worked very well, but then I got really sick and wasn't able to work on it anymore.

I was using it for GCC/Clang errors because I had become tired of staring at heavily nested compiler dumps for concept/template issues, but you could of course use it for anything.

It would be a nice project for someone to build again, do properly, and generalize, since it doesn't look like I am going to be bouncing around again for a while.
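
For anyone who wants to take a stab at it, here is a minimal sketch of the idea in Python (the original wasn't Python; the renamed-binary path, model name, and prompt are illustrative assumptions, not what my tool used):

  # Hypothetical shim: installed in place of g++, with the real
  # binary renamed to g++.real somewhere on disk.
  import subprocess
  import sys

  from openai import OpenAI  # pip install openai

  REAL_TOOL = "/usr/bin/g++.real"  # assumption: the renamed original

  client = OpenAI()

  def main() -> None:
      # Invoke the real tool with our arguments; stdout passes
      # through untouched, stderr is captured for rewriting.
      proc = subprocess.run([REAL_TOOL, *sys.argv[1:]],
                            stderr=subprocess.PIPE, text=True)
      if proc.stderr:
          reply = client.chat.completions.create(
              model="gpt-4o-mini",  # illustrative model choice
              messages=[
                  {"role": "system",
                   "content": "Rewrite these compiler errors in friendly, "
                              "plain English. Keep file:line references."},
                  {"role": "user", "content": proc.stderr},
              ],
          )
          sys.stderr.write(reply.choices[0].message.content + "\n")
      sys.exit(proc.returncode)

  if __name__ == "__main__":
      main()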


This is rather excellent!

It would be great to have a VST or (preferably) CLAP plugin that hosts Strudel and can be hosted in-DAW.

It might also be nice to have a VSCode extension that lets you experiment in live mode, like the folks at cmajor.dev have.

Finally, I could see this working really well with, say, cmajor in general, or for embedded synthesizer development on something like the Daisy Seed if there were a native version.

In any case, it's all great as it is. I will try integrating it with Bitwig this afternoon.


For live collaboration you can use the flok editor: https://flok.cc


I don't know the pharmacology, but it is well known that saffron makes you happy and a bit giggly.


The grocery store relies on the mystique of the thing. The people making those expensive jars probably also seed articles online about it being extremely expensive. Then, when the average person goes to the store, they look at it and aren't shocked by the ridiculous price, because somewhere at the back of their mind they will have read that it is worth more than gold. lol. Meanwhile, anyone in the know is buying it by the wheel for quite reasonable amounts, and likely at better purity.


Grinding with sugar is definitely the way to go. I generally buy a big wheel of Iranian saffron on eBay or from a Persian shop, then grind it all up in a mortar and pestle with some sugar and store it in a spice jar in a dark cupboard. Then when you need it, you don't have to be faffing about. If you want to be fancy, save a few whole strands to add to rice for decoration, but using whole stigmas is really not necessary.


> I generally buy a big wheel of Iranian saffron on eBay or from a Persian shop

Shouldn't this be a dead giveaway that the stuff is fake? We don't trade with Iran.



I think that's just the US, though, right?

The UK imported 11 tons of saffron from Iran in 2023 - https://wits.worldbank.org/trade/comtrade/en/country/GBR/yea...


Did you know that not everyone on HN is American? This is such a silly case of main-character syndrome.


Yes we do.


I have been playing with it for the last couple of weeks.

I do a lot of traditional music production for fun and was wondering how I could use Suno together with Leonardo for video and then bring it all together in my existing tools.

Here are some examples. I wrote the lyrics by hand and the music has been reinforced with my existing studio equipment.

For me, that is where the gold is. Not replacing myself, but extending what I can do.

https://youtu.be/Qip6eUbD8zs

https://youtu.be/mfFV3Cm_Kow

https://youtu.be/DZSpi6ySe-g


I've been listening to some AI-made Frank Sinatra songs; search YouTube for "Frank Sinatra - Where Is My Mind". It's originally a Pixies song, but the recreation with Frank Sinatra's voice is surprisingly good. I wonder what it took to produce that; it seems like there's a lot more artistry involved than just prompting "make a song by artist X as a cover of song Y by artist Z".


Oh, that's a nice one. I think you are on to something: if you want to go beyond push-button output, you really need to think holistically about what you want to achieve.

My experiments taught me that it is an instrument that is easy to approach but, right now at least, hard to master.

It's kind of weird because you have to play it through writing text, which is super strange.

Everything I have done previously has been very in-the-moment, with keyboards or guitars or whatever. With this I had to put a lot more thought in ahead of time, and try to put myself in the position of the generator, imagining how it might take my prompts and convert them.

There is certainly quite a lot of thinking that has to go on. Working out how to describe the sound you are looking for is a little unusual.

I am finding it quite fascinating anyway, and I really believe it can be tasteful if done right and as the tools improve.

As far as covering specific artists, it isn't something I am attempting, it's more about the feel for me. I imagine people who do that have their own tricks and techniques.

As far as I know, Suno blocks any mention of specific artists. Probably you can jailbreak it, but I don't know about that.


All the sounds always sound so blurred together, as if there's no space between the drum, bass, pad, melody, vocal, etc. layers.


I agree, especially for the first one. It was quite a challenge because the bitrate of the original content is still quite low, and in that particular case it was quite severely compressed. I had to dig quite deep to get the headroom back.

The last one was my first attempt.

I think the middle one is okay, all things considered. By the time I got to that one I had figured out how to get Suno to create multiple takes with much more open mixes, which left me a lot more latitude in the studio.

I expect I will get better at it, and I don't doubt the compression artifacts and the rest will improve.

To me at least, it is quite impressive where we are at. A ways to go, but very promising.


I agree, the middle one is pretty good. I also love the part of the video where the singer's head appears to be on fire...she's literally smoking! https://youtu.be/mfFV3Cm_Kow?si=R2gU1U4qfkzW04cX&t=48


I have tried feeding some of the foundation models obfuscated code from some of the competitions.

People might think that the answers would be in the training data already, but I didn't find that to be the case. At least in my small experiments.

The models did try to guess what the code does. They would say things like, "It seems to be trying to print some message to the console." I wasn't able to get full solutions.

It's definitely worth more research, not just as a curiosity: these kinds of problems are good proxies for other tasks, and they make excellent benchmarks for LLMs in particular.


This is why round-tripping the code is important.

If you decompile the binary to source, then compile the source back to binary, you should get the original binary.

You just repeat this until the loss drops to some acceptable amount.

It's a great task for reinforcement learning, which is known to be unreasonably effective for these types of problems.
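
To make that concrete, here is a toy sketch of the round-trip check in Python. The model's decompile step is left out; byte-level disagreement stands in for whatever reward a real RL setup would use, and the gcc invocation is just an assumption:

  import os
  import subprocess
  import tempfile

  def recompile(source: str) -> bytes:
      # Compile candidate source the same way the original was built
      # (compiler and flags assumed known here).
      with tempfile.TemporaryDirectory() as d:
          src, obj = os.path.join(d, "out.c"), os.path.join(d, "out.o")
          with open(src, "w") as f:
              f.write(source)
          subprocess.run(["gcc", "-c", "-O2", src, "-o", obj], check=True)
          with open(obj, "rb") as f:
              return f.read()

  def round_trip_loss(original: bytes, source: str) -> float:
      # Fraction of differing bytes; 0.0 means a perfect round trip.
      rebuilt = recompile(source)
      n = max(len(original), len(rebuilt))
      a, b = original.ljust(n, b"\0"), rebuilt.ljust(n, b"\0")
      return sum(x != y for x, y in zip(a, b)) / n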


> If you decompile the binary to source, then compile the source back to binary, you should get the original binary.

You really can't expect that if you're not using exactly the same version of exactly the same compiler with exactly the same flags, and often not even then.


Yes, that's a limitation of trying to ensure exact binary reconstruction. Luckily there is also a separate line of work on detecting the compiler version and optimization flags based on a binary – it turns out this is not that hard and it's easy to get a bunch of labeled data for a classifier.

If folks are interested in reading more there's a nice paper by Grammatech on the idea: https://eschulte.github.io/data/bed.pdf (though it's pre-LLM and uses evolutionary algorithms on the initial decompilation to search for a version that recompiles exactly).
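
To sketch the labeled-data idea (my own illustration, not the paper's method): compile the same sources under a few compiler/flag combinations, then fit an ordinary classifier on crude byte-level features. The feature and model choices here are placeholders:

  import itertools
  import subprocess

  import numpy as np
  from sklearn.ensemble import RandomForestClassifier

  COMPILERS = ["gcc", "clang"]
  FLAGS = ["-O0", "-O2", "-O3"]

  def byte_histogram(path: str) -> np.ndarray:
      # Normalized 256-bin byte histogram as a crude binary fingerprint.
      with open(path, "rb") as f:
          data = np.frombuffer(f.read(), dtype=np.uint8)
      hist = np.bincount(data, minlength=256).astype(float)
      return hist / max(hist.sum(), 1.0)

  def build_dataset(sources):
      X, y = [], []
      for src, (cc, opt) in itertools.product(
              sources, itertools.product(COMPILERS, FLAGS)):
          out = f"{src}.{cc}{opt}.o"
          subprocess.run([cc, opt, "-c", src, "-o", out], check=True)
          X.append(byte_histogram(out))
          y.append(f"{cc} {opt}")  # label to recover from unseen binaries
      return np.array(X), y

  # clf = RandomForestClassifier().fit(*build_dataset(["a.c", "b.c"]))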


Right.

A less formidable problem, with a higher chance of success, is to figure out from a given binary the compiler, compiler version, and compiler flags.

From there you could have a model for every combination, or at least a model per compiler variant, with the other info (version, flags) as input to the model.


Maybe we then need an LLM to tell us if two pieces of compiled code are equivalent in an input-output mapping sense (ignoring execution time).

I'm actually serious; it would be exceedingly easy to get training data for this just by running the same source code through a bunch of different compiler versions and optimization flags.
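
Sketching that out: every pair of binaries built from the same source is a positive "equivalent" example, and pairs from different sources are negatives. The toolchain list below is a placeholder:

  import itertools
  import subprocess

  TOOLCHAINS = [("gcc", "-O0"), ("gcc", "-O2"),
                ("clang", "-O1"), ("clang", "-O3")]

  def variants(src):
      # Build one object file per toolchain/flag combination.
      outs = []
      for cc, opt in TOOLCHAINS:
          out = f"{src}.{cc}{opt}.o"
          subprocess.run([cc, opt, "-c", src, "-o", out], check=True)
          outs.append(out)
      return outs

  def make_pairs(sources):
      # Returns (binary_a, binary_b, label) triples: 1 = same source.
      built = {src: variants(src) for src in sources}
      pairs = []
      for outs in built.values():
          pairs += [(a, b, 1) for a, b in itertools.combinations(outs, 2)]
      for (_, o1), (_, o2) in itertools.combinations(built.items(), 2):
          pairs += [(a, b, 0) for a in o1 for b in o2]
      return pairs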


An LLM cannot do this. I don't even mean this in a formal sense, because your problem is addressed by Rice's Theorem, which places bounds on what any system (LLM or not) can do here; I mean it in the sense that an LLM isn't even appropriate to use here, because the best it can possibly do is provide you with its best guess at the answer. And while that might be a useful property for decompilation in general, it's not what was being discussed here.


Rice's theorem does NOT prevent a program from giving correct answers to non-trivial properties of programs (including the halting problem or other undecidable problems) for 99.99% of inputs and "I don't know" for 0.01% of inputs. It only states that you cannot write a program that provides a correct and definitive yes-or-no for 100% of inputs.

For a decompiler, being able to decompile even 90% of programs would be awesome. We're not looking for theoretical perfection.


Why would an LLM be the tool for that job?


Without analytical thinking, how else would you arrive at the conviction that two functions are identical across a computationally infeasible number of possible inputs?


Formal logic / formal proofs. We have good systems for verifying that.

The proper flow is that you use an LLM to generate decompilation steps, along with candidate proofs, and then use old algorithms from the 1970s to verify that the steps are correct.

Source: I built a decompiler for EVM, arguably the best one on the market, and to some extent it was how it worked (and others comparable in class).

The issue was always the exploration of possible transformations of the code; once you manage to find the right ones (which LLMs can propose far better than old hard-coded rules and SMT solvers), it's simple to verify that the transformations are correct.
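
As a tiny, concrete example of the verification half (not my decompiler's actual code), the Z3 SMT solver can confirm a proposed rewrite in a few lines:

  # pip install z3-solver
  from z3 import BitVec, Solver, sat

  x = BitVec("x", 32)
  original = x * 3          # semantics lifted from the binary
  proposed = (x << 1) + x   # rewrite proposed by the LLM

  s = Solver()
  s.add(original != proposed)  # look for a counterexample
  if s.check() == sat:
      print("not equivalent, counterexample:", s.model())
  else:
      print("equivalent for all 32-bit inputs")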


You try your best, and if you provide enough examples, it will undoubtedly get figured out.


I think you're misunderstanding OP's objection. It's not simply a matter of going back and forth with the LLM until eventually (infinite monkeys on typewriters style) it produces the same binary as before: even if you recovered the exact same source code as the original, there's still no automated way to tell that you're done, because the bits you get back out of the recompile step will almost certainly not be the same. They might even vary quite substantially depending on a lot of different environmental factors.

Reproducible builds are hard to pull off even cooperatively, when you control the pipeline that built the original binary and can work to eliminate all sources of variation. It's simply not going to happen in a decompiler like this.


Well, no, but yes.

The critical piece is that this can be done in training. If I collect a large number of C programs from GitHub and compile them (in a deterministic fashion), I can use that as a training, test, and validation set. The output of the ML ought to compile the same way given the same environment.

Indeed, I can train over multiple deterministic build environments (e.g. different compilers, different compiler flags) to be even more robust.

The second critical piece is that for something like a GAN, it doesn't need to be identical. You have two ML algorithms competing:

- One is trying to identify generated versus ground-truth source code

- One is trying to generate source code

Many generative ML tasks are trained this way, and it doesn't matter. I have images and descriptions, and all the ML needs to do is generate an indistinguishable description.

So if I give the poster a lot more benefit of the doubt on what they wanted to say, it can make sense.


Oh, I was assuming that Eager was responding to klik99's question about how we could identify hallucinations in the output—round tripping doesn't help with that.

If what they're actually saying is that it's possible to train a model to low loss and then you just have to trust the results, yes, what you say makes sense.


I haven't found many places where I trust the results of an ML algorithm. I've found many places where they work astonishingly well 30-95% of the time, which is to say, they save me or others a bunch of time.

It's been years, but thinking back through things I've reverse-engineered before, having something which kinda works most of the time would still be super useful as a starting point.


Have you ever trained a GAN?


Technically, yes!

A more reasonable answer, though, is "no."

I've technically gone through random tutorials and trained various toy networks, including a GAN at some point, but I don't think that should really count. I also have a ton of experience with neural networks that's decades out-of-date (HUNDREDS of nodes, doing things like OCR). And I've read a bunch of modern papers and used a bunch of Hugging Face models.

Which is to say, I'm not completely ignorant, but I do not have credible experience training GANs.


That's true, but it's a solvable problem. I once tried to reproduce the build of an uncooperative party, and it was mainly tedious and boring.

The space of possible compiler arguments is huge, but what is actually used in practice covers only a small surface of it.

Apart from that, I wrote a small tool to normalize the version strings, timestamps, and file paths in the binaries before I compared them. I know there are other sources of non-determinism, but these three things were enough in my case.

The hardest part was the numerous file paths from the build machine. I had not expected that. In hindsight, stripping both binaries before comparison might have helped, but I don't remember why I didn't do that.
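
In spirit, the normalizer was something like this rough Python sketch (the patterns here are illustrative; the real ones were tuned to that toolchain's output):

  import re

  PATTERNS = [
      rb"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}",  # ISO-ish timestamps
      rb"(GCC|clang)[^\x00]{0,64}",                # embedded version strings
      rb"/home/[^\x00]+",                          # build-machine file paths
  ]

  def normalize(binary: bytes) -> bytes:
      # Blank matches with NULs of equal length so offsets don't shift.
      for pat in PATTERNS:
          binary = re.sub(pat, lambda m: b"\x00" * len(m.group()), binary)
      return binary

  def same_after_normalization(a: bytes, b: bytes) -> bool:
      return normalize(a) == normalize(b)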


Err, no, sorry, it won't. Compilers don't work that way. There are a lot of ways to compile source down to machine code, and the output changes from compiler version to compiler version. The LLM would have to know exactly how each compiler version worked to do this. So the idea is technically possible but not practically feasible.


What exactly are you suggesting will get figured out?


The mapping from binary to source code.


Even ignoring all sources of irreproducibility, there does not exist a bijection between source and binary artifact irrespective of toolchain. Two different toolchains could compile the same source to different binaries, or different sources to the same binary. And you absolutely shouldn't be ignoring sources of irreproducibility in this context, since they'll cause even the same toolchain to keep producing different binaries from the same source.


Exactly, but neither the source nor the binary is what's truly important here. The real question is: can the LLM generate a functionally valid source equivalent of the binary at hand? If I disassemble Microsoft Paint, can I get code that will result in a mostly functional version of Microsoft Paint, or will I just get 515 compile errors instead?


This is what I thought the question was really about.

I assume that an LLM will simply see patterns that look similar to other patterns, make associations, and assume equivalences on that level. Meanwhile, real code is full of places where the programmer, especially an assembly programmer, modifies something by a single instruction or offset value to get a very specific and functionally important result.

Often the result is code that not only isn't obvious, it's nominally flat-out wrong: violating standards, specs, intended function, datasheet docs, etc. If all you knew were the rules written in the docs, the code would look broken and invalid.

Is the LLM really going to see or understand the intent of that?

LLMs find matching patterns in other existing stuff, and to the user, who cannot see the vast body of material the LLM pulled from, it looks like the LLM understood the intent of a question. But I'd say it just found the prior work of some human who understood a similar intent somewhere else.

Maybe an LLM or some other flavor of AI can operate another way, like actually playing out the binary as if executing it in a debugger and mapping out the results, rather than just treating the code as fuzzy patterns to match. Could that take the place of understanding the intent the way a human would when reading the decompiled assembly?

Guess we'll find out sooner or later, since of course it will all be tried.


The question was about the reverse mapping.


Except LLMs cannot reason.


LLMs can mimic past examples of reasoning from their dataset, so they can re-use reasoning they have already been trained on. If the network manages to generalize well enough across its training data, it can get close to reproducing general reasoning. But it can't fully get there yet, of course.


Do you have evidence LLMs can indeed generalize outside their training data distribution?

https://twitter.com/abacaj/status/1721223737729581437/photo/...


No. I know only that they can generalize within it, and only to a limited degree, but don't have solid evidence of even that.


So what you're saying is there's tenuous-at-best, non-"solid" evidence that LLMs can reason even within their training data.

And yet I'm currently sitting at -1 for stating the blisteringly obvious. Lmao


Yes, that's basically what I'm saying. Just less bluntly. It's slightly more nuanced than "LLMs cannot reason" because lines of reasoning are often in their dataset and can sometimes be used by the model. It's just that the model can't be relied on to know the correct reasoning to use in a given situation.


> you should get the original binary

According to the project's README, they only seem to be checking mere "re-compilability" and "re-executability" of the decompiled code, though.


> If you decompile the binary to source, then compile the source back to binary, you should get the original binary.

Doesn't that depend on the compiler version, though? Or, for that matter, even the sub-version? Every compiler does things differently.


From the README:

> By re-compiling the decompiled output and running the test cases, we assess if the decompilation preserved the program logic and behavior.

As this is in the metrics section, I guess fully automating this is not part of the research.

