
Not sure if this qualifies as an intermediate representation in the LLVM sense, but PHP already has opcodes; it is no longer purely interpreted (as it was in PHP 3).

These opcodes are used for optimizations and cached for the lifetime of the PHP process (across multiple requests).



That's probably too low-level. Usually an AST is needed that is void of all syntactic sugar, yet can express the required data flows and dependencies so the trivial substitutions/replacements/caching and refactorings can be done at this level.

Lower level bytecode lacks these higher level references (it usually has a lot of indices that index into raw lookup tables, but no real symbols that refer to results of computations and other variables).
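
To make the contrast concrete, here is a rough sketch using CPython's `ast` and `dis` modules as a stand-in (PHP's opcodes are laid out differently, so treat this as an analogy only). The AST keeps variable names as symbols in the tree, while the bytecode instructions carry integer arguments that index into a name table. The expression and variable names are made up for illustration.

```python
import ast
import dis

source = "total = price * qty + tax"

# High-level view: the AST keeps every variable as a named node in the tree.
tree = ast.parse(source)
ast_names = sorted(
    {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
)

# Low-level view: compile the same source to a CPython code object.
code = compile(source, "<example>", "exec")
name_table = code.co_names  # lookup table of names used by the bytecode

# Each LOAD_NAME/STORE_NAME instruction carries an integer argument
# that is an index into name_table, not the symbol itself.
indexed_ops = [
    (ins.opname, ins.arg)
    for ins in dis.get_instructions(code)
    if ins.opname in ("LOAD_NAME", "STORE_NAME")
]

# Resolving the indices by hand recovers the names in load order.
loaded = [name_table[arg] for op, arg in indexed_ops if op == "LOAD_NAME"]

if __name__ == "__main__":
    print(ast.dump(tree))
    print(name_table)
    print(indexed_ops)
    print(loaded)
```

Note that the names are still recoverable through the table, so how much is really "lost" depends on the bytecode format in question.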


Not sure why you think this is the case: there are numerous JITs/VMs that optimize starting from bytecode, e.g. all JVMs, WebAssembly, V8, JSC, ...

Can you point me to some paper/reference about this? AFAIK it doesn't matter whether you have an AST or bytecode. Optimizations are done on an IR anyways.
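
As a toy illustration of the "start from bytecode, optimize on an IR" idea, here is a minimal sketch that lifts a stack bytecode into three-address IR with named temporaries and folds constants along the way. The opcode names (`PUSH_CONST`, `LOAD`, `ADD`, `MUL`) and the tuple layout are invented for this example; they do not correspond to any real VM.

```python
def lift_to_ir(bytecode):
    """Lift stack bytecode into three-address IR, folding constants.

    Returns (ir, result), where ir is a list of (temp, op, lhs, rhs)
    tuples and result is the operand left on the simulated stack.
    """
    stack = []    # symbolic operands: ("const", value) or ("var", name)
    ir = []       # emitted three-address instructions
    counter = 0   # used to generate fresh temporary names

    for op, arg in bytecode:
        if op == "PUSH_CONST":
            stack.append(("const", arg))
        elif op == "LOAD":
            stack.append(("var", arg))
        elif op in ("ADD", "MUL"):
            rhs = stack.pop()
            lhs = stack.pop()
            if lhs[0] == "const" and rhs[0] == "const":
                # Both operands are known: fold instead of emitting code.
                value = lhs[1] + rhs[1] if op == "ADD" else lhs[1] * rhs[1]
                stack.append(("const", value))
            else:
                # Otherwise name the result so later passes can refer to it.
                temp = f"t{counter}"
                counter += 1
                ir.append((temp, op, lhs, rhs))
                stack.append(("var", temp))
        else:
            raise ValueError(f"unknown opcode: {op}")

    return ir, stack.pop()


# Hypothetical bytecode for: x + 2 * 3
program = [
    ("LOAD", "x"),
    ("PUSH_CONST", 2),
    ("PUSH_CONST", 3),
    ("MUL", None),
    ("ADD", None),
]
ir, result = lift_to_ir(program)

if __name__ == "__main__":
    print(ir)
    print(result)
```

The point is only that the symbolic simulation of the stack reintroduces names for intermediate results, which is roughly what the data-flow argument asks for.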


There does appear to be an internal AST in php7: https://github.com/nikic/php-ast


LuaJIT and WebKit both produce their IR from the bytecode, IIUC.


PHP has had an AST for years now.


Of course, but opcodes are lower level than that. And that was what the parent comment mentioned.


> Usually an AST is needed that is void of all syntactic sugar,

That’s how I’ve always seen “AST” used, but not the following:

> yet can express the required data flows and dependencies so the trivial substitutions/replacements/caching and refactorings can be done at this level.

Is what you describe some flavor of ‘augmented’ AST? Does that have a name other than AST?



