Not sure if this qualifies as intermediate representation in the LLVM sense, but...

pas · on March 31, 2019

That's probably too low-level. Usually an AST is needed that is void of all syntactic sugar, yet can express the required data flows and dependencies so the trivial substitutions/replacements/caching and refactorings can be done at this level.
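To make the AST-level argument concrete, here is a minimal Python sketch. All node and function names are hypothetical and not taken from PHP or any real compiler. It first desugars `x op= e` into `x = x op e`, so later passes see only one assignment form. Because the nodes still refer to variables by name, a constant-propagation pass can then substitute known values and fold expressions directly on the tree.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Var, BinOp]

@dataclass(frozen=True)
class Assign:
    target: str
    value: Expr

@dataclass(frozen=True)
class AugAssign:  # syntactic sugar: `target op= value`
    target: str
    op: str
    value: Expr

Stmt = Union[Assign, AugAssign]

def desugar(stmt: Stmt) -> Assign:
    """Rewrite `x op= e` into `x = x op e`; plain assignments pass through."""
    if isinstance(stmt, AugAssign):
        return Assign(stmt.target, BinOp(stmt.op, Var(stmt.target), stmt.value))
    return stmt

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def fold(expr: Expr, env: dict[str, int]) -> Expr:
    """Substitute known constants for variables and fold constant subtrees."""
    if isinstance(expr, Var):
        return Num(env[expr.name]) if expr.name in env else expr
    if isinstance(expr, BinOp):
        left, right = fold(expr.left, env), fold(expr.right, env)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr

def propagate(stmts: list[Stmt]) -> list[Assign]:
    """Desugar, then propagate constants through straight-line assignments."""
    env: dict[str, int] = {}
    out = []
    for stmt in map(desugar, stmts):
        value = fold(stmt.value, env)
        if isinstance(value, Num):
            env[stmt.target] = value.value
        else:
            env.pop(stmt.target, None)  # target is no longer a known constant
        out.append(Assign(stmt.target, value))
    return out
```

For `x = 2; x *= 3; y = x + z`, `propagate` returns the equivalent of `x = 2; x = 6; y = 6 + z`. The substitution step is simply a lookup by variable name, which is exactly what a purely index-based representation makes harder.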

Lower-level bytecode lacks these higher-level references (it usually has a lot of indices into raw lookup tables, but no real symbols that refer to the results of computations or to other variables).
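A small sketch can show what "indices into raw lookup tables" looks like. The stack bytecode below is hypothetical: its opcode names are loosely modeled on CPython's, but it is not CPython's real instruction set. Expressions are plain tuples (`("num", 2)`, `("var", "x")`, `("bin", "+", left, right)`), and the compiler replaces every name and constant with an integer operand.

```python
# Hypothetical opcode names for the two supported operators.
OPCODES = {"+": "BINARY_ADD", "*": "BINARY_MUL"}

def compile_program(stmts):
    """Compile ("assign", name, expr) statements to a flat stack bytecode.

    Returns (code, consts, slots): each instruction is an (opcode, operand)
    pair whose operand is an index into `consts` or `slots`, or None.
    """
    consts = []   # constant pool
    slots = []    # local-variable table: slot number -> source name
    code = []

    def index_of(table, item):
        # Reuse an existing entry if present, otherwise append a new one.
        if item not in table:
            table.append(item)
        return table.index(item)

    def emit(expr):
        kind = expr[0]
        if kind == "num":
            code.append(("LOAD_CONST", index_of(consts, expr[1])))
        elif kind == "var":
            code.append(("LOAD_LOCAL", index_of(slots, expr[1])))
        elif kind == "bin":
            _, op, left, right = expr
            emit(left)
            emit(right)
            # The result just sits on the operand stack; nothing names it.
            code.append((OPCODES[op], None))
        else:
            raise ValueError(f"unknown expression {expr!r}")

    for _, name, expr in stmts:
        emit(expr)
        code.append(("STORE_LOCAL", index_of(slots, name)))
    return code, consts, slots
```

For `x = 2; x = x * 3; y = x + z` this produces `consts == [2, 3]`, `slots == ["x", "z", "y"]`, and instructions such as `("LOAD_LOCAL", 0)` and `("BINARY_MUL", None)`. Only the side table still remembers the source names, and the intermediate product `x * 3` never gets a name at all.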

readittwice · on March 31, 2019

Not sure why you think this is the case: there are numerous JITs/VMs that optimize starting from bytecode, e.g. all JVMs, WebAssembly, V8, JSC, ...

Can you point me to some paper/reference about this? AFAIK it doesn't matter whether you have an AST or bytecode. Optimizations are done on an IR anyway.
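The point that bytecode can be lifted back into an IR with named values can be sketched as follows. The input is a hypothetical stack bytecode of `(opcode, operand)` pairs whose operands index a constant pool and a local-slot table. The lifter simulates the operand stack, but the simulated stack holds IR value *names* instead of runtime values, so every intermediate result gets an explicit symbol, and each store creates a new SSA-style version of the local. This is roughly the idea behind building an IR by abstract interpretation of bytecode. Real JITs also have to handle branches (basic blocks, phi nodes), which this straight-line sketch leaves out.

```python
# Hypothetical binary opcodes and the operator each one stands for.
BINOPS = {"BINARY_ADD": "+", "BINARY_MUL": "*"}

def lift(code, consts, slots):
    """Turn straight-line stack bytecode into three-address IR strings."""
    stack = []      # simulated operand stack: holds IR value names, not values
    ir = []         # output: three-address instructions
    version = {}    # SSA-style version counter per local name
    temps = 0       # counter for fresh temporaries t0, t1, ...

    def read_local(name):
        # Version 0 means "value on entry", e.g. a parameter never stored here.
        n = version.get(name, 0)
        return name if n == 0 else f"{name}_{n}"

    for opcode, arg in code:
        if opcode == "LOAD_CONST":
            stack.append(str(consts[arg]))
        elif opcode == "LOAD_LOCAL":
            stack.append(read_local(slots[arg]))
        elif opcode in BINOPS:
            right = stack.pop()
            left = stack.pop()
            temp = f"t{temps}"
            temps += 1
            ir.append(f"{temp} = {left} {BINOPS[opcode]} {right}")
            stack.append(temp)
        elif opcode == "STORE_LOCAL":
            name = slots[arg]
            version[name] = version.get(name, 0) + 1
            ir.append(f"{read_local(name)} = {stack.pop()}")
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return ir
```

Given bytecode for `x = 2; x = x * 3; y = x + z` (constants `[2, 3]`, slots `["x", "z", "y"]`), `lift` produces `x_1 = 2`, `t0 = x_1 * 3`, `x_2 = t0`, `t1 = x_2 + z`, `y_1 = t1`. The names the bytecode dropped are recovered well enough for an optimizer to work on.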

tyingq · on March 31, 2019

There does appear to be an internal AST in PHP 7: https://github.com/nikic/php-ast

jashmatthews · on March 31, 2019

LuaJIT and WebKit both produce their IR from the bytecode, IIUC.

ryanmccullagh · on March 31, 2019

PHP has had an AST for years now.

pas · on March 31, 2019

Of course, but the opcodes are lower-level than that, and that's what the parent comment mentioned.

WalterGR · on April 1, 2019

Usually an AST is needed that is void of all syntactic sugar,

That’s how I’ve always seen “AST” used, but not the following:

yet can express the required data flows and dependencies so the trivial substitutions/replacements/caching and refactorings can be done at this level.

Is what you describe some flavor of ‘augmented’ AST? Does that have a name other than AST?