The example given is:

    whatever / 25 ; # / ; die "this dies!";

This can be parsed into two variants without running any code or knowing the arity of whatever: if whatever takes no arguments, the / is a division and the # starts a comment that swallows the die; if whatever takes a single argument, the / opens a regexp match and the die is live code. The single-argument parse compiles to:

    b  <@> leave[1 ref] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 1 -e:1) v:{ ->3
    6     <1> entersub[t2] vKS/TARG,1 ->7
    -        <1> ex-list K ->6
    3           <0> pushmark s ->4
    4           </> match(/" 25 ; # "/) sM/RTIME ->5
    -           <1> ex-rv2cv sK/128 ->-
    5              <#> gv[*whatever] s ->6
    7     <;> nextstate(main 1 -e:1) v:{ ->8
    a     <@> die[t3] vK/1 ->b
    8        <0> pushmark s ->9
    9        <$> const[PV "this dies!"] s ->a
What does this mean practically? If you want to rename "whatever" to something else, you can do that, since the name of the function is captured no matter how many arguments it takes. If you want to lint all your regular expressions, you can do that too, at the risk of being wrong wherever you guessed a prototype wrong.
The only thing you can't do is figure out exactly how the code will run at runtime. But if you could do that, you wouldn't have a computer program, you'd just have a lookup table.
Another thing to keep in mind: if a static analysis tool can't understand your code, perhaps the future maintainer can't either. In that case, why not rewrite the code to not be ambiguous?
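For instance, here is a minimal sketch of that kind of best-effort static analysis, using the CPAN module PPI (which parses Perl heuristically, without running it); the sample code string is just the example from above:

    use strict;
    use warnings;
    use PPI;    # CPAN: a heuristic, purely static Perl parser

    my $code = 'whatever / 25 ; # / ; die "this dies!";';
    my $doc  = PPI::Document->new( \$code );

    # PPI has to commit to one reading of the ambiguous line; it
    # reports whatever tokens it decided were match-like regexps.
    my $regexps = $doc->find('PPI::Token::Regexp::Match') || [];
    print "found regexp: ", $_->content, "\n" for @$regexps;

Which reading PPI picks here is exactly the guess described above: usually right, occasionally wrong.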
> it proves that Perl can't be parsed unambiguously.
That's what "can't be parsed" means. They're pretty clear about what they mean by something being "parsable"; you're muddying that definition by extending it to "parsable: things that can be vaguely processed most of the time".
This perfectly encapsulates my feelings about the Perl parsing question!
An ambiguous parse is still a parse, and yeah, the 'whatever' example is pathologically unmaintainable code. As you say, if a static parser finds it ambiguous, you should probably fix it.
(Or mark it in some way, you know: "This is ambiguous, but it needs to be because _______" - because frankly, ambiguous parses are destined to be bug sources.)
As one of the comments on the original article points out, this is pretty obvious. You just need a BEGIN { } block containing an eval(STRING), where the string depends on evaluated code (e.g. on a random number).
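A minimal sketch of that construction, reusing the article's ambiguous line (the sub name whatever comes from there; everything else is arbitrary):

    BEGIN {
        # Whether whatever() gets a zero- or one-argument prototype
        # depends on a value known only at runtime:
        if ( rand() < 0.5 ) {
            eval 'sub whatever ()  { 42 }';       # takes no arguments
        }
        else {
            eval 'sub whatever ($) { $_[0] }';    # takes one argument
        }
    }
    whatever / 25 ; # / ; die "this dies!";

Half the time the last line is a division followed by a comment; the other half it is a function call on a regexp match followed by a live die. No static parser can tell you which.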
At least the article doesn't jump to big conclusions and make stupid declarations based on this result.
Unfortunately the commenters do: "This means that things like static code analysis, code transformation and syntax highlighting will never be reliable." So? Almost all code is still easily analysed, syntax highlighting works, etc. It's like stating that you can't do this stuff with C/C++ because the code depends upon #defines (whose values could be variable, or sourced from external processes).
Actually, syntax highlighting + warnings about ambiguity would be just the thing in the article's example. Suggesting that the developer choose between use of an explicit "m" for the pattern matching:
whatever m/ 25 ; # / ; die "this dies!";
...or the use of parentheses for division:
(whatever / 25) ; # / ; die "this dies!";
...would be a helpful suggestion a syntax parser could give. Also, in either event the code is syntactically valid.
I believe JITting (code generation) happens at a later stage, after the source code has been parsed (or perhaps "tokenized"/"lexed" is more appropriate?).
So even an "unreliable" syntax wouldn't matter much, since the JIT kicks in after the parser has delivered its results.
I don't think this result means much for compiler optimization in Perl. Perl is definitely a difficult language to parse, a difficult language to interpret, and its flexibility makes for very tricky compilation (have there been any usable compilers for it?).
BUT - you can't really draw a link between a language that is Turing-complete to parse and one that is difficult to parse. An example would be to take the simplest, most basic language you can think of and add a backtick-like operation: you can put `echo hello` into the source, and the string gets replaced by the output of the enclosed command. The source language instantly becomes Turing-complete to parse (you're having to execute code, after all), but your one-off parser hasn't become much more difficult: you just run the enclosed command and wait for its output.
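A sketch of that preprocessing pass in Perl (the function name is illustrative):

    # Replace each `command` in the source text with the command's
    # output before handing it to the real parser. Deciding what the
    # source "says" now requires running arbitrary programs, so the
    # language is Turing-complete to parse; yet the pass is trivial.
    sub preprocess_source {
        my ($source) = @_;
        $source =~ s{`([^`]*)`}{ my $out = qx($1); chomp $out; $out }ge;
        return $source;
    }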
I feel like the kind of work browsers do when JIT-compiling JavaScript would be really hard if you can't 100% reliably parse the language.
That's irrelevant in this case, because a hypothetical Perl 5 JIT wouldn't do much until the Perl 5 parser finished parsing its code. Remember, all this proof proves is that you cannot produce a single, unambiguous parse tree from every Perl 5 source file because of the syntactic construct that lets you change how the parser works for specific symbols.
Because that's what I took away from reading about C++ templates - the Turing-completeness happens at the parsing/preprocessing stage, before the compiler's code generator even starts translating. Was I wrong? Wouldn't a smart IDE that resolves types (say, for tab completion or type hints), on seeing the Factorial<N> thing, have to solve for the value of Factorial<4>?
No. (You need to parse all the includes to know the typedefs, but that's it.)
C++ is awkward for a different reason, but there's nothing unparseable about it. Resolving types for tab completion/hinting is getting closer to static analysis (at least, with C++ it is), but parsing is well and truly done by that stage.
There's nothing about Factorial<4> that can't be parsed, and nothing about templates or instantiations thereof causes the syntax to be interpreted differently; that kind of feedback is probably what it would take to make parsing require Turing completeness.
Perl, by contrast, is actually unparseable without execution: you can modify the allowed syntax mid-parse, based on some condition known only at runtime. Nothing about a template argument (for example) in some call causes a later template definition to be interpreted according to different syntax rules.
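For the curious, you can watch the parse shift without ever running the program, using the stock B::Deparse module (deparse output paraphrased in the comments; "x" is a stand-in message):

    # Ask perl to re-emit its parse of the same line under two
    # different prototype declarations:
    #
    #   perl -MO=Deparse -e 'sub whatever ();  whatever / 25 ; # / ; die "x"'
    #   perl -MO=Deparse -e 'sub whatever ($); whatever / 25 ; # / ; die "x"'
    #
    # The first deparses as a division, whatever() / 25, with the die
    # swallowed by the comment; the second as a call on a regexp match,
    # whatever(/ 25 ; # /), followed by a live die.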
Your comment lacks context. Do you mean parseability of Perl has never been a goal of Perl? If you read the article, you'll see it was the goal of the author, which is why he wrote this proof. So I'm not exactly clear what you're trying to say.
The point the grandparent was making seems rather simple to me. He says that the designers of Perl (not the author of the article) didn't consider parseability of Perl an objective, since they focused more on making a scripting language for humans.
Of course the author (and many others) would have liked it if Perl were statically parseable, since a lot of static tooling (documentation generators, for example) requires that to be complete.
So it just shows that maybe static parseability should have been an objective for the Perl designers, and should be one for any other language designer in the future.
It should be noted that this is from 2008, and last updated in 2009. I'm not a Perl user, so I don't know if the language has changed enough since then to affect the validity of the proof. We also don't know if it holds for Perl 6.
It also holds for Perl 6, to the same degree: completing a parse can require running some components of the code being compiled. (Perl 6 does make it easier for the compiler to know which code it must run to complete the parse, though.)
Er. This proof is seriously flawed, starting with "what is Perl"? How do we define a language that's not parseable? Does it only mean that you can't know what the program will do until you run it? That's true for Ruby, Python, pretty much everything that has any level of meta-programming. It also lacks any formal definition of what is to be proved. The statement "Perl is unparseable" is not very well defined: "Perl" is the first problem, and the second is "unparseable" - does it mean you can't write a grammar that accepts all valid Perl programs and rejects all invalid ones? Which brings us back to what counts as a "valid" or "invalid" program.
Obviously if the statement is "you can't write any useful linting tools", which seems to be what the author is after, then fine, but you have to define "useful".