Notes on Context-Free Languages
- Context-free languages (CFLs) can describe patterns that no regular
language can, such as 0^n 1^n (n zeros followed by the same number of ones).
- With CFLs, we often speak of a grammar. We are familiar with this word
through our study of English, where a grammar is a set of rules to follow
when constructing sentences. A context-free grammar is the same idea,
applied to much simpler languages.
- The set of context-free languages is a proper superset of the set of
regular languages. That is, every regular language is also context-free, but
not every context-free language is regular.
- Just as regular languages have DFAs and NFAs as their computing
"machinery", context-free languages have pushdown automata (PDAs).
- A context-free grammar (CFG) is a 4-tuple (V, Σ, R, S), where
  - V is a finite set of variables,
  - Σ is a finite set of terminals, disjoint from V,
  - R is a finite set of rules, with each rule consisting of a variable and a
    string of variables and terminals, and
  - S ∈ V is the start variable.
- An example CFG (G1 in the book):
  V = {A, B}
  Σ = {0, 1, #}
  R has three rules as follows:
    A → 0A1
    A → B
    B → #
  Let A be the start variable (so, in this case, a derivation may begin with
  either of the first two rules).
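One way to make the 4-tuple concrete is to write G1 down as plain data. The following is a minimal Python sketch, not anything from the book: the dictionary layout and the helper name `is_well_formed` are illustrative assumptions.

```python
# G1 written as a 4-tuple (V, Sigma, R, S). Each rule is a pair
# (variable, body), where the body is a list of variables and terminals.
G1 = {
    "variables": {"A", "B"},
    "terminals": {"0", "1", "#"},
    "rules": [
        ("A", ["0", "A", "1"]),  # A -> 0A1
        ("A", ["B"]),            # A -> B
        ("B", ["#"]),            # B -> #
    ],
    "start": "A",
}


def is_well_formed(g):
    """Check the conditions listed in the definition of a CFG."""
    v, sigma = g["variables"], g["terminals"]
    if v & sigma:                # V and Sigma must be disjoint
        return False
    if g["start"] not in v:      # the start symbol must be a variable
        return False
    for head, body in g["rules"]:
        if head not in v:        # left side must be a single variable
            return False
        if any(sym not in (v | sigma) for sym in body):
            return False         # right side may only use known symbols
    return True


print(is_well_formed(G1))  # True
```

Changing `"start"` to a terminal such as `"0"` makes the check fail, since the start symbol must come from V.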
- One can do two things with a grammar: derive a string (starting from the
start variable) or parse a string (i.e. determine whether or not it
can be derived from the grammar).
- So, for practice, try deriving some strings using G1.
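A derivation in G1 can also be traced mechanically. The sketch below is only an illustration: the names `RULES` and `derive` are made up here, and each step is described by the index of the rule to apply to the leftmost variable.

```python
# Rules of G1, grouped by variable. Index 0 of "A" is A -> 0A1,
# index 1 of "A" is A -> B, and index 0 of "B" is B -> #.
RULES = {
    "A": [["0", "A", "1"], ["B"]],
    "B": [["#"]],
}


def derive(start, choices):
    """Apply the chosen rule to the leftmost variable at each step.

    Returns the list of strings seen along the way. Raises StopIteration
    if a choice is given when no variable is left to replace.
    """
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        # Position of the leftmost variable in the current string.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        steps.append("".join(form))
    return steps


print(" => ".join(derive("A", [0, 0, 1, 0])))
# A => 0A1 => 00A11 => 00B11 => 00#11
```

Applying A → 0A1 a different number of times before switching to A → B yields the other strings of the language.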
- Next, try parsing the following strings: 00#11, #, 0#111. When parsing, one
often uses a parse tree, which illustrates how the string can be derived
(if it can be derived at all).
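For a grammar as small as G1, parsing can be sketched as a short recursive function that mirrors the rules. This is a hand-written recognizer for G1 only, not a general parsing algorithm; the name `parse_A` and the nested-tuple tree format are assumptions made for illustration.

```python
def parse_A(s):
    """Return a parse tree for s rooted at A, or None if G1 cannot derive s.

    A tree is a nested tuple whose first element is the variable and whose
    remaining elements are its children, in left-to-right order.
    """
    # Rule A -> 0A1: the string must start with 0 and end with 1,
    # and whatever lies between must itself be derivable from A.
    if len(s) >= 2 and s[0] == "0" and s[-1] == "1":
        inner = parse_A(s[1:-1])
        return None if inner is None else ("A", "0", inner, "1")
    # Rule A -> B followed by B -> #: the only remaining possibility.
    if s == "#":
        return ("A", ("B", "#"))
    return None


for s in ["00#11", "#", "0#111"]:
    print(s, parse_A(s))
```

The first two strings produce a tree, while 0#111 produces `None` because its zeros and ones do not balance.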
- Consider another example (a simpler version of G4 in the book):
  E → E + T | T
  T → a
- practice: give two strings that can be derived by the previous grammar and
two strings that cannot be derived from it.
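Answers to an exercise like this can be checked by listing every short string the grammar derives. The following sketch does a breadth-first search over leftmost derivations; the names `RULES` and `derivable_strings` are illustrative. It assumes no rule has an empty right-hand side, so strings never shrink and anything longer than `max_len` symbols can be discarded. Spaces are omitted, so "a + a" appears as `a+a`.

```python
from collections import deque

# E -> E + T | T   and   T -> a
RULES = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["a"]],
}


def derivable_strings(start, max_len):
    """List all terminal strings of at most max_len symbols derivable from start."""
    seen = set()
    results = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost variable, or None if only terminals remain.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            results.add("".join(form))
            continue
        for body in RULES[form[idx]]:
            new_form = form[:idx] + tuple(body) + form[idx + 1:]
            # Safe to prune: without empty rules a string never gets shorter.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)


print(derivable_strings("E", 5))  # ['a', 'a+a', 'a+a+a']
```

Strings such as `a+` or `aa` never appear in the output, whatever the length bound.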
- practice creating a grammar: Ex. 2.4 parts b, c
- practice: create a grammar that could be used for relatively simple C/C++
declarations like "int a;", "string c, d;", and
"char q;". Assume a 32-character limit on variable names.
Assume variables cannot be initialized in the declaration.
- Ambiguity is a problem that often arises with grammars. A grammar is
ambiguous if it can derive some string in more than one structurally
different way, i.e. the string has more than one parse tree.
- Ambiguity example: Let's say we wanted to simplify our addition expression
grammar to the following:
  E → E + E | a
This can derive all the same strings that the earlier version did, but it
introduces ambiguity. Take, for example, the string "a + a + a". It has two
distinct parse trees: one groups it as (a + a) + a, the other as a + (a + a).
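The number of parse trees a string has under E → E + E | a can be counted directly, which gives a quick check for ambiguity. This is a small sketch specific to that one grammar; the function name `count_parse_trees` is an illustrative choice.

```python
def count_parse_trees(tokens):
    """Count the parse trees of a token list under E -> E + E | a."""
    if tokens == ["a"]:
        return 1  # rule E -> a
    total = 0
    # Rule E -> E + E: try every "+" as the top-level operator and
    # multiply the number of trees for the left and right parts.
    for i, tok in enumerate(tokens):
        if tok == "+":
            left = count_parse_trees(tokens[:i])
            right = count_parse_trees(tokens[i + 1:])
            total += left * right
    return total


print(count_parse_trees("a + a".split()))      # 1
print(count_parse_trees("a + a + a".split()))  # 2
```

Any count above 1 shows the grammar is ambiguous, and a count of 0 means the string cannot be derived at all.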
- A leftmost derivation always replaces the leftmost variable in the current
string (e.g. the leftmost E in "E + E"); a rightmost derivation always
replaces the rightmost variable. Our default method is leftmost.
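The difference between the two orders is easy to see by deriving "a + a" both ways with E → E + E | a. In this sketch the names `expand` and `derive` are illustrative, and each step supplies the right-hand side to substitute for the chosen variable.

```python
VARIABLES = {"E"}


def expand(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) variable in form with body."""
    positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + body + form[i + 1:]


def derive(bodies, leftmost=True):
    """Apply the given right-hand sides in order, starting from E."""
    form = ["E"]
    steps = [" ".join(form)]
    for body in bodies:
        form = expand(form, body, leftmost)
        steps.append(" ".join(form))
    return steps


rule_bodies = [["E", "+", "E"], ["a"], ["a"]]
print(" => ".join(derive(rule_bodies, leftmost=True)))
# E => E + E => a + E => a + a
print(" => ".join(derive(rule_bodies, leftmost=False)))
# E => E + E => E + a => a + a
```

Both derivations use the same rules and describe the same parse tree; only the order of replacement differs, so this difference alone is not ambiguity.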
- Generally, ambiguity is not a good thing. Proving that a grammar is ambiguous
is simply a matter of exhibiting one string and two distinct parse trees (or,
equivalently, two different leftmost derivations) for it. Proving the
absence of ambiguity is considerably more difficult and beyond the scope of
our study.
- Chomsky Normal Form and the property (and joy!) of being context-free.