Vocabulary/Parsing

From J Wiki
Jump to navigation Jump to search

Back to: Vocabulary

In this section we will analyze sentences of increasing complexity from a high level. In the next section we will follow parsing in detail from the bottom up.

   9!:3 (5) NB. Do this once to select simplified display
   &.
&.

With only one word, there are no operands and nothing to execute, so the result of the sentence is the word itself: the conjunction &. .

   −:&.^.
−:&.^.

The result of executing −:&.^., i. e. executing &. with −: and ^. as operands, is an anonymous verb. This anonymous verb will execute according to the definition of &. given operands −: and ^. (i. e. −:&.^. y will execute as ^ −: ^. y ).

Note that the conjunction &. is executed without reference to the argument(s) supplied to the anonymous verb (indeed, in this case there are no such argument(s) and the anonymous verb is the result of the sentence). We could assign the anonymous verb to a name, in which case it would no longer be anonymous (e. g. sqrt =: −:&.^.); without such an assignment we will refer to it here by the nickname av . The value of av is the verb described by −:&.^. .

   −:&.^. 16
4

We know that this is executed as ^ −: ^. 16; let's see how that happens. Conjunctions are executed before verbs, so first −:&.^. will be executed to produce the anonymous verb we called av . Then av is executed with the operand 16 . av operates according to the definition of &. : it produces the same result as ^ −: ^. 16 (but it may use a different algorithm than executing ^ −: ^. 16 directly).

It might appear that &. was executed twice: once to create av and then again during the execution of av . No, it was executed only once, to create av . av operates according to the definition of &., but it is av that is executing, not &. . The confusion arises because of the way the interpreter displays av . There is no better way to show a verb that performs the function −:&.^. than to show the way the verb was created, i. e. with the characters '−:&.^.', but you should think of this as an exhibition of the pedigree of av, and an assurance of its good behavior, rather than a list of functions to be executed. In fact, part of the reason for J's good performance comes from its recognizing functions that can be combined efficiently and providing customized routines to handle anonymous verbs that call for those combinations.

Confusion between a conjunction and the anonymous verb it produces is most likely when the conjunction is one you wrote using conjunction define or 2 :n . In most cases the text of the conjunction actually describes a derived verb, and it is natural for you to say 'the conjunction C is executed with operands u, v, and y' rather than the more accurate 'the anonymous verb created by the application of C to u and v is executed, with u and v available during the interpretation of the text of C and with y as the operand'. Such confusion is almost always harmless, but let's go through a few examples so you can see the layers of execution:

   2 : 'u'
2 : 'u'

We execute 2 : 'u' and the result is an anonymous conjunction that we'll call ac1 . The display of ac1 shows where it came from. When ac1 is executed, its result will be its left operand.

   +: (2 : 'u') −:
+:

Here 2 : 'u' is executed first to produce ac1; then ac1 is executed with left operand of +: and right operand of −: . The result is an anonymous verb that we'll call av1 its value is the verb +: which was the left operand to ac1 .

   +: (2 : 'u') −: 5
10

Remember, (2 : 'u') is a conjunction (the conjunction we have called ac1), and conjunctions are executed before verbs, so this is executed as (+: (2 : 'u') −:) 5, which becomes av1 5 . We execute av1 with the operand . Monad +: doubles its operand, and the result is 10 .

We know that a conjunction can produce a conjunction result. That's how explicit conjunctions are formed: executing the conjunction : with left operand 2, as in 2 :n, produces a conjunction. There is nothing special about 2 :n : any conjunction is allowed to produce a conjunction result:

   2 : '&'
2 : (,'&')

We execute 2 : '&' and the result is an anonymous conjunction that we'll call ac2 . The display of ac2 shows where it came from. (the , in the display of ac2 is harmless, a reminder that internally the anonymous entity resulting from m :n stores n as a list of characters.)

   +: (2 : '&') −:
&

We execute ac2 with left operand of +: and right operand of −: . The result is an anonymous conjunction that we'll call ac3 . ac3 is a conjunction because its value & (the last sentence executed by ac2) is a conjunction. Yes, & by itself can be a result: modifiers can return any primary part of speech (but try to return a conjunction from a verb and you will get an error).

Note that this is not the same as u&v : that would also be a valid return value from a conjunction, but u and v would be substituted and & would be executed to make the returned value an anonymous verb with the description u&v .

Make sure you see why the +: and −: disappeared. First, the conjunction : was executed with operands 2 and '&'; that produced a conjunction ac2 which was then executed with operands +: and −:; but the defining text of ac2 does not look at its operands; it simply produces the value . So, the operands to ac2 disappear without a trace, and the result of the whole phrase is a conjunction with the value .

   2 (+: (2 : '&') −:) *
2&*


Continuing the example, we execute ac3 (which was just the conjunction &) with left operand 2 and right operand . The result is the anonymous verb av2 which will execute as 2&* .

   2 (+: (2 : '&') −:) * 5
10

Finally, we execute av2 with the operand 5, and get the result 10 .

Explicit modifiers that refer to the operands of their derived verb (as x or y) come in for special treatment. A simple example is the conjunction defined by

   2 : 'u v y'
2 : 'u v y'

We execute 2 : 'u v y' and the result is an anonymous conjunction that we'll call ac4 . You can't tell it from the display, but ac4 is a special kind of conjunction. Because it refers to y, the text of ac4 can be executed only as a verb (only then are x and y meaningful). The stored ac4 makes note of this fact.

   +: (2 : 'u v y') − 5
_10

When ac4 itself is executed (as +: (2 : 'u v y') − here),since ac4 is a conjunction it is executed before its result is applied to the noun operand 5), the text of ac4 is not interpreted (as it was in our other examples). Instead, the new anonymous verb av3 is created. av3 contains the defining text of ac4, along with the operands that were given to ac4 (+: and here). When the verb av3 is executed as in the line above, the text of ac4 is finally interpreted, with the operands of ac4 (+: and here) available as u and v, and the noun operands of av3 (5 here) available as y (and x if the invocation is dyadic); the result is the result of +: − 5 .

The Details of Parsing

Now that you understand what an anonymous verb/adverb/conjunction is, you are ready to follow parsing and execution word by word. We will finally abandon all shortcuts and process sentences exactly as the interpreter does.

In any compiled language, a program is broken into words (tokens) and then parsed, and code is generated from the parsed result. Not so in J: a sentence is broken into words, but the sentence is not fully parsed; rather, parsing and execution proceed simultaneously, scanning the text from right to left. Parsing finds patterns in the sentence that are then executed. Execution includes the usual supplying of noun operands to verbs to obtain a result from the verb, but also other actions: supplying verb and noun operands to conjunctions and adverbs to produce derived entities, and recognition of other sequences that we will learn soon. Execution of a bit of a sentence, which we will call a fragment, consists of replacing the fragment with an appropriate single word, namely the result of executing the fragment. In the simple case, where the fragment is the invocation of a verb (i. e. the fragment looks like verb noun or noun verb noun), the word that replaces it is the noun that is the result of the verb. If the fragment is the invocation of a modifier, the result of executing it will be a noun or a derived verb/adverb/conjunction. A noun is nothing but its value, but the derived verb/adverb/conjunction will itself eventually be executed: it is called an anonymous verb/adverb/conjunction and is saved by the interpreter in a private form, and the single word used to replace the fragment is a reference to this anonymous verb/adverb/conjunction (for the case of an anonymous verb, you may think of the single word as a pointer to a function that performs according to the definition of the anonymous verb). In all cases the word replacing the fragment has a definite part of speech, and if it is a verb, a definite rank.

The Execution Stack

Execution of a sentence begins by breaking the sentence into words. The words (with a beginning-of-line marker, shown here as §, prepended) become the initial contents of the unprocessed word list. A push-down stack is also used during execution; it is initially empty. Words are moved onto the stack one by one, either as a value (the nameless value of the word) or as a references (the name the value is assigned to).

  • Primitives (including punctuation) are always stacked as values.
  • Nouns are always stacked as values.
  • The special names x v m n u v are always stacked as values.
  • Verbs are always stacked as references.
  • Modifiers are stacked as values if they are nameless, that is if their value refers to no names; otherwise as references.
    Note that a nameless modifier is stacked as a nameless value, and does not contribute any references to a name it appears in. Thus (adverb1 adverb2) qualifies as nameless if adverb1 and adverb2 are both nameless. To force a reference to a modifier write 'modifier'~

When a reference is executed, its part of speech must be the same as when the reference was created.

The Parsing Table

Execution of the sentence is performed by repetition of the parsing step which comprises: (1) examining the top 4 elements of the stack to see if they match one of the 9 executable patterns; (2) if a match was found, executing the executable portion of the stack (what we called the executable fragment in the last chapter), resulting in a single word which replaces the fragment on the stack; (3) if no match was found, moving the rightmost word of the unprocessed word list into the leftmost position of the stack, pushing the rest of the stack to the right. Execution finishes when there are no unprocessed words and the stack does not contain an executable pattern. Note that execution of a fragment may leave the stack matching one of the executable patterns, so several sequential parsing steps may perform an execution without moving anything onto the stack. After all words have been processed, the stack should contain a beginning-of-line marker followed by a single word which becomes the result of the sentence.

To follow the parsing we need to know what patterns at the top of the stack contain an executable fragment. The parsing table below gives the complete list. More than one symbol in a box means that any one of them matches the box. name means any valid variable name, and C, A, V, and N stand for conjunction, adverb, verb, and noun respectively. The word "anything" includes all the other tokens in the table, as well as the possibility of the stack containing only three words.

leftmost stack word other stack words action
§ =. =: ( V N anything 0 Monad
§ =. =: ( A V N V V N 1 Monad
§ =. =: ( A V N N V N 2 Dyad
§ =. =: ( A V N V N A anything 3 Adverb
§ =. =: ( A V N V N C V N 4 Conj
§ =. =: ( A V N V N V V 5 Fork
§ =. =: ( C A V N C A V N C A V N 6 Modifier trident
§ =. =: ( C A V N C A V N anything 7 Hook or Modifier bident
name N =. =: C A V N anything 8 Is
( C A V N ) anything 9 Paren

The lines in the parsing table are processed in order. If the leftmost 4 words on the stack match a line in the table, the fragment, made up of those words on the stack which are in boldface in the parsing table, is executed and replaced on the stack by the single word returned. Because the fragment is always either two or three words long, it is officially known as a bident or trident. The last column of the parsing table gives a description of what execution of the fragment entails.

You will have an easier time following the parsing if you note that the leftmost word in the executable pattern is usually one of § =. =: ( A V N . This means that you can scan from the right until you hit a word that matches one of those before you even start checking for an executable pattern. If you find one of § =. =: ( A V N and it doesn't match an executable pattern, keep looking for the next occurrence.

Note that the leftmost stack word in rule 4 in the parsing table is never a conjunction (as is true of the other parsing rules). This is the source of our long-noted rule that conjunctions associate left-to-right: a conjunction can be executed when it appears in the third stack position, but if another conjunction is in the leftmost position then, the stack will always be pushed down to examine that conjunction's left argument.

Line 7, defining hooks and modifier bidents, is matched for any combination of CAVN CAVN except N A and V A which are matched in line 3.

Examples Of Parsing And Execution

We will now follow a few sentences through parsing. We will represent anonymous entities by names in italics, with an indication of how the anonymous entity was created. Up till now in this book we have scarcely noticed that the term 'verb' was used both for an entity that can be applied to a noun to produce a noun result, and also for the name of that entity. This ambiguity will continue (being precise would be too cumbersome) but you should be aware of it. When we say 'the result is av, defined as +/', that means that an anonymous verb was created whose function is described as +/, and the nickname we are giving it (the word, that is, that goes on the execution stack to betoken this verb) is av.

Sentence: +/2*a where a is 1 2 3

unprocessed word list stack line
§ + / 2 * a
§ + / 2 * 1 2 3 (not executable)
§ + / 2 * 1 2 3 (not executable)
§ + / 2 * 1 2 3 (not executable)
§ + / 2 * 1 2 3 (result 2 4 6) 2
§ + / 2 4 6 (result av, defined as +/) 3
§ av 2 4 6 (result 12) 0
§ 12

The column labeled 'line' indicates which line of the parsing table was matched by the stack. The fragment is marked in boldface and underlined. Note that when the noun a moved onto the stack, its value was moved; when a named verb, adverb, or conjunction is moved onto the stack, only the name is moved. Note also that the noun's value (1 2 3 here) is a single word.

From now on we will omit the lines that do not contain an executable fragment.

Sentence: mean =: +/ % #

unprocessed word list stack line
§ mean =: + / % #
§ mean =: + / % # (result av1, defined as +/) 3
§ mean =: av1 % # (result av2, defined as av1 % #) 5
§ mean =: av2 (result av2; mean is assigned av2) 8
§ av2

I want to emphasize that what is assigned to mean is the result of parsing +/ % # . It is not the sequence +/ % #, but rather a single verb which performs the function described by the fork. Now you see why putting parentheses around the definition doesn't matter: av2 would be parsed the same either way.

Sentence: mean 4 5 6

unprocessed word list stack line
§ mean 4 5 6
§ mean 4 5 6 (result 5) 0
§ 5


Since mean is the result from parsing +/ % #, it is executed without further ado. As you can see, a single 'execution' step can trigger a cascade of processing as each verb referred to by an executing verb is executed in turn. Here, execution of mean does the entire processing of the fork, returning the result . The verb to be executed can be quite complex, and can have a mixture of named and anonymous components, as in the next example.

Sentence: (mean − (+/ % #)&.(^."1)) 4 5 6 (find the difference between arithmetic and geometric mean)

unprocessed word list stack line
§ ( mean - ( + / % # ) &. ( ^. " 1 ) ) 4 5 6
§ ( mean - ( + / % # ) &. ( ^. " 1 ) ) 4 5 6 (result av1, defined as ^. " 1) 4
§ ( mean - ( + / % # ) &. ( av1 ) ) 4 5 6 (result av1) 9
§ ( mean - ( + / % # ) &. av1 ) 4 5 6 (result av2, defined as +/ 3
§ ( mean - ( av2 % # ) &. av1 ) 4 5 6 (result av3, defined as av2 % #) 5
§ ( mean - ( av3 ) &. av1 ) 4 5 6 (result av3) 9
§ ( mean - av3 &. av1 ) 4 5 6 (result av4, defined as av3 &. av1) 4
§ ( mean - av4 ) 4 5 6 (result av5, defined as mean - av4) 5
§ ( av5 ) 4 5 6 (result av5) 9
§ av5 4 5 6 (result 0.0675759) 0
§ 0.0675759

Again, there was only one execution of a verb. It happened at the very end: after av5 was created, it was executed, and its execution included the execution of everything else.

Sentence: inc =: ({.a)&+ where a is 4 5 6

unprocessed word list stack line
§ inc =: ( {. a ) & +
§ inc =: ( {. 4 5 6 ) & + (result 4) 0
§ inc =: ( 4 ) & + (result 4) 9
§ inc =: 4 & + (result av, defined as 4&+) 4
§ inc =: av (result av; inc is assigned av) 8
§ av

This illustrates an important point. Even in the middle of a complex definition, verbs are applied to nouns wherever possible. And, the value of a noun in a definition is the value at the time the definition was parsed. If a parsed definition refers to a verb, it does so by name, so the value of a verb is its value when it is executed.

The remaining examples are curiosities to show you that it's worth your trouble to learn the intricacies of parsing.

Sentence: a + a =. 5

unprocessed word list stack line
§ a + a =. 5
§ a + a =. 5 (result is 5; a is assigned 5) 0
§ 5 + 5 (result is 10) 2
§ 10

a is assigned a value just before that value is pushed onto the stack.

Sentence: 2 +: (2 : '&') −: * 5

unprocessed word list stack line
§ 2 +: ( 2 : '&' ) -: * 5
§ 2 +: ( 2 : '&' ) -: * 5 (result is ac1, defined as 2 : '&') 4
§ 2 +: ( 'ac'1 ) -: * 5 (result is ac1) 9
§ 2 +: ac1 -: * 5 (result is ac2, defined as &) 4
§ 2 ac2 * 5 (result is av, defined as 2&*) 4
§ av 5 (result is 10) 0
§ 10

Look at what happens when we omit the parentheses:

Sentence: 2 +: 2 : '&' −: * 5

unprocessed word list stack line
§ 2 +: 2 : '&' -: * 5
§ 2 +: 2 : '&' -: * 5 (result is 1) 1
§ 2 +: 2 : '&' -: 1 (result is ac1, defined as 2 : '&') 4
§ 2 +: ac1 -: 1 (result is ac2, defined as &) 4
§ 2 ac2 1 (domain error: 2&1 is illegal) 4

The omission produces a subtle but fatal change in the parsing order. As the Dictionary says, "it may be necessary to parenthesize an adverbial or conjunctival phrase that produces anything other than a noun or verb". Now you see why.

Undefined Words

If you try to execute a nonexistent verb, you get an error:

   z 5
|value error: z
| z 5

However, that error occurs during execution of the name, not during its parsing. During parsing, an undefined name is assumed to be a verb of infinite rank. This allows you to write verbs that refer to each other, and relieves you from having to be scrupulous about the order in which you define verbs. For example:

   a =: z

This produces a verb a which, when executed, will execute z.

   z =: +
   a 5
5

With z defined, a executes correctly. Of course, it's OK to assign z to another verb too:

   b =: z
   b 5
5

Now, can you tell me what

   +/@b +/@b] 1 2 3

will do? Take a minute to figure it out (Hint: note that I used @ rather than @:).

   +/@b 1 2 3
1 2 3

Because b has rank 0, +/@b also has rank zero, so the 'summing' is applied to atoms individually and we get a list result. Do you think +/@a 1 2 3 will have the same result?

   +/@a 1 2 3
6

Even though a has the same value as b, its rank is different. a's rank was assigned when it was parsed, and at that time z was assumed to have infinite rank. b's rank was assigned when it was parsed too, but by that time z had been defined with rank 0. You can win a lot of bar bets with this one if you hang out with the right crowd.