Lexical analysis - Section 2.5.3 - Best order for implementation and avoid errors #110936

@josepojr

Documentation

In section 2.4.1, String and Bytes literals, I suggest changing the order of the alternatives in these expressions:

Before:

stringliteral ::= [stringprefix](shortstring | longstring)
bytesliteral ::= bytesprefix(shortbytes | longbytes)

After:

stringliteral ::= [stringprefix](longstring | shortstring)
bytesliteral ::= bytesprefix(longbytes | shortbytes)

The only change is swapping the order of "shortstring" and "longstring", and of "shortbytes" and "longbytes".
The old order can lead a tokenizer developer into an error if they implement the expressions literally, trying the alternatives from left to right.

Example:

d = """abc"""

With the old expression, the tokenizer will recognize the following tokens:
Token 1: ""
Token 2: "abc"
Token 3: ""

The tokenizer never uses the "longstring" expression, which handles triple quotes, because the first alternative is "shortstring" (reading from left to right).
The correct behavior is for the tokenizer to use "longstring", because the literal has triple quotes.
This is just an improvement to make Python tokenizer development more popular.
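
To make the problem concrete, here is a minimal sketch of a tokenizer that tries the alternatives strictly from left to right, as Python's `re` alternation does. The `SHORT` and `LONG` patterns and the `tokenize` helper are my own simplified assumptions (double quotes only, no prefixes, no escape sequences); they are not the grammar from the docs or CPython's tokenizer.

```python
import re

# Hypothetical, simplified patterns (assumption: double quotes only,
# no string prefixes, no backslash escapes).
SHORT = r'"[^"\n]*"'        # "..." on a single line
LONG = r'"""[\s\S]*?"""'    # """...""" possibly spanning several lines


def tokenize(source, pattern):
    """Match `pattern` repeatedly from left to right and collect the lexemes."""
    regex = re.compile(pattern)
    tokens = []
    pos = 0
    while pos < len(source):
        match = regex.match(source, pos)
        if match is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


# `re` tries alternatives in order and takes the first one that matches,
# which mirrors a literal left-to-right reading of the grammar rule.
old_order = f"(?:{SHORT})|(?:{LONG})"   # shortstring | longstring
new_order = f"(?:{LONG})|(?:{SHORT})"   # longstring | shortstring

print(tokenize('"""abc"""', old_order))  # ['""', '"abc"', '""']
print(tokenize('"""abc"""', new_order))  # ['"""abc"""']
print(tokenize('"abc"', new_order))      # ['"abc"']
```

With the old order the triple-quoted literal is split into the three tokens listed above; with the proposed order it is recognized as one token, and ordinary short strings still match.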

If you have any questions, I am available. Thank you!

José Oliveira

Labels: docs (Documentation in the Doc dir), interpreter-core (Objects, Python, Grammar, and Parser dirs)
