@MaximeKjaer
Contributor

With the upcoming release of Scala 3 (currently available as a milestone release at lampepfl/dotty), the Scala lexer needed to be updated for the new language version.

This new implementation is inspired by that of scala/vscode-scala-syntax. It supports both Scala 2 and Scala 3, which share a lot of syntax. Scala 3 can be written in either an indentation-based syntax or a curly-brace-based syntax (the only variant supported in Scala 2); the new lexer handles both.

Fixes #1035. Very likely fixes #1121, although I cannot definitively confirm this, as the link to the bug reproduction is dead.

@Anteru added this to the 2.8 milestone on Feb 6, 2021
@Anteru requested a review from birkenfeld on February 6, 2021 15:51

idrest = '%s(?:%s|[0-9])*(?:(?<=_)%s)?' % (letter, letter, op)
letter_letter_digit = '%s(?:%s|\\d)*' % (letter, letter)
opchar = (u'[!#%&*+\\-\\/:<>=?@^|~\u00a6-\u00a7\u00a9\u00ac\u00ae\u00b0-\u00b1'
Member

Please don't reintroduce the "u" prefix.

Also, are these derived from Unicode categories? If yes, please use the existing lists from pygments.unistring.

Contributor Author

Fixed.

Those were indeed Unicode categories. They were present in the old lexer, but using pygments.unistring is much nicer.
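For readers unfamiliar with pygments.unistring, here is a minimal sketch of how character classes can be assembled from its category strings instead of hand-written \u ranges. The choice of categories (Sm and So for operator characters; Ll, Lu, Lo, Nl and Lt for letters) and the extra ASCII characters are assumptions made for illustration, not a copy of the code that was merged.

```python
import re

from pygments import unistring as uni

# Assumption: operator characters are a few ASCII symbols plus the Unicode
# "math symbol" (Sm) and "other symbol" (So) categories.
opchar = '[!#%&*\\-\\/:?~^' + uni.combine('Sm', 'So') + ']'

# Assumption: identifier letters are '_', '$' and the Unicode letter categories.
letter = '[_\\$' + uni.combine('Ll', 'Lu', 'Lo', 'Nl', 'Lt') + ']'

operator = re.compile('%s+' % opchar)
identifier = re.compile('%s+' % letter)

for candidate in ('<=', '\u00b1', 'na\u00efve', 'abc'):
    kind = ('operator' if operator.fullmatch(candidate)
            else 'identifier' if identifier.fullmatch(candidate)
            else 'other')
    print(repr(candidate), kind)
```

The category strings are already written in a form that can be dropped inside a character class, so no "u" prefix or manual escaping is needed.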

plainid = u'(?:%s|%s+)' % (idrest, opchar)
backQuotedId = r'`[^`]+`'
anyId = u'(?:%s|%s)' % (plainid, backQuotedId)
endOfLineMaybeWithComment = r'(?=\s*(//.*|/\*(?!.*\*/\s*\S.*).*)?$)'
Member

This seems pretty complex. In particular, the .*\*/ can consume the whole text because of DOTALL.

Is recognizing these comments really necessary? Keep in mind that Pygments is a highlighter: 100% accuracy is not required, and speed is more important.
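The DOTALL concern can be reproduced with the standard library alone. The sketch below uses a deliberately reduced pattern and a made-up input (both are illustrative assumptions, not the regex from the diff) to show how a greedy .* behaves with and without the flag.

```python
import re

# Made-up input: two block comments separated by ordinary code lines.
source = "end /* first */\nval x = 1\nval y = 2 /* last */\n"

block_comment = r"/\*.*\*/"

# Without DOTALL, '.' stops at the newline, so the match stays on one line.
one_line = re.search(block_comment, source).group()

# With DOTALL, '.' also matches newlines, so the greedy '.*' runs to the
# LAST '*/' in the text and swallows everything in between.
whole_text = re.search(block_comment, source, re.DOTALL).group()

print(repr(one_line))
print(repr(whole_text))
```

On a long file this means a single lookahead may scan to the end of the input at every candidate position, which is the speed problem raised in the review.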

Contributor Author

Recognizing the end-of-line is necessary, as end is a soft keyword that should only be highlighted in certain situations.

I think placing a comment after an end might happen, especially in educational material that explains this end syntax. However, the precision with which the regex matched block comments (all of this to avoid a false positive) was not necessary, and I was able to simplify it significantly.
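As an illustration of the idea described above, here is a minimal sketch of a soft-keyword check: end is treated as an end marker only when the identifier after it is followed by the end of the line, optionally preceded by a line comment. The pattern, the constant names and the helper is_end_marker are hypothetical; the regex that was actually merged may differ.

```python
import re

# Hypothetical simplified lookahead: optional whitespace, then either the
# start of a line comment or the end of the line.
END_OF_LINE_MAYBE_WITH_COMMENT = r"(?=\s*(//|$))"

# 'end', whitespace, one identifier, then nothing but an optional comment.
END_MARKER = re.compile(
    r"\b(end)(\s+)(\w+)" + END_OF_LINE_MAYBE_WITH_COMMENT,
    re.MULTILINE,
)


def is_end_marker(line: str) -> bool:
    """Return True if the line contains 'end <name>' closing the line."""
    return END_MARKER.search(line) is not None


for line in (
    "end MyClass",
    "end MyClass // closes the class",
    "val end = 3",
    "end foo bar",
):
    print(repr(line), is_end_marker(line))
```

Block comments are ignored on purpose here: the trade-off accepts an occasional missed highlight in exchange for a lookahead that never scans past the current line.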

(r'[{}()\[\];,.]', Punctuation),
(r'(?<!:):(?!:)', Punctuation),
],
'keywords': [
Member

These should all use words(), which auto-escapes and optimizes the regex. The same goes for operators.

Contributor Author

Done. I've also fixed the storage modifiers to use words().
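To show what the words() suggestion looks like in practice, here is a small sketch of a toy lexer. TinyScalaLexer, its keyword list and its operator list are hypothetical and far smaller than the real Scala lexer; only words(), RegexLexer and the token types come from Pygments.

```python
from pygments.lexer import RegexLexer, words
from pygments.token import Keyword, Name, Operator, Whitespace


class TinyScalaLexer(RegexLexer):
    """Hypothetical toy lexer; NOT the Scala lexer from this pull request."""

    name = 'TinyScala'
    tokens = {
        'root': [
            # words() builds one optimized alternation; the \b prefix and
            # suffix keep 'val' from matching inside 'value'.
            (words(('val', 'var', 'def', 'end'), prefix=r'\b', suffix=r'\b'),
             Keyword),
            # Regex metacharacters such as '|' and '?' are escaped for us.
            (words(('=>', '<-', '|', '=', '?')), Operator),
            (r'\s+', Whitespace),
            (r'\w+', Name),
        ],
    }


tokens = list(TinyScalaLexer().get_tokens('val value = a | b'))

# The generated pattern can be inspected with .get().
keyword_regex = words(('val', 'var', 'def', 'end'),
                      prefix=r'\b', suffix=r'\b').get()

for token_type, text in tokens:
    print(token_type, repr(text))
print(keyword_regex)
```

Compared with a hand-written alternation such as r'\b(val|var|def|end)\b', this avoids forgetting to escape an operator like | and leaves the regex optimization to the library.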

@Anteru modified the milestones: 2.8, 2.9 on Feb 14, 2021
@Anteru added the "update needed" label (waiting for an update from the PR/issue creator) on Feb 14, 2021
Member

@birkenfeld left a comment

LGTM now. Thanks!

@birkenfeld added the "changelog-update" label (items which need to get mentioned in the changelog) and removed the "update needed" label on Mar 1, 2021
@birkenfeld merged commit 37113b0 into pygments:master on Mar 1, 2021
@MaximeKjaer deleted the scala-3-rewrite branch on March 1, 2021 10:40
@Anteru removed the "changelog-update" label on Mar 5, 2021
Development

Successfully merging this pull request may close these issues.

Scala lexer: type highlighted differenty if after type parameter
Syntax highlight error in Scala lexer, literal types

3 participants