Rewrite Scala lexer for Scala 3 #1694

MaximeKjaer · 2021-01-23T21:34:34Z

With the upcoming release of Scala 3 (currently available as a milestone release at lampepfl/dotty), the Scala lexer needed to be updated for the new language version.

This new implementation is inspired by that of scala/vscode-scala-syntax. It supports both Scala 2 and Scala 3, which share a lot of syntax. Scala 3 can be written in either an indentation-based syntax, or a curly-brace-based syntax (which was the only syntax variant supported in Scala 2). This new lexer implementation supports both variants.

Fixes #1035. Very likely fixes #1121, although I cannot definitively confirm this, as the link to the bug reproduction is dead.

birkenfeld · 2021-02-09T14:46:31Z

pygments/lexers/jvm.py

-
-    idrest = '%s(?:%s|[0-9])*(?:(?<=_)%s)?' % (letter, letter, op)
-    letter_letter_digit = '%s(?:%s|\\d)*' % (letter, letter)
+    opchar = (u'[!#%&*+\\-\\/:<>=?@^|~\u00a6-\u00a7\u00a9\u00ac\u00ae\u00b0-\u00b1'


Please don't reintroduce the "u" prefix.

Also, are these derived from Unicode categories? If so, please use the existing lists from pygments.unistring.

MaximeKjaer · 2021-02-15

Fixed.

Yes, those were derived from Unicode categories. They were carried over from the old lexer, but using pygments.unistring is much nicer.
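For illustration, a minimal sketch of what building the operator character class from pygments.unistring could look like. The name ASCII_OPCHARS and the choice of the categories Sm (math symbols) and So (other symbols) are assumptions made for this example; the thread does not show the exact expression that was merged.

```python
import re
from pygments import unistring as uni

# Hypothetical name: ASCII operator characters listed by hand,
# already escaped for use inside a regex character class.
ASCII_OPCHARS = r'!#%&*+\-/:<>=?@^|~'

# uni.combine() concatenates the character-class bodies of the named
# Unicode categories. Sm and So are an assumption for this sketch.
opchar = '[' + ASCII_OPCHARS + uni.combine('Sm', 'So') + ']'

opchar_re = re.compile(opchar)
```

This replaces a long hand-written list of \uXXXX escapes with category names that are maintained in one place.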

birkenfeld · 2021-02-09T14:55:43Z

pygments/lexers/jvm.py

+    plainid = u'(?:%s|%s+)' % (idrest, opchar)
+    backQuotedId = r'`[^`]+`'
+    anyId = u'(?:%s|%s)' % (plainid, backQuotedId)
+    endOfLineMaybeWithComment = r'(?=\s*(//.*|/\*(?!.*\*/\s*\S.*).*)?$)'


This seems pretty complex. Especially the .*\*/ can eat the whole text due to DOTALL.

Is recognizing these comments really necessary? Keep in mind that Pygments is a highlighter: 100% accuracy is not required, and speed is more important.
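The DOTALL concern can be reproduced with the standard re module alone. The sample text and the reduced pattern below are made up for this sketch; they are not the lexer's actual regex.

```python
import re

# Two lines, each containing a block comment (made-up sample input).
text = "end /* a */\nval x = 1 /* b */\n"
pattern = r'/\*.*\*/'

# Without DOTALL, '.' stops at the newline, so the match stays on one line.
line_only = re.search(pattern, text)

# With DOTALL, '.' also matches newlines, so the greedy '.*' runs to the
# last '*/' anywhere in the remaining text.
across_lines = re.search(pattern, text, re.DOTALL)

print(repr(line_only.group()))     # '/* a */'
print(repr(across_lines.group()))  # '/* a */\nval x = 1 /* b */'
```

On a large file, a lookahead containing such a pattern may scan far past the current line on every attempt, which is the speed problem raised above.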

MaximeKjaer · 2021-02-15

Recognizing the end of the line is necessary, as end is a soft keyword that should only be highlighted in certain situations.

I think a comment placed after an end marker is plausible, especially in educational material that explains this end syntax. However, the precision with which the regex matched block comments (all of it to avoid a false positive) was not necessary, and I was able to simplify it significantly.
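As a rough sketch of the simplified idea: accept an end marker when only blanks, the start of a comment, or the end of the line follow it. The names END_OF_LINE, END_MARKER and has_end_marker are hypothetical, and the regex is an assumption for illustration, not the one that was merged.

```python
import re

# Hypothetical simplified lookahead: optional blanks, then a line comment,
# the start of a block comment, or the end of the line. It never looks for
# the closing '*/', so it cannot scan beyond the current line.
END_OF_LINE = r'(?=[ \t]*(?://|/\*|$))'

# 'end' followed by a single name, e.g. "end Foo" or "end match".
END_MARKER = re.compile(r'\bend[ \t]+\w+' + END_OF_LINE, re.MULTILINE)

def has_end_marker(line):
    """Return True if the line contains something that looks like an end marker."""
    return END_MARKER.search(line) is not None

print(has_end_marker("end Foo"))                     # True
print(has_end_marker("end Foo // closes class Foo")) # True
print(has_end_marker("val end = 3"))                 # False
```

The trade-off is that code after a closed block comment on the same line produces a false positive, which is acceptable for a highlighter.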

birkenfeld · 2021-02-09T15:00:44Z

pygments/lexers/jvm.py

+            (r'[{}()\[\];,.]', Punctuation),
+            (r'(?<!:):(?!:)', Punctuation),  
+        ],
+        'keywords': [


These should all use words(), which auto-escapes and optimizes the regex. The same goes for operators.

MaximeKjaer · 2021-02-15

Done. I've also changed the storage modifiers to use words().
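A small sketch of how words() from pygments.lexer behaves. The keyword and operator tuples are illustrative subsets chosen for this example, not the lexer's actual lists.

```python
import re
from pygments.lexer import words

# Illustrative subsets only (assumptions for this sketch).
keywords = words(('if', 'then', 'else', 'match', 'case'),
                 prefix=r'\b', suffix=r'\b')
operators = words(('=>', '<-', '?=>', '|'))

# Inside a RegexLexer's tokens table, a words(...) object is used directly
# as the pattern. Calling .get() here only exposes the generated regex.
keyword_re = re.compile(keywords.get())
operator_re = re.compile(operators.get())

print(bool(keyword_re.search('x match')))   # True
print(bool(keyword_re.search('matcher')))   # False, because of the \b suffix
print(bool(operator_re.fullmatch('?=>')))   # True, '?' and '|' are escaped
```

Because the escaping is automatic, operators such as | or ?=> can be listed literally instead of being escaped by hand.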

birkenfeld

LGTM now. Thanks!

MaximeKjaer added 7 commits January 23, 2021 15:26

Rewrite Scala lexer for Scala 3

00fc3a3

Remove duplicated tests

11d31d3

Add test for pygments#1035

7690007

Fix highlighting of named givens

3bbecfc

Fix extension regex

f59e639

Simplify Scala identifier regexes

ed1dde1

Reformat long lines in Scala lexer

23682e1

Anteru added this to the 2.8 milestone Feb 6, 2021

Anteru requested a review from birkenfeld February 6, 2021 15:51

birkenfeld requested changes Feb 9, 2021


Remove u string prefix from Scala lexer

b32240f

Anteru modified the milestones: 2.8, 2.9 Feb 14, 2021

Anteru added the "update needed" label (waiting for an update from the PR/issue creator) Feb 14, 2021

MaximeKjaer added 4 commits February 15, 2021 23:23

Replace Scala unicode regexes with call to unistring module

4682eee

Remove complicated regex

53f71f3

Refactor regexes with word helper function

c20870a

Fix end soft keyword highlighting

ca4e295

MaximeKjaer requested a review from birkenfeld February 28, 2021 21:47

birkenfeld approved these changes Mar 1, 2021


birkenfeld added the "changelog-update" label (items which need to get mentioned in the changelog) and removed the "update needed" label Mar 1, 2021

birkenfeld merged commit 37113b0 into pygments:master Mar 1, 2021

MaximeKjaer deleted the scala-3-rewrite branch March 1, 2021 10:40

Anteru removed the "changelog-update" label Mar 5, 2021

alexandru mentioned this pull request Oct 28, 2022

Provide support for Scala 3 rouge-ruby/rouge#1885

Open

SethTisue mentioned this pull request Aug 15, 2025

Switch to treesitter or linguist for syntax highlighting? scala/scala-lang#1833

Open
