Auto-detect language per-line is guaranteed to produce poor results #392

@joshgoebel

Hey, current maintainer of Highlight.js here. This just came to my attention via #391.

You're looping over lines and then calling highlightAuto on every line (when you don't have a known language). This is not recommended and is guaranteed to produce poor results. Auto-detect is not intended to be useful with so little data, and the noise will often (as reported in #391) be much higher than the signal: you're as likely to get a random language as anything useful. There will be color, but it will often be all wrong.

If you do wish to use auto-detect you should pass us the ENTIRE document (or at the very least all the available lines from the document/diff), then look at the language we determine it to be, then use that language for every single line.
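A minimal sketch of that "detect once, reuse everywhere" idea might look like the following. The function name and the `hljs` parameter are hypothetical (the caller decides how the library is loaded), the options-object form of `highlight` assumes Highlight.js 10.7 or later, and falling back to `"plaintext"` assumes that language is registered in the build being used.

```javascript
// Sketch only, not the project's actual code.
// `hljs` is passed in, e.g. the result of require("highlight.js").
function highlightLinesWithDetectedLanguage(hljs, lines) {
  // Give auto-detect every available line at once instead of one line at a time.
  const detected = hljs.highlightAuto(lines.join("\n"));

  // Assumption: "plaintext" is registered, so it can serve as a fallback
  // when auto-detect does not settle on any language.
  const language = detected.language || "plaintext";

  // Reuse the single detected language for every line.
  const html = lines.map(
    (line) => hljs.highlight(line, { language, ignoreIllegals: true }).value
  );

  return { language, html };
}
```

Highlighting each line separately still loses multi-line context such as block comments, so this only addresses the detection problem, not the per-line highlighting itself.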

You'll have to take this approach with version 11 anyway, since you'll have to do the highlighting in a single pass (rather than per-line). That means calling highlightAuto upfront on all available lines, letting it use the larger amount of content for its auto-detection, then splitting that result back out into the individual lines you need, already highlighted for you.

You'll have to do it twice of course, once each for the before and after streams.
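The single-pass approach could be sketched roughly as below. Both function names and the `hljs` parameter are hypothetical, and the splitter assumes the highlighted HTML contains only `<span ...>` and `</span>` tags with all other `<` characters escaped, which is how Highlight.js output normally looks. Spans that cross a line break (a block comment, say) are closed at the end of the line and reopened on the next one so that each line is valid HTML on its own.

```javascript
// Sketch only: split one highlighted HTML string into per-line HTML,
// keeping <span> tags balanced within every line.
function splitHighlightedHtml(html) {
  const lines = [];
  const open = []; // opening tags of the spans that are currently open
  let current = "";

  // Tokens: an opening span, a closing span, a newline, or a run of text.
  const tokens = html.match(/<span[^>]*>|<\/span>|\n|[^<\n]+/g) || [];
  for (const token of tokens) {
    if (token === "\n") {
      // Close whatever is still open, then reopen it on the next line.
      lines.push(current + "</span>".repeat(open.length));
      current = open.join("");
    } else {
      if (token === "</span>") open.pop();
      else if (token.startsWith("<span")) open.push(token);
      current += token;
    }
  }
  lines.push(current + "</span>".repeat(open.length));
  return lines;
}

// Highlight the "before" and "after" streams separately, one pass each.
function highlightDiffSides(hljs, beforeLines, afterLines) {
  const side = (lines) =>
    splitHighlightedHtml(hljs.highlightAuto(lines.join("\n")).value);
  return { before: side(beforeLines), after: side(afterLines) };
}
```

Each returned array has one entry per input line, so the entries can be matched back to the diff rows by index.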
