Skip to content

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Jan 20, 2024

In #101398 we added support for the recursive wildcard ** in pathlib.PurePath.match(). This should allow arbitrary prefix and suffix matching, like p.match('foo/**') or p.match('**/foo'), but there's a problem: for relative patterns only, match() implicitly inserts a ** token on the left hand side, causing all patterns to match from the right. As a result, it's impossible to match relative patterns from the left: PurePath('foo/bar').match('bar/**') is true!

This PR reverts the changes to match(), and instead adds a new full_match() method that:

  • Allows empty patterns
  • Supports the recursive wildcard **
  • Matches the entire path when given a relative pattern


📚 Documentation preview 📚: https://cpython-previews--114350.org.readthedocs.build/en/114350/library/pathlib.html#pathlib.PurePath.globmatch

In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`globmatch()` method that:

- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern

As a result, `globmatch()`'s pattern language exactly matches that of
`glob()`.
@zooba
Copy link
Member

zooba commented Jan 22, 2024

Does the problem extend beyond "** is allowed to match nothing"? Or can we change the existing algorithm to require that a trailing ** has to match at least one segment?

Presumably .match("foo\\*") has to find something after foo\, so would be a better version of "foo\\**" anyway, but I can't imagine anyone being happy that "foo\\**" matches "foo\\" - you wouldn't specify the ** if you didn't want something to be found there.

@barneygale
Copy link
Contributor Author

** matching nothing is quite normal I think - it's what glob.glob(), pathlib.Path.glob(), zsh, and IIRC bash do. If you want at least one segment after foo/, you can spell that with foo/**/*.

@barneygale
Copy link
Contributor Author

e.g. zsh man page:

A pathname component of the form '(foo/)#' matches a path consisting of zero or more directories matching the pattern foo.

As a shorthand, '**/' is equivalent to '(*/)#'; note that this therefore matches files in the current directory as well as subdirectories. Thus:

ls (*/)#bar

or

ls **/bar

does a recursive directory search for files named 'bar' (potentially including the file 'bar' in the current directory).

@zooba
Copy link
Member

zooba commented Jan 22, 2024

Okay, I definitely like having full in the name somewhere then. It's still a match, and the parallel is fullmatch is way more helpful than to glob (which would just raise questions about "how does match do its matching then?")

@barneygale barneygale changed the title GH-73435: Add pathlib.PurePath.globmatch() GH-73435: Add pathlib.PurePath.full_match() Jan 22, 2024
Copy link
Member

@zooba zooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@barneygale
Copy link
Contributor Author

Thanks for the reviews, both.

@barneygale barneygale enabled auto-merge (squash) January 26, 2024 00:51
@barneygale barneygale merged commit b69548a into python:main Jan 26, 2024
aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
In 49f90ba we added support for the recursive wildcard `**` in
`pathlib.PurePath.match()`. This should allow arbitrary prefix and suffix
matching, like `p.match('foo/**')` or `p.match('**/foo')`, but there's a
problem: for relative patterns only, `match()` implicitly inserts a `**`
token on the left hand side, causing all patterns to match from the right.
As a result, it's impossible to match relative patterns from the left:
`PurePath('foo/bar').match('bar/**')` is true!

This commit reverts the changes to `match()`, and instead adds a new
`full_match()` method that:

- Allows empty patterns
- Supports the recursive wildcard `**`
- Matches the *entire* path when given a relative pattern
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news topic-pathlib type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants