url: Use string_views for arguments of CanonicalizeStandardURL() and ReplaceStandardURL()

This CL has no behavior changes.

Bug: 350788890
Change-Id: Id8d0abb08fff549c4d4367fca19823f9c0a07830
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7077619
Commit-Queue: Hayato Ito <hayato@chromium.org>
Auto-Submit: Kent Tamura <tkent@chromium.org>
Reviewed-by: Hayato Ito <hayato@chromium.org>
Owners-Override: Hayato Ito <hayato@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1534728}
NOKEYCHECK=True
GitOrigin-RevId: f22f0455d2910ae8d34f3a5601e4c4857134faa8
README.md

Chrome's URL library

Layers

There are several conceptual layers in this directory. Going from the lowest level up, they are:

Parsing

The url_parse.* files are the parser. This code does no string transformations. Its only job is to take an input string and split out the components of the URL as best it can deduce them, for a given type of URL. Parsing can never fail; it will always take its best guess. This layer has no logic for determining which type of URL parsing to apply; that decision is made at a higher layer (the “util” layer below).

Because the parser code is derived (very distantly) from some code in Mozilla, some of the parser files are in url/third_party/mozilla/.

The main header to include for calling the parser is url/third_party/mozilla/url_parse.h.
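To make the parser's contract concrete, here is a minimal sketch of one parsing step: locating the scheme. `ComponentSketch` and `FindSchemeSketch` are hypothetical names for illustration only, not the API in url_parse.h. The (begin, length) span, with a length of -1 meaning "absent", is an assumption chosen to show that the parser only records where a component sits in the input and never rejects it.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, simplified parsed-component type: a (begin, len) span into
// the caller's input. len == -1 means the component is absent.
struct ComponentSketch {
  int begin = 0;
  int len = -1;
  bool is_valid() const { return len >= 0; }
};

// Hypothetical sketch of a parser step: find where the scheme ends. The input
// is never copied or modified, and the function never fails; if no scheme
// delimiter is found, the best guess is "no scheme".
ComponentSketch FindSchemeSketch(std::string_view spec) {
  for (std::size_t i = 0; i < spec.size(); ++i) {
    const char c = spec[i];
    if (c == ':') {
      // Everything before the first ':' is treated as the scheme.
      return ComponentSketch{0, static_cast<int>(i)};
    }
    if (c == '/' || c == '?' || c == '#') {
      // A path, query, or fragment delimiter appeared first: no scheme.
      break;
    }
  }
  return ComponentSketch{};
}
```

Because the result is only an offset and a length into the caller's buffer, nothing is copied or transformed; interpreting the span is left to the higher layers.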

Canonicalization

The url_canon* files are the canonicalizer. This code will transform specific URL components or specific types of URLs into a standard form. For some dangerous or invalid data, the canonicalizer will report that a URL is invalid, although it will always try its best to produce output (so the calling code can, for example, show the user an error that the URL is invalid). The canonicalizer attempts to provide as consistent a representation as possible without changing the meaning of a URL.
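The sketch below illustrates the "report invalid, but still produce output" behavior for a single component, the scheme. `CanonicalizeSchemeSketch` is a hypothetical helper, not a function from url/url_canon.h, and for brevity it writes to a plain std::string rather than through a CanonOutput. The validity rule (a letter, followed by letters, digits, `+`, `-`, or `.`) follows the URL Standard's scheme syntax. The real canonicalizer may handle invalid characters differently, for example by escaping them.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch: lowercases |scheme| into |output| and appends ':'.
// Invalid characters are still copied so the caller has something to show,
// but the function returns false to report that the scheme is invalid.
bool CanonicalizeSchemeSketch(std::string_view scheme, std::string* output) {
  bool success = !scheme.empty();
  for (std::size_t i = 0; i < scheme.size(); ++i) {
    const char c = scheme[i];
    const bool is_upper = c >= 'A' && c <= 'Z';
    const bool is_lower = c >= 'a' && c <= 'z';
    const bool is_other_allowed =
        (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
    output->push_back(is_upper ? static_cast<char>(c - 'A' + 'a') : c);
    // The first character must be a letter; later ones may also be
    // digits, '+', '-', or '.'.
    if (!(is_upper || is_lower || (i > 0 && is_other_allowed))) {
      success = false;
    }
  }
  output->push_back(':');
  return success;
}
```

For example, "HTTPS" produces "https:" and succeeds, while "ht tp" still produces "ht tp:" but returns false, so a caller can both display the result and flag it as invalid.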

The canonicalizer layer is designed to be independent of the string type of the embedder, so all string output is done through a CanonOutput wrapper object. An implementation for std::string output is provided in url_canon_stdstring.h.
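A minimal sketch of why an output wrapper keeps the canonicalizer independent of the embedder's string type. `OutputSinkSketch`, `StdStringSinkSketch`, and `AppendLowercaseAsciiSketch` are hypothetical and far simpler than the real CanonOutput interface; the point is only that the adapter, not the canonicalizer-side code, knows about std::string.

```cpp
#include <string>
#include <string_view>

// Hypothetical abstract sink: canonicalization code writes through this
// interface and never names a concrete string type.
class OutputSinkSketch {
 public:
  virtual ~OutputSinkSketch() = default;
  virtual void Append(char c) = 0;
};

// Hypothetical embedder-side adapter that stores the output in a std::string.
class StdStringSinkSketch : public OutputSinkSketch {
 public:
  explicit StdStringSinkSketch(std::string* str) : str_(str) {}
  void Append(char c) override { str_->push_back(c); }

 private:
  std::string* str_;  // Not owned; must outlive this object.
};

// Canonicalizer-side code: depends only on the abstract sink.
void AppendLowercaseAsciiSketch(std::string_view input, OutputSinkSketch& out) {
  for (char c : input) {
    out.Append((c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c);
  }
}
```

An embedder with its own string class, as Blink has, would supply a different adapter while the canonicalizer-side code stays unchanged.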

The main header to include for calling the canonicalizer is url/url_canon.h.

Utility

The url_util* files provide a higher-level wrapper around the parser and canonicalizer. While this layer can be called directly, it is designed to be the foundation for writing URL wrapper objects (the GURL layer and Blink's KURL object both use the Utility layer to implement their low-level logic).

The Utility code makes decisions about URL types and calls the correct parsing and canonicalization functions for those types. It provides an interface to register application-specific schemes that have specific requirements. Sharing this logic between KURL and GURL is important so that URLs are handled consistently across the application.
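The following sketch shows the kind of dispatch decision the Utility layer makes: inspect the scheme and choose which parsing and canonicalization path applies, with a hook for registering extra schemes. `SchemeRegistrySketch`, `ParserKindSketch`, and the three-way classification are hypothetical simplifications; the real registration API in url/url_util.h uses different names and distinguishes more scheme types. The initial list holds the URL Standard's special schemes other than `file`.

```cpp
#include <set>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical categories; the real library distinguishes more cases.
enum class ParserKindSketch { kStandard, kFile, kOther };

// Hypothetical registry that decides which parsing path a scheme takes.
class SchemeRegistrySketch {
 public:
  // Seeded with the URL Standard's special schemes other than "file".
  SchemeRegistrySketch()
      : standard_schemes_{"ftp", "http", "https", "ws", "wss"} {}

  // Lets an application register its own scheme as standard.
  void AddStandardScheme(std::string scheme) {
    standard_schemes_.insert(std::move(scheme));
  }

  // Assumes |scheme| has already been lowercased.
  ParserKindSketch Classify(std::string_view scheme) const {
    if (scheme == "file") {
      return ParserKindSketch::kFile;
    }
    if (standard_schemes_.count(std::string(scheme)) > 0) {
      return ParserKindSketch::kStandard;
    }
    return ParserKindSketch::kOther;
  }

 private:
  std::set<std::string> standard_schemes_;
};
```

Keeping this decision in one shared place is what lets two different URL objects agree on how a given scheme is handled.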

The main header to include is url/url_util.h.

Google URL (GURL) and Origin

At the highest layer, a C++ object for representing URLs is provided. This object uses STL. Most uses need only this layer. Include url/gurl.h.

Also at this layer is the Origin object, which exists to make security decisions on the web. Include url/origin.h.
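As a rough illustration of the kind of security check an origin supports, the sketch compares two tuple origins by scheme, host, and port. `TupleOriginSketch` and `IsSameOriginSketch` are hypothetical; the real url::Origin also represents opaque origins, which this sketch does not model.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical simplified tuple origin (no opaque origins).
struct TupleOriginSketch {
  std::string scheme;
  std::string host;
  std::uint16_t port;
};

// Two tuple origins match only if scheme, host, and port are all equal.
bool IsSameOriginSketch(const TupleOriginSketch& a, const TupleOriginSketch& b) {
  return std::tie(a.scheme, a.host, a.port) ==
         std::tie(b.scheme, b.host, b.port);
}
```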

Historical background

This code was originally a separate library that was designed to be embedded into both Chrome (which uses STL) and WebKit (which didn't use any STL at the time). As a result, the parsing, canonicalization, and utility code could not use STL, or any other common code in Chromium like base.

When WebKit was forked into the Chromium repo and renamed Blink, this restriction was relaxed somewhat. Blink still provides its own URL object using its own string type, so the insulation that the Utility layer provides is still useful. However, some STL strings and calls to base functions have gradually been added where doing so is possible.

Unsafe buffer usages

To ensure that the valid length of a buffer is always reliably conveyed, we are in the process of migrating functions that take a raw pointer and a size for string data. These functions are being updated to accept std::string_view or std::u16string_view instead. This change also applies to functions that only accept a raw pointer, which are being updated to take a string_view to prevent buffer overflows.
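The sketch below shows one plausible shape of this migration, using a hypothetical helper that counts `/` characters. The std::string_view overload receives the pointer and length as a single value. A legacy pointer-and-length overload is kept only as a thin forwarding wrapper so existing callers keep compiling while they are converted. Both function names are made up for illustration.

```cpp
#include <cstddef>
#include <string_view>

// New style (hypothetical helper): pointer and length travel together in a
// std::string_view, so the valid range is conveyed with the data.
int CountSlashes(std::string_view spec) {
  int count = 0;
  for (char c : spec) {
    if (c == '/') {
      ++count;
    }
  }
  return count;
}

// Legacy style (hypothetical): a raw pointer plus a separate length that the
// caller must keep in sync by hand. During migration it can simply forward,
// but it still trusts |spec_len| to stay within the buffer.
int CountSlashesLegacy(const char* spec, int spec_len) {
  return CountSlashes(
      std::string_view(spec, static_cast<std::size_t>(spec_len)));
}
```

Once every caller passes a string_view, the legacy overload can be deleted and the unchecked length disappears with it.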

Currently, the codebase contains a mix of both the old, unsafe functions and the new, safer string_view-based functions. Our goal is to eventually convert all of them. This ongoing effort is tracked in crbug.com/350788890.

Caution about terminology

For historical reasons, the code currently uses the term “Standard URL” to mean what the URL Standard calls “Special URLs”, excluding “file:” scheme URLs. This terminology is outdated and can cause confusion, particularly now that non-special URLs are also supported (crbug/1416006). For consistency and clarity, it is recommended to switch to the more accurate term “Special URL” throughout the codebase. However, this change should be carefully planned and executed because the current terminology is widely used in both internal and third-party code. For the time being, “Standard URL” and “Special URL” are used interchangeably.
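To make the distinction concrete, this sketch encodes the relationship described above: the URL Standard's special schemes, and the historical "standard" meaning as those special schemes minus `file`. Both helper names are hypothetical. The real code may also treat embedder-registered schemes as standard, which this sketch ignores.

```cpp
#include <array>
#include <string_view>

// The special schemes defined by the URL Standard.
constexpr std::array<std::string_view, 6> kSpecialSchemes = {
    "ftp", "file", "http", "https", "ws", "wss"};

// Hypothetical helper: true if |scheme| is one of the special schemes.
bool IsSpecialSchemeSketch(std::string_view scheme) {
  for (std::string_view special : kSpecialSchemes) {
    if (special == scheme) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper for the historical "Standard URL" sense: special
// schemes other than "file". Embedder-registered schemes are ignored here.
bool IsStandardInHistoricalSenseSketch(std::string_view scheme) {
  return scheme != "file" && IsSpecialSchemeSketch(scheme);
}
```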