-
-
Notifications
You must be signed in to change notification settings - Fork 11
Specify all presentation sequences #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It's a lot of them. My recentish pr only covered a small part. |
I'll be the first to bring up automation here: That means any concerns about such automation should not block this PR. |
I had to find a way to be confident in the content of this PR anyway, so I figured I could make tests so that the automation part is already done. I left a commit with the tests failing, so that you can see the output in CI. Notably, there was one mistake introduced in ae98eb0: I added the text presentation selector to For now, I query the list of presentation sequences defined by Unicode from the internet every time the tests are run. Feel free to suggest a better way. |
sym
and emoji
This PR now contains meta changes. |
How about caching it in a file Also, I'd like the web-request part of this test to be opt-in to begin with (the test can still run without it if the cache file already exists). IMO running |
Initially I thought it may be better to download the file as part of a build script for tests only (which would cache it for as long as the source code is not modified), but I'm not sure how to run something only for building tests? Also, maybe this is a bad idea for some other reason. I think your solution is probably better anyway. Maybe the file could even be part of the repo so that it doesn't need to be downloaded every time, but somehow that doesn't feel right to me. Also, that might have some licensing issues (we would probably need to include https://www.unicode.org/license.txt as well). Additionally, I want to clarify that the dependency on |
Yeah I got that, but it's still a huge dep tree that not everyone may want to have to download before running [the rest of] the tests. |
If we're gonna go with the downloading thing, then |
I switched to |
According to The Cargo Book, it is not possible to have a build dependency for tests only, so this feature trick is necessary:
|
Would it be possible to have some shorthand that specifies text vs emoji form? |
Could be Footnotes |
Or |
An alternative is to simply paste the corresponding codepoints in the symbol lists |
That defeats the point of having escape sequences to begin with. We could also paste in the zwj and friends verbatim, but that's terrible for editing, diffing, etc. |
This (only with |
I think this would be good |
The other variation selectors have multiple roles, and therefore aren't easy to name. Maybe There's a couple dozen mathematical symbol variants involving VS1 https://www.unicode.org/Public/16.0.0/ucd/StandardizedVariants.txt Edit: Only selectors 1-4, 7, 15 and 16 are currently in use. |
I implemented |
I think all the concerns were addressed and so this is ready for review |
This PR fixes #21, fixes #23, and closes #25, by adding the appropriate variation selectors to symbols that exist in both
sym
andemoji
. I verified that all the variation sequences are defined in by Unicode.12I have marked this PR as a draft because the next step is to add variation selectors to all symbols that allow it, whether present in both
sym
andemoji
or not, to prevent ambiguity.This made me realize that some emojis are poorly named, but improving this is a task for a separate PR.
Related: typst/typst#6489 (comment).
Footnotes
https://www.unicode.org/reports/tr51/#Emoji_Variation_Sequences ↩
https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt ↩