Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Swift runtime ANTLRInputStream that can’t read Unicode scalars #3025

Merged
merged 5 commits into from
Oct 11, 2021

Conversation

niw
Copy link
Contributor

@niw niw commented Jan 3, 2021

Problems

ANTLRInputStream in Swift runtime is using array of Character as internal representation to get Unicode code point as Int to supply it to the lexer. However, Swift Character is not representing Unicode code point but representing Unicode grapheme, therefore if the original String contains characters that build up from multiple Unicode code points such as Family emoji (👨‍👩‍👧‍👦, which is represented in a single Character in Swift but actually U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466), ANTLRInputStream will not able to get each Unicode code point.

Solution

  • Use array of UnicodeScalar instead.
  • Add unit tests to ensure ANTLRInputStream can read each unicode code point.

Turns out, using `UnicodeScalarView` is extremely slow.
@hanjoes
Copy link
Member

hanjoes commented Oct 11, 2021

will try to merge it as part of #3301

@parrt parrt added this to the 4.9.3 milestone Oct 11, 2021
@parrt parrt merged commit c293e23 into antlr:master Oct 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants