Add support for MySQL #1

Merged
muir merged 20 commits from mysql into main on Mar 30, 2022
Conversation

@muir (Owner) commented Jul 16, 2021

This also backfills SkipIf.

MySQL doesn't have DDL transactions, so it needs a lot of help making sure that you don't shoot yourself in the foot.

Should sqltoken be its own repo?

Should dgorder be its own repo?

Two review comments on lsmysql/README.md (outdated; resolved).
@muir (Owner, Author) commented Jul 16, 2021

Thanks, @aaronlehmann. I've fixed those typos.

@aaronlehmann (Collaborator) left a comment

A little concerned about the complexity of the hand-rolled tokenizer. Have you looked at any third-party lexer libraries like https://github.com/alecthomas/participle?

sqltoken/tokenize.go:

```go
	'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w' /*x*/, 'y', 'z',
	'A' /*B*/, 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
	'N', 'O', 'P', 'Q', 'R', 'S', 'T' /*U*/, 'V', 'W' /*X*/, 'Y', 'Z',
	'_':
```
@aaronlehmann (Collaborator):

This seems confusing and hard to maintain. At least there should be a comment explaining these values. It may make sense to handle these as ranges outside the switch.

@muir (Owner, Author):

This is a performance hack: anything missed will be caught by the unicode code path below. See added comment.
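For context, the bytes commented out of the excerpt above (x, B, U, X) are presumably handled by earlier cases because they can introduce special literal forms (hex, bit, and Unicode string literals). Below is a minimal standalone sketch of the fast-path pattern under discussion, with a hypothetical isIdentStart helper; it is an illustration of the technique, not the actual sqltoken code:

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentStart reports whether r can begin an identifier. The case
// list is a fast path for common ASCII bytes; omitting a byte from it
// is harmless because the unicode fallback below still catches it.
func isIdentStart(r rune) bool {
	switch r {
	case 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
		'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
		'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
		'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
		'_':
		return true // fast path: plain ASCII identifier characters
	}
	// Slow path: anything the switch missed, including non-ASCII letters.
	return unicode.IsLetter(r)
}

func main() {
	for _, r := range []rune{'a', '_', 'é', '7'} {
		fmt.Printf("%c: %v\n", r, isIdentStart(r))
	}
}
```

The benefit of the enumerated cases is that the common ASCII path never has to consult the unicode tables.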

sqltoken/tokenize.go:

```go
	'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
	'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
	'_',
	'0', '1', '2', '3', '4', '5', '6', '7', '8', '9':
```
@aaronlehmann (Collaborator):

I think this would be cleaner with range expressions inside a general `switch {`.
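For illustration, a hypothetical sketch of the range-based rewrite being suggested (a drop-in companion to the isIdentStart sketch above, reusing its unicode import; not code from the PR):

```go
// isIdentStartRanges expresses the same classification with range
// comparisons in a general switch instead of enumerating every byte.
func isIdentStartRanges(r rune) bool {
	switch {
	case r >= 'a' && r <= 'z',
		r >= 'A' && r <= 'Z',
		r == '_':
		return true
	}
	return unicode.IsLetter(r)
}
```

Whether this compiles to something slower than the enumerated case list is exactly what the exchange below is about.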

@muir (Owner, Author):

I've added a comment: this is a performance hack. Would a range expression perform better?

@aaronlehmann (Collaborator):

I would think the performance difference with a range expression would be imperceptible in the context of running database migrations.

@muir (Owner, Author):

I'm planning to use this code to fix sqlx: it would be run on nearly every (or maybe every?) query done by sqlx. Performance matters in that context.

@muir (Owner, Author) commented Jul 19, 2021

> A little concerned about the complexity of the hand-rolled tokenizer. Have you looked at any third-party lexer libraries like https://github.com/alecthomas/participle?

I would be concerned too, except that I used the coverage tool and hit 100% coverage (the coverage tool is buggy, so it doesn't report 100%, but if you look at the details, it is 100%). Not all inputs tried, of course.

Part of the motivation for this is that I noticed that sqlx is incorrectly parsing SQL when doing substitutions: what they're doing is high-performance but wrong. My intent is to open a PR against sqlx to use my tokenizer. Due to the way I wrote my tokenizer, I expect its performance to be very good.
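To make the failure mode concrete, here is a hypothetical, self-contained demonstration of the bug class described above; the naiveRebind function is invented for this example and is not sqlx's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// naiveRebind rewrites every '?' to $1, $2, ... with no awareness of
// SQL quoting, so a '?' inside a string literal gets rewritten too.
func naiveRebind(query string) string {
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			b.WriteString("$" + strconv.Itoa(n))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	q := "SELECT * FROM t WHERE note = 'why?' AND id = ?"
	fmt.Println(naiveRebind(q))
	// Prints: SELECT * FROM t WHERE note = 'why$1' AND id = $2
	// The first rewrite is wrong: that '?' is data inside a string
	// literal, which a quote-aware tokenizer would leave untouched.
}
```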

If I used a regular lexer, I would need to treat MySQL and PostgreSQL as separate grammars.

Tradeoffs galore!

@muir merged commit 83aa21d into main on Mar 30, 2022
@muir deleted the mysql branch on Mar 30, 2022