-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for MySQL #1
Conversation
Thanks, @aaronlehmann I've fixed those typos. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little concerned about the complexity of the hand-rolled tokenizer. Have you looked at any third-party lexer libraries like https://github.com/alecthomas/participle?
sqltoken/tokenize.go
Outdated
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w' /*x*/, 'y', 'z', | ||
'A' /*B*/, 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', | ||
'N', 'O', 'P', 'Q', 'R', 'S', 'T' /*U*/, 'V', 'W' /*X*/, 'Y', 'Z', | ||
'_': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems confusing and hard to maintain. At least there should be a comment explaining these values. It may make sense to handle these as ranges outside the switch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a performance hack: anything missed will be caught by the unicode code path below. See added comment.
sqltoken/tokenize.go
Outdated
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', | ||
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', | ||
'_', | ||
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be cleaner with range expressions inside a general switch {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment: this is a performance hack. Would a range expression perform better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think the performance difference with a range expression would be imperceptible in the context of running database migrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to use this code to fix sqlx --- it would be run on nearly every (or maybe every?) query done by sqlx. Performance matters in that context.
I would be concerned too except that I used the coverage tool and hit 100% coverage (the coverage tool is buggy so it doesn't report 100% but if you look at the details, it is 100%). Not all inputs tried, of course. Part of the motivation for this is that I noticed that sqlx is incorrectly parsing SQL when they're doing substitutions -- what they're doing is high performance but wrong. My intent is to open a PR against sqlx to use my tokenizer. Due to the way I wrote my tokenizer, I expect it's performance to be very good. If I used a regular lexer, I would need to treat MySQL and PostgreSQL as separate grammars. Tradeoffs galor! |
* mysql: add github action * mysql: github action: fix * mysql: github action: fix2 * mysql: github action: fix3
This also backfills SkipIf
MySQL doesn't have DDL transactions so it needs a lot of help making sure that you don't shoot yourself in the foot.
Should
sqltoken
be its own repo?Should
dgorder
be its own repo?