Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular parser performance improvements #28

Merged
merged 5 commits into from
Jan 24, 2016

Conversation

thunderer
Copy link
Owner

As discussed with @rhukster in #26 and #27 there are massive performance problems in RegularParser when trying to parse big documents. This PR provides significant improvements for parsing speed while maintaining the correctness this parser was built for. Blackfire reports -91% performance improvement on "small" example and "full" completes in about 3.5s (which is at least 4500% improvement since it was reported that it didn't complete in more than half an hour).

@thunderer thunderer self-assigned this Jan 23, 2016
@thunderer
Copy link
Owner Author

By further analyzing Blackfire call graphs I see that tokenization is now a bottleneck, consumes 75% of time for "small" example and 97% of "full" one. Right now it uses an iterated regex approach, need to work on better one.

@rhukster
Copy link

Blackfire sounds like an interesting tool. I tried it once, but couldn't get it to work, maybe i'll give it another shot.

…rnating matches which completes last performance bottleneck
@thunderer
Copy link
Owner Author

By rewriting tokenizer to use single preg_match_all() I managed to further speed up RegularParser - now "full" example takes less than 350-400ms to complete (down from 3.5s). I think that concludes the current batch of optimizations and since the build is green, I'm merging it into master.

thunderer added a commit that referenced this pull request Jan 24, 2016
Regular parser performance improvements
@thunderer thunderer merged commit 6b8bf5d into master Jan 24, 2016
@thunderer thunderer deleted the regular-parser-performance branch January 24, 2016 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants