Implement Shortcode Search #7

Closed
MyriaCore opened this issue Jun 28, 2020 · 3 comments · Fixed by #9

@MyriaCore
Contributor

It'd be cool if emoji shortcodes like :zap: and :tada:, used by sites like GitHub, Discord, Slack, etc., could be included in the search data.

Sites such as emojipedia already have data on this. It's just a matter of finding where the data is, and modifying EmojiSpider.py to scrape it.

@MyriaCore
Contributor Author

emojipedia seems to use the Unicode name of each emoji in the URL of its page. For example, the page for 😄 is listed under grinning-face-with-smiling-eyes, which turns out to be a slug of that emoji's name: lowercased, with spaces replaced by dashes.

So, all we'd need to do to create the URL from the listing on unicode.org is make all characters in the emoji name lowercase, replace all spaces with dashes, and then insert the result into the string https://emojipedia.org/%s/. From there, you'd just scrape the resulting page for the first list element in the Shortcodes section.
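A rough sketch of those two steps, in case it helps. The function and class names here are hypothetical, and the parser assumes the Shortcodes section is marked by an element with id="shortcodes" followed by a list; I haven't confirmed that against the real page markup, so treat it as a starting point rather than what EmojiSpider.py should look like.

```python
import re
from html.parser import HTMLParser

# URL template taken from the comment above.
EMOJIPEDIA_URL = "https://emojipedia.org/%s/"


def emojipedia_url(name):
    """Lowercase the emoji name, swap spaces for dashes, fill the template."""
    slug = name.lower().replace(" ", "-")
    return EMOJIPEDIA_URL % slug


class FirstShortcodeParser(HTMLParser):
    """Collects the text of the first <li> after an element with id="shortcodes".

    ASSUMPTION: the id="shortcodes" marker and the list layout are guesses
    about the page structure, not something verified against emojipedia.
    """

    def __init__(self):
        super().__init__()
        self.in_section = False  # seen the shortcodes marker yet?
        self.in_item = False     # currently inside the first <li>?
        self.done = False        # first <li> fully read?
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if dict(attrs).get("id") == "shortcodes":
            self.in_section = True
        elif self.in_section and tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if self.in_item and tag == "li":
            self.in_item = False
            self.done = True

    def handle_data(self, data):
        if self.in_item:
            self.parts.append(data)


def first_shortcode(html):
    """Return the first :shortcode: in the Shortcodes section, or None."""
    parser = FirstShortcodeParser()
    parser.feed(html)
    match = re.search(r":[^\s:]+:", "".join(parser.parts))
    return match.group(0) if match else None
```

Fetching the page is left out on purpose; whatever EmojiSpider.py already uses for requests could hand the response body to first_shortcode.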

@MyriaCore
Contributor Author

MyriaCore commented Jul 20, 2020

I'm just gonna move ahead and try to draft this functionality in a new branch of my fork. The branch is based on emoji-packs@2cdd2c8, which currently lives as PR #8.

@MyriaCore
Contributor Author

MyriaCore commented Jul 20, 2020

There's also another kind of page that might be more useful. Here's the example page for 😄. The format of the URL is https://emojipedia.org/emoji/%s/, so we'd be able to insert the emoji itself in place of the %s. This might be more consistent than trying to find the page by the emoji's CLDR short name, but it looks like the page would ultimately be harder to parse: the normal page gives a #shortcodes CSS class to the Shortcodes section, which the /emoji page seems to lack.

These /emoji pages seem to be most useful when used with the list of all emojis. In the future, we might use them as the primary scraping target. For now, though, I think we'll stick with the CLDR short name pages.
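For reference, building the /emoji URL could look something like the sketch below. The function name and the percent_encode flag are hypothetical. Whether the site wants the raw character or its percent-encoded UTF-8 bytes is an assumption to check; most HTTP clients will percent-encode non-ASCII paths anyway.

```python
from urllib.parse import quote

# URL template taken from the comment above.
EMOJI_PAGE_URL = "https://emojipedia.org/emoji/%s/"


def emoji_page_url(emoji, percent_encode=True):
    """Insert the emoji into the template, optionally percent-encoding it."""
    # quote() encodes the emoji's UTF-8 bytes as %XX escapes.
    token = quote(emoji, safe="") if percent_encode else emoji
    return EMOJI_PAGE_URL % token


print(emoji_page_url("😄"))
print(emoji_page_url("😄", percent_encode=False))
```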
