Data interpretation #701

Answered by igorbrigadir
somowennie asked this question in Q&A
Aug 19, 2023 · 1 comment · 5 replies

twarc-csv "flattens" the nested JSON structure of a tweet, adding some extra columns that should make the data easier to work with. The full list of columns is in https://github.com/DocNow/twarc-csv/blob/main/dataframe_converter.py#L13
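
To give a rough idea of what "flattening" means here, this is a minimal sketch that turns nested keys into dotted column names. It is not twarc-csv's actual code: the `flatten` helper is hypothetical and the sample values are made up for illustration.

```python
def flatten(obj, prefix=""):
    """Collapse nested dicts into one dict with dotted keys (illustrative helper)."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up sample shaped like part of a tweet object; not real data.
tweet = {"id": "1", "public_metrics": {"like_count": 3, "retweet_count": 1}}
row = flatten(tweet)
print(sorted(row))
```

Each dotted key would then correspond to one column in the CSV, which is why column names like `entities.hashtags` mirror the path in the original JSON.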

The best place to see where it all comes from is the Data Dictionary here: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

It also does some pre-processing, such as extracting the hashtags without their character indexes in the text. For example, in the CSV, entities.hashtags is a list of hashtags extracted from the hashtags part of the entities object:

{"hashtags": [{"start": 70, "end": 75, "tag": "Lega"}, {"start": 176, "end": 192,…
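
As a small sketch of that kind of pre-processing (an assumption about the general idea, not the code twarc-csv runs), the hypothetical helper `hashtag_tags` below keeps only the `tag` values and drops the `start`/`end` indexes. It uses only the first hashtag from the example above, since the rest is truncated.

```python
import json


def hashtag_tags(entities):
    """Return only the tag strings from an entities object (illustrative helper)."""
    return [h["tag"] for h in entities.get("hashtags", [])]


# Only the first hashtag from the example; the remainder is truncated in the post.
entities = json.loads('{"hashtags": [{"start": 70, "end": 75, "tag": "Lega"}]}')
tags = hashtag_tags(entities)
print(tags)
```

A tweet without hashtags has no `hashtags` key in `entities`, so the helper falls back to an empty list in that case.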

Answer selected by somowennie
Category
Q&A
Labels
None yet
3 participants