Repository for the RecSys 2021 paper: I want to break free! Recommending friends from outside the echo chamber

Recommender systems serve as mediators of information consumption and propagation . In this role, these systems have been recentlycriticized for introducing biases and promoting the creation of echo chambers and filter bubbles, thus lowering the diversity of bothcontent and potential new social relations users are exposed to. Some of these issues are a consequence of the fundamental conceptson which recommender systems are based on. Assumptions like the homophily principle might lead users to content that they alreadylike or friends they already know, which can be naïve in the era of ideological uniformity and fake news. A significant challenge inthis context is how to effectively learn the dynamic representations of users based on the content they share and their echo chamberor community’s interactions to recommend potentially relevant and diverse friends from outside the network of influence of the users’echo chamber. To address this, we devise FRediECH (a Friend RecommenDer for breakIng Echo CHambers), an echo chamber-awarefriend recommendation approach that learns users and echo chamber representations from the shared content and past users’ andcommunities’ interactions. Comprehensive evaluations over Twitter data showed that our approach achieved better performance (interms of relevance and novelty) than state-of-the-art alternatives, validating its effectiveness.

The overall architecture of FRediECH is schematized in the following figure. FRediECH takes as input users and their interactions (replies, retweets, and mentions). The output is an estimated of the relationship or interaction strength between two users. Each user is represented by an embedding of their latent features and an embedding of their tweets. FRediECH’s architecture is inspired by the Deep & Wide architecture. The Wide part bases its prediction only on the latent embeddings. On the other hand, the Deep part concatenates the latent and tweet embeddings to produce a complete user representation. These vectors are processed by two blocks of GCN layers, representing the target users and the recommendees. The outputs of each GCN are passed through two dense layers to make a prediction. Finally, the predictions of the Deep and Wide components are added to produce the final strength estimation.

Data

Evaluation was based on the obamacare data collection, which includes tweets related to the obamacare and #aca hashtags in Twitter. The data collection also includes an estimated polarity of users sharing the tweets, based in the model in "Tweeting from left to right: Is online political communication more than an echo chamber?", in which scores allows distinguishing between democrat and republican users.

Tweets in the obamacare collection were retrieved using the Faking it!. For each tweet, we retrieve its content, user information and conversational thread (i.e, replies) and its retweets. From the original set of tweets in \texttt{obamacare}, we were able to retrieve approximately 8 million public tweets belonging to 8,164 users, and 585,524 adjacent users (users that were mentioned or replied to but that did not write any tweet on the original data collection).

Note: As per Twitter TOS, the shared graphs only include user and tweet ids involved in the interactions. No user information was included in the analysis.

Note2: The original data collection can be obtained upon request to the authors of "Political Discourse on Social Media: Echo Chambers, Gatekeepers, and the Price of Bipartisanship".

Note 3: The Faking it! can be used to rehydrate the tweets collection based on the ids in the graphs. In the folder data you will find:

Folders training and test. Each folder contains four files, one for each type of interaction (mentions, replies, retweets) and one mixing the three types.

Additional files:

df_tweets_users_created.csv tweets metadata incluiding tweet_id, user_id and date of creation. As the file contains long ids, try not to open it with spredsheet software.
df_user_leaning.csv users leaning information, including their own_leaning, leaning_mentions, leaning_retweets, leaning_replies, leaning_all. As the file contains long ids, try not to open it with spredsheet software.
train_ds.pickle, train_ds_no_neg_smp.pickle and test_ds.pickle are helper files for FRediECH to avoid needing to have everything loaded into memory.
embeddings-dates.h5 is the embedding model to compute node distances.
cos.npy contains pre-computed cosine similarity between nodes.
training_tweeets.npy contains the BERT representation of training tweets.

Note: this files are stored in GDrive do to GitHub size limitations.

Graph format

NetworkX was used for supporting the graphs.

Node attributes

id. User ID.
central. Whether the node/user belongs to the largest connected component.

Example of node with attributes:

(1234567890, {'central': True})

Edge attributes

source_id. User ID.
target_id. User ID.
weight. Number of interactions between the users.
date, date_replies and date_retweets. datetime.datetime representing the first date in which the users interacted.
tweets_mentions, tweets_replies, tweets_retweets. List of tweets for each type of interaction.

Example of edge with attributes:

(1234567890, 9876543210, {'weight': 5, 'date': datetime.datetime(2016, 10, 11, 6, 20, 14), 'date_replies': None, 'date_mentions': datetime.datetime(2016, 10, 11, 6, 20, 14), 'date_retweets': datetime.datetime(2016, 10, 11, 6, 20, 14), 'tweets_replies': [], 'tweets_mentions': [1234567890000987, 12345678765434567, 123456787654324567], 'tweets_retweets': [12345673454324567, 9876673454324567]})

Model

In the model folder, you can find the final FRediECH model reported in the paper.

Files

The files are divided into groups that should be run in order.

Data preparation

RecSys-Part2-New-FullGraph.ipynb: Graph processing and distance pre-computation.

Training model

RecSys-Part2.5-New-NegSampling-Train-Dual.ipynb: Trains the FRediECH_dual model.
RecSys-Part2.5-New-NegSampling-Train-Mentions-Retweets.ipynb: : Trains the FRediECH_mention_retweet model.
RecSys-Part2.5-New-NegSampling-Train-Mentions.ipynb: Trains the FRediECH_mention model.
RecSys-Part2.5-New-NegSampling-Train-No-Wide.ipynb: Trains the FRediECH_nowide model.
RecSys-Part2.5-New-NegSampling-Train-NoBert.ipynb: Trains the FRediECH_nobert model.
RecSys-Part2.5-New-NegSampling-Train-Replies-Mentions.ipynb: Trains the FRediECH_reply_mention model.
RecSys-Part2.5-New-NegSampling-Train-Replies-Retweets.ipynb: Trains the FRediECH_reply_retwees model.
RecSys-Part2.5-New-NegSampling-Train-Replies.ipynb: Trains the FRediECH_reply model.
RecSys-Part2.5-New-NegSampling-Train-Retweet.ipynb: Trains the FRediECH_retweet model.
RecSys-Part2.5-New-NegSampling-Train.ipynb: Trains the FRediECH model.
RecSys-Part2.5-New-Train-No-Wide.ipynb: Trains the FRediECH_nowide_nons model
RecSys-Part2.5-New-Train.ipynb: Trains the FRediECH_nons model.

Predictions

RecSys-Part3-New-NegSampling-Predict-Dual.ipynb: Predicts using the FRediECH_dual model.
RecSys-Part3-New-NegSampling-Predict-Mentions-Retweets.ipynb: Predicts using FRediECH_mention_retweet model.
RecSys-Part3-New-NegSampling-Predict-Mentions.ipynb: Predicts using FRediECH_mention model.
RecSys-Part3-New-NegSampling-Predict-No-Wide.ipynb: Predicts using FRediECH_nowide model.
RecSys-Part3-New-NegSampling-Predict-NoBert.ipynb: Predicts using FRediECH_nobert model.
RecSys-Part3-New-NegSampling-Predict-Replies-Mentions.ipynb: Predicts using FRediECH_reply_mention model.
RecSys-Part3-New-NegSampling-Predict-Replies-Retweets.ipynb: Predicts using FRediECH_reply_retwees model.
RecSys-Part3-New-NegSampling-Predict-Replies.ipynb: Predicts using FRediECH_reply model.
RecSys-Part3-New-NegSampling-Predict-Retweets.ipynb: Predicts using FRediECH_retweet model.
RecSys-Part3-New-NegSampling-Predict.ipynb: Predicts using FRediECH model.
RecSys-Part3-New-Predict-No-Wide.ipynb: Predicts using FRediECH_nowide_nons model
RecSys-Part3-New-Predict.ipynb: Predicts using FRediECH_nons model.

Citation

If you use FRediECH, please cite our work:

@inproceedings{10.1145/3460231.3474270,
  author = {Tommasel, Antonela and Rodriguez, Juan Manuel and Godoy, Daniela},
  title = {I Want to Break Free! Recommending Friends from Outside the Echo Chamber},
  year = {2021},
  isbn = {978-1-4503-8458-2/21/09},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3460231.3474270},
  doi = {10.1145/3460231.3474270},
  pages = {},
  numpages = {},
  keywords = {link prediction, echo chambers, social media, diversity, filter bubbles, social network analysis},
  location = {Amsterdam, Netherlands},
  series = {RecSys '21}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
data_preparation		data_preparation
model		model
prediction		prediction
training		training
LICENSE		LICENSE
README.md		README.md
frediech_architecture.png		frediech_architecture.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repository for the RecSys 2021 paper: I want to break free! Recommending friends from outside the echo chamber

Data

Graph format

Node attributes

Edge attributes

Model

Files

Data preparation

Training model

Predictions

Citation

Contact info:

About

Releases

Packages

Languages

License

tommantonela/frediech_recsys2021

Folders and files

Latest commit

History

Repository files navigation

Repository for the RecSys 2021 paper: I want to break free! Recommending friends from outside the echo chamber

Data

Graph format

Node attributes

Edge attributes

Model

Files

Data preparation

Training model

Predictions

Citation

Contact info:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages