
Collection of Python scripts used in my MSc thesis at the Oxford Internet Institute. The thesis investigated differences in "crowdsourced history" between Wikipedia language editions by analysing the hyperlink ego network of the "World War 2" article in English and German editions of Wikipedia.

ccosborne/Wikipedia-Networks

Wikipedia Network Analysis

Python scripts used to collect and analyse Wikipedia data for my MSc thesis at the Oxford Internet Institute.

The thesis investigated how the content and structure of "crowdsourced history" differ between Wikipedia language editions, and how those differences are developed and organised by hyperlinks: the hyperlink structure connects articles and governs how readers can navigate between them. It examined this through a case study of the hyperlink ego networks of the "World War 2" articles in the English and German editions of Wikipedia.

Python script to scrape hyperlink data from Wikipedia developed by @pgilders (forked)
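
For readers unfamiliar with this step, here is a minimal sketch of how the outgoing links of one article can be requested from the public MediaWiki API. It is not the forked script by @pgilders, whose method is not described here. The function names, the User-Agent string and the article title passed in are illustrative assumptions; only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://{lang}.wikipedia.org/w/api.php"


def build_links_url(title, lang="en", cont=None):
    """Build a MediaWiki API URL listing the article-namespace links on a page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,   # main (article) namespace only
        "pllimit": "max",
        "format": "json",
    }
    if cont:
        # Continuation tokens returned by the previous response.
        params.update(cont)
    return API_URL.format(lang=lang) + "?" + urllib.parse.urlencode(params)


def parse_links(response):
    """Extract linked article titles from one decoded API response."""
    links = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            links.append(link["title"])
    return links


def fetch_links(title, lang="en"):
    """Follow API continuation until all links of the page are collected."""
    links, cont = [], None
    while True:
        request = urllib.request.Request(
            build_links_url(title, lang, cont),
            # Hypothetical User-Agent; replace with your own contact details.
            headers={"User-Agent": "ego-network-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        links.extend(parse_links(data))
        cont = data.get("continue")
        if not cont:
            return links
```

Calling `fetch_links(title, lang="en")` and `fetch_links(title, lang="de")` with the relevant article titles would give the first layer of each ego network; repeating the call for every returned title would give the links among the neighbours.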

Python scripts in Jupyter Notebooks, developed by @ccosborne, to:

  • collect properties of single articles and compare such properties between language editions
  • analyse hyperlink networks, e.g. network structure, degree distribution, components, PageRank centrality, etc.
  • collect attributes (i.e. properties) for all articles in hyperlink networks (for correlation analysis and visualisation)
  • collect the dates mentioned in the network's articles to visualise an aggregate timeline across thousands of articles
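
To illustrate the kind of network analysis listed above, the following sketch computes in-degree and out-degree counts and PageRank centrality for a small directed hyperlink network. It uses plain Python and textbook power iteration; the notebooks may well rely on a graph library instead. The node names, the damping factor of 0.85 and the iteration count are assumptions chosen for the example.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return (in_degree, out_degree) dictionaries for a list of directed edges."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


def pagerank(edges, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration over directed edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for dst in out_links[node]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Toy ego network: "A" stands in for the focal article (hypothetical labels).
edges = [("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("D", "A")]
in_deg, out_deg = degree_counts(edges)
ranks = pagerank(edges)
most_central = max(ranks, key=ranks.get)
```

In this toy network the focal article "A" receives links from every other node, so it has the highest in-degree and ends up as `most_central`.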

Please note that these are not all the scripts used in the research, but they cover a large portion of the work. Other work included comparing hyperlink networks via Jaccard similarity, analysing edit wars in articles, correlating article attributes with centrality values, and visualising networks with Gephi. If you would like to read the thesis, feel free to contact me at: caileanosborne@gmail.com
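
As an illustration of the Jaccard comparison mentioned above (the script itself is not included in this repository), the sketch below computes the Jaccard similarity of two sets of linked articles, that is, the size of their intersection divided by the size of their union. It assumes that articles from both language editions have already been mapped to shared identifiers; the labels below are hypothetical, and returning 1.0 for two empty sets is a convention chosen for this example.

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical neighbour sets of the same article in two language editions.
english_links = {"A", "B", "C"}
german_links = {"B", "C", "D"}
similarity = jaccard_similarity(english_links, german_links)  # 2 / 4 = 0.5
```

A value of 1.0 means both editions link to exactly the same articles, and 0.0 means they share none.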

Languages

  • Jupyter Notebook 97.4%
  • Python 2.6%