This file explains how the three questions have been solved.
- To access the abstract of each paper the grobid_tei_xml parser was used. The abstract is available as .abstract attribute. To create a worldcloud out of the abstract the wordcloud and pyplot were used.
- In order to count the number of figures in each paper the xmltodict parser was used. The list of figures can be accessed by ['TEI']['text']['body']['figure']. To get the total number the len() function was applied.
- In order to get the link the grobid_tei_xml parser was used again. If existing the link can be accessed by header.url