Skip to content

A script to help ensure all Pleiades place pages get a snapshot at archive.org after they've been changed.

License

Notifications You must be signed in to change notification settings

isawnyu/pleiades_wayback

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Consults datasetter pleiades dataset for places modified within a certain time frame, then checks those pleiades URIs live on the web to make sure the place hasn't been withdrawn or deleted, then checks the internet archive to see if their snapshot of that page has been updated since the last pleiades revision and, if not, asks internet archive to grab a new snapshot.

Run it like:

python waybackit.py

That will run silently. To get some feedback, try:

python waybackit.py -v

By default, the script looks back over the past week. If you want to change that horizon:

python waybackit.py -s 2022-07-20

There are more options:

python waybackit.py -h
                    [-f FROM] [-u USERAGENT]

Ensure recently added/changed Pleiades places are archived

options:
  -h, --help            show this help message and exit
  -l LOGLEVEL, --loglevel LOGLEVEL
                        desired logging level (case-insensitive string: DEBUG, INFO,
                        WARNING, or ERROR (default: NOTSET)
  -v, --verbose         verbose output (logging level == INFO) (default: False)
  -w, --veryverbose     very verbose output (logging level == DEBUG) (default: False)
  -s START, --start START
                        date when to start archiving (default: one week ago)
  -e END, --end END     date when to end archiving (default: today)
  -d DATASETTER, --datasetter DATASETTER
                        path to location of datasetter cache (default:
                        ~/Documents/files/D/datasetter/data/cache)
  -f FROM, --from FROM  email address for http request headers (default:
                        pleiades.admin@nyu.edu)
  -u USERAGENT, --useragent USERAGENT
                        user agent for http request headers (default:
                        PleiadesGazetteer/today (+https://pleiades.stoa.org))

About

A script to help ensure all Pleiades place pages get a snapshot at archive.org after they've been changed.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages