GitHub - burness/arxiv_tools: A tool to get the arxiv papers

Structure

├── README.md
├── email
│   └── user_email.list
├── flask
├── papers
│   └── 2016-01-05
│       └── cs.cv
└── spider

The email folder contains the scripts that send emails to users.
The flask folder contains the scripts for the web interface.
The papers folder contains the papers fetched from arxiv.org. Each subfolder is named by download date (e.g. 2016-01-05) and holds one folder per research area, such as cs.cv.
The spider folder contains the scripts that crawl papers from arXiv.
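
The date/area layout of the papers folder can be sketched as a small path helper. This is only an illustration: `paper_dir` is a hypothetical name, not a function from this repository, and it assumes the date folder uses the ISO `YYYY-MM-DD` form shown in the tree above.

```python
from datetime import date
from pathlib import Path


def paper_dir(root: Path, day: date, area: str) -> Path:
    """Return <root>/papers/<YYYY-MM-DD>/<area> (hypothetical helper)."""
    return root / "papers" / day.isoformat() / area


target = paper_dir(Path("."), date(2016, 1, 5), "cs.cv")
print(target.as_posix())  # papers/2016-01-05/cs.cv
```

Calling `target.mkdir(parents=True, exist_ok=True)` before a download would create the folder if it does not exist yet.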

user_id	user_nickname	user_email	subject
1	hello	hello@hello.com	cs_cv
2	hello	hello@hello.com	cs_kl
3	hello2	hello2@hello.com	cs_cv
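
A minimal sketch of reading a user list in this layout, assuming the file is tab-separated with the header row shown above. The function name `emails_by_subject` is hypothetical, and the sample rows are simply the ones from the table.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the table above; tab-separated is an assumption.
SAMPLE = (
    "user_id\tuser_nickname\tuser_email\tsubject\n"
    "1\thello\thello@hello.com\tcs_cv\n"
    "2\thello\thello@hello.com\tcs_kl\n"
    "3\thello2\thello2@hello.com\tcs_cv\n"
)


def emails_by_subject(handle):
    """Group subscriber emails by subject from a tab-separated user list."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        groups[row["subject"]].append(row["user_email"])
    return dict(groups)


groups = emails_by_subject(io.StringIO(SAMPLE))
print(groups["cs_cv"])  # ['hello@hello.com', 'hello2@hello.com']
```

Grouping by subject first means each research area only needs to be downloaded once, however many users subscribe to it.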

Extract the information from the PDFs
Add multi-threaded PDF downloads
Add URL configuration, including the research area
Add a module that writes all paper info to 'summary.csv' in the pdf folder
- [] Filter files that failed to download out of summary.csv
Add the email module that formats per-area emails for users
Add the flask module, including adding a user email
Add a module for reading PDF files from Python, detailed in "Reading PDF Content with Python" (Python读取PDF内容)
[] Replace file writes with a SQLite database
- [] Replace write_file with write_sqite_file
- [] Replace run() in deploy_email and deploy_download_pdfs.py with SQLite versions
[] Add a paper recommendation module
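
The planned move from file writes to SQLite, together with filtering failed downloads, could look roughly like the sketch below. Everything here is an assumption for illustration: the table and column names, the `write_sqlite` and `failed_downloads` helpers, and the placeholder records are not taken from the repository.

```python
import sqlite3

# Hypothetical schema: the column names are assumptions, not the project's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id   TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    area       TEXT NOT NULL,
    downloaded INTEGER NOT NULL  -- 1 if the PDF was fetched, 0 if it failed
)
"""


def write_sqlite(conn, records):
    """Insert or update paper records given as (id, title, area, ok) tuples."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO papers VALUES (?, ?, ?, ?)",
        [(pid, title, area, int(ok)) for pid, title, area, ok in records],
    )
    conn.commit()


def failed_downloads(conn):
    """Return the ids of papers whose PDF download failed."""
    rows = conn.execute(
        "SELECT paper_id FROM papers WHERE downloaded = 0 ORDER BY paper_id"
    )
    return [row[0] for row in rows]


# Placeholder records, for demonstration only.
conn = sqlite3.connect(":memory:")
write_sqlite(conn, [
    ("id-001", "Example paper A", "cs.cv", True),
    ("id-002", "Example paper B", "cs.cv", False),
])
print(failed_downloads(conn))  # ['id-002']
```

Keeping a `downloaded` flag in the same table lets the failed files be retried or excluded with one query, instead of post-processing a CSV file.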

deploy_download_pdfs.py: Scrapes the PDFs each week according to user_info.csv
deploy_email.py: Sends the emails
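
As a rough illustration of the email step, the sketch below builds a plain-text digest with the standard library's `email.message.EmailMessage`. It is not the code in deploy_email.py: `build_digest`, the sender address, the subject wording, and the paper titles are all made up for the example, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def build_digest(sender, recipient, area, titles):
    """Build (but do not send) a plain-text digest for one research area."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"New arXiv papers in {area}"
    # One bullet line per paper title.
    msg.set_content("\n".join(f"- {title}" for title in titles))
    return msg


# Placeholder addresses and titles, for demonstration only.
msg = build_digest(
    "digest@example.com", "hello@hello.com", "cs_cv",
    ["Example paper A", "Example paper B"],
)
print(msg["Subject"])  # New arXiv papers in cs_cv
```

A message built this way can be handed to `smtplib.SMTP(...).send_message(msg)` once a mail server is configured.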
