Source code and bibliographic files are hosted on GitHub: flaminglasrswrd/citereddit.
The database is located in the citereddit Zotero group.
Many subreddits revolve around academic citations. /r/Nootropics/, /r/DrugNerds/, and /r/FoodNerds/ are among my favorite sources for new science. Unfortunately, these academic subreddits consist of a loose collection of URLs. They lack the shareability of a bibliographic database.
I propose creating a research database through a popular reference manager like Zotero or Mendeley. We could, of course, do this manually, but ain’t nobody got time for that. Scraping for citations automatically sounds like a better option.
The announcement post on reddit explains the impetus and collects the initial planning comments.
The current version of citereddit is quite messy. It is run from the command line like this:
python scrape.py -s [API_SECRET] -i [API_ID] -r [SUBREDDIT] -u [USERNAME] -p [PASSWORD]
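For illustration, here is a minimal sketch of how those flags could be parsed with argparse. The short flags come from the command above; the long option names, help strings, and the `build_parser` helper are my assumptions and are not necessarily what scrape.py does.

```python
import argparse


def build_parser():
    """Build a parser for the five flags shown in the command above."""
    # Long names and help text are hypothetical; only the short flags
    # (-s, -i, -r, -u, -p) are taken from the actual command line.
    parser = argparse.ArgumentParser(
        prog="scrape.py",
        description="Scrape a subreddit for citable links.",
    )
    parser.add_argument("-s", "--secret", required=True, help="reddit API secret")
    parser.add_argument("-i", "--client-id", required=True, help="reddit API id")
    parser.add_argument("-r", "--subreddit", required=True, help="subreddit to scrape")
    parser.add_argument("-u", "--username", required=True, help="reddit username")
    parser.add_argument("-p", "--password", required=True, help="reddit password")
    return parser


# Usage with placeholder values; a real script would call parse_args()
# with no arguments so that it reads sys.argv.
args = build_parser().parse_args(
    ["-s", "SECRET", "-i", "ID", "-r", "Nootropics", "-u", "someone", "-p", "hunter2"]
)
print(f"Scraping /r/{args.subreddit} as {args.username}")
```

Making every flag required keeps the sketch honest about the current state of things: there are no defaults, and the script cannot run without all five values.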
For this to work, you must also have an active Zotero translation server running (built from source) and have built the PRAW library yourself with my changes applied. I don’t expect anyone to be able to do this. Future versions will be considerably more user-friendly. For now, the script outputs three files for each call.
So a call for subreddit
Library.bib is in BibTeX format and can be imported using a reference manager like Zotero. Main.log is a complete debug log for each run.
urls.txt is a one-per-line list of URLs imported by the script. If a URL was successfully imported, it is followed by **(successful)**. This file and the debug information are intended to be used later to prevent duplicates.
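The duplicate check is not written yet, but it could look something like the sketch below. It assumes each line of urls.txt is a URL optionally followed by whitespace and the **(successful)** marker; the function names and the example.org URLs are made up for illustration.

```python
SUCCESS_MARKER = "**(successful)**"


def load_imported_urls(lines):
    """Return the set of URLs that carry the success marker."""
    imported = set()
    for line in lines:
        line = line.strip()
        if line.endswith(SUCCESS_MARKER):
            # Drop the marker and any whitespace separating it from the URL.
            url = line[: -len(SUCCESS_MARKER)].strip()
            if url:
                imported.add(url)
    return imported


def filter_new(candidates, imported):
    """Keep only candidate URLs that were not imported on an earlier run."""
    return [url for url in candidates if url not in imported]


# Hypothetical contents of urls.txt from a previous run.
previous_run = [
    "https://example.org/paper-a **(successful)**",
    "https://example.org/paper-b",
    "https://example.org/paper-c **(successful)**",
]
seen = load_imported_urls(previous_run)
todo = filter_new(["https://example.org/paper-a", "https://example.org/paper-d"], seen)
print(todo)
```

URLs that failed on an earlier run are deliberately not treated as seen, so they get another chance the next time the script runs.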
- scrape subreddit for links
- import links to Zotero using web translators
- export group to Mendeley or other formats
- integrated reference manager group storage
- hosted storage
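As a rough sketch of the import and export steps in the list above, the snippet below builds the two HTTP requests that a Zotero translation server accepts in its currently documented form: a POST of the page URL to /web, and a POST of the returned item JSON to /export. This is an assumption about the server interface, which may differ from the version I built from source; the server address and helper names are hypothetical, and nothing is sent over the network here.

```python
import urllib.request

# Assumption: the translation server listens on its default local port.
TRANSLATION_SERVER = "http://127.0.0.1:1969"


def build_web_request(url, server=TRANSLATION_SERVER):
    """Build a request asking the server to translate a web page into items."""
    return urllib.request.Request(
        f"{server}/web",
        data=url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def build_export_request(items_json, fmt="bibtex", server=TRANSLATION_SERVER):
    """Build a request converting translated items (a JSON string) to `fmt`."""
    return urllib.request.Request(
        f"{server}/export?format={fmt}",
        data=items_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Building the request does not contact the server; to actually send it,
# pass the request to urllib.request.urlopen() while the server is running.
request = build_web_request("https://example.org/some-paper")
print(request.get_method(), request.full_url)
```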
Sharing the full-text content of references which are obtained from behind a paywall probably infringes copyright.
Zotero offers annotation syncing with each reference. Mendeley has group editing of PDFs. The BibTeX citation format has a “note” field.
Reddit is not the only informal source of citations. Examine is a very reputable source in the world of supplements, boasting hundreds of thousands of citations. It is referenced often on these subreddits. Integration with their internal reference database (if they have one) would be mutually beneficial.
TODO items in scrape.py. There are many.