A Twitter bot that monitors for PubMed links and checks for full papers: @ScholarlyBot

I am trying to improve my coding skills, so I pieced together a Python script that sarcastically replied to people who wrote “Dr Oz” and “great” in a tweet. That account was quickly suspended, so I made something a little more useful: a bot that checks for full texts of research papers indexed by Google Scholar, using Twitter as the interaction medium, through the account @ScholarlyBot.

It does 5 things:

  1. You can @ it a Google Scholar query and it will check if the first result has an indexed full text available for free. It sends back the result of the check. Because it only grabs the first result, it helps to include information like an author, year, or journal if the title is generic. You don’t have to be following the account to do this. Note: put the ‘@ScholarlyBot’ before the query.
    1. (Added 8.29) Now you can @ it PubMed or journal article links and it will check for the full text. PubMed links will be more reliable until I am able to test further. Be sure to begin the tweet with the @ symbol.
  2. If you follow the account, it will follow you back and monitor your tweets for links to the PubMed database, get the abstract title, then query Google Scholar for the full text. Only if it finds a full text does it send you a link. Edit 8.17: I changed this so that it also replies when it doesn’t find a full text. Just unfollow it if you want to stop receiving messages. Note that there is a delay in Google Scholar indexing new abstracts and full texts (I don’t know whether it is variable, and I don’t think they publish this information), so keep that in mind when it searches for papers from this year.
  3. (Added 8.27) If you are following the account, it will now scan for non-PubMed journal links and search Google Scholar for the full text. At this point it seems fairly reliable in my testing, but since there doesn’t seem to be a standard for how journals encode their data, I’ll continue monitoring and improving it.
  4. (Added 9.14) You can push abstracts and PDFs to your Mendeley library. See this post for details.
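
For anyone curious how the first steps in the list above might look in code, here is a minimal sketch in Python. It is not the bot's actual code. The regular expression, the function names `extract_pubmed_ids` and `parse_query`, and the two PubMed URL shapes it recognizes are all assumptions made for illustration. It also assumes the tweet text already contains the expanded URL rather than a shortened link.

```python
import re

# Assumption: only these two URL shapes are recognized, the older
# ncbi.nlm.nih.gov/pubmed/<id> form and the newer pubmed.ncbi.nlm.nih.gov/<id>.
PUBMED_URL = re.compile(
    r"https?://(?:www\.)?"
    r"(?:ncbi\.nlm\.nih\.gov/pubmed|pubmed\.ncbi\.nlm\.nih\.gov)"
    r"/(\d+)"
)

BOT_HANDLE = "@ScholarlyBot"


def extract_pubmed_ids(text):
    """Return the numeric ID from every PubMed link found in the text."""
    return PUBMED_URL.findall(text)


def parse_query(text, handle=BOT_HANDLE):
    """Return the query following a leading mention, or None if the
    tweet does not begin with the bot's handle."""
    stripped = text.strip()
    if not stripped.lower().startswith(handle.lower()):
        return None
    return stripped[len(handle):].strip()


if __name__ == "__main__":
    print(extract_pubmed_ids("Nice study http://www.ncbi.nlm.nih.gov/pubmed/12345678"))
    print(parse_query("@ScholarlyBot smith 2012 sleep deprivation"))
```

Requiring the handle at the very start of the tweet is what lets a bot tell a direct query apart from a tweet that merely mentions it, which is why the note in item 1 matters.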

There are a couple of limitations. Twitter restricts the number of API calls that can be made per hour, so there will be a small delay (at most ~20 seconds) after you send it a query or while it is checking for PubMed links. It also currently runs from my laptop, so I can’t guarantee it will be working all the time; I hope to move it to a cloud platform soon. The numbers in parentheses after the messages are a workaround: Twitter returns an error if a duplicate tweet is sent too soon after the last one, so the number keeps each message unique. I tested it pretty thoroughly, but can’t promise that everything works perfectly. If you see some strange behavior, please let me know.
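
The number-in-parentheses trick can be sketched in a few lines of Python. This is only an illustration of the idea, not the bot's real code. The helper name `tag_message` and the 140-character limit are assumptions.

```python
import itertools

TWEET_LIMIT = 140  # assumption: the character limit that applies to the reply


def tag_message(message, n, limit=TWEET_LIMIT):
    """Append " (n)" to the message, trimming the message first so the
    tagged result still fits within the limit."""
    suffix = f" ({n})"
    room = limit - len(suffix)
    return message[:room].rstrip() + suffix


if __name__ == "__main__":
    counter = itertools.count(1)  # a running counter shared by all replies
    for _ in range(2):
        print(tag_message("No full text found", next(counter)))
```

Because the counter changes on every reply, two otherwise identical messages never have the same text, so they are not rejected as duplicates.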

It isn’t the most amazing thing, but now that I know how to grab and manipulate Twitter data, it will help with future projects. I’ll continue to develop it as I think of ways it could be more useful, but if anyone has any ideas, let me know in the comments!