Episodes
-
https://blog.stephenturner.us/p/inciteful-zotero-biologpt-semantic-scholar
I am in the middle of writing a review / perspectives paper, one that I’m confident will be exciting once we get it published. Some sections of the review cover subject matter at the outer periphery of my expertise. These are areas where I don’t have as strong a command of the relevant literature as my collaborators do. In my Zotero library I have a collection of a few relevant papers in these areas published by my co-authors, but I needed a way to quickly find other relevant papers in this field based on the small collection of papers I already have. Inciteful + Zotero was a perfect combination. I also found BioloGPT and Semantic Scholar to be useful for other related tasks.
Inciteful
A colleague introduced me to Inciteful (https://inciteful.xyz/), and at first glance it seemed to fit the bill perfectly. And it’s free (really free, not freemium free, not free for now free: truly free, with no sign-up required). What does Inciteful do and why would you use it? From the general documentation:
[Inciteful] builds a network of papers from citations, uses network analysis algorithms to analyze the network, and gives you the information you need to quickly get up to speed on that topic. You can find the most similar papers, important papers as well as prolific authors and institutions.
And from the Use Cases documentation:
Getting Familiar With a Body of Literature
The first and most basic is familiarizing yourself with a body of literature. This happens all the time, you become interested in a topic that is not directly related to your current work and it’s tough to get a handle on the current state of that topic. Inciteful makes that easy.
Finding Literature for a Paper in Progress
As you are writing a paper it’s good to periodically check to be sure that what you are writing addresses the most recent literature in the topic.
Rounding Out a Literature Review
Literature reviews tend to start with keyword based searches using your academic search engine of choice. You often end up using complicated search strategies …Even after all these complicated searches, you have no way of knowing if anything slipped through the cracks. This is where Inciteful shines.
You can go to inciteful.xyz and start building out a collection of papers by entering DOIs, PMIDs, arXiv URLs, etc., but I found the most effective way to do this was to seed my Inciteful search from a collection of papers I already have in my Zotero library.
Using Inciteful with Zotero
I’ve been using Zotero for reference management since the 2000s when it was initially only a Firefox browser extension. I used Mendeley for a period until it was acquired by Elsevier, then switched back to Zotero. Zotero is the only reference manager I’m aware of that works with MS Word and Google Docs and also integrates seamlessly with RStudio to insert BibTeX citations in RMarkdown/Quarto.
The Inciteful plugin for Zotero was clutch here. This allows you to highlight papers in your Zotero library, right click, and start a graph search, right from Zotero. Let’s take a look.
Demo
The literature review I’m writing isn’t a review on gene editing, but I work with a lot of brilliant genome engineers at Colossal. When I first got started here I read all the literature in this area I could get my hands on, because I came to Colossal with little background in synthetic biology. This demo uses a subset of papers in my CRISPR / genome engineering collection in Zotero.
First, you can highlight all the papers you want to seed your search with. With the Inciteful plugin installed, right click, and start a graph search.
This opens Inciteful in your browser and the first thing you’ll see is a citation graph.
It’s a nice visual, but I personally never find these kinds of graphs all that useful. The real benefit comes from the tables below. The first table is a list of similar papers which tend to cite the papers I used as input from my Zotero library.
Next you can see a list of the most important papers, ranked by PageRank. Some of these I already have in my library. Others I don’t, but I can add these to the existing search with the “+” sign. One obvious missing paper here was the landmark 2012 paper from Emmanuelle Charpentier’s lab.
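If PageRank on a citation graph is unfamiliar, here is a minimal sketch of the idea in plain Python. This is not Inciteful's implementation; the paper names, the toy graph, the damping factor, and the iteration count are all made up for illustration. A paper's score grows when it is cited by papers that themselves have high scores.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `citations` maps each paper to the list of papers it cites.
    Returns a dict of paper -> score; the scores sum to 1.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper splits its score evenly among the papers it cites.
                share = damping * rank[p] / len(cited)
                for c in cited:
                    new_rank[c] += share
            else:
                # A paper with no outgoing citations spreads its score evenly.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy citation graph (paper -> papers it cites).
toy_citations = {
    "review_2020": ["method_2013", "landmark_2012", "application_2016"],
    "application_2016": ["method_2013", "landmark_2012"],
    "method_2013": ["landmark_2012"],
    "landmark_2012": [],
}

ranks = pagerank(toy_citations)
ranking = sorted(ranks, key=ranks.get, reverse=True)
for paper in ranking:
    print(f"{paper}: {ranks[paper]:.3f}")
```

In this toy graph the paper that everything else cites comes out on top, which is the same reason a landmark paper would surface in a table like this even when it isn't in the seed collection.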
Next I can get a list of papers that cite the largest number of papers I have in my collection, which are likely review papers.
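The logic behind that table can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name, the paper identifiers, and the reference lists below are hypothetical, and Inciteful's own query may work differently. For each candidate paper, count how many of the seed papers appear in its reference list, then sort by that count.

```python
def rank_by_seed_citations(reference_lists, seed_papers):
    """Rank non-seed papers by how many seed papers they cite.

    `reference_lists` maps a paper to the list of papers it cites.
    Returns (paper, count) pairs, highest count first, ties broken by name.
    """
    seeds = set(seed_papers)
    counts = {
        paper: len(seeds & set(refs))
        for paper, refs in reference_lists.items()
        if paper not in seeds
    }
    hits = [(paper, count) for paper, count in counts.items() if count > 0]
    return sorted(hits, key=lambda pc: (-pc[1], pc[0]))


# Hypothetical seed collection and reference lists.
seed_papers = ["seed_a", "seed_b", "seed_c"]
reference_lists = {
    "candidate_review": ["seed_a", "seed_b", "seed_c", "other_paper"],
    "candidate_methods": ["seed_a", "seed_b"],
    "candidate_unrelated": ["other_paper"],
    "seed_b": ["seed_a"],
}

top_citers = rank_by_seed_citations(reference_lists, seed_papers)
print(top_citers)
```

A paper that cites most of a collection at once is usually surveying the field, which is why reviews tend to float to the top of a list built this way.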
Finally, there are four sections that aren’t immediately useful for expanding my literature search. I’m glad the developer included these, because it’s interesting to see things like the top authors, institutions, journals, etc.
The “Upcoming Authors” section was interesting to me. How are these found? See the little “SQL” button at the bottom of each panel? Clicking it you can actually see the SQL code that’s running behind the scenes. And you can modify and re-execute it! Here’s the SQL for the Upcoming Authors section.
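Inciteful's actual query and schema aren't reproduced in this text, so the following is only a rough sketch of the kind of query an editable SQL panel like this could run. Everything here is an assumption: the table names, the columns, the sample rows, and the definition of "upcoming" (authors with the most papers since a cutoff year). It uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3

# Hypothetical schema and data; NOT Inciteful's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE papers (paper_id TEXT PRIMARY KEY, year INTEGER, citations INTEGER);
    CREATE TABLE paper_authors (paper_id TEXT, author TEXT);
    """
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?)",
    [
        ("p1", 2012, 5000),
        ("p2", 2016, 800),
        ("p3", 2022, 40),
        ("p4", 2023, 25),
        ("p5", 2023, 10),
    ],
)
conn.executemany(
    "INSERT INTO paper_authors VALUES (?, ?)",
    [
        ("p1", "author_a"),
        ("p2", "author_a"),
        ("p3", "author_a"),
        ("p3", "author_b"),
        ("p4", "author_b"),
        ("p4", "author_c"),
        ("p5", "author_c"),
    ],
)

# Assumed definition of "upcoming": most papers since the cutoff year,
# with total citations on those papers as the tie-breaker.
UPCOMING_AUTHORS_SQL = """
SELECT pa.author,
       COUNT(*)         AS recent_papers,
       SUM(p.citations) AS recent_citations
FROM paper_authors AS pa
JOIN papers AS p ON p.paper_id = pa.paper_id
WHERE p.year >= :since
GROUP BY pa.author
ORDER BY recent_papers DESC, recent_citations DESC, pa.author
LIMIT 5
"""

upcoming = conn.execute(UPCOMING_AUTHORS_SQL, {"since": 2022}).fetchall()
for author, n_papers, n_citations in upcoming:
    print(author, n_papers, n_citations)
```

Changing the cutoff year or the ORDER BY clause is the sort of tweak the edit-and-re-execute button makes easy to try.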
Other tools
Inciteful isn’t the only tool that occupies this space. I spent a little time with BioloGPT and Semantic Scholar. Both of these are more AI-forward than Inciteful.
BioloGPT
BioloGPT (biologpt.com) is an interesting one. It’s less of a research discovery tool and more of a Biology-focused AI research tool. That is, you start from research questions rather than from a stack of papers you already have. From the documentation:
BioloGPT is engineered to be a highly-detailed, evidence-based, and skeptical AI committed to truth-seeking and answering biology questions as accurately as possible. It rigorously cites all used papers to ensure reliability, and can even generate novel hypothesis, code, art, and experiments. By citing relevant data and maintaining a critical, empirical stance, BioloGPT directly counters potential research biases such as positive result bias, framing bias, ideologies, censorship, scientific corruption, and industry influence.
I asked about current best practices for analyzing single-cell ATAC-seq data (link). You get back a short summary answer, followed by a longer answer. At first this might seem like something you can get out of ChatGPT or other tools with a recent knowledge cutoff. The first place BioloGPT differs from a generic AI chatbot is that assertions are backed by citations, and hovering over them gives you a preview of the paper, a very short summary, citation counts, and an evidence assertion.
BioloGPT then provides a code snippet. I’ve never actually used scanpy so I can’t verify the accuracy of this code, but it at least would help point you in the right direction of tools to take a look at, if nothing else.
Here’s where BioloGPT helps with literature discovery to some degree. Next you’ll see a list of top search results, showing you recent literature relevant to your query.
The next section presents potential hypotheses, a hypothesis graveyard, and potential experiments. This might be a little speculative, but I think the idea could help guide your research into areas you might not have explored, especially if it’s an area you’re not already intimately familiar with.
Another interesting feature of BioloGPT is its ability to create plots based on an input query. The example query Graph of CD4 expression across all immune cell types produces the following result. The interactive graphic it produces is made with Plotly, and BioloGPT provides the sources it used to create the plot.
Finally, BioloGPT saves your queries, and in your account page, you can fill in additional areas of interest. With this you’ll get a weekly roundup email pointing out new literature relevant to the questions you’ve asked and the topics you claim interest in.
Semantic Scholar
Semantic Scholar (semanticscholar.org) is a free AI-driven search and discovery tool. You start by searching for a subject or paper. Each paper has an AI-generated TLDR, with some citation information available on the side.
Once you find a paper, you can save it to your Semantic Scholar library, and you can save papers into different collections. Once you do so, you can opt in to receiving regular emails with newly published research that’s highly related to papers in one or more of your collections. This is the feature I love most about Semantic Scholar. Where Inciteful doesn’t have a login or session persistence, Semantic Scholar helps with staying on top of recently published literature relevant to literature you’ve already collected.
Others I didn’t try
There are plenty of other tools that occupy this space. I haven’t had a chance yet to use Elicit (elicit.com) or Research Rabbit (researchrabbit.ai). There’s also PaperQA2 (“superhuman scientific literature search”); see the blog post and GitHub page (Apache license). There’s Google’s NotebookLM (https://notebooklm.google/), which is more of a research assistant tool than a literature discovery tool. I imagine we’ll see many more AI-driven research support tools like this in the near future.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit blog.stephenturner.us -
Full post at https://blog.stephenturner.us/p/illuminate-preprints-with-an-ai-generated-podcast
Google has a new experimental tool called Illuminate (illuminate.google.com) that takes a link to a preprint and creates a podcast discussing the paper. When I tested this with a few preprints, the podcasts it generated were about 6-8 minutes long, featuring a male and a female voice discussing the key points of the paper in a conversational style.
There are some obvious shortcomings. It doesn’t know how to pronounce words that aren’t real words (for example, bioRxiv, or Heng Li’s new Ropebwt3), and like many text-based genAI tools, it overuses the word “delve.” And, when I gave it my recent paper describing biorecap (blog post, paper), it delved into a discussion on generative AI ethics that I never wrote about in the paper. But, aside from these few quirks, I actually enjoyed listening to the audio it produced.
I used Illuminate to generate podcasts discussing a few preprints on arXiv quantitative biology that caught my attention lately, or in the case of biorecap and pracpac, those that I authored.
The full podcast at the top of this post has all six of these preprints together, timestamped with chapters if you’re listening to this in a podcast app. Alternatively, you can listen to each individual paper below.
biorecap: an R package for summarizing bioRxiv preprints with a local LLM (https://arxiv.org/abs/2408.11707)
BWT construction and search at the terabase scale (https://arxiv.org/abs/2409.00613)
Genomic Language Models: Opportunities and Challenges (https://arxiv.org/abs/2407.11435)
Near to Mid-term Risks and Opportunities of Open-Source Generative AI (https://arxiv.org/abs/2404.17047)
Guidelines for releasing a variant effect predictor (https://arxiv.org/abs/2404.10807)
pracpac: Practical R Packaging with Docker (https://arxiv.org/abs/2303.07876)
-
This is an experiment. I’m not sure if or when I’ll ever do anything with it. In the meantime, subscribe to my newsletter at https://blog.stephenturner.us/.