The Limitations of Using ChatGPT for Scientific Review Articles

Someone asked if I could make a video showing how to use ChatGPT to write a scientific review article. One problem with using ChatGPT to summarize the research on a particular scientific topic is that the chatbot may not provide accurate citations of peer-reviewed papers, or may even completely make up sources. In this video, I examine whether it is possible to use ChatGPT to summarize published research papers and discuss some of the pitfalls involved.

I use the GPT-3.5 version, which is free and the one most likely used by students who cannot afford the twenty dollars per month to access GPT-4. At the end of the video, I'll suggest an alternative free AI tool for summarizing scientific research.

As I’ve discussed in previous videos, ChatGPT and similar tools such as Google’s Bard have a problem with providing citations for the sources of their output. In some cases, the chatbot simply makes them up. For example, I asked how climate change will affect mangrove forests and requested at least five citations in APA style. ChatGPT responded by saying it does not have access to proprietary publishers and cannot provide an answer without referencing websites. It then proceeded to craft an answer based on general knowledge about the topic up to its September 2021 knowledge cutoff.

As you can see, ChatGPT lists five known effects of climate change, each with a citation supposedly supporting the statement. However, upon checking the references, it becomes clear that some of them are either incorrect or made up.

Out of the five citations provided, only one correctly referenced a published paper that provided information on the stated topic. This is problematic, especially for writing a review paper that requires accurate and in-depth accounting of the relevant literature.

An alternative AI tool for writing scientific literature reviews is Elicit. It can find papers on a topic, extract key points from individual articles, and summarize that information. It uses the Semantic Scholar database to provide a list of relevant papers. Elicit reads the abstract of each paper and generates a summary focused on answering the question posed. It also provides a one-sentence summary of each abstract.
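If you want to spot-check a chatbot's citations yourself, the same Semantic Scholar database that Elicit draws on can be searched directly through its free public API. The sketch below only builds the search URL; the endpoint and field names are my assumptions based on the public Semantic Scholar Graph API and are worth re-checking before relying on them.

```python
import urllib.parse

# Assumption: the public Semantic Scholar Graph API paper-search endpoint.
BASE = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(query: str, limit: int = 5) -> str:
    """Build a paper-search URL for a topic query."""
    params = urllib.parse.urlencode({
        "query": query,
        "limit": limit,
        "fields": "title,year,authors,externalIds",  # externalIds includes DOIs
    })
    return f"{BASE}?{params}"

url = build_search_url("climate change effects on mangrove forests")
# Fetching this URL returns JSON with a "data" list of matching papers;
# comparing their titles and DOIs against a chatbot's citations is one way
# to flag references that were fabricated.
```

This kind of manual cross-check is tedious, which is exactly why a purpose-built tool like Elicit is the easier route for most people.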

Elicit’s output is only as good as the papers it is based on, so it is up to the user to assess the quality of the research being reviewed. This requires reading the paper and evaluating the rigor of the methods and the interpretation of the results.

The creators of Elicit acknowledge that it may make mistakes in summarizing papers, but they estimate that it is currently 80 to 90 percent accurate in its responses. It is important for users to double-check the provided summary against what is actually written in the abstract.

To sum up, ChatGPT cannot write your scientific review paper. Currently, it seems only able to provide a generic summary of a topic, lacking the in-depth evaluation of research and synthesis required in writing a review paper. It may provide legitimate-sounding citations that are inaccurate or non-existent, making it nearly impossible to be confident in the output provided on a topic.

This may change in the future as AI tools are improved. For now, AI tools such as Elicit may be better at helping summarize the literature on a scientific topic.

Thanks for watching, and please like this video if you found it helpful.
