18 November 2022

Giving credit where it's due

Recently I finished a new manuscript (aka a paper, an article, a long essay) about how we can make better plants to feed the world! One task the journal asked us authors to do was to suggest some qualified experts in the field as reviewers to evaluate the quality of our masterpiece.

A few ideas came to mind: someone with expertise in the subject matter, someone with whom the authors have no conflicts of interest, or perhaps someone who is cited a lot in the article itself!

Of course, if I cite someone a lot, it means I rely on their work as a basis for my argument, whether I am supporting it or arguing against it. Still, counting how many times I cited each person sounds like a fun (and probably less biased than just coming up with names off the top of my head) way to find reviewers to suggest.

So, how do I know whom I have cited and how many times?

I use an open-source citation manager called Zotero. For a long time I have been looking for a function like this, and it seems many other people have the same problem. Well, there must be a solution out there!

I found a couple of great ideas! One of them is a free offline tool that can automatically extract a list of cited papers from my manuscript: https://github.com/rmzelle/ref-extractor. Kudos to Dr. Zelle for sharing this free open-source tool! I really aspire (as my abilities allow lol) to share resources like this with the community in the future. Go support Dr. Zelle!

So, I got my data out. Now, I see a few interesting patterns. Here are four questions I asked:

(I used the functions suggested by https://trumpexcel.com/extract-numbers-from-string-excel to extract citation numbers from the CSL JSON file in Excel; yes, I’m a fan of Excel. One day I’ll do better in R)
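
For anyone who prefers a script to Excel formulas, here is a minimal Python sketch of the same step. It assumes the exported CSL JSON is a list of items and that each item's citation count sits inside a text field as a string like "Times cited: 3". The field name `note`, that exact wording, and the sample records are my assumptions for illustration, so check them against your own export first.

```python
import json
import re

# Hypothetical sample standing in for an exported CSL JSON file.
# The "note" field and its "Times cited: N" wording are assumptions.
SAMPLE = """
[
  {"id": "a", "title": "Paper A", "note": "Times cited: 3"},
  {"id": "b", "title": "Paper B", "note": "Times cited: 12"},
  {"id": "c", "title": "Paper C"}
]
"""

def citation_count(item, field="note"):
    """Return the first integer found in item[field], or 0 if there is none."""
    match = re.search(r"\d+", str(item.get(field, "")))
    return int(match.group()) if match else 0

items = json.loads(SAMPLE)
counts = {item["id"]: citation_count(item) for item in items}
total = sum(counts.values())
print(counts, total)
```

To run it on a real export, replace `json.loads(SAMPLE)` with `json.load(open(path))`, where `path` points to your own file.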

1. Is my manuscript keeping up with the progress of science?

Yes! Most of my citations come from the past 5 years. Not only do I refer to more publications from those years, but I also cite each of them multiple times!
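
A rough sketch of how this per-year tally could be done in Python. CSL JSON normally stores the publication date under `issued` → `date-parts`; the `times_cited` key, the sample records, and the helper names are hypothetical and only here to make the example run.

```python
from collections import defaultdict

# Hypothetical records; "times_cited" is an assumed, pre-computed field.
items = [
    {"issued": {"date-parts": [[2021, 3]]}, "times_cited": 4},
    {"issued": {"date-parts": [[2021]]}, "times_cited": 1},
    {"issued": {"date-parts": [[2015]]}, "times_cited": 2},
    {"issued": {"date-parts": [[1998]]}, "times_cited": 1},
]

def year_of(item):
    """Return the publication year as an int, or None if it is missing."""
    parts = item.get("issued", {}).get("date-parts", [])
    if parts and parts[0]:
        return int(parts[0][0])
    return None

papers_per_year = defaultdict(int)
citations_per_year = defaultdict(int)
for item in items:
    year = year_of(item)
    if year is None:
        continue  # skip items without a usable date
    papers_per_year[year] += 1
    citations_per_year[year] += item["times_cited"]

def recent_share(per_year, latest_year, window=5):
    """Fraction of the total that falls within the last `window` years."""
    total = sum(per_year.values())
    recent = sum(n for y, n in per_year.items() if latest_year - y < window)
    return recent / total if total else 0.0

print(recent_share(citations_per_year, latest_year=2022))
```

Comparing `papers_per_year` with `citations_per_year` shows whether recent papers are just more numerous or also cited more often each.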

2. What’s the most popular journal I cited information from?

I looked at this from 3 angles: number of articles, number of citations, and proportion of citations. I only show the top 10 journals for each list below.

Looking by journal count…

Looking by citation count (i.e. I often cite a primary article for several different points, which counts as multiple citations)…

In short, my current manuscript relies a lot on findings from Science, Journal of Experimental Botany, Nature, Frontiers in Plant Science, and Plant Biotechnology Journal (and perhaps Trends in Plant Science and Plant Physiology, too)! These journals are in my top 7 on both lists!

Overall, the top 10 journals by citation count account for ~50% of all the citations in my manuscript! Well, maybe I can say that my work is most relevant to these journals? lol I always feel one paper in Science or Nature is like 4-5 papers elsewhere combined. Each figure in these articles could be a paper on its own… Anyway, that’s a story for another time.
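
Here is a small Python sketch of the three angles on journals (articles per journal, citations per journal, and share of citations), plus the "top N journals cover X% of citations" figure. The journal names and numbers are made up, and `top_n_share` is a hypothetical helper, not something from any tool mentioned here.

```python
from collections import Counter

# Hypothetical (journal, times cited) pairs, one pair per cited article.
articles = [
    ("Journal A", 3),
    ("Journal A", 1),
    ("Journal B", 5),
    ("Journal C", 1),
]

article_counts = Counter()   # angle 1: number of articles per journal
citation_counts = Counter()  # angle 2: number of citations per journal
for journal, times_cited in articles:
    article_counts[journal] += 1
    citation_counts[journal] += times_cited

# Angle 3: each journal's proportion of all citations.
total_citations = sum(citation_counts.values())
citation_share = {j: n / total_citations for j, n in citation_counts.items()}

def top_n_share(counter, n):
    """Fraction of the grand total covered by the n largest entries."""
    top = sum(count for _, count in counter.most_common(n))
    return top / sum(counter.values())

print(article_counts.most_common(10))
print(citation_counts.most_common(10))
print(top_n_share(citation_counts, 2))
```

Note that the two rankings can disagree: in this toy data, Journal A has the most articles but Journal B has the most citations.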

3. Did I overcite the work from our own research group?

Well, our own research group’s publications account for ~5.2% of the journal articles and ~4.8% of all citations. So, I think that’s a pretty reasonable number. Not too few, not too many, just relevant!

4. Then, how many research groups did I learn from, and who might serve as an ‘examiner’ for our manuscript?

This one is tricky to answer, simply because how can I identify research groups in a high-throughput way from the citation metadata of the hundreds of papers I cited? I went back to the same data set and pulled out the list of authors using a VBA macro in Excel (thanks, Google and several Stack Exchange threads). This approach lets me see which authors come up most frequently. Keep in mind, this gets tough when different authors share the same name! (I did not look further into their affiliations)

My other BIG assumption is that the last author is the senior author, and this is my proxy for a research group. I already know that this is not always true, especially when multiple groups collaborate. Sometimes there are multiple senior authors. But this is the most practical means of dealing with so many entries at once.
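
The last-author proxy described above can be sketched in Python roughly like this. CSL JSON usually lists authors as objects with `family` and `given` keys; the sample names and the helpers `name_key` and `count_by_position` are hypothetical. Like my Excel version, it treats identical names as the same person and ignores affiliations.

```python
from collections import Counter

# Hypothetical cited papers with CSL-style author lists.
papers = [
    {"author": [{"family": "Lee", "given": "A."},
                {"family": "Smith", "given": "J."}]},
    {"author": [{"family": "Chen", "given": "B."},
                {"family": "Kim", "given": "C."},
                {"family": "Smith", "given": "J."}]},
    {"author": [{"family": "Lee", "given": "A."},
                {"family": "Garcia", "given": "M."}]},
    {"author": [{"family": "Garcia", "given": "M."}]},
]

def name_key(author):
    """Build a 'Family, Given' string to identify an author by name only."""
    return f"{author.get('family', '')}, {author.get('given', '')}".strip(", ")

def count_by_position(papers, position):
    """Count how often each name appears at a position (0 = first, -1 = last)."""
    counts = Counter()
    for paper in papers:
        authors = paper.get("author", [])
        if authors:
            counts[name_key(authors[position])] += 1
    return counts

senior = count_by_position(papers, -1)  # last author as a proxy for the group
first = count_by_position(papers, 0)    # first author

print(len(senior), senior.most_common(10))
print(first.most_common(10))
```

A single-author paper counts toward both lists here, which is one more reason to treat the resulting "number of groups" as an estimate.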

From this analysis, I learned that my manuscript is based on 150+ research groups! And the top 10 groups represent ~25% of all the papers I cited (which means 75% of the papers are from the other 140ish groups). That sounds nice!!! My most cited group was also not my own group lol. A collaborator tops the top-10 list, too!

Now, the other interesting question is about the ‘first author’. Often these are early career researchers. But when it comes to reviews or opinion articles, senior scientists are often first authors, too. There are 14 scientists who were first author on 2+ papers that I cited. The most frequent one is from my own group lol. Interestingly, these top 14 scientists represent 17% of the whole collection I cited, which makes sense: I expected this number to be lower than the share for senior authors. If I consider all groups that wrote at least 2 papers that I cited, then they represent ~40% of all the papers in my bibliography (i.e. one senior author usually supervises more than one student or postdoc).

What if the researcher was part of the project, but was neither first nor last author? I did check that too! There are ~1500 author names listed across the papers I cited.

Among these, when I count each individual only once (i.e. unique names), there are ~1150 people!

Wow, what a team effort, right?!? (I also checked whether the community writes papers with more contributors as the years go by. The answer is probably not. But the largest project I saw on this list has 22 authors!)

Now, (drum roll) the most collaborative researchers on my list have their names on 18, 11, and 9 papers, respectively. Wow!

About 20% of the researchers co-wrote at least 2 papers that I cited. The other 80% contributed to one paper each! I wonder how these numbers vary when you move to a larger sample size. Do we collaborate more, or not?
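
The all-author numbers above (total author entries, unique names, the share of people on 2+ papers, and the largest author list) can be sketched in a few lines of Python. The names are made up, and matching people by name alone is the same simplification I used in Excel.

```python
from collections import Counter

# Hypothetical author lists, one list per cited paper.
papers = [
    ["Lee A", "Smith J"],
    ["Chen B", "Kim C", "Smith J"],
    ["Lee A", "Garcia M"],
]

# Flatten every author slot into one list, then count papers per name.
all_names = [name for authors in papers for name in authors]
papers_per_person = Counter(all_names)

total_entries = len(all_names)           # every author slot, repeats included
unique_people = len(papers_per_person)   # each unique name counted once
repeat_people = sum(1 for n in papers_per_person.values() if n >= 2)
repeat_share = repeat_people / unique_people
largest_team = max(len(authors) for authors in papers)

print(total_entries, unique_people, repeat_share, largest_team)
print(papers_per_person.most_common(3))
```

The gap between `total_entries` and `unique_people` is simply the number of repeat appearances, so it grows as the same people show up on more of the papers you cite.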

Well, circling back to my goal at the beginning of this task: I looked at the top names on these three lists of people I cited most (senior author, first author, all authors), filtered out collaborators and anyone with conflicts of interest, and came up with names to suggest as reviewers to the journal where we are submitting this manuscript.

Recently I also came across several tools that help you visualize networks between articles. I think they are super useful, though I have not had a chance to look into them much. Here are some examples: scite_, ResearchRabbit, Connected Papers

I have a lot of other fun ideas for the future!

One thing to keep in mind is that I use Google Scholar as my main search engine. As those who have used Google Scholar might have seen, I think Google is biased… even if you use incognito mode. The search is really skewed toward “popular” papers, and perhaps does not give us a comprehensive view of the whole body of literature. (You could argue, then what could?!?) That might be something I look into more after I wrap up my current projects!

How about y’all? Do you have any ‘favorite’ authors you refer to A LOT in your own work? If so, how do you think we can reduce biases in our literature review approach?

Try this with your manuscript, and let’s see what you get!

category: thoughts
tag: science gradschool resources

Paul Kasemsap

พรพิพัฒน์ เกษมทรัพย์