Scrape Google Scholar


Google Scholar is a useful application. It refers every publications to its authors and allows to access easily the scientific output of every researcher. Two import key indicators are the number of citations and the H-Index. In this short python script you will see, how to extract/scrape these two parameters in Python.

hindex VS citations scrape Google Scholar

To scrape Google Scholar we first load important libraries for this task and define a function, which is able to scrape the H-Index from a Google Scholar profile as long as we feed the function with the link to this profile. If this is the case the function returns the H-index.

Use Scholarly to scrape Google Scholar

In the next step we use the Python module scholarly. Is has several feature. the most important is that it can search the Google Scholar database for names and return their number of citation or the direct link to the Google profile. Hence, we give this function a list of scientist in the field of nanopores and use it to get the number of citations and link to the Google Scholar profile. This link is then fed to the previously defined function to return the H-index.

We save the H-Index, number of citation and researcher name into one list and plot the two integer parameters in a plot.

The result is a plott with the number of citations on the X-axis and the H-Index on the Y-axis. From these we can deduce that with increasing number of citations the H-Index grows too. Publications analysing citations behavior in more detail can be found here.

hindex VS citations scrape Google Scholar

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.