Programmatic Scraping (without using the Graphical User Interface)

Although SurVigilance is primarily a GUI application for downloading data from various safety databases, it can also be used programmatically. This vignette demonstrates how to access each of the supported databases from Python code and download the required data.

WHO VigiAccess

In this example, we would like to download data from VigiAccess for the drug “paracetamol”.

import os

from SurVigilance.ui.scrapers import scrape_vigiaccess_sb


def main():
    out_dir = "vigi_out"
    os.makedirs(out_dir, exist_ok=True)

    med = "paracetamol"

    df = scrape_vigiaccess_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
Warning: uc_driver not found. Getting it now:

*** chromedriver to download = 145.0.7632.117 (Previous Version)

Downloading chromedriver-linux64.zip from:
https://storage.googleapis.com/chrome-for-testing-public/145.0.7632.117/linux64/chromedriver-linux64.zip ...
Download Complete!

Extracting ['chromedriver'] from chromedriver-linux64.zip ...
Unzip Complete!

The file [uc_driver] was saved to:
/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/seleniumbase/drivers/
uc_driver

Making [uc_driver 145.0.7632.117] executable ...
[uc_driver 145.0.7632.117] is now ready for use!

                 PT  Count
0  Thrombocytopenia   1065
1      Coagulopathy    776
2           Anaemia    539
3   Agranulocytosis    523
4       Neutropenia    444
The chromedriver version (146.0.7680.153) detected in PATH at /opt/hostedtoolcache/setup-chrome/chromedriver/stable/x64/chromedriver might not be compatible with the detected chrome version (145.0.7632.159); currently, chromedriver 145.0.7632.117 is recommended for chrome 145.*, so it is advised to delete the driver in PATH and retry
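Since the scraper returns a pandas DataFrame, the result can be processed further with ordinary pandas operations. The following is a minimal sketch, assuming the returned frame has the `PT` and `Count` columns shown in the output above (`PT` presumably being the MedDRA preferred term). The frame here is a stand-in built from those five printed rows, and the `Share` column is a name introduced only for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_vigiaccess_sb,
# rebuilt from the five rows printed above (not the full result).
df = pd.DataFrame(
    {
        "PT": [
            "Thrombocytopenia",
            "Coagulopathy",
            "Anaemia",
            "Agranulocytosis",
            "Neutropenia",
        ],
        "Count": [1065, 776, 539, 523, 444],
    }
)

# Share of each term among the rows in this frame only
# (not among all reports in the database).
df["Share"] = df["Count"] / df["Count"].sum()

# Keep the three most frequently reported terms.
top3 = df.sort_values("Count", ascending=False).head(3)
print(top3)
```

With a real scrape, `df` would simply be the value returned by `scrape_vigiaccess_sb`.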

NL Lareb

In this example, we would like to download data from Lareb for the drug “atorvastatin”.

import os

from SurVigilance.ui.scrapers import scrape_lareb_sb


def main():
    out_dir = "lareb_out"
    os.makedirs(out_dir, exist_ok=True)

    med = "atorvastatin"

    df = scrape_lareb_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
                    PT  Count
0               Asthma      2
1     Throat tightness      3
2       Sinus disorder      1
3            Epistaxis     20
4  Pharyngeal swelling      2

DK DMA

In this example, we would like to download data from the Danish Medicines Agency (DMA) for the drug “paracetamol”.

import os

from SurVigilance.ui.scrapers import scrape_dma_sb


def main():
    out_dir = "dma_out"
    os.makedirs(out_dir, exist_ok=True)

    med = "paracetamol"

    df = scrape_dma_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
                                PT  Count
0                          Anaemia      2
1  Normochromic normocytic anaemia      1
2              Nephrogenic anaemia      1
3                 Aplastic anaemia      2
4                     Pancytopenia      1
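Because VigiAccess and DMA were both queried for “paracetamol” and both return `PT` and `Count` columns, their results can be placed side by side. The sketch below is one possible way to do this with a pandas outer merge. The two frames are stand-ins built from a few of the rows printed in the outputs above, and the column suffixes are names chosen here for illustration.

```python
import pandas as pd

# Stand-ins built from rows printed in the VigiAccess and DMA outputs above.
vigi = pd.DataFrame(
    {"PT": ["Thrombocytopenia", "Anaemia", "Neutropenia"], "Count": [1065, 539, 444]}
)
dma = pd.DataFrame(
    {"PT": ["Anaemia", "Aplastic anaemia", "Pancytopenia"], "Count": [2, 2, 1]}
)

# Outer merge keeps terms that appear in only one of the two sources.
merged = vigi.merge(dma, on="PT", how="outer", suffixes=("_vigiaccess", "_dma"))

# A term missing from one source gets a count of 0 for that source.
merged = merged.fillna(0).astype({"Count_vigiaccess": int, "Count_dma": int})
print(merged)
```

Counts from different databases cover different populations and reporting periods, so a table like this is best read as a term-level overview rather than a direct comparison.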

NZ MEDSAFE

In this example, we would like to download data from NZ Medsafe for the medicine “atorvastatin”.

import os

from SurVigilance.ui.scrapers import scrape_medsafe_sb


def main():
    out_dir = "nzsmars_out"
    os.makedirs(out_dir, exist_ok=True)

    term = "atorvastatin"
    search_type = "medicine"  # or "vaccine"

    df = scrape_medsafe_sb(
        searching_for=search_type,
        drug_vaccine=term,
        output_dir=out_dir,
        headless=True,
    )
    print(df.head())


if __name__ == "__main__":
    main()
                                    SOC                PT  Count
0  Blood and lymphatic system disorders           Anaemia      5
1  Blood and lymphatic system disorders     Lymphocytosis      1
2  Blood and lymphatic system disorders       Neutropenia      1
3  Blood and lymphatic system disorders      Neutrophilia      1
4  Blood and lymphatic system disorders  Thrombocytopenia      2
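The Medsafe result carries an additional `SOC` column (presumably the MedDRA system organ class), which makes it possible to summarise counts per class. The following is a minimal sketch using a stand-in frame built from the five rows printed above. The output column names `n_terms` and `total` are introduced here for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_medsafe_sb,
# rebuilt from the five rows printed above.
df = pd.DataFrame(
    {
        "SOC": ["Blood and lymphatic system disorders"] * 5,
        "PT": [
            "Anaemia",
            "Lymphocytosis",
            "Neutropenia",
            "Neutrophilia",
            "Thrombocytopenia",
        ],
        "Count": [5, 1, 1, 1, 2],
    }
)

# Number of distinct terms and total count per system organ class.
summary = df.groupby("SOC", as_index=False).agg(
    n_terms=("PT", "nunique"),
    total=("Count", "sum"),
)
print(summary)
```

With only the five printed rows, the summary has a single group. On a full result it would contain one row per class.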

AU DAEN

In this example, we would like to fetch data from the TGA DAEN for the medicine “aspirin”.

import os

from SurVigilance.ui.scrapers import scrape_daen_sb


def main():
    out_dir = "daen_out"
    os.makedirs(out_dir, exist_ok=True)

    med = "aspirin"

    df = scrape_daen_sb(medicine=med, output_dir=out_dir, headless=True)
    print(f"Data collected: {len(df)} rows, {len(df.columns)} columns")


if __name__ == "__main__":
    main()
Data collected: 1086 rows, 5 columns
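The VigiAccess, Lareb, DMA and DAEN scrapers shown so far are all called with the same keyword arguments (`medicine`, `output_dir`, `headless`), so a small wrapper can run any one of them over several medicines. The helper below, `scrape_many`, is hypothetical and not part of SurVigilance. To keep the sketch runnable without a browser, it is demonstrated with a stand-in `fake_scraper` that only mimics that call signature.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable

import pandas as pd


def scrape_many(
    scraper: Callable[..., pd.DataFrame],
    medicines: Iterable[str],
    base_dir: str,
) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: run one scraper for several medicines."""
    results: Dict[str, pd.DataFrame] = {}
    for med in medicines:
        # One output sub-directory per medicine.
        out_dir = Path(base_dir) / med
        out_dir.mkdir(parents=True, exist_ok=True)
        try:
            results[med] = scraper(
                medicine=med, output_dir=str(out_dir), headless=True
            )
        except Exception as exc:
            # Report the failure and continue with the next medicine.
            print(f"Skipping {med}: {exc}")
    return results


def fake_scraper(medicine: str, output_dir: str, headless: bool = True) -> pd.DataFrame:
    """Stand-in with the same keyword arguments as the real scrapers."""
    return pd.DataFrame({"PT": ["Example term"], "Count": [1]})


results = scrape_many(
    fake_scraper, ["paracetamol", "aspirin"], base_dir=tempfile.mkdtemp()
)
for med, frame in results.items():
    print(med, len(frame))
```

In real use, `fake_scraper` would be replaced by one of the imported functions such as `scrape_vigiaccess_sb`. `scrape_medsafe_sb` takes different keyword arguments and would need its own wrapper.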

USA FAERS

For FAERS, the data are distributed as quarterly ZIP files. Before downloading, we first retrieve the list of quarters for which data files are available.

import os

from SurVigilance.ui.scrapers import scrape_faers_sb


def main():
    out_dir = "faers_out"
    os.makedirs(out_dir, exist_ok=True)

    df = scrape_faers_sb(output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
   Year                  Quarter
0  2025  October - December 2025
1  2025    July - September 2025
2  2025        April - June 2025
3  2025     January - March 2025
4  2024  October - December 2024
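The `Quarter` column in this listing is a text label such as “January - March 2025”. To select a quarter programmatically, it can help to convert that label into a quarter number. The following is a minimal sketch, assuming the labels always begin with the first month of the quarter as in the output above. The frame is a stand-in built from the printed rows (with `Year` assumed to be an integer), and `quarter_number` and the `Q` column are names introduced here.

```python
import pandas as pd

# Stand-in for the listing returned by scrape_faers_sb,
# rebuilt from the five rows printed above.
listing = pd.DataFrame(
    {
        "Year": [2025, 2025, 2025, 2025, 2024],
        "Quarter": [
            "October - December 2025",
            "July - September 2025",
            "April - June 2025",
            "January - March 2025",
            "October - December 2024",
        ],
    }
)

# The first month of each label identifies the quarter.
FIRST_MONTH_TO_QUARTER = {"January": 1, "April": 2, "July": 3, "October": 4}


def quarter_number(label: str) -> int:
    """Map a label like 'January - March 2025' to its quarter number (1-4)."""
    first_month = label.split(" - ")[0].strip()
    return FIRST_MONTH_TO_QUARTER[first_month]


listing["Q"] = listing["Quarter"].map(quarter_number)
print(listing[["Year", "Q"]])
```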

From the list of available files, let’s download the data for Q1 (January - March) 2025 using the code below.

import os

from SurVigilance.ui.scrapers import download_file


def main():
    # Change the year and quarter in this URL to match the data you want to download.
    # For data prior to Q4 2012, use this URL pattern instead: https://fis.fda.gov/content/Exports/aers_ascii_YYYYQQ.zip
    url = "https://fis.fda.gov/content/Exports/faers_ascii_2025q1.zip"
    out_dir = "faers_out"
    os.makedirs(out_dir, exist_ok=True)

    path = download_file(url=url, download_dir=out_dir)

    # Show size of downloaded file
    size_bytes = os.path.getsize(path)
    size_mb = size_bytes / (1024**2)
    print(f"Downloaded file size: {size_bytes} bytes ({size_mb:.2f} MB)")


if __name__ == "__main__":
    main()
Downloaded file size: 67465250 bytes (64.34 MB)
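The URL rule described in the code comments above (the `faers_ascii_` prefix from Q4 2012 onward and the `aers_ascii_` prefix for earlier quarters) can be captured in a small function. The helper `faers_zip_url` below is hypothetical and not part of SurVigilance. It assumes that the `YYYYQQ` placeholder follows the same `2025q1` form as the example URL, and it does not check that the file actually exists on the server.

```python
BASE_URL = "https://fis.fda.gov/content/Exports"


def faers_zip_url(year: int, quarter: int) -> str:
    """Hypothetical helper: build the ASCII ZIP URL for a given year and quarter."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {quarter}")
    # Files prior to Q4 2012 use the older 'aers' prefix.
    prefix = "faers" if (year, quarter) >= (2012, 4) else "aers"
    return f"{BASE_URL}/{prefix}_ascii_{year}q{quarter}.zip"


print(faers_zip_url(2025, 1))
print(faers_zip_url(2012, 3))
```

The returned string could then be passed as the `url` argument of `download_file`, as in the example above.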

Please note that downloading VAERS data requires the user to solve a CAPTCHA, so the data cannot be downloaded without opening a GUI. For this reason, no VAERS example is included in this programmatic access section.