Programmatic Scraping (without using the Graphical User Interface)#
Although SurVigilance is primarily a GUI application for downloading data from various safety databases, its scrapers can also be called programmatically. This vignette demonstrates how to access each database from Python code and download the required data.
WHO VigiAccess#
In this example, we would like to download data from VigiAccess for the drug “paracetamol”.
import os

from SurVigilance.ui.scrapers import scrape_vigiaccess_sb


def main():
    out_dir = "vigi_out"
    os.makedirs(out_dir, exist_ok=True)
    med = "paracetamol"
    df = scrape_vigiaccess_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
Warning: uc_driver not found. Getting it now:
*** chromedriver to download = 145.0.7632.117 (Previous Version)
Downloading chromedriver-linux64.zip from:
https://storage.googleapis.com/chrome-for-testing-public/145.0.7632.117/linux64/chromedriver-linux64.zip ...
Download Complete!
Extracting ['chromedriver'] from chromedriver-linux64.zip ...
Unzip Complete!
The file [uc_driver] was saved to:
/opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/seleniumbase/drivers/
uc_driver
Making [uc_driver 145.0.7632.117] executable ...
[uc_driver 145.0.7632.117] is now ready for use!
                 PT  Count
0  Thrombocytopenia   1065
1      Coagulopathy    776
2           Anaemia    539
3   Agranulocytosis    523
4       Neutropenia    444
The chromedriver version (146.0.7680.153) detected in PATH at /opt/hostedtoolcache/setup-chrome/chromedriver/stable/x64/chromedriver might not be compatible with the detected chrome version (145.0.7632.159); currently, chromedriver 145.0.7632.117 is recommended for chrome 145.*, so it is advised to delete the driver in PATH and retry
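The returned object exposes `head()` and prints with an integer index, so it appears to be a pandas DataFrame that can be post-processed directly. The sketch below is only an illustration: it rebuilds the five rows printed above as a stand-in for `df` (no scraping is performed) and keeps the preferred terms whose count reaches a threshold. The helper name `filter_by_count` and the threshold value are hypothetical and not part of SurVigilance.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_vigiaccess_sb, rebuilt from
# the five rows printed above (assumption: columns are named PT and Count).
df = pd.DataFrame(
    {
        "PT": [
            "Thrombocytopenia",
            "Coagulopathy",
            "Anaemia",
            "Agranulocytosis",
            "Neutropenia",
        ],
        "Count": [1065, 776, 539, 523, 444],
    }
)


def filter_by_count(frame, min_count):
    """Return rows with Count >= min_count, largest first (hypothetical helper)."""
    kept = frame[frame["Count"] >= min_count]
    return kept.sort_values("Count", ascending=False).reset_index(drop=True)


frequent = filter_by_count(df, min_count=600)
print(frequent)
```

With the sample rows, only the terms with at least 600 reports remain; on a real result the same call would operate on the full table rather than the first five rows.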
NL Lareb#
In this example, we would like to download data from Lareb for the drug “atorvastatin”.
import os

from SurVigilance.ui.scrapers import scrape_lareb_sb


def main():
    out_dir = "lareb_out"
    os.makedirs(out_dir, exist_ok=True)
    med = "atorvastatin"
    df = scrape_lareb_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
                    PT  Count
0               Asthma      2
1     Throat tightness      3
2       Sinus disorder      1
3            Epistaxis     20
4  Pharyngeal swelling      2
DK DMA#
In this example, we would like to download data from the Danish Medicines Agency (DMA) for the drug “paracetamol”.
import os

from SurVigilance.ui.scrapers import scrape_dma_sb


def main():
    out_dir = "dma_out"
    os.makedirs(out_dir, exist_ok=True)
    med = "paracetamol"
    df = scrape_dma_sb(medicine=med, output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
                                PT  Count
0                          Anaemia      2
1  Normochromic normocytic anaemia      1
2              Nephrogenic anaemia      1
3                 Aplastic anaemia      2
4                     Pancytopenia      1
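Because the VigiAccess and DMA examples both query “paracetamol” and both print `PT` and `Count` columns, their results can be lined up by preferred term. The following is a minimal sketch under the assumption that both scrapers return pandas DataFrames with exactly these column names; the two frames are rebuilt from the rows printed in this vignette rather than scraped, and the suffix names are illustrative.

```python
import pandas as pd

# Stand-ins rebuilt from the printed output of the two paracetamol examples.
vigi = pd.DataFrame(
    {
        "PT": [
            "Thrombocytopenia",
            "Coagulopathy",
            "Anaemia",
            "Agranulocytosis",
            "Neutropenia",
        ],
        "Count": [1065, 776, 539, 523, 444],
    }
)
dma = pd.DataFrame(
    {
        "PT": [
            "Anaemia",
            "Normochromic normocytic anaemia",
            "Nephrogenic anaemia",
            "Aplastic anaemia",
            "Pancytopenia",
        ],
        "Count": [2, 1, 1, 2, 1],
    }
)

# An inner join keeps only the preferred terms reported in both sources;
# the suffixes label which database each count came from.
shared = vigi.merge(dma, on="PT", how="inner", suffixes=("_vigiaccess", "_dma"))
print(shared)
```

Among these sample rows only one term appears in both tables; using `how="outer"` instead would keep every term and leave missing counts as NaN.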
NZ MEDSAFE#
In this example, we would like to download data from NZ Medsafe for the medicine “atorvastatin”.
import os

from SurVigilance.ui.scrapers import scrape_medsafe_sb


def main():
    out_dir = "nzsmars_out"
    os.makedirs(out_dir, exist_ok=True)
    term = "atorvastatin"
    search_type = "medicine"  # or "vaccine"
    df = scrape_medsafe_sb(
        searching_for=search_type,
        drug_vaccine=term,
        output_dir=out_dir,
        headless=True,
    )
    print(df.head())


if __name__ == "__main__":
    main()
                                    SOC                PT  Count
0  Blood and lymphatic system disorders           Anaemia      5
1  Blood and lymphatic system disorders     Lymphocytosis      1
2  Blood and lymphatic system disorders       Neutropenia      1
3  Blood and lymphatic system disorders      Neutrophilia      1
4  Blood and lymphatic system disorders  Thrombocytopenia      2
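The Medsafe result carries an additional `SOC` (system organ class) column, which makes it possible to total the reports per organ class. The snippet below is a hedged sketch: it assumes the result is a pandas DataFrame with the three columns shown above and uses the five printed rows as a stand-in, so no scraping takes place.

```python
import pandas as pd

# Stand-in for the DataFrame returned by scrape_medsafe_sb, rebuilt from
# the five rows printed above.
df = pd.DataFrame(
    {
        "SOC": ["Blood and lymphatic system disorders"] * 5,
        "PT": [
            "Anaemia",
            "Lymphocytosis",
            "Neutropenia",
            "Neutrophilia",
            "Thrombocytopenia",
        ],
        "Count": [5, 1, 1, 1, 2],
    }
)

# Sum the report counts within each system organ class, largest class first.
soc_totals = (
    df.groupby("SOC", as_index=False)["Count"]
    .sum()
    .sort_values("Count", ascending=False)
)
print(soc_totals)
```

The sample contains a single organ class, so one row is produced; a full result would yield one row per `SOC` value.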
AU DAEN#
In this example, we would like to fetch data from the TGA DAEN for the medicine “aspirin”.
import os

from SurVigilance.ui.scrapers import scrape_daen_sb


def main():
    out_dir = "daen_out"
    os.makedirs(out_dir, exist_ok=True)
    med = "aspirin"
    df = scrape_daen_sb(medicine=med, output_dir=out_dir, headless=True)
    print(f"Data collected: {len(df)} rows, {len(df.columns)} columns")


if __name__ == "__main__":
    main()
Data collected: 1086 rows, 5 columns
USA FAERS#
For FAERS, the data are distributed as quarterly ZIP files. Before downloading anything, we first retrieve the list of quarters for which data files are available.
import os

from SurVigilance.ui.scrapers import scrape_faers_sb


def main():
    out_dir = "faers_out"
    os.makedirs(out_dir, exist_ok=True)
    df = scrape_faers_sb(output_dir=out_dir, headless=True)
    print(df.head())


if __name__ == "__main__":
    main()
   Year                  Quarter
0  2025  October - December 2025
1  2025    July - September 2025
2  2025        April - June 2025
3  2025     January - March 2025
4  2024  October - December 2024
From the list of available files, we can download the data for Q1 2025 (January - March) using the code below.
import os

from SurVigilance.ui.scrapers import download_file


def main():
    # Change the year and quarter in this URL to match the data to be downloaded.
    # For data prior to Q4 2012, use: https://fis.fda.gov/content/Exports/aers_ascii_YYYYQQ.zip
    url = "https://fis.fda.gov/content/Exports/faers_ascii_2025q1.zip"
    out_dir = "faers_out"
    os.makedirs(out_dir, exist_ok=True)
    path = download_file(url=url, download_dir=out_dir)

    # Show the size of the downloaded file
    size_bytes = os.path.getsize(path)
    size_mb = size_bytes / (1024**2)
    print(f"Downloaded file size: {size_bytes} bytes ({size_mb:.2f} MB)")


if __name__ == "__main__":
    main()
Downloaded file size: 67465250 bytes (64.34 MB)
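Since the URL has to be edited by hand for each quarter, a small helper can build it from a year and a quarter number. The sketch below is an assumption-laden illustration, not part of SurVigilance: `faers_zip_url` is a hypothetical name, the lowercase `q` is copied from the example URL above, and the legacy pattern is written `YYYYQQ` in the comment above without fixing the letter case, so the generated URL should be checked against the list of available files before use.

```python
FAERS_EXPORT_BASE = "https://fis.fda.gov/content/Exports"


def faers_zip_url(year: int, quarter: int) -> str:
    """Build the ASCII ZIP URL for a given year and quarter (hypothetical helper)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    # Per the note in the download example, files prior to Q4 2012 use the
    # "aers_ascii" prefix; later files use "faers_ascii".
    is_legacy = (year, quarter) < (2012, 4)
    prefix = "aers_ascii" if is_legacy else "faers_ascii"
    # Assumption: lowercase "q", as in the example URL for 2025 Q1.
    return f"{FAERS_EXPORT_BASE}/{prefix}_{year}q{quarter}.zip"


print(faers_zip_url(2025, 1))
print(faers_zip_url(2012, 3))
```

The returned string can then be passed as the `url` argument of `download_file`, exactly as in the example above.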
Please note that downloading VAERS data requires the user to solve a CAPTCHA, so it is not possible to download the data without opening the GUI. For this reason, we have not included a VAERS example in this programmatic access section.