Welcome to ZINC20-ML.docking.org.
Each archive can be downloaded either as a single large file or as a set of smaller chunks.
Both options contain the same data, so you only need to download one of them.
Description of the contents is below.
This README.txt file was generated on 2021-10-06 by Francesco Gentile
--------------------
GENERAL INFORMATION
--------------------
1. Title of Dataset: ZINC20 library prepared for Deep Docking-accelerated virtual screening
2. Author Information
A. Principal Investigator Contact Information
Name: Artem Cherkasov
Institution: Vancouver Prostate Centre, The University of British Columbia
Email: acherkasovATprostatecentre.com
B. Contact Information for Dataset-related Questions
Name: Francesco Gentile
Institution: Vancouver Prostate Centre, The University of British Columbia
Email: gentile.francesco89ATgmail.com
--------------------------
ANCILLARY INFORMATION
--------------------------
1. Links to software packages: Deep Docking is freely available from https://github.com/jamesgleave/DD_protocol
2. Was data derived from another source? yes
A. If yes, list source(s): the Deep Docking-prepared ZINC20 library was derived from the ZINC20 database
(Irwin, J. J. et al. ZINC20 - A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 60, 6065-6073 (2020)).
---------------------
DATA & FILE OVERVIEW
---------------------
1. File List
A. Filename: README.txt
Short description: readme file
B. Filename: ZINC20_smiles.tar.gz
Short description: archive with 100 SMILES files (10 million molecules each) encompassing all the ZINC20 molecules available
in early March 2021. Stereochemistry has been expanded where necessary. Molecule names follow the format ZINCID_index,
where index identifies a specific stereoisomer. The dominant tautomer and protonation state at pH 7.4 are already provided.
B1. smiles/ directory containing the above file in 20 easy-to-download chunks
C. Filename: ZINC20_fingerprints.tar.gz
Short description: archive with 100 Morgan fingerprint (radius 2, 1024 bits) files (10 million molecules each)
encompassing all the ZINC20 molecules. The format is the one expected by Deep Docking (comma-separated list of the indexes of the set bits).
C1. fingerprints/ directory containing the above file in 20 easy-to-download chunks
D. Filename: smiles_count.txt
Short description: file listing per-file and total number of SMILES
E. Filename: fingerprints_count.txt
Short description: file listing per-file and total number of Morgan fingerprints
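
Example (unofficial): reading the molecule names and fingerprint lines described above.
The sketch below is only an illustration and is not part of Deep Docking. It assumes that each
fingerprint line starts with the molecule name, followed by the comma-separated indexes of the
set bits; please check this layout against the downloaded files before relying on it. The
function names and the ZINC ID used in the usage note are hypothetical.

```python
from typing import List, Tuple

N_BITS = 1024  # fingerprint length stated in this README


def split_molecule_name(name: str) -> Tuple[str, str]:
    """Split a name of the form ZINCID_index into (ZINC ID, stereoisomer index)."""
    zinc_id, sep, index = name.rpartition("_")
    if not sep:
        # No underscore found: return the whole name as the ID (assumption).
        return name, ""
    return zinc_id, index


def parse_fingerprint_line(line: str, n_bits: int = N_BITS) -> Tuple[str, List[int]]:
    """Turn one line into (name, dense 0/1 list of length n_bits).

    Assumed layout: name,idx1,idx2,... where each idx is the index of a set bit.
    """
    fields = line.strip().split(",")
    name = fields[0]
    bits = [0] * n_bits
    for field in fields[1:]:
        if field:  # tolerate a trailing comma
            bits[int(field)] = 1
    return name, bits
```

Usage, with a made-up line: parse_fingerprint_line("ZINC000000000001_1,3,17,1023") returns the
name together with a 1024-element list in which only positions 3, 17 and 1023 are set to 1, and
split_molecule_name then separates the ZINC ID from the stereoisomer index.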
--------------------------
METHODOLOGICAL INFORMATION
--------------------------
1. Description of methods used for collection/generation of data:
* SMILES were prepared using the OpenEye flipper (https://docs.eyesopen.com/applications/omega/flipper.html) and
QUACPAC (https://www.eyesopen.com/quacpac) tools.
* Morgan fingerprints were then generated with RDKit (https://www.rdkit.org/docs/index.html) and
translated into Deep Docking format using the
compute_morgan_fp.sh script from https://github.com/jamesgleave/DD_protocol/tree/main/utilities
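
Example (unofficial): generating a fingerprint line for a new molecule.
The sketch below shows one way to compute a Morgan fingerprint (radius 2, 1024 bits) with RDKit
and write it as a comma-separated list of set-bit indexes. It is not the compute_morgan_fp.sh
script, and the output layout (molecule name first, then the indexes) is an assumption; use the
script from the Deep Docking repository when exact compatibility matters. The function names are
hypothetical.

```python
from typing import Iterable, List


def morgan_on_bits(smiles: str, radius: int = 2, n_bits: int = 1024) -> List[int]:
    """Return the sorted indexes of the set bits of a Morgan fingerprint."""
    # RDKit is imported here so that format_line() can be used without it.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return sorted(fp.GetOnBits())


def format_line(name: str, on_bits: Iterable[int]) -> str:
    """Join a molecule name and its set-bit indexes with commas (assumed layout)."""
    return ",".join([name, *(str(i) for i in on_bits)])
```

Usage: format_line(name, morgan_on_bits(smiles)) yields one text line per molecule. Newer RDKit
releases also provide a fingerprint-generator interface that may be preferred over
GetMorganFingerprintAsBitVect.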
Last updated Oct 10, 2021 by jji