Welcome to ZINC20-ML.docking.org.

Files are downloadable in chunks or in a single large file.
You do not need to download both the large file and the individual files. A description of the contents is below.
Name                      Last modified      Size

Parent Directory                                -
ZINC20-ML_smiles.tar.gz   2021-10-07 19:52   8.9G
fingerprints/             2021-10-10 19:45      -
fingerprints_count.txt    2021-10-07 13:04   3.0K
fp1                       2021-10-10 20:23    92G
fp2                       2021-10-10 20:17    18G
smiles/                   2021-10-10 10:09      -
smiles_count.txt          2021-10-07 13:04   2.9K

This README.txt file was generated on 2021-10-06 by Francesco Gentile

--------------------
GENERAL INFORMATION
--------------------

1. Title of Dataset: ZINC20 library prepared for Deep Docking-accelerated virtual screening

2. Author Information
	A. Principal Investigator Contact Information
		Name: Artem Cherkasov
		Institution: Vancouver Prostate Centre, The University of British Columbia
		Email: acherkasovATprostatecentre.com

	B. Contact Information for Dataset-related Questions
		Name: Francesco Gentile
		Institution: Vancouver Prostate Centre, The University of British Columbia
		Email: gentile.francesco89ATgmail.com

----------------------
ANCILLARY INFORMATION
----------------------

1. Links to software packages: Deep Docking is freely available from https://github.com/jamesgleave/DD_protocol

2. Was data derived from another source? yes
	A. If yes, list source(s): the Deep Docking-prepared ZINC20 library was derived from the ZINC20 database
(Irwin, J. J. et al. ZINC20 - A Free Ultralarge-Scale Chemical Database for Ligand Discovery. J. Chem. Inf. Model. 60, 6065-6073 (2020)).


---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: README.txt      
      Short description: readme file       

   B. Filename: ZINC20_smiles.tar.gz (listed on the server as ZINC20-ML_smiles.tar.gz)
      Short description: archive with 100 SMILES files (10 million molecules each) encompassing all the ZINC20 molecules
available in early March 2021. Stereochemistry has been expanded where necessary. Names of molecules follow the format
ZINCID_index, where index refers to a specific stereoisomer. The dominant tautomer and protonation state at pH 7.4
are already provided.
   B1. smiles/ directory containing the above file in 20 easy-to-download chunks
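
The ZINCID_index naming convention can be split back into its two parts with a few lines of Python. This is only a
sketch: the README does not state the column layout of the SMILES files, so the "<SMILES> <name>" whitespace-separated
layout assumed below, the helper names, and the example identifier are all hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class MoleculeName(NamedTuple):
    zinc_id: str
    index: Optional[str]  # stereoisomer index kept as text; None if absent


def split_name(name: str) -> MoleculeName:
    """Split a 'ZINCID_index' name at its last underscore."""
    zinc_id, sep, index = name.rpartition("_")
    if not sep:
        # No underscore found: treat the whole name as the ZINC ID.
        return MoleculeName(name, None)
    return MoleculeName(zinc_id, index)


def parse_smiles_line(line: str) -> Tuple[str, MoleculeName]:
    """Parse one line, ASSUMING the layout '<SMILES> <name>'."""
    smiles, name = line.split()[:2]
    return smiles, split_name(name)


if __name__ == "__main__":
    # Made-up example line, not taken from the dataset.
    print(parse_smiles_line("CCO ZINC000000000001_2"))
```

Check a few lines of a downloaded file first; if the name turns out to be in the first column, swap the unpacking order.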

   C. Filename: ZINC20_fingerprints.tar.gz
      Short description: archive with 100 Morgan fingerprint (radius 2, 1024 bits) files (10 million molecules each)
encompassing all the ZINC20 molecules. The format is appropriate for Deep Docking (comma-separated list of the indices
of the set bits).
   C1. fingerprints/ directory containing the above file in 20 easy-to-download chunks
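
A line of set-bit indices can be expanded back into a dense 1024-bit vector as sketched below. The README only says
that each record is a comma-separated list of set-bit indices; that the first field holds the molecule name is an
assumption here, as are the function name and the example line.

```python
from typing import Tuple

N_BITS = 1024  # fingerprint length stated in this README


def parse_fp_line(line: str, n_bits: int = N_BITS) -> Tuple[str, bytearray]:
    """Expand '<name>,<idx>,<idx>,...' into (name, dense 0/1 vector).

    ASSUMPTION: the first comma-separated field is the molecule name.
    """
    fields = line.strip().split(",")
    name = fields[0]
    bits = bytearray(n_bits)  # all zeros initially
    for field in fields[1:]:
        if not field:
            continue  # tolerate a trailing comma
        idx = int(field)
        if not 0 <= idx < n_bits:
            raise ValueError(f"bit index {idx} outside 0..{n_bits - 1}")
        bits[idx] = 1
    return name, bits


if __name__ == "__main__":
    # Made-up example line, not taken from the dataset.
    name, bits = parse_fp_line("ZINC000000000001_1,3,80,1023")
    print(name, sum(bits))
```

Storing only the set-bit indices keeps the files compact because Morgan fingerprints are sparse; the dense form is
needed only when the vectors are fed to a model.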

   E. Filename: smiles_count.txt   
      Short description: file listing per-file and total number of SMILES

   F. Filename: fingerprints_count.txt   
      Short description: file listing per-file and total number of Morgan fingerprints

---------------------------
METHODOLOGICAL INFORMATION
---------------------------

1. Description of methods used for collection/generation of data: 
* SMILES were prepared using the OpenEye flipper (https://docs.eyesopen.com/applications/omega/flipper.html) and
  QUACPAC (https://www.eyesopen.com/quacpac) tools.
* Morgan fingerprints were then generated with RDKit (https://www.rdkit.org/docs/index.html) and
  translated into Deep Docking format using the compute_morgan_fp.sh script from
  https://github.com/jamesgleave/DD_protocol/tree/main/utilities
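
For readers who want to fingerprint additional molecules in a comparable way, the fingerprint step can be approximated
as below. This is a minimal sketch and not the compute_morgan_fp.sh script itself: it uses RDKit's fingerprint
generator with the radius (2) and length (1024 bits) stated in this README, and the "name first, then indices" output
layout and both function names are assumptions.

```python
from typing import Iterable, List


def morgan_on_bits(smiles: str, radius: int = 2, n_bits: int = 1024) -> List[int]:
    """Return the set-bit indices of a Morgan fingerprint (requires RDKit)."""
    # Imported lazily so that to_dd_line() stays usable without RDKit.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return list(generator.GetFingerprint(mol).GetOnBits())


def to_dd_line(name: str, on_bits: Iterable[int]) -> str:
    """Format one record as '<name>,<idx>,<idx>,...' (ASSUMED layout)."""
    return ",".join([name] + [str(b) for b in sorted(on_bits)])


if __name__ == "__main__":
    # Made-up molecule name; ethanol is used only as a small test input.
    print(to_dd_line("example_1", morgan_on_bits("CCO")))
```

Bit indices can differ between RDKit versions and fingerprinting settings, so compare the output for a few known
molecules against the provided files before mixing newly generated fingerprints with this dataset.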


Last updated Oct 10, 2021 by jji