Converting SMILES strings and 2D ChemDraw structures into 3D molecular structures
Motivation
Converting a SMILES string or a ChemDraw drawing into a 3D molecular structure is useful for building new molecular systems, and it opens the door to screening many molecular structures. In this tutorial, we will first generate a SMILES string from a ChemDraw structure, then generate 3D structures from either the SMILES or the ChemDraw representation, as shown in the schematic below:
Software required for this tutorial
Bash terminal: Used to run all commands in this tutorial
On Mac: Command + space, type in "Terminal" and hit enter
On Windows 10:
Install Git Bash: https://gitforwindows.org/
Alternatively, use the Ubuntu add-on: https://www.omgubuntu.co.uk/2016/08/enable-bash-windows-10-anniversary-update
Open Babel version 2.4.1 [Link]: Main workhorse for converting representations
On Mac:
Open terminal and run: brew install open-babel
On Linux:
Download bash installation file [Link]
Store it on the server, then run: bash babel_install_code.sh
After the installation finishes, add 'export PATH="${HOME}/local_install/openbabel-2.4.1/bin:$PATH"' to your ~/.bashrc
Restart the terminal for the .bashrc changes to take effect
ChemDraw [Link]: Used for drawing 2D structures
Avogadro [Link]: Used for visualizing 3D structures
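Once everything is installed, it helps to confirm that the terminal can actually find the Open Babel executables before moving on. The snippet below is a minimal sketch of such a check; "check_tool" is a helper name made up for this example, not part of Open Babel.

```shell
#!/bin/bash
# check_tool is a hypothetical helper (not part of Open Babel).
# It reports whether a command is available on PATH and
# returns non-zero if it is not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# The two Open Babel executables used in this tutorial
check_tool obabel     || echo "  -> revisit the installation steps above"
check_tool obminimize || echo "  -> revisit the installation steps above"
```

If both tools are found, running "obabel -V" prints the installed version, which you can compare against the version listed above.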
Files for this tutorial:
All input/output files are downloadable through GitLab [Link].
This folder contains the following:
images: Directory containing all images for this tutorial
output_files: Output files from this tutorial (e.g. converted .mol2 structures)
example_structure.cdx: ChemDraw example used in this tutorial
Part 1: Generating 2D chemical structures and SMILES strings
Step 1: Open a ChemDraw document and create a structure. In this example, I selected a ligand that I use in my research, shown on the right, which is experimentally accessible and is often used in the literature. You can access this structure in the "example_structure.cdx" file within the tutorial files.
Step 2a: To get the SMILES string of the structure in ChemDraw: Highlight the molecule, right click and go to Molecule > Copy As > SMILES, as shown on the right. The SMILES string for this structure is:
SCCCCCCCCCCCO
You can repeat this procedure for any chemical structure to generate its SMILES string.
Step 2b: In reverse, to get ChemDraw from SMILES:
Place the SMILES string in the document as text
Copy the text
Click Edit -> Paste Special -> SMILES
Part 2: Converting SMILES or ChemDraw structure to 3D geometries
2.0. ChemDraw to 3D geometry
Step 2.0.0: Open a terminal with openbabel installed and go into the directory where your *.cdx file is stored.
Step 2.0.1: Run obabel to convert the structure into a mol2 file:
obabel -icdx example_structure.cdx -omol2 -h > example_structure.mol2
"-icdx" means the input is a *.cdx file
"-omol2" means to output a *.mol2 file
"-h" means to add hydrogens if missing
"> example_structure.mol2" means to redirect output to .mol2 file.
Side note: I tried ".cdxml" as an input ChemDraw, but it does not correctly convert to .mol2 files.
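A quick way to confirm that the conversion worked (and that "-h" really added hydrogens) is to read the atom count from the .mol2 file. The sketch below assumes the standard Tripos mol2 layout, where the second line after "@<TRIPOS>MOLECULE" starts with the number of atoms and the sixth column of each atom record is the atom type. The function names are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a .mol2 file (not part of Open Babel).

# Print the atom count stored in the mol2 header: the second line after
# "@<TRIPOS>MOLECULE" begins with the number of atoms.
mol2_atom_count() {
    awk '/^@<TRIPOS>MOLECULE/ { getline; getline; print $1; exit }' "$1"
}

# Count atom records whose atom type (6th column) is plain "H".
mol2_hydrogen_count() {
    awk '/^@<TRIPOS>/ { in_atoms = ($0 ~ /^@<TRIPOS>ATOM/); next }
         in_atoms && $6 == "H" { n++ }
         END { print n + 0 }' "$1"
}

# Only run the check if the file from Step 2.0.1 is present
if [ -f example_structure.mol2 ]; then
    echo "atoms:     $(mol2_atom_count example_structure.mol2)"
    echo "hydrogens: $(mol2_hydrogen_count example_structure.mol2)"
fi
```

A hydrogen count of zero would suggest that the hydrogens were not added during the conversion.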
Step 2.0.2: The .mol2 file is not energy minimized, so run the following to minimize the structure:
obminimize -sd -ff MMFF94s -c 1e-6 -n 50000 -o "mol2" example_structure.mol2 > example_structure_minimized.mol2
"-sd" means to energy minimize with steepest descent
"-ff MMFF94s" means to run the MMFF94s force field
"-c 1e-6" means to run the minimization until a convergence tolerance of 1e-6 is reached, OR
"-n 50000" means to stop once 50,000 minimization steps have been performed, whichever comes first
"> example_structure_minimized.mol2" means to redirect output to a mol2 file
Now, you should have successfully converted a ChemDraw structure into a 3D structure!
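If you run these two steps often, you could wrap them in a small Bash function. The sketch below reuses the exact commands from Steps 2.0.1 and 2.0.2. The function name and the DRY_RUN switch are made up for this example; with DRY_RUN=1 the function only prints the commands it would run, which is handy for checking file names first.

```shell
#!/bin/bash
# cdx_to_minimized_mol2 is a hypothetical wrapper around the two commands
# from Steps 2.0.1 and 2.0.2. Set DRY_RUN=1 to print the commands
# instead of running them.
cdx_to_minimized_mol2() {
    local cdx="$1"
    local prefix="${cdx%.cdx}"
    local initial="${prefix}.mol2"
    local minimized="${prefix}_minimized.mol2"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "obabel -icdx ${cdx} -omol2 -h > ${initial}"
        echo "obminimize -sd -ff MMFF94s -c 1e-6 -n 50000 -o mol2 ${initial} > ${minimized}"
        return 0
    fi

    # Step 2.0.1: convert ChemDraw to mol2 and add hydrogens
    obabel -icdx "${cdx}" -omol2 -h > "${initial}"
    # Stop here if the conversion wrote nothing
    if [ ! -s "${initial}" ]; then
        echo "Conversion of ${cdx} produced no output" >&2
        return 1
    fi
    # Step 2.0.2: energy minimize the converted structure
    obminimize -sd -ff MMFF94s -c 1e-6 -n 50000 -o "mol2" "${initial}" > "${minimized}"
}

# Preview the commands for the tutorial file without running Open Babel
DRY_RUN=1 cdx_to_minimized_mol2 example_structure.cdx
```

Drop the "DRY_RUN=1" prefix to run the actual conversion and minimization.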
Alternative one-step command (using gen3d):
obabel -icdx example_structure.cdx -omol2 -h --gen3d > example_structure_gen3d.mol2
"--gen3d" means to generate 3D coordinates, which includes a quick conformer search
The alternative method produces results similar to obminimize. For simplicity, you can use gen3d for a quick transformation. Feel free to use Avogadro to visualize the different structures and compare them.
2.1. SMILES to 3D geometry
Step 2.1.0: Open terminal with openbabel installed.
Step 2.1.1: Run the following command to generate the *.mol2 file:
obabel -:"SCCCCCCCCCCCO" -omol2 -h --gen3d > smiles2mol_example_smiles_1.mol2
"-:" means the input is a SMILES string
"-h" means to add hydrogens if missing
"--gen3d" means to generate 3D coordinates, which includes a quick conformer search (see link: https://open-babel.readthedocs.io/en/latest/3DStructureGen/Overview.html)
Alternatively, if you have the SMILES in a file:
echo "SCCCCCCCCCCCO" > smiles2mol_example_smiles_2.smi
obabel -ismi smiles2mol_example_smiles_2.smi -omol2 -h --gen3d > smiles2mol_example_smiles_2.mol2
"echo..." creates a file with the SMILES as the first line
"obabel..." converts the smiles string to a mol2 file with a "-ismi" flag to input the smiles file
Congrats! You should now be able to convert SMILES strings to 3D geometries!
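A .smi file can also hold several molecules, one per line, each written as a SMILES string followed by a title. The sketch below writes a small library and asks obabel to split the output into one numbered .mol2 file per molecule with the "-m" option. The file names, the titles, and the two extra molecules (ethanol and benzene) are arbitrary choices for illustration.

```shell
#!/bin/bash
# Write a small SMILES library: one molecule per line, "<SMILES> <title>".
# The titles and the last two molecules are illustrative only.
cat > smiles_library.smi <<'EOF'
SCCCCCCCCCCCO tutorial_ligand
CCO ethanol
c1ccccc1 benzene
EOF

# -m writes each molecule to its own numbered output file,
# with names derived from the name given to -O
if command -v obabel >/dev/null 2>&1; then
    obabel -ismi smiles_library.smi -omol2 -O smiles_library_.mol2 -m -h --gen3d
else
    echo "obabel not found; skipping conversion"
fi
```

After the run, list the directory to see the numbered .mol2 files that were produced.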
Optional Step: Writing a script to loop through many structures
Because Open Babel converts molecules quickly, you can feed it a large library of ChemDraw files (*.cdx) or SMILES strings. The Bash script below loops through all *.cdx files in a folder and converts them to 3D structures (*.mol2 files).
Code to convert *.cdx files to *.mol2 files in a for loop:
#!/bin/bash
# openbabel_convert_cdx_to_mol2.sh
# This script converts ChemDraw structures to mol2 files.
# It searches for all *.cdx files, then runs Open Babel to convert them to mol2.
# It also runs energy minimization on each converted structure.
#
# Written by: Alex K. Chew (04/16/2020)
# INPUT VARIABLES
# $1: input raw directory folder (defaults to the MDLIG_RAW_DIR environment variable if omitted)
## DEFINING MAIN DIRECTORY TO WORK ON
current_raw_dir="${1-${MDLIG_RAW_DIR}}"
#######################################
### DEFAULT VARIABLES FOR OPENBABEL ###
#######################################
## FORCE FIELD
em_forcefield="MMFF94s"
## CONVERGENCE TOLERANCE
tolerence="1e-6"
## DEFINING MAX NUMBER OF STEPS
max_steps="50000"
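## FINDING ALL CDX FILES (nullglob gives an empty list when nothing matches)
shopt -s nullglob
cdx_files=( "${current_raw_dir}"/*.cdx )
## DEFINING TRASH DIRECTORY (stores the un-minimized structures)
path_trash="${current_raw_dir}/openbabel_initial_structures"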
## PRINTING
echo "---- CDX FILES ----"
for each_cdx in "${cdx_files[@]}"; do
    echo "=== ${each_cdx} ==="
    ## GETTING BASENAME
    cdx_basename=$(basename "${each_cdx}")
    ## GETTING THE FILE PREFIX
    file_prefix="${cdx_basename%.cdx}"
    ## FINDING MOL2
    mol2_file="${file_prefix}.mol2"
    path_mol2="${current_raw_dir}/${mol2_file}"
    ## CHECKING IF EXISTING
    if [ ! -e "${path_mol2}" ]; then
        ## CREATING TRASH DIRECTORY
        if [ ! -e "${path_trash}" ]; then
            mkdir -p "${path_trash}"
        fi
        echo "Performing mol2 conversion with openbabel"
        ## DEFINING TEMP FILE
        temp_mol2_file="${file_prefix}_initial.mol2"
        ## CONVERTING CDX TO MOL2 AND ADD HYDROGENS
        obabel -icdx "${each_cdx}" -O "${path_trash}/${temp_mol2_file}" -h
        ## PERFORMING ENERGY MINIMIZATION
        obminimize -sd -ff "${em_forcefield}" \
            -c "${tolerence}" \
            -n "${max_steps}" \
            -o "mol2" \
            "${path_trash}/${temp_mol2_file}" > "${path_mol2}"
        ## PRINTING
        echo "Openbabel conversion is complete!"
    else
        echo "${path_mol2} exists!"
    fi
done
To run, simply copy the code into a script "openbabel_convert_cdx_to_mol2.sh" and run:
bash openbabel_convert_cdx_to_mol2.sh PATH_WITH_CDX_FILE
Replace "PATH_WITH_CDX_FILE" with the path to a folder with *.cdx files
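After a batch run, it is worth checking that every *.cdx file ended up with a usable *.mol2 file. Because the script writes the minimized structure with a ">" redirect, the shell creates the output file before obminimize runs, so a failed minimization can leave an empty .mol2 behind that the script will then skip on the next run. The sketch below flags missing or empty outputs; "report_conversions" is a helper name made up for this example.

```shell
#!/bin/bash
# report_conversions is a hypothetical helper (not part of Open Babel).
# For every *.cdx file in the given folder, it reports whether the matching
# .mol2 file exists and is non-empty. Returns non-zero if any are missing.
report_conversions() {
    local dir="$1" cdx mol2 missing=0
    for cdx in "${dir}"/*.cdx; do
        # Skip the unexpanded pattern when the folder has no .cdx files
        [ -e "${cdx}" ] || continue
        mol2="${cdx%.cdx}.mol2"
        if [ -s "${mol2}" ]; then
            echo "OK      ${mol2}"
        else
            echo "MISSING ${mol2}"
            missing=$((missing + 1))
        fi
    done
    echo "${missing} structure(s) need attention"
    [ "${missing}" -eq 0 ]
}
```

To use it, define the function in your terminal (or append it to the script) and run "report_conversions PATH_WITH_CDX_FILE". Delete any empty .mol2 files it flags before rerunning the conversion script so that those structures are not skipped.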