Converting SMILES strings and 2D ChemDraw structures into 3D molecular structures

Motivation

Being able to transform chemical representations, such as SMILES strings or ChemDraw structures, into 3D molecular structures is useful for generating new molecular systems and could open avenues for screening many molecular structures. In this tutorial, we will first generate a SMILES string from a ChemDraw structure, then generate 3D structures from either SMILES or ChemDraw representations, as shown in the schematic below:

Software required for this tutorial

  • Bash terminal: Main language used to run code

  • Open Babel version 2.4.1 [Link]: Main workhorse for converting representations

    • On Mac:

      • Open terminal and run: brew install open-babel

    • On Linux:

      • Download bash installation file [Link]

      • Store in the server, then run: bash babel_install_code.sh

      • After running the code, make sure to add 'export PATH="${HOME}/local_install/openbabel-2.4.1/bin:$PATH"' into your ~/.bashrc

      • Restart the terminal for the .bashrc changes to take effect

  • ChemDraw [Link]: Used for drawing 2D structures

  • Avogadro [Link]: Used for visualizing 3D structures
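Before moving on, it can help to confirm that the Open Babel programs are actually reachable from your terminal. The snippet below is a minimal sketch, not part of the original installation instructions: check_tool is a hypothetical helper name, and the only Open Babel call it relies on is the version flag "obabel -V".

```shell
#!/bin/bash
# check_tool: hypothetical helper that reports whether a program is on PATH.
check_tool() {
    local tool="$1"
    if command -v "${tool}" > /dev/null 2>&1; then
        echo "${tool}: found at $(command -v "${tool}")"
        return 0
    else
        echo "${tool}: NOT found, check your PATH" >&2
        return 1
    fi
}

# Check both Open Babel programs used in this tutorial.
for tool in obabel obminimize; do
    check_tool "${tool}" || true
done

# Print the installed version (this tutorial assumes 2.4.1).
if command -v obabel > /dev/null 2>&1; then
    obabel -V
fi
```

If either program is reported as missing on Linux, re-check the PATH line in your ~/.bashrc and restart the terminal.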


Files for this tutorial:

All input/output files are downloadable through GitLab [Link].

This folder contains the following:

  • images: Directory containing all images for this tutorial

  • output_files: Output files from this tutorial (e.g. converted .mol2 structures)

  • example_structure.cdx: ChemDraw example used in this tutorial

Part 1: Generating 2D chemical structures and SMILES strings

Step 1: Open a ChemDraw document and create a structure. In this example, I selected a ligand that I use in my research, shown on the right, which is experimentally accessible and often used in the literature. You can access this structure in the "example_structure.cdx" file within the tutorial files.

Step 2a: To get the SMILES string of the structure in ChemDraw, highlight the molecule, right-click, and go to Molecule > Copy As > SMILES, as shown on the right. The SMILES string for this structure is:

SCCCCCCCCCCCO

You can repeat this procedure for any chemical structure to generate its SMILES string.
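If you prefer the command line, Open Babel can also read the *.cdx file and write SMILES directly. The sketch below is an alternative to the ChemDraw menu rather than part of the original workflow; cdx_to_smiles is a hypothetical helper name. Note that the string Open Babel prints may list the atoms in a different order than ChemDraw does while still describing the same molecule.

```shell
#!/bin/bash
# cdx_to_smiles: hypothetical helper that prints the SMILES for a .cdx file.
# Assumes obabel is on PATH and can read ChemDraw binary files (-icdx).
cdx_to_smiles() {
    local cdx_file="$1"
    # -osmi writes SMILES; with no output file, obabel prints to stdout.
    # A .smi line is "SMILES<whitespace>title", so keep only the first field.
    obabel -icdx "${cdx_file}" -osmi 2> /dev/null | awk '{print $1}'
}

# Run only when obabel and the tutorial file are both available.
if command -v obabel > /dev/null 2>&1 && [ -f "example_structure.cdx" ]; then
    cdx_to_smiles "example_structure.cdx"
fi
```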

Step 2b: In reverse, to get ChemDraw from SMILES:

  • Type or paste the SMILES string into the document as text

  • Copy the text

  • Click Edit -> Paste Special -> SMILES

Part 2: Converting SMILES or ChemDraw structure to 3D geometries

2.0. ChemDraw to 3D geometry

Step 2.0.0: Open a terminal with Open Babel installed and go to the directory where your *.cdx file is stored.

Step 2.0.1: Run obabel to convert the structure into a mol2 file:

obabel -icdx example_structure.cdx -omol2 -h > example_structure.mol2

  • "-icdx" means the input is a *.cdx file

  • "-omol2" means to output a *.mol2 file

  • "-h" means to add hydrogens if missing

  • "> example_structure.mol2" means to redirect output to .mol2 file.

  • Side note: I tried ".cdxml" as the ChemDraw input format, but it did not convert correctly to .mol2 files.
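A quick way to see why the next step is needed is to check whether the converted file is still flat. The sketch below assumes that a ChemDraw drawing carries only 2D coordinates, so every z value in the mol2 file would be zero; mol2_is_flat is a hypothetical helper name, and it reads the z coordinate as the fifth column of each record in the @<TRIPOS>ATOM section.

```shell
#!/bin/bash
# mol2_is_flat: hypothetical helper; returns 0 (true) when every atom in the
# @<TRIPOS>ATOM section of a mol2 file has z = 0, i.e. the structure is 2D.
mol2_is_flat() {
    awk '
        # Track whether we are inside the ATOM section.
        /^@<TRIPOS>/ { in_atoms = ($1 == "@<TRIPOS>ATOM"); next }
        # ATOM records: id name x y z type ...; z is the fifth field.
        in_atoms && NF >= 6 && (($5 + 0) != 0) { nonzero = 1 }
        END { exit (nonzero ? 1 : 0) }
    ' "$1"
}

if [ -f "example_structure.mol2" ]; then
    if mol2_is_flat "example_structure.mol2"; then
        echo "Structure is still flat (2D): minimize it or use --gen3d."
    else
        echo "Structure already has 3D coordinates."
    fi
fi
```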

Step 2.0.2: The .mol2 file is not energy minimized, so run the following to minimize the structure:

obminimize -sd -ff MMFF94s -c 1e-6 -n 50000 -o "mol2" example_structure.mol2 > example_structure_minimized.mol2

  • "-sd" means to energy minimize with steepest descent

  • "-ff MMFF94s" means to run the MMFF94s force field

  • "-c 1e-6" sets the convergence criterion: minimization stops when the energy change between steps falls below 1e-6

  • "-n 50000" sets the maximum number of minimization steps to 50,000; the run ends at whichever of these two limits is reached first

  • "> example_structure_minimized.mol2" means to redirect output to a mol2 file

Now, you should have successfully converted a ChemDraw structure into a 3D structure!
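One caveat with the ">" redirection: the shell creates the output file before obminimize starts, so a failed run can leave an empty .mol2 file behind. The sketch below shows one possible guard, using the same obminimize options as Step 2.0.2; minimize_mol2 and the ".tmp" suffix are hypothetical choices for illustration.

```shell
#!/bin/bash
# minimize_mol2: hypothetical wrapper around the obminimize call from Step 2.0.2.
# It writes to a temporary file first and keeps the result only if it is non-empty.
minimize_mol2() {
    local input_mol2="$1" output_mol2="$2"
    local tmp_file="${output_mol2}.tmp"

    if obminimize -sd -ff MMFF94s -c 1e-6 -n 50000 -o "mol2" \
            "${input_mol2}" > "${tmp_file}" && [ -s "${tmp_file}" ]; then
        mv "${tmp_file}" "${output_mol2}"
    else
        echo "Minimization failed for ${input_mol2}" >&2
        rm -f "${tmp_file}"
        return 1
    fi
}

# Run only when obminimize and the converted file are both available.
if command -v obminimize > /dev/null 2>&1 && [ -f "example_structure.mol2" ]; then
    minimize_mol2 "example_structure.mol2" "example_structure_minimized.mol2"
fi
```

This matters most when a script skips molecules whose output file already exists, because an empty file would otherwise be mistaken for a finished conversion.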

Alternative one step command (using gen3d):

obabel -icdx example_structure.cdx -omol2 -h --gen3d > example_structure_gen3d.mol2

  • "--gen3d" generates 3D coordinates for the molecule, followed by a quick force-field cleanup and conformer search

The alternative method produces results similar to obminimize. For simplicity, you can use gen3d for a quick transformation. Feel free to use Avogadro to visualize the different structures and compare them.


2.1. SMILES to 3D geometry

Step 2.1.0: Open a terminal with Open Babel installed.

Step 2.1.1: Run the following command to generate the *.mol2 file:

obabel -:"SCCCCCCCCCCCO" -omol2 -h --gen3d > smiles2mol_example_smiles_1.mol2

Alternatively, if you have the SMILES in a file:

echo "SCCCCCCCCCCCO" > smiles2mol_example_smiles_2.smi

obabel -ismi smiles2mol_example_smiles_2.smi -omol2 -h --gen3d > smiles2mol_example_smiles_2.mol2

  • "echo..." creates a file with the SMILES as the first line

  • "obabel..." converts the smiles string to a mol2 file with a "-ismi" flag to input the smiles file

Congrats! You should now be able to convert SMILES strings to 3D geometries!
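A .smi file can also hold several molecules, one per line, each optionally followed by a title. The sketch below is a minimal example of converting such a file in one call; the file name, the second SMILES string, and both titles are hypothetical, and it relies on the obabel "-m" option, which splits the output into one numbered file per molecule based on the "-O" name.

```shell
#!/bin/bash
# Each line of a .smi file is "SMILES<whitespace>title".
# The titles and the second molecule are hypothetical examples.
smi_file="smiles_library.smi"
printf '%s\t%s\n' \
    "SCCCCCCCCCCCO" "thiol_alcohol" \
    "SCCCCCCCCCCCC" "thiol_alkane" > "${smi_file}"

# -O sets the base output name and -m writes one numbered file per molecule
# (here library_1.mol2, library_2.mol2).
if command -v obabel > /dev/null 2>&1; then
    obabel -ismi "${smi_file}" -omol2 -O "library_.mol2" -m -h --gen3d
fi
```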

Optional Step: Writing a script to loop through many structures

Because Open Babel converts between molecular formats quickly, you can feed it a large library of ChemDraw files (*.cdx) or SMILES strings. The Bash script below loops through all *.cdx files in a directory and converts them to energy-minimized 3D structures (*.mol2 files).

Code to convert *.cdx files to *.mol2 files in a for loop:

#!/bin/bash


# openbabel_convert_cdx_to_mol2.sh

# This script converts ChemDraw structures to mol2 files.

# It searches for all *.cdx files, then runs Open Babel to convert them to mol2.

# It also runs energy minimization.

#

# Written by: Alex K. Chew (04/16/2020)


# INPUT VARIABLES

# $1: input directory containing the raw *.cdx files (defaults to ${MDLIG_RAW_DIR} if omitted)


## DEFINING MAIN DIRECTORY TO WORK ON

current_raw_dir="${1-${MDLIG_RAW_DIR}}"


#######################################

### DEFAULT VARIABLES FOR OPENBABEL ###

#######################################


## FORCE FIELD

em_forcefield="MMFF94s"

## TOLERANCE (CONVERGENCE CRITERION)

tolerence="1e-6"

## DEFINING MAX NUMBER OF STEPS

max_steps="50000"


## FINDING ALL CDX FILES (GLOB INSTEAD OF PARSING ls OUTPUT)

shopt -s nullglob

cdx_files=( "${current_raw_dir}"/*.cdx )


## DEFINING DIRECTORY FOR INITIAL (UNMINIMIZED) STRUCTURES

path_trash="${current_raw_dir}/openbabel_initial_structures"


## PRINTING

echo "---- CDX FILES ----"

for each_cdx in "${cdx_files[@]}"; do


echo "=== ${each_cdx} ==="


## GETTING BASENAME

cdx_basename=$(basename "${each_cdx}")


## GETTING THE FILE PREFIX

file_prefix="${cdx_basename%.cdx}"


## FINDING MOL2

mol2_file="${file_prefix}.mol2"

path_mol2="${current_raw_dir}/${mol2_file}"


## CHECKING IF EXISTING

if [ ! -e "${path_mol2}" ]; then

## CREATING TRASH DIRECTORY

if [ ! -e "${path_trash}" ]; then

mkdir -p "${path_trash}"

fi


echo "Performing mol2 conversion with openbabel"


## DEFINING TEMP FILE

temp_mol2_file="${file_prefix}_initial.mol2"


## CONVERTING CDX TO MOL2 AND ADD HYDROGENS

obabel -icdx "${each_cdx}" -O "${path_trash}/${temp_mol2_file}" -h


## PERFORMING ENERGY MINIMIZATION

obminimize -sd -ff "${em_forcefield}" \
    -c "${tolerence}" \
    -n "${max_steps}" \
    -o "mol2" \
    "${path_trash}/${temp_mol2_file}" > "${path_mol2}"

## PRINTING

echo "Openbabel conversion is complete!"


else

echo "${path_mol2} exists!"

fi


done

To run, simply copy the code into a script "openbabel_convert_cdx_to_mol2.sh" and run:

bash openbabel_convert_cdx_to_mol2.sh PATH_WITH_CDX_FILE

Replace "PATH_WITH_CDX_FILE" with the path to a folder with *.cdx files
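The same looping idea can be applied to a library of SMILES strings. The function below is a minimal sketch that mirrors the skip-if-exists logic of the *.cdx script; smiles_to_mol2_loop is a hypothetical name, and it assumes an input file with one "SMILES name" pair per line, where the name becomes the .mol2 file name.

```shell
#!/bin/bash
# smiles_to_mol2_loop: hypothetical helper mirroring the *.cdx script above.
# $1: .smi file with one "SMILES name" pair per line
# $2: output directory for the generated .mol2 files
smiles_to_mol2_loop() {
    local smi_file="$1" out_dir="$2"
    local smiles name path_mol2
    mkdir -p "${out_dir}"

    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while read -r smiles name || [ -n "${smiles}" ]; do
        # Skip blank lines and lines without a name.
        if [ -z "${smiles}" ] || [ -z "${name}" ]; then
            continue
        fi
        path_mol2="${out_dir}/${name}.mol2"

        if [ -e "${path_mol2}" ]; then
            echo "${path_mol2} exists!"
        else
            echo "=== ${name} ==="
            # Keep obabel from consuming the loop's input file.
            obabel -:"${smiles}" -omol2 -h --gen3d -O "${path_mol2}" < /dev/null
        fi
    done < "${smi_file}"
}

# Example call (file and directory names are placeholders):
# smiles_to_mol2_loop my_library.smi output_mol2
```

Here gen3d is used instead of a separate obminimize step to keep the loop short; swap in the two-step conversion if you want the same minimization settings as the *.cdx script.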

Last updated: 12/22/2020