Thursday, August 4, 2016

A question: RDKit performance on Windows

Updated 6 August 2016 to fix an incomplete sentence.

This one is a request for advice/expertise on performance tuning/compiler flag tweaking on Windows. The short story is that when the RDKit is built using Visual Studio on Windows it ends up being substantially slower than when it's built with g++ and run using the Windows Subsystem for Linux. This doesn't seem like it should be true, but I'm not an expert with either Visual Studio or Windows, so I'm asking for help.

Some more details:

When I've used the RDKit on Windows machines it has always seemed slower than it should. I've never really quantified that and so I've always just kind of shrugged and moved on. Now I've measured a real difference and I'd like to try and do something about it.

Some experiments that I did with Docker on Windows convinced me that the effect was real, but with the advent of Bash on Windows 10 (https://msdn.microsoft.com/en-us/commandline/wsl/install_guide) - an awesome thing, by the way - I have some real numbers.

The RDKit includes some code that I've used over the years to track the performance of some basic tasks. This script - https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/timings.py - looks at a broad subset of RDKit functionality.

The tests are:
  1. construct 1000 molecules from sdf
  2. construct 1000 molecules from smiles
  3. construct 823 fragment molecules from SMARTS (smiles really)
  4. 1000 x 100 HasSubstructMatch (100 from t3)
  5. 1000 x 100 GetSubstructMatches (100 from t3)
  6. construct 428 queries from RLewis_smarts.txt
  7. 1000 x 428 HasSubstructMatch
  8. 1000 x 428 GetSubstructMatches
  9. Generate canonical SMILES for 1000 molecules
  10. Generate mol blocks for 1000 molecules
  11. RECAP decomposition of the 1000 molecules
  12. Generate 2D coordinates for the 1000 molecules
  13. Generate 3D coordinates for the 1000 molecules
  14. Optimize those conformations using UFF
  15. Generate unique subgraphs of length 6 for the 1000 molecules
  16. Generate RDK fingerprints for the 1000 molecules
  17. Optimize the conformations above (test 13) using MMFF
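
For readers who haven't looked at the script, here is a minimal sketch of the general idea: time each task once and print the results as a single row. This is not the code from timings.py; the helper names (time_tasks, format_row) and the two stand-in workloads are made up for illustration, and the " || " separator is only chosen to match the rows shown in this post.

```python
import time


def time_tasks(tasks):
    """Run each (label, callable) pair once; return elapsed wall-clock seconds."""
    timings = []
    for _label, func in tasks:
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def format_row(timings):
    """Format timings as one row, one decimal place each."""
    return " || ".join(f"{t:.1f}" for t in timings)


# Hypothetical stand-in workloads so the sketch runs without the RDKit.
# With the RDKit installed, a task could instead be something like
#   lambda: [Chem.MolFromSmiles(smi) for smi in smiles_list]
tasks = [
    ("sum of squares", lambda: sum(i * i for i in range(100_000))),
    ("sort a reversed range", lambda: sorted(range(100_000, 0, -1))),
]

row = format_row(time_tasks(tasks))
print(row)
```

Running the same task list under each build, one directly after the other on the same machine, gives rows that can be compared column by column.
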
Here are the results using the 2016.03 release conda builds (available from the rdkit channel in conda). The first line is the Windows build and the second is the Linux build. The tests were run directly after each other on the same laptop (a Dell XPS13 running Win10 Anniversary Edition):
0.8 || 0.4 || 0.1 || 1.2 || 1.3 || 0.0 || 4.2 || 4.2 || 0.2 || 0.3 || 7.4 || 0.3 || 7.5 || 18.5 || 2.2 || 1.4 || 41.7
0.6 || 0.3 || 0.1 || 0.9 || 1.0 || 0.0 || 3.1 || 3.2 || 0.1 || 0.2 || 6.3 || 0.3 || 6.2 || 15.2 || 2.1 || 1.0 || 29.8
That's a real difference: summed over all 17 tests, the Windows build takes about 30% longer (91.7 vs. 70.4).

The Windows build is done using build files generated by cmake. It's a release mode build with Visual Studio using the flags: "/MD /O2 /Ob2 /D NDEBUG" (those are the defaults that cmake creates).

It doesn't seem right to me that the code generated by Visual Studio and running under Windows should be so much slower than the code generated by g++ and running under the Windows Linux subsystem. I'm hoping, for the good of all of the users of the RDKit on Windows, to find a tweak for the Visual C++ command-line options that produces faster compiled code.

For what it's worth, here's a different set of benchmarks, run on a larger set of molecules. The script is here (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py):

  1. construct 50K molecules from SMILES
  2. generate canonical SMILES for those
  3. construct 10K molecules from SDF
  4. construct 823 fragment molecules from SMARTS (smiles really)
  5. 60K x 100 HasSubstructMatch
  6. 60K x 100 GetSubstructMatches
  7. construct 428 queries from RLewis_smarts.txt
  8. 60K x 428 HasSubstructMatch
  9. 60K x 428 GetSubstructMatches
  10. Generate 60K mol blocks
  11. BRICS decomposition of the 60K molecules
  12. Generate 2D coords for the 60K molecules
  13. Generate RDKit fingerprints for the 60K molecules
  14. Generate Morgan (radius=2) fingerprints for the 60K molecules

The timings show the same, at times dramatic, performance differences:
18.8 || 8.5 || 6.8 || 0.1 || 85.8 || 106.2 || 0.0 || 264.2 || 268.6 || 14.0 || 77.2 || 20.9 || 104.7 || 13.0
17.5 || 9.8 || 6.7 || 0.1 || 68.0 ||  74.2 || 0.0 || 204.6 || 208.2 ||  9.6 || 56.5 || 20.6 ||  89.0 ||  6.6

If you have thoughts about what's going on here, please comment here, reach out on Twitter, Google+, or LinkedIn, or post to the mailing list.
Thanks!