Buchwald-Hartwig Rare Ligand Explorer
This tool lets you explore rarer ligands when searching for ligand recommendations.
Instructions
- Sorting the upper table by Pareto Front is highly recommended, so that the best-ranked ligands appear first
- The easiest way to reset everything is pressing F5
- The Yield slider on the left excludes ligands whose median yield is below the chosen threshold. These ligands do not disappear from the list; instead they receive a ranking coefficient of 999. Use the slider to control whether the table shows riskier recommendations (which might have a higher yield but are possibly backed by few reactions) or sounder recommendations (which typically have a lower median yield but more data points).
- On the left is a set of properties to select from. The algorithm ranks the ligands by the selected properties. Hold Shift to select multiple properties
- The ranking coefficient is called "Pareto Front" and displayed in the upper table. The lower, the better
- The initial Pareto Front value is 999, which just means that the ligand is not among the best-ranked ligands (for details, see the Pareto Ranking Explanation section below)
- The ligands shown when the HTML file is first opened are randomly selected and not yet ranked. Ranking happens only once you click on properties on the left
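The behaviour of the yield threshold and the 999 placeholder can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual code: the function name `table_coefficients`, the ligand names, the yields and the ranks are all hypothetical, and it is assumed that a ligand is kept when its median yield is greater than or equal to the threshold.

```python
UNRANKED = 999  # placeholder coefficient mentioned in the instructions


def table_coefficients(median_yield, threshold, ranks):
    """Return the coefficient shown for each ligand (hypothetical helper).

    A ligand keeps its rank only if its median yield reaches the threshold
    and a rank has been computed for it; otherwise it stays in the table
    with the placeholder value 999.
    """
    return {
        ligand: ranks[ligand] if y >= threshold and ligand in ranks else UNRANKED
        for ligand, y in median_yield.items()
    }


# Hypothetical data: median yield in percent, and ranks from a ranking step.
median_yield = {"LigandA": 82.0, "LigandB": 45.0, "LigandC": 60.0}
ranks = {"LigandA": 1, "LigandB": 1, "LigandC": 2}

coefficients = table_coefficients(median_yield, threshold=50.0, ranks=ranks)
```

With a threshold of 50, the hypothetical LigandB stays in the table but is shown with 999. With no ranks computed yet (an empty `ranks` dictionary), every ligand is shown with 999, which mirrors the initial state described above.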
Examples
- If you want a recommendation for a nucleophile that has a lot of heteroatoms, the property to select on the left is Median -> numHeteroAtoms_nPhile
- If you want a recommendation for a sterically hindered electrophile, select Median -> numOrtho-substituents_ePhile
- If you are designing a plate layout and want a ligand that performs well on electrophiles irrespective of their heteroatom fraction, rank by Standarddeviation -> numHeteroAtoms_ePhile. This ranks highest the ligands whose electrophiles showed the greatest variety in heteroatom count. Another measure of diversity is Median -> Tanimoto_Dist_ePhile.
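To make the Median and Standarddeviation property groups more concrete, the following sketch computes both statistics per ligand from a list of reactions. It assumes that each property is aggregated over all reactions recorded for a ligand; the data, the function name `summarize` and the choice of the population standard deviation are illustrative assumptions, not details taken from the tool.

```python
from collections import defaultdict
from statistics import median, pstdev

# Hypothetical reactions: (ligand, heteroatom count of the electrophile).
reactions = [
    ("LigandA", 2), ("LigandA", 6), ("LigandA", 4),
    ("LigandB", 3), ("LigandB", 3), ("LigandB", 3),
]


def summarize(reactions):
    """Group property values by ligand and compute median and spread."""
    by_ligand = defaultdict(list)
    for ligand, value in reactions:
        by_ligand[ligand].append(value)
    return {
        ligand: {"median": median(values), "std": pstdev(values)}
        for ligand, values in by_ligand.items()
    }


summary = summarize(reactions)
```

In this made-up data set both ligands have a similar median heteroatom count, but only LigandA was used across electrophiles with varying counts, so it has the larger standard deviation and would be preferred when ranking by diversity.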
Pareto Ranking Explanation
- Pareto ranking is a way to find optimal points with respect to multiple objectives. It does not require defining a composite metric to be minimized
- Instead, it gives you a set of optimal points. A point is optimal in the sense that every other point is worse in at least one property
- The set of all points that are optimal in this sense is called the Pareto front
- Iterating this yields the second, third, ... Pareto front. For example, the second Pareto front is the set of optimal points once the points belonging to the first front have been omitted
- Points can then be ranked by the Pareto front they belong to; the lower, the better. This rank is what the tool calls "Pareto Front"
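The iterative procedure described above can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes that a higher value is better for every property, uses the common dominance definition (at least as good in every property and strictly better in at least one), and the ligand names and values are made up.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(points):
    """Map each point name to the number of the Pareto front it belongs to."""
    remaining = dict(points)
    rank = {}
    front = 1
    while remaining:
        # Points that no other remaining point dominates form the current front.
        current = [
            name for name, p in remaining.items()
            if not any(dominates(q, p) for other, q in remaining.items() if other != name)
        ]
        for name in current:
            rank[name] = front
            del remaining[name]  # omit this front before finding the next one
        front += 1
    return rank


# Hypothetical ligands described by two properties (higher is better).
ligands = {
    "L1": (0.9, 0.2),
    "L2": (0.5, 0.8),
    "L3": (0.4, 0.7),
    "L4": (0.1, 0.1),
}
fronts = pareto_fronts(ligands)
```

Here L1 and L2 each win on one property, so neither dominates the other and both land on the first front. L3 is dominated by L2 and falls to the second front, and L4, which is dominated by every other point, ends up on the third.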
If you find these plots useful or have feedback, we are happy to hear from you.
Author: Martin Fitzner
Contact: Martin Fitzner or Torsten Schindler, pREDi DataScience 1
Plot Version: Sept. 2020
Data Obtained: October 2019, sources are CAS, Reaxys and the USPTO
Many thanks to Jean-Michel Adam, Raffael Koller and Georg Wuitschik.