Interruption due to COVID
mmpdb development has been on hold since February 2020. The extra child minding and general upheaval of life meant I have had to focus on chemfp.
mmpdb development is expected to resume in mid-February 2021, once chemfp 3.5 is released.
Why is mmpdb interesting?
Many people have written software to find matched molecular pairs. I know of one company with several different in-house versions, and I doubt they are unique.
Some of these companies are looking to switch to mmpdb for internal use because it has several features beyond what most other tools have.
- mmpdb can handle large data sets.
- mmpdb uses the fragment-and-indexing approach of Hussain and Rea. The fragmentation step is parallelized. When re-fragmenting an updated compound dataset, the previous fragmentation information can be reused as a cache, rather than re-computing all of the fragments.
- The fragmentation is fully canonical, resulting in fully canonical transforms. (The original Hussain and Rea method could identify up to 6 different, equivalent transforms for 3-cut fragmentations.)
- Chiral structures are handled through “up-enumeration”, where all 3^n chiral, inverted chiral, and achiral forms (up to uniqueness) are fragmented and indexed.
- The local chemical environment may affect a transform. The fragmentation step computes circular fingerprints around the attachment points, up to a radius of 5 bonds, which are used during indexing to identify transforms at different levels of environment specificity.
- If physical property information is available, then indexing will generate “property rules”. These contain overall statistics and a list of the associated pairs, for each environment radius.
- The indexing results are stored in a relational database; currently SQLite-only. This crowdfunding effort will add Postgres support.
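To give a feel for where the 3^n in the up-enumeration item comes from, here is a minimal sketch in Python. It assumes n is the number of stereocenters and that each center can independently be kept as drawn, inverted, or made achiral. The state labels and the up_enumerate function are hypothetical names for illustration, not part of mmpdb, and the sketch does not remove duplicates, which a real implementation would need canonical structures to do.

```python
from itertools import product

# Hypothetical labels for the three states of a single stereocenter.
STATES = ("as-drawn", "inverted", "achiral")


def up_enumerate(num_centers):
    """Return every assignment of one state to each stereocenter.

    With num_centers = n there are 3**n assignments. Symmetry can make
    some of the resulting structures identical; this sketch does not
    detect that.
    """
    return list(product(STATES, repeat=num_centers))


forms = up_enumerate(2)
print(len(forms))  # 3**2 = 9 combinations for two stereocenters
```

The count grows quickly with the number of stereocenters, which is one reason a parallelized and cacheable fragmentation step matters for large data sets.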
For more details, see the paper “mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets”, which was made an ACS Editors’ Choice.
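The “property rules” in the feature list above are, in essence, per-transform statistics grouped by environment radius. The following sketch shows that idea with an in-memory SQLite database. The table layout, the column names, the example transform, and the property deltas are all assumptions made up for illustration; this is not mmpdb's actual schema or its actual statistics.

```python
import sqlite3

# Hypothetical schema: one row per matched pair at a given environment radius.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pair (transform TEXT, radius INTEGER, delta REAL)")

# Made-up property differences for a single illustrative transform.
# Fewer pairs share the same environment at the more specific radius 1.
rows = [
    ("[*:1]Cl>>[*:1]F", 0, -0.5),
    ("[*:1]Cl>>[*:1]F", 0, -0.25),
    ("[*:1]Cl>>[*:1]F", 0, -0.75),
    ("[*:1]Cl>>[*:1]F", 1, -0.5),
    ("[*:1]Cl>>[*:1]F", 1, -0.25),
]
conn.executemany("INSERT INTO pair VALUES (?, ?, ?)", rows)


def property_rules(conn):
    """Return (transform, radius, count, mean, min, max) for each group."""
    query = """
        SELECT transform, radius, COUNT(*), AVG(delta), MIN(delta), MAX(delta)
        FROM pair
        GROUP BY transform, radius
        ORDER BY transform, radius
    """
    return conn.execute(query).fetchall()


for rule in property_rules(conn):
    print(rule)
```

Keeping the pairs in a relational database means this kind of summary is a single GROUP BY query, and the same query would carry over to another SQL backend with little change.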
How do you explain “crowdfunding” to accounting?
Don’t. (Unless you really want to.) Instead, tell them that you are going to purchase a new version of mmpdb with the following features:
- Postgres support, as an alternative to the existing SQLite support;
- A new ‘mmpdb proprulecat’ command to export the property rules in the database (transformation details and property statistics of the associated pairs) in CSV form, along with a fragment SMILES for the given environment.
In addition, your purchase includes membership in the crowdfunding consortium. As more people join and additional funding goals are met, I will continue to improve mmpdb, and you will get those improvements as part of your membership.
If EUR 23 000 is raised, I will contribute the new code upstream to the main mmpdb repository for anyone to use by 1 October 2020. If EUR 50 000 is raised, I will contribute the new code upstream “immediately.” The deadline for joining the consortium is 15 February 2020 – join now!
Why should you fund mmpdb?
If you want to use mmpdb in-house, then you probably also want someone to support it, and improve it. While you can do that yourself – it is open-source and available for free – it’s cheaper for you if you can share that cost with other people.
Of course, it’s even cheaper if other people pay for mmpdb development, and you get the result for free. This crowdfunding effort uses a delayed release model to incentivize you to pay for mmpdb now, rather than wait most of a year for the new features.
In the long term, people have asked for new features like a web interface to mmpdb, or support for categorical properties and multi-valued properties. The current mmpdb code base is not set up for this sort of growth. It needs some cleanup, and has only a basic test suite.
If you want those sorts of interesting long-term features, then you should fund this effort now to set the groundwork for future growth and show that crowdfunding can be the way to get there.
How do I join?
Send me an email saying that you are interested. Once I get a purchase order, you are a member.
Here are the suggested consortium membership rates:
- Academics - EUR 1 000 (no warranty and only limited support)
- Industry - EUR 5 000 (includes 9 months of support)
Feel free to pay more if you want! We can also discuss the price to add other specific features you may want me to develop, as well as possible features to add in future crowdfunding efforts.
If invoicing doesn’t work for you, I can arrange a PayPal transaction, with a 5% overhead.