PubChemPy batching feature #148

csnbritt · 2025-12-04T22:40:54Z


PubChemPy does not properly handle batch requests that pass a list of identifiers for certain namespaces. For example, this works:
import pubchempy as pcp
pcp.get_compounds(['5793', '2244'], 'cid')

These do not:

pcp.get_compounds(['glucose', 'aspirin'], 'name')
pcp.get_compounds(['InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2/t2-,3-,4+,5-,6?/m1/s1', 'InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)'], 'inchi')
pcp.get_compounds(['C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O', 'CC(=O)OC1=CC=CC=C1C(=O)O'], 'smiles')

These get_compounds calls fail silently and return an empty list.

The fact that the name, smiles, and inchi namespaces cannot be batched is consistent with the PUG REST documentation:
<identifiers> = comma-separated list of positive integers (e.g. cid, sid, aid) or identifier strings (source, inchikey, formula); in some cases only a single identifier string (name, smiles, xref; inchi, sdf by POST only)

To call get_compounds for multiple compounds in one of the above namespaces, the current workaround is a for-loop with one request per identifier, which is slow: several seconds per compound. Threading can make this faster, but it is non-trivial to implement in a way that won't get you rate limited.
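For reference, the for-loop workaround might look roughly like the sketch below. The names `lookup_each`, `fetch`, and `min_interval` are hypothetical and not part of PubChemPy; the 0.2 second default spacing is an assumption meant to stay under a limit of about five requests per second, so adjust it to whatever PubChem's current usage policy states.

```python
import time
from typing import Any, Callable, Dict, Iterable


def lookup_each(
    identifiers: Iterable[str],
    namespace: str,
    fetch: Callable[[str, str], Any],
    min_interval: float = 0.2,  # assumed spacing between requests, in seconds
) -> Dict[str, Any]:
    """Call fetch(identifier, namespace) once per identifier, sequentially.

    Waits so that consecutive calls start at least min_interval seconds apart.
    Hypothetical helper for illustration; not part of PubChemPy.
    """
    results: Dict[str, Any] = {}
    last_start = None
    for identifier in identifiers:
        if last_start is not None:
            remaining = min_interval - (time.monotonic() - last_start)
            if remaining > 0:
                time.sleep(remaining)
        last_start = time.monotonic()
        results[identifier] = fetch(identifier, namespace)
    return results
```

With PubChemPy installed, this could be called as `lookup_each(['glucose', 'aspirin'], 'name', pcp.get_compounds)`. It still issues one request per compound, so it does nothing about the per-compound latency described above.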

I'm interested in a better way to handle batch requests for the namespaces in ['inchi', 'smiles', 'name'], or at least in notifying users that batching for these namespaces isn't allowed. If implementing batching is within the scope of this library, I have a fork that uses the Power User Gateway (PUG) XML schema to batch convert these namespaces to CIDs before retrieving the data from PUG REST. For resolving large numbers of compounds, it provides a considerable speedup (up to 25x in my testing) compared to a simple for-loop, and it uses fewer calls to PubChem. If there's interest, I'd be happy to make changes to my fork as necessary before submitting a pull request here.
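As a rough illustration of the "notify users" option only, a guard like the one below could reject list input for single-identifier namespaces instead of returning an empty list. `ensure_batchable` and `SINGLE_ONLY_NAMESPACES` are hypothetical names, not existing PubChemPy API, and the namespace set is taken from the PUG REST excerpt quoted above.

```python
# Namespaces that the quoted PUG REST documentation lists as accepting only a
# single identifier string. Hypothetical constant; not part of PubChemPy.
SINGLE_ONLY_NAMESPACES = frozenset({"name", "smiles", "inchi", "xref", "sdf"})


def ensure_batchable(identifier, namespace):
    """Raise ValueError if several identifiers are given for a single-only namespace.

    Hypothetical helper: a library could call this before building the request.
    """
    is_batch = isinstance(identifier, (list, tuple)) and len(identifier) > 1
    if is_batch and namespace in SINGLE_ONLY_NAMESPACES:
        raise ValueError(
            f"The '{namespace}' namespace accepts one identifier per request; "
            f"got {len(identifier)}. Send the identifiers one at a time."
        )
```

Calling `ensure_batchable(['glucose', 'aspirin'], 'name')` raises `ValueError`, while `ensure_batchable(['5793', '2244'], 'cid')` passes silently. Whether to raise or only emit a warning is a design choice for the maintainers.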

Replies: 0 comments
