PubChemPy does not properly handle batch requests that pass a list of identifiers for certain namespaces. For example, this works: pcp.get_compounds(['5793', '2244'], 'cid')
The equivalent list requests in the name, smiles, and inchi namespaces do not: those get_compounds calls silently fail and return an empty list.
That the name, smiles, and inchi namespaces cannot be batched is in line with the documentation for PUG REST: "<identifiers> = comma-separated list of positive integers (e.g. cid, sid, aid) or identifier strings (source, inchikey, formula); in some cases only a single identifier string (name, smiles, xref; inchi, sdf by POST only)"
To call get_compounds for multiple compounds in one of the above namespaces, the current solution is a for-loop with one request per identifier, which is slow: several seconds per compound. Threading can make this faster, but it is non-trivial to implement in a way that won't get you rate limited.
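For anyone who needs a stopgap, here is a rough sketch of the threaded approach with a shared throttle. It is not part of PubChemPy: `Throttle`, `lookup_each`, `min_interval`, and `workers` are names made up for this example, and the `lookup` argument is assumed to be any callable that takes `(identifier, namespace)`, such as `pcp.get_compounds`. The 0.25 s default spacing is a guess at a polite rate, not a documented limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Space out calls so that, across all threads, starts are at least
    `min_interval` seconds apart (hypothetical helper, not in PubChemPy)."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside
        # the lock so other threads can reserve their own slots meanwhile.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        time.sleep(max(0.0, slot - now))


def lookup_each(identifiers, namespace, lookup, min_interval=0.25, workers=4):
    """Call lookup(identifier, namespace) once per identifier, in parallel
    but throttled, and return {identifier: result}."""
    throttle = Throttle(min_interval)

    def one(identifier):
        throttle.wait()
        return lookup(identifier, namespace)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(identifiers, pool.map(one, identifiers)))
```

With PubChemPy installed this could be called as `lookup_each(['aspirin', 'glucose'], 'name', pcp.get_compounds)`. The threads only help by overlapping network latency; the throttle still caps the request rate, so this will not match a true server-side batch.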
I'm interested in a better way to handle batch requests for the namespaces in ['inchi', 'smiles', 'name'], or at least to notify users that batching for these namespaces isn't allowed. If implementing batching is within the scope of this library, I have a fork that uses the Power User Gateway XML schema to batch convert these namespaces to CIDs before retrieving the data from PUG REST. For resolving large numbers of compounds, it provides a considerable speedup (up to 25x in my testing) compared to a simple for-loop, and it uses fewer calls to PubChem. If there's interest I'd be happy to make changes as necessary to my fork before submitting a pull request here.
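As for the "at least notify users" option, something along these lines is what I have in mind. This is only a sketch: `SINGLE_ONLY` and `check_batchable` are hypothetical names, not existing PubChemPy code, and the set covers just the three namespaces discussed here (the PUG REST text quoted above also lists xref as single-identifier only).

```python
# Namespaces that accept only a single identifier string per request,
# per the PUG REST documentation quoted above (assumed subset).
SINGLE_ONLY = {"name", "smiles", "inchi"}


def check_batchable(identifier, namespace):
    """Raise instead of silently returning [] when a list of several
    identifiers is passed for a namespace that cannot be batched."""
    is_batch = isinstance(identifier, (list, tuple)) and len(identifier) > 1
    if is_batch and namespace in SINGLE_ONLY:
        raise ValueError(
            f"The '{namespace}' namespace accepts one identifier per request; "
            f"got {len(identifier)}. Look them up one at a time."
        )
    return identifier
```

A call such as `check_batchable(['aspirin', 'glucose'], 'name')` would raise, while `check_batchable(['5793', '2244'], 'cid')` passes the list through unchanged. Whether this should be an exception or a warning is up for discussion.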