Re: [h5md-user] [EXTERNAL] Re: topology
Hart, David Blaine
Fri, 2 May 2014 18:43:10 +0000
Hi Pierre and Konrad,
> > Before discussing the technicalities, please define the scope of what
> > you call "topology". Which kinds of molecular models do you wish to
> > cover? Which categories of systems do you want to handle? And what are
> > the use cases for the information you plan to store?
> The scope is not defined yet. I would like to discuss the needs and
> experiences before we can decide something. There might be no universal
> solution, a case in which there could be different topology modules.
> I described my use case: store list of indices that represent bonded
> interaction in coarse-grained simulations. I don't expect my use case to be
> generic :-)
The group I work with does a fair amount of analysis on bond lengths, angle
distributions and dihedral angle distributions as validation for force field
development. So our main use case for a 'topology' module would be to store the
pairs, 3- and 4-tuples of atoms that define the bonds, angles and dihedrals.
That said, I have use cases for wanting to store lists of non-bonded pairs of
atoms such as opposing carbons on a ring or designated 'endpoint' atoms that
can be used to represent the overall orientation of a larger molecule, and in
this use case a specific 'bond' list would not really be appropriate.
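
The kind of analysis described above can be sketched roughly as follows. This is only a minimal illustration, assuming 0-based particle indices, a single frame of Cartesian positions, and no periodic boundary handling; the names `positions`, `bonds`, `angles`, `distance` and `angle_deg` are made up for the example and are not part of H5MD.

```python
import math

def distance(a, b):
    """Euclidean distance between two points."""
    return math.dist(a, b)

def angle_deg(a, b, c):
    """Angle at vertex b (between b->a and b->c), in degrees."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical data: one frame with three particles.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
bonds = [(0, 1), (1, 2)]    # pairs of particle indices
angles = [(0, 1, 2)]        # 3-tuples, middle index is the vertex

bond_lengths = [distance(positions[i], positions[j]) for i, j in bonds]
bond_angles = [angle_deg(positions[i], positions[j], positions[k])
               for i, j, k in angles]
```

The same `distance` routine works unchanged on a list of non-bonded pairs (ring carbons, 'endpoint' atoms), which is part of why a generic pair list is attractive.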
> > > Depending on the direction of the discussion this could become either a module
> > > or a part of the specification itself.
> > Unless we can come up with something good enough for all kinds of
> > particle-based simulations (which I doubt), it's better to make it a
> > module in order to allow for alternatives. One of the nice aspects of H5MD
> > is its generality.
> No problem.
> > > 2. Within the groups, H5MD elements store bonds as [N_bonds][bond_order] data.
> > > For pairs, bond_order=2, for instance. This allows to store angles and
> > Please note that the term "bond order" is already used in chemistry
> > for something different: the number of electrons implied in a covalent
> > bond. A chemist would take bond_order=2 for a double bond.
> Noted. I have no specific name in mind for this, though.
> > I have spent a lot of time thinking about these issues for MOSAIC, and
> > a part of the background behind the decisions that led to MOSAIC 1.0
> > is described in the paper (free download at
> > http://pubs.acs.org/articlesonrequest/AOR-dADBta6jVTVtVb6bbGmJ, but
> > you need to create an ACS account for that). Note that the scope of
> > MOSAIC is different from H5MD, so the considerations to apply are not
> > the same, but there are many common points nevertheless.
> > One of the most important lessons from MOSAIC design, which I think
> > carries over to H5MD, is the need for both generic data structures and
> > precisely defined data items. For the example of chemical bonds, that
> > would mean a generic data structure for storing pairs (or even
> > N-tuples) of particle indices, with some way of attaching semantic
> > information such as a text label. A bond list would then be stored as
> > a list of pairs with the "bonds" label.
> > If you provide only the generic data structure for pairs, then
> > everyone will come up with a different label for bonds, creating chaos
> > without any real gain in flexibility. If you provide only a bond list
> > but not a generic pair list, people will abuse the bond list for other
> > pair-related applications. The history of the PDB format provides lots
> > of examples of such abuses due to a lack of flexibility.
I am totally guilty of abusing the PDB (and CAR and MDF) formats to sneak in
extra information. :-)
> > H5MD actually applies this principle very well until now, so let's
> > keep that spirit for defining additional data items.
> Keeping all of that in mind, it would be beneficial to have an agreed-upon
> "storage scheme" (such as "[N_bonds][bond_order]" from my message) for
> connectivity-related modules, to avoid duplicating the work.
I would definitely agree that a standard storage scheme would be useful, even
if the structure of a 'topology' section varies by module. That way, even if
the location of the lists varies, at least the same routines can be used to
read/write the data once it is located. As a possible example:
+-- type: String
+-- dimension: Integer
\-- values: Integer[n-tuples][D]
Where a pair list would have dimension (D) = 2, and the values in the list would
be the particle IDs. Given what Konrad pointed out about flexibility vs.
specificity, the "type" string could have a specific list of acceptable values,
much like the boundary attribute in the 'box' item. For my particular use
cases, I can see the following types of atom-tuple lists being useful as
topology/connectivity information: bonds, angles, dihedrals, chains, and
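
To make the proposed type/dimension/values layout a bit more concrete, here is a minimal in-memory sketch of it. It is only an illustration under stated assumptions: the class name `TupleList` and the set `ALLOWED_TYPES` are hypothetical, the accepted labels are just the ones named in this message (not an agreed-upon list), and nothing here is an actual H5MD or HDF5 API.

```python
from dataclasses import dataclass

# Assumed set of accepted labels, taken only from the names mentioned above.
ALLOWED_TYPES = {"bonds", "angles", "dihedrals", "chains"}

@dataclass
class TupleList:
    type: str          # semantic label, restricted to ALLOWED_TYPES
    dimension: int     # D: number of particle IDs per tuple
    values: list       # n tuples, each holding D particle IDs

    def __post_init__(self):
        # Reject labels outside the agreed list, as with 'boundary' in 'box'.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown tuple-list type: {self.type!r}")
        # Every row must hold exactly D particle IDs.
        for row in self.values:
            if len(row) != self.dimension:
                raise ValueError(
                    f"expected {self.dimension} IDs per tuple, got {len(row)}")

# A pair list (D = 2) and a 4-tuple list (D = 4) share the same structure.
bonds = TupleList("bonds", 2, [(0, 1), (1, 2)])
dihedrals = TupleList("dihedrals", 4, [(0, 1, 2, 3)])
```

Because every list has the same three items, one reader/writer routine can serve all of them regardless of where a given module places the list.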