Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify types #23

Merged
merged 22 commits into from
Nov 17, 2021
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
c7e8097
first, just get rid of AbstractElement
rkurchin Nov 10, 2021
a615a79
oops, deleted export of AbstractParticle and AbstractAtom in first co…
rkurchin Nov 10, 2021
fede982
remove AbstractParticle, keep SimpleAtom for now
rkurchin Nov 10, 2021
7a32c77
and now AbstractSystem is gone too
rkurchin Nov 10, 2021
56eea33
Revert "and now AbstractSystem is gone too"
rkurchin Nov 12, 2021
ce9ac5b
remove export of ChemicalElement
rkurchin Nov 12, 2021
87d9c23
remove duplicated iterate dispatch
rkurchin Nov 12, 2021
bfcff94
rename SimpleAtom to StaticAtom, delete phantom implementation_simple…
rkurchin Nov 12, 2021
47ae467
rename element function to species, add "singular" species function, …
rkurchin Nov 12, 2021
95c1997
fix iterate
rkurchin Nov 12, 2021
b9bc083
readme updates, a couple other small fixes
rkurchin Nov 12, 2021
8e17ad8
printing tweaks
rkurchin Nov 12, 2021
4aca49f
Michael's suggested change for typing on box
rkurchin Nov 12, 2021
9f9743f
move atomic stuff to another file, fix some broken functions
rkurchin Nov 12, 2021
13db820
rename get_periodic, add tests
rkurchin Nov 12, 2021
89a68ba
add docstrings
rkurchin Nov 12, 2021
9f6d7bf
formatting
rkurchin Nov 12, 2021
9a01ba9
few tweaks per G's suggestions
rkurchin Nov 16, 2021
54a0a8d
mostly aesthetic changes per Michael's suggestions
rkurchin Nov 17, 2021
456a4fe
add indexed versions of position, velocity, species fcns
rkurchin Nov 17, 2021
0cb1b23
add tests, fix typo in SoA implementation
rkurchin Nov 17, 2021
f0896f8
remove getindex and length error throws, leaving boundary_conditions …
rkurchin Nov 17, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 31 additions & 28 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,46 +12,49 @@ AtomsBase is an abstract interface for representation of atomic geometries in Ju
* file I/O with standard formats (.cif, .xyz, ...)
* numerical tools: sampling, integration schemes, etc.
* automatic differentiation and machine learning systems
* visualization (e.g. plot recipes)

Currently, the design philosophy is to be as lightweight as possible, with only a small set of required function dispatches. We will also provide a couple of standard concrete implementations of the interface that we envision could be broadly applicable, but encourage developers to provide their own implementations as needed in new or existing packages.
Currently, the design philosophy is to be as lightweight as possible, with only a small set of required function dispatches to make adopting the interface into existing packages easy. We also provide a couple of standard concrete implementations of the interface that we envision could be broadly applicable, but encourage developers to provide their own implementations as needed in new or existing packages.

## Overview
AtomsBase defines a few abstract types used for specifying an atomic system. We will describe them briefly here, from the "top down." Users and/or prospective developers may also find `implementation_simple.jl` a useful reference for a simple concrete implementation of the interface.
The main abstract type introduced in AtomsBase `AbstractSystem{D,S}`. The `D` parameter indicates the number of spatial dimensions in the system, and `S` indicates the type identifying an individual species in this system.

**A remark on SoA vs. AoS:** The "struct-of-arrays" (SoA) vs. "array-of-structs" (AoS) is a common design dilemma in representations of systems such as these. We have deliberately designed this interface to be _agnostic_ to how a concrete implementation chooses to structure its data. Some specific notes regarding how implementations might differ for these two paradigms are included below.
The main power of the interface comes from predictable behavior of several core functions to access information about a system. Various categories of such functions are described below.

A way to think about this broadly is that the difference amounts to the ordering of function calls. For example, to get the position of a single particle in an AoS implementation, the explicit funciton chaining would be `position(getindex(sys))` (i.e. extract the single struct representing the particle of interest and query its position), while for SoA, one should prefer `getindex(position(sys))` (extract the array of positions, then index into it for a single particle). The beauty of an abstract interface in Julia is that these details can be, in large part, abstracted away through method dispatch such that the end user sees the same expected behavior irrespective of how things are implemented "under the hood."

### System
An object describing a system should be a subtype of `AbstractSystem` and will in general store identifiers, positions, and (if relevant) velocities of the particles that make it up, as well as a set of coordinate bounds.

An `AbstractSystem` takes several type parameters: `D` (an integer representing the dimensionality of the system), `ET<:AbstractElement`, and `AT<:AbstractParticle` (see below) to indicate what types of particles these are, and requires dispatch of the following functions:
* `bounding_box(::AbstractSystem{D})::SVector{D, SVector{D, <:Unitful.Length}}`: should return a set of basis vectors describing the boundaries of the coordinates in which the system resides
* `boundary_conditions(::AbstractSystem{D})::SVector{D, BoundaryCondition}`: returns the boundary conditions corresponding to each spatial dimension of the system (see below for more on the `BoundaryCondition` type)
### System geometry
Functions that need to be dispatched:
* `bounding_box(::AbstractSystem{D})::SVector{D,SVector{D,<:Unitful.Length}}`: returns `D` vectors of length `D` that describe the "box" in which the system lives
* `boundary_conditions(::AbstractSystem{D})::SVector{D,BoundaryCondition})`: returns a vector of length `D` of `BoundaryCondition` objects to describe what happens at the edges of the box

`AbstractSystem` implements [Julia's indexing interface](https://docs.julialang.org/en/v1/manual/interfaces/#Indexing), thus the following functions must also be dispatched:
* `Base.getindex(::AbstractSystem, ::Int)`
* `Base.size(::AbstractSystem)`
Functions that will work automatically:
* `get_periodic`: returns a vector of length `D` of `Bool`s for whether each dimension of the system has periodic boundary conditions
* `n_dimensions`: returns `D`, the number of spatial dimensions of the system

By default, the `position` and `velocity` functions dispatch onto `AbstractSystem` objects as a broadcast over the system (e.g. `position(sys::AbstractSystem) = position.(sys)`, invoking the dispatch of `position` onto the `AbstractParticle` type parameter of the system). In an SoA implementation, custom dispatches should probably be included to avoid the construction of the particle objects.
### Iteration and Indexing over systems
There is a presumption of the ability to somehow extract an individual component (e.g. a single atom or molecule) of this system, though there are no constraints on the type of this component. To achieve this, an `AbstractSystem` object is expected to implement the Julia interfaces for [iteration](https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-iteration) and [indexing](https://docs.julialang.org/en/v1/manual/interfaces/#Indexing) in order to access representations of individual components of the system. Some default dispatches of parts of these interfaces are already included, so the minimal set of functions to dispatch in a concrete implementation is `Base.getindex` and `Base.length`, though it may be desirable to customize additional behavior depending on context.

**A remark on fractional coordinates:** In many contexts, it is desirable to work with fractional coordinates, i.e. unitless multiples of some reference coordinate system (typically this reference is what would be returned by the `box` function above). Because a focus of this interface is interoperability and unitless, non-Cartesian coordinates introduce ambiguity, we've chosen to impose that coordinates returned by functions in the interface be unitful quantities in a Cartesian frame. Of course, a concrete implementation of the interface could (and likely would need to) also dispatch additional functions for working with fractional coordinates.
### System state and properties
The only required properties to be specified of the system is the species of each component of the system and the positions and velocities associated with each component. These are accessed through the functions `species`, `position`, and `velocity`, respectively. The default dispatch of these functions onto an `AbstractSystem` object is as a broadcast over it, which will "just work" provided the indexing/iteration interfaces have been implemented (see above) and the functions are defined on individual system components.
rkurchin marked this conversation as resolved.
Show resolved Hide resolved

### Particles
Particle objects are subtypes of `AbstractParticle`, and also take a type parameter that is `<:AbstractElement` (see below) to indicate how particles are identified.
As a concrete example, AtomsBase provides the `StaticAtom` type as this is anticipated to be a commonly needed representation. Its implementation looks as follows:
```julia
struct StaticAtom{D,L<:Unitful.Length}
position::SVector{D,L}
element::Element
end
StaticAtom(position, element) = StaticAtom{length(position)}(position, element)
position(atom::StaticAtom) = atom.position
species(atom::StaticAtom) = atom.element
```
Note that the default behavior of `velocity` is to return `missing`, so it doesn't need to be explicitly dispatched here.

In the SoA case, particle objects would only ever be explicitly constructed when `getindex` is invoked on the system, as a "view" into the system.
The two sample implementations provided in this repo are both "composed" of `StaticAtom` objects; refer to them as well as `sandbox/aos_vs_soa.jl` to see how this can work in practice.
### Struct-of-Arrays vs. Array-of-Structs
The "struct-of-arrays" (SoA) vs. "array-of-structs" (AoS) is a common design dilemma in representations of systems such as these. We have deliberately designed this interface to be _agnostic_ to how a concrete implementation chooses to structure its data. Some specific notes regarding how implementations might differ for these two paradigms are included below.

Particle objects should dispatch methods of the following functions:
* `position(::AbstractParticle)::AbstractVector{<: Unitful.Length}`
* `element(::AbstractParticle)::AbstractElement`

And, optionally,
* `velocity(::AbstractParticle)::AbstractVector{<: Unitful.Velocity}`, which defaults to returning `nothing` if not dispatched.
### Elements
Subtypes of `AbstractElement` serve as identifiers of particles. As the name suggests, a common case would be a chemical element (e.g. for a DFT simulation). However, it could also contain more detailed isotopic/spin information if that is necessary, or be a molecule (e.g. in MD), or even a much larger-scale object!
A way to think about this broadly is that the difference amounts to the ordering of function calls. For example, to get the position of a single particle in an AoS implementation, the explicit funciton chaining would be `position(getindex(sys))` (i.e. extract the single struct representing the particle of interest and query its position), while for SoA, one should prefer `getindex(position(sys))` (extract the array of positions, then index into it for a single particle). The beauty of an abstract interface in Julia is that these details can be, in large part, abstracted away through method dispatch such that the end user sees the same expected behavior irrespective of how things are implemented "under the hood."

For simulation purposes, the utility of this object would be to serve as a sufficiently specific "index" into a database of simulation parameters (e.g. a pseudopotential library for DFT, or interparticle potential parameters for MD). Because we envision a chemical element being an extremely common case, we provide an explicit subtype in the form of `Element`, which makes use of [PeriodicTable.jl](https://github.com/JuliaPhysics/PeriodicTable.jl) to access information such as atomic numbers and masses.
To demonstrate this, we provide two simple concrete implementations of the interface in `implementation_soa.jl` and `implementation_aos.jl` to show how analogous systems could be constructed within these paradigms. See also `sandbox/aos_vs_soa.jl` for how they can actually be constructed and queried.

### Boundary Conditions
Finally, we include support for defining boundary conditions. Currently included are `Periodic` and `DirichletZero`. There should be one boundary condition specified for each spatial dimension represented.
9 changes: 5 additions & 4 deletions sandbox/aos_vs_soa.jl
Original file line number Diff line number Diff line change
@@ -1,17 +1,18 @@
using AtomsBase
using StaticArrays
using Unitful
using PeriodicTable

box = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]u"m"
bcs = [Periodic(), Periodic(), Periodic()]
positions = [0.25 0.25 0.25; 0.75 0.75 0.75]u"m"
elements = [ChemicalElement(:C), ChemicalElement(:C)]
elems = [elements[:C], elements[:C]]
rkurchin marked this conversation as resolved.
Show resolved Hide resolved

atom1 = SimpleAtom(SVector{3}(positions[1,:]),elements[1])
atom2 = SimpleAtom(SVector{3}(positions[2,:]),elements[2])
atom1 = StaticAtom(SVector{3}(positions[1,:]),elems[1])
atom2 = StaticAtom(SVector{3}(positions[2,:]),elems[2])

aos = FlexibleSystem(box, bcs, [atom1, atom2])
soa = FastSystem(box, bcs, positions, elements)
soa = FastSystem(box, bcs, positions, elems)

# And now we can ask questions like...
soa .== aos
3 changes: 1 addition & 2 deletions src/AtomsBase.jl
Original file line number Diff line number Diff line change
@@ -1,9 +1,8 @@
module AtomsBase

include("interface.jl")
include("atoms.jl")
include("implementation_soa.jl")
include("implementation_aos.jl")

# Write your package code here.

end
38 changes: 38 additions & 0 deletions src/atoms.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
#
# Extra stuff only for Systems composed of atoms
#

export StaticAtom, AbstractAtomicSystem
export atomic_mass,
atomic_number,
atomic_symbol
rkurchin marked this conversation as resolved.
Show resolved Hide resolved
export atomic_property, has_atomic_property, atomic_propertynames

struct StaticAtom{D,L<:Unitful.Length}
position::SVector{D,L}
element::Element
end
StaticAtom(position, element) = StaticAtom{length(position)}(position, element)
position(atom::StaticAtom) = atom.position
species(atom::StaticAtom) = atom.element

function StaticAtom(position, symbol::Union{Integer,AbstractString,Symbol,AbstractVector})
StaticAtom(position, elements[symbol])
end

function Base.show(io::IO, a::StaticAtom)
print(io, "StaticAtom: $(a.element.symbol)")
end

const AbstractAtomicSystem{D} = AbstractSystem{D,Element}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have time to look for the definition of AbstractSystem, so I'll add this comment here: I can sort of see why one might want to make the dimension a type parameter, though I'm not entirely sure where this is needed?

But does the element type really have to be a type parameter in the abstract type? What is the use case?


atomic_symbol(a::StaticAtom) = a.element.symbol
atomic_mass(a::StaticAtom) = a.element.atomic_mass
atomic_number(a::StaticAtom) = a.element.number
atomic_property(a::StaticAtom, property::Symbol) = getproperty(a.element, property)

atomic_symbol(sys::AbstractAtomicSystem) = atomic_symbol.(sys)
atomic_number(sys::AbstractAtomicSystem) = atomic_number.(sys)
atomic_mass(sys::AbstractAtomicSystem) = atomic_mass.(sys)
atomic_property(sys::AbstractAtomicSystem, property::Symbol)::Vector{Any} =
atomic_property.(sys, property)
32 changes: 24 additions & 8 deletions src/implementation_aos.jl
Original file line number Diff line number Diff line change
@@ -1,32 +1,48 @@
# Example implementation using as few function definitions as possible
#
#=
Implementation of AtomsBase interface in an array-of-structs
=#
rkurchin marked this conversation as resolved.
Show resolved Hide resolved

using StaticArrays

export FlexibleSystem

# TODO Switch order of type arguments?
struct FlexibleSystem{D,ET<:AbstractElement,AT<:AbstractParticle{ET}} <: AbstractSystem{D,ET,AT}
box::SVector{D,<:SVector{D,<:Unitful.Length}}
struct FlexibleSystem{D,S,L<:Unitful.Length,AT} <: AbstractSystem{D,S}
box::SVector{D,<:SVector{D,L}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"morally" it seems to me that box and boundary_conditions should be tuples of ... rather than SVectors. But I don't have too strong a view on that.

boundary_conditions::SVector{D,<:BoundaryCondition}
particles::Vector{AT}
FlexibleSystem(box, boundary_conditions, particles) = new{
length(boundary_conditions),
eltype(elements),
eltype(eltype(box)),
eltype(particles),
}(
box,
boundary_conditions,
particles,
)
end
# convenience constructor where we don't have to preconstruct all the static stuff...
function FlexibleSystem(
box::Vector{Vector{L}},
box::Vector{<:AbstractVector{L}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why type this? the box could be provided as a Vector, AbstractVector, Tuple, .... same with bc and particles below. This is a convenience constructor no? I would then just make it convenient to construct the thing and enforce such strict typing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that would probably be better. I was having trouble figuring out how to write it in a general enough way...will file an issue for this too.

Copy link
Member

@cortner cortner Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I did it in JuLIP was to have a conversion interface e.g.

_convert_positions(X::Vector) = X 
_convert_positions(X::AbstractVector) = collect(X) 
_convert_positions(X::AbstractMatrix) = # convert into vector of SVectors 

_convert_bc(bc::NTuple{3, Bool}) = bc 
_convert_bc(bc::AbstractVector) = tuple(Bool.(bc)...)

and so forth. So this means that ANY sensible input will be automatically converted into the format needed internally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will repost in the issue.

boundary_conditions::Vector{BC},
particles::Vector{AT},
) where {BC<:BoundaryCondition,L<:Unitful.Length,AT<:AbstractParticle}
) where {BC<:BoundaryCondition,L<:Unitful.Length,AT}
D = length(box)
if !all(length.(box) .== D)
throw(ArgumentError("box must have D vectors of length D"))
end
FlexibleSystem(SVector{D,SVector{D,L}}(box), SVector{D,BoundaryCondition}(boundary_conditions), particles)
FlexibleSystem(
SVector{D,SVector{D,L}}(box),
SVector{D,BoundaryCondition}(boundary_conditions),
particles,
)
end

bounding_box(sys::FlexibleSystem) = sys.box
boundary_conditions(sys::FlexibleSystem) = sys.boundary_conditions

function Base.show(io::IO, ::MIME"text/plain", sys::FlexibleSystem)
function Base.show(io::IO, sys::FlexibleSystem)
print(io, "FlexibleSystem with ", length(sys), " particles")
end

Expand Down
40 changes: 0 additions & 40 deletions src/implementation_simple.jl

This file was deleted.

32 changes: 18 additions & 14 deletions src/implementation_soa.jl
Original file line number Diff line number Diff line change
@@ -1,18 +1,19 @@
# Example implementation using as few function definitions as possible
#
#=
Implementation of AtomsBase interface in an struct-of-arrays style.
=#
rkurchin marked this conversation as resolved.
Show resolved Hide resolved

using StaticArrays

export FastSystem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the FlexibleSystem and FastSystem terms a bit odd. Isn't it a bit context specific when they are fast / slow / convenient / inconvenient? Why not simply call them SystemSOA and SystemAOS then it is clear what they are. There could be a convenience constructor that can construct either, depending on a flag, with default probable being the AOS but I'm not sure about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 hahaha that's what they were originally called and I changed them at @mfherbst's suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah - sorry, that's what I get for not paying attention. Just don't have the time I'm afraid. I can't even quite put my finger on why I don't like those terms. I guess I prefer technically precise terms? Especially with an interface function a user will never have to worry about it until they know what they are doing and then they'll just go and read the docs?


struct FastSystem{D,ET<:AbstractElement,AT<:AbstractParticle{ET},L<:Unitful.Length} <:
AbstractSystem{D,ET,AT}
struct FastSystem{D,S,L<:Unitful.Length} <: AbstractSystem{D,S}
box::SVector{D,SVector{D,L}}
boundary_conditions::SVector{D,BoundaryCondition}
positions::Vector{SVector{D,L}}
elements::Vector{ET}
elements::Vector{S}
# janky inner constructor that we need for some reason
FastSystem(box, boundary_conditions, positions, elements) =
new{length(boundary_conditions),eltype(elements),SimpleAtom,eltype(eltype(positions))}(
new{length(boundary_conditions),eltype(elements),eltype(eltype(positions))}(
box,
boundary_conditions,
positions,
Expand All @@ -22,11 +23,11 @@ end

# convenience constructor where we don't have to preconstruct all the static stuff...
function FastSystem(
box::AbstractVector{Vector{L}},
box::Vector{<:AbstractVector{L}},
boundary_conditions::AbstractVector{BC},
positions::AbstractMatrix{M},
elements::AbstractVector{ET},
) where {L<:Unitful.Length,BC<:BoundaryCondition,M<:Unitful.Length,ET<:AbstractElement}
elements::AbstractVector{S},
) where {L<:Unitful.Length,BC<:BoundaryCondition,M<:Unitful.Length,S}
N = length(elements)
D = length(box)
if !all(length.(box) .== D)
Expand All @@ -46,16 +47,19 @@ function FastSystem(
)
end

function Base.show(io::IO, ::MIME"text/plain", sys::FastSystem)
function Base.show(io::IO, sys::FastSystem)
print(io, "FastSystem with ", length(sys), " particles")
end

bounding_box(sys::FastSystem) = sys.box
boundary_conditions(sys::FastSystem) = sys.boundary_conditions

# Base.size(sys::FastSystem) = size(sys.particles)
Base.length(sys::FastSystem{D,ET,AT}) where {D,ET,AT} = length(sys.elements)
Base.length(sys::FastSystem{D,S}) where {D,S} = length(sys.elements)

Base.getindex(sys::FastSystem{D,S,L}, i::Int) where {D,S,L} =
StaticAtom{D,L}(sys.positions[i], sys.elements[i])

# first piece of trickiness: can't do a totally abstract dispatch here because we need to know the signature of the constructor for AT
Base.getindex(sys::FastSystem{D,ET,SimpleAtom}, i::Int) where {D,ET} =
SimpleAtom{D}(sys.positions[i], sys.elements[i])
# these dispatches aren't strictly necessary, but they make these functions ~2x faster
position(s::FastSystem) = s.positions
species(s::FastSystem) = s.elements
Loading