API
Multiple Dispatch
By design, several functions have dispatches on multiple abstract types. Their docstrings are presented here for an overview, and then specific cases are duplicated in relevant sections below.
ChemistryFeaturization.encode — Function
encode(val, codec)
Encode val according to the scheme described by codec.
encode(val, ohoc::OneHotOneCold)
A flexible version of Flux.onehot that can handle both categorical and continuous-valued encoding.
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)
Encode features for atoms using the feature descriptor fd. If codec is not specified, the default codec for fd is used.
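The fallback to a default codec can be pictured with a small, self-contained sketch. Everything here (ToyCodec, ToyFeatureDescriptor, ScaleCodec, AtomicNumber, toy_encode, toy_get_value, toy_default_codec) is a hypothetical stand-in invented for illustration; it mirrors the shape of the dispatch described above, not the package's actual code.

```julia
# Toy stand-ins, NOT the package's types: all names are hypothetical.
abstract type ToyCodec end
abstract type ToyFeatureDescriptor end

struct ScaleCodec <: ToyCodec
    scale::Float64
end

struct AtomicNumber <: ToyFeatureDescriptor end

# Codec-level method: encode a raw value.
toy_encode(val, codec::ScaleCodec) = codec.scale .* val

# Descriptor-level helpers: value lookup and default codec.
toy_get_value(::AtomicNumber, atoms) = [Z for (_, Z) in atoms]
toy_default_codec(::AtomicNumber) = ScaleCodec(1.0)

# The three-argument method does the work...
toy_encode(atoms, fd::ToyFeatureDescriptor, codec::ToyCodec) =
    toy_encode(toy_get_value(fd, atoms), codec)

# ...and the two-argument method falls back to the descriptor's default codec.
toy_encode(atoms, fd::ToyFeatureDescriptor) =
    toy_encode(atoms, fd, toy_default_codec(fd))

atoms = [("H", 1), ("O", 8)]  # toy structure: (symbol, atomic number) pairs
default_result = toy_encode(atoms, AtomicNumber())
scaled_result = toy_encode(atoms, AtomicNumber(), ScaleCodec(0.5))
```

Because the methods differ in argument count or in the type of the second argument, Julia selects among them without ambiguity.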
encode(atoms, featurization)
Encode the features of atoms according to the scheme described by featurization.
ChemistryFeaturization.decode — Function
decode(encoded, codec)
Decode encoded, presuming it was encoded by codec.
decode(encoded_feature, fd::AbstractFeatureDescriptor)
Decode encoded_feature using the feature descriptor fd, presuming it was encoded via fd's default codec if codec is not specified.
decode(encoded, featurization)
Decode encoded, presuming it was encoded by featurization.
decode(featurized_atoms::FeaturizedAtoms)
Decode a FeaturizedAtoms object and return the decoded value.
ChemistryFeaturization.encodable_elements — Function
encodable_elements(fd::AbstractFeatureDescriptor)
Return a list of elemental symbols for which the feature associated with fd is defined.
encodable_elements(featurization)
Return a list of elemental symbols that are valid constituents for structures that featurization can featurize.
ChemistryFeaturization.output_shape — Function
output_shape(codec)
output_shape(codec, val)
Return the shape of the encoded output of codec (when applied to input val).
Feature Descriptors
Abstract Types
ChemistryFeaturization.AbstractFeatureDescriptor — Type
AbstractFeatureDescriptor
All feature descriptors defined for different types of features must be a subtype of AbstractFeatureDescriptor.
ChemistryFeaturization.AbstractAtomFeatureDescriptor — Type
AbstractAtomFeatureDescriptor
All feature descriptors that describe single atoms within a structure should subtype this.
ChemistryFeaturization.AbstractPairFeatureDescriptor — Type
AbstractPairFeatureDescriptor
All feature descriptors that describe pairs of atoms within a structure should subtype this.
Working with Feature Descriptors
ChemistryFeaturization.get_value — Function
get_value(fd::AbstractFeatureDescriptor, atoms)
Get the value(s) of the feature corresponding to feature descriptor fd for structure atoms. This function computes and returns the value that would actually get encoded by encode.
ChemistryFeaturization.encodable_elements — Method
encodable_elements(fd::AbstractFeatureDescriptor)
Return a list of elemental symbols for which the feature associated with fd is defined.
ChemistryFeaturization.default_codec — Function
default_codec(fd::AbstractFeatureDescriptor)
Return the default codec to use for encoding values of fd.
ChemistryFeaturization.encode — Method
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)
Encode features for atoms using the feature descriptor fd. If codec is not specified, the default codec for fd is used.
ChemistryFeaturization.decode — Method
decode(encoded_feature, fd::AbstractFeatureDescriptor)
Decode encoded_feature using the feature descriptor fd, presuming it was encoded via fd's default codec if codec is not specified.
ElementFeature submodule
The ElementFeature submodule includes a concrete implementation of an AbstractAtomFeatureDescriptor for the case of features whose values can be computed from a lookup table of elemental symbols. It also has some utility functions for defining default encoder settings.
ChemistryFeaturization.ElementFeature.ElementFeatureDescriptor — Type
ElementFeatureDescriptor
A descriptor for features associated with individual atoms that depend only upon their elemental identity (and whose values can hence be determined from a lookup table).
Fields
name::String: Name of the feature
lookup_table::DataFrame: Table containing values of the feature for every encodable element
ChemistryFeaturization.ElementFeature.fea_minmax — Function
fea_minmax(feature_name, lookup_table)
Compute the minimum and maximum possible values of an ElementFeatureDescriptor, given an (optional) lookup table.
ChemistryFeaturization.default_log — Function
default_log(min_val, max_val; threshold = 2)
default_log(possible_vals; threshold = 2)
Determine whether a OneHotOneCold codec used with a particular FeatureDescriptor should have logarithmically spaced bins.
Operates by comparing the ratio of the maximum to minimum values to a specified order-of-magnitude threshold.
default_log(feature_name, lookup_table=element_data_df; threshold_oom=2)
Determine whether an element feature should be encoded by a OneHotOneCold codec with logarithmically spaced bins.
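The order-of-magnitude comparison can be sketched as follows. This is a minimal illustration, not the package's implementation: the name sketch_default_log is hypothetical, and both the strict comparison (>) and the early return for non-positive values are assumptions.

```julia
# Hypothetical helper illustrating the ratio test; not the package's code.
function sketch_default_log(min_val, max_val; threshold = 2)
    # Assumption: a log scale is only considered for strictly positive values.
    (min_val > 0 && max_val > 0) || return false
    # Number of orders of magnitude spanned by the values.
    return log10(max_val / min_val) > threshold
end

# Convenience method taking the list of possible values instead of the bounds.
sketch_default_log(vals; threshold = 2) =
    sketch_default_log(minimum(vals), maximum(vals); threshold = threshold)
```

With the default threshold of 2, values spanning 1 to 1000 (three orders of magnitude) would be log-spaced, while values spanning 1 to 20 would not.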
ChemistryFeaturization.default_categorical — Function
default_categorical(possible_vals; threshold_length=5)
Determine if a feature should be treated as categorical or continuous-valued.
If the value type is not a number, always returns true. If it is numerical, returns true if the number of possible values is less than threshold_length and false otherwise.
default_categorical(feature_name, lookup_table=element_data_df; threshold_length=5)
Determine whether an element feature should be treated as categorical- or continuous-valued.
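The rule stated above is short enough to write out. The sketch below uses a hypothetical name, sketch_default_categorical, and counts distinct values with unique, which is an assumption about how "number of possible values" is measured.

```julia
# Hypothetical helper illustrating the categorical/continuous rule.
function sketch_default_categorical(possible_vals; threshold_length = 5)
    # Non-numeric values (e.g. strings) are always treated as categorical.
    eltype(possible_vals) <: Number || return true
    # Numeric values are categorical only when there are few distinct ones.
    return length(unique(possible_vals)) < threshold_length
end
```

For example, block labels such as "s", "p", "d", "f" come out categorical regardless of count, while a numeric feature with ten distinct values comes out continuous under the default threshold of 5.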
ChemistryFeaturization.get_bins — Function
get_bins(possible_vals; threshold_oom=2, threshold_length=5, nbins=10, logspaced, categorical)
Given a list of possible values, return a list of bins, making sensible default choices for the binning parameters.
See also: default_log, default_categorical
get_bins(feature_name, lookup_table=element_data_df; nbins=10, threshold_oom=2, threshold_length=5, logspaced, categorical)
Compute list of bins for an element feature.
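One plausible way to build such bins is sketched below. The function sketch_get_bins is hypothetical, and several choices are assumptions rather than documented behavior: categorical bins are the sorted distinct values, continuous bins are returned as nbins + 1 edges, and logspaced and categorical are plain keyword flags here rather than being derived from default_log and default_categorical.

```julia
# Hypothetical binning helper; the return conventions are assumptions.
function sketch_get_bins(possible_vals; nbins = 10, logspaced = false, categorical = false)
    if categorical
        # One bin per distinct value.
        return sort(unique(possible_vals))
    end
    lo, hi = extrema(possible_vals)
    if logspaced
        # Edges evenly spaced in log10, mapped back to the original scale.
        return 10 .^ range(log10(lo), log10(hi); length = nbins + 1)
    else
        # Edges evenly spaced on a linear scale.
        return collect(range(lo, hi; length = nbins + 1))
    end
end

linear_edges = sketch_get_bins([0.0, 10.0]; nbins = 5)
log_edges = sketch_get_bins([1.0, 100.0]; nbins = 2, logspaced = true)
category_bins = sketch_get_bins(["p", "s", "p"]; categorical = true)
```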
ChemistryFeaturization.ElementFeature.get_param_vec — Function
Little helper function to check that the logspace/categorical vector/boolean is appropriate and convert it to a vector as needed.
To see the concrete implementations of interface functions such as get_value, encodable_elements, and default_codec on ElementFeatureDescriptor objects, take a look at the source code at src/features/elementfeature.jl.
The ElementFeature module also makes use of the Data module, which provides data to populate lookup tables for a variety of commonly desired features. In particular, it provides the constants element_data_df, which will be automatically used as the lookup table for an ElementFeatureDescriptor if none is provided, and elementfeature_info, a dictionary providing information about the available features included in element_data_df.
Codecs
Codec Interface
ChemistryFeaturization.AbstractCodec — Type
AbstractCodec
All codecs defined for different encoding-decoding schemes must be a subtype of AbstractCodec.
ChemistryFeaturization.encode — Method
encode(val, codec)
Encode val according to the scheme described by codec.
ChemistryFeaturization.decode — Method
decode(encoded, codec)
Decode encoded, presuming it was encoded by codec.
ChemistryFeaturization.output_shape — Method
output_shape(codec)
output_shape(codec, val)
Return the shape of the encoded output of codec (when applied to input val).
Concrete Types
The OneHotOneCold codec is a very common encoding scheme. For categorical-valued features, it encodes each value as a bitstring whose length equals the number of possible values, composed of zeros except for a one in the slot corresponding to the value being encoded. For continuous-valued features, a binning scheme must be specified, or defaults will be chosen using the utility functions shown below.
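The two cases can be illustrated with a self-contained sketch. None of these functions belong to the package: sketch_onehot, sketch_onehot_binned, sketch_onecold, and sketch_onecold_binned are hypothetical names, and decoding a binned value to the bounds of its bin is an assumption about what a one-cold decoding would return.

```julia
# Categorical case: one slot per possible value.
function sketch_onehot(val, possible_vals)
    idx = findfirst(==(val), possible_vals)
    idx === nothing && throw(ArgumentError("value $val is not encodable"))
    return [i == idx for i in eachindex(possible_vals)]
end

# Continuous case: one slot per bin; `edges` has one more entry than there are bins.
function sketch_onehot_binned(val, edges)
    nbins = length(edges) - 1
    (edges[1] <= val <= edges[end]) || throw(ArgumentError("value $val is outside the bins"))
    # Index of the last edge <= val, clamped so the top edge lands in the last bin.
    idx = clamp(searchsortedlast(edges, val), 1, nbins)
    return [i == idx for i in 1:nbins]
end

# Decoding finds the "hot" slot and reports the category, or the bounds of the bin.
sketch_onecold(encoded, possible_vals) = possible_vals[findfirst(encoded)]
sketch_onecold_binned(encoded, edges) =
    (edges[findfirst(encoded)], edges[findfirst(encoded) + 1])

blocks = ["s", "p", "d", "f"]
encoded_block = sketch_onehot("p", blocks)
edges = [0.0, 1.0, 2.0, 3.0]
encoded_value = sketch_onehot_binned(2.5, edges)
```

Note that decoding a binned value recovers only the bin it fell into, not the original number, so the continuous case is lossy by construction.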
ChemistryFeaturization.OneHotOneCold — Type
OneHotOneCold(categorical, bins)
AbstractCodec type which uses a dummy variable (as defined in statistical literature), i.e., which employs one-hot encoding and a one-cold decoding scheme.
ChemistryFeaturization.DirectCodec — Type
DirectCodec
Codec type whose encoding function is some constant * the identity function.
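A "constant times identity" codec amounts to a scale factor. The sketch below uses a hypothetical SketchDirectCodec type with an assumed scale field, and assumes decoding simply divides the factor back out; the real DirectCodec's field names and decode behavior are not shown in this page.

```julia
# Hypothetical scaling codec; the field name and decode rule are assumptions.
struct SketchDirectCodec
    scale::Float64
end

# Encoding multiplies by the constant; decoding undoes it.
sketch_encode(val, codec::SketchDirectCodec) = codec.scale .* val
sketch_decode(encoded, codec::SketchDirectCodec) = encoded ./ codec.scale

codec = SketchDirectCodec(0.1)
raw = [12.0, 16.0]
roundtrip = sketch_decode(sketch_encode(raw, codec), codec)
```

Unlike a binned one-hot scheme, this kind of codec is invertible up to floating-point rounding.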
Featurizations
ChemistryFeaturization.AbstractFeaturization — Type
AbstractFeaturization
A featurization stores a set of FeatureDescriptors and associated Codecs and defines how to combine the encoded values into whatever format is required to feed into a model.
ChemistryFeaturization.features — Function
features(featurization::AbstractFeaturization)
Return the list of feature descriptors used by featurization.
ChemistryFeaturization.encodable_elements — Method
encodable_elements(featurization)
Return a list of elemental symbols that are valid constituents for structures that featurization can featurize.
ChemistryFeaturization.encode — Method
encode(atoms, featurization)
Encode the features of atoms according to the scheme described by featurization.
ChemistryFeaturization.decode — Method
decode(encoded, featurization)
Decode encoded, presuming it was encoded by featurization.
FeaturizedAtoms objects
ChemistryFeaturization.FeaturizedAtoms — Type
FeaturizedAtoms
Container object for an atomic structure object, a featurization, and the resulting encoded_features from applying the featurization to the atoms.
Fields
atoms: Object to be featurized
featurization: Featurization scheme meant to be used for featurizing atoms
encoded_features: The result of featurizing atoms using featurization
encoded_features will NOT change for a given atoms-featurization pair.
ChemistryFeaturization.featurize — Function
featurize(atoms, featurization::AbstractFeaturization)
Featurize an atoms object using a featurization and return the FeaturizedAtoms object created.
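The relationship between featurize and the container can be sketched with an immutable struct that stores the three pieces together. This is an illustration under stated assumptions, not the package's definition: SketchFeaturizedAtoms and sketch_featurize are hypothetical names, and treating the featurization as a plain callable is a simplification made only for this example.

```julia
# Hypothetical container; immutable, so the stored encoding cannot be reassigned.
struct SketchFeaturizedAtoms{A,F,E}
    atoms::A
    featurization::F
    encoded_features::E
end

# Assumption for this sketch: a "featurization" is any callable applied to atoms.
sketch_featurize(atoms, featurization) =
    SketchFeaturizedAtoms(atoms, featurization, featurization(atoms))

toy_atoms = [1, 8, 1]                         # toy structure: atomic numbers
toy_featurization = zs -> Float64.(zs) ./ 10  # toy scheme: scaled atomic numbers
fa = sketch_featurize(toy_atoms, toy_featurization)
```

Computing the encoding once at construction and keeping it alongside the atoms and the featurization is one way to express the guarantee that the stored encoding always corresponds to that particular pair.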
ChemistryFeaturization.decode — Method
decode(featurized_atoms::FeaturizedAtoms)
Decode a FeaturizedAtoms object and return the decoded value.