API
Multiple Dispatch
By design, several functions dispatch on multiple abstract types. Their docstrings are collected here for an overview, and the specific methods are repeated in the relevant sections below.
ChemistryFeaturization.encode (Function)
encode(val, codec)
Encode val according to the scheme described by codec.
encode(val, ohoc::OneHotOneCold)
A flexible version of Flux.onehot that can handle both categorical and continuous-valued encoding.
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)
Encode features for atoms using the feature descriptor fd. If codec is not specified, the default codec for fd is used.
encode(atoms, featurization)
Encode the features of atoms according to the scheme described by featurization.
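The encode methods above differ only in the types of their arguments, and Julia's multiple dispatch picks the matching one. The following is a minimal, self-contained illustration of that pattern, not the package's implementation. ToyCodec, ToyDescriptor, ScaleCodec, ElementLookup, toy_default_codec and toy_encode are hypothetical names, and a "structure" is simplified to a vector of element symbols.

```julia
# Hypothetical stand-ins for AbstractCodec and AbstractFeatureDescriptor.
abstract type ToyCodec end
abstract type ToyDescriptor end

# A trivial codec that multiplies values by a constant.
struct ScaleCodec <: ToyCodec
    scale::Float64
end

# A descriptor that looks up one value per element symbol.
struct ElementLookup <: ToyDescriptor
    table::Dict{String,Float64}
end

# Hypothetical analogue of default_codec.
toy_default_codec(::ElementLookup) = ScaleCodec(1.0)

# Method 1: encode a raw value with a codec.
toy_encode(val, codec::ScaleCodec) = codec.scale .* val

# Method 2: encode a structure with a descriptor and an explicit codec.
toy_encode(atoms::Vector{String}, fd::ElementLookup, codec::ToyCodec) =
    toy_encode([fd.table[el] for el in atoms], codec)

# Method 3: same as method 2, but fall back to the descriptor's default codec.
toy_encode(atoms::Vector{String}, fd::ElementLookup) =
    toy_encode(atoms, fd, toy_default_codec(fd))

fd = ElementLookup(Dict("H" => 1.0, "O" => 8.0))
println(toy_encode(["H", "O", "H"], fd))                   # [1.0, 8.0, 1.0]
println(toy_encode(["H", "O", "H"], fd, ScaleCodec(0.5)))  # [0.5, 4.0, 0.5]
```

Because each method is selected by argument types, new codecs or descriptors can be added as new methods without touching existing ones.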
ChemistryFeaturization.decode (Function)
decode(encoded, codec)
Decode encoded, presuming it was encoded by codec.
decode(encoded_feature, fd::AbstractFeatureDescriptor)
Decode encoded_feature using the feature descriptor fd, presuming it was encoded with fd's default codec unless a codec is specified.
decode(encoded, featurization)
Decode encoded, presuming it was encoded by featurization.
decode(featurized_atoms::FeaturizedAtoms)
Decode a FeaturizedAtoms object and return the decoded value.
ChemistryFeaturization.encodable_elements (Function)
encodable_elements(fd::AbstractFeatureDescriptor)
Return a list of elemental symbols for which the feature associated with fd is defined.
encodable_elements(featurization)
Return a list of elemental symbols that are valid constituents for structures that featurization can featurize.
ChemistryFeaturization.output_shape (Function)
output_shape(codec)
output_shape(codec, val)
Return the shape of the encoded output of codec (when applied to input val).
Feature Descriptors
Abstract Types
ChemistryFeaturization.AbstractFeatureDescriptor (Type)
AbstractFeatureDescriptor
All feature descriptors defined for different types of features must be a subtype of AbstractFeatureDescriptor.
ChemistryFeaturization.AbstractAtomFeatureDescriptor (Type)
AbstractAtomFeatureDescriptor
All feature descriptors that describe single atoms within a structure should subtype this.
ChemistryFeaturization.AbstractPairFeatureDescriptor (Type)
AbstractPairFeatureDescriptor
All feature descriptors that describe pairs of atoms within a structure should subtype this.
Working with Feature Descriptors
ChemistryFeaturization.get_value (Function)
get_value(fd::AbstractFeatureDescriptor, atoms)
Get the value(s) of the feature corresponding to feature descriptor fd for structure atoms. This function computes and returns the value that encode would actually encode.
ChemistryFeaturization.encodable_elements (Method)
encodable_elements(fd::AbstractFeatureDescriptor)
Return a list of elemental symbols for which the feature associated with fd is defined.
ChemistryFeaturization.default_codec (Function)
default_codec(fd::AbstractFeatureDescriptor)
Return the default codec to use for encoding values of fd.
ChemistryFeaturization.encode (Method)
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)
Encode features for atoms using the feature descriptor fd. If codec is not specified, the default codec for fd is used.
ChemistryFeaturization.decode (Method)
decode(encoded_feature, fd::AbstractFeatureDescriptor)
Decode encoded_feature using the feature descriptor fd, presuming it was encoded with fd's default codec unless a codec is specified.
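One way to read the interface above is that the descriptor-level encode and decode are thin wrappers: get_value produces raw values, and a codec (the default one unless another is passed) turns them into an encoding and back. The sketch below illustrates that reading under stated assumptions. It is not taken from the package source, and SketchScale, SketchElementFD, sketch_get_value, sketch_default_codec, sketch_encode and sketch_decode are hypothetical names with made-up lookup values.

```julia
# Hypothetical codec: multiply on encode, divide on decode.
struct SketchScale
    scale::Float64
end
sketch_encode(val, c::SketchScale) = c.scale .* val
sketch_decode(enc, c::SketchScale) = enc ./ c.scale

# Hypothetical element-level descriptor backed by a Dict lookup table.
struct SketchElementFD
    name::String
    table::Dict{String,Float64}
end

# Analogue of get_value: the raw values that encoding would consume.
sketch_get_value(fd::SketchElementFD, atoms::Vector{String}) = [fd.table[el] for el in atoms]

# Analogue of default_codec.
sketch_default_codec(::SketchElementFD) = SketchScale(0.5)

# Descriptor-level encode/decode; `codec` falls back to the default codec.
sketch_encode(atoms::Vector{String}, fd::SketchElementFD, codec = sketch_default_codec(fd)) =
    sketch_encode(sketch_get_value(fd, atoms), codec)
sketch_decode(enc, fd::SketchElementFD, codec = sketch_default_codec(fd)) =
    sketch_decode(enc, codec)

fd = SketchElementFD("toy_feature", Dict("H" => 2.0, "C" => 30.0))
atoms = ["C", "H", "H"]
enc = sketch_encode(atoms, fd)   # [15.0, 1.0, 1.0]
dec = sketch_decode(enc, fd)     # [30.0, 2.0, 2.0], same as sketch_get_value(fd, atoms)
```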
ElementFeature submodule
The ElementFeature submodule includes a concrete implementation of an AbstractAtomFeatureDescriptor for the case of features whose values can be computed from a lookup table of elemental symbols. It also has some utility functions for defining default encoder settings.
ChemistryFeaturization.ElementFeature.ElementFeatureDescriptor (Type)
ElementFeatureDescriptor
A descriptor for features associated with individual atoms that depend only upon their elemental identity (and whose values can hence be determined from a lookup table).
Fields
- name::String: Name of the feature
- lookup_table::DataFrame: table containing values of feature for every encodable element
ChemistryFeaturization.ElementFeature.fea_minmax (Function)
fea_minmax(feature_name, lookup_table)
Compute the minimum and maximum possible values of an ElementFeatureDescriptor, given an optional lookup table.
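Here is a minimal sketch of the lookup-table idea. To keep it dependency-free, a Dict stands in for the lookup_table DataFrame. ToyElementFeature, toy_values, toy_encodable and toy_minmax are hypothetical analogues of ElementFeatureDescriptor, get_value, encodable_elements and fea_minmax, and the feature values are only illustrative.

```julia
# Hypothetical analogue of ElementFeatureDescriptor: a name plus a lookup
# table, here a Dict from element symbol to value instead of a DataFrame.
struct ToyElementFeature
    name::String
    lookup::Dict{String,Float64}
end

# Analogue of get_value: look up each atom's element (atoms = vector of symbols).
toy_values(fd::ToyElementFeature, atoms::Vector{String}) = [fd.lookup[el] for el in atoms]

# Analogue of encodable_elements: every element present in the table.
toy_encodable(fd::ToyElementFeature) = sort(collect(keys(fd.lookup)))

# Analogue of fea_minmax: extreme values over all encodable elements.
toy_minmax(fd::ToyElementFeature) = extrema(values(fd.lookup))

fd = ToyElementFeature("toy_electronegativity", Dict("H" => 2.2, "C" => 2.55, "O" => 3.44))
toy_values(fd, ["O", "H", "H"])   # [3.44, 2.2, 2.2]
toy_encodable(fd)                 # ["C", "H", "O"]
toy_minmax(fd)                    # (2.2, 3.44)
```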
ChemistryFeaturization.default_log (Function)
default_log(min_val, max_val; threshold = 2)
default_log(possible_vals; threshold = 2)
Determine whether a OneHotOneCold codec used with a particular FeatureDescriptor should have logarithmically spaced bins. It works by comparing the ratio of the maximum to the minimum value against a specified order-of-magnitude threshold.
default_log(feature_name, lookup_table=element_data_df; threshold_oom=2)
Determine whether an element feature should be encoded by a OneHotOneCold codec with logarithmically spaced bins.
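The order-of-magnitude test can be sketched as follows. This is an illustrative reimplementation, not the package's code. The name sketch_default_log is hypothetical, and two details are assumptions: the comparison uses `>=`, and the function returns false when the minimum is not positive.

```julia
# Hypothetical sketch of default_log's order-of-magnitude test.
# Assumptions: ">=" comparison; a non-positive minimum means no log spacing.
function sketch_default_log(min_val::Real, max_val::Real; threshold::Real = 2)
    min_val <= 0 && return false             # log spacing needs positive values
    return log10(max_val / min_val) >= threshold
end

# Variant taking every possible value: reduce it to the extremes.
sketch_default_log(possible_vals; threshold::Real = 2) =
    sketch_default_log(minimum(possible_vals), maximum(possible_vals); threshold = threshold)

sketch_default_log(1.0, 1000.0)        # true  (three orders of magnitude)
sketch_default_log(1.0, 50.0)          # false (about 1.7)
sketch_default_log([0.5, 2.0, 80.0])   # true  (ratio 160)
```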
ChemistryFeaturization.default_categorical (Function)
default_categorical(possible_vals; threshold_length=5)
Determine whether a feature should be treated as categorical or continuous-valued.
If the values are not numbers, this always returns true. If they are numerical, it returns true if the number of possible values is less than threshold_length and false otherwise.
default_categorical(feature_name, lookup_table=element_data_df; threshold_length=5)
Determine whether an element feature should be treated as categorical- or continuous-valued.
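The rule stated above is short enough to sketch directly. The function name sketch_default_categorical is hypothetical. Counting only distinct values is an assumption, since the docstring speaks only of "the number of possible values".

```julia
# Hypothetical sketch of default_categorical's rule.
# Assumption: duplicates are ignored when counting possible values.
function sketch_default_categorical(possible_vals; threshold_length::Integer = 5)
    # Non-numeric values (strings, symbols, ...) are always categorical.
    all(v -> v isa Number, possible_vals) || return true
    # Numeric values are categorical only when there are few distinct ones.
    return length(unique(possible_vals)) < threshold_length
end

sketch_default_categorical(["s", "p", "d", "f"])   # true  (not numeric)
sketch_default_categorical([1, 2, 3])              # true  (3 < 5)
sketch_default_categorical(1.0:10.0)               # false (10 values)
```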
ChemistryFeaturization.get_bins (Function)
get_bins(possible_vals; threshold_oom=2, threshold_length=5, nbins=10, logspaced, categorical)
Given a list of possible values, return a list of bins, making sensible default choices for the binning parameters.
See also: default_log, default_categorical
get_bins(feature_name, lookup_table=element_data_df; nbins=10, threshold_oom=2, threshold_length=5, logspaced, categorical)
Compute the list of bins for an element feature.
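For continuous values, the binning choice comes down to evenly spaced edges, either linear or in log10. The sketch below shows that choice with the logspaced and categorical flags passed explicitly, whereas the real get_bins derives defaults for them through default_log and default_categorical. The name sketch_get_bins is hypothetical. Returning the sorted distinct values for categorical features, and returning nbins + 1 edges for continuous ones, are assumptions.

```julia
# Hypothetical sketch of get_bins with explicit flags.
# Assumptions: categorical bins = sorted distinct values; continuous
# bins are nbins + 1 edges spanning the observed range.
function sketch_get_bins(possible_vals; nbins::Integer = 10,
                         logspaced::Bool = false, categorical::Bool = false)
    categorical && return sort(unique(possible_vals))
    lo, hi = extrema(possible_vals)
    if logspaced
        # Evenly spaced in log10, so every bin spans the same ratio.
        return 10 .^ collect(range(log10(lo), log10(hi); length = nbins + 1))
    else
        return collect(range(lo, hi; length = nbins + 1))
    end
end

sketch_get_bins([0.0, 10.0]; nbins = 5)                    # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
sketch_get_bins([1.0, 1000.0]; nbins = 3, logspaced = true) # approximately [1, 10, 100, 1000]
sketch_get_bins(["b", "a", "b"]; categorical = true)       # ["a", "b"]
```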
ChemistryFeaturization.ElementFeature.get_param_vec (Function)
Helper function that checks whether the logspaced/categorical argument (a vector or a single Boolean) is appropriate and converts it to a vector as needed.
To see the concrete implementations of interface functions such as get_value, encodable_elements, and default_codec on ElementFeatureDescriptor objects, take a look at the source code at src/features/elementfeature.jl.
The ElementFeature module also uses the Data module, which provides data to populate lookup tables for a variety of commonly desired features. In particular, it provides the constant element_data_df, which is automatically used as the lookup table for an ElementFeatureDescriptor if none is provided, as well as elementfeature_info, a dictionary describing the features available in element_data_df.
Codecs
Codec Interface
ChemistryFeaturization.AbstractCodec (Type)
AbstractCodec
All codecs defined for different encoding-decoding schemes must be a subtype of AbstractCodec.
ChemistryFeaturization.encode (Method)
encode(val, codec)
Encode val according to the scheme described by codec.
ChemistryFeaturization.decode (Method)
decode(encoded, codec)
Decode encoded, presuming it was encoded by codec.
ChemistryFeaturization.output_shape (Method)
output_shape(codec)
output_shape(codec, val)
Return the shape of the encoded output of codec (when applied to input val).
Concrete Types
The OneHotOneCold codec is a very common encoding scheme. For categorical-valued features, it encodes each value as a bit vector whose length equals the number of possible values. Every entry is zero except the one corresponding to the value being encoded. For continuous-valued features, a binning scheme must be specified. Otherwise, defaults are chosen using utility functions such as get_bins, default_log, and default_categorical.
ChemistryFeaturization.OneHotOneCold (Type)
OneHotOneCold(categorical, bins)
AbstractCodec type which uses a dummy variable (as defined in the statistical literature), i.e., which employs a one-hot encoding and one-cold decoding scheme.
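The following is a minimal sketch of one-hot encoding and one-cold decoding for both the categorical and the continuous case. ToyOneHot, toy_nslots, toy_onehot and toy_onecold are hypothetical names. Three details are assumptions rather than documented OneHotOneCold behavior: bin i is treated as [bins[i], bins[i+1]), the top edge is folded into the last bin, and a continuous value decodes to its bin midpoint. Under these assumptions the encoded length is length(bins) for categorical features and length(bins) - 1 for continuous ones.

```julia
# Hypothetical stand-in for OneHotOneCold(categorical, bins).
struct ToyOneHot
    categorical::Bool
    bins::Vector
end

# Number of slots: one per category, or one per interval between edges.
toy_nslots(c::ToyOneHot) = c.categorical ? length(c.bins) : length(c.bins) - 1

# One-hot encode a single value into a Bool vector.
function toy_onehot(val, c::ToyOneHot)
    n = toy_nslots(c)
    if c.categorical
        idx = findfirst(==(val), c.bins)
        idx === nothing && throw(ArgumentError("unknown category: $val"))
    else
        first(c.bins) <= val <= last(c.bins) ||
            throw(ArgumentError("value $val lies outside the bin edges"))
        # Index of the last edge <= val; the top edge joins the last bin.
        idx = min(searchsortedlast(c.bins, val), n)
    end
    out = falses(n)
    out[idx] = true
    return out
end

# One-cold decode: locate the hot slot and map it back to a value.
function toy_onecold(encoded, c::ToyOneHot)
    idx = findfirst(encoded)
    return c.categorical ? c.bins[idx] : (c.bins[idx] + c.bins[idx + 1]) / 2
end

orbital = ToyOneHot(true, ["s", "p", "d", "f"])
toy_onehot("d", orbital)                        # [false, false, true, false]
toy_onecold(toy_onehot("d", orbital), orbital)  # "d"

edges = ToyOneHot(false, [0.0, 1.0, 2.0, 3.0])
toy_onehot(1.5, edges)                          # [false, true, false]
toy_onecold(toy_onehot(1.5, edges), edges)      # 1.5 (midpoint of the [1.0, 2.0) bin)
```

Continuous decoding loses information: every value in a bin decodes to the same representative, which is why bin placement (linear or logarithmic) matters.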
ChemistryFeaturization.DirectCodec (Type)
DirectCodec
Codec type whose encoding function is a constant multiple of the identity function, i.e., it multiplies the input value by a constant.
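Here is a tiny sketch of a scaled-identity codec. ToyDirect, toy_direct_encode and toy_direct_decode are hypothetical names, and the assumption that decoding simply divides by the same constant is not taken from the package documentation.

```julia
# Hypothetical stand-in for DirectCodec: encoding multiplies by a constant.
struct ToyDirect
    scale::Float64
end

toy_direct_encode(val, c::ToyDirect) = c.scale .* val   # scaled identity
toy_direct_decode(enc, c::ToyDirect) = enc ./ c.scale   # assumed inverse

c = ToyDirect(2.0)
enc = toy_direct_encode([1.0, 2.5, 4.0], c)   # [2.0, 5.0, 8.0]
toy_direct_decode(enc, c)                     # [1.0, 2.5, 4.0]
```

Since the encoding is elementwise, the output has the same shape as the input.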
Featurizations
ChemistryFeaturization.AbstractFeaturization (Type)
AbstractFeaturization
A featurization stores a set of FeatureDescriptors and associated Codecs and defines how to combine the encoded values into whatever format is required to feed into a model.
ChemistryFeaturization.features (Function)
features(featurization::AbstractFeaturization)
Return the list of feature descriptors used by featurization.
ChemistryFeaturization.encodable_elements (Method)
encodable_elements(featurization)
Return a list of elemental symbols that are valid constituents for structures that featurization can featurize.
ChemistryFeaturization.encode (Method)
encode(atoms, featurization)
Encode the features of atoms according to the scheme described by featurization.
ChemistryFeaturization.decode (Method)
decode(encoded, featurization)
Decode encoded, presuming it was encoded by featurization.
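As an illustration of what a featurization does, the sketch below bundles several element features, each with its own scale-factor codec, and stacks their encodings into a matrix with one row per feature and one column per atom. The following are all assumptions made for illustration: the matrix layout, taking the intersection of the lookup tables as the encodable elements, and every name (ToyFeature, ToyFeaturization, toy_feat_elements, toy_feat_encode, toy_feat_decode). Real featurizations define their own output format.

```julia
# Hypothetical feature: a lookup table plus a scale factor acting as its codec.
struct ToyFeature
    name::String
    table::Dict{String,Float64}
    scale::Float64
end

# Hypothetical featurization: an ordered list of features.
struct ToyFeaturization
    features::Vector{ToyFeature}
end

# Elements every feature can handle: intersection of the lookup tables' keys.
toy_feat_elements(f::ToyFeaturization) =
    sort(collect(reduce(intersect, (Set(keys(ft.table)) for ft in f.features))))

# Encode: one row per feature, one column per atom.
function toy_feat_encode(atoms::Vector{String}, f::ToyFeaturization)
    rows = [ft.scale .* [ft.table[el] for el in atoms] for ft in f.features]
    return permutedims(reduce(hcat, rows))
end

# Decode: undo each feature's scaling, returning one vector per feature.
toy_feat_decode(encoded::AbstractMatrix, f::ToyFeaturization) =
    [encoded[i, :] ./ ft.scale for (i, ft) in enumerate(f.features)]

f = ToyFeaturization([
    ToyFeature("toy_a", Dict("H" => 1.0, "C" => 4.0, "O" => 6.0), 0.5),
    ToyFeature("toy_b", Dict("H" => 2.0, "C" => 8.0), 1.0),
])
toy_feat_elements(f)                   # ["C", "H"] ("O" is missing from toy_b)
enc = toy_feat_encode(["C", "H"], f)   # [2.0 0.5; 8.0 2.0]
toy_feat_decode(enc, f)                # [[4.0, 1.0], [8.0, 2.0]]
```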
FeaturizedAtoms objects
ChemistryFeaturization.FeaturizedAtoms (Type)
FeaturizedAtoms
Container object for an atomic structure object, a featurization, and the resulting encoded_features from applying the featurization to the atoms.
Fields
- atoms: object to be featurized
- featurization: Featurization scheme meant to be used for featurizing atoms
- encoded_features: The result of featurizing atoms using featurization
encoded_features will NOT change for a given atoms-featurization pair.
ChemistryFeaturization.featurize (Function)
featurize(atoms, featurization::AbstractFeaturization)
Featurize an atoms object using a featurization and return the resulting FeaturizedAtoms object.
ChemistryFeaturization.decode (Method)
decode(featurized_atoms::FeaturizedAtoms)
Decode a FeaturizedAtoms object and return the decoded value.
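Here is a minimal sketch of the container-plus-featurize pattern described above. featurize computes the encoding once and stores it next to its inputs, and the single-argument decode reads the featurization back out of the container. ToyFeaturizedAtoms, ToyScaleFeaturization, toy_scale_encode, toy_scale_decode, toy_featurize and toy_decode_featurized are hypothetical names, and structures are represented as plain vectors of element symbols.

```julia
# Hypothetical analogue of FeaturizedAtoms; immutable, so fields cannot be rebound.
struct ToyFeaturizedAtoms{A,F,E}
    atoms::A
    featurization::F
    encoded_features::E
end

# Hypothetical one-feature featurization: lookup table plus scale factor.
struct ToyScaleFeaturization
    table::Dict{String,Float64}
    scale::Float64
end

toy_scale_encode(atoms, f::ToyScaleFeaturization) = f.scale .* [f.table[el] for el in atoms]
toy_scale_decode(enc, f::ToyScaleFeaturization) = enc ./ f.scale

# Analogue of featurize: encode once, then bundle the result with its inputs.
toy_featurize(atoms, f::ToyScaleFeaturization) =
    ToyFeaturizedAtoms(atoms, f, toy_scale_encode(atoms, f))

# Analogue of decode(featurized_atoms): use the stored featurization.
toy_decode_featurized(fa::ToyFeaturizedAtoms) =
    toy_scale_decode(fa.encoded_features, fa.featurization)

fa = toy_featurize(["H", "O"], ToyScaleFeaturization(Dict("H" => 1.0, "O" => 8.0), 0.25))
fa.encoded_features         # [0.25, 2.0]
toy_decode_featurized(fa)   # [1.0, 8.0]
```

An immutable struct is one simple way to keep encoded_features tied to its atoms-featurization pair, because its fields cannot be reassigned after construction. The contents of a mutable field such as a Vector can still be changed in place, and this sketch does not show how the package itself enforces the guarantee.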