API

Multiple Dispatch

By design, several functions have methods that dispatch on multiple abstract types. Their docstrings are collected here for an overview, and specific methods are repeated in the relevant sections below.
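As a rough illustration of this dispatch pattern, the sketch below defines one generic function with methods for a codec, for a feature descriptor, and for a descriptor plus an explicit codec. All names here (`ToyCodec`, `ToyFeatureDescriptor`, `toy_encode`, `toy_default_codec`, and the concrete structs) are hypothetical stand-ins invented for this example, not the package's definitions.

```julia
# Hypothetical stand-ins; these are NOT the package's actual types.
abstract type ToyCodec end
abstract type ToyFeatureDescriptor end

struct IdentityCodec <: ToyCodec end

struct ScaleCodec <: ToyCodec
    factor::Float64
end

struct ToyDescriptor <: ToyFeatureDescriptor
    name::String
end

# Methods that dispatch on the codec type.
toy_encode(val, ::IdentityCodec) = val
toy_encode(val, c::ScaleCodec) = val * c.factor

# Each descriptor knows its own default codec (assumed design).
toy_default_codec(::ToyDescriptor) = IdentityCodec()

# Methods that dispatch on the descriptor type; the two-argument form
# falls back to the descriptor's default codec.
toy_encode(val, fd::ToyFeatureDescriptor) = toy_encode(val, fd, toy_default_codec(fd))
toy_encode(val, fd::ToyFeatureDescriptor, codec::ToyCodec) = toy_encode(val, codec)
```

Calling `toy_encode(2.0, ToyDescriptor("x"))` selects the descriptor method and then the `IdentityCodec` method, while passing a `ScaleCodec` as a third argument overrides the default.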

ChemistryFeaturization.encode (Function)
encode(val, codec)

Encode val according to the scheme described by codec.

source
encode(val, ohoc::OneHotOneCold)

A flexible version of Flux.onehot that can handle both categorical and continuous-valued encoding.

source
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)

Encode features for atoms using the feature descriptor fd, using the default codec for fd if codec is not specified.

source
encode(atoms, featurization)

Encode the features of atoms according to the scheme described by featurization.

source
ChemistryFeaturization.decode (Function)
decode(encoded, codec)

Decode encoded presuming it was encoded by codec.

source
decode(encoded_feature, fd::AbstractFeatureDescriptor)

Decode encoded_feature using the feature descriptor fd, presuming it was encoded via fd's default codec if codec is not specified.

source
decode(encoded, featurization)

Decode encoded, presuming it was encoded by featurization.

source
decode(featurized_atoms::FeaturizedAtoms)

Decode a FeaturizedAtoms object, and return the decoded value.

source
ChemistryFeaturization.encodable_elements (Function)
encodable_elements(fd::AbstractFeatureDescriptor)

Return a list of elemental symbols for which the feature associated with fd is defined.

source
encodable_elements(featurization)

Return a list of elemental symbols that are valid constituents for structures that featurization can featurize.

source

Feature Descriptors

Abstract Types

Working with Feature Descriptors

ChemistryFeaturization.get_value (Function)
get_value(fd::AbstractFeatureDescriptor, atoms)

Get the value(s) of the feature corresponding to feature descriptor fd for the structure atoms. This function computes and returns the value that would actually get encoded by encode.

source
ChemistryFeaturization.encode (Method)
encode(atoms, fd::AbstractFeatureDescriptor)
encode(atoms, fd::AbstractFeatureDescriptor, codec::AbstractCodec)

Encode features for atoms using the feature descriptor fd, using the default codec for fd if codec is not specified.

source
ChemistryFeaturization.decode (Method)
decode(encoded_feature, fd::AbstractFeatureDescriptor)

Decode encoded_feature using the feature descriptor fd, presuming it was encoded via fd's default codec if codec is not specified.

source

ElementFeature submodule

The ElementFeature submodule includes a concrete implementation of an AbstractAtomFeatureDescriptor for the case of features whose values can be computed from a lookup table of elemental symbols. It also has some utility functions for defining default encoder settings.

ChemistryFeaturization.ElementFeature.ElementFeatureDescriptor (Type)
ElementFeatureDescriptor

A descriptor for features associated with individual atoms that depend only upon their elemental identity (and whose values can hence be determined from a lookup table).

Fields

  • name::String: Name of the feature
  • lookup_table::DataFrame: Table containing the values of the feature for every encodable element
source
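The lookup-table idea can be sketched in plain Julia as follows. This is a simplified illustration only: it uses a `Dict` where the package stores a `DataFrame`, and `ToyElementFeature`, `toy_encodable_elements`, and `toy_get_value` are hypothetical names invented for this example.

```julia
# Hypothetical stand-in: a Dict replaces the DataFrame lookup table.
struct ToyElementFeature
    name::String
    lookup_table::Dict{String,Float64}
end

# Elements for which the feature is defined (sorted for reproducibility).
toy_encodable_elements(f::ToyElementFeature) = sort(collect(keys(f.lookup_table)))

# Feature value for each atom, looked up by elemental symbol.
function toy_get_value(f::ToyElementFeature, symbols)
    missing_syms = setdiff(symbols, keys(f.lookup_table))
    isempty(missing_syms) ||
        throw(ArgumentError("no $(f.name) value for: $(join(missing_syms, ", "))"))
    return [f.lookup_table[s] for s in symbols]
end

# Pauling electronegativities for three elements.
en = ToyElementFeature("electronegativity", Dict("H" => 2.20, "C" => 2.55, "O" => 3.44))
co2_values = toy_get_value(en, ["C", "O", "O"])
```

Because the value depends only on the elemental symbol, the two oxygen atoms in the example receive the same value.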
ChemistryFeaturization.default_log (Function)
default_log(min_val, max_val; threshold = 2)
default_log(possible_vals; threshold = 2)

Determine whether a OneHotOneCold codec used with a particular FeatureDescriptor should have logarithmically spaced bins.

Operates by comparing the ratio of the maximum to minimum values to a specified order-of-magnitude threshold.

source
default_log(feature_name, lookup_table=element_data_df; threshold_oom=2)

Determine whether an element feature should be encoded by a OneHotOneCold codec with logarithmically spaced bins.

source
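A minimal sketch of the order-of-magnitude comparison described for default_log is given below. It is a hypothetical re-implementation (`sketch_default_log` is not a package function), and two details are assumptions: that non-positive ranges are never log-spaced, and that the comparison is strict.

```julia
# Hypothetical re-implementation of the heuristic; the package's code may differ.
function sketch_default_log(min_val, max_val; threshold = 2)
    # Assumption: log spacing is only considered for strictly positive ranges.
    (min_val > 0 && max_val > 0) || return false
    # Number of orders of magnitude spanned by the values.
    orders_of_magnitude = log10(max_val / min_val)
    # Assumption: strict comparison against the threshold.
    return orders_of_magnitude > threshold
end

# Convenience method taking the full list of possible values.
sketch_default_log(possible_vals; threshold = 2) =
    sketch_default_log(minimum(possible_vals), maximum(possible_vals); threshold = threshold)
```

With the default threshold, values spanning 1 to 1000 would be log-spaced under this sketch, while values spanning 1 to 50 would not.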
ChemistryFeaturization.default_categorical (Function)
default_categorical(possible_vals; threshold_length=5)

Determine if a feature should be treated as categorical or continuous-valued.

If the value type is not a number, always returns true. If it is numerical, returns true if the number of possible values is less than threshold_length and false otherwise.

source
default_categorical(feature_name, lookup_table=element_data_df; threshold_length=5)

Determine whether an element feature should be treated as categorical- or continuous-valued.

source
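The categorical-versus-continuous rule stated for default_categorical can be sketched as below. `sketch_default_categorical` is a hypothetical name, and counting distinct values with `unique` is an assumption about what "number of possible values" means.

```julia
# Hypothetical re-implementation of the rule; the package's code may differ.
function sketch_default_categorical(possible_vals; threshold_length = 5)
    # Non-numeric values (e.g. strings) are always treated as categorical.
    eltype(possible_vals) <: Number || return true
    # Assumption: count distinct values; few of them means categorical.
    return length(unique(possible_vals)) < threshold_length
end
```

Under this sketch, a feature such as valence block (`["s", "p", "d", "f"]`) is categorical regardless of length, and a numeric feature with ten distinct values is continuous at the default threshold.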
ChemistryFeaturization.get_bins (Function)
get_bins(possible_vals; threshold_oom=2, threshold_length=5, nbins=10, logspaced, categorical)

Given a list of possible values, return a list of bins, making sensible default choices for the binning parameters.

See also: default_log, default_categorical

source
get_bins(feature_name, lookup_table=element_data_df; nbins=10, threshold_oom=2, threshold_length=5, logspaced, categorical)

Compute list of bins for an element feature.

source
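One way the binning itself could work is sketched below. This is not the package's implementation: `sketch_get_bins` is a hypothetical name, the `logspaced` and `categorical` keywords are given plain `false` defaults here (in the package they are presumably derived from default_log and default_categorical), and returning `nbins + 1` bin edges for continuous features is an assumption.

```julia
# Hypothetical sketch of bin construction; the package's code may differ.
function sketch_get_bins(possible_vals; nbins = 10, logspaced = false, categorical = false)
    if categorical
        # One bin per distinct value, in sorted order.
        return sort(unique(possible_vals))
    end
    lo, hi = extrema(possible_vals)
    if logspaced
        # Assumption: nbins bins are described by nbins + 1 edges,
        # evenly spaced in log10 (requires lo > 0).
        return 10.0 .^ range(log10(lo), log10(hi); length = nbins + 1)
    end
    # Linearly spaced edges.
    return collect(range(lo, hi; length = nbins + 1))
end
```

For example, `sketch_get_bins([1.0, 100.0]; nbins = 2, logspaced = true)` yields edges at 1, 10, and 100, so each bin covers one order of magnitude.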

To see the concrete implementations of interface functions such as get_value, encodable_elements, and default_codec on ElementFeatureDescriptor objects, take a look at the source code at src/features/elementfeature.jl.

The ElementFeature module also makes use of the Data module, which provides data to populate lookup tables for a variety of commonly desired features. In particular, it provides the constant element_data_df, which is used automatically as the lookup table for an ElementFeatureDescriptor if none is provided, and elementfeature_info, a dictionary describing the features available in element_data_df.

Codecs

Codec Interface

Concrete Types

The OneHotOneCold codec is a very common encoding scheme. For categorical-valued features, it encodes each value as a bitstring whose length equals the number of possible values, consisting of zeros except for a one in the slot corresponding to the value in that instance. For continuous-valued features, a binning scheme must be specified, or defaults will be chosen using the utility functions default_log, default_categorical, and get_bins documented on this page.
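The scheme can be sketched in plain Julia as follows. These functions are hypothetical illustrations, not the package's OneHotOneCold code: the names are invented, the continuous case assumes bins are given as a sorted vector of edges, and decoding a binned value to its `(lower, upper)` interval is an assumption.

```julia
# Categorical case: one slot per possible value, hot at the matching slot.
function sketch_onehot(val, possible_vals)
    idx = findfirst(==(val), possible_vals)
    idx === nothing && throw(ArgumentError("$val is not among the possible values"))
    return [i == idx for i in eachindex(possible_vals)]
end

# Continuous case: `edges` is sorted and has one more entry than there are bins.
function sketch_onehot_binned(val, edges)
    nbins = length(edges) - 1
    first(edges) <= val <= last(edges) ||
        throw(ArgumentError("$val lies outside the bin range"))
    # Index of the last edge <= val, clamped so the top edge falls in the final bin.
    idx = min(searchsortedlast(edges, val), nbins)
    return [i == idx for i in 1:nbins]
end

# "One-cold" decoding: recover the value from the position of the hot slot.
sketch_onecold(bits, possible_vals) = possible_vals[findfirst(bits)]

# For binned features only the enclosing interval can be recovered.
function sketch_onecold_binned(bits, edges)
    i = findfirst(bits)
    return (edges[i], edges[i + 1])
end
```

Note that decoding a continuous feature is lossy in this sketch: encoding 2.5 with edges `[0.0, 1.0, 2.0, 3.0]` and decoding again gives the interval `(2.0, 3.0)` rather than the original value.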

ChemistryFeaturization.OneHotOneCold (Type)
OneHotOneCold(categorical, bins)

AbstractCodec type that uses dummy variables (as defined in the statistical literature), i.e., one that employs a one-hot encoding scheme and a one-cold decoding scheme.

source

Featurizations

ChemistryFeaturization.AbstractFeaturization (Type)
AbstractFeaturization

A featurization stores a set of FeatureDescriptors and associated Codecs and defines how to combine the encoded values into whatever format is required to feed into a model.

source

FeaturizedAtoms objects

ChemistryFeaturization.FeaturizedAtoms (Type)
FeaturizedAtoms

Container object for an atomic structure object, a featurization, and the resulting encoded_features from applying the featurization to the atoms.

Fields

  • atoms: Object to be featurized
  • featurization: Featurization scheme meant to be used for featurizing atoms
  • encoded_features: The result of featurizing atoms using featurization
Note

encoded_features will NOT change for a given atoms-featurization pair.

source
ChemistryFeaturization.featurize (Function)
featurize(atoms, featurization::AbstractFeaturization)

Featurize an atoms object using a featurization and return the FeaturizedAtoms object created.

source
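The relationship between featurize, FeaturizedAtoms, and decode can be illustrated with the toy sketch below. Everything in it is hypothetical: the "structure" is just a vector of elemental symbols, the "featurization" simply maps symbols to atomic numbers, and none of the names are the package's own.

```julia
# Hypothetical toy featurization: encode atoms as atomic numbers.
struct ToyFeaturization
    atomic_numbers::Dict{String,Int}
end

# Immutable container, so its fields cannot be reassigned after construction.
struct ToyFeaturizedAtoms
    atoms::Vector{String}
    featurization::ToyFeaturization
    encoded_features::Vector{Int}
end

encode_atoms(atoms, fzn::ToyFeaturization) = [fzn.atomic_numbers[a] for a in atoms]

function decode_features(encoded, fzn::ToyFeaturization)
    # Invert the symbol => number table to map numbers back to symbols.
    symbol_of = Dict(z => sym for (sym, z) in fzn.atomic_numbers)
    return [symbol_of[z] for z in encoded]
end

# Encode once and bundle the inputs with the result.
toy_featurize(atoms, fzn::ToyFeaturization) =
    ToyFeaturizedAtoms(atoms, fzn, encode_atoms(atoms, fzn))

# Decoding needs no extra arguments: the container carries its featurization.
decode_features(fa::ToyFeaturizedAtoms) =
    decode_features(fa.encoded_features, fa.featurization)
```

Storing the featurization alongside the encoded features is what makes the single-argument decode possible, since the container always knows how its contents were produced.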