package classification
import "github.com/cyralinc/dmap/classification"
Package classification provides types and functions for data classification. The Classifier interface takes arbitrary data as input and returns the classifications for that data. The package currently includes one implementation, LabelClassifier, which uses OPA and Rego to perform the classification logic; other implementations may be added in the future.
Index
- type Classification
- type Classifier
- type InvalidLabelsError
- type Label
- type LabelClassifier
- type LabelSet
- type Result
Types
type Classification
type Classification struct {
	// AttributePath is the full path of the data repository attribute
	// (e.g. the column). Each element corresponds to a component, in increasing
	// order of granularity (e.g. [database, schema, table, column]).
	AttributePath []string `json:"attributePath"`
	// Labels is the set of labels that the attribute was classified as.
	Labels LabelSet `json:"labels"`
}
Classification represents the classification of a data repository attribute.
type Classifier
type Classifier interface {
	// Classify takes the given input, which amounts to essentially a "row of
	// data", and returns the data classifications for that input. The input is
	// a map of attribute names (i.e. columns) to their values. The returned
	// Result is a map of attribute names to the set of labels that attributes
	// were classified as.
	Classify(ctx context.Context, input map[string]any) (Result, error)
}
Classifier is an interface that represents a data classifier. A classifier takes a set of data attributes and classifies them into a set of labels.
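To illustrate the contract, here is a minimal sketch of a type that satisfies an interface of the same shape. It is not part of this package: the types are redeclared locally so the example stands alone, and `regexClassifier`, `newDemoClassifier`, `demoClassify`, and the two patterns are hypothetical names chosen for illustration. It matches values against regular expressions rather than Rego rules.

```go
package main

import (
	"context"
	"fmt"
	"regexp"
)

// LabelSet, Result, and Classifier mirror the shapes documented in this
// package; they are redeclared here so the sketch is self-contained.
type LabelSet map[string]struct{}
type Result map[string]LabelSet

type Classifier interface {
	Classify(ctx context.Context, input map[string]any) (Result, error)
}

// regexClassifier is a hypothetical implementation: each label name maps to
// a pattern that is matched against the string form of an attribute value.
type regexClassifier struct {
	patterns map[string]*regexp.Regexp
}

func (c regexClassifier) Classify(ctx context.Context, input map[string]any) (Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	res := make(Result)
	for attr, val := range input {
		s := fmt.Sprint(val)
		for label, re := range c.patterns {
			if !re.MatchString(s) {
				continue
			}
			if res[attr] == nil {
				res[attr] = make(LabelSet)
			}
			res[attr][label] = struct{}{}
		}
	}
	return res, nil
}

// newDemoClassifier builds a classifier with two illustrative labels.
func newDemoClassifier() Classifier {
	return regexClassifier{patterns: map[string]*regexp.Regexp{
		"EMAIL": regexp.MustCompile(`^[^@\s]+@[^@\s]+\.[^@\s]+$`),
		"SSN":   regexp.MustCompile(`^\d{3}-\d{2}-\d{4}$`),
	}}
}

// demoClassify classifies a single row with the demo classifier.
func demoClassify(row map[string]any) Result {
	res, err := newDemoClassifier().Classify(context.Background(), row)
	if err != nil {
		panic(err)
	}
	return res
}

func main() {
	res := demoClassify(map[string]any{"contact": "jane@example.com", "age": 42})
	_, isEmail := res["contact"]["EMAIL"]
	fmt.Println(isEmail, len(res["age"])) // prints "true 0"
}
```

Attributes that match no label are simply absent from the Result in this sketch; whether LabelClassifier behaves the same way is not stated here.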
type InvalidLabelsError
type InvalidLabelsError struct {
	// contains filtered or unexported fields
}
InvalidLabelsError is an error type that reports that one or more labels are invalid, e.g. because they have invalid classification rules. It holds a slice of the underlying errors, which can be retrieved with Unwrap to inspect each individual failure.
func (InvalidLabelsError) Error
func (e InvalidLabelsError) Error() string
Error returns a string representation of the InvalidLabelsError.
func (InvalidLabelsError) Unwrap
func (e InvalidLabelsError) Unwrap() []error
Unwrap returns the errors that caused the InvalidLabelsError.
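Because Unwrap returns a slice, the standard `errors.Is` and `errors.As` functions (Go 1.20 and later) can search through every wrapped error. The sketch below uses a local stand-in type, since the real fields are unexported; the field name `errs`, the sentinel `errBadRule`, and the helper functions are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// invalidLabelsError is a stand-in for the documented type. The field name
// is an assumption because the real fields are unexported.
type invalidLabelsError struct {
	errs []error
}

func (e invalidLabelsError) Error() string {
	msgs := make([]string, len(e.errs))
	for i, err := range e.errs {
		msgs[i] = err.Error()
	}
	return "invalid labels: " + strings.Join(msgs, "; ")
}

// Unwrap exposes the individual causes to errors.Is and errors.As.
func (e invalidLabelsError) Unwrap() []error { return e.errs }

// errBadRule is a hypothetical sentinel used only in this example.
var errBadRule = errors.New("bad classification rule")

// demoError builds an aggregate error with two causes.
func demoError() error {
	return invalidLabelsError{errs: []error{
		fmt.Errorf("label %q: %w", "EMAIL", errBadRule),
		fmt.Errorf("label %q: rule file not found", "SSN"),
	}}
}

// hasBadRule reports whether any wrapped cause is errBadRule.
func hasBadRule(err error) bool { return errors.Is(err, errBadRule) }

// countCauses returns the number of wrapped causes, or 0 if err is not an
// invalidLabelsError.
func countCauses(err error) int {
	var ile invalidLabelsError
	if errors.As(err, &ile) {
		return len(ile.Unwrap())
	}
	return 0
}

func main() {
	err := demoError()
	fmt.Println(hasBadRule(err), countCauses(err)) // prints "true 2"
}
```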
type Label
type Label struct {
	// Name is the name of the label.
	Name string `yaml:"name" json:"name"`
	// Description is a brief description of the label.
	Description string `yaml:"description" json:"description"`
	// Tags are a list of arbitrary tags associated with the label.
	Tags []string `yaml:"tags" json:"tags"`
	// ClassificationRule is the compiled Rego classification rule used to
	// classify data.
	ClassificationRule *ast.Module `yaml:"-" json:"-"`
}
Label represents a data classification label.
func GetCustomLabels
func GetCustomLabels(labelsYamlFname string) ([]Label, error)
GetCustomLabels loads and returns the labels and their classification rules defined in the given labels yaml file. The labels are read from the file along with their classification rule Rego files (defined in the yaml). If there is an error unmarshalling the labels file, it is returned. If there are errors reading or parsing the classification rules of any labels, those errors are aggregated into an InvalidLabelsError and returned along with the labels that were successfully read.
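The "partial results plus aggregated error" contract described above can be sketched as follows. This is not the package's loader: `loadLabels`, the in-memory `rules` map, and the rule that empty rule text counts as invalid are all assumptions standing in for the yaml and Rego handling. The point is the caller-side pattern of keeping the valid labels while reporting the invalid ones.

```go
package main

import (
	"errors"
	"fmt"
)

// label and invalidLabelsError are simplified local stand-ins.
type label struct{ Name string }

type invalidLabelsError struct{ errs []error }

func (e invalidLabelsError) Error() string {
	return fmt.Sprintf("%d invalid label(s)", len(e.errs))
}

func (e invalidLabelsError) Unwrap() []error { return e.errs }

// loadLabels is a hypothetical loader. rules maps label names to rule text,
// and empty or missing rule text is treated as invalid. Valid labels are
// returned even when some labels fail.
func loadLabels(names []string, rules map[string]string) ([]label, error) {
	var (
		labels []label
		errs   []error
	)
	for _, name := range names {
		if rules[name] == "" {
			errs = append(errs, fmt.Errorf("label %q: empty classification rule", name))
			continue
		}
		labels = append(labels, label{Name: name})
	}
	if len(errs) > 0 {
		return labels, invalidLabelsError{errs: errs}
	}
	return labels, nil
}

func main() {
	names := []string{"EMAIL", "SSN", "PHONE"}
	rules := map[string]string{"EMAIL": "rule text", "PHONE": "rule text"}

	labels, err := loadLabels(names, rules)
	var ile invalidLabelsError
	if errors.As(err, &ile) {
		// Invalid labels are reported, but the valid ones remain usable.
		for _, cause := range ile.Unwrap() {
			fmt.Println("skipping:", cause)
		}
	} else if err != nil {
		panic(err) // any other error is treated as fatal in this sketch
	}
	fmt.Println(len(labels), "labels loaded")
}
```

Whether to continue with a partial label set or to fail outright is a decision for the caller; the sketch continues only to show that the returned slice is still usable.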
func GetPredefinedLabels
func GetPredefinedLabels() ([]Label, error)
GetPredefinedLabels loads and returns the predefined embedded labels and their classification rules. The labels are read from the embedded labels.yaml file and the classification rules are read from the embedded Rego files. If there is an error reading or unmarshalling the labels file, it is returned. If there are errors reading or parsing the classification rules of any labels, those errors are aggregated into an InvalidLabelsError and returned along with the labels that were successfully read. In practice this function should never return an error, because the embedded labels should always be valid. If it does, it indicates a problem with the embedded labels!
func NewLabel
func NewLabel(name, description, classificationRule string, tags ...string) (Label, error)
NewLabel creates a new Label with the given name, description, classification rule, and tags. The classification rule is expected to be the raw Rego code that will be used to classify data. If the classification rule is invalid, an error is returned.
type LabelClassifier
type LabelClassifier struct {
	// contains filtered or unexported fields
}
LabelClassifier is a Classifier implementation that uses a set of labels and their classification rules to classify data.
func NewLabelClassifier
func NewLabelClassifier(ctx context.Context, labels ...Label) (*LabelClassifier, error)
NewLabelClassifier creates a new LabelClassifier with the provided labels.
func (*LabelClassifier) Classify
func (c *LabelClassifier) Classify(ctx context.Context, input map[string]any) (Result, error)
Classify performs the classification of the provided input using the classifier's labels and their corresponding classification rules. The input parameter is a map of attribute names to their values, e.g. a single database row. The classifier returns a Result, which is a map of attribute names to the set of labels that the attribute was classified as.
type LabelSet
type LabelSet map[string]struct{}
LabelSet is a set of unique label names.
func (LabelSet) MarshalJSON
func (l LabelSet) MarshalJSON() ([]byte, error)
MarshalJSON marshals the LabelSet into a JSON array of strings, where each string is the name of a label in the set.
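A minimal sketch of this behavior, using locally redeclared types so it runs on its own. Sorting the names is an assumption made here to get stable output, since Go map iteration order is random; the package's actual ordering is not specified in this documentation. The `encode` helper and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// LabelSet mirrors the documented type.
type LabelSet map[string]struct{}

// MarshalJSON emits the label names as a JSON array of strings. Sorting is
// an assumption of this sketch, used to make the output deterministic.
func (l LabelSet) MarshalJSON() ([]byte, error) {
	names := make([]string, 0, len(l))
	for name := range l {
		names = append(names, name)
	}
	sort.Strings(names)
	return json.Marshal(names)
}

// Classification mirrors the documented struct and its JSON tags.
type Classification struct {
	AttributePath []string `json:"attributePath"`
	Labels        LabelSet `json:"labels"`
}

// encode marshals a Classification to a JSON string.
func encode(c Classification) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	c := Classification{
		AttributePath: []string{"db", "public", "users", "email"},
		Labels:        LabelSet{"PII": {}, "EMAIL": {}},
	}
	fmt.Println(encode(c))
	// prints {"attributePath":["db","public","users","email"],"labels":["EMAIL","PII"]}
}
```

Without a custom marshaller, a `map[string]struct{}` would be encoded as a JSON object with empty-object values, which is why the set is flattened to an array.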
type Result
type Result map[string]LabelSet
Result represents the classifications for a set of data attributes. The key is the attribute (i.e. column) name and the value is the set of labels that attribute was classified as.