package classification

import "github.com/cyralinc/dmap/classification"

Package classification provides types and functions to facilitate data classification. The Classifier interface takes arbitrary data as input and returns a classified version of that data as output. The package contains at least one implementation, LabelClassifier, which uses OPA and Rego to perform the actual classification logic; other implementations may be added in the future.

Index

Types

type Classification

type Classification struct {
	// AttributePath is the full path of the data repository attribute
	// (e.g. the column). Each element corresponds to a component, in increasing
	// order of granularity (e.g. [database, schema, table, column]).
	AttributePath []string `json:"attributePath"`
	// Labels is the set of labels that the attribute was classified as.
	Labels LabelSet `json:"labels"`
}

Classification represents the classification of a data repository attribute.

type Classifier

type Classifier interface {
	// Classify takes the given input, which is essentially a "row of data",
	// and returns the data classifications for that input. The input is a map
	// of attribute names (i.e. columns) to their values. The returned Result
	// is a map of attribute names to the set of labels that each attribute
	// was classified as.
	Classify(ctx context.Context, input map[string]any) (Result, error)
}

Classifier is an interface that represents a data classifier. A classifier takes a set of data attributes and classifies them into a set of labels.

type InvalidLabelsError

type InvalidLabelsError struct {
	// contains filtered or unexported fields
}

InvalidLabelsError is an error type indicating that one or more labels are invalid, e.g. because they have invalid classification rules. It wraps the individual errors that caused the problem, which can be retrieved with Unwrap.

func (InvalidLabelsError) Error

func (e InvalidLabelsError) Error() string

Error returns a string representation of the InvalidLabelsError.

func (InvalidLabelsError) Unwrap

func (e InvalidLabelsError) Unwrap() []error

Unwrap returns the errors that caused the InvalidLabelsError.

type Label

type Label struct {
	// Name is the name of the label.
	Name string `yaml:"name" json:"name"`
	// Description is a brief description of the label.
	Description string `yaml:"description" json:"description"`
	// Tags are a list of arbitrary tags associated with the label.
	Tags []string `yaml:"tags" json:"tags"`
	// ClassificationRule is the compiled Rego classification rule used to
	// classify data.
	ClassificationRule *ast.Module `yaml:"-" json:"-"`
}

Label represents a data classification label.

func GetCustomLabels

func GetCustomLabels(labelsYamlFname string) ([]Label, error)

GetCustomLabels loads and returns the labels, and their classification rules, defined in the given labels YAML file. The labels are read from the file along with their classification rule Rego files (referenced in the YAML). If there is an error unmarshalling the labels file, it is returned. If there are errors reading or parsing the classification rules for any labels, the errors are aggregated into an InvalidLabelsError and returned along with the labels that were read successfully.
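
Since a non-nil error may still come with usable labels, a caller might choose to tolerate InvalidLabelsError and fail only on other errors. The sketch below shows one way to do that. Label and InvalidLabelsError are simplified local stand-ins, and loadLabels is a hypothetical stub that follows the documented contract of GetCustomLabels rather than the real function; whether skipping invalid labels is acceptable depends on the application.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Simplified local stand-ins for the package's types.
type Label struct{ Name string }

type InvalidLabelsError struct{ errs []error }

func (e InvalidLabelsError) Error() string   { return fmt.Sprintf("%d invalid label(s)", len(e.errs)) }
func (e InvalidLabelsError) Unwrap() []error { return e.errs }

// loadLabels is a hypothetical stub mimicking GetCustomLabels: one label
// loads, another fails, so it returns a partial result plus an error.
func loadLabels(path string) ([]Label, error) {
	return []Label{{Name: "EMAIL"}}, InvalidLabelsError{errs: []error{errors.New("label SSN: bad rule")}}
}

// usableLabels keeps the partial result when only some labels are invalid,
// but fails on any other error (e.g. the YAML could not be unmarshalled).
func usableLabels(path string) ([]Label, error) {
	labels, err := loadLabels(path)
	var ile InvalidLabelsError
	switch {
	case err == nil:
		return labels, nil
	case errors.As(err, &ile):
		for _, e := range ile.Unwrap() {
			log.Printf("skipping invalid label: %v", e)
		}
		return labels, nil
	default:
		return nil, err
	}
}

func main() {
	labels, err := usableLabels("labels.yaml") // hypothetical path, never opened
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(labels), labels[0].Name) // 1 EMAIL
}
```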

func GetPredefinedLabels

func GetPredefinedLabels() ([]Label, error)

GetPredefinedLabels loads and returns the predefined embedded labels and their classification rules. The labels are read from the embedded labels.yaml file and the classification rules are read from the embedded Rego files. If there is an error reading or unmarshalling the labels file, it is returned. If there are errors reading or parsing the classification rules for any labels, the errors are aggregated into an InvalidLabelsError and returned along with the labels that were read successfully. In practice this should never return an error, since the embedded labels should always be valid; if it does, it indicates a problem with the embedded labels.

func NewLabel

func NewLabel(name, description, classificationRule string, tags ...string) (Label, error)

NewLabel creates a new Label with the given name, description, classification rule, and tags. The classification rule is expected to be the raw Rego code that will be used to classify data. If the classification rule is invalid, an error is returned.

type LabelClassifier

type LabelClassifier struct {
	// contains filtered or unexported fields
}

LabelClassifier is a Classifier implementation that uses a set of labels and their classification rules to classify data.

func NewLabelClassifier

func NewLabelClassifier(ctx context.Context, labels ...Label) (*LabelClassifier, error)

NewLabelClassifier creates a new LabelClassifier with the provided labels.

func (*LabelClassifier) Classify

func (c *LabelClassifier) Classify(ctx context.Context, input map[string]any) (Result, error)

Classify performs the classification of the provided input using the classifier's labels and their corresponding classification rules. The input parameter is a map of attribute names to their values, e.g. a single database row. The classifier returns a Result, which is a map of attribute names to the set of labels that the attribute was classified as.
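
A Result is keyed by attribute name only, whereas a Classification carries the full AttributePath. The sketch below shows one way to bridge the two when classifying rows of a single table. The types are simplified local copies (without JSON tags or methods), and toClassifications is a hypothetical helper, not part of the package. It copies the prefix for each entry so the returned paths do not share a backing array, and sorts by column for deterministic output.

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified local copies of the package's types.
type LabelSet map[string]struct{}
type Result map[string]LabelSet
type Classification struct {
	AttributePath []string
	Labels        LabelSet
}

// toClassifications (hypothetical) prefixes each column in res with the
// given path, e.g. [database, schema, table].
func toClassifications(prefix []string, res Result) []Classification {
	cols := make([]string, 0, len(res))
	for col := range res {
		cols = append(cols, col)
	}
	sort.Strings(cols)

	out := make([]Classification, 0, len(cols))
	for _, col := range cols {
		// Fresh slice per entry so paths never alias one another.
		path := append(append([]string{}, prefix...), col)
		out = append(out, Classification{AttributePath: path, Labels: res[col]})
	}
	return out
}

func main() {
	res := Result{
		"email": {"EMAIL": {}},
		"ssn":   {"SSN": {}, "PII": {}},
	}
	for _, c := range toClassifications([]string{"db", "public", "users"}, res) {
		fmt.Println(c.AttributePath, len(c.Labels))
	}
	// [db public users email] 1
	// [db public users ssn] 2
}
```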

type LabelSet

type LabelSet map[string]struct{}

LabelSet is a set of unique label names.

func (LabelSet) MarshalJSON

func (l LabelSet) MarshalJSON() ([]byte, error)

MarshalJSON marshals the LabelSet into a JSON array of strings, where each string is the name of a label in the set.

type Result

type Result map[string]LabelSet

Result represents the classifications for a set of data attributes. The key is the attribute (i.e. column) name and the value is the set of labels that the attribute was classified as.