SSVC CSV Analyzer

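This module provides a script for analyzing an SSVC tree CSV file. It reports how much each input column contributes to the outcome column, using either drop-column or permutation feature importance.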

usage: csv_analyzer.py [-h] [--outcol OUTCOL] [--permutation] csvfile

Analyze an SSVC tree csv file

positional arguments:
  csvfile          the csv file to analyze

options:
  -h, --help       show this help message and exit
  --outcol OUTCOL  the name of the outcome column
  --permutation    use permutation importance instead of drop column importance
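
The usage text above is consistent with an `argparse` parser along the following lines. This is a sketch reconstructed from the help output only, not the module's own code: the function name `build_parser` is hypothetical, and because no default is documented for `--outcol`, none is assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage text; not the module's own code."""
    parser = argparse.ArgumentParser(
        prog="csv_analyzer.py",
        description="Analyze an SSVC tree csv file",
    )
    # Required positional argument: path to the CSV file.
    parser.add_argument("csvfile", help="the csv file to analyze")
    # Optional outcome column name; no default is documented, so none is set here.
    parser.add_argument("--outcol", help="the name of the outcome column")
    # Boolean switch: present means permutation importance, absent means drop-column.
    parser.add_argument(
        "--permutation",
        action="store_true",
        help="use permutation importance instead of drop column importance",
    )
    return parser


args = build_parser().parse_args(["test.csv", "--outcol", "Priority", "--permutation"])
print(args.csvfile, args.outcol, args.permutation)
```

Using `action="store_true"` matches the help output, where `--permutation` takes no value while `--outcol` shows an `OUTCOL` placeholder.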

Example

Given a test.csv file like this:

row,Exploitation,Exposure,Automatable,Human Impact,Priority
1,none,small,no,low,defer
2,none,small,no,medium,defer
3,none,small,no,high,scheduled
...

Analyze the csv file:

$ python csv_analyzer.py test.csv

Feature Importance after Dropping Each Feature in test.csv
         feature  feature_importance
0  exploitation_            0.347222
1  human_impact_            0.291667
2   automatable_            0.180556
3      exposure_            0.166667

Higher values imply more important features.

`drop_col_feature_importance(df, target)`

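Compute feature importance using the drop-column method: the model is refit with each feature removed in turn, and the resulting change in score is taken as that feature's importance.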

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	the dataframe to analyze	required
`target`	`str`	the name of the target column to analyze against	required

Returns:

Type	Description
`DataFrame`	a dataframe of feature importances

Source code in src/ssvc/csv_analyzer.py

def drop_col_feature_importance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """
    Compute feature importance using drop column feature importance

    Args:
        df: the dataframe to analyze
        target: the name of the target column to analyze against

    Returns:
        a dataframe of feature importances
    """
    X2, y = _prepare_data(df, target)
    # construct tree
    dt = DecisionTreeClassifier(random_state=99, criterion="entropy")

    imp = _drop_col_feat_imp(dt, X2, y)
    return imp
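
The private helpers `_prepare_data` and `_drop_col_feat_imp` are not shown on this page, so the following is only a minimal sketch of how drop-column importance generally works, not the module's implementation. The function `drop_column_importance`, the toy decision table, the category-code encoding, and the choice to score on the same rows used for fitting are all assumptions made for illustration.

```python
from itertools import product

import pandas as pd
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier


def drop_column_importance(model, X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Illustrative drop-column importance; not the module's _drop_col_feat_imp."""
    # Baseline: fit on all features and score on the same rows.
    baseline = clone(model).fit(X, y).score(X, y)
    rows = []
    for col in X.columns:
        reduced = X.drop(columns=[col])
        # Refit from scratch without this feature and record the lost accuracy.
        score = clone(model).fit(reduced, y).score(reduced, y)
        rows.append({"feature": col, "feature_importance": baseline - score})
    return (
        pd.DataFrame(rows)
        .sort_values("feature_importance", ascending=False)
        .reset_index(drop=True)
    )


# Toy decision table invented for illustration: the outcome depends on
# Exploitation and Automatable only, never on Exposure.
records = [
    {
        "Exploitation": e,
        "Automatable": a,
        "Exposure": x,
        "Priority": "immediate" if (e == "active" and a == "yes") else "defer",
    }
    for e, a, x in product(["none", "active"], ["no", "yes"], ["small", "open"])
]
df = pd.DataFrame(records)

# Encode each categorical column as integer codes so the tree can split on it.
X = df.drop(columns=["Priority"]).apply(lambda col: col.astype("category").cat.codes)
y = df["Priority"]

dt = DecisionTreeClassifier(random_state=99, criterion="entropy")
importance = drop_column_importance(dt, X, y)
print(importance)
```

In this toy table, removing `Exposure` costs nothing because it never affects the outcome, while removing either of the other two columns leaves rows the tree can no longer tell apart. Scoring on the fitting rows is a reasonable choice for a complete decision table, where there is no held-out data; whether the module does the same is not confirmed by this page.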

`permute_feature_importance(df, target)`

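Compute feature importance using the permutation method: the values of each feature are shuffled in turn, and the resulting drop in model score is taken as that feature's importance.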

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	the dataframe to analyze	required
`target`	`str`	the name of the target column to analyze against	required

Returns:

Type	Description
`DataFrame`	a dataframe of feature importances

Source code in src/ssvc/csv_analyzer.py

def permute_feature_importance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """
    Compute feature importance using permutation feature importance

    Args:
        df: the dataframe to analyze
        target: the name of the target column to analyze against

    Returns:
        a dataframe of feature importances
    """
    X2, y = _prepare_data(df, target)
    # construct tree
    dt = DecisionTreeClassifier(random_state=99, criterion="entropy")

    imp = _perm_feat_imp(dt, X2, y)
    return imp
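
The helper `_perm_feat_imp` is likewise not shown here. As a rough sketch of the technique, scikit-learn's `sklearn.inspection.permutation_importance` can be applied to a fitted tree; whether the module uses that function is an assumption. The toy decision table, the category-code encoding, and the `n_repeats=30` setting are invented for illustration.

```python
from itertools import product

import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

# Toy decision table invented for illustration: the outcome depends on
# Exploitation and Automatable only, never on Exposure.
records = [
    {
        "Exploitation": e,
        "Automatable": a,
        "Exposure": x,
        "Priority": "immediate" if (e == "active" and a == "yes") else "defer",
    }
    for e, a, x in product(["none", "active"], ["no", "yes"], ["small", "open"])
]
df = pd.DataFrame(records)

# Encode each categorical column as integer codes so the tree can split on it.
X = df.drop(columns=["Priority"]).apply(lambda col: col.astype("category").cat.codes)
y = df["Priority"]

# Fit once, then shuffle one column at a time and measure the accuracy drop.
dt = DecisionTreeClassifier(random_state=99, criterion="entropy").fit(X, y)
result = permutation_importance(dt, X, y, n_repeats=30, random_state=99)

perm_importance = (
    pd.DataFrame(
        {"feature": X.columns, "feature_importance": result.importances_mean}
    )
    .sort_values("feature_importance", ascending=False)
    .reset_index(drop=True)
)
print(perm_importance)
```

Unlike the drop-column approach, the model is fit only once here; each feature's importance is the mean score drop over the repeated shuffles, so the exact values depend on `n_repeats` and `random_state`.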