
SSVC CSV Analyzer

This module provides a script for analyzing an SSVC tree csv file.

usage: csv_analyzer.py [-h] [--outcol OUTCOL] [--permutation] csvfile

Analyze an SSVC tree csv file

positional arguments:
  csvfile          the csv file to analyze

options:
  -h, --help       show this help message and exit
  --outcol OUTCOL  the name of the outcome column
  --permutation    use permutation importance instead of drop column importance
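
The help text above is consistent with a small `argparse` setup. The following is a minimal sketch of such a parser, not the script's actual source: the function name `build_parser` is hypothetical, and the default for `--outcol` is not shown in the help text, so `None` is used here as a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="csv_analyzer.py",
        description="Analyze an SSVC tree csv file",
    )
    parser.add_argument("csvfile", help="the csv file to analyze")
    # Assumption: the real default for --outcol is not documented; None is a placeholder.
    parser.add_argument("--outcol", default=None, help="the name of the outcome column")
    parser.add_argument(
        "--permutation",
        action="store_true",
        help="use permutation importance instead of drop column importance",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["test.csv", "--outcol", "Priority"])
print(args.csvfile, args.outcol, args.permutation)
```

Because `--permutation` is a `store_true` flag, drop column importance remains the default unless the flag is passed.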

Example

Given a test.csv file like this:

row,Exploitation,Exposure,Automatable,Human Impact,Priority
1,none,small,no,low,defer
2,none,small,no,medium,defer
3,none,small,no,high,scheduled
...

Analyze the csv file:

$ python csv_analyzer.py test.csv

Feature Importance after Dropping Each Feature in test.csv
         feature  feature_importance
0  exploitation_            0.347222
1  human_impact_            0.291667
2   automatable_            0.180556
3      exposure_            0.166667

Higher values imply more important features.
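
To explore the same data from Python rather than the command line, the sample rows can be loaded with pandas and split into decision-point columns and the outcome column. This is a minimal sketch that uses only the three rows shown above (the rest of `test.csv` is elided); the variable names are illustrative and are not part of the module.

```python
import io

import pandas as pd

# Only the three sample rows shown in the example; the remaining rows are elided.
SAMPLE_CSV = """row,Exploitation,Exposure,Automatable,Human Impact,Priority
1,none,small,no,low,defer
2,none,small,no,medium,defer
3,none,small,no,high,scheduled
"""

# keep_default_na=False guards against pandas reading labels such as "None"
# or "NA" as missing values; here every cell is a category label.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col="row", keep_default_na=False)

# Everything except the outcome column is treated as a feature.
features = df.drop(columns=["Priority"])
outcome = df["Priority"]

print(features)
print(outcome)
```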

drop_col_feature_importance(df, target)

Compute feature importance using drop column feature importance

Parameters:

Name    Type       Description                                       Default
df      DataFrame  the dataframe to analyze                          required
target  str        the name of the target column to analyze against  required

Returns:

Type       Description
DataFrame  a dataframe of feature importances

Source code in src/ssvc/csv_analyzer.py
def drop_col_feature_importance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """
    Compute feature importance using drop column feature importance

    Args:
        df: the dataframe to analyze
        target: the name of the target column to analyze against

    Returns:
        a dataframe of feature importances
    """
    X2, y = _prepare_data(df, target)
    # construct tree
    dt = DecisionTreeClassifier(random_state=99, criterion="entropy")

    imp = _drop_col_feat_imp(dt, X2, y)
    return imp
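
The private helpers `_prepare_data` and `_drop_col_feat_imp` are not shown on this page. As a rough illustration of the general drop-column technique, the sketch below refits the same kind of decision tree once per feature with that feature removed and reports how far the training accuracy falls. The function name, the ordinal encoding step, the use of training accuracy as the score, and the toy table are all assumptions made for illustration; the module's own helpers may differ.

```python
import itertools

import pandas as pd
from sklearn.base import clone
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier


def drop_column_importance_sketch(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Illustrative drop-column importance; not the module's private helper."""
    X = df.drop(columns=[target])
    y = df[target]
    # Assumption: categorical labels are mapped to integer codes before fitting.
    X_enc = pd.DataFrame(
        OrdinalEncoder().fit_transform(X), columns=X.columns, index=X.index
    )
    model = DecisionTreeClassifier(random_state=99, criterion="entropy")
    # Baseline: accuracy of a tree fitted on all features.
    baseline = clone(model).fit(X_enc, y).score(X_enc, y)
    rows = []
    for col in X_enc.columns:
        reduced = X_enc.drop(columns=[col])
        # Refit without this feature and record the drop in accuracy.
        score = clone(model).fit(reduced, y).score(reduced, y)
        rows.append({"feature": col, "feature_importance": baseline - score})
    return (
        pd.DataFrame(rows)
        .sort_values("feature_importance", ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical toy table: the outcome depends on a and b only; c is irrelevant.
toy = pd.DataFrame(
    [
        {"a": a, "b": b, "c": c,
         "outcome": "act" if (a, b) == ("yes", "high") else "defer"}
        for a, b, c in itertools.product(["no", "yes"], ["low", "high"], ["x", "y"])
    ]
)
result = drop_column_importance_sketch(toy, "outcome")
print(result)
```

In this toy table, removing `c` costs nothing because `a` and `b` fully determine the outcome, while removing either `a` or `b` leaves rows that the tree can no longer tell apart.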

permute_feature_importance(df, target)

Compute feature importance using permutation feature importance

Parameters:

Name    Type       Description                                       Default
df      DataFrame  the dataframe to analyze                          required
target  str        the name of the target column to analyze against  required

Returns:

Type       Description
DataFrame  a dataframe of feature importances

Source code in src/ssvc/csv_analyzer.py
def permute_feature_importance(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """
    Compute feature importance using permutation feature importance

    Args:
        df: the dataframe to analyze
        target: the name of the target column to analyze against

    Returns:
        a dataframe of feature importances
    """
    X2, y = _prepare_data(df, target)
    # construct tree
    dt = DecisionTreeClassifier(random_state=99, criterion="entropy")

    imp = _perm_feat_imp(dt, X2, y)
    return imp
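
The private helper `_perm_feat_imp` is likewise not shown. As a hedged illustration of the general technique, the sketch below fits the tree once and then uses scikit-learn's `permutation_importance` to shuffle each feature column in turn and measure the average loss in accuracy. The function name, the ordinal encoding step, the `n_repeats` value, and the toy table are assumptions for illustration and may not match what the module does internally.

```python
import itertools

import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier


def permutation_importance_sketch(
    df: pd.DataFrame, target: str, n_repeats: int = 20
) -> pd.DataFrame:
    """Illustrative permutation importance; not the module's private helper."""
    X = df.drop(columns=[target])
    y = df[target]
    # Assumption: categorical labels are mapped to integer codes before fitting.
    X_enc = pd.DataFrame(
        OrdinalEncoder().fit_transform(X), columns=X.columns, index=X.index
    )
    # Unlike drop-column importance, the model is fitted only once.
    model = DecisionTreeClassifier(random_state=99, criterion="entropy")
    model.fit(X_enc, y)
    # Shuffle each column n_repeats times and average the resulting score loss.
    result = permutation_importance(
        model, X_enc, y, n_repeats=n_repeats, random_state=99
    )
    return (
        pd.DataFrame(
            {"feature": X_enc.columns, "feature_importance": result.importances_mean}
        )
        .sort_values("feature_importance", ascending=False)
        .reset_index(drop=True)
    )


# Hypothetical toy table: the outcome depends on a and b only; c is irrelevant.
toy = pd.DataFrame(
    [
        {"a": a, "b": b, "c": c,
         "outcome": "act" if (a, b) == ("yes", "high") else "defer"}
        for a, b, c in itertools.product(["no", "yes"], ["low", "high"], ["x", "y"])
    ]
)
perm_result = permutation_importance_sketch(toy, "outcome")
print(perm_result)
```

The main trade-off between the two approaches is cost versus fidelity: permutation importance reuses a single fitted model, whereas the drop-column approach refits the model for every feature.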