jaxvacua.util.is_outlier

is_outlier(data, column=None, percentile_cut=5)

Boolean outlier mask for data based on a symmetric percentile cut.

Parameters:
  • data (Any) – Input data. If a DataFrame, column must be supplied.

  • column (Optional[str]) – Column name (DataFrame inputs only).

  • percentile_cut (float) – Half-width of the cut, in percentile units. Default 5 (flags the bottom 5% and the top 5% of samples).

Returns:
  • np.ndarray – Boolean mask, True where the corresponding sample is below the lower or above the upper percentile.

Return type:

ndarray
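
The symmetric percentile cut described above can be illustrated with plain NumPy. This is a minimal sketch, not the jaxvacua implementation: the helper name `percentile_outlier_mask` is hypothetical, and it assumes the thresholds are the `percentile_cut` and `100 - percentile_cut` percentiles from `np.percentile`, compared with strict inequalities.

```python
import numpy as np


def percentile_outlier_mask(values, percentile_cut=5):
    """Hypothetical stand-in for is_outlier on array input (assumed behaviour)."""
    values = np.asarray(values, dtype=float)
    # Lower and upper thresholds of the symmetric cut.
    lower, upper = np.percentile(values, [percentile_cut, 100 - percentile_cut])
    # True where a sample lies strictly outside [lower, upper].
    return (values < lower) | (values > upper)


samples = np.arange(101)  # 0, 1, ..., 100
mask = percentile_outlier_mask(samples, percentile_cut=5)
```

Whether samples exactly equal to a threshold count as outliers is a detail this sketch picks arbitrarily; the actual function may treat the boundaries differently.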

Example

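import numpy as np
from jaxvacua.util import is_outlier

# df: an existing pandas DataFrame (assumed to have numeric columns).
cols = df.columns.to_list()
# Flag a row if any column marks it as an outlier.
all_outliers = np.logical_or.reduce(
    [is_outlier(df, c) for c in cols]
)
df_reduced = df.loc[~all_outliers]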
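
To make the row-wise combination in the example concrete, here is a small sketch of `np.logical_or.reduce` applied to hand-written masks. The mask values are made up for illustration and are not output from `is_outlier`.

```python
import numpy as np

# Made-up per-column masks for a table with four rows and two columns.
mask_a = np.array([True, False, False, False])
mask_b = np.array([False, False, True, False])

# Element-wise OR across the list of masks: one flag per row.
any_outlier = np.logical_or.reduce([mask_a, mask_b])

# Positions of the rows that ~any_outlier would keep.
kept_rows = np.flatnonzero(~any_outlier)
```

Because a row is removed as soon as any single column flags it, the fraction of rows dropped will generally grow with the number of columns.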