jaxvacua.util.is_outlier
- is_outlier(data, column=None, percentile_cut=5)
Boolean outlier mask for data based on a symmetric percentile cut.
- Parameters:
data (Any) – Input data. If a DataFrame, column must be supplied.
column (Optional[str]) – Column name (DataFrame inputs only).
percentile_cut (float) – Half-width of the cut, in percentile units. Default 5 (drops the bottom 5% and top 5%).
- Returns:
np.ndarray – Boolean mask, True where the corresponding sample is below the lower or above the upper percentile.
- Return type:
ndarray
Example
cols = df.columns.to_list()
all_outliers = np.logical_or.reduce(
    [is_outlier(df, c) for c in cols]
)
df_reduced = df.loc[~all_outliers]
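To make the symmetric percentile cut concrete, the following is a minimal NumPy sketch of the kind of mask described above. It is not the jaxvacua implementation: `percentile_outlier_mask` is a hypothetical stand-in for array input only, and it assumes the bounds are the `percentile_cut`-th and `(100 - percentile_cut)`-th percentiles, compared with strict inequalities.

```python
import numpy as np


def percentile_outlier_mask(values, percentile_cut=5.0):
    """Hypothetical stand-in for is_outlier on 1-D array input.

    Assumptions (not confirmed by the library docs): the bounds are the
    `percentile_cut`-th and `(100 - percentile_cut)`-th percentiles, and
    samples exactly on a bound are kept (strict comparison).
    """
    values = np.asarray(values, dtype=float)
    # Lower and upper bounds of the symmetric cut.
    lower, upper = np.percentile(values, [percentile_cut, 100.0 - percentile_cut])
    # True where a sample falls outside [lower, upper].
    return (values < lower) | (values > upper)


samples = np.arange(100.0)  # 0, 1, ..., 99
mask = percentile_outlier_mask(samples, percentile_cut=5)
kept = samples[~mask]  # flags 0..4 and 95..99, keeps 5..94
```

With the default half-width of 5, roughly 10% of the samples are flagged in total. When masks from several columns are combined with `np.logical_or.reduce`, as in the example above, a row is dropped if it is an outlier in any column, so the fraction of rows removed can be considerably larger than 10%.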