downcast_numeric_columns

pyhelpers.ops.downcast_numeric_columns(*dataframes)

Downcasts numeric columns in pandas or polars DataFrames to the smallest suitable dtypes to reduce memory usage.

This function processes multiple DataFrames in one pass, converting:
  • Integer columns to the smallest signed integer dtype (int8, int16, int32, or int64) that can hold their values.

  • Floating-point columns to the smallest floating-point dtype (float32 or float64) that can safely represent their values.
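The conversion rules above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the implementation used by pyhelpers: the helper name `downcast_sketch` is hypothetical, it handles pandas DataFrames only, and it relies on `pandas.to_numeric` with its `downcast` argument.

```python
import numpy as np
import pandas as pd


def downcast_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with numeric columns downcast."""
    out = df.copy()  # work on a copy so the caller's DataFrame is untouched

    # Integer columns: pick the smallest signed integer dtype that fits.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")

    # Floating-point columns: use float32 when the values survive the cast.
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")

    # Non-numeric columns are never selected, so they pass through unchanged.
    return out


demo = pd.DataFrame({
    "small_int": pd.Series([1, 2, 3], dtype="int64"),
    "medium_int": pd.Series([1000, 2000, 3000], dtype="int64"),
    "ratio": pd.Series([0.5, 1.25, 2.0], dtype="float64"),
    "label": ["x", "y", "z"],
})
result = downcast_sketch(demo)
print(result.dtypes)
```

Selecting columns by dtype first keeps the integer and float rules separate, which avoids `downcast="integer"` turning whole-valued float columns into integers.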

Parameters:

dataframes (pandas.DataFrame | polars.DataFrame) – One or more pandas or polars DataFrames to optimise.

Returns:

New DataFrame(s) with downcast numeric columns.

Return type:

None | pandas.DataFrame | polars.DataFrame | tuple[pandas.DataFrame | polars.DataFrame]

Note

  • Returns new DataFrames; in the examples below, the input DataFrames keep their original dtypes after the call.

  • Skips non-numeric columns automatically.

  • Uses pandas’ built-in optimisations for batch processing.

Examples:

>>> from pyhelpers.ops import downcast_numeric_columns
>>> from pyhelpers._cache import example_dataframe
>>> import polars as pl
>>> df1 = example_dataframe().copy()
>>> df1.dtypes
Longitude    float64
Latitude     float64
dtype: object
>>> df2 = example_dataframe().T.copy()
>>> df2.dtypes
City
London        float64
Birmingham    float64
Manchester    float64
Leeds         float64
dtype: object

>>> df11, df21 = downcast_numeric_columns(df1, df2)
>>> df11.dtypes
Longitude    float32
Latitude     float32
dtype: object
>>> df21.dtypes
City
London        float32
Birmingham    float32
Manchester    float32
Leeds         float32
dtype: object

>>> df1, df2 = map(pl.from_pandas, (df1, df2))
>>> df1.dtypes
[Float64, Float64]
>>> df2.dtypes
[Float64, Float64, Float64, Float64]
>>> df21, df22 = downcast_numeric_columns(df1, df2)
>>> df21.dtypes
[Float32, Float32]
>>> df22.dtypes
[Float32, Float32, Float32, Float32]
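
To see why downcasting reduces memory usage, the column sizes can be compared before and after. The following is a minimal sketch using pandas only; the sample data and variable names are made up for illustration, and it calls `pandas.to_numeric` directly rather than `downcast_numeric_columns`.

```python
import numpy as np
import pandas as pd

# Illustrative data: 1000 rows, one float64 column and one int64 column.
df = pd.DataFrame({
    "x": np.arange(1000, dtype="float64"),
    "y": np.arange(1000, dtype="int64"),
})

# Bytes used by the column data (the index is excluded).
before = int(df.memory_usage(index=False).sum())

small = df.copy()
small["x"] = pd.to_numeric(small["x"], downcast="float")    # float64 -> float32
small["y"] = pd.to_numeric(small["y"], downcast="integer")  # int64 -> int16 (max value is 999)

after = int(small.memory_usage(index=False).sum())

print(before)  # 16000 (2 columns x 1000 rows x 8 bytes)
print(after)   # 6000  (1000 x 4 bytes + 1000 x 2 bytes)
```

The saving depends entirely on the value ranges: a column whose values need the full int64 or float64 range keeps its original dtype.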