
Essential basic functionality

Attributes and underlying data

Note that these attributes, such as columns, can be safely assigned to:

df.columns = [x.lower() for x in df.columns]

To get the actual data inside an Index or Series, use the .array property. array will always be an ExtensionArray.

s.array

s.index.array

If you know you need a NumPy array, use to_numpy() or numpy.asarray().

s.to_numpy()

np.asarray(s)
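As a minimal sketch of the difference between the two accessors, the example below builds a small Series (the values and labels are made up for illustration) and compares what .array, to_numpy(), and numpy.asarray() return.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

ext = s.array          # a pandas ExtensionArray wrapping the values
arr = s.to_numpy()     # a plain NumPy ndarray
arr2 = np.asarray(s)   # the same conversion, requested through NumPy

print(type(ext).__name__)
print(type(arr).__name__)
```

The concrete ExtensionArray class name printed on the first line can differ between pandas versions, so it is safer to check against the ExtensionArray base class than against a specific name.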

Accelerated operations

pandas has support for accelerating certain types of binary numerical and boolean operations using the numexpr and bottleneck libraries.
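The sketch below shows one way to switch the accelerators off temporarily, assuming the "compute.use_numexpr" and "compute.use_bottleneck" options are available in the installed pandas version. The DataFrame is made-up sample data; the point is only that results stay the same while the code path changes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

# Temporarily disable both accelerators. The results should not change,
# only the code path used to compute them.
with pd.option_context(
    "compute.use_numexpr", False,
    "compute.use_bottleneck", False,
):
    plain_sum = (df + df).sum()

# Outside the context manager the previous option values are restored.
default_sum = (df + df).sum()
```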

Descriptive statistics

df.mean(0)

df.mean(1)

ts_stand = (df - df.mean()) / df.std()
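To make the axis argument and the standardization expression concrete, here is a small sketch on made-up data. After standardizing, each column should have a mean of roughly zero and a standard deviation of roughly one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 6.0, 8.0]})

col_means = df.mean(0)   # one value per column (reduce over rows)
row_means = df.mean(1)   # one value per row (reduce over columns)

# Standardize each column: subtract its mean, divide by its standard deviation.
ts_stand = (df - df.mean()) / df.std()

print(ts_stand.mean())
print(ts_stand.std())
```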

Index of min/max values

s1.idxmin(), s1.idxmax()

df1.idxmin(axis=0)
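A short illustration on made-up data: idxmin() and idxmax() return index labels rather than positions or values.

```python
import pandas as pd

# Hypothetical sample data.
s1 = pd.Series([3.0, -1.0, 7.0], index=["a", "b", "c"])
df1 = pd.DataFrame(
    {"A": [2.0, 0.5, 1.0], "B": [-3.0, 4.0, 0.0]},
    index=["x", "y", "z"],
)

lowest_label = s1.idxmin()       # label of the smallest value
highest_label = s1.idxmax()      # label of the largest value
col_minima = df1.idxmin(axis=0)  # per column: row label of the minimum
```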

Value counts (histogramming) / mode

s.value_counts()

s5.mode()
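The following sketch, on made-up data, shows that value_counts() returns counts keyed by the distinct values, and that mode() can return more than one value when there is a tie.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series([1, 1, 2, 3, 3, 3])
s5 = pd.Series([1, 1, 3, 3, 5])

counts = s.value_counts()  # counts indexed by the distinct values
modes = s5.mode()          # 1 and 3 both occur twice, so both are returned
```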

Discretization and quantiling

factor = pd.cut(arr, 4)

factor = pd.cut(arr, [-5, -1, 0, 1, 5])

factor = pd.qcut(arr, [0, 0.25, 0.5, 0.75, 1])
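As a rough sketch of how the two functions differ, the example below bins a made-up array three ways: into equal-width bins, into explicit bin edges, and into quantile-based bins that hold roughly equal numbers of observations.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
arr = np.array([-4.0, -0.5, 0.5, 3.0, -2.0, 0.2, 1.5, 4.5])

equal_width = pd.cut(arr, 4)                  # 4 bins of equal width
by_edges = pd.cut(arr, [-5, -1, 0, 1, 5])     # explicit, right-closed bin edges
by_quantile = pd.qcut(arr, [0, 0.25, 0.5, 0.75, 1])  # quartile-based bins

# .codes gives the integer bin number assigned to each element.
print(by_edges.codes)
print(by_quantile.codes)
```

With cut() the bin boundaries are fixed and the counts per bin vary; with qcut() the counts are balanced and the boundaries follow the data.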

Function application

Row or column-wise function application

df.apply(lambda x: np.mean(x))

df.apply(lambda x: np.mean(x), axis=1)

df.apply(lambda x: x.max() - x.min())

df.apply(np.cumsum)

df.apply(np.exp)

tsdf.apply(lambda x: x.idxmax())

df_udf.apply(subtract_and_divide, args=(5,), divide=3)

tsdf.apply(pd.Series.interpolate)
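The call that passes args and a keyword argument is easier to follow with the helper spelled out. The definition of subtract_and_divide below is an assumption made for illustration, as is the sample data; extra positional arguments go in args, and extra keyword arguments are forwarded as given.

```python
import numpy as np
import pandas as pd

def subtract_and_divide(x, sub, divide=1):
    """Assumed helper: subtract `sub` from x, then divide by `divide`."""
    return (x - sub) / divide

# Hypothetical sample data: a 2x2 frame of ones.
df_udf = pd.DataFrame(np.ones((2, 2)))

# Each column is passed as x; 5 is bound to `sub`, 3 to `divide`.
result = df_udf.apply(subtract_and_divide, args=(5,), divide=3)

# Column-wise (default) versus row-wise reduction.
col_range = df_udf.apply(lambda x: x.max() - x.min())
row_mean = df_udf.apply(lambda x: np.mean(x), axis=1)
```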

Reindexing and altering labels

s.reindex(["e", "b", "f", "d"])

df.reindex(index=["c", "f", "b"], columns=["three", "two", "one"])

rs = s.reindex(df.index)
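A small sketch on made-up data: reindex() reorders existing labels and inserts missing values for labels that are not present in the original.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=["a", "b", "c", "d", "e"])

# "f" does not exist in s, so its value becomes NaN.
reindexed = s.reindex(["e", "b", "f", "d"])
print(reindexed)
```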

Dropping labels from an axis

df.drop(["a", "d"], axis=0)

df.drop(["one"], axis=1)
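For illustration, the sketch below applies both calls to a made-up frame. drop() returns a new object and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [5, 6, 7, 8]},
    index=["a", "b", "c", "d"],
)

without_rows = df.drop(["a", "d"], axis=0)  # remove rows labelled "a" and "d"
without_col = df.drop(["one"], axis=1)      # remove the column labelled "one"
```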

Iteration

When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. DataFrames follow the dict-like convention of iterating over the "keys" of the object. In short, basic iteration produces:

  • Series: values

  • DataFrame: column labels

  • iterrows(): Iterate over the rows of a DataFrame as (index, Series) pairs. This converts the rows to Series objects, which can change the dtypes and has some performance implications.

  • itertuples(): Iterate over the rows of a DataFrame as namedtuples of the values. This is a lot faster than iterrows(), and is in most cases the preferable way to iterate over the values of a DataFrame.
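The behaviours in the list above can be sketched as follows, using made-up data with one integer and one float column so that the dtype change under iterrows() is visible.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series([10, 20], index=["x", "y"])
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

series_items = list(s)   # basic iteration over a Series yields values
frame_items = list(df)   # basic iteration over a DataFrame yields column labels

# iterrows() builds a Series per row, so the integer column "a" is upcast
# to float to share a dtype with "b".
row_dtypes = [row.dtype for _, row in df.iterrows()]

# itertuples() yields namedtuples; the first field, Index, is the row label.
tuples = list(df.itertuples())
print(tuples[0])
```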

Sorting

By index

unsorted_df.sort_index()

unsorted_df.sort_index(ascending=False)

unsorted_df.sort_index(axis=1)

By values

The optional by parameter to DataFrame.sort_values() may be used to specify one or more columns that determine the sorted order.

df1.sort_values(by="two")

df1[["one", "two", "three"]].sort_values(by=["one", "two"])

s1.sort_values(key=lambda x: x.str.lower())

df.sort_values(by="a", key=lambda col: col.str.lower())
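A brief sketch of the key argument on made-up data: the callable receives the whole Series (or column) and must return a transformed one of the same length, which is then used for ordering. Lower-casing here gives a case-insensitive sort.

```python
import pandas as pd

# Hypothetical sample data with mixed case.
s1 = pd.Series(["B", "a", "C"])
df = pd.DataFrame({"a": ["B", "a", "C"], "b": [1, 2, 3]})

# The key is applied to the values before comparison; the original
# values are what end up in the result.
sorted_s = s1.sort_values(key=lambda x: x.str.lower())
sorted_df = df.sort_values(by="a", key=lambda col: col.str.lower())
```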

smallest / largest values

Series has the nsmallest() and nlargest() methods, which return the n smallest or largest values.

s.nsmallest(3)

df.nlargest(3, "a")

df.nlargest(5, ["a", "c"])
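The sketch below runs these calls on made-up data. When a list of columns is given, later columns are used to break ties in earlier ones.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series([5, 1, 4, 2, 3])
df = pd.DataFrame(
    {
        "a": [-2, -1, 1, 10, 8, 11, -1],
        "b": list("abdceff"),
        "c": [1.0, 2.0, 4.0, 3.2, 2.0, 3.0, 4.0],
    }
)

three_smallest = s.nsmallest(3)          # the 3 smallest values of the Series
top3_by_a = df.nlargest(3, "a")          # rows with the 3 largest values of "a"
top5_by_a_c = df.nlargest(5, ["a", "c"]) # ties in "a" are ordered by "c"
```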

dtypes

astype

df3.astype("float32").dtypes

dft[["a", "b"]] = dft[["a", "b"]].astype(np.uint8)

dft1 = dft1.astype({"a": np.bool_, "c": np.float64})
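To illustrate the two forms of astype(), the sketch below converts a whole made-up frame to one dtype and then converts only selected columns by passing a dict. Columns not named in the dict keep their dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df3 = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
dft1 = pd.DataFrame({"a": [1, 0, 1], "b": [4, 5, 6], "c": [7, 8, 9]})

# Cast every column to float32.
all_float32 = df3.astype("float32").dtypes

# Cast only "a" and "c"; "b" is left as it was.
b_dtype_before = dft1["b"].dtype
dft1 = dft1.astype({"a": np.bool_, "c": np.float64})
print(dft1.dtypes)
```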