note
create
by constructor
s = pd.Series([1, 3, 5, np.nan, 6, 8])
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df2 = pd.DataFrame(
{
"A": 1.0,
"B": pd.Timestamp("20130102"),
"C": pd.Series(1, index=list(range(4)), dtype="float32"),
"D": np.array([3] * 4, dtype="int32"),
"E": pd.Categorical(["test", "train", "test", "train"]),
"F": "foo",
}
)
df2.dtypes
Out[11]:
A float64
B datetime64[s]
C float32
D int32
E category
F object
dtype: object
read
df.head()
df.tail(3)
df.index
df.columns
df.to_numpy()
df.describe()
df.T
df.sort_index(axis=1, ascending=False)
df.sort_values(by="B")
Getitem ([])
For a DataFrame
, passing a single label selects a columns and yields a Series
equivalent to df.A
df["A"]
For a DataFrame
, passing a slice : selects matching rows
df[0:3]
df["20130102":"20130104"]
Selection by label
Selecting a row matching a label
df.loc[dates[0]]
Selecting all rows (:) with a select column labels
df.loc[:, ["A", "B"]]
For label slicing, both endpoints are included
df.loc["20130102":"20130104", ["A", "B"]]
Out[29]:
A B
2013-01-02 1.212112 -0.173215
2013-01-03 -0.861849 -2.104569
2013-01-04 0.721555 -0.706771
Selecting a single row and column label returns a scalar
df.loc[dates[0], "A"]
For getting fast access to a scalar (equivalent to the prior method)
df.at[dates[0], "A"]
Selection by position
Select via the position of the passed integers
df.iloc[3]
Integer slices acts similar to NumPy/Python:
df.iloc[3:5, 0:2]
Lists of integer position locations:
df.iloc[[1, 2, 4], [0, 2]]
For slicing rows explicitly:
df.iloc[1:3, :]
For slicing columns explicitly:
df.iloc[:, 1:3]
For getting a value explicitly:
df.iloc[1, 1]
For getting fast access to a scalar (equivalent to the prior method):
df.iat[1, 1]
Boolean indexing
Select rows where df.A
is greater than 0
df[df["A"] > 0]
Using isin()
method for filtering:
df2[df2["E"].isin(["two", "four"])]
update
Assigning new columns in method chains
iris.assign(sepal_ratio=iris["SepalWidth"] / iris["SepalLength"])
Concat
Concatenating pandas objects together row-wise with concat()
pieces = [df[:3], df[3:7], df[7:]]
pd.concat(pieces)
Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it
drop
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
Drop columns
df.drop(['B', 'C'], axis=1)
Drop a row by index
df.drop([0, 1])
dropna
DataFrame.dropna()
drops any rows that have missing data:
df1.dropna(how="any")
reindex
Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
Setting
df["F"] = s1
Setting by assigning with a NumPy array
:
df.loc[:, "D"] = np.array([5] * len(df))
sort
df.sort_index(axis=1, ascending=False)
df.sort_values(by="B")