# Coming from Pandas

Here we set out the key points that anyone who has experience with pandas and wants to try Polars
should know. We include both differences in the concepts the libraries are built on and differences
in how you should write Polars code compared to pandas code.

## Differences in concepts between Polars and pandas

### Polars does not have a multi-index/index

pandas gives a label to each row with an index. Polars does not use an index and each row is indexed
by its integer position in the table.

Polars aims to have predictable results and readable queries, and we think an index does not help us
reach that objective. We believe the semantics of a query should not change with the state of an
index or a `reset_index` call.

In Polars a DataFrame will always be a 2D table with heterogeneous data-types. The data-types may
have nesting, but the table itself will not. Operations like resampling will be done by specialized
functions or methods that act like 'verbs' on a table, explicitly stating the columns that each
'verb' operates on. As such, it is our conviction that not having indices makes things simpler, more
explicit, more readable and less error-prone.

Note that an 'index' data structure as known in databases will be used by Polars as an optimization
technique.

### Polars adheres to the Apache Arrow memory format to represent data in memory while pandas uses NumPy arrays

Polars represents data in memory according to the Arrow memory spec while pandas by default
represents data in memory with NumPy arrays. Apache Arrow is an emerging standard for in-memory
columnar analytics that can accelerate data load times, reduce memory usage and accelerate
calculations.

Polars can convert data to NumPy format with the `to_numpy` method.

### Polars has more support for parallel operations than pandas

Polars exploits the strong support for concurrency in Rust to run many operations in parallel. While
some operations in pandas are multi-threaded, the core of the library is single-threaded, and an
additional library such as `Dask` must be used to parallelize operations. Polars is faster than all
open source solutions that parallelize pandas code.

### Polars has support for different engines

Polars has native support for an engine optimized for in-memory processing and a streaming engine
optimized for large-scale data processing. Furthermore, Polars has native integration with an engine
backed by cuDF. All these engines benefit from Polars' query optimizer, and Polars ensures semantic
correctness between all of them. In pandas the implementation can dispatch between NumPy and
PyArrow, but because of pandas' looser strictness guarantees, the data-type outputs and semantics
between those backends can differ. This can lead to subtle bugs.

### Polars can lazily evaluate queries and apply query optimization

Eager evaluation is when code is evaluated as soon as you run the code. Lazy evaluation is when
running a line of code means that the underlying logic is added to a query plan rather than being
evaluated.

Polars supports eager evaluation and lazy evaluation whereas pandas only supports eager evaluation.
The lazy evaluation mode is powerful because Polars carries out automatic query optimization when it
examines the query plan and looks for ways to accelerate the query or reduce memory usage.

`Dask` also supports lazy evaluation when it generates a query plan.

### Polars is strict

Polars is strict about data types. Data type resolution in Polars is dependent on the operation
graph, whereas pandas converts types loosely (e.g. new missing data can lead to integer columns
being converted to floats). This strictness leads to fewer bugs and more predictable behavior.

### Polars has a more versatile API

Polars is built on expressions and allows expression inputs in almost all operations. This means
that once you understand how expressions work, that knowledge carries over to the rest of the
library. pandas does not have an expression system and often requires Python `lambda`s to express
the complexity you want. Polars sees the requirement of a Python `lambda` as a lack of
expressiveness of its API, and tries to give you native support whenever possible.

## Key syntax differences

Users coming from pandas generally need to know one thing...

```
polars != pandas
```

If your Polars code looks like it could be pandas code, it might run, but it likely runs slower than
it should.

Let's go through some typical pandas code and see how we might rewrite it in Polars.

### Selecting data

As there is no index in Polars, there is no `.loc` or `.iloc` accessor in Polars, and there is also
no `SettingWithCopyWarning`.

However, the best way to select data in Polars is to use the expression API. For example, if you
want to select a column in pandas, you can do one of the following:

```python
df["a"]
df.loc[:,"a"]
```

but in Polars you would use the `.select` method:

```python
df.select("a")
```

If you want to select rows based on the values then in Polars you use the `.filter` method:

```python
df.filter(pl.col("a") < 10)
```

As noted in the section on expressions below, Polars can run operations in `.select` and `.filter`
in parallel, and it can carry out query optimization on the full set of data selection criteria.

### Be lazy

Working in lazy evaluation mode is straightforward and should be your default in Polars as the lazy
mode allows Polars to do query optimization.

We can run in lazy mode by either using an implicitly lazy function (such as `scan_csv`) or
explicitly using the `lazy` method.

Take the following simple example where we read a CSV file from disk and do a group by. The CSV file
has numerous columns but we just want to do a group by on one of the id columns (`id1`) and then sum
by a value column (`v1`). In pandas this would be:

```python
df = pd.read_csv(csv_file, usecols=["id1","v1"])
grouped_df = df.groupby("id1")[["v1"]].sum()
```

In Polars you can build this query in lazy mode with query optimization and evaluate it by replacing
the eager pandas function `read_csv` with the implicitly lazy Polars function `scan_csv`:

```python
df = pl.scan_csv(csv_file)
grouped_df = df.group_by("id1").agg(pl.col("v1").sum()).collect()
```

Polars optimizes this query by identifying that only the `id1` and `v1` columns are relevant and so
will only read these columns from the CSV. By calling the `.collect` method at the end of the second
line we instruct Polars to eagerly evaluate the query.

If you do want to run this query in eager mode you can just replace `scan_csv` with `read_csv` in
the Polars code.

Read more about working with lazy evaluation in the [lazy API](../lazy/using.md) section.

### Express yourself

A typical pandas script consists of multiple data transformations that are executed sequentially.
However, in Polars these transformations can be executed in parallel using expressions.

#### Column assignment

We have a dataframe `df` with a column called `value`. We want to add two new columns, a column
called `tenXValue` where the `value` column is multiplied by 10 and a column called `hundredXValue`
where the `value` column is multiplied by 100.

In pandas this would be:

```python
df.assign(
    tenXValue=lambda df_: df_.value * 10,
    hundredXValue=lambda df_: df_.value * 100
)
```

These column assignments are executed sequentially.

In Polars we add columns to `df` using the `.with_columns` method:

```python
df.with_columns(
    tenXValue=pl.col("value") * 10,
    hundredXValue=pl.col("value") * 100,
)
```

These column assignments are executed in parallel.

#### Column assignment based on predicate

In this case we have a dataframe `df` with columns `a`,`b` and `c`. We want to re-assign the values
in column `a` based on a condition. When the value in column `c` is equal to 2 then we replace the
value in `a` with the value in `b`.

In pandas this would be:

```python
df.assign(a=lambda df_: df_["a"].mask(df_["c"] == 2, df_["b"]))
```

while in Polars this would be:

```python
df.with_columns(
    pl.when(pl.col("c") == 2)
    .then(pl.col("b"))
    .otherwise(pl.col("a")).alias("a")
)
```

Polars can compute every branch of a `when -> then -> otherwise` expression in parallel. This is
valuable when the branches become more expensive to compute.

#### Filtering

We want to filter the dataframe `df` with housing data based on some criteria.

In pandas you filter the dataframe by passing Boolean expressions to the `query` method:

```python
df.query("m2_living > 2500 and price < 300000")
```

or by directly evaluating a mask:

```python
df[(df["m2_living"] > 2500) & (df["price"] < 300000)]
```

while in Polars you call the `filter` method:

```python
df.filter(
    (pl.col("m2_living") > 2500) & (pl.col("price") < 300000)
)
```

The query optimizer in Polars can also detect if you write multiple filters separately and combine
them into a single filter in the optimized plan.

## pandas transform

The pandas documentation demonstrates an operation on a group by called `transform`. In this case we
have a dataframe `df` and we want a new column showing the number of rows in each group.

In pandas we have:

```python
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"],
})

df["size"] = df.groupby("c")["type"].transform(len)
```

Here pandas does a group by on `"c"`, takes column `"type"`, computes the group length and then
joins the result back to the original `DataFrame` producing:

```
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4
```

In Polars the same can be achieved with `window` functions:

```python
df.with_columns(
    pl.col("type").count().over("c").alias("size")
)
```

```
shape: (7, 3)
┌─────┬──────┬──────┐
│ c   ┆ type ┆ size │
│ --- ┆ ---  ┆ ---  │
│ i64 ┆ str  ┆ u32  │
╞═════╪══════╪══════╡
│ 1   ┆ m    ┆ 3    │
│ 1   ┆ n    ┆ 3    │
│ 1   ┆ o    ┆ 3    │
│ 2   ┆ m    ┆ 4    │
│ 2   ┆ m    ┆ 4    │
│ 2   ┆ n    ┆ 4    │
│ 2   ┆ n    ┆ 4    │
└─────┴──────┴──────┘
```

Because we can store the whole operation in a single expression, we can combine several `window`
functions and even combine different groups!

Polars will cache window expressions that are applied over the same group, so storing them in a
single `with_columns` is both convenient **and** optimal. In the following example we look at a case
where we are calculating group statistics over `"c"` twice:

```python
df.with_columns(
    pl.col("c").count().over("c").alias("size"),
    pl.col("c").sum().over("type").alias("sum"),
    pl.col("type").reverse().over("c").alias("reverse_type")
)
```

```
shape: (7, 5)
┌─────┬──────┬──────┬─────┬──────────────┐
│ c   ┆ type ┆ size ┆ sum ┆ reverse_type │
│ --- ┆ ---  ┆ ---  ┆ --- ┆ ---          │
│ i64 ┆ str  ┆ u32  ┆ i64 ┆ str          │
╞═════╪══════╪══════╪═════╪══════════════╡
│ 1   ┆ m    ┆ 3    ┆ 5   ┆ o            │
│ 1   ┆ n    ┆ 3    ┆ 5   ┆ n            │
│ 1   ┆ o    ┆ 3    ┆ 1   ┆ m            │
│ 2   ┆ m    ┆ 4    ┆ 5   ┆ n            │
│ 2   ┆ m    ┆ 4    ┆ 5   ┆ n            │
│ 2   ┆ n    ┆ 4    ┆ 5   ┆ m            │
│ 2   ┆ n    ┆ 4    ┆ 5   ┆ m            │
└─────┴──────┴──────┴─────┴──────────────┘
```

## Missing data

pandas uses `NaN` and/or `None` values to indicate missing values depending on the dtype of the
column. In addition the behaviour in pandas varies depending on whether the default dtypes or
optional nullable arrays are used. In Polars missing data corresponds to a `null` value for all data
types.

For float columns Polars permits the use of `NaN` values. These `NaN` values are not considered to
be missing data but instead a special floating point value.

In pandas an integer column with missing values is cast to be a float column with `NaN` values for
the missing values (unless using optional nullable integer dtypes). In Polars any missing values in
an integer column are simply `null` values and the column remains an integer column.

See the [missing data](../expressions/missing-data.md) section for more details.

## Pipe littering

A common usage in pandas is utilizing `pipe` to apply some function to a `DataFrame`. Copying this
coding style to Polars is unidiomatic and leads to suboptimal query plans.

The snippet below shows a common pattern in pandas.

```python
def add_foo(df: pd.DataFrame) -> pd.DataFrame:
    df["foo"] = ...
    return df

def add_bar(df: pd.DataFrame) -> pd.DataFrame:
    df["bar"] = ...
    return df


def add_ham(df: pd.DataFrame) -> pd.DataFrame:
    df["ham"] = ...
    return df

(df
 .pipe(add_foo)
 .pipe(add_bar)
 .pipe(add_ham)
)
```

If we do this in Polars, we create 3 `with_columns` contexts, which forces Polars to run the 3
pipes sequentially, utilizing zero parallelism.

The way to get similar abstractions in Polars is to write functions that create expressions. The
snippet below creates 3 expressions that run in a single context and thus are allowed to run in
parallel.

```python
def get_foo(input_column: str) -> pl.Expr:
    return pl.col(input_column).some_computation().alias("foo")

def get_bar(input_column: str) -> pl.Expr:
    return pl.col(input_column).some_computation().alias("bar")

def get_ham(input_column: str) -> pl.Expr:
    return pl.col(input_column).some_computation().alias("ham")

# This single context will run all 3 expressions in parallel
df.with_columns(
    get_ham("col_a"),
    get_bar("col_b"),
    get_foo("col_c"),
)
```

If you need the schema in the functions that generate the expressions, you can utilize a single
`pipe`:

```python
from collections import OrderedDict

def get_foo(input_column: str, schema: OrderedDict) -> pl.Expr:
    if "some_col" in schema:
        # branch_a
        ...
    else:
        # branch b
        ...

def get_bar(input_column: str, schema: OrderedDict) -> pl.Expr:
    if "some_col" in schema:
        # branch_a
        ...
    else:
        # branch b
        ...

def get_ham(input_column: str) -> pl.Expr:
    return pl.col(input_column).some_computation().alias("ham")

# Use pipe (just once) to get hold of the schema of the LazyFrame.
lf.pipe(lambda lf: lf.with_columns(
    get_ham("col_a"),
    get_bar("col_b", lf.schema),
    get_foo("col_c", lf.schema),
))
```

Another benefit of writing functions that return expressions is that these functions are composable:
expressions can be chained and partially applied, leading to much more flexibility in the design.