# Polars GroupBy

## Multithreading

One of the most efficient ways to process tabular data is to parallelize the work with a "split-apply-combine" approach. This operation is at the core of the `Polars` grouping implementation and is a main reason it is so fast. In particular, both the split phase and the apply phase run multithreaded.

The diagram below shows the flow of a group-by operation:

![](/split-apply-combine.svg)

For the hashing done during the split phase, `Polars` uses a lock-free multithreaded approach, illustrated below:

![](/lock-free-hash.svg)

This parallelization is what makes group-by and join operations so fast.

> For a more detailed explanation, see [this blog post](https://www.ritchievink.com/blog/2021/02/28/i-wrote-one-of-the-fastest-dataframe-libraries/)

## Do not "kill" the parallelization

`Python` is well known to be slow and to scale poorly across cores. Besides being an interpreted language, Python is also constrained by the Global Interpreter Lock (GIL). This means that if you pass a `lambda` or a custom `Python` function, `Polars` is slowed down: it cannot spread that code over multiple cores.

This is unfortunate, especially because `lambda` functions are so often passed to `.groupby`. `Polars` does support this, but keep Python's limitations in mind, in particular the interpreter overhead and the GIL.

To address this, `Polars` implements a powerful expression syntax that is defined in both its lazy API and its eager API.

## Polars Expressions

As noted above, custom Python functions hurt parallelism. `Polars` expressions, used here through the lazy API, avoid that cost. Let's see what that means.

We can start with this dataset: the [US congress dataset](https://github.com/unitedstates/congress-legislators).
#### Basic aggregations

You can pass any number of aggregation expressions in a `list` and combine them however you like. In the snippet below we do the following aggregations.

For each `first_name` group, we:

- count the rows in the group:
    - short form: `pl.count("party")` (or `pl.count()` for a plain row count, as used below)
    - full form: `pl.col("party").count()`
- collect the `gender` values of the group into a list:
    - full form: `pl.col("gender").list()` (a bare `pl.col("gender")` inside `agg` gives the same list column)
- take the first `last_name` in the group:
    - short form: `pl.first("last_name")`
    - full form: `pl.col("last_name").first()`

Besides the aggregation, we immediately sort the result and keep only the first 5 rows, which gives a quick overview of the most common first names.

```python
import polars as pl

from .dataset import dataset

q = (
    dataset.lazy()
    .groupby("first_name")
    .agg(
        [
            pl.count(),              # number of rows per group
            pl.col("gender"),        # all gender values of the group, as a list
            pl.first("last_name"),   # first last_name in the group
        ]
    )
    .sort("count", descending=True)  # most common first names first
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 4)
┌────────────┬───────┬───────────────────┬───────────┐
│ first_name ┆ count ┆ gender            ┆ last_name │
│ ---        ┆ ---   ┆ ---               ┆ ---       │
│ cat        ┆ u32   ┆ list[cat]         ┆ str       │
╞════════════╪═══════╪═══════════════════╪═══════════╡
│ John       ┆ 1256  ┆ ["M", "M", … "M"] ┆ Walker    │
│ William    ┆ 1022  ┆ ["M", "M", … "M"] ┆ Few       │
│ James      ┆ 714   ┆ ["M", "M", … "M"] ┆ Armstrong │
│ Thomas     ┆ 454   ┆ ["M", "M", … "M"] ┆ Tucker    │
│ Charles    ┆ 439   ┆ ["M", "M", … "M"] ┆ Carroll   │
└────────────┴───────┴───────────────────┴───────────┘
```

#### Conditionals

Easy, right? Let's turn it up a notch. Say we want to know how many delegates of each `state` belong to the `Pro-Administration` party and how many to the `Anti-Administration` party. We can query that directly in the aggregation, without a `lambda`.

```python
import polars as pl

from .dataset import dataset

q = (
    dataset.lazy()
    .groupby("state")
    .agg(
        [
            # Summing a boolean column counts the rows where it is True.
            (pl.col("party") == "Anti-Administration").sum().alias("anti"),
            (pl.col("party") == "Pro-Administration").sum().alias("pro"),
        ]
    )
    .sort("pro", descending=True)
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 3)
┌───────┬──────┬─────┐
│ state ┆ anti ┆ pro │
│ ---   ┆ ---  ┆ --- │
│ cat   ┆ u32  ┆ u32 │
╞═══════╪══════╪═════╡
│ NJ    ┆ 0    ┆ 3   │
│ CT    ┆ 0    ┆ 3   │
│ NC    ┆ 1    ┆ 2   │
│ VA    ┆ 3    ┆ 1   │
│ MA    ┆ 0    ┆ 1   │
└───────┴──────┴─────┘
```
The same counts could also be obtained by grouping on both `state` and `party`, but that wouldn't let me show off these nice features 😉!

```python
import polars as pl

from .dataset import dataset

q = (
    dataset.lazy()
    .groupby(["state", "party"])
    .agg([pl.count("party").alias("count")])
    # Keep only the two parties of interest.
    .filter(
        (pl.col("party") == "Anti-Administration")
        | (pl.col("party") == "Pro-Administration")
    )
    .sort("count", descending=True)
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 3)
┌───────┬─────────────────────┬───────┐
│ state ┆ party               ┆ count │
│ ---   ┆ ---                 ┆ ---   │
│ cat   ┆ cat                 ┆ u32   │
╞═══════╪═════════════════════╪═══════╡
│ VA    ┆ Anti-Administration ┆ 3     │
│ CT    ┆ Pro-Administration  ┆ 3     │
│ NJ    ┆ Pro-Administration  ┆ 3     │
│ NC    ┆ Pro-Administration  ┆ 2     │
│ VA    ┆ Pro-Administration  ┆ 1     │
└───────┴─────────────────────┴───────┘
```

#### Filtering

We can also filter within groups. Say we want a mean per group, but not over all values in the group, and we do not want to filter the rows out of the `DataFrame` itself because we still need them for another aggregation.

The example below shows how. Note that we can define ordinary `Python` functions for readability. They add no runtime overhead, because they only return `Polars` expressions: no custom function is applied to a `Series` while the query runs. Despite their names, the `avg … birthday` columns hold the mean age in years as of 2021.

```python
from datetime import date

import polars as pl

from .dataset import dataset


def compute_age() -> pl.Expr:
    # Age in whole years, relative to 2021.
    return date(2021, 1, 1).year - pl.col("birthday").dt.year()


def avg_birthday(gender: str) -> pl.Expr:
    # Mean age of the rows in the group that match the given gender.
    return (
        compute_age()
        .filter(pl.col("gender") == gender)
        .mean()
        .alias(f"avg {gender} birthday")
    )


q = (
    dataset.lazy()
    .groupby(["state"])
    .agg(
        [
            avg_birthday("M"),
            avg_birthday("F"),
            (pl.col("gender") == "M").sum().alias("# male"),
            (pl.col("gender") == "F").sum().alias("# female"),
        ]
    )
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 5)
┌───────┬────────────────┬────────────────┬────────┬──────────┐
│ state ┆ avg M birthday ┆ avg F birthday ┆ # male ┆ # female │
│ ---   ┆ ---            ┆ ---            ┆ ---    ┆ ---      │
│ cat   ┆ f64            ┆ f64            ┆ u32    ┆ u32      │
╞═══════╪════════════════╪════════════════╪════════╪══════════╡
│ WI    ┆ 152.939698     ┆ null           ┆ 199    ┆ 0        │
│ LA    ┆ 157.195531     ┆ 97.8           ┆ 194    ┆ 5        │
│ OH    ┆ 171.836735     ┆ 79.444444      ┆ 672    ┆ 9        │
│ MO    ┆ 163.741433     ┆ 81.625         ┆ 329    ┆ 8        │
│ PA    ┆ 179.724846     ┆ 91.857143      ┆ 1050   ┆ 7        │
└───────┴────────────────┴────────────────┴────────┴──────────┘
```
#### Sorting

It is common to sort a `DataFrame` so that a group-by sees the rows in a particular order. Say we want the names of the politicians of each `state`, ordered by age. We can combine `sort` and `groupby`:

```python
import polars as pl

from .dataset import dataset


def get_person() -> pl.Expr:
    # Full name as "first_name last_name".
    return pl.col("first_name") + pl.lit(" ") + pl.col("last_name")


q = (
    dataset.lazy()
    .sort("birthday", descending=True)  # latest birthday (youngest) first
    .groupby(["state"])
    .agg(
        [
            get_person().first().alias("youngest"),
            get_person().last().alias("oldest"),
        ]
    )
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 3)
┌───────┬──────────────────┬─────────────────┐
│ state ┆ youngest         ┆ oldest          │
│ ---   ┆ ---              ┆ ---             │
│ cat   ┆ str              ┆ str             │
╞═══════╪══════════════════╪═════════════════╡
│ VT    ┆ Benjamin Deming  ┆ Moses Robinson  │
│ MT    ┆ Greg Gianforte   ┆ James Cavanaugh │
│ MN    ┆ Erik Paulsen     ┆ Cyrus Aldrich   │
│ AS    ┆ Eni Faleomavaega ┆ Fofó Sunia      │
│ NC    ┆ James McKay      ┆ Samuel Johnston │
└───────┴──────────────────┴─────────────────┘
```

However, **if** we also want the names sorted alphabetically, the code above no longer works. Luckily, we can sort within the `groupby` context, independently of the `DataFrame` order.

```python
import polars as pl

from .dataset import dataset


def get_person() -> pl.Expr:
    return pl.col("first_name") + pl.lit(" ") + pl.col("last_name")


q = (
    dataset.lazy()
    .sort("birthday", descending=True)
    .groupby(["state"])
    .agg(
        [
            get_person().first().alias("youngest"),
            get_person().last().alias("oldest"),
            # Sort the names inside each group, then take the first one.
            get_person().sort().first().alias("alphabetical_first"),
        ]
    )
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 4)
┌───────┬─────────────────────┬─────────────────┬────────────────────┐
│ state ┆ youngest            ┆ oldest          ┆ alphabetical_first │
│ ---   ┆ ---                 ┆ ---             ┆ ---                │
│ cat   ┆ str                 ┆ str             ┆ str                │
╞═══════╪═════════════════════╪═════════════════╪════════════════════╡
│ OH    ┆ Amos Townsend       ┆ Paul Fearing    ┆ Aaron Harlan       │
│ KY    ┆ Benjamin Grey       ┆ Matthew Lyon    ┆ Aaron Harding      │
│ HI    ┆ Tulsi Gabbard       ┆ Robert Wilcox   ┆ Cecil Heftel       │
│ LA    ┆ John Slidell        ┆ Thomas Posey    ┆ Adolph Meyer       │
│ PR    ┆ Aníbal Acevedo-Vilá ┆ Tulio Larrinaga ┆ Antonio Colorado   │
└───────┴─────────────────────┴─────────────────┴────────────────────┘
```

We can even sort by another column within the `groupby` context. To find out whether the alphabetically first name in each group belongs to a man or a woman, we can add: `pl.col("gender").sort_by("first_name").first().alias("gender")`

```python
import polars as pl

from .dataset import dataset


def get_person() -> pl.Expr:
    return pl.col("first_name") + pl.lit(" ") + pl.col("last_name")


q = (
    dataset.lazy()
    .sort("birthday", descending=True)
    .groupby(["state"])
    .agg(
        [
            get_person().first().alias("youngest"),
            get_person().last().alias("oldest"),
            get_person().sort().first().alias("alphabetical_first"),
            # Order gender by first_name inside each group, then take the first value.
            pl.col("gender").sort_by("first_name").first().alias("gender"),
        ]
    )
    .sort("state")
    .limit(5)
)

df = q.collect()
```

```text
shape: (5, 5)
┌───────┬────────────────┬─────────────────┬────────────────────┬────────┐
│ state ┆ youngest       ┆ oldest          ┆ alphabetical_first ┆ gender │
│ ---   ┆ ---            ┆ ---             ┆ ---                ┆ ---    │
│ cat   ┆ str            ┆ str             ┆ str                ┆ cat    │
╞═══════╪════════════════╪═════════════════╪════════════════════╪════════╡
│ CT    ┆ Samuel Simons  ┆ Roger Sherman   ┆ Abner Sibal        ┆ M      │
│ KY    ┆ Benjamin Grey  ┆ Matthew Lyon    ┆ Aaron Harding      ┆ M      │
│ FL    ┆ George Hawkins ┆ Joseph White    ┆ Abijah Gilbert     ┆ M      │
│ NY    ┆ Robert Baker   ┆ Philip Schuyler ┆ A. Foster          ┆ M      │
│ MI    ┆ Samuel Clark   ┆ Gabriel Richard ┆ Aaron Bliss        ┆ M      │
└───────┴────────────────┴─────────────────┴────────────────────┴────────┘
```

### Conclusion

The examples above show that combining expressions lets us build complex queries, while avoiding the performance cost of custom `Python` functions (the interpreter and the GIL).

If an expression type is missing here, please open an issue: [feature request](https://github.com/pola-rs/polars/issues/new/choose)!