# Polars basic operations

This section shows how to perform basic operations on dataframe columns, such as basic arithmetic, comparisons, and other general-purpose operations. The examples that follow use this dataframe:

```python
import polars as pl
import numpy as np

np.random.seed(42)  # For reproducibility.
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", "spam"],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "A", "B"],
    }
)
print(df)
```

```
shape: (5, 4)
┌──────┬───────┬──────────┬────────┐
│ nrs  ┆ names ┆ random   ┆ groups │
│ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i64  ┆ str   ┆ f64      ┆ str    │
╞══════╪═══════╪══════════╪════════╡
│ 1    ┆ foo   ┆ 0.37454  ┆ A      │
│ 2    ┆ ham   ┆ 0.950714 ┆ A      │
│ 3    ┆ spam  ┆ 0.731994 ┆ B      │
│ null ┆ egg   ┆ 0.598658 ┆ A      │
│ 5    ┆ spam  ┆ 0.156019 ┆ B      │
└──────┴───────┴──────────┴────────┘
```

## Basic arithmetic

Polars supports basic arithmetic between series of the same length, and between series and literals. When literals are mixed with series, the literals are broadcast to match the length of the series they are used with.

```python
result = df.select(
    (pl.col("nrs") + 5).alias("nrs + 5"),
    (pl.col("nrs") - 5).alias("nrs - 5"),
    (pl.col("nrs") * pl.col("random")).alias("nrs * random"),
    (pl.col("nrs") / pl.col("random")).alias("nrs / random"),
    (pl.col("nrs") ** 2).alias("nrs ** 2"),
    (pl.col("nrs") % 3).alias("nrs % 3"),
)
print(result)
```

```
shape: (5, 6)
┌─────────┬─────────┬──────────────┬──────────────┬──────────┬─────────┐
│ nrs + 5 ┆ nrs - 5 ┆ nrs * random ┆ nrs / random ┆ nrs ** 2 ┆ nrs % 3 │
│ ---     ┆ ---     ┆ ---          ┆ ---          ┆ ---      ┆ ---     │
│ i64     ┆ i64     ┆ f64          ┆ f64          ┆ i64      ┆ i64     │
╞═════════╪═════════╪══════════════╪══════════════╪══════════╪═════════╡
│ 6       ┆ -4      ┆ 0.37454      ┆ 2.669941     ┆ 1        ┆ 1       │
│ 7       ┆ -3      ┆ 1.901429     ┆ 2.103681     ┆ 4        ┆ 2       │
│ 8       ┆ -2      ┆ 2.195982     ┆ 4.098395     ┆ 9        ┆ 0       │
│ null    ┆ null    ┆ null         ┆ null         ┆ null     ┆ null    │
│ 10      ┆ 0       ┆ 0.780093     ┆ 32.047453    ┆ 25       ┆ 2       │
└─────────┴─────────┴──────────────┴──────────────┴──────────┴─────────┘
```

The example above shows that when one of the operands of an arithmetic operation is `null`, the result is also `null`.

Polars uses operator overloading so that you can use your language's native arithmetic operators inside expressions. If you prefer, in Python you can use the corresponding named functions, as the snippet below shows:

```python
# Python only:
result_named_operators = df.select(
    (pl.col("nrs").add(5)).alias("nrs + 5"),
    (pl.col("nrs").sub(5)).alias("nrs - 5"),
    (pl.col("nrs").mul(pl.col("random"))).alias("nrs * random"),
    (pl.col("nrs").truediv(pl.col("random"))).alias("nrs / random"),
    (pl.col("nrs").pow(2)).alias("nrs ** 2"),
    (pl.col("nrs").mod(3)).alias("nrs % 3"),
)
print(result.equals(result_named_operators))
```

```
True
```

## Comparisons

As with arithmetic, Polars supports comparisons through overloaded operators or named functions:

```python
result = df.select(
    (pl.col("nrs") > 1).alias("nrs > 1"),  # .gt
    (pl.col("nrs") >= 3).alias("nrs >= 3"),  # .ge
    (pl.col("random") < 0.2).alias("random < .2"),  # .lt
    (pl.col("random") <= 0.5).alias("random <= .5"),  # .le
    (pl.col("nrs") != 1).alias("nrs != 1"),  # .ne
    (pl.col("nrs") == 1).alias("nrs == 1"),  # .eq
)
print(result)
```

```
shape: (5, 6)
┌─────────┬──────────┬─────────────┬──────────────┬──────────┬──────────┐
│ nrs > 1 ┆ nrs >= 3 ┆ random < .2 ┆ random <= .5 ┆ nrs != 1 ┆ nrs == 1 │
│ ---     ┆ ---      ┆ ---         ┆ ---          ┆ ---      ┆ ---      │
│ bool    ┆ bool     ┆ bool        ┆ bool         ┆ bool     ┆ bool     │
╞═════════╪══════════╪═════════════╪══════════════╪══════════╪══════════╡
│ false   ┆ false    ┆ false       ┆ true         ┆ false    ┆ true     │
│ true    ┆ false    ┆ false       ┆ false        ┆ true     ┆ false    │
│ true    ┆ true     ┆ false       ┆ false        ┆ true     ┆ false    │
│ null    ┆ null     ┆ false       ┆ false        ┆ null     ┆ null     │
│ true    ┆ true     ┆ true        ┆ true         ┆ true     ┆ false    │
└─────────┴──────────┴─────────────┴──────────────┴──────────┴──────────┘
```

## Boolean and bitwise operations

Depending on the language, you can use the operators `&`, `|`, and `~` for the Boolean operations “and”, “or”, and “not”, respectively, or the corresponding named functions:

```python
# Boolean operators & | ~
result = df.select(
    ((~pl.col("nrs").is_null()) & (pl.col("groups") == "A")).alias(
        "number not null and group A"
    ),
    ((pl.col("random") < 0.5) | (pl.col("groups") == "B")).alias(
        "random < 0.5 or group B"
    ),
)
print(result)

# Corresponding named functions `and_`, `or_`, and `not_`.
result2 = df.select(
    (pl.col("nrs").is_null().not_().and_(pl.col("groups") == "A")).alias(
        "number not null and group A"
    ),
    ((pl.col("random") < 0.5).or_(pl.col("groups") == "B")).alias(
        "random < 0.5 or group B"
    ),
)
print(result.equals(result2))
```

```
shape: (5, 2)
┌─────────────────────────────┬─────────────────────────┐
│ number not null and group A ┆ random < 0.5 or group B │
│ ---                         ┆ ---                     │
│ bool                        ┆ bool                    │
╞═════════════════════════════╪═════════════════════════╡
│ true                        ┆ true                    │
│ true                        ┆ false                   │
│ false                       ┆ true                    │
│ false                       ┆ false                   │
│ false                       ┆ true                    │
└─────────────────────────────┴─────────────────────────┘
True
```

### Python trivia

In Python the functions are named `and_`, `or_`, and `not_` because the words `and`, `or`, and `not` are reserved keywords. For the same reason, the keywords `and`, `or`, and `not` cannot serve as the Boolean operators: Python evaluates their operands for truthiness through the dunder method `__bool__`. Polars therefore overloads the bitwise operators `&`, `|`, and `~` as the Boolean operators, because they are the second-best choice.

These operators and functions can also be used for the corresponding bitwise operations, together with the bitwise operator `^` and its function `xor`:

```python
result = df.select(
    pl.col("nrs"),
    (pl.col("nrs") & 6).alias("nrs & 6"),
    (pl.col("nrs") | 6).alias("nrs | 6"),
    (~pl.col("nrs")).alias("not nrs"),
    (pl.col("nrs") ^ 6).alias("nrs ^ 6"),
)
print(result)
```

```
shape: (5, 5)
┌──────┬─────────┬─────────┬─────────┬─────────┐
│ nrs  ┆ nrs & 6 ┆ nrs | 6 ┆ not nrs ┆ nrs ^ 6 │
│ ---  ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ i64  ┆ i64     ┆ i64     ┆ i64     ┆ i64     │
╞══════╪═════════╪═════════╪═════════╪═════════╡
│ 1    ┆ 0       ┆ 7       ┆ -2      ┆ 7       │
│ 2    ┆ 2       ┆ 6       ┆ -3      ┆ 4       │
│ 3    ┆ 2       ┆ 7       ┆ -4      ┆ 5       │
│ null ┆ null    ┆ null    ┆ null    ┆ null    │
│ 5    ┆ 4       ┆ 7       ┆ -6      ┆ 3       │
└──────┴─────────┴─────────┴─────────┴─────────┘
```

## Counting (unique) values

Polars has two functions to count the number of unique values in a series. The function `n_unique` counts the exact number of unique values in a series. However, for very large data sets, this operation can be quite slow.
In those cases, if an approximation is good enough, you can use the function `approx_n_unique`, which uses the [HyperLogLog++](https://en.wikipedia.org/wiki/HyperLogLog) algorithm to estimate the result.

The example below shows a series for which the `approx_n_unique` estimate is off by 0.9%:

```python
long_df = pl.DataFrame({"numbers": np.random.randint(0, 100_000, 100_000)})
result = long_df.select(
    pl.col("numbers").n_unique().alias("n_unique"),
    pl.col("numbers").approx_n_unique().alias("approx_n_unique"),
)
print(result)
```

```
shape: (1, 2)
┌──────────┬─────────────────┐
│ n_unique ┆ approx_n_unique │
│ ---      ┆ ---             │
│ u32      ┆ u32             │
╞══════════╪═════════════════╡
│ 63218    ┆ 63784           │
└──────────┴─────────────────┘
```

The function `value_counts`, which Polars also provides, gives more information about the unique values and their counts:

```python
result = df.select(
    pl.col("names").value_counts().alias("value_counts"),
)
print(result)
```

```
shape: (4, 1)
┌──────────────┐
│ value_counts │
│ ---          │
│ struct[2]    │
╞══════════════╡
│ {"egg",1}    │
│ {"spam",2}   │
│ {"ham",1}    │
│ {"foo",1}    │
└──────────────┘
```

The function `value_counts` returns its results as structs, a data type that we will explore in [a later section](structs.md).

Alternatively, if you only need a series with the unique values, or a series with the counts of the unique values, each is one function call away:

```python
result = df.select(
    pl.col("names").unique(maintain_order=True).alias("unique"),
    pl.col("names").unique_counts().alias("unique_counts"),
)
print(result)
```

```
shape: (4, 2)
┌────────┬───────────────┐
│ unique ┆ unique_counts │
│ ---    ┆ ---           │
│ str    ┆ u32           │
╞════════╪═══════════════╡
│ foo    ┆ 1             │
│ ham    ┆ 1             │
│ spam   ┆ 2             │
│ egg    ┆ 1             │
└────────┴───────────────┘
```

Note that we need to specify `maintain_order=True` in the function `unique` so that the order of its results matches the order of the results of `unique_counts`. See the API reference for more information.

## Conditionals

Polars supports something akin to a ternary operator through the function `when`, which is followed by a function `then` and an optional function `otherwise`.

The function `when` accepts a predicate expression. Values for which the predicate evaluates to `True` are replaced by the corresponding values of the expression inside the function `then`. Values for which it evaluates to `False` are replaced by the corresponding values of the expression inside the function `otherwise`, or by `null` if `otherwise` is not provided.

The example below applies one step of the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture) to the numbers in the column “nrs”:

```python
result = df.select(
    pl.col("nrs"),
    pl.when(pl.col("nrs") % 2 == 1)  # Is the number odd?
    .then(3 * pl.col("nrs") + 1)  # If so, multiply by 3 and add 1.
    .otherwise(pl.col("nrs") // 2)  # If not, divide by 2.
    .alias("Collatz"),
)
print(result)
```

```
shape: (5, 2)
┌──────┬─────────┐
│ nrs  ┆ Collatz │
│ ---  ┆ ---     │
│ i64  ┆ i64     │
╞══════╪═════════╡
│ 1    ┆ 4       │
│ 2    ┆ 1       │
│ 3    ┆ 10      │
│ null ┆ null    │
│ 5    ┆ 16      │
└──────┴─────────┘
```

You can also emulate a chain of any number of conditionals, akin to Python's `elif` statement, by chaining any number of consecutive `.when(...).then(...)` blocks. In that case, for each given value, Polars only considers a replacement expression deeper in the chain if all the previous predicates failed for that value.