# Polars data types and structures

## Polars data types

Polars supports a variety of data types that fall broadly into the following categories:

- Numeric data types: signed integers, unsigned integers, floating point numbers, and decimals.
- Nested data types: lists, structs, and arrays.
- Temporal: dates, datetimes, times, and time deltas.
- Miscellaneous: strings, binary data, Booleans, categoricals, enums, and objects.

All types support missing values, represented by the special value `null`. This is not to be confused with the special value `NaN` in floating point data types; see the section on [floating point numbers](#floating-point-numbers) for more information.

You can also find a [full table of all supported data types](/docs/Polars_user_guide/concepts/data-types-and-structures/#%E9%99%84%E5%BD%95%E5%AE%8C%E6%95%B4%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B%E8%A1%A8) in the appendix, with notes on when to use each data type and links to the relevant parts of the documentation.

## Series

The core data structures provided by Polars are series and dataframes. A series is a one-dimensional, homogeneous data structure. "Homogeneous" means that all elements in a series have the same data type. The snippet below shows how to create a named series:

```python
import polars as pl

s = pl.Series("ints", [1, 2, 3, 4, 5])
print(s)
```

```
shape: (5,)
Series: 'ints' [i64]
[
	1
	2
	3
	4
	5
]
```

When you create a series, Polars infers the data type from the values you provide. You can specify a concrete data type to override this inference:

```python
s1 = pl.Series("ints", [1, 2, 3, 4, 5])
s2 = pl.Series("uints", [1, 2, 3, 4, 5], dtype=pl.UInt64)
print(s1.dtype, s2.dtype)
```

```
Int64 UInt64
```

## Dataframe

A dataframe is a two-dimensional, heterogeneous data structure that contains uniquely named series. Holding your data in a dataframe lets you use the Polars API to write queries that manipulate it. You do this with the [contexts and expressions](/docs/Polars_user_guide/concepts/expressions-and-contexts/) provided by Polars, which are covered next.

The snippet below shows how to create a dataframe from a dictionary of lists:

```python
from datetime import date

df = pl.DataFrame(
    {
        "name": ["Alice Archer", "Ben Brown", "Chloe Cooper", "Daniel Donovan"],
        "birthdate": [
            date(1997, 1, 10),
            date(1985, 2, 15),
            date(1983, 3, 22),
            date(1981, 4, 30),
        ],
        "weight": [57.9, 72.5, 53.6, 83.1],  # (kg)
        "height": [1.56, 1.77, 1.65, 1.75],  # (m)
    }
)

print(df)
```

```
shape: (4, 4)
┌────────────────┬────────────┬────────┬────────┐
│ name           ┆ birthdate  ┆ weight ┆ height │
│ ---            ┆ ---        ┆ ---    ┆ ---    │
│ str            ┆ date       ┆ f64    ┆ f64    │
╞════════════════╪════════════╪════════╪════════╡
│ Alice Archer   ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   │
│ Ben Brown      ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
│ Chloe Cooper   ┆ 1983-03-22 ┆ 53.6   ┆ 1.65   │
│ Daniel Donovan ┆ 1981-04-30 ┆ 83.1   ┆ 1.75   │
└────────────────┴────────────┴────────┴────────┘
```

### Inspecting a dataframe

This subsection shows some useful methods for quickly inspecting a dataframe, using the dataframe created above as a reference.

#### The `head` function

The `head` function shows the first rows of a dataframe. By default you get the first 5 rows, but you can also specify the number of rows you want:

```python
print(df.head(3))
```

```
shape: (3, 4)
┌──────────────┬────────────┬────────┬────────┐
│ name         ┆ birthdate  ┆ weight ┆ height │
│ ---          ┆ ---        ┆ ---    ┆ ---    │
│ str          ┆ date       ┆ f64    ┆ f64    │
╞══════════════╪════════════╪════════╪════════╡
│ Alice Archer ┆ 1997-01-10 ┆ 57.9   ┆ 1.56   │
│ Ben Brown    ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
│ Chloe Cooper ┆ 1983-03-22 ┆ 53.6   ┆ 1.65   │
└──────────────┴────────────┴────────┴────────┘
```

#### The `glimpse` function

The `glimpse` function also shows the values of the first few rows of a dataframe, but formats the output differently from `head`. Here, each line of the output corresponds to a single column, which makes it easier to inspect wider dataframes (those with many columns):

```python
print(df.glimpse(return_as_string=True))
```

```
Rows: 4
Columns: 4
$ name       <str> 'Alice Archer', 'Ben Brown', 'Chloe Cooper', 'Daniel Donovan'
$ birthdate <date> 1997-01-10, 1985-02-15, 1983-03-22, 1981-04-30
$ weight     <f64> 57.9, 72.5, 53.6, 83.1
$ height     <f64> 1.56, 1.77, 1.65, 1.75
```

**Note**: `glimpse` is only available for Python.

#### The `tail` function

The `tail` function shows the last rows of a dataframe. By default you get the last 5 rows, but as with `head` you can specify the number of rows you want:

```python
print(df.tail(3))
```

```
shape: (3, 4)
┌────────────────┬────────────┬────────┬────────┐
│ name           ┆ birthdate  ┆ weight ┆ height │
│ ---            ┆ ---        ┆ ---    ┆ ---    │
│ str            ┆ date       ┆ f64    ┆ f64    │
╞════════════════╪════════════╪════════╪════════╡
│ Ben Brown      ┆ 1985-02-15 ┆ 72.5   ┆ 1.77   │
│ Chloe Cooper   ┆ 1983-03-22 ┆ 53.6   ┆ 1.65   │
│ Daniel Donovan ┆ 1981-04-30 ┆ 83.1   ┆ 1.75   │
└────────────────┴────────────┴────────┴────────┘
```

#### The `sample` function

If you think the first or last rows of your dataframe are not representative of your data, you can use `sample` to get any number of randomly selected rows. Note that the rows are not necessarily returned in the order in which they appear in the dataframe:

```python
import random

random.seed(42)  # For reproducibility.

print(df.sample(2))
```

```
shape: (2, 4)
┌────────────────┬────────────┬────────┬────────┐
│ name           ┆ birthdate  ┆ weight ┆ height │
│ ---            ┆ ---        ┆ ---    ┆ ---    │
│ str            ┆ date       ┆ f64    ┆ f64    │
╞════════════════╪════════════╪════════╪════════╡
│ Daniel Donovan ┆ 1981-04-30 ┆ 83.1   ┆ 1.75   │
│ Chloe Cooper   ┆ 1983-03-22 ┆ 53.6   ┆ 1.65   │
└────────────────┴────────────┴────────┴────────┘
```

#### The `describe` function

You can also use `describe` to compute summary statistics for all columns of your dataframe:

```python
print(df.describe())
```

```
shape: (9, 5)
┌────────────┬────────────────┬─────────────────────┬───────────┬──────────┐
│ statistic  ┆ name           ┆ birthdate           ┆ weight    ┆ height   │
│ ---        ┆ ---            ┆ ---                 ┆ ---       ┆ ---      │
│ str        ┆ str            ┆ str                 ┆ f64       ┆ f64      │
╞════════════╪════════════════╪═════════════════════╪═══════════╪══════════╡
│ count      ┆ 4              ┆ 4                   ┆ 4.0       ┆ 4.0      │
│ null_count ┆ 0              ┆ 0                   ┆ 0.0       ┆ 0.0      │
│ mean       ┆ null           ┆ 1986-09-04 00:00:00 ┆ 66.775    ┆ 1.6825   │
│ std        ┆ null           ┆ null                ┆ 13.560082 ┆ 0.097082 │
│ min        ┆ Alice Archer   ┆ 1981-04-30          ┆ 53.6      ┆ 1.56     │
│ 25%        ┆ null           ┆ 1983-03-22          ┆ 57.9      ┆ 1.65     │
│ 50%        ┆ null           ┆ 1985-02-15          ┆ 72.5      ┆ 1.75     │
│ 75%        ┆ null           ┆ 1985-02-15          ┆ 72.5      ┆ 1.75     │
│ max        ┆ Daniel Donovan ┆ 1997-01-10          ┆ 83.1      ┆ 1.77     │
└────────────┴────────────────┴─────────────────────┴───────────┴──────────┘
```

## Schema

When talking about data (in a dataframe or otherwise), we can refer to its schema. A schema is a mapping from column or series names to the data types of those same columns or series.

You can check the schema of a dataframe with `schema`:

```python
print(df.schema)
```

```
Schema({'name': String, 'birthdate': Date, 'weight': Float64, 'height': Float64})
```

Much like with series, Polars infers the schema of a dataframe when you create it, but you can override the inference if needed.

In Python, you can specify an explicit schema by using a dictionary that maps column names to data types. Use the value `None` for any column whose inferred type you do not want to override:

```python
df = pl.DataFrame(
    {
        "name": ["Alice", "Ben", "Chloe", "Daniel"],
        "age": [27, 39, 41, 43],
    },
    schema={"name": None, "age": pl.UInt8},
)

print(df)
```

```
shape: (4, 2)
┌────────┬─────┐
│ name   ┆ age │
│ ---    ┆ --- │
│ str    ┆ u8  │
╞════════╪═════╡
│ Alice  ┆ 27  │
│ Ben    ┆ 39  │
│ Chloe  ┆ 41  │
│ Daniel ┆ 43  │
└────────┴─────┘
```

If you only need to override the inference for some columns, the `schema_overrides` parameter tends to be more convenient, because it lets you omit the columns whose inferred types you want to keep:

```python
df = pl.DataFrame(
    {
        "name": ["Alice", "Ben", "Chloe", "Daniel"],
        "age": [27, 39, 41, 43],
    },
    schema_overrides={"age": pl.UInt8},
)

print(df)
```

```
shape: (4, 2)
┌────────┬─────┐
│ name   ┆ age │
│ ---    ┆ --- │
│ str    ┆ u8  │
╞════════╪═════╡
│ Alice  ┆ 27  │
│ Ben    ┆ 39  │
│ Chloe  ┆ 41  │
│ Daniel ┆ 43  │
└────────┴─────┘
```

## Data types internals

Polars uses the [Arrow columnar format](https://arrow.apache.org/docs/format/Columnar.html) for its data layout. Following this specification lets Polars transfer data to and from other tools that also use the Arrow specification with little to no overhead.

Polars gets most of its performance from its query engine, the optimizations it performs on your query plans, and the parallelization it employs when running your [expressions](expressions-and-contexts.md#expressions).

## Floating point numbers

Polars generally follows the IEEE 754 floating point standard for `Float32` and `Float64`, with some exceptions:

- Any `NaN` compares equal to any other `NaN`, and greater than any non-`NaN` value.
- Operations do not guarantee any particular behavior on the sign of zero or `NaN`, nor on the payload of `NaN` values. This is not limited to arithmetic operations. For example, a sort or group by operation may canonicalize all zeroes to +0 and all `NaN`s to a positive `NaN` without payload for efficient equality checks.

Polars always attempts to provide reasonably accurate results for floating point computations but does not provide guarantees on the error unless mentioned otherwise. Generally speaking, 100% accurate results are infeasibly expensive to achieve (they would require much larger internal representations than 64-bit floats), so some error is always to be expected.

## Appendix: full data types table

| Type | Description |
| --- | --- |
| `Boolean` | Boolean type. |
| `Int8`, `Int16`, `Int32`, `Int64` | Variable-precision signed integer types. |
| `UInt8`, `UInt16`, `UInt32`, `UInt64` | Variable-precision unsigned integer types. |
| `Float32`, `Float64` | Variable-precision signed floating point numbers. |
| `Decimal` | Decimal 128-bit type with optional precision and non-negative scale. Use this if you need fine-grained control over the precision of your floats and the operations you make on them. See [Python's `decimal.Decimal`](https://docs.python.org/3/library/decimal.html) for documentation on what a decimal data type is. |
| `String` | Variable-length UTF-8 encoded string data. |
| `Binary` | Stores arbitrary, variable-length raw binary data. |
| `Date` | Represents a calendar date. |
| `Time` | Represents a time of day. |
| `Datetime` | Represents a calendar date and a time of day. |
| `Duration` | Represents a time duration. |
| `Array` | Arrays with a known, fixed shape per series; akin to numpy arrays. [Learn more about how arrays and lists differ and how to work with both](../expressions/lists-and-arrays.md). |
| `List` | Homogeneous 1D container with variable length. [Learn more about how arrays and lists differ and how to work with both](../expressions/lists-and-arrays.md). |
| `Object` | Wraps arbitrary Python objects. |
| `Categorical` | Efficient encoding of string data where the categories are inferred at runtime. [Learn more about how categoricals and enums differ and how to work with both](../expressions/categorical-data-and-enums.md). |
| `Enum` | Efficient ordered encoding of a set of predetermined string categories. [Learn more about how categoricals and enums differ and how to work with both](../expressions/categorical-data-and-enums.md). |
| `Struct` | Composite product type that can store multiple fields. [Learn more about the data type `Struct` in its dedicated documentation section](../expressions/structs.md). |
| `Null` | Represents null values. |