# Installing Polars

Polars is a library, and installing it is as simple as invoking the package manager of the corresponding programming language.

``` bash
pip install polars

# Or, for legacy CPUs without AVX2 (Advanced Vector Extensions 2) support
pip install polars-lts-cpu
```

``` shell
cargo add polars -F lazy

# Or Cargo.toml
[dependencies]
polars = { version = "x", features = ["lazy", ...]}
```

## Big Index

By default, Polars dataframes are limited to 2^32 (about 4.3 billion) rows. Enabling the big index extension raises this limit to 2^64 (about 18 quintillion) rows:

``` bash
pip install polars-u64-idx
```

``` shell
cargo add polars -F bigidx

# Or Cargo.toml
[dependencies]
polars = { version = "x", features = ["bigidx", ...] }
```

## Legacy CPU

To install Polars for Python on an old CPU without [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) (Advanced Vector Extensions) support, run:

``` bash
pip install polars-lts-cpu
```

## Importing Polars

To use the library, import it into your project:

``` python
import polars as pl
```

``` rust
use polars::prelude::*;
```

## Feature flags

The commands above install the core of Polars on your system. Depending on your use case, however, you may also need some optional dependencies. They are optional in order to keep the installation footprint small. The flags differ from one programming language to another. Throughout the user guide, any feature that requires an additional dependency is explicitly noted.

### Python

```text
# For example
pip install 'polars[numpy,fsspec]'
```

#### All

| Flag | Description                        |
| ---- | ---------------------------------- |
| all  | Install all optional dependencies. |

#### GPU

| Flag | Description                 |
| ---- | --------------------------- |
| gpu  | Run queries on NVIDIA GPUs. |

**Note** See [GPU support](gpu-support.md) for more detailed instructions and prerequisites.

#### Interoperability

| Flag     | Description                                        |
| -------- | -------------------------------------------------- |
| pandas   | Convert data to and from pandas dataframes/series. |
| numpy    | Convert data to and from NumPy arrays.             |
| pyarrow  | Convert data to and from PyArrow tables/arrays.    |
| pydantic | Convert data from Pydantic models to Polars.       |

#### Excel

| Flag       | Description                                      |
| ---------- | ------------------------------------------------ |
| calamine   | Read from Excel files with the calamine engine.  |
| openpyxl   | Read from Excel files with the openpyxl engine.  |
| xlsx2csv   | Read from Excel files with the xlsx2csv engine.  |
| xlsxwriter | Write to Excel files with the XlsxWriter engine. |
| excel      | Install all supported Excel engines.             |

#### Database

| Flag       | Description                                                                          |
| ---------- | ------------------------------------------------------------------------------------ |
| adbc       | Read from and write to databases with the Arrow Database Connectivity (ADBC) engine. |
| connectorx | Read from databases with the ConnectorX engine.                                      |
| sqlalchemy | Write to databases with the SQLAlchemy engine.                                       |
| database   | Install all supported database engines.                                              |

#### Cloud

| Flag   | Description                                 |
| ------ | ------------------------------------------- |
| fsspec | Read from and write to remote file systems. |

#### Other I/O

| Flag      | Description                          |
| --------- | ------------------------------------ |
| deltalake | Read from and write to Delta tables. |
| iceberg   | Read from Apache Iceberg tables.     |

#### Other

| Flag        | Description                                     |
| ----------- | ----------------------------------------------- |
| async       | Collect LazyFrames asynchronously.              |
| cloudpickle | Serialize user-defined functions.               |
| graph       | Visualize LazyFrames as a graph.                |
| plot        | Plot dataframes through the `plot` namespace.   |
| style       | Style dataframes through the `style` namespace. |
| timezone    | Timezone support. Only needed on Windows.       |

### Rust

```toml
# Cargo.toml
[dependencies]
polars = { version = "0.26.1", features = ["lazy", "temporal", "describe", "json", "parquet", "dtype-datetime"] }
```

The opt-in features are:

- Additional data types:
    - `dtype-date`
    - `dtype-datetime`
    - `dtype-time`
    - `dtype-duration`
    - `dtype-i8`
    - `dtype-i16`
    - `dtype-u8`
    - `dtype-u16`
    - `dtype-categorical`
    - `dtype-struct`
- `lazy` - Lazy API:
    - `regex` - Use regular expressions in column selection.
    - `dot_diagram` - Create dot diagrams from lazy logical plans.
- `sql` - Pass SQL queries to Polars.
- `streaming` - Process datasets that are larger than memory.
- `random` - Generate arrays with randomly sampled values.
- `ndarray` - Convert from `DataFrame` to `ndarray`.
- `temporal` - Conversions between [Chrono](https://docs.rs/chrono/) and Polars for temporal data types.
- `timezones` - Activate timezone support.
- `strings` - Extra string utilities for `StringChunked`:
    - `string_pad` - for `pad_start`, `pad_end`, `zfill`.
    - `string_to_integer` - for `parse_int`.
- `object` - Support for generic ChunkedArrays called `ObjectChunked<T>` (generic over `T`). These are downcastable from Series through the [Any](https://doc.rust-lang.org/std/any/index.html) trait.
- Performance related:
    - `nightly` - Several nightly-only features such as SIMD and specialization.
    - `performant` - More fast paths, slower compile times.
    - `bigidx` - Activate this feature if you expect >> $2^{32}$ rows. This allows Polars to scale up way beyond that by using `u64` as an index. Polars will be a bit slower with this feature activated, as many data structures are less cache efficient.
    - `cse` - Activate the common subplan elimination optimization.
- IO related:
    - `serde` - Support for [serde](https://crates.io/crates/serde) serialization and deserialization. Can be used for JSON and other serde-supported serialization formats.
    - `serde-lazy` - Support for [serde](https://crates.io/crates/serde) serialization and deserialization. Can be used for JSON and other serde-supported serialization formats.
    - `parquet` - Read Apache Parquet format.
    - `json` - JSON serialization.
    - `ipc` - Arrow's IPC format serialization.
    - `decompress` - Automatically infer the compression of CSV files and decompress them. Supported compressions:
        - gzip
        - zlib
        - zstd
- Dataframe operations:
    - `dynamic_group_by` - Group by based on a time window instead of predefined keys. Also activates rolling window group by operations.
    - `sort_multiple` - Allow sorting a dataframe on multiple columns.
    - `rows` - Create dataframe from rows and extract rows from `dataframes`. Also activates `pivot` and `transpose` operations.
    - `join_asof` - Join ASOF, to join on nearest keys instead of exact equality match.
    - `cross_join` - Create the Cartesian product of two dataframes.
    - `semi_anti_join` - SEMI and ANTI joins.
    - `row_hash` - Utility to hash dataframe rows to `UInt64Chunked`.
    - `diagonal_concat` - Diagonal concatenation, thereby combining different schemas.
    - `dataframe_arithmetic` - Arithmetic between dataframes and other dataframes or series.
    - `partition_by` - Split into multiple dataframes partitioned by groups.
- Series/expression operations:
    - `is_in` - Check for membership in Series.
    - `zip_with` - Zip two `Series` / `ChunkedArray`s.
    - `round_series` - Round underlying float types of series.
    - `repeat_by` - Repeat element in an array a number of times specified by another array.
    - `is_first_distinct` - Check if element is first unique value.
    - `is_last_distinct` - Check if element is last unique value.
    - `checked_arithmetic` - Checked arithmetic returning `None` on invalid operations.
    - `dot_product` - Dot/inner product on series and expressions.
    - `concat_str` - Concatenate string data in linear time.
    - `reinterpret` - Utility to reinterpret bits to signed/unsigned.
    - `take_opt_iter` - Take from a series with `Iterator<Item=Option<usize>>`.
    - `mode` - Return the most frequently occurring value(s).
    - `cum_agg` - `cum_sum`, `cum_min`, and `cum_max` aggregations.
    - `rolling_window` - Rolling window functions, like `rolling_mean`.
    - `interpolate` - Interpolate `None` values.
    - `extract_jsonpath` - [Run `jsonpath` queries on `StringChunked`](https://goessner.net/articles/JsonPath/).
    - `list` - List utils:
        - `list_gather` - Take sublist by multiple indices.
    - `rank` - Ranking algorithms.
    - `moment` - Kurtosis and skew statistics.
    - `ewma` - Exponential moving average windows.
    - `abs` - Get absolute values of series.
    - `arange` - Range operation on series.
    - `product` - Compute the product of a series.
    - `diff` - `diff` operation.
    - `pct_change` - Compute change percentages.
    - `unique_counts` - Count unique values in expressions.
    - `log` - Logarithms for series.
    - `list_to_struct` - Convert `List` to `Struct` data types.
    - `list_count` - Count elements in lists.
    - `list_eval` - Apply expressions over list elements.
    - `cumulative_eval` - Apply expressions over cumulatively increasing windows.
    - `arg_where` - Get indices where condition holds.
    - `search_sorted` - Find indices where elements should be inserted to maintain order.
    - `offset_by` - Add an offset to dates that takes months and leap years into account.
    - `trigonometry` - Trigonometric functions.
    - `sign` - Compute the sign (positive, negative, or zero) of each element of a series.
    - `propagate_nans` - `NaN`-propagating min/max aggregations.
- Dataframe pretty printing:
    - `fmt` - Activate dataframe formatting.
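
On the Python side, it can be handy to confirm which of the Polars builds named in this guide (`polars`, `polars-lts-cpu`, `polars-u64-idx`) and which optional dependencies are actually present in an environment. The following is a minimal sketch using only the standard library. The helper names `installed_polars_distributions` and `available_optional_modules` are hypothetical, and the mapping from a feature flag to an importable module of the same name is an assumption that holds for the four flags listed but not necessarily for every flag.

```python
from importlib import metadata
from importlib.util import find_spec

# Distribution names taken from the pip commands in this guide.
POLARS_DISTRIBUTIONS = ("polars", "polars-lts-cpu", "polars-u64-idx")

# A few feature flags from the Python tables, mapped to the module they are
# assumed to provide (assumption: flag name == top-level module name).
OPTIONAL_MODULES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "pyarrow": "pyarrow",
    "fsspec": "fsspec",
}


def installed_polars_distributions() -> dict:
    """Return {distribution name: version} for each Polars build found."""
    found = {}
    for name in POLARS_DISTRIBUTIONS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This build is not installed in the current environment.
            continue
    return found


def available_optional_modules() -> dict:
    """Return {flag: True/False} depending on whether its module is importable."""
    return {
        flag: find_spec(module) is not None
        for flag, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    print("Polars builds:", installed_polars_distributions() or "none found")
    for flag, present in available_optional_modules().items():
        print(f"{flag}: {'installed' if present else 'missing'}")
```

The sketch deliberately avoids importing Polars itself, so it also runs in an environment where no Polars build is installed yet.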