Python数据分析
- PolarsBook中文版: https://www.pythondataanalysis.com/docs/polars_book_cn/
- Polars快速入门: https://www.pythondataanalysis.com/docs/polars_book_cn/quickstart/
- Polars表达式: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/
- Polars表达式: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/expressions/
- Polars上下文: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/contexts/
- Polars分组: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/groupby/
- Polars折叠: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/folds/
- Polars自定义函数: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/custom_functions/
- Polars实例: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/introduction_polars/
- Polars表达式方法: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/api/
- Polars视频介绍: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/video_intro/
- Polars与Numpy交互: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/numpy/
- Polars窗口函数: https://www.pythondataanalysis.com/docs/polars_book_cn/dsl/window_functions/
- Polars索引: https://www.pythondataanalysis.com/docs/polars_book_cn/indexing/
- Polars数据类型: https://www.pythondataanalysis.com/docs/polars_book_cn/datatypes/
- 来自Pandas: https://www.pythondataanalysis.com/docs/polars_book_cn/coming_from_pandas/
- 来自ApacheSpark: https://www.pythondataanalysis.com/docs/polars_book_cn/coming_from_spark/
- Polars性能: https://www.pythondataanalysis.com/docs/polars_book_cn/performance/
- 字符串: https://www.pythondataanalysis.com/docs/polars_book_cn/performance/strings/
- Polars优化: https://www.pythondataanalysis.com/docs/polars_book_cn/optimizations/
- Polars惰性方法: https://www.pythondataanalysis.com/docs/polars_book_cn/optimizations/lazy/
- 谓词下推: https://www.pythondataanalysis.com/docs/polars_book_cn/optimizations/lazy/predicate-pushdown/
- 投影下推: https://www.pythondataanalysis.com/docs/polars_book_cn/optimizations/lazy/projection-pushdown/
- 其它优化: https://www.pythondataanalysis.com/docs/polars_book_cn/optimizations/lazy/other-optimizations/
- Polars参考指南: https://www.pythondataanalysis.com/docs/polars_book_cn/references/
- Polars时间序列: https://www.pythondataanalysis.com/docs/polars_book_cn/timeseries/
- Polars时间序列实例: https://www.pythondataanalysis.com/docs/polars_book_cn/timeseries/time-series/
- Polars使用范围: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/
- IO: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/
- Polars操作CSV文件: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/csv/
- Polars操作Parquet文件: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/parquet/
- Polars处理多个文件: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/multiple_files/
- Polars读取数据库: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/read_db/
- Polars与AWS交互: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/aws/
- Polars与Google BigQuery交互: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/google-big-query/
- Polars与Postgres交互: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/io/postgres/
- 互通性: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/interop/
- Arrow: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/interop/arrow/
- Numpy: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/interop/numpy/
- 数据: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/data/
- 字符串: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/data/strings/
- 时间戳: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/data/timestamps/
- 数据帧: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/
- 选中行或列: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/row_col_selection/
- 常用操作: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/common-manipulations/
- 聚合: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/aggregate/
- 分组: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/groupby/
- 过滤: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/filter/
- 连接: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/join/
- 重塑: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/melt/
- 条件应用: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/conditionally-apply/
- 排序: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/sorting/
- 透视: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/df/pivot/
- 应用: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/apply/
- Polars自定义函数: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/apply/udfs/
- Polars窗口函数: https://www.pythondataanalysis.com/docs/polars_book_cn/howcani/apply/window-functions/
- Python数据分析 第二版: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/
- 第 1 章 准备工作: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-01/
- 第 2 章 Python 语法基础: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-02/
- 第 3 章 Python 的数据结构、函数和文件: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-03/
- 第 4 章 NumPy 基础:数组和向量计算: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-04/
- 第 5 章 Pandas 入门: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-05/
- 第 6 章 数据加载、存储与文件格式: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-06/
- 第 7 章 数据清洗和准备: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-07/
- 第 10 章 数据聚合与分组运算: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-10/
- 第 11 章 时间序列: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-11/
- 第 12 章 pandas 高级应用: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-12/
- 第 13 章 Python 建模库介绍: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-13/
- 第 14 章 数据分析案例: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-14/
- 附录 A NumPy 高级应用: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Appendix-A/
- 附录 B 更多关于 IPython 的内容: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Appendix-B/
- 第 8 章 数据规整:聚合、合并和重塑: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-08/
- 第 9 章 绘图和可视化: https://www.pythondataanalysis.com/docs/Python_Data_Analysis_2nd_Editon/Chapter-09/
- Polars用户指南: https://www.pythondataanalysis.com/docs/Polars_user_guide/
- Polars入门: https://www.pythondataanalysis.com/docs/Polars_user_guide/polars_getting_started/
- 安装Polars: https://www.pythondataanalysis.com/docs/Polars_user_guide/polars_installation/
- Polars核心概念: https://www.pythondataanalysis.com/docs/Polars_user_guide/concepts/
- Polars数据类型和结构: https://www.pythondataanalysis.com/docs/Polars_user_guide/concepts/data-types-and-structures/
- Polars表达式和上下文: https://www.pythondataanalysis.com/docs/Polars_user_guide/concepts/expressions-and-contexts/
- Polars延迟API: https://www.pythondataanalysis.com/docs/Polars_user_guide/concepts/lazy-api/
- Streaming: https://www.pythondataanalysis.com/docs/Polars_user_guide/concepts/_streaming/
- Polars表达式: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/
- Polars基本操作: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/basic-operations/
- Aggregation: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/aggregation/
- Casting: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/casting/
- Categorical Data and Enums: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/categorical-data-and-enums/
- Expression Expansion: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/expression-expansion/
- Folds: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/folds/
- Lists and Arrays: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/lists-and-arrays/
- Missing Data: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/missing-data/
- Numpy Functions: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/numpy-functions/
- Strings: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/strings/
- Structs: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/structs/
- User Defined Python Functions: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/user-defined-python-functions/
- Window Functions: https://www.pythondataanalysis.com/docs/Polars_user_guide/expressions/window-functions/
- Reference: https://www.pythondataanalysis.com/docs/Polars_user_guide/api/reference/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/development/contributing/
- Versioning: https://www.pythondataanalysis.com/docs/Polars_user_guide/development/versioning/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/polars-cloud/
- Ecosystem: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/ecosystem/
- Gpu Support: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/gpu-support/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/io/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/lazy/
- Pandas: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/migration/pandas/
- Spark: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/migration/spark/
- Arrow: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/arrow/
- Comparison: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/comparison/
- Multiprocessing: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/multiprocessing/
- Polars Llms: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/polars_llms/
- Styling: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/styling/
- Visualization: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/misc/visualization/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/plugins/
- Create: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/sql/create/
- Cte: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/sql/cte/
- Intro: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/sql/intro/
- Select: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/sql/select/
- Show: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/sql/show/
- Index: https://www.pythondataanalysis.com/docs/Polars_user_guide/user-guide/transformations/
# Polars索引
`Polars` `DataFrame`没有索引,因此索引行为可以是一致的,而不需要 `df.loc`,
`df.iloc`, or a `df.at` 操作。
规则如下(取决于值的数据类型):
- **数值型**
- axis 0: 行(row)
- axis 1: 列(column)
- **数值型 + 字符串**
- axis 0: 行(这里只接收数字)
- axis 1: 列(接受数字+字符串值)
- **仅字符串**
- axis 0: 列(column)
- axis 1: 报错(error)
- **表达式**
_所有表达式求值都是并行执行的_
- axis 0: 列(column)
- axis 1: 列(column)
- ..
- axis n: 列(column)
## 与Pandas的对比
| pandas | polars |
| -------------------------------------------------------- | ------------------------- |
| 选择列
`df.iloc[2]` | `df[2, :]` |
| 按索引选择几行
`df.iloc[[2, 5, 6]]` | `df[[2, 5, 6], :]` |
| 选择行的切片
`df.iloc[2:6]` | `df[2:6, :]` |
| 使用布尔掩码(boolean mask)选择行
`df.iloc[True, True, False]` | `df[[True, True, False]]` |
| 按谓词(predicate)条件选择行
`df.loc[df["A"] > 3]` | `df[df["A"] > 3]` |
| 选择列的切片
`df.iloc[:, 1:3]` | `df[:, 1:3]` |
| 按字符串顺序选择列的切片
`df.loc[:, "A":"Z"]` | `df[:, "A":"Z"]` |
| 选择单个值(标量)
`df.loc[2, "A"]` | `df[2, "A"]` |
| 选择单个值(标量)
`df.iloc[2, 1]` | `df[2, 1]` |
| 选择单个值(Series或DataFrame)
`df.loc[2, ["A"]]` | `df[2, ["A"]]` |
| 选择单个值 (Series或DataFrame)
`df.iloc[2, [1]]` | `df[2, [1]]` |
## 表达式
表达式也可以用于索引(它是`df.select`的语法糖)。
这可以用来做一些很有别致的选择。
```python
df[[
pl.col("A").head(5), # 从“A”的首部开始获取
pl.col("B").tail(5).reverse(), # 以逆序的方式获取“B”的后部
pl.col("B").filter(pl.col("B") > 5).head(5), # 首先得到满足谓词的“B”
pl.sum("A").over("B").head(5) # 获取“A”在“B”组上的总和,并返回前5个
```