# GPU Support [Open Beta]

Polars provides an in-memory, GPU-accelerated execution engine for Python users of the Lazy API on NVIDIA GPUs using [RAPIDS cuDF](https://docs.rapids.ai/api/cudf/stable/). This functionality is available in Open Beta and is undergoing rapid development.

### System Requirements

- NVIDIA Volta™ or higher GPU with [compute capability](https://developer.nvidia.com/cuda-gpus) 7.0+
- CUDA 11 or CUDA 12
- Linux or Windows Subsystem for Linux 2 (WSL2)

See the [RAPIDS installation guide](https://docs.rapids.ai/install#system-req) for full details.

### Installation

You can install the GPU backend for Polars with a feature flag as part of a normal [installation](installation.md).

=== ":fontawesome-brands-python: Python"
```bash
pip install polars[gpu]
```

!!! note "Installation on a CUDA 11 system"

    If you have CUDA 11, the installation line also needs the NVIDIA package index to get the CUDA 11 package.

    === ":fontawesome-brands-python: Python"
    ```bash
    pip install --extra-index-url=https://pypi.nvidia.com polars cudf-polars-cu11
    ```

### Usage

Once you have built a query using the lazy API [as normal](lazy/index.md), request GPU-enabled execution by running `.collect(engine="gpu")` instead of `.collect()`.

{{ code_header("python", [], []) }}

```python
--8<-- "python/user-guide/lazy/gpu.py:setup"

result = q.collect(engine="gpu")
print(result)
```

```python exec="on" result="text" session="user-guide/lazy"
--8<-- "python/user-guide/lazy/gpu.py:setup"
--8<-- "python/user-guide/lazy/gpu.py:simple-result"
```

For more detailed control over the execution, for example to specify which GPU to use on a multi-GPU node, we can provide a `GPUEngine` object. By default, the GPU engine will use a configuration applicable to most use cases.

{{ code_header("python", [], []) }}

```python
--8<-- "python/user-guide/lazy/gpu.py:engine-setup"
result = q.collect(engine=pl.GPUEngine(device=1))
print(result)
```

```python exec="on" result="text" session="user-guide/lazy"
--8<-- "python/user-guide/lazy/gpu.py:engine-setup"
--8<-- "python/user-guide/lazy/gpu.py:engine-result"
```

### How It Works

When you use the GPU-accelerated engine, Polars creates and optimizes a query plan and dispatches to a [RAPIDS](https://rapids.ai/) cuDF-based physical execution engine to compute the results on NVIDIA GPUs. The final result is returned as a normal CPU-backed Polars dataframe.

### What's Supported on the GPU?

GPU support is currently in Open Beta and the engine is undergoing rapid development. The engine currently supports many, but not all, of the core expressions and data types.

Since expressions are composable, it's not feasible to list a full matrix of expressions supported on the GPU.
Instead, we provide a list of the high-level categories of expressions and interfaces that are currently supported and not supported.

#### Supported

- LazyFrame API
- SQL API
- I/O from CSV, Parquet, ndjson, and in-memory CPU DataFrames
- Operations on numeric, logical, string, and datetime types
- String processing
- Aggregations and grouped aggregations
- Joins
- Filters
- Missing data
- Concatenation

#### Not Supported

- Eager DataFrame API
- Streaming API
- Operations on categorical, struct, and list data types
- Rolling aggregations
- Time series resampling
- Timezones
- Folds
- User-defined functions
- JSON, Excel, and database file formats

#### Did my query use the GPU?

The release of the GPU engine in Open Beta implies that we expect things to work well, but there are still some rough edges we're working on. In particular, the full breadth of the Polars expression API is not yet supported. With fallback to the CPU, your query _should_ complete, but you might not observe any change in the time it takes to execute. There are two ways to get more information on whether the query ran on the GPU.

When running in verbose mode, any queries that cannot execute on the GPU will issue a `PerformanceWarning`:

{{ code_header("python", [], []) }}

```python
--8<-- "python/user-guide/lazy/gpu.py:fallback-setup"

with pl.Config() as cfg:
    cfg.set_verbose(True)
    result = q.collect(engine="gpu")

print(result)
```

```python exec="on" result="text" session="user-guide/lazy"
--8<-- "python/user-guide/lazy/gpu.py:fallback-setup"
print(
    "PerformanceWarning: Query execution with GPU not supported, reason: \n"
    ": Grouped rolling window not implemented"
)
print("# some details elided")
print()
print(q.collect())
```

To disable fallback, and have the GPU engine raise an exception if a query is unsupported, we can pass an appropriately configured `GPUEngine` object:

{{ code_header("python", [], []) }}

```python
q.collect(engine=pl.GPUEngine(raise_on_fail=True))
```

```pytb
Traceback (most recent call last):
  File "", line 1, in
  File "/home/coder/third-party/polars/py-polars/polars/lazyframe/frame.py", line 2035, in collect
    return wrap_df(ldf.collect(callback))
polars.exceptions.ComputeError: 'cuda' conversion failed: NotImplementedError: Grouped rolling window not implemented
```

Currently, only the proximal cause of failure to execute on the GPU is reported; we plan to extend this functionality to report all unsupported operations for a query.

### Testing

The Polars and NVIDIA RAPIDS teams run comprehensive unit and integration tests to ensure that the GPU-accelerated Polars backend works smoothly.

The **full** Polars test suite is run on every commit made to the GPU engine, ensuring consistency of results.

The GPU engine currently passes 99.2% of the Polars unit tests with CPU fallback enabled. Without CPU fallback, the GPU engine passes 88.8% of the Polars unit tests.
With fallback, there are approximately 100 failing tests: around 40 of these fail due to mismatching debug output; there are some cases where the GPU engine produces a correct result but uses a different data type; the remainder are cases where we do not correctly determine that a query is unsupported and therefore fail at runtime, instead of falling back.

### When Should I Use a GPU?

Based on our benchmarking, you're most likely to observe speedups using the GPU engine when your workflow's profile is dominated by grouped aggregations and joins. In contrast, I/O-bound queries typically show similar performance on GPU and CPU.

GPUs typically have less RAM than CPU systems, so very large datasets will fail with out-of-memory errors. Based on our testing, raw datasets of 50-100 GiB fit well on a GPU with 80 GiB of memory, depending on the workflow.

### CPU-GPU Interoperability

Both the CPU and GPU engines use the Apache Arrow columnar memory specification, making it possible to quickly move data between the CPU and GPU. Additionally, files written by one engine can be read by the other engine.

When using GPU mode, your workflow won't fail if something isn't supported. When you run `collect(engine="gpu")`, the optimized query plan is inspected to see whether it can be executed on the GPU. If it can't, it will transparently fall back to the standard Polars engine and run on the CPU.

GPU execution is only available in the Lazy API, so materialized DataFrames will reside in CPU memory when the query execution finishes.

### Providing feedback

Please report issues and missing features on the Polars [issue tracker](https://github.com/pola-rs/polars/issues).
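
When filing a report, a small self-contained reproducer usually helps. The sketch below is one possible shape for it, not an official template. It collects a query on the CPU, retries it with a strict `GPUEngine(raise_on_fail=True)` as described above, and records either the error text or whether the two results agree. The helper names `try_gpu` and `build_report` are hypothetical, the sample query is made up for illustration, and `DataFrame.equals` and `pl.__version__` are Polars features not otherwise covered on this page.

```python
try:
    import polars as pl
except ImportError:  # the helpers below only need an object with .collect()
    pl = None


def try_gpu(query, strict_engine):
    """Collect `query` with a strict GPU engine; return (result, error_text)."""
    try:
        return query.collect(engine=strict_engine), None
    except Exception as exc:  # broad on purpose: we only want the message
        return None, f"{type(exc).__name__}: {exc}"


def build_report(query, strict_engine):
    """Summarize whether `query` ran on the GPU and matched the CPU result."""
    cpu_result = query.collect()  # reference result from the default engine
    gpu_result, error = try_gpu(query, strict_engine)
    report = {"ran_on_gpu": error is None, "error": error, "results_match": None}
    if error is None:
        report["results_match"] = cpu_result.equals(gpu_result)
    return report


if __name__ == "__main__" and pl is not None:
    # Illustrative query only; replace it with the one that misbehaves.
    q = (
        pl.LazyFrame({"key": ["a", "b", "a"], "value": [1, 2, 3]})
        .group_by("key")
        .agg(pl.col("value").sum())
        .sort("key")  # fixed row order so the comparison is meaningful
    )
    print("polars", pl.__version__)
    print(build_report(q, pl.GPUEngine(raise_on_fail=True)))
```

Because fallback is disabled, a report with `ran_on_gpu` set to `True` should mean the whole plan executed on the GPU. On a machine without a working GPU backend, the `error` field is expected to carry the exception message instead.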