DataFrame Acceleration: hipDF and hipDF.pandas on AMD GPUs

May 07, 2025 by Fabricio Flores.

In our previous blog, CuPy and hipDF on AMD: The Basics and beyond, we explored the fundamentals of hipDF and demonstrated the significant speedup it provides over Pandas for data manipulation tasks on AMD GPUs.

hipDF is a GPU-accelerated DataFrame library designed to bring the power of GPUs to data manipulation tasks. hipDF offers an interface similar to that of Pandas, the Python Data Analysis Library, but is optimized for speed and efficiency on GPUs. It provides high-performance capabilities for operations such as data loading, data aggregation, and data transformation.

In this follow-up blog, we explore additional capabilities of hipDF and the hipdf.pandas acceleration layer running on AMD GPUs using ROCm. Specifically, we will examine the high-performance capabilities of hipDF for operations on DataFrames such as data manipulation, data aggregation, and the creation of User Defined Functions (UDFs) using standard Python functions. Next, we will explore the cudf.pandas acceleration layer, which is designed to accelerate Pandas operations on the GPU with minimal to no code changes, ensuring a unified CPU and GPU experience. Finally, we will discuss cudf.pandas.profiler, a tool for understanding and optimizing Pandas operations within hipDF. The profiler generates detailed reports on which operations use the GPU and which fall back to the CPU, helping identify bottlenecks and potential areas for optimization. For more details, see the official documentation: ROCm Data Science and the hipDF installation instructions: Installing hipDF. Additional information can be found by visiting the official hipDF GitHub Repository.

Note

Throughout this blog, you will see the term “cuDF” used for commands and package calls. This reflects the fact that hipDF adopts the well-known cuDF API on AMD hardware, ensuring compatibility and ease of use across various computing environments. This API compatibility enables existing cuDF workloads to be effortlessly transitioned to run on supported AMD devices, allowing you to use AMD’s ROCm platform for your data processing tasks.

Note

You can also use the command “import hipdf” instead of “import cudf”. hipDF modules can also be called with “hipdf.MODULE” instead of “cudf.MODULE”.

You can find files related to this blog post in the GitHub folder.

Requirements: operating system and hardware tested#

AMD GPU: See the ROCm documentation page for supported hardware and operating systems.
ROCm 6.4: See the ROCm installation for Linux page for installation instructions.
Docker: See Install Docker Engine on Ubuntu for installation instructions.
ROCm Data Science & Installing hipDF: see hipDF installation.

Following along with this blog#

You can run this blog by using a Docker container. Using Docker is the easiest and most reliable way to construct the required environment.

Clone the repo and cd into the blog directory:

git clone https://github.com/ROCm/rocm-blogs.git
cd rocm-blogs/blogs/artificial-intelligence/hipDF_pandas_accelerated 

Build and start the container. For details on the build process, see the hipDF_pandas_accelerated/docker/Dockerfile.
```
cd docker
docker compose build
docker compose up
```
Navigate to http://127.0.0.1:8888/lab in your browser and open the /src/dataset_creation.ipynb and /src/hipDF_pandas_accelerated.ipynb notebooks.

Load and explore the data#

For the purpose of this blog, we have created a synthetic dataset that mimics financial transaction data, including fields that are typical of customer transactions with a bank.

The ~/src/dataset_creation.ipynb notebook contains the code that creates the synthetic dataset. You can run the dataset_creation.ipynb notebook or use the following code to create it.

Dataset creation:

import pandas as pd
import numpy as np
import time

def generate_synthetic_data(num_records):

    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Generate random data
    data = {
        'TransactionID': np.arange(1, num_records + 1),
        'AccountID': np.random.randint(1, 101, num_records),
        'TransactionDate': pd.date_range(start='1900-01-01', periods = num_records, freq = 's'),
        'TransactionAmount': np.random.uniform(10, 5000, num_records).round(2),
        'TransactionType': np.random.choice(['Deposit', 'Withdrawal', 'Transfer', 'Payment'], num_records),
        'CustomerAge': np.random.randint(18, 80, num_records),
        'CustomerGender': np.random.choice(['Male', 'Female'], num_records),
        'CustomerRegion': np.random.choice(['North', 'South', 'East', 'West'], num_records),
        'AccountType': np.random.choice(['Savings', 'Checking', 'Credit'], num_records),
        'BranchCode': np.random.randint(1, 21, num_records),
        'Currency': np.random.choice(['USD', 'EUR', 'GBP', 'JPY'], num_records),
        'TransactionStatus': np.random.choice(['Completed', 'Pending', 'Failed'], num_records),
        'CustomerIncome': np.random.uniform(20000, 150000, num_records).round(2),
        'CustomerMaritalStatus': np.random.choice(['Single', 'Married', 'Divorced', 'Widowed'], num_records),
        'TransactionChannel': np.random.choice(['Online', 'Branch', 'ATM', 'Mobile'], num_records),
        'MerchantCategory': np.random.choice(['Groceries', 'Utilities', 'Entertainment', 'Travel', 'Healthcare'], num_records),
        'TransactionFee': np.random.uniform(0, 50, num_records).round(2),
        'CustomerOccupation': np.random.choice(['Employed', 'Self-employed', 'Unemployed', 'Retired', 'Student'], num_records),
        'CustomerEducationLevel': np.random.choice(['High School', 'Bachelor’s', 'Master’s', 'Doctorate'], num_records),
        'TransactionTime': pd.to_datetime(np.random.randint(0, 86400, num_records), unit = 's').time
    }
    
    # Create DataFrame
    df = pd.DataFrame(data)

    return df

num_records = 10_000_000
data = generate_synthetic_data(num_records)
data.to_parquet('transactions_synthetic_data.parquet')

We will use this dataset to demonstrate the functionality of cudf and cudf.pandas operations, alongside the implementation of custom user functions and profiling analysis.

Let’s begin by importing the cudf dependency:

import cudf

Then load our synthetic dataset:

dataset = cudf.read_parquet('transactions_synthetic_data.parquet')
display(type(dataset))
display(dataset.shape)
dataset.head()

The output of the commands above is:

cudf.core.dataframe.DataFrame
(10000000, 20)

TransactionID	…	TransactionChannel	MerchantCategory	TransactionFee	TransactionTime
0	…	ATM	Entertainment	34.40	0 days 23:49:22
1	…	ATM	Groceries	46.51	0 days 06:04:57
2	…	Mobile	Utilities	19.82	0 days 23:09:57
3	…	Online	Entertainment	18.38	0 days 00:10:17
4	…	Online	Groceries	43.90	0 days 18:40:09

From the output above, we can see that the dataset object’s datatype is cudf.core.dataframe.DataFrame. The dimensions of this DataFrame are 10 million rows and 20 columns. It consists of a variety of variable types, including numerical and categorical, with data formats such as strings, dates, and timestamps.
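To make the mix of column types concrete without loading 10 million rows, here is a minimal sketch that builds a small stand-in frame and inspects its dtypes. It uses plain Pandas and only four of the 20 columns; the names `sample`, `numeric_cols`, and `other_cols` are ours and do not appear in the notebooks.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 10-million-row dataset (only four of the 20 columns).
rng = np.random.default_rng(42)
n = 1_000
sample = pd.DataFrame({
    "TransactionID": np.arange(1, n + 1),
    "TransactionDate": pd.to_datetime("1900-01-01") + pd.to_timedelta(np.arange(n), unit="s"),
    "TransactionAmount": rng.uniform(10, 5000, n).round(2),
    "TransactionType": rng.choice(["Deposit", "Withdrawal", "Transfer", "Payment"], n),
})

# One dtype per column: integer, datetime, float, and string data.
print(sample.dtypes)

# Split the columns by kind, for example to decide which ones need encoding.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(other_cols)
```

The same `dtypes` and `select_dtypes` calls are part of the Pandas API that cuDF mirrors, so we would expect them to behave similarly on the full `dataset` object, although that is not verified here.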

Now that our data is ready, let’s look into some examples of transformations that can be applied to the entire dataset, represented by a cudf.DataFrame, or to individual columns, represented by cudf.Series, within that DataFrame.

hipDF operations and User Defined Functions (UDFs)#

A hipDF User Defined Function (UDF) is a custom function that is used to perform custom operations on cudf.Series or cudf.DataFrames. These custom functions are written to leverage the parallel processing power of GPUs to perform data operations within a hipDF DataFrame. UDFs can be applied to columns (cudf.Series) in a similar way to Pandas UDFs but are optimized for performance on GPUs.

Example 1#

A common preprocessing step in various Natural Language Processing (NLP) tasks is lowercasing text. Lowercasing helps when normalization is needed to reduce variability and simplify text. Let’s demonstrate how to use a UDF that lowercases the text in the TransactionType column by passing an inline lambda function, which calls the string lower method on each value, to the apply method of the cudf.Series object:

# TransactionType to lower
dataset['TransactionType'] = dataset['TransactionType'].apply(lambda x:x.lower())
display(type(dataset))
dataset.head(3)

Where the output is:

cudf.core.dataframe.DataFrame

TransactionID	AccountID	…	TransactionType	CustomerAge	CustomerGender	…
0	1	…	payment	31	Female	…
1	2	…	transfer	30	Female	…
2	3	…	transfer	47	Male	…

The text in the TransactionType column has been transformed into its lowercase representation (e.g. Payment to payment).
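For simple string transformations like this one, a vectorized string accessor is a common alternative to an element-wise UDF. The sketch below uses plain Pandas on a four-value Series to show that both routes give the same result; `Series.str.lower()` is also part of the cuDF API, but we have not timed either route on hipDF here.

```python
import pandas as pd

s = pd.Series(["Deposit", "Withdrawal", "Transfer", "Payment"])

# Element-wise UDF, as in Example 1.
via_apply = s.apply(lambda x: x.lower())

# Vectorized string method: no Python function is called per element.
via_accessor = s.str.lower()

print(via_accessor.tolist())
print(via_apply.tolist() == via_accessor.tolist())
```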

Example 2#

Similarly, we can create another User Defined Function to transform the CustomerGender column into a binary representation:

# Transform CustomerGender to zero & one
def customer_gender_to_binary(x):
    return 1 if x == 'Male' else 0

dataset['CustomerGender'] = dataset['CustomerGender'].apply(customer_gender_to_binary)
display(type(dataset))
dataset.head(3)

The output from this function is:

cudf.core.dataframe.DataFrame

TransactionID	AccountID	…	TransactionType	CustomerAge	CustomerGender	…
0	1	…	payment	31	0	…
1	2	…	transfer	30	0	…
2	3	…	transfer	47	1	…
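The same encoding can also be written as a comparison followed by a cast, which avoids a UDF altogether. This is a minimal sketch in plain Pandas on three values; the variable names are ours, and the choice of `int8` as the target type is an assumption made for illustration.

```python
import pandas as pd

gender = pd.Series(["Female", "Female", "Male"])

def customer_gender_to_binary(x):
    return 1 if x == "Male" else 0

# UDF route, as in Example 2.
via_udf = gender.apply(customer_gender_to_binary)

# Vectorized route: the comparison yields booleans, the cast turns them into 0/1.
via_compare = (gender == "Male").astype("int8")

print(via_udf.tolist())
print(via_compare.tolist())
```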

Example 3#

We can also work across multiple columns (cudf.Series):

# Operate using data from multiple columns
def compute_transaction_amount_over_income(row):
    return row['TransactionAmount']/row['CustomerIncome']

dataset['transaction_amount_over_income'] = dataset.apply(compute_transaction_amount_over_income, axis = 1)
display(type(dataset))
dataset.head(3)

The output of this transformation consists of a DataFrame with a new column transaction_amount_over_income that computes the ratio TransactionAmount/CustomerIncome:

cudf.core.dataframe.DataFrame

TransactionID	AccountID	…	TransactionTime	transaction_amount_over_income
1	52	…	23:49:22	0.062809
2	93	…	06:04:57	0.035729
3	15	…	23:09:57	0.013124
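A ratio of two columns can also be computed with column arithmetic instead of a row-wise UDF. The sketch below, in plain Pandas with three made-up rows, checks that both approaches agree; the sample values are illustrative and are not taken from the synthetic dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TransactionAmount": [100.0, 250.0, 40.0],
    "CustomerIncome": [50000.0, 100000.0, 20000.0],
})

def compute_transaction_amount_over_income(row):
    return row["TransactionAmount"] / row["CustomerIncome"]

# Row-wise UDF, as in Example 3.
row_wise = df.apply(compute_transaction_amount_over_income, axis=1)

# Column arithmetic: one vectorized division over whole columns.
column_wise = df["TransactionAmount"] / df["CustomerIncome"]

print(np.allclose(row_wise, column_wise))
```

Row-wise UDFs remain the more flexible option when the logic involves branching that is awkward to express with column operations.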

Example 4#

Similarly, we can perform custom aggregations using cudf.DataFrame.groupby:

# Group by TransactionType and MerchantCategory
agg1 = dataset.groupby(['TransactionType','MerchantCategory']).agg({
    'TransactionAmount': ['sum', 'mean', 'max', 'min', 'std'],
    'TransactionFee': ['sum', 'mean', 'max', 'min', 'std']
}).reset_index()

display(type(agg1))
agg1.sort_values(by = ['TransactionType','MerchantCategory'])

Resulting in a DataFrame similar to:

cudf.core.dataframe.DataFrame

	TransactionType	MerchantCategory			TransactionAmount		TransactionFee
			Sum	…	Std	Sum	…
2	deposit	Entertainment	1.256700e+09	…	1441.526993	1.251765e+07	…
0	deposit	Groceries	1.253619e+09	…	1439.162493	1.252060e+07	…
13	deposit	Healthcare	1.251567e+09	…	1439.087255	1.247303e+07	…
6	deposit	Travel	1.252849e+09	…	1440.157509	1.253024e+07	…
7	deposit	Utilities	1.248455e+09	…	1439.781811	1.247236e+07	…
1	payment	Entertainment	1.255383e+09	…	1438.829924	1.251921e+07	…
18	payment	Groceries	1.253575e+09	…	1440.974886	1.251943e+07	…

In all the previous examples, we observe that the output’s datatype is cudf.core.dataframe.DataFrame, illustrating that these objects reside and were processed on the GPU.
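Passing a dictionary of lists to `agg` produces two-level column labels, as the header of the table in Example 4 suggests. The sketch below uses plain Pandas on four made-up rows to show those labels and one possible way to flatten them into single names; the `flat` frame and the underscore naming scheme are our own choices, not something the notebooks do.

```python
import pandas as pd

df = pd.DataFrame({
    "TransactionType": ["deposit", "deposit", "payment", "payment"],
    "MerchantCategory": ["Travel", "Travel", "Groceries", "Groceries"],
    "TransactionAmount": [10.0, 30.0, 5.0, 15.0],
    "TransactionFee": [1.0, 3.0, 0.5, 1.5],
})

agg = df.groupby(["TransactionType", "MerchantCategory"]).agg({
    "TransactionAmount": ["sum", "mean"],
    "TransactionFee": ["sum", "max"],
}).reset_index()

# Columns are (column, statistic) tuples; the group keys get an empty second level.
print(agg.columns.tolist())

# Flatten to single-level names such as "TransactionAmount_sum".
flat = agg.copy()
flat.columns = ["_".join(part for part in col if part) for col in agg.columns]
print(flat)
```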

cudf.pandas acceleration layer#

So far, we have demonstrated how to perform operations in our dataset DataFrame that take advantage of the GPU’s parallel processing for faster computations by explicitly calling import cudf in our Python code. What if we have existing code that uses Pandas and we want to leverage GPU accelerated operations? Rewriting code that uses Pandas to leverage hipDF can take some time and effort, not to mention the additional testing that is required after the changes are made.

Luckily, hipDF offers the cudf.pandas extension that allows for effortless integration of hipDF DataFrames with existing Pandas code without requiring any major change to the code. By executing %load_ext cudf.pandas, we can enable GPU-accelerated data manipulation similar to Pandas, but with significant performance improvements. The cudf.pandas extension proxies Pandas operations to be executed on GPU whenever possible and falls back to CPU-based Pandas operations if necessary.
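The `%load_ext` magic only works inside IPython or Jupyter. For plain Python scripts, upstream cuDF documents a programmatic entry point, `cudf.pandas.install()`, as well as running a script with `python -m cudf.pandas script.py`. The sketch below assumes hipDF exposes the same `install()` function, which we have not verified, and falls back to plain Pandas when the `cudf` package is not installed. The `accelerated` flag and the tiny DataFrame are ours.

```python
# Enable the accelerator before pandas is imported; fall back to plain pandas
# when the cudf package is not installed.
try:
    import cudf.pandas
    cudf.pandas.install()  # assumption: hipDF mirrors this cuDF entry point
    accelerated = True
except ImportError:
    accelerated = False

import pandas as pd

df = pd.DataFrame({"branch": [1, 1, 2], "fee": [2.0, 4.0, 6.0]})
totals = df.groupby("branch")["fee"].sum()

print(f"accelerated={accelerated}")
print(totals)
```

Either way, the Pandas code after the import stays the same, which is the point of the acceleration layer.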

Timing pandas.DataFrame.groupby operations on CPU#

Before trying the cudf.pandas extension, let’s time the execution of a Pandas operation on CPU so that we have a better idea of the gains in performance when the same operation is being proxied for GPU execution.

For consistency, let’s begin by restarting the Python kernel on our running Jupyter notebook:

get_ipython().kernel.do_shutdown(restart=True)

Next, import the Pandas library and load our dataset:

import pandas as pd

dataset = pd.read_parquet('transactions_synthetic_data.parquet')

Example 5#

With our dataset DataFrame ready, let’s use the %%time IPython cell magic to measure the execution time of the pd.DataFrame.groupby operation, which is designed to group and aggregate data:

%%time
# Group by TransactionType, CustomerRegion, AccountType and BranchCode
agg1 = dataset.groupby(['TransactionType', 'CustomerRegion','AccountType', 'BranchCode']).agg({
    'TransactionAmount': ['sum', 'mean', 'max', 'min', 'std'],
    'TransactionFee': ['sum', 'mean', 'max', 'min', 'std']
}).reset_index()

The output is:

CPU times: user 1.73 s, sys: 312 ms, total: 2.04 s
Wall time: 2.02 s

Note: IPython “magics” are special commands designed to facilitate common tasks within Jupyter Notebooks. They are prefixed with either one or two percent symbols: “%” for line magics and “%%” for cell magics. See the IPython Documentation for details.

The %%time magic returned a detailed description of the time spent by the CPU processing the code. In our case, we have 1.73 seconds of user processing and 312 milliseconds of sys processing, with a combined total of 2.04 s. We also have 2.02 seconds of Wall time.

The user field is the CPU time spent executing code in user mode (the Python interpreter and library code), sys is the CPU time spent in the operating system kernel on behalf of the process (for example, in system calls), and Wall time is the total real-world time taken to run the cell, including time spent waiting on input/output and other processes.
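The difference between CPU time and wall time can be reproduced with the standard library alone. In this minimal sketch, `time.process_time()` reports the combined user and sys CPU time of the process, while `time.perf_counter()` measures elapsed real time. The helper `timed` and the two workloads are ours, chosen only to contrast waiting with computing.

```python
import time

def timed(fn):
    """Run fn once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

# Sleeping consumes wall time but almost no CPU time.
sleep_wall, sleep_cpu = timed(lambda: time.sleep(0.2))

# A busy loop consumes CPU time for most of its wall time.
busy_wall, busy_cpu = timed(lambda: sum(i * i for i in range(2_000_000)))

print(f"sleep: wall={sleep_wall:.3f}s cpu={sleep_cpu:.3f}s")
print(f"busy:  wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
```

A similar gap can appear whenever the CPU mostly waits on another resource, which is one reason to read the Wall time when comparing runs.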

Timing pandas.DataFrame.groupby operations using the cudf.pandas acceleration#

At this point, we have recorded the time taken to run the example that uses pd.DataFrame.groupby on the CPU (2.02 seconds). Let’s see how this compares to using the cudf.pandas acceleration layer. Begin by restarting the Python kernel:

get_ipython().kernel.do_shutdown(restart=True)

Next, proceed to load the acceleration layer extension:

%load_ext cudf.pandas

Note: You must restart the Python kernel before loading the cudf.pandas acceleration layer. Restarting the kernel ensures that the GPU acceleration is properly initialized without interference from previous CPU-based operations that could result in conflicts or memory management issues.

Import Pandas and load the dataset:

import pandas as pd

dataset = pd.read_parquet('transactions_synthetic_data.parquet')

Finally, execute the same pd.DataFrame.groupby operation as before:

%%time
# Group by TransactionType, CustomerRegion, AccountType and BranchCode
agg1 = dataset.groupby(['TransactionType', 'CustomerRegion','AccountType', 'BranchCode']).agg({
    'TransactionAmount': ['sum', 'mean', 'max', 'min', 'std'],
    'TransactionFee': ['sum', 'mean', 'max', 'min', 'std']
}).reset_index()

Here, the %%time output is:

CPU times: user 108 ms, sys: 28.9 ms, total: 137 ms
Wall time: 130 ms

The wall time of the pd.DataFrame.groupby operation drops from 2.02 seconds on the CPU to 130 milliseconds with the cudf.pandas acceleration layer. Moreover, we did not have to make any changes to the code!
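To put the two measurements side by side, the speedup is simply the ratio of the wall times reported above. The numbers in this small calculation are the single-run values from this blog, so the result is an indication only; repeated runs (for example with `%%timeit`) would give a more stable estimate.

```python
cpu_wall_s = 2.02    # Wall time of the Pandas run on CPU
gpu_wall_s = 0.130   # Wall time of the same cell with cudf.pandas loaded

speedup = cpu_wall_s / gpu_wall_s
print(f"{speedup:.1f}x")  # 15.5x
```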

Profiling operations using cudf.pandas#

The cudf.pandas extension allows us to speed up data processing tasks that include Pandas operations. The cudf.pandas extension also includes the %%cudf.pandas.profile magic that generates a detailed performance report on the Pandas operations. This profiler helps data scientists understand which parts of the Pandas code are being executed on the GPU and which are falling back to the CPU. This is helpful for optimizing performance and ensuring that the code leverages the GPU as much as possible.

Let’s check the profiler report on the same pd.DataFrame.groupby operation by executing the following:

%%time
%%cudf.pandas.profile

# Group by TransactionType, CustomerRegion, AccountType and BranchCode
agg1 = dataset.groupby(['TransactionType', 'CustomerRegion','AccountType', 'BranchCode']).agg({
    'TransactionAmount': ['sum', 'mean', 'max', 'min', 'std'],
    'TransactionFee': ['sum', 'mean', 'max', 'min', 'std']
}).reset_index()

The profiler output is:

                                         Total time elapsed: 0.112 seconds                                 
                                       3 GPU function calls in 0.068 seconds                               
                                       0 CPU function calls in 0.000 seconds                               
                                                                                                           
                                                       Stats                                               
                                                                                                           
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function              ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ DataFrame.groupby     │ 1          │ 0.000       │ 0.000       │ 0          │ 0.000       │ 0.000       │
│ DataFrameGroupBy.agg  │ 1          │ 0.067       │ 0.067       │ 0          │ 0.000       │ 0.000       │
│ DataFrame.reset_index │ 1          │ 0.001       │ 0.001       │ 0          │ 0.000       │ 0.000       │
└───────────────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
CPU times: user 123 ms, sys: 505 µs, total: 124 ms
Wall time: 117 ms

The profiler output shows the names of the functions that ran, the number of calls to each, whether each call ran on the GPU or the CPU, and the corresponding running time. In this case, all three calls ran on the GPU with no CPU fallback.

As mentioned earlier, the cudf.pandas extension proxies Pandas operations to run on the GPU whenever possible and falls back to CPU-based Pandas operations if necessary. Let’s further explore what this means.

Let’s verify the corresponding datatype of the pd.DataFrame.groupby operation with:

type(pd.DataFrame.groupby)

cudf.pandas.fast_slow_proxy._MethodProxy

We see that cudf.pandas.fast_slow_proxy._MethodProxy acts on behalf of the regular pd.DataFrame.groupby operation. The cudf.pandas extension works by intercepting the Pandas operations. This proxy module contains proxy types and functions designed to run on the GPU where possible and fall back to the CPU when necessary.
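The general idea of a fast path with a slow fallback can be illustrated with a toy dispatcher. The sketch below is not how cudf.pandas is implemented; the class `FastSlowFunction`, the function `fast_sqrt`, and the use of `NotImplementedError` as the fallback signal are all hypothetical and chosen only to show the pattern. The call counter loosely mirrors what the profiler reports per function.

```python
import cmath
import math
from collections import Counter


class FastSlowFunction:
    """Toy dispatcher: try the fast path first, fall back to the slow path."""

    def __init__(self, fast, slow):
        self.fast = fast
        self.slow = slow
        self.calls = Counter()  # how often each path produced the result

    def __call__(self, *args, **kwargs):
        try:
            result = self.fast(*args, **kwargs)
        except NotImplementedError:
            # The fast path cannot handle this input: use the slow path instead.
            self.calls["slow"] += 1
            return self.slow(*args, **kwargs)
        self.calls["fast"] += 1
        return result


def fast_sqrt(x):
    # Hypothetical "accelerated" path that only supports non-negative input.
    if x < 0:
        raise NotImplementedError("negative input is not supported here")
    return math.sqrt(x)


sqrt_proxy = FastSlowFunction(fast=fast_sqrt, slow=cmath.sqrt)
results = [sqrt_proxy(9), sqrt_proxy(-4)]

print(results)
print(dict(sqrt_proxy.calls))
```

The caller uses `sqrt_proxy` the same way in both cases and only the counter reveals which path ran, which is the situation the profiler is designed to expose.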

To illustrate a case where some operations will fall back to the CPU, let’s explore the pd.pivot_table transformation while timing its execution:

%%time
# Pivot table with CustomerMaritalStatus and MerchantCategory, showing mean and sum of TransactionAmount
pivot1 = pd.pivot_table(
    dataset, 
    values='TransactionAmount',
    index='CustomerMaritalStatus',
    columns='MerchantCategory',
    aggfunc=['mean', 'sum']
    )

CPU times: user 7.44 s, sys: 1.97 s, total: 9.41 s
Wall time: 9.69 s

The execution time is extremely large at 9.69 seconds! Let’s see the pd.pivot_table output:

pivot1.head()

MerchantCategory				mean				sum
CustomerMaritalStatus	Entertainment	Groceries	…	Utilities	Entertainment	Groceries	…	Utilities
Divorced	2507.00	2501.97	…	2505.18	1.25446e+09	1.25263e+09	…	1.24878e+09
Married	2505.66	2503.73	…	2507.58	1.25284e+09	1.25266e+09	…	1.25291e+09
Single	2508.09	2504.90	…	2503.20	1.25667e+09	1.25034e+09	…	1.25342e+09
Widowed	2509.43	2506.07	…	2502.95	1.25588e+09	1.25415e+09	…	1.24980e+09

Let’s use %%cudf.pandas.profile to investigate the cause of such a large execution time:

%%time
%%cudf.pandas.profile

# Pivot table with CustomerMaritalStatus and MerchantCategory, showing mean and sum of TransactionAmount
pivot1 = pd.pivot_table(
    dataset, 
    values='TransactionAmount',
    index='CustomerMaritalStatus',
    columns='MerchantCategory',
    aggfunc=['mean', 'sum']
    )

                                    Total time elapsed: 10.343 seconds                            
                                  0 GPU function calls in 0.000 seconds                          
                                  1 CPU function calls in 9.988 seconds                          
                                                                                                 
                                                  Stats                                          
                                                                                                 
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function    ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ pivot_table │ 0          │ 0.000       │ 0.000       │ 1          │ 9.988       │ 9.988       │
└─────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
Not all pandas operations ran on the GPU. The following functions required CPU fallback:

- pivot_table

To request GPU support for any of these functions, please file a Github issue here: 
https://github.com/rocm-ds/hipdf/issues/new.

CPU times: user 7.97 s, sys: 2.05 s, total: 10 s
Wall time: 10.4 s

Indeed, pd.pivot_table falls back to the CPU. This explains the large execution time.

Note: At the time this blog was written, the pd.pivot_table operation under the cudf.pandas extension falls back to CPU execution. This might change in the future.

If we want all our operations to run on the GPU, we can look for a workaround. Since the behavior of pd.pivot_table is fairly similar to that of pd.DataFrame.groupby, we can try to use the latter to obtain the same output.
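Before relying on such a substitution, it is worth confirming on a small frame that the two formulations agree. This sketch uses plain Pandas and six made-up rows in which every status and category combination is present; it compares the column labels and the values of both results. Data with missing combinations may need extra care and should be checked separately.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CustomerMaritalStatus": ["Single", "Single", "Married", "Married", "Single", "Married"],
    "MerchantCategory": ["Travel", "Groceries", "Travel", "Groceries", "Travel", "Travel"],
    "TransactionAmount": [10.5, 20.25, 30.0, 40.75, 50.5, 60.0],
})

via_pivot = pd.pivot_table(
    df,
    values="TransactionAmount",
    index="CustomerMaritalStatus",
    columns="MerchantCategory",
    aggfunc=["mean", "sum"],
)

# Group on both keys, aggregate, then move MerchantCategory into the columns.
via_groupby = (
    df.groupby(["CustomerMaritalStatus", "MerchantCategory"])["TransactionAmount"]
    .agg(["mean", "sum"])
    .unstack()
)

same_columns = via_pivot.columns.tolist() == via_groupby.columns.tolist()
same_values = np.allclose(via_pivot.to_numpy(), via_groupby.to_numpy())
print(same_columns, same_values)
```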

Let’s begin again by restarting the kernel and loading the cudf.pandas extension and the dataset (running each command in a separate cell):

get_ipython().kernel.do_shutdown(restart=True)

%load_ext cudf.pandas

import pandas as pd

dataset = pd.read_parquet('transactions_synthetic_data.parquet')

By using pd.DataFrame.groupby (instead of pd.pivot_table) and timing the operation, we obtain:

%%time
# Alternative to pivot_table using groupby
pivot1 = dataset.groupby(['CustomerMaritalStatus','MerchantCategory'])['TransactionAmount'].agg(['mean','sum']).unstack()

CPU times: user 711 ms, sys: 62.5 ms, total: 774 ms
Wall time: 757 ms

pivot1.head()

MerchantCategory				mean				sum
CustomerMaritalStatus	Entertainment	Groceries	…	Utilities	Entertainment	Groceries	…	Utilities
Divorced	2507.00	2501.97	…	2505.18	1.25446e+09	1.25263e+09	…	1.24878e+09
Married	2505.66	2503.73	…	2507.58	1.25284e+09	1.25266e+09	…	1.25291e+09
Single	2508.09	2504.90	…	2503.20	1.25667e+09	1.25034e+09	…	1.25342e+09
Widowed	2509.43	2506.07	…	2502.95	1.25588e+09	1.25415e+09	…	1.24980e+09

We obtain the same data aggregation that we got from the pd.pivot_table operation. The running time also matches our expectation of faster execution: with the computation running on the GPU, the Wall time is 757 milliseconds, compared to 9.69 seconds for pd.pivot_table. Finally, let’s see the output of the %%cudf.pandas.profile profiler:

%%time
%%cudf.pandas.profile
# Alternative to pivot using groupby
pivot1 = dataset.groupby(['CustomerMaritalStatus','MerchantCategory'])['TransactionAmount'].agg(['mean','sum']).unstack()

                                                                                                                  
                                            Total time elapsed: 0.664 seconds                                     
                                          4 GPU function calls in 0.383 seconds                                   
                                          0 CPU function calls in 0.000 seconds                                   
                                                                                                                  
                                                          Stats                                                   
                                                                                                                  
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Function                     ┃ GPU ncalls ┃ GPU cumtime ┃ GPU percall ┃ CPU ncalls ┃ CPU cumtime ┃ CPU percall ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ DataFrame.groupby            │ 1          │ 0.000       │ 0.000       │ 0          │ 0.000       │ 0.000       │
│ DataFrameGroupBy.__getitem__ │ 1          │ 0.006       │ 0.006       │ 0          │ 0.000       │ 0.000       │
│ SeriesGroupBy.agg            │ 1          │ 0.343       │ 0.343       │ 0          │ 0.000       │ 0.000       │
│ DataFrame.unstack            │ 1          │ 0.034       │ 0.034       │ 0          │ 0.000       │ 0.000       │
└──────────────────────────────┴────────────┴─────────────┴─────────────┴────────────┴─────────────┴─────────────┘
CPU times: user 552 ms, sys: 134 ms, total: 687 ms
Wall time: 670 ms

We see that all the calculations are effectively happening on the GPU when pd.DataFrame.groupby is used, thanks to the cudf.pandas acceleration layer.

Summary#

We have demonstrated how hipDF significantly enhances data manipulation, aggregation, and transformation tasks when these operations are executed on AMD hardware using ROCm. With its GPU-accelerated capabilities, hipDF offers an efficient and high-performance alternative to traditional Pandas operations. The cudf.pandas acceleration layer ensures consistent integration and minimal code changes, providing a unified experience across CPU and GPU environments. Moreover, the cudf.pandas profiler is an invaluable tool for identifying bottlenecks and optimizing performance. By leveraging the advanced hardware of AMD GPUs, hipDF facilitates the processing of complex datasets more efficiently, driving better insights and outcomes.

Disclaimers#

Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.