Examining Google Analytics GA4 Data with Python and Pandas.


Welcome to my beginner-friendly tutorial on analyzing Google Analytics 4 data using Python and Pandas. In this guide, we’ll walk you through a simple Python script that fetches data from a Google Analytics property and shows you how to analyze it using the Pandas library.

[Image: Pandas bar chart]

Prerequisites

Before we begin, make sure you have the following:

  • Google Analytics 4 Property ID: Replace GA4_PROPERTY_ID in the code with your actual property ID.
  • Google Analytics Data API Credentials: Create a service account on the Google Cloud Console and download the JSON key file. Update CREDENTIALS_FILE_PATH in the code with the path to your key file.

Setup

Let’s set up the environment. Open your terminal and install the required packages:

pip install google-analytics-data
pip install pandas

Unlocking the Heart of the Code

In simple terms, the Python script interacts with Google Analytics using the Google Analytics Data API. It fetches information about user activity on your website and converts it into a table, making it easy to understand and work with.

The first thing we need to do is create the BetaAnalyticsDataClient. This is done by passing it credentials loaded from the service account key file:

client = BetaAnalyticsDataClient(credentials=credentials)
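The line above assumes a credentials object already exists. Here is a minimal sketch of one way to create it, assuming the google-auth package (which I expect is pulled in as a dependency of google-analytics-data) and the read-only Analytics scope. The helper name build_client is hypothetical, it is not part of the library.

```python
# Read-only scope for Google Analytics data.
GA4_SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]


def build_client(credentials_file_path):
    """Create a BetaAnalyticsDataClient from a service account JSON key file.

    build_client is a hypothetical helper name used for illustration only.
    """
    # Imported here so the helper is self-contained.
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        credentials_file_path, scopes=GA4_SCOPES
    )
    return BetaAnalyticsDataClient(credentials=credentials)
```

You would then call it with the path from the prerequisites, for example client = build_client(CREDENTIALS_FILE_PATH).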

Then we build our request by adding the date range, metrics, and dimensions to a RunReportRequest:

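from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest

# property_id holds your GA4 property ID (see the prerequisites)
dates = [DateRange(start_date="7daysAgo", end_date="today")]
metrics = [Metric(name="activeUsers")]
dimensions = [Dimension(name="city")]
request = RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=dimensions,
    metrics=metrics,
    date_ranges=dates,
)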

Then we can run the report using the client we built.

response = client.run_report(request)

Convert GA4 response to Pandas DataFrame

Pandas can build a DataFrame from a two-dimensional list, where each inner list holds the values for one row. Here is an example of the data:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

We also need a list of the column names, in this case the names of the dimensions and metrics in the report. Here is an example:

['city', 'activeUsers']

Note that the column names are a single flat list, not a nested list like the rows.
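To make the two shapes concrete, here is a small sketch that builds a DataFrame from a nested list of rows and a flat list of column names. The city names and user counts are made-up sample values, not real report data.

```python
import pandas as pd

# Each inner list is one row: [city, activeUsers].
# These values are made up for illustration.
rows = [["Copenhagen", "12"], ["Aarhus", "5"]]

# One flat list with a name for every value in a row.
columns = ["city", "activeUsers"]

df = pd.DataFrame(rows, columns=columns)
print(df)
```

Each row needs exactly as many values as there are column names.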

Converting the Google Analytics GA4 response into these two lists is simply a matter of looping through it with a for loop.

# data is the response returned by client.run_report(request)
# Grab all the columns in the result
columns = []
for col in data.dimension_headers:
    columns.append(col.name)
for col in data.metric_headers:
    columns.append(col.name)

Then loop through each row of the response to build the list of values for that row.

rows = []
for row_data in data.rows:
    row = []
    for val in row_data.dimension_values:
        row.append(val.value)
    for val in row_data.metric_values:
        row.append(val.value)
    rows.append(row)

Then all you need to do is load it into Pandas as a DataFrame:

import pandas as pd
df = pd.DataFrame(rows, columns=columns)
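One thing to watch for: the API returns every value as a string, including the metrics, so sums and sorting will not behave as numbers until you convert them. Below is a sketch that wraps the steps above in one function and converts the metric columns with pd.to_numeric. The function name response_to_dataframe is my own, and the fake_response object is a hypothetical stand-in with made-up values so the example runs without calling the API.

```python
from types import SimpleNamespace

import pandas as pd


def response_to_dataframe(data):
    """Convert a run_report-style response into a DataFrame with numeric metrics."""
    dimension_names = [header.name for header in data.dimension_headers]
    metric_names = [header.name for header in data.metric_headers]

    rows = [
        [v.value for v in row.dimension_values] + [v.value for v in row.metric_values]
        for row in data.rows
    ]
    df = pd.DataFrame(rows, columns=dimension_names + metric_names)

    # Metric values arrive as strings, so convert those columns to numbers.
    for name in metric_names:
        df[name] = pd.to_numeric(df[name])
    return df


def _row(city, users):
    # Hypothetical helper that mimics the shape of one response row.
    return SimpleNamespace(
        dimension_values=[SimpleNamespace(value=city)],
        metric_values=[SimpleNamespace(value=users)],
    )


# Stand-in for a real response, with made-up values.
fake_response = SimpleNamespace(
    dimension_headers=[SimpleNamespace(name="city")],
    metric_headers=[SimpleNamespace(name="activeUsers")],
    rows=[_row("Copenhagen", "12"), _row("Aarhus", "5")],
)

df = response_to_dataframe(fake_response)
print(df.dtypes)
```

With a real report you would pass the response from client.run_report(request) instead of fake_response.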

Further Analysis with Pandas

Now, the cool part! We use Pandas to dig deeper into the data. Pandas is like a superhero for tables in Python. You can filter, sort, and analyze the data in various ways. We’ve provided some examples in the script.
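Here is a short sketch of the kind of filtering and sorting you might try. It uses a small hand-made DataFrame with made-up numbers in place of a real report, and it assumes the activeUsers column has already been converted to numbers.

```python
import pandas as pd

# Made-up sample data standing in for a real report.
df = pd.DataFrame(
    {
        "city": ["Copenhagen", "Aarhus", "Odense", "(not set)"],
        "activeUsers": [12, 5, 3, 7],
    }
)

# Filter: drop rows where the city is unknown.
known = df[df["city"] != "(not set)"]

# Sort: most active cities first.
ranked = known.sort_values("activeUsers", ascending=False).reset_index(drop=True)

# Keep only the top two cities.
top_two = ranked.head(2)

# Analyze: each city's share of the active users in the ranked table.
share = ranked.assign(share=ranked["activeUsers"] / ranked["activeUsers"].sum())

print(top_two)
print(share)
```

From here you could swap in other dimensions and metrics from your own report and apply the same steps.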

Conclusion

Analyzing Google Analytics data with Python and Pandas is like unlocking the secrets of your website’s performance. This tutorial gives you a taste of how to fetch and understand the data. Don’t hesitate to explore more with different dimensions and metrics to get insights tailored to your needs.

About Linda Lawton

My name is Linda Lawton. I have more than 20 years of experience working as an application developer and a database expert. I have also been working with Google APIs since 2012, and I have been contributing to the Google .Net client library since 2013. In 2013 I became a Google Developer Expert for Google Analytics.
