Welcome to my beginner-friendly tutorial on analyzing Google Analytics 4 data using Python and Pandas. In this guide, we’ll walk you through a simple Python script that fetches data from a Google Analytics property and shows you how to analyze it using the Pandas library.
Prerequisites
Before we begin, make sure you have the following:
- Google Analytics 4 Property ID: Replace GA4_PROPERTY_ID in the code with your actual property ID.
- Google Analytics Data API Credentials: Create a service account on the Google Cloud Console and download the JSON key file. Update CREDENTIALS_FILE_PATH in the code with the path to your key file.
Setup
Let’s set up the environment. Open your terminal and install the required packages:
pip install google-analytics-data
pip install pandas
Unlocking the Heart of the Code
In simple terms, the Python script interacts with Google Analytics using the Google Analytics Data API. It fetches information about user activity on your website and converts it into a table, making it easy to understand and work with.
The first thing we need to do is create the BetaAnalyticsDataClient. This is done by passing it the credentials loaded from your key file.
client = BetaAnalyticsDataClient(credentials=credentials)
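The line above assumes a credentials object already exists. Here is a minimal sketch of one way to create it from the JSON key file, using the service account helper from the google-auth package (which is installed alongside google-analytics-data). The function name build_client is made up for this example and is not part of the original script.

```python
def build_client(credentials_file_path):
    """Create a Data API client from a service account JSON key file."""
    # The imports live inside the function so this sketch can be loaded even
    # before the packages from the Setup section have been installed.
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.oauth2 import service_account

    # Read the key file downloaded from the Google Cloud Console.
    credentials = service_account.Credentials.from_service_account_file(
        credentials_file_path
    )
    return BetaAnalyticsDataClient(credentials=credentials)


# Hypothetical usage, with CREDENTIALS_FILE_PATH being the placeholder
# described in the prerequisites:
# client = build_client(CREDENTIALS_FILE_PATH)
```

Wrapping the setup in a function is only a convenience. The two lines inside it work just as well at the top level of a script.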
Then we build our request by adding the dates, metrics, and dimensions to a RunReportRequest.
from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest
dates = [DateRange(start_date="7daysAgo", end_date='today')]
metrics = [Metric(name='activeUsers')]
dimensions = [Dimension(name='city')]
request = RunReportRequest(
    property=f'properties/{property_id}',
    dimensions=dimensions,
    metrics=metrics,
    date_ranges=dates,
)
Then we can run the report using the client we built.
response = client.run_report(request)
Convert the GA4 response to a Pandas DataFrame
The data for a Pandas DataFrame takes the form of a two-dimensional array, where each row is itself an array of values. Here is an example of the data:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Then we also need an array of the column names, in this case the names of the dimensions and metrics in the report. Here is an example:
['city', 'activeUsers']
Note that the column names are a single flat array, not a multidimensional array like the rows.
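The two examples above illustrate the shapes separately, so their sizes do not line up. In a real report, each row needs exactly one entry per column name. A small sketch with made-up city values shows a matching pair loaded into a DataFrame:

```python
import pandas as pd

# Made-up values, only to show the shape: one inner list per row,
# and one entry in each row per column name.
rows = [["London", "120"], ["Paris", "85"], ["Berlin", "42"]]
columns = ["city", "activeUsers"]

df = pd.DataFrame(rows, columns=columns)
print(df)
```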
Converting the Google Analytics GA4 response into these two arrays is simply a matter of looping through it with a for loop.
# The report returned by client.run_report(request)
data = response
# Grab all the columns in the result
columns = []
for col in data.dimension_headers:
    columns.append(col.name)
for col in data.metric_headers:
    columns.append(col.name)
Then we loop through each row to build its array of values.
rows = []
for row_data in data.rows:
    row = []
    for val in row_data.dimension_values:
        row.append(val.value)
    for val in row_data.metric_values:
        row.append(val.value)
    rows.append(row)
Then all you need to do is load it into Pandas as a DataFrame.
import pandas as pd
df = pd.DataFrame(rows, columns=columns)
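The conversion steps can also be gathered into one helper. The sketch below is one possible arrangement, not the original script: the name response_to_dataframe is made up, and the fake response built with SimpleNamespace is only a stand-in so the example runs without calling the API. It also turns the metric columns into numbers, because the Data API returns every value as a string.

```python
from types import SimpleNamespace

import pandas as pd


def response_to_dataframe(response):
    """Turn a run_report style response into a DataFrame."""
    dimension_names = [header.name for header in response.dimension_headers]
    metric_names = [header.name for header in response.metric_headers]

    # One list per row: dimension values first, then metric values.
    rows = [
        [v.value for v in row.dimension_values]
        + [v.value for v in row.metric_values]
        for row in response.rows
    ]
    df = pd.DataFrame(rows, columns=dimension_names + metric_names)

    # Values arrive as strings, so convert the metric columns to numbers.
    for name in metric_names:
        df[name] = pd.to_numeric(df[name])
    return df


def _value(text):
    # Stand-in for a single dimension or metric value (made-up data).
    return SimpleNamespace(value=text)


fake_response = SimpleNamespace(
    dimension_headers=[SimpleNamespace(name="city")],
    metric_headers=[SimpleNamespace(name="activeUsers")],
    rows=[
        SimpleNamespace(dimension_values=[_value("London")],
                        metric_values=[_value("120")]),
        SimpleNamespace(dimension_values=[_value("Paris")],
                        metric_values=[_value("85")]),
    ],
)

df = response_to_dataframe(fake_response)
print(df)
```

With a real report you would pass the object returned by client.run_report(request) instead of fake_response.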
Further Analysis with Pandas
Now, the cool part! We use Pandas to dig deeper into the data. Pandas is like a superhero for tables in Python. You can filter, sort, and analyze the data in various ways. We’ve provided some examples in the script.
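To give a flavour of what that looks like, here is a small sketch of sorting, filtering, and summarising. It uses made-up numbers in place of a real report, and the threshold of 50 users is arbitrary. It is not taken from the original script.

```python
import pandas as pd

# Made-up data in the same shape as the city report above.
df = pd.DataFrame({
    "city": ["Paris", "London", "Madrid", "Berlin"],
    "activeUsers": [85, 120, 8, 42],
})

# Sort: cities with the most active users first.
top = df.sort_values("activeUsers", ascending=False)

# Filter: keep only cities with at least 50 active users.
busy = df[df["activeUsers"] >= 50]

# Summarise: total users, and each city's share of that total.
total = df["activeUsers"].sum()
df["share"] = df["activeUsers"] / total

print(top)
print(busy)
print(df)
```

If the metric column still holds strings straight from the API, convert it with pd.to_numeric first. Otherwise sorting and comparisons will not behave numerically.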
Conclusion
Analyzing Google Analytics data with Python and Pandas is like unlocking the secrets of your website’s performance. This tutorial gives you a taste of how to fetch and understand the data. Don’t hesitate to explore more with different dimensions and metrics to get insights tailored to your needs.