GA4 and BigQuery: Linking Google Analytics 4 with BigQuery
Google Analytics 4 has a number of drawbacks, especially for those who really liked Universal Analytics. But the linking of GA4 with Google BigQuery is objectively one of the benefits. After all, this was a paid feature with Universal Analytics while the GA4 BigQuery linking is free. Let’s check it out in this blog.
This is a bit of a technical topic, so we’re going to break it down into sections to make it a bit more digestible.
BigQuery and GA4 Topic Overview
We’ll cover 4 things in this blog post.
- What is Google BigQuery and what are some of the benefits of using it in conjunction with GA4?
- What is Google Cloud and why do we need it in this GA4 BigQuery process?
- The GA4 BigQuery schema: how is data exported from GA4 to BigQuery?
- How do we integrate GA4 with BigQuery?
If you’re here, I assume you’ve already set up AND configured your GA4 property properly. If you’re not sure that’s you, you might want to check these 10 steps to proper GA4 configuration, including your settings for data retention settings and session timeout, as well as your internal traffic filter.
Great, you’re still here!
Video Walkthrough Support
If you’re interested in the complete video tutorial, you can watch it embedded below. If you’d prefer to read, we’ll reference select portions of the video later in this blog (with timestamps) so you can see the click-by-click steps at certain points. Specifically, we’ll reference the video in creating the Google Cloud project for BigQuery and also when we complete the GA4 and BigQuery linking towards the end of this article.
By the way, this is a guide written on a highly technical subject by someone who is *not* a developer. So if you’re a SQL celebrity or a Google Cloud wizard, you might want to look elsewhere. But if you’re not currently an expert to that level, but *are* generally interested in learning more about topics like Google Analytics 4, Google Tag Manager, Data Studio (now Looker), and maybe even SEO, you have come to the right place.
Now, let’s get technical!
What is Google BigQuery?
According to Google, BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
A data warehouse is a database that can collect your data from various sources like Google Analytics, Google Ads, Facebook Ads, and more. The data warehouse organizes it in a single place from which it can then be queried to answer questions.
So we can see how the platform gets it’s name, right? It’s a massively BIG storehouse of data, that can then be asked questions (QUERIED) to help gain insights about the business. In our case, the raw data from GA4 can be sent into this big data warehouse. We can then transform and manipulate the data to (hopefully) get smarter and make better decisions.
What’s the Big Deal With GA4 and BigQuery?
With Universal Analytics, a link to BigQuery was only available for enterprise users who paid for GA360. Google Analytics 360 starts at $150,000 a year ($12,500 per month). This is a big jump from the zip / zero / nadda dollars that most of us are paying for our access to GA4. The fact that this enterprise feature is available for free has been cause enough for Google Analytics users to perk up and pay attention.
Of course, if it was free and added zero value, that wouldn’t exactly be a great “value proposition.” But BigQuery does offer some solid benefits for those who choose to use it.
What are the Benefits of GA4 and BigQuery?
Here are a few of the benefits:
- You can combine your GA data with data from other sources like CRM systems, email providers, and more. You can even join it with offline sales data, should you have that kind of data for your business. When all of this data is in a single warehouse, you may be able to find patterns and insights that you would miss if you were analyzing each dataset in isolation.
- There is no data sampling to worry about since the raw data from GA4 is sent directly to BigQuery. Sampling is when only a subset of data is analyzed, and that subset is then used to represent the broader whole. Custom reports in GA4 can be subject to data sampling. No sampling here.
- BigQuery can be asked questions with SQL queries that allow for high-level analysis. You can create reports with any number and combinations of metrics.
- You can connect your BigQuery data with visualization platforms like Tableau or Google Data Studio (now called Looker). It’s true that you can also send your GA data to Looker directly from GA4 (here are some tutorials of cool reports you can build like a site search report or a report on all link clicks), but there are more metrics and dimensions available from BigQuery.
So there’s lots of potential upside, and no fixed cost to get started. It should be noted that there can be a cost for on demand queries, but this is likely not a factor for most small websites.
As you can see below from this Google Cloud pricing overview, the price for data processing in the US is $5 per terabyte. How much is a terabyte? It is one trillion bytes, or 1,024 gigabytes. So a gigabyte costs a little less than $.05.
So go nuts. Go analyze a gig’s with of data you crazy spender, you. There is also a potential charge for active storage. The first 10 gigabytes are free each month. After that, it costs $0.02 for each gig of active storage.
What is Google Cloud and What Role Does it Play?
The Google Cloud Platform is a suite of cloud computing services offered by Google. Competitors include Amazon AWS, Microsoft Azure, and IMB Cloud. The way to Google BigQuery is through the Google Cloud Platform. Fortunately, this is not overly onerous. You need to create a Google Cloud account (if you don’t have one), and then create something called a “project” for Google BigQuery.
To create a Google Cloud account, all you need is an email address. You can create an account here, if you don’t have one already. From there, you’ll need to create your project for BigQuery.
What is a Google Cloud Project and Why Do I Need One for BigQuery?
A project organizes resources in Google Cloud. All data that exists in Google Cloud Storage belongs to a project. You will need to create a project for Google BigQuery in order to have a place to send your GA4 data.
A project consists of a set of users; a set of APIs; and billing, authentication, and monitoring settings for those APIs. An API is an “Application Programming Interface.” It’s basically a mechanism that enables two different pieces of software to communicate. In our case, the two pieces of software are BigQuery and GA4.
If you haven’t yet watched the video at top, jump ahead to the 5:12 mark to walk through how to create your BigQuery project in Google Cloud. You’ll ultimately get to something that looks like the screen below, where you can select “New Project” (see the top right) and create your Project Name for BigQuery and get a corresponding Project ID.
The Project Name and Project ID from this step, will be the same credentials that are used to link your BigQuery project with GA4 later in this guide. But before we get ahead of ourselves, we need to understand how our GA4 data will be exported to BigQuery. This is called the “GA4 Export Schema.”
GA4 BigQuery Schema
We can get a picture of how the GA4 to BigQuery export schema looks with 4 terms: dataset, tables columns, and rows.
In brief, the dataset is the entirety of the GA4 data. The table is how the data is exported to BigQuery each day. Each row contains a single event and the columns help to describe the event.
By now, you have probably heard that GA4 has an “event based data model.” This means that all “hits” (web interactions) are recorded as events and corresponding event parameters. This compares to a session and page view model in Universal Analytics. Check out the stark difference in the table below. Understanding this fundamental difference helps to make sense of the export schema. Read on below, or if this paragraph confused you, you may want to read more about GA4 vs. UA or about GA4 events.
Alright, let’s do this export thing with datasets, tables, columns, and rows. What do we need to know?
What’s a Dataset in BigQuery?
For each Google Analytics 4 property that is linked to BigQuery, a single dataset named “analytics_<property_id>” is added to your BigQuery project. This “Property ID” refers to your GA4 Property ID that you see in your Data Stream in Admin.
What’s a Table in the BigQuery Export Schema?
Within each GA4 dataset, a table is created at least once a day. Why at least? Well, you have two options for how your raw data is exported.
Option 1 is to choose the “daily export option.” With the daily approach, a table of all your GA4 data is created and exported to BigQuery at the end of each day. The table’s name is events_YYYYMMDD, where those Y’s, M’s, and D’s are replaced by the actual date.
Option 2 is to choose the “streaming export option.” With this option, a table named events_intraday_YYYYMMDD is created and it is updated throughout the day as new data is recorded. At the end of each day, your intraday table is deleted and is replaced by the same full day table explained in Option 1.
Which one should you do? If you need to analyze your data as soon as it is happening, then consider the streaming export. Personally, that is a degree of intensity that I don’t have and I went with the daily export.
The table of exported data contains data in both Columns and Rows.
What’s a Column in the BigQuery Export Schema?
The Columns with the export fall into the following categories: App, Device, Stream and platform, User, Campaign, Geo, Event, Ecommerce, Items, Web, and Privacy Info. Check out the Event related elements in the export schema below. You can find the complete export schema from Google here.
These different data types (strings, integers, records, and more) are all eligible to show up as Columns in the export.
What’s a Row in the BigQuery Export Schema?
Remember when we said how the GA4 data model is “event based”?
Each row in the exported table corresponds to a single event. Each event has information on event-specific parameters with associated values. Here’s an example below for a page view event (page_view). The event_date and event_timestamp are in the yellow box. In the blue boxes we can see the event parameters (“event_params”), in a pair of a key and a corresponding value. The parameter name (the key) is a constant and the value is a variable that changes with the specific event. For example, the source for this particular event is “google”, but other events will doubtless have the same “source” event parameter with values like “bing” or “facebook” or “duckduckgo.”
How to Link GA4 With BigQuery?
By this time, you have your Google Cloud project for BigQuery. It’s basically just a shell at this point, but that’s fine. Hopefully, you also understand the GA4 BigQuery export schema so you have a framework for all this data and so you can make an informed decision about the “daily” vs. “streaming” export option. This is good news.
From here, it’s best to follow along with the video above. But if you’ve avoided it until now, the good news is you can skip the first 11:29 and go right to 11:30 where we’ll pick things up. You’re only about 7 minutes away of video time from the end.
To start this final phase, you’re going to get started by going to the Admin section of your GA4 property. In the property settings in the center of your screen, you should see “BigQuery Links.” Check it out below and then select it.
And this is what it will look like once you have linked in your BigQuery project.
Once you’re done, you’ll just need to learn how to query your data so you can see how much this GA4 feature will really help you. That’s what I’m going to be working on learning. There will be a subsequent blog if and when I figure that out.
If you’re interested in more about GA4 reporting functionality, some key resources are highlighted below.
What Other Reporting Features Are Useful in GA4?
Reporting in GA4 is quite different from UA. Not only are Views a thing of the past in GA4, but many other elements are either completely new or quite different.
Here are several resources that should make things easier for you. Feel free to check out the linked blogs and video tutorials.
Event Parameters in GA4
A “parameter” is a piece of data that adds additional context to a web interaction (an “event”). For example, an internal link click might be called internal_link_click event. The event will register and record data each time the event occurs. But if you want to see additional information like the specific link that was clicked or the text of the link that was clicked, you’re talking about event parameters.
Event parameters are handled very differently in GA4 than in UA. And any comprehensive understanding of reporting requires understanding GA4 event parameters.
Site Search Tracking in GA4
Site search is one of the “enhanced measurement” events in GA4 that is tracked automatically in GA4. If you want to see the specific terms that are searched on your site in addition to the total count of searches, you’re talking once again about those pesky event parameters. This guide to GA4 site search will show you to build a quick report and get useful insights about what your visitors are looking for on your site.
Path Analysis in GA4
The Path Exploration report in GA4 is one of the new reports that is available within the “Explore” section of GA4. Path Exploration compares to the Behavior Flow report in UA and represents a big upgrade for GA4. Unlike in UA, the Path Exploration report works based on either a starting point OR an ending point. Here’s how to use it.
Campaign Reporting With UTMs in GA4
Do you know how to access your UTM tagged campaign traffic in GA4? This is necessary to track how your marketing campaigns are performing. It’s also useful for measuring traffic mediums (like email) that Google Analytics can’t classify on its own. New parameters like utm_content and utm_term were only added in July of 2022. In other words, effective campaign reporting is a relatively new feature in GA4. Check out the guide to Google Analytics 4 UTM parameters here.
Looking for something else? Feel free to leave a note in the comments or on the Root and Branch YouTube channel at youtube.com/@rooted-digital. We’re always looking for new content ideas!
Don’t forget that Universal Analytics data will no longer be processed in that platform beginning July 1, 2023. In other words, now is the time to get more comfortable with GA4.
If you’re still someone learning about GA4 (as I am), I’d recommend checking out this GA4 vs. UA comparison or this list of updated GA4 questions. You can also subscribe to the Root and Branch YouTube channel for an updated video every week or so. I’ll see you there! There are explainers and tutorials for tracking like this.
- GA4 page timer tracking
- Bounce rate in GA4 vs UA
- How to set up a GA4 form submission conversion
- How to see UTM tagged campaign data in GA4
- How to create the Source/Medium traffic report in GA4
- How to set up a custom dimension in 7 steps (and why you need to if you want to see event parameter data)
- Goals in GA4 explained vs UA
- How to link Google Ads and GA4
- How to link Google Search Console and GA4
About Root & Branch
Root & Branch is a certified Google Partner agency and focuses on paid search (PPC), SEO, Local SEO, and Google Analytics. You can learn more about us here. Or hit the button below to check out YouTube for more digital marketing tips and training resources.
Thanks for such a detailed article, I enjoyed reading it! We also extract our GA4 data to BigQuery and it streamlines our reporting a lot. We use Coupler.io tool for data export. The biggest advantage of this tool is that it can pull our data on schedule we need. Here’s a link to learn more about it https://www.coupler.io/bigquery-integrations/ga4-to-bigquery in the case someone also needs something easy and fully automated.