DIGITAL ANALYTICS MINIDEGREE / CXL — BLOG 8

Mert Kolay
4 min readMay 16, 2021

In the last blog post, I talked about the Google Data Studio and it’s capabilities in detail. In this blog post, we will explore Google Big Query with the eighth part of CXL Digital Analytics Minidegree.

You can go to the CXL website from here.

Why should you take this lesson and learn BigQuery?

  • Consolidate all your data in one place — no more complexity
  • Automate your marketing reports — break the routine
  • Develop your skills to become a data-driven marketer
  • Learn more about Google Cloud and it’s possibilities
  • Discover the full analytics process from data collection to data visualisation

Introduction to Google Cloud Platform and BigQuery

Google BigQuery is a cloud-based big data analytics web service for processing very large read-only data sets.

BigQuery was designed for analyzing data on the order of billions of rows, using SQL-like syntax. It runs on the Google Cloud Storage infrastructure and can be accessed with a REST-oriented application program interface (API).

In this step let’s understand the Google Cloud Platform.

Google Cloud Platform is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, file storage, and YouTube.

Broadly speaking, GCP can be used for four things:

1. Computing: App Engine, Compute Engine, Container Engine.

2. Storage: BigTable, Cloud Storage, Cloud SQL, Cloud Database.

3. Big Data: BigQuery, Pub/Sub, Dataflow, Dataproc, DataLab

4. Machine Learning: Vision API, Machine Learning, Speech and Translate API.

BigQuery can be used for data collection, data consolidation, data transformation, data warehousing and then running analytics and creating visualization models on top of that data. You can put every data you can imagine to BigQuery.

Getting Data Into BigQuery

There are different ways of getting the data into BigQuery:

a. BigQuery Data Transfer Service.

b. Google Analytics 360

c. Cloud Storage

d. Google Cloud Dataflow

e. Google Cloud Function

You can also load data from a readable data source:

  • You can load data manually
  • You can use API or client libraries

API is a set of clearly defined methods of communication among various components-applications, software.

Also, certain connectors can be used as well. For example, there is a connector for Microsoft Excel, there are other connectors like Spark BigQuery Connector and Hadoop BigQuery Connector as well. You can use data transfer for loading data from Google and Amazon sources as well. This can be accomplished by going in the transfer section in BigQuery and selecting source type wherein you would have to fill in all the data source and data destination information.

Streaming Into BigQuery

  • Create cloud pub/sub topic
  • Read from pub/sub in dataflow
  • Compute widowed aggregates
  • Stream into BigQuery

For real-time reporting, you can use BigQuery Streaming services to Load data immediately without delay. There are three ways of accomplishing it:

a. Create your custom solution.

b. By Using Google solutions like Google Dataflow.

c. Using some sort of paid solutions

Working With Data In BigQuery

The simplest way of working with data in BigQuery is by using SQL. BigQuery supports SQL for querying and working with the data. In this section we will talk about some basics of SQL that you can use at BigQuery.

Each SQL query must contain at least two parameters:

  • SELECT — defines the columns you would like to pull
  • FROM — defines the table to pull them from

The syntax will look like:

SELECT column1, column2, …

FROM table_name;

  • WHERE is used to filter data
  • ORDER BY is used to order the results
  • GROUP BY is used to calculate aggregate totals

Joins

Join is the widely-used clause in the SQL Server essentially to combine and retrieve data from two or more tables. In a real-world relational database, data is structured in a large number of tables and which is why, there is a constant need to join these multiple tables based on logical relationships between them. There are four basic types of Joins in SQL Server — Inner, Outer (left, right, full), Self and Cross join.

(INNER) JOIN: Returns records that have matching values in both tables

LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table

RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table

FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table

Analyzing Data in BigQuery

There are two important concepts that we need to understand.

1. Nested Fields: A ReQL document is a JSON object: a set of key-value pairs, in which each value might be a single value, a list of values, or another set of key-value pairs. When the value of a field contains more fields, we describe these as nested fields.

2. Partitioned Tables: A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

There are some connectors that you can use to visualize the data using BigQuery

  • Tableau
  • QlikView
  • Power BI
  • OWOX

Google Sheets can be connected with the BigQuery and with this way you can automitize your data according to you. Also you can connect Google Data Studio and make your automation.

Best Practices

  • Views is when you create a query and you save the view, it means like a virtual table that you can query afterwards.
  • HAVING Works with your data after all the calculations you performed.
  • Before HAVING you need to make a GROUP.
  • UNION ALL SELECT all the fields from table two.
  • WITH clause is used when you want to perform some operations or some queries with another query inside.

BigQuery is a powerful tool that can be used for powerful analytics for small and big data. Also, it can be scaled for petabytes of data. The possibilities are endless. You can create a BigQuery account and make your own practices.

See you next week with a new blog post containing details from the CXL digital analytics mini degree program.

Thanks,

Mert Kolay

--

--