Sync Firestore to BigQuery
Introduction
One of the pain points of Avada SaaS is that we have disconnection between app database and data warehouse. We have not yet built a unified data warehouse that we can regularly perform analytics, ETL, reverse ETL actions. Normally, it relies on getting data from an app database directly, which is not a good practice. Furthermore, the Product or the Marketing team find it hard to get the data to send emails, perform marketing campaigns, or even analyze the data for customer 360.
Sync to BigQuery
From Firestore to Big Query will help you with leveraging the data warehouse power and SQL to perform analytics, ETL, reverse ETL actions. However, it is quite a challenge to store dynamic app data because of Big Query's DML with recently streamed data.
With this limit, you cannot update or delete rows that were recently streamed to BigQuery. This limit is 30 minutes. This means that you cannot update or delete rows that were recently streamed to BigQuery. This limit is 30 minutes. So, we can only use an insert-only approach to sync data from Firestore to BigQuery, which can be achieved by using Stream Firestore to BigQuery.
How to set up Firestore to BigQuery
In this post, we will only take about syncing the shops
collections to Big Query, this is the most important collection that we need to sync to Big Query.
You can install many instances of this extensions to sync other collections to Big Query as well.
Step 1: Install the extension
You can install the extension on your Firebase project via Extensions Store. Most important configurations are:
- Collection path:
shops
- Dataset ID:
firestore_export
Or you can change accordingly - Transform function URL (Optional): If you want to omit token fields, format ISO time, then use our prebuilt
transformer:
https://us-central1-avada-fs-bg-transformer.cloudfunctions.net/handler
Otherwise, build your own.
Step 2: Launch the extension
It will take a few minutes to install the extension, around 3-5 minutes. Stay patience. In the meantime, prepare step 3.
Step 3: Sync previous data
Firebase team may recommend you to use the fs-bq-import-collection
extension to sync previous data to Big Query.
However, this extension does not use the Transformer URL configured in the extension configuration. So, you need to build your own script to sync previous data to Big Query.
Put this field in your functions/commands
to sync previous data to Big Query. This is a one-time job, you can only
run this after the extension is installed.
const serviceAccount = require('../../serviceAccount.production.json');
const {formatDateFields} = require('@avada/firestore-utils');
const admin = require('firebase-admin');
const {chunk} = require('@avada/utils');
admin.initializeApp({credential: admin.credential.cert(serviceAccount)});
const db = admin.firestore();
const projectId = `PROJECT_ID`;
const collectionId = `shops`;
const dataset = 'firestore_export';
const table = 'shops_raw_changelog';
const {BigQuery} = require('@google-cloud/bigquery');
/** @type BigQuery */
const bigQueryClient = new BigQuery({
keyFilename: '../../serviceAccount.production.json',
projectId: projectId
});
const bigQueryTable = bigQueryClient.dataset(dataset).table(table);
const shopRef = db.collection('shops');
(async () => {
try {
const shopDocs = await shopRef.get();
const shopData = shopDocs.docs.map(doc => ({
...formatDateFields(doc.data()),
id: doc.id
}));
console.log('Size', shopDocs.size, shopData.length);
const bigQueryData = shopData.map(shop => {
const fieldList = Object.keys(shop);
const keyList = fieldList.filter(key => key.toLowerCase().includes('token'));
const cleanData = omit(shop, [...keyList, 'id']);
return {
timestamp: new Date(),
event_id: '',
document_name: `projects/${projectId}/databases/(default)/documents/${collectionId}/${shop.id}`,
operation: 'IMPORT',
data: JSON.stringify(cleanData),
old_data: null,
document_id: shop.id
};
});
console.log('bigQueryData', bigQueryData.length);
const chunks = chunk(bigQueryData, 1000);
for (const chunk1 of chunks) {
console.log('One chunk at a time', chunk1.length);
await Promise.all(
chunk1.map(row => {
return (async () => {
try {
const response = await bigQueryTable.insert(row);
// console.log('Res', response);
} catch (e) {
console.log(JSON.stringify(e));
}
})();
})
);
}
} catch (e) {
console.error(e);
}
})();
function omit(obj, keys) {
const keysToRemove = new Set(keys); // flatten the props, and convert to a Set
return Object.fromEntries(
Object.entries(obj) // convert the object to entries
.filter(([k]) => !keysToRemove.has(k)) // remove entries with keys that exist in the Set
);
}
Step 4: Query the data
Since the data is stored as JSON, you need to use JSON_EXTRACT_SCALAR
See this example query:
SELECT
JSON_EXTRACT_SCALAR(data, '$.shopifyDomain') AS shopifyDomain,
JSON_EXTRACT_SCALAR(data, '$.email') AS email,
JSON_EXTRACT_SCALAR(data, '$.plan') AS plan
FROM `avada-blog-staging.firestore_export.shops_raw_latest`
WHERE
TIMESTAMP(JSON_EXTRACT_SCALAR(data, '$.installedAt')) > TIMESTAMP('2024-12-27T07:01:02.426Z')