Automate Firestore DB Backups

How to automatically create Cloud Firestore database backups at regular intervals.

I can't emphasize enough how important it is to back up your application data. More and more applications rely on the cloud for all their data needs, and with that shift comes a false assumption: that our data is safe and easily recoverable simply because it lives in the cloud. The truth is that using cloud infrastructure doesn't by itself give you a disaster recovery plan. It's our own responsibility to keep our data safe and, when something fails, to be able to recover as much of it as we can. Incidents like the recent AWS disaster, which left many applications with massive data losses, only make the case stronger.

Luckily, if you use Firebase Cloud Firestore as your data source, you can back up your entire database from the command line with the gcloud tool by following the official instructions. That is still a manual process, though, and in this era of automation we would like the backup to happen on its own, so we can have peace of mind and spend our time and effort building great applications. Let's see how to achieve this automation in a few simple steps and a bit of code.


Some background to facilitate the automation

Steps

1. Create a folder inside your Firebase storage bucket, e.g. firestore_backups. This folder will hold all the backups that the setup creates automatically at regular intervals.

2. Grant the Cloud Datastore Import Export Admin role to the App Engine default service account of your project. Complete this step before moving on to the next ones, or the backup process will fail.

3. Run npm install google-auth-library in the functions folder of your Firebase project and create a file backup.js with the following code.

/**
 * For the backup to work, make sure the following is done:
 *
 * - Set permission: https://console.cloud.google.com/iam-admin/iam
 *    `Datastore -> Cloud Datastore Import Export Admin`
 *    on the service account IAM role
 * - Set permission: https://console.cloud.google.com/storage/browser
 *    `Firebase Admin and Storage Admin`
 *    on the service account
 * - Create a bucket folder with the same name as
 *    the value of BACKUP_FOLDER variable
 */

const {GoogleAuth} = require('google-auth-library')

// process.env.FIREBASE_CONFIG is automatically populated in the
// cloud functions runtime, however we need to parse it, as a json string
const FIREBASE_CONFIG = JSON.parse(process.env.FIREBASE_CONFIG || '{}')
// the folder that was created in the default storage bucket
const BACKUP_FOLDER = 'firestore_backups'

module.exports = async function backup() {
  console.log('start firebase backup')

  const auth = new GoogleAuth({
    scopes: [
      'https://www.googleapis.com/auth/datastore',
      'https://www.googleapis.com/auth/cloud-platform',
    ],
  })
  const client = await auth.getClient()

  const {storageBucket, projectId} = FIREBASE_CONFIG
  const url = `https://firestore.googleapis.com/v1/projects/${projectId}/databases/(default):exportDocuments`
  // the export is written as a folder under this prefix, so no file extension
  const backupFolderName = new Date().toISOString()
  const backupUrl = `gs://${storageBucket}/${BACKUP_FOLDER}/${backupFolderName}`

  await client.request({
    url,
    method: 'POST',
    data: {outputUriPrefix: backupUrl},
  })

  console.log('end firebase backup')
}

backup.js file inside functions folder
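
If you want to check the request before it is ever sent, the URL and body can be assembled by a small pure function. The following is only a sketch: buildExportRequest and its parameters are hypothetical names that are not part of the code above, it assumes the v1 endpoint, and it uses the optional collectionIds field of the export request to limit the backup to certain collections (leaving it out exports everything).

```javascript
// Hypothetical helper (not from the article): builds the URL and body
// of a Firestore export request. Assumes the v1 REST endpoint.
function buildExportRequest({projectId, storageBucket, folder, collectionIds = [], now = new Date()}) {
  if (!projectId || !storageBucket) {
    throw new Error('projectId and storageBucket are required')
  }
  const url = `https://firestore.googleapis.com/v1/projects/${projectId}/databases/(default):exportDocuments`
  // outputUriPrefix is a prefix (effectively a folder), not a single file
  const data = {outputUriPrefix: `gs://${storageBucket}/${folder}/${now.toISOString()}`}
  // only send collectionIds when a subset was requested
  if (collectionIds.length > 0) {
    data.collectionIds = collectionIds
  }
  return {url, method: 'POST', data}
}

// Example values below are placeholders
const exampleRequest = buildExportRequest({
  projectId: 'my-project',
  storageBucket: 'my-project.appspot.com',
  folder: 'firestore_backups',
  collectionIds: ['users'],
  now: new Date('2020-01-01T00:00:00.000Z'),
})
console.log(exampleRequest.data.outputUriPrefix)
// prints "gs://my-project.appspot.com/firestore_backups/2020-01-01T00:00:00.000Z"
```

The returned object has the same shape as the argument passed to client.request in backup.js, so it could be handed over as is.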

4. Create a scheduled cloud function by adding the following code in your index.js file.

const functions = require('firebase-functions')
const backup = require('./backup')

// runs every midnight
exports.dbBackup = functions.pubsub
  // change this to preferred frequency 
  .schedule('every day 00:00')
  // set it to whatever timezone you prefer
  .timeZone('Europe/Berlin')
  .onRun(async context => {
    try {
      await backup()
    } catch (err) {
      console.error('error running db backup cron', err)
    }
  })

index.js file inside functions folder
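
Note that the handler above catches the error and only logs it, so a failed backup still ends as a normal run of the function. If you would rather have the failure surface to the runtime as well, one option is to log and then rethrow. This is a minimal sketch of that idea, not the way the article's code works: reportFailures is a hypothetical helper, and failingBackup is a stand-in for the real backup() so the snippet runs on its own.

```javascript
// Hypothetical wrapper (not part of the article's code): logs a failure
// and rethrows it instead of swallowing it.
function reportFailures(task, logger = console) {
  return async function run(...args) {
    try {
      return await task(...args)
    } catch (err) {
      logger.error('error running db backup cron', err)
      throw err // let the caller see that the run failed
    }
  }
}

// Stand-in for the real backup() so the sketch is self-contained
const failingBackup = async () => {
  throw new Error('export request rejected')
}

reportFailures(failingBackup)().catch(err => {
  console.log('run failed:', err.message)
})
```

In index.js this would be used as .onRun(reportFailures(backup)) in place of the try/catch block.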

5. Deploy the function with firebase deploy --only functions. This deploys the cloud function and also creates the Pub/Sub trigger that Cloud Scheduler uses to run the backup function at the specified interval.

Voilà! We have our automated Cloud Firestore database backup :)

To check that everything works as expected, you can trigger the scheduled function manually: go to the Cloud Scheduler settings in the Google Cloud console and click the Run Now button next to the function name. Wait until the dbBackup function has finished executing (check the cloud functions logs in the Firebase console). If everything went well, you'll see a new folder inside the firestore_backups folder of your Firebase storage bucket. The folder is named after the timestamp at which the backup started and contains the backup of your entire Cloud Firestore database as of that moment.
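
One thing to keep in mind while checking the logs: the export runs as a long-running operation, so the request returns an operation object right away while the export itself continues in the background. Logging that object makes the run easier to trace. The helper below is a hypothetical sketch (describeOperation is not part of backup.js) that assumes the usual operation fields name, done and error; the operation name in the example is a placeholder.

```javascript
// Hypothetical helper: turns a long-running operation object into a log line.
// Assumes the generic operation fields name, done and error.
function describeOperation(operation) {
  if (!operation || !operation.name) {
    return 'no operation returned'
  }
  if (operation.error) {
    return `${operation.name}: failed (${operation.error.message})`
  }
  return operation.done
    ? `${operation.name}: finished`
    : `${operation.name}: still running`
}

// Placeholder operation, shaped like the body of the export response
const exampleOperation = {name: 'projects/my-project/databases/(default)/operations/abc'}
console.log(describeOperation(exampleOperation))
// prints "projects/my-project/databases/(default)/operations/abc: still running"
```

In backup.js, the object to pass in would be the data property of the response returned by client.request.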


If you run into issues or spot any mistakes in this article, please feel free to comment and I'll do my best to help you out or correct the mistakes.

Happy Coding!