Introduction
This article is part of a series of blog posts exploring time series databases and TimescaleDB in particular. Today we will see how to get a TimescaleDB instance up and running on Google Cloud.
A time series database is one that is optimized to handle a large volume of chronological data. IoT events or application log data are some such examples of timed datasets. Applications built around these datasets rely on fast inserts and efficient query mechanisms to analyze patterns across various time ranges.
TimescaleDB is built on top of PostgreSQL. This allows you to use a very familiar SQL syntax. It also supports the usual RDBMS semantics including joins, complex predicates and secondary indexes. Be sure to checkout the Getting Started documentation on Timescale site.
Installing TimescaleDB hosted on Google cloud running Ubuntu
We are going to assume here a basic familiarity with Google cloud infrastructure. Article requires that you have a cloud account setup and that you have the GCloud SDK installed your machine.
Startup script for a semi-automated install
gcloud is a command line tool for creating and managing various components on Google Cloud Platform. We will use this to create a virtual machine by passing it a startup shell script to install and configure TimescaleDB. Lets dig into the script.
The startup script is run every single time the VM instance is restarted. To prevent it overwriting our install, we will create a file under /var and use its presence to skip installation on subsequent runs (shout out to Kevin Sookocheff who first used this approach for his Kafka scripts).
1 2 3 4 5 6 7 |
STARTUP_VERSION=1 STARTUP_MARK=/var/startup.script.$STARTUP_VERSION # Exit if this script has already run if [[ -f $STARTUP_MARK ]]; then exit 0 fi |
Next, install TimescaleDB via apt-get
1 2 3 4 5 6 |
# Add in timescale PPA sudo add-apt-repository ppa:timescale/timescaledb-ppa sudo apt-get update # Install timescale sudo apt-get install -y timescaledb |
The script then edits the postgresql.conf file and enables preloading of TimescaleDB libraries and enables PostgreSQL to listen on all IP addresses.
1 2 3 |
sudo sed -i "s/#\(shared_preload_libraries *= *\).*/\1'timescaledb'/" $POSTGRES_CONF sudo sed -i "s/#\(listen_addresses *= *\).*/\1'*'/" $POSTGRES_CONF |
We finish off by restarting the postgresql service and creating our startup mark file.
Note: On Ubuntu, installing PostgreSQL via apt-get auto creates postgres db account. You will notice we skip over this step in the script below.
1 2 3 |
# Add a superuser postgres: # This is not required on Ubuntu as the postgres user account already exists. #sudo createuser postgres -s |
Here isthe complete script (also on GitHub)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
#!/usr/bin/env bash STARTUP_VERSION=1 STARTUP_MARK=/var/startup.script.$STARTUP_VERSION # Exit if this script has already ran if [[ -f $STARTUP_MARK ]]; then exit 0 fi set -o nounset set -o pipefail set -o errexit #Verify this is the correct path for your OS flavor POSTGRES_VER=9.6 POSTGRES_HOME=/etc/postgresql/"$POSTGRES_VER" POSTGRES_CONF="$POSTGRES_HOME"/main/postgresql.conf # Add in timescale PPA sudo add-apt-repository ppa:timescale/timescaledb-ppa sudo apt-get update # Install timescale sudo apt-get install -y timescaledb sudo sed -i "s/#\(shared_preload_libraries *= *\).*/\1'timescaledb'/" $POSTGRES_CONF sudo sed -i "s/#\(listen_addresses *= *\).*/\1'*'/" $POSTGRES_CONF # Restart PostgreSQL instance sudo service postgresql restart # Add a superuser postgres: # This is not required on Ubuntu as the postgres user account already exists. #sudo createuser postgres -s touch $STARTUP_MARK echo Done!!! |
Running the script
Save the script as setup.sh to a local folder and run gcloud
1 |
gcloud compute instances create timescaledb01 --image-family ubuntu-1604-lts --image-project ubuntu-os-cloud --metadata-from-file startup-script=setup.sh --metadata=serial-port-enable=1 --zone=us-west1-a |
This will create a VM instance called timescaledb01 based on ubuntu-1604-lts image. We also enable serial port so we can monitor the logs to see how the script is doing. You may turn it off on the VM Instances page later.
If the script completes successfully it will create a startup.script.1 file under /var. It may take a few minutes for the startup script to finish.
Finishing it up
Setting a password for the database user
ssh into your newly created VM instance and verify that the script has finished successfully.
In order to connect to the server externally we must set the password for our database user. Locally, on the server however, you may login without a password (using peer authentication) and launch psql.
In the ssh terminal type:
sudo -u postgres psql postgres
Next at the psql prompt set the password using the command:
1 |
\password postgres |
Setting up the firewall to allow external connections
First we must configure PostgreSQL for remote access. To do this edit the pg_hba.conf file (under /etc/postgresql/9.6/main in this example). Scroll all the way to the bottom and add the following line:
1 |
host all all [YOUR_IPV4_ADDRESS]/32 md5 |
Replace [YOUR_IPV4_ADDRESS] with your local IP address and ensure that the CIDR suffix /32 is retained.
Restart the PostgreSQL service so that these changes are picked up.
1 |
sudo service postgresql restart |
Next, we create the firewall rule on google cloud portal to allow connections to the server from our local machine.
Set the Targets to “All instances in the network”.- Set the source IP range to your local IPv4 address with /32 CIDR suffix. This must match the IP specified within pg_hba.conf file above
- Set the specified protocols and ports to tcp:5432
Save your changes. You should now be able to connect to your server from your local postgres client – e.g. pgAdmin.
Please try to be more descriptive. What is /var?