This is part of a series on my ambient data logging and visualisation setup.

I’ve already given an overview of the entire architecture, and written about setting up the Raspberry Pi Pico(s) as sensor devices. Now I’m going to explain how I collect that data in a CSV file and a Postgres database.

Dumb sensors

The sensors themselves are designed to be quite dumb. They don’t log data locally, because it’s easier to access the data if it’s stored centrally. They don’t proactively post data, because it’s more convenient to determine the frequency of readings centrally. You don’t want to be synchronising data or changing the logging frequency by downloading and uploading files to each of your sensors.

The downside is that if data doesn’t get collected, it doesn’t exist: if the connection is broken, those readings are simply lost. This is why we need this part of the system: something has to query the sensors regularly and record the responses.

Docker

The code is designed to be run with Docker (or potentially another container solution). This is why (at the moment) there is no way to specify, e.g. as a command-line argument, where to store the CSV file, or where to get your settings from. You simply mount the folder where these files exist when running the container.

There is, however, a script in the repository for running the container. You can specify the data directory and the cache directory (more on that later) as arguments.

No timer included

This script only takes one reading from each of the sensors, logs them, then it’s done. So the way to take regular readings is to use a cron job or systemd service to fire up the container as often as you want.

My cron looks like this:

*/10 * * * * /home/johnheaven/repos/ambient-data-logger/docker-run.sh /mnt/general-files/ambient-data /home/johnheaven/data/ambient-data-logger/cache

This runs the Docker container every ten minutes, so a fresh set of readings gets logged each time. /mnt/general-files/ambient-data is where the CSV with all my data lives, and /home/johnheaven/data/ambient-data-logger/cache is the cache for the IP address, which I’ll explain later.

Logging to CSV

As explained above, the data gets logged to a CSV file, which always lives in the data directory you mount into the container. The headings don’t get written to the CSV file (writing them is a once-per-file operation), so if you want them you have to (carefully) add them yourself before or after the first readings are generated.

The headings are:

time,temp,pressure,humidity,sensor,pico_id,pico_uuid
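If you want to sanity-check the format or add a row by hand, the write step boils down to something like the sketch below. The file name ambient-data.csv and the example values are placeholders of my own, not what the actual code uses.

import csv
from datetime import datetime, timezone

fieldnames = ['time', 'temp', 'pressure', 'humidity', 'sensor', 'pico_id', 'pico_uuid']

# A made-up example row in the same column order as the headings above
row = {
    'time': datetime.now(timezone.utc).isoformat(),
    'temp': 21.4,
    'pressure': 1013.2,
    'humidity': 48.1,
    'sensor': 'bme280',
    'pico_id': 'pico-living-room',
    'pico_uuid': 'example-uuid',
}

# Append to the CSV in the data directory; no header row is ever written
with open('/mnt/general-files/ambient-data/ambient-data.csv', 'a', newline='') as f:
    csv.DictWriter(f, fieldnames=fieldnames).writerow(row)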

Logging to Postgres

Additionally, the data gets logged to Postgres (or, with modification, any other database). The database just has to be reachable with the connection details you specify in environment variables (or a .env file). They should be fairly self-explanatory, and here is an example:

DB_DRIVER=postgresql
DB_USER=jheaven
DB_PASS=[your_password_here]
DB_HOST=postgres
DB_PORT=5432
DB_NAME=jheaven

If you don’t know what a .env file is: it is a way of passing variables to Python that get overridden by environment variables if they exist. In our case, assuming we’re using Docker, these environment variables will only exist if they are passed into the run command with the -e option. You could do it that way, too, if you modify the docker-run.sh script.
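To make that concrete, here’s roughly how those variables might end up as a database connection in Python, using python-dotenv and sqlalchemy. The variable names match the example above, but the actual code in the repo may well do this differently.

import os
from dotenv import load_dotenv        # python-dotenv
from sqlalchemy import create_engine

# Reads the .env file, but does NOT override variables that already exist in
# the environment, e.g. ones passed to docker run with the -e option
load_dotenv()

url = '{driver}://{user}:{password}@{host}:{port}/{name}'.format(
    driver=os.environ['DB_DRIVER'],
    user=os.environ['DB_USER'],
    password=os.environ['DB_PASS'],
    host=os.environ['DB_HOST'],
    port=os.environ['DB_PORT'],
    name=os.environ['DB_NAME'],
)

engine = create_engine(url)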

So what does the code do and how do I set it up?

Okay, so we’ve got a rough idea of how this thing all fits together. Here’s how to install it, step by step. I’m going to assume you have a Raspberry Pi capable of running Docker (i.e. not running Raspbian but something like Manjaro or Ubuntu Server instead).

  • Install Docker: sudo pacman -S docker or sudo apt-get install docker.io
  • Get the repo and put it somewhere sensible: git clone https://github.com/johnheaven/ambient-data-logger.git
  • Set up a Postgres database (preferably with Docker) -> docker run --mount type=bind,src=[host folder for Postgres data],dst=/var/lib/postgresql/data -d -e POSTGRES_USER=[pgusername] -e POSTGRES_PASSWORD=[pgpass] -e POSTGRES_DB=[dbname] -p 5432:5432 --restart always --name postgres postgres:15-alpine
    • You will need to replace [pgusername], [pgpass] and [dbname] with appropriate values and include them in the .env file
    • This will be available on your local network too, so bear that in mind if it poses a security issue for you
  • Create a .env file as above, and insert the values you just used to create your Postgres database
  • In the src directory, copy example.settings.py to settings.py and insert the name of each of your sensors as a string in the list (there’s a sketch of what this might look like just after this list). You should have assigned these names while setting up your Pico(s)
  • Build the Docker image by changing into the directory the git project was cloned into, then typing docker build . -t ambient-data-logger
  • Set up your cron job with crontab -e, e.g. */10 * * * * [location of repo]/ambient-data-logger/docker-run.sh [folder where CSV should be stored] [folder where cache can be stored]
  • It should work now. Fingers crossed…
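As mentioned in the settings.py step above, the settings file is just a Python module containing a list of sensor names. Something like the following should be the rough shape, though the variable name is a guess on my part, so check example.settings.py for the real one:

# settings.py, copied from example.settings.py and edited
# The sensor names are the IDs you assigned while setting up your Pico(s);
# these two are just examples
pico_ids = [
    'pico-living-room',
    'pico-bedroom',
]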

What the code does

There’s not a whole lot going on in the code. It just wants to know which sensors to query, then it queries them with the requests library, logs the data, and it’s done.
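A stripped-down version of that query step might look something like this. I’m assuming the Pico answers with JSON on the /data/ endpoint (that path comes from the ip_template default mentioned further down); the real code may parse the response differently.

import requests

def fetch_reading(ip):
    """Fetch one reading from a Pico at a known IP (simplified sketch)."""
    response = requests.get(f'http://{ip}/data/', timeout=5)
    response.raise_for_status()
    # Expecting fields along the lines of the CSV columns: temp, pressure,
    # humidity, sensor, pico_id, pico_uuid
    return response.json()

reading = fetch_reading('192.168.2.145')
print(reading)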

It uses sqlalchemy for the SQL part, which is why you might get away with using a database other than Postgres.
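On the database side, the write amounts to a parameterised INSERT through sqlalchemy, roughly like the sketch below. The readings table, its columns and the example row are assumptions I’ve made based on the CSV headings, not the repo’s actual schema; the connection string just mirrors the .env example.

from sqlalchemy import create_engine, text

# Connection details as in the .env example above
engine = create_engine('postgresql://jheaven:your_password_here@postgres:5432/jheaven')

# Table and column names are assumptions based on the CSV headings
insert_stmt = text(
    'INSERT INTO readings (time, temp, pressure, humidity, sensor, pico_id, pico_uuid) '
    'VALUES (:time, :temp, :pressure, :humidity, :sensor, :pico_id, :pico_uuid)'
)

row = {
    'time': '2024-01-01T12:00:00+00:00',
    'temp': 21.4, 'pressure': 1013.2, 'humidity': 48.1,
    'sensor': 'bme280', 'pico_id': 'pico-living-room', 'pico_uuid': 'example-uuid',
}

with engine.begin() as conn:  # begin() opens a transaction and commits on success
    conn.execute(insert_stmt, row)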

There’s one catch, which makes things a little complicated (or it did for me, at least). The script needs to know where on the network to find the Picos. But the Picos can’t identify themselves: it’s impossible to set a custom hostname. Normally, a fixed IP would do the trick, but my router doesn’t allow that. So I came up with a hacky solution, which is the ip_search object.

It starts from a particular IP, and tries to contact a device. If that doesn’t work, it tries a fixed number of IPs before giving up. It also checks whether the ID is correct (which is something I need to optimise), and only logs data if it’s the right one.

To save it doing this search every time, it caches the best IP in a JSON file. So next time it starts from there, and this is almost always the right IP, at least for a few weeks. If it doesn’t work on your network, e.g. because your router allocates IPs in a different range, you could try modifying the following part of the code:

 ips = ip_search(starting_ip_last_3=starting_ip_last_3, max_steps=10)

This is where the ip_search object is instantiated. There’s a default parameter, ip_template='http://192.168.2.%s/data/', and the search only changes the final part of the IP address. You can modify the bit before that to match your network setup. (This is rather crude, I know.)

You can also change how far it searches: by default, it only searches 20 steps (10 steps ‘outwards’ in either direction). That’s the max_steps parameter.
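To make that search pattern concrete, here’s roughly how I’d sketch the idea in Python. This isn’t the actual ip_search implementation, just an illustration of starting at one address and stepping outwards in both directions:

import requests

def search_outwards(starting_ip_last_3, max_steps=10,
                    ip_template='http://192.168.2.%s/data/'):
    """Try the starting address first, then alternate one step up and one step
    down, up to max_steps in each direction, and return the first that answers."""
    offsets = [0]
    for step in range(1, max_steps + 1):
        offsets += [step, -step]          # 0, +1, -1, +2, -2, ...
    for offset in offsets:
        last_3 = starting_ip_last_3 + offset
        if not 1 <= last_3 <= 254:        # skip addresses outside the usable range
            continue
        try:
            response = requests.get(ip_template % last_3, timeout=2)
            response.raise_for_status()
            return last_3, response.json()
        except requests.RequestException:
            continue
    return None, None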

Finally, the place it starts might be wrong. The very first time it runs, it’s hard-coded to start at 145 (i.e. 192.168.2.145) and search from there. That’s the starting_ip_last_3 = 145 statement in ambient-data-logger.py. You could try changing that and seeing how you get along. Increasing max_steps might be the easiest fix, because once a working IP is found, the next run will look up the best value from the cache.
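The cache itself is nothing fancy: a small JSON file in the cache directory you pass to docker-run.sh. Something along these lines would do it, although the file name and key here are placeholders of mine rather than what the repo actually uses.

import json
from pathlib import Path

CACHE_FILE = Path('/cache/best_ip.json')  # hypothetical name inside the mounted cache directory

def load_starting_ip(default=145):
    """Return the cached last octet, or the hard-coded default on the very first run."""
    try:
        return json.loads(CACHE_FILE.read_text())['starting_ip_last_3']
    except (FileNotFoundError, KeyError, ValueError):
        return default

def save_starting_ip(last_3):
    """Remember the octet that worked so the next run starts its search there."""
    CACHE_FILE.write_text(json.dumps({'starting_ip_last_3': last_3}))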

And that’s it. The only step left in the process is to start visualising the data with Grafana.