Skip to content

Linuxer Pro Posts

Building and running Google Chromium OS on Raspberry Pi 3


I’ve been working on porting Google Chromium OS to various platforms in the last few months. One of the platforms is the very popular single board computer Raspberry Pi 3. Our team managed to successfuly build and run Chromium OS on it, and we have open sourced our work on Github.

This blog post is a re-post of the document in our Github repository, which is also written by me. You are welcome to leave feedback on my blog or on Github.

Hapy hacking!

Table of contents


This document describes how to build and run Google Chromium OS on Raspberry Pi, from its source code and the board overlay hosted in this repository.

This overlay can be used to build a Chromium OS image with Xorg/X11 as the graphics stack. As Google moved off from X since Chrome OS release 57, this overlay targets and was tested against the release 56 of Chromium OS.

This overlay and the document has been tested against Raspberry Pi 3 by the Flint team. It may also work on Pi 2 but is not tested, you are welcome to test it and send feedback.

About this repository

The code and document in this repository is the result of works by the people of the Flint team. We previously worked on this overlay internally and released a few disk images for Raspberry Pi to the public. Now we open this to the public.

Goal of this repository

  • To provide a open source code base that everybody can use to build and improve Chromium OS for Raspberry Pi.
  • To make as less change to the original Chromium OS code and process as possible, so that people can study and get used to the Chromium OS development process. We may provide scripts later to ease the process.

Typography Conventions

Shell commands running in the host OS are prefixed with the $ sign, like below.

$ cd /mydir

Shell commands running in the Chromium OS chroot environment are prefixed with (cr) $, like below.

(cr) $ cd /mydir         # This is a comment for the command. It should not be included in your command.

System requirement

  • A x86_64 system to perform the build. 64-bit hardware and OS are must. The Chromium OS is a very large project, building from the source form scratch usually takes hours to over 10 or even 20 hours, depends on the system configuration.
    • CPU: we recommend using a 4-core or higher processor. The Chromium OS build process runs in parallel so more cores can help shorten build time dramatically.

    • Memory: we recommend at least 8GB, plus enough swap space. Linking Chrome(the browser) could require more than 8GB of memory, so you will run into massive swapping or OOM if you have less memory.

    • Disk: at least 100GB of free space, 200GB or more is recommended. SSD could noticeably shorten the build time as there are many gigabytes of files need to be written to and read from the disk.

    • Network: total source code downloading will be over 10GB. A fast and stable Internet access is going to be very helpful.

  • A x86_64 Linux OS, it is called as the host OS later in this doc. The Chromium OS build process utilizes chroot to isolate the build environment from the host OS. So theoretically any modern Linux system should work. However, only limited Linux distros are tested by the Chromium OS team and the Flint team. Linux versions that are known to work:

    • Ubuntu Linux 14.04 & 16.04
    • Gentoo Linux
  • A non-root user account with sudo access. The build process should be run by this user, not the root user. The user need to have sudo access. For simplicity and convenience password-less sudo could be set for this user.

Prepare the system

Install necessary tools

Git and curl as the essential tools need to installed in the host OS. Python 2.7 is required to run scripts from Google depot_tools package. Python 3 support is marked as experimental by these scripts, so use it at your own risk, feedback are welcome.

Follow the usual way to install them on your host OS.

Install Google depot_tools

The depot_tools is a software package of scripts, provided by Google, to manage source code checkouts and code reviews. We need it to fetch the Chromium OS source code.

$ sudo mkdir -p /usr/local/repo
$ sudo chmod 777 /usr/local/repo
$ cd /usr/local/repo
$ git clone

Then add depot_tools directory to PATH and setup proper umask for the user who is going to perform the build. Add below lines to the file ~/.bash_profile of that user. Or if you are using a different shell, handle that accordingly.

export PATH=/usr/local/repo/depot_tools:$PATH
umask 022

Then re-login to make above changes take effective.

Configure git

Better configure git now or it may complain in some operations later.

$ git config --global "[email protected]"
$ git config --global "Your Name"

Get Chromium OS source code

Create directory structure

The directory structure described here is a recommendation based on the best practice in the Flint team. You may host the files in a different way as you wish.

$ mkdir -p /project/chromiumos-R56      # This is the directory to hold Chromium OS source code.
$ mkdir -p /project/overlays            # This is the directory to hold this repository.

Fetch Chromium OS source code

Fetching of Chromium OS source code may take 10 to more than 30 minutes depends on your connection speed.

$ cd /project/chromiumos-R56
$ repo init -u --repo-url -b release-R56-9000.B
$ repo sync -j8         # Raise this number if you have a fast Internet connection

Request for Google API key

If you would like to login into the Chromium OS GUI by using your Google ID, you will need to request for Google API key and include them in the disk image you build. Since the only authentication mechanism included in Chromium OS is Google ID, you probably will need this or you will only be able to login as guest user.

Apply for Google API on Google website per this document. After acquired the client ID, client secret and API key, put then in ~/.googleapikeys file as in below format.

'google_api_key': 'your api key',
'google_default_client_id': 'your client id',
'google_default_client_secret': 'your client secret',

Then the Chromium OS build script will read necessary information from this file automatically, and the image you build will allow Google ID login.

Setup Raspberry Pi overlay

Now fetch this overlay and put it in the right place.

$ cd /project/overlays
$ git clone

$ cd /project/chromiumos-R56/src/overlays
$ ln -s /project/overlays/overlay-rpi .

Then edit file /project/chromiumos-R56/src/third_party/chromiumos-overlay/eclass/cros-board.eclass, and put a line rpi in the ALL_BOARDS array, around line 29. It looks like this when done.


Build Chromium OS for Raspberry Pi

Create the chroot

As mentioned above, a chroot environment will be used to run the actual build process and some other related tasks. To create the chroot environment, run below commands.

$ cd /project/chromiumos-R56
$ cros_sdk

It make take 10 to over 30 minutes depends on your Internet connection speed and disk speed. Once finished, it will enter into the chroot. The shell prompt string looks like below so it is very easy to tell whether you are currently in the chroot or not.

(cr) (release-R56-9000.B/(aaab1a3...)) <user>@<host> ~/trunk/src/scripts $

The chroot environment is located under the /project/chromiumos-R56/chroot directory.

Let’s exit from the chroot first as we need to do some customization before move on. Type exit or Ctrl + D to exit from the chroot shell.

Usually the chroot only needs to be created once and can be used to build a board many times or build different boards. It very rarely need to be removed/re-created.

Delete the chroot

If you would like to remove the chroot and re-create it from scratch, don’t delete the chroot directory directly. As there could be directories from the host OS bind mounted in the chroot, a rm chroot command could actually remove files from your host OS undesirably.

The correct way to remove the chroot is by using below commands.

$ cd /project/chromiumos-R56
$ cros_sdk --delete

Setup bind mount directories for chroot

Programs running inside the chroot will not be able to access files outside of the chroot. One way to circumvent this is to bind mount those files into a directory inside the chroot.

When entering the Chromium OS chroot environment, a file named .local_mounts will be checked and directories listed in it will be bind mounted inside the chroot. All we need to do is to create this file in the right place and put necessary contents in, by using below command.

$ echo "/project" > /project/chromiumos-R56/src/scripts/.local_mounts

Now, after entered the chroot, a /project directory will exist in the chroot and its content is the same as the /project directory in the host OS, as it actually is bind mounted from the host OS.

If we don’t do this, the /project/chromiumos-R56/src/overlays/overlay-rpi symbolic link will not be accessible, as the top directory (/project) it points to doesn’t exist in the chroot.

Enter the chroot

Now we can enter the chroot.

$ cd /project/chromiumos-R56
$ cros_sdk

It is the same command used to create the chroot. It creates the chroot if one does not exist, and enters the chroot if there is already one.

And we can check whether above .local_mounts setup was done correctly. Notice that the (cr) $ prefix denotes these commands should be run in the chroot.

(cr) $ ls /project                      # You should be able to see the same content as in host OS.
(cr) $ ls ../overlays/overlay-rpi/      # You should be able to see the content of this repo.

Move on if it works well. If not, check and make sure you set up .local_mounts correctly.

Set password for the chronos user

The chronos user is used to log into the command line interface of Chromium OS, via SSH, local console or the shell in crosh interface. It is recommended that a password is set for this user so you can login as this user and also can do sudo in the Chromium OS command line, for advanced tasks.

To set password for chronos user, run below command.

(cr) $ ./

Type in a password when been prompted. If you would like to change the password, simply run the command again.

The password is encrypted and saved in the file /etc/shared_user_passwd.txt in the chroot. You only need to set it once and it will be used for all the images you build, unless you re-create the chroot.

Setup Raspberry Pi board

In the Chromium OS terminology, a board refers to a class of computer platform with distinct hardware configurations. The board will be used as a target in the process of building software packages and disk image for that specific computer platform.

There are many boards exist in the Chromium OS code base. They are either development platforms or real selling products running Chrome OS, such as Chromebooks you can buy from many vendors.

The Chromium OS project utilizes the Portage package management system from Gentoo Linux. Each board lives in its own “overlay”, which holds distinct build configuration, system configurations, collection of software packages, system services, disk image customization etc. for that board.

In our case here, we created a board named “rpi” and it refers to the Raspberry Pi computer. And we call the overlay “overlay-rpi” or “rpi”, all its files are hosted in this repository.

To build Chromium OS for a board, the first thing is to initialize the board from its overlay.

(cr) $ ./setup_board --board=rpi

Again, it may take 10 to over 30 minutes depends on the speed of your Internet connection and disk I/O.

After it’s done, a directory structure for the “rpi” board will be created under /build/rpi of the chroot.

Re-initialize the board

It is usually not necessary to re-initialize the board as what you have already built will be lost, and you will have to spend hours to rebuild all packages from scratch. But if you really need to do so, just re-run the same setup_board command with the ---force option.

(cr) $ ./setup_board --board=rpi --force

The --force option will remove the existing board directory /build/rpi and re-create it from scratch.

Build packages

Now it time to build all software packages for the rpi board.

(cr) $ ./build_packages --board=rpi

It may take hours depends on your processor power, your memory size, your disk speed and your Internet bandwidth. On a decent machine with 4 cores 8 threads, 16GB memory, files on regular HDD, and 100Mb broadband, it takes about 5 to 6 hours for the command to finish.

When interrupted

The build process is incremental. If it gets interrupted for any reason, you can always rerun the same build_packages command and it will resume the build instead of rebuild from scratch.

Read the output

The build_packages command throw out a lot of information on the console. Fortunately those information are very well organized.

  • Red text: these are error messages and very likely will cause the build process to break.
  • Green text: these are useful messages printed by the build script itself. They are useful when debugging problem.
  • White text: these are regular information that mostly are printed by the commands called in the build script. They provide more details about the build process thus are also useful for debugging.

Read the logs

Most time the build_packages command spends on is running the emerge commands, to build, install and pack those hundreds of software packages required by the overlay. The emerge command is from the portage system of Gentoo Linux.

The emerge command saves the output of its building, installation and packing process into log files. These files are extremely useful if there is failure when building some package. Those log files are located under the /build/rpi/tmp/portage/logs directory of the chroot. They are plain text files so can be viewed by tools like less, or more, or editors such as vim.

Build the disk image

After the build_packages command finished successfully, you can start building the disk image.

(cr) $ ./build_image --board=rpi --noenable_rootfs_verification

It may take 10 to 30 minutes, mainly depends on the speed of your disk. It is much faster on SSD than on HDD.

Find your image

After the command finished successfully, you will have disk images generated, saved under /mnt/host/source/src/build/images/rpi/ directory in the chroot, or /project/chromiumos-R56/src/build/images/rpi in the host OS. These two are the same directory, just bind mounted in the chroot.

Each invoke of the build_image command will create a directory named similar to R56-9000.104.<date time>-a1 under above directory. There is a symlink named latest under above directory, that always point to the image directory of the last successful build.

The disk image is usually named chromiumos_image.bin, under abovementioned directory. So full path to the latest image is


in the chroot, and


in the host OS.

Boot Raspberry Pi from the image

The Raspberry Pi boots from the SD card so we need to write the previously generated disk image on to the SD card. A SD card of at least 8GB capacity is required.

Write the disk image to a SD card

There are two usual ways to write the Chromium OS disk image to a SD card. You can copy the image out to another Window/Mac/Linux system and write it using your favorite GUI/CLI application. It is the same as writing other Linux images for Raspberry Pi, so will not be explained here.

Another Chromium OS specific way is by using the cros command in the chroot.

Write the image by using the cros command

First plug the SD card into the box used to build the image and has the chroot. Then run below command.

(cr) $ cros flash usb:// rpi/latest

This asks to write the latest disk image to USB removable media. A list of USB removable media will be presented, with index number prefixed. You can select which USB drive to write to by type in the index number when prompted.

Boot from the SD card

After the disk image is successfully written to the SD card, plug it into the Raspberry Pi and boot it as usual. After a few seconds you will see a Chromium logo, later on it will boot into GUI mode and the first time setup screen will pop up for you to configure the system and login.

More information

Chromium OS Developer Guide. This is the official source of how to build Chromium OS

The Flint OS website, English site, our home 🙂

The Flint OS website, Chinese site, our home, in Chinese.

About us

We are a UK/China based technology start-up. We have offices in both London and Beijing at the moment and are looking to expand to the city of Shenzhen. The team of 3 founders have spent many years in technology, consulting and media industries and gained valuable experiences of how things work. We have also witnessed the trend of digital transformation and the disruption it has caused to businesses and individuals. Realizing there is the lack of simple, secure and reliable “IT-as-a-Service” offerings for businesses, schools and individuals, we decided to come together and founded Flint.

Flint began with a vision where all applications and services we use today will be living in the Cloud. With the ever advancing browser platform technology and web frontend performances, it’s not surprising that most things we do today with the internet can be done through a single browser window. We are stepping into an era where installable apps will soon become history.

Therefore, we built Flint OS – a simple, secure, fast and productive operating system. Based on the open-source Chromium Project that also powers the well-known Google Chromebooks. Flint OS inherits most of the benefits that Chromebooks have but also bundled with our enhancements and new features. We have turned Flint OS into a more open platform, users will no longer be forced to rely on Google services and have the freedom to choose whichever services they prefer. We have also made Flint OS run on a wider range of hardware platforms ranging from x86 PCs and ARM based single board computers, providing endless of possibilities and potentials of how Flint OS can be used and applied.


Get real time Telegram notification when SSH login on Linux

Traditionally, Linux system admins use commands such as last, lastlog or some log analysis service to monitor user logins and catch suspicious activities. This is a good security practice. However these methods have their drawbacks – as the usual approch is to do periodic scans there could be delay between a login event and the report of incident. The delay could be minutes to hours depends on how the system is configured.

In this post I’m going to introduce a way to get near real time notification when a user login event occurred. This method make use of the popular instant messaging cilent Telegram and its Bot API to send a notification on user login so the delay is usually within just a couple of seconds!

The basic idea

Telegram Bot is a special kind of account that operated by software instead of real human being. Telegram provides a HTTP API to control the bot, to do things like send a message, or receive a message and process it as command, etc. etc. I have made use of that feature to do following things:

  1. On user login, run a script.
  2. The script calls Telegram Bot API to send a message.
  3. The message is sent by Telegram server to my Telegram account.
  4. My Telegram client on phone and PC receive the message, I know a login event occured.

All this happends right after the user logs in. I usually got the notification from Telegram on PC within 2 secs. Keep reading to see how to set up this.

Setup a Telegram bot

First of all, you’ll need to have a Telegram account and create a Telegram bot. I won’t go into detail for this as it is easy and is already well documented on Telegram website. Check it out here.

Once the setup of bot is finished, a token will be allocated for the bot. Keep that token. It will be used by the script to control the bot via Telegram HTTP API.

Connect the bot with your Telegram account

For obvious reason regarding privacy, a Telegram bot cannot initiate a message to a Telegram user. If you need to received messsages from a bot, you need to first start a chat with it and send a hello message. This process is like talking to a readl human account, just talk to the bot and send any text is OK.

After you initiate the chat, the bot and you is now in a conversation which represents by an unique ID in Telegram API. We need to find out this conversation ID, as it will be the “destination” for our script to send notification message to.

Here is how to get that chat ID – first send a message to the bot, then access the Telegram API getUpdates.

The API URL is<your bot token>/getUpdates. HTTP method GET should be used to access it. <your bot token> is the token allocated to your bot, as mentioned above.

I use curl to do this.

$ curl "<bot token>/getUpdates"
"message":{"message_id":1954,"from":{"id":xxxxxxxxx,"first_name":"Your first name","last_name":"Your last name","username":"Your telegram user name"},"chat":{"id":xxxxxxxxx,"first_name":"Your first name","last_name":"Your last name","username":"username","type":"private"},"date":1494671101,"text":"test"}}]}

The return message is a JSON text. This is how it looks after beautification.

  "ok": true,
  "result": [{
    "update_id": xxxxxxxxx,
    "message": {
      "message_id": 1954,
      "from": {
        "id": xxxxxxxxx,
        "first_name": "Your first name",
        "last_name": "Your last name",
        "username": "username"
      "chat": {
        "id": xxxxxxxxx,          <--- This is the chat ID you need.
        "first_name": "Your first name",
        "last_name": "Your last name",
        "username": "username",
        "type": "private"
      "date": 1494671101,
      "text": "test"

Note down the value of, we will need it in the script.

Create a script to send message via the bot

Now we need to create a script which sends a predefined message via Telelgram.

Create a executable script file named /usr/local/bin/ and put below content in.


KEY="<your bot token>"


TARGET="<your chat ID>" # Telegram ID of the conversation with the bot, get it from /getUpdates API

TEXT="User *$PAM_USER* logged in on *$HOSTNAME* at $(date '+%Y-%m-%d %H:%M:%S %Z')
Remote host: $PAM_RHOST
Remote user: $PAM_RUSER


# Run in background so the script could return immediately without blocking PAM
curl -s --max-time 10 --retry 5 --retry-delay 2 --retry-max-time 10 -d "$PAYLOAD" $URL > /dev/null 2>&1 &

Remember to replace <your bot token> and <your chat ID> with the real data of your own.

Basically, this script use curl to access the Telegram HTTP API to send a text message to the designated user. Those $PAM_* variables are provided by PAM, they will be explained later.

You can now run this script to test. If everything goes right, your Telegram client will receieve a message like this:

User  logged in on myhost at 2017-05-13 18:45:20 CST
Remote host: 
Remote user: 

That means the script is working. The real useful information is still empty, they will be filled in when running by PAM.

Run the script when a user logs in on the server

Now the final part is to make the script running when a user logs in on the server.

There are some information available on the Internet regarding this topic, however most of them just put the command to run in the user’s shell init script, such as .bashrc, or /etc/profile or a script under /etc/profile.d/. This way works, but is not secure. Because a user have access to his own shell init scripts such as .bashrc or .zshrc, a hacker who steal the user’s identitiy and log in on the server can easily remove the command from the script or remove the script itself entirely. This makes it practically useless – the script will only be run just once and send out the notification once, then the hacker is able to mute it.

A better approach is to use the capability of PAM. I’ve introduced it in an earlier post, read it if you are not familiar with PAM yet.

Here we are going to use the pam_exec module to accomplish our goal. This module is part of the PAM and it can run a command specified by user. We can use it to run the Telegram message sending script on user login, then we will be able to accomplish our goal. Here is how.

Edit the file /etc/pam.d/system-auth, this is the PAM service that will be called by many system commands that tries to authenticate a user, SSH is one of them. Put below line at the last line.

session optional type=open_session seteuid /usr/local/bin/

This command tells PAM to run the script /usr/local/bin/ after a user successfully authenticated himself and logged into the system. The pam_exec module will export a few environment variables so the script could read them and send in the message. Below are the explaination of the variables.

PAM_USER: name of the user name who is logging in.
PAM_RHOST: the remote host that the user connects from.
PAM_RUSER: the user name on the remote host that the user connects from.
PAM_SERVICE: the PAM service, which in some way represents the system service that user trying to access.
PAM_TTY: if the user logs in locally, this will be the TTY name. For remote login via SSH, it will just be “ssh”.

These variables are send in the Telegram message, so we can now who from where on when has logged in on which server via which service, enough for us to know whether it normal or suspecious.

As this file is only writable by root, a non-root user could not bypass this step. This is much secure than the shell init script method. Of course, if the root user is hacked it still could be modified and the notification could be muted. Some further facility could be deployed to mitigate this problem, such as file integrity check, which is beyond the scope of this article.

Test it

Now login in to the server via SSH, and you will be able to recieve a message on Telegram from the bot, like in below screenshot. Congratulations!


In fact, this is just an demostration of the functionality of pam_exec and Telegram bot. You can actually do more. A few ideas is on my mind.

  • Trigger a system task such as mount a remote filesystem on user login via pam_exec.
  • Send a log to a remote server on user login via pam_exec.
  • Send notification via Telegram on failure of a critical system crontab.
  • Send notification via Telegram when system update is available.
  • And more….

SSH power unleashed – Part 2: use SSH agent as an authentication provider

In the first part of this serial posts I’ve introduced SSH key and agent and their applications. In this one I’m going to take it further, let’s see how we can make use of SSH agent authentication to make things easier in other situations while still keep it secure.

A pain point of sudo

Sudo, the so commonly used tool that probably don’t need any introduction here. I find myself in dilemma while tryhing to keep secure AND convenient at the same time. By default, when you setup sudo to allow some user to own the root power, you ask him to authenticate himself by enter this password when he fires up sudo. It is usually set up by putting a line as below in /etc/sudoers file.

username        ALL=(ALL)       ALL

People like me may set up password very long so every time I ran sudo I have to type in that in a finger dance. Yes I’ve mentioned that in the last post already and I solved that by using SSH agent. Hint: it is going to help us again this time.

Some people trying to solve this problem by setting up sudo like below instead.

username        ALL=(ALL)       NOPASSWD: ALL

The NOPASSWD part makes sudo skip asking password for username when he issues sudo command, and grant him the root power silently. So as you already find out, although it is convenient, it is a huge risk – if the user accidently got hacked, the hacker is able to get root power like a piece of cake.

Let’s take the SSH agent authentication method further beyond just SSH connection

So you may wonder as I had before – we already have a way to authenticate ourselves to SSH command via the SSH agent, and skip password input in a secure way. Can we make use of that same facility on sudo? The answer is yes!

First we need to install the great pam_ssh_agent_auth package. This package provides the capability to authenticate via SSH agent in the PAM framework(I will cover that detail later in this post). This package is included in the repository of many Linux distros so just install it with your favorate package manager. As what I did on a CentOS server:

yum install pam_ssh_agent_auth

Then put a line in the file /etc/pam.d/sudo

auth sufficient file=~/.ssh/authorized_keys          <--- This is the line to put in
auth       include      system-auth
account    include      system-auth
password   include      system-auth
session    optional revoke
session    required

Make sure the line is located above other “auth” lines, like what I did in above example. This line tells PAM that when sudo is trying to authenticate a user, first try to use the pam_ssh_agent_auth module. If it succeeded, the user is authenticated and get sudo power. If it failed, try the next authentication method, the global system-auth method, which in most case would be asking for password.

The file=~/.ssh/authorized_keys parameter in that line tells the pam_ssh_agent_auth module to verify the user SSH agent against the public keys stored in his own home directory. You can also change that to some other file path, say, if you would like sudoers to authentication against an admin managed dedicated SSH key for sudo.

Once it is done, follow the same instructions in the last post regarding SSH agent and agent forwarding. Then you can sudo on a remote server without needing to input the password, while if you accidentally get hacked, the hacker won’t be able to sudo as he don’t have your SSH key.

The magic behind

So what happens behind all this? The major parts in this magic are: SSH agent, PAM, and the pam_ssh_agent_auth module.

I have already talked about SSH agent so I won’t repeat it. The pam_ssh_agent_auth module connects PAM and SSH agent. The key here is PAM.

What is PAM? PAM means Pluggable Authentication Modules. Here is what PAM says about itself in it’s man page.

Linux-PAM is a system of libraries that handle the authentication tasks of applications (services) on the system. The library provides a stable general interface (Application Programming Interface - API) that privilege granting programs (such as login(1) and su(1)) defer to to perform standard authentication tasks.

The principal feature of the PAM approach is that the nature of the authentication is dynamically configurable. In other words, the system administrator is free to choose how individual service-providing applications will authenticate users. This dynamic configuration is set by the contents of the single Linux-PAM configuration file /etc/pam.conf. Alternatively, the configuration can be set by individual configuration files located in the /etc/pam.d/ directory. The presence of this directory will cause Linux-PAM to ignore/etc/pam.conf.

My own sumary for PAM:

  • PAM provides a system with plugin capability, which is easy to extend for both developers and system admins. A plugin is called a module.
  • PAM provides a universal way for applications to use different methods for authentication provided by different modules. The application don’t need to change if the system provides password authentication in the beginning while later adds fingerprint authentication.
  • Each authentication method provided by PAM can be enabled / disabled / configured individually without interfering each other.
  • Different modules could be linked to gether to provide parallel or serial path of authentication flow.

PAM is a fundamental part of Linux system for many years. And many modules have been developed in this framework. The pam_ssh_agent_auth module I mentioned here, is one of them.

Sudo command, like many other applications in Linux, make use of PAM for user authentication. The file /etc/pam.d/sudo controls how sudo will make use of PAM for user authentication. Before we put the line in, it defaults to system-auth, which on most system usually is password authentication(check /etc/pam.d/system-auth if you are interested in what it does on your system). After we put the line in, sudo will first try to execute what that specific line instructs.

What the configuration line I put in /etc/pam.d/sudo does are explained below:

auth - Tells PAM this line is about a method of how to authenticate a user.
sufficient - Tells PAM that the user will be considered successfully authenticated if he passed this one, no need to try othe "auth" line after this one. - Tells PAM to load and excute this module.
file=~/.ssh/authorized_keys - The parameter to the module, the meanning is already explained above.

So when a user fires up sudo, it calls PAM for authentication, PAM then look into the file /etc/pam.d/sudo and decide to try pam_ssh_agent_auth module first. This module then interacts with the SSH agent and verifys the priviate key information provided by SSH agent against the public keys in ~/.ssh/authorized_keys. If the verification susccess, the module turns back to PAM system and tells it the authentication is successful. As PAM sees the sufficient options here, it considers the whole authentication process is successful and sudo can move on. If the user has no SSH agent running, or the agent is not able to provide the correct key, the authentication attempt of this module failed, and PAM moves on to the next line, which allows the user to still authenticate with his password.

More than that

This approach is not specific to sudo. It can be enabled for any program that makes use of PAM for user authentication. So just open your mind and find out what makes you keep typing passwords. If it uses PAM, congratulations, you may save your fingers!

1 Comment

Discourse SSO login/logout state synchronization tips

Discourse provides SSO integration funtionality which allows using account from an external service to log in. The official Discourse SSO document already provided detail of how to setup Discourse to integrate with external SSO provider. That one should be read very carefully.

Here I’m going to talk about how to made the login/logout state synchronized between Discourse and the reset part of the website(the SSO provider). This is quite important if you would like your visitors to have a good and consistent experience on the site. Without proper setup, the experience is bad in several ways, as described below in each section.

The term “website” in this post refers to the part of the site that is not backed by Discourse. It is the SSO provider for Discourse. If you are using WP or some other site builder, it is similar to implement.

Synchronize state from website to Discourse


When a user is logged in on the website, the URL that link to the forum site should be set to

If your site allows anonymous browsing, make sure you detect the user login state and only append the /session/sso part for logged in user. For anonymouse user just direct them to

Without this setup, when a user navigates to Discourse, he will need to click the Discourse login button to login.


When a user logout on the website, the website need to send a Discourse API request to this URL:{user_id}/log_out

The user_id here is the user ID in Discourse, it maybe different from the user ID the website uses.

How to get the user_id of Discourse

Quote from the official SSO document.

User profile data can be accessed using the /users/by-external/{EXTERNAL_ID}.json endpoint. This will return a JSON payload that contains the user information, including the user_id which can be used with the log_out endpoint.

So the website need to send the the user ID as above and get back the Discourse user ID, then logout the user.

Without this setup, when a user logs out from website, he is still in login state on Discourse. If he navigates back and forth between the site and Discourse, he will see unmatched login state, very confusing.

Synchronize state from Discourse to website


No special handling required. When SSO is turned on in Discourse and a user click login button in Discourse, he will be redirected to website login page. Once logged in, he will be in logged in state on both website and Discourse.


The website needs to implement a logout URL such as

In Discourse site settings -> users -> logout redirect, fill in that URL. Then when user logs out from Discourse, he will be redirect to the website and log out there also.

Without this, when a user logs out from Discourse, he is still in login state on website.

After followed above tips, your user will have consistent login/logout state on the website.

I’ve also posted this on

Leave a Comment

SSH power unleashed – Part 1: use private keys

I’m going to talk about some advanced tips regarding SSH in this series. Many people use it on a daily basisi, yet still only use it’s very basic functions: log into a server, do some work, log out. But SSH actually is very feature rich and flexible. It can even do something many people don’t heard before. Let’s first start with the well known public and private key authentication function.

Tip 1: never login use password. Use public key authentication instead.

The first rule of running a production server is , disable root remote login and password login for normal users. Make sure “PermitRootLogin=no” and “PasswordAuthentication=no” are in your SSHd configuration file, usually /etc/ssh/sshd_config.

You may also want to lockout your root passwd. This adds extra protection to your root power. it can be done by below command:

passwd -l root

Once root password is locked, no one can login as root with password. The only way to become root is to either login use some other authentication method, such as SSH public key, or use su/sudo to elevate from a normal user.

Now it is time to get a pair of public and private key for yourself to login. On a Linux machine, this can be done with below command.

Note: these should be done on your local machine, not your server. NEVER put your private key on remtoe server!

ssh-keygen -t rsa -b 2048

Then follow the on screen message to provide file name to save the keys(the default is OK), and the passphrase.

If you want ultra-secure keys, just raise the key bits given for -b option. 2048 is sufficient for nowadays, 4096 may give you longer confidence.

Now look at your ~/.ssh/ directory, your new keys are there. The public key file is named, and the private key file is named id_rsa. The public key file can be disclosed to the world, while the private key file should be kept safe, as safe as how you keep the key of your home, may be even more 😉 .

Now upload the public key file to your remote server, and make it useable for SSH. It can be done as below.

cat >> ~/.ssh/authorized_keys

This puts the content of your newly generated public key into the authorized_keys file, which will be check by SSH upon your login.

Now login from your local machine to the server with key authentication, you need to provide the passphrase of the private key instead of your system account password. And you can safely disable password login in SSHd configuration now.

Tip 2: use ssh-agent to cache your keys

Imagine that you manage a few servers, and you need to frequently login to perform some tasks. It’s going to be a pain to enter the password of the keys every time. It is quite boring for myself – my password is quite long and made of non-sense characters, everytime I enter the password it feels like a finger dance!

The ssh-agent command is made for such purpose. It can be used to invoke a shell and cache yoru private key in memory. Next time when a SSH key is requested, it will provide the data directly without requiring your to enter the password again. Let’s see how to do that.

First start a shell by ssh-agent:

ssh-agent /bin/bash

It seems nothing happend, you are just dropped back to shell prompt. But actually you are in a newly invoked shell now. In this shell, ssh-agent caches the private key password for you.

Now load your private key(s).


It will prompt you the file name of the current private key it going to load, ask you to input the password. Once you are done, the priviate key and its password is cached in memory. Now try to log into your remote server which has the public key installed. You will notice that no password is required during SSH login!

Tips 3: start ssh-agent upon login

Above tip makes life easier, let’s move on to see how we can make it even more easier.

Instead of start ssh-agent every time after you logged in your local machine, it is possible to start it automatically. If you are using Bash as your login shell, put below at the end of your ~/.bashrc file.

eval $(/usr/bin/ssh-agent -s)


And below at the end of your ~/.bash_logout file.

if [ -n "$SSH_AGENT_PID" ] ; then
    /usr/bin/ssh-agent -k

During your login, you will notice the message of “Agent PID xxxxx”, which means your newly added code in ~/.bashrc just ran and started ssh-agent and setup the environment for you. Then the ssh-add command asks you to load your private keys. Type in your password and your ssh-agent is up running just like described in Tip 2. You can SSH login to remote servers without typing in password.

When you logout, the code in ~/.bash_logout will make sure the agent is killed so no key data is left in memory.

Tip 4: use SSH agent forwarding

Still remember that I mentioned above the private key should be kept safe and never uploaded to any remote servers? What if you need to log into another server(let’s call it server B) from your remote server(let’s call it server A)? You don’t have the private key on server A. To log into server B, you have to either use password authentication, or from your local machine that have the private key. But what if you do need to connect from server A to B and don’t want to use the less secure password authentication way?

This is how SSH agent forwarding comes to save you. With this technology, you an “forward” the encrypted information of your private key located on your local machine, via server A, to server B, without actually copying the key file to server A. Let’s see how we can do this.

When you start the connection to server A, first make sure you followed Tip 2 & 3 and in the ssh-agent shell with private key loaded, then use the -A option like below:

ssh -A -p <remote_ssh_port> [email protected]_A

The -A option tells SSH program to forward local ssh-agent to server A. When you use SSH there, it can acccess the private key via the secure SSH connection between your local machine and server A.

Now if on server A you need to connect to server B, just run SSH and you will notice that no password is required, you just log into server B with your private key on your local machine!

If you are using PuTTY on Windows as your SSH client, the agent forwarding option is under Connection -> SSH -> Auth -> Authentication Parameters -> Allow agent forwarding, as shown in below screenshot.

Caution: always make sure server A is trusted before you forward your agent to it.

Leave a Comment

Handy cURL shell script for http troubleshooting

The great cURL tool

Many people know about the famous cURL tool. For those don’t know yet, here is the introduction from it’s own man page.

curl is a tool to transfer data from or to a server, using one of the supported protocols (DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET and TFTP). The command is designed to work without user interaction.

curl offers a busload of useful tricks like proxy support, user authentication, FTP upload, HTTP post, SSL connections, cookies, file transfer resume, Metalink, and more. As you will see below, the number of features will make your head spin!

Many developers, system admins, tech support people and users use it on a day to day basis. The typical way of using it is to view HTTP connection detail such as request and response headers, very handy.

Wait, are you really using it in a great way?

But many people never notice a powerful option cURL provided, the “-w” option. Here are some key information regarding this option from its man page.

-w, --write-out Make curl display information on stdout after a completed transfer. The format is a string that may contain plain text mixed with any number of variables. The format can be specified as a literal "string", or you can have curl read the format from a file with "@filename" and to tell curl to read the format from stdin you write "@-".

Some really useful variable “-w” option supports:

size_download The total amount of bytes that were downloaded.
size_request The total amount of bytes that were sent in the HTTP request.
size_upload The total amount of bytes that were uploaded.
speed_download The average download speed that curl measured for the complete download. Bytes per second.
speed_upload The average upload speed that curl measured for the complete upload. Bytes per second.
time_appconnect The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0)
time_connect The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.
time_namelookup The time, in seconds, it took from the start until the name resolving was completed.
time_pretransfer The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.
time_redirect The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3)
time_starttransfer The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
time_total The total time, in seconds, that the full operation lasted.

Try this!

Let’s put these together in a shell script, say


set -e

              Downloaded (byte)  :  %{size_download}
            Request sent (byte)  :  %{size_request}
                Uploaded (byte)  :  %{size_upload}

       Download speed (bytes/s)  :  %{speed_download}
         Upload speed (bytes/s)  :  %{speed_upload}

            DNS lookup time (s)  :  %{time_namelookup}
  Connection establish time (s)  :  %{time_connect}
           SSL connect time (s)  :  %{time_appconnect}
          Pre-transfer time (s)  :  %{time_pretransfer}
              Redirect time (s)  :  %{time_redirect}
        Start-transfer time (s)  :  %{time_starttransfer}

                 Total time (s)  :  %{time_total}


exec curl -w "$curl_format" -o /dev/null -s "[email protected]"

Then we can call this script in such way:


              Downloaded (byte)  :  251340
            Request sent (byte)  :  121
                Uploaded (byte)  :  0

       Download speed (bytes/s)  :  2483400.000
         Upload speed (bytes/s)  :  0.000

            DNS lookup time (s)  :  0.000111
  Connection establish time (s)  :  0.000559
           SSL connect time (s)  :  0.000000
          Pre-transfer time (s)  :  0.000623
              Redirect time (s)  :  0.000000
        Start-transfer time (s)  :  0.023913

                 Total time (s)  :  0.101208

We have a pretty view of how may data are transferred and how much time spent, isn’t it nice?

Next time you need to diagnostic some HTTP issue, besides of the regular curl command you used to run, don’t forget to give this one a try. I use it a lot, hope you will find it helpful as well.

http connection timing, http connection troubleshooting, http connection diagnostic, curl advanced tips
Leave a Comment

What is d_type and why Docker overlayfs need it

In my previous post I’ve mentioned a strange problem that occurs on Discourse running in Docker. Today I’m going to explain this further as this problem could potentially impact any Docker setup uses overlayfs storage driver. Practically, CentOS 7 with all default setup during installation is 100% affected. Docker on Ubuntu uses AUFS so is not affected.

What is d_type

d_type is the term used in Linux kernel that stands for “directory entry type”. Directory entry is a data structure that Linux kernel used to describe the information of a directory on the filesystem. d_type is a field in that data structure which represents the type of an “file” the directory entry points to. It could a directory, a regular file, some special file such as a pipe, a char device, a socket file etc.

d_type information was added to Linux kernel version 2.6.4. Since then Linux filesystems started to implement it over time. However still some filesystem don’t implement yet, some implement it in a optional way, i.e. it could be enabled or disabled depends on how the user creates the filesystem.

Why it is important to Docker

Overlay and Overlay2 are the two supported storage driver of Docker. Both of them depends on the overlayfs filesystem. Below is a picture from Docker document shows how Docker uses overlayfs for its image storage.

Overlay storage driver in Docker

In the overlayfs code(it’s part of the Linux kernel), this d_type information is accessed and used to make sure some file operations are correctly handled. There is code in overlayfs to specifically check for existence of the d_type feature, and print warning message if it does not exist on the underlying filesystem.

Docker, when running on overlay/overlay2 storage driver, requires the d_type feature to functioning correctly. A check was added to Docker 1.13. By running docker info command now you can tell whether your backing filesystems supports it or not. The plan is to issue an error message in Docker 1.16 if d_type is not enabled.

When d_type is no supported on the backing filesystem of overlayfs, containers running on Docker would run into some strange errors doing file operation. Chown error during Discourse bootstrap or rebuild is one common error. There are some other examples you can find in Docker issues on GitHub, I’ve take some for example as below.

Randomly cannot start Containers with “Clean up Error! Cannot destroy container” “mkdir …-init/merged/dev/shm: invalid argument” #22937

Centos 7 fails to remove files or directories which are created on build of the image using overlay or overlay2. #27358

docker run fails with “invalid argument” when using overlay driver on top of xfs #10294

Check whether your system is affected

TL;DR: Ext4? Good. XFS on RHEL/CentOS 7? High chance bad, use xfs_info to confirm

As mentioned above, d_type support is optional for some filesystem. This includes XFS, the default filesystem in Red Hat Enterprise Linux 7, which is the upstream base of CentOS 7. Unfortunately, the Red Hat /CentOS installer and mkfs.xfs command all by default create XFS filesystem without d_type feature turned on…… What a mess!

As a quick rule, if you are using RHEL 7 or CentOS 7, and your filesystem is created by default without specifying an parameter, you can almost be 100% sure that d_type is not turned on on your filesystem. To check for sure, follow below steps.

First you need to find out what filesystem you are currently using. Although XFS is the default during installation, some people or the hosting provider may choose to use Ext4. If that’s the case, then relax, d_type is supported.

If you are on XFS, then you need to run xfs_info command against the filesystem you need to check. Below is an example from my system.

$ xfs_info /
meta-data=/dev/sda1              isize=256    agcount=4, agsize=3276736 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=13106944, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=6399, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Pay attention to the last column of 6th line of the command output. You can see ftype=1. That’s a good news. It means my XFS was created with the correct parameter, ftype=1, thus d_type was turned on. If you see a ftype=0 there, that means d_type is off.

How to solve the problem

Another bad news is this problem can only be fixed by recreate the filesystem. It cannot be change on an existing filesystem! Basically the steps are:

  1. Backup your data
  2. Recreate the filesystem with correct parameter for XFS, or just create an Ext4 filesystem instead.
  3. Restore your data back.

Let’s focus on step #2. DON’T try any of below command on your server before you fully understand them and have backup secured!

If you chose Ext4 filesystem, then it’s easy, just run mkfs.ext4 /path/to/your/device and that’s it.

If you chose XFS filesystem, the correct command is:

mkfs.xfs -n ftype=1 /path/to/your/device

The -n ftype=1 parameter tells mkfs.xfs program to create a XFS with d_type feature turned on.

Take actions

It is a good idea to check your system asap to see if this d_type problem affects your RHEL/CentOS 7 installation. The sooner you fix the problem the better.


Setup Free SSL with Let’s Encrypt and DNS validation

In this howto I’m going to talk about setup free SSL with Let’s Encrypt and DNS challenge validation on a Linux server, with auto-renew support. The free SSL certificates will work for both Nginx and Apache, the two most popular open source web servers.

Why use DNS challenge validation instead of web server validation

There are many articles on the Internet describing how to setup free SSL by using Let’s Encrypt. However most of them use web server challenge validation. This is great, though sometimes your web server setup don’t allow you to do that easily. There are several possible reasons to use DNS instead.

  • You have more than one web servers to serve your site, load balanced for example. You can’t tell in advance which server will respond to the challenges from Let’s Encrypt servers. There are solution for this situation but it is not trivial to setup.
  • Your web site runs in a intranet and no public access, thus unable to perform a web challenge validation.
  • You host quite a lot domains on your web server, you might want to avoid touching web server configurations to lower risks.
  • You have complex setup in your web server, such as all kinds of redirection between domains(non-www to www or vice versa, which is very common), mixed dynamic and static content, reverse caching proxies, firewalls etc., sometimes it might be hard to set it up quickly and correctly for web challenge validation.
  • You web server is managed by some other administrator and you don’t want to bother him.
  • You are building a new site and the web server is not up yet.
  • You want to keep another possible method on hand just in case.


Your domain must be using one of the DNS providers that provide API access and is supported by the Lexicon DNS record manipulating tool. Check here for the full and up-to-date list of supported DNS service providers. Some commonly used DNS providers are in the list, such as AWS Route53, Cloudflare, Digital Ocean, EasyDNS, Namesilo, NS1, Rage4, Vultr.

If you are already using one of these providers to serve your domain, just move on. If you are not, I recommend you to try Cloudflare if possible. I’ll use Cloudflare as example in this howto.

Gather necessary information

First you need to log into your DNS providers’s website and get the required API access information. Take Cloudflare for example, after you logged in, click your email address at the top right corner of the page, then choose “My Settings” from the menu. This will open your account profile. Scroll down and you will see a section for API information, just like below.

Setup free SSL with Let's Encrypt and DNS validation

Click the “View API Key” button for Global API key, you will then see a popup text box with you key in it. We will need this key later. For different DNS providers, they may provide the API key in different but similar name, such as API token.

Install required software

This howto will use CentOS for example to demonstrate how to install necessary software packages. If you are using a different Linux distribution such as Ubuntu / Arch / Gentoo, the command line and package name might be a bit different. Leave a reply below if you don’t know how to find that out, I’m glad to help.

First install OpenSSL Git, Python, Pip.

yum install openssl git python python-pip

Then create a directory to hold the software we need to install from GitHub.

mkdir /usr/local/repo

Install Dehydrated, which is the tool to manage Let’s Encrypt SSL certificate. It can register, apply and renew certs for you.

git clone /usr/local/repo/dehydrated

Then install Lexicon, a Python package that talks to DNS providers and manges DNS records. Dehydrated will use it to create DNS challenge validation record for Let’s Encryption.

pip install dns-lexicon
git clone /usr/local/repo/lexicon

If you are using DNS provider other than Cloudflare demonstrated here, you may need to install extra dependencies for Lexicon, check for details here.


Now you need to configure Dehydrated to tell it necessary information about the SSL certificate you would like to request. Create a new text file /etc/dehydrated/config and put below content in:


Remember to put in your own email address for the second line. This configuration tells Dehydrated to use DNS challenge instead of the default web server challenge.

Then create file /etc/dehydrated/domains.txt to put in the domain name you would like to apply SSL certificate for. For example, if you have a domain, and would like the certificate to work for hosts including,,, then put a line as below in the file.

Apply for Let’s Encrypt SSL certificate

Now you are ready to apply for the Let’s Encrypt SSL certificate for your domain. First is to register on Let’s Encrypt using the email address you provided earlier in the configuration.

/usr/local/repo/dehydrated/dehydrated --register --accept-terms

Then run below commands to apply for a new certificate

export LEXICON_CLOUDFLARE_USERNAME=your_cloudflare_account_email
export LEXICON_CLOUDFLARE_TOKEN=your_cloudflare_api_key
export PROVIDER=cloudflare

This tells the dehydrated script to read the DNS configuration information from the environment variables. We do this just for quick setup. Later we will put them in a script for easier execution.

Once you started these commands, you will see output saying that Dehydrated is trying to request for certificate, setup DNS record etc. Once it’s finished, your certificate will be located under /etc/dehydrated/certs/ directory.

These commands are for Cloudflare. If you are using a different DNS provider, consult this link for corresponding parameters.

Setup web server

Now it’s time to tell your web server to use the certificates for your site. I’m using Nginx so below are the instruction for Nginx. If you need Apache information, please leave a message below and I’m glad to help.

Put below content in your Nginx configuration file. They can be in the http block if you only host, or can be put in the server block for if you host multiple different domains.

# Below 3 lines enhance SSL security
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/nginx/dhparam.pem;
# Change the file name in below two lines to your actual certificate names.
ssl_certificate /etc/dehydrated/certs/;
ssl_certificate_key /etc/dehydrated/certs/;
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 10m;

Make sure you change the two lines that refer to the certificate files to your actual file.

Then run below command to generate the dhparam file as referenced in above configuration. It may take a while depends on the hardware configuration of your server.

# openssl dhparam -out /etc/nginx/dhparam.pem 2048

After it is done, you can now run nginx -t to test your Nginx configuration. Make sure you see it says everything is OK before move on! A wrong configuration may take your site offline.

Now go to, you will see your site loaded via HTTPS if you have done every step correctly.

Setup cron job for auto renewal

The Let’s Encrypt SSL certificate is free and easy to get, but it is only validate for 90 days. So you need to setup an scheduled job(called cron job on Linux) to check the remaining time of your certificates and renew it if necessary.

To do that, first create a script with below content, and save it as /usr/local/sbin/


export LEXICON_CLOUDFLARE_USERNAME=your_cloudflare_account_email
export LEXICON_CLOUDFLARE_TOKEN=your_cloudflare_api_key
export PROVIDER=cloudflare

/usr/local/repo/dehydrated/dehydrated --cron

nginx -s reload

Again, remember to put correct DNS parameters as your actual situation. And remind that the dehydrated command has a “–cron” option. This option takes care of necessary certificate expiration check and renew process. The last command reloads Nginx so if the cerificates are renewed nginx will pick them up.

Then make the script executable and scheduled to run every week.

chmod +x /usr/local/sbin/
ln -s /usr/local/sbin/ /etc/cron.weekly/


Now you have finished setup of Let’s Encrypt free SSL certificate, you site now serves content via the much safer HTTPS protocol, and you have auto renew in place. Enjoy it.

Leave a Comment

Fix strange chown error during Discourse bootstrap or rebuild


Discourse is an forum software runs in Docker. When using overlayfs/overlayfs2 storage driver, Docker requires the backing filesystem supports d_type. Or else some strange error will just pop up during some very basic file operations, such as chown command.

The symptom

When bootstrapping or rebuilding Discourse on CentOS, the process fails with chown command related errors.

# ./launcher bootstrap app
Pups::ExecError: cd /var/www/discourse && chown -R discourse /var/www/discourse failed with return # Location of failure: /pups/lib/pups/exec_command.rb:108:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"web", "cmd"=>["gem update bundler", "chown -R discourse $home"]}

How to confirm whether you are hit by this problem.

If you are running CentOS 7 and the filesystem was created all by default, you get it. CentoOS installer and the mkfs.xfs command both by default create XFS with ftype=0, which does not meet Docker requirement for filesystem d_type support.

Check the xfs_info command output, mind the ftype=0 thing.

# xfs_info /
naming =version 2 bsize=4096 ascii-ci=0 ftype=0

Then run docker info command to see if Docker pointed it out. You will have to use a new enough Docker version, older ones don’t report d_type information. Below is the output from Docker 1.13.

Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: false

If you see above in your system, then you are in trouble now. Not only Discourse but other container apps may run into strange problem when they do file operations on the overlayfs. Fix it ASAP!

The solution

The key is to get your Docker a filesystem with d_type support. Unfortunately this option can only be set while creating the filesytem. Once filesystem creation is done, the only way to change it is to:

  1. Backup data
  2. Recreate filesystem
  3. Restore data.

Step #1 & 3 is out of the scope of this post. Let’s focus on step #2, how to create the filesystem in the correct way. Two options exist, use XFS or Ext4.

If you prefer XFS

When you run mkfs.xfs command to create XFS on your partition/volume, make sure you pass the -n ftype=1 parameter. The command line looks like below

mkfs.xfs -n ftype=1 /path/to/your/device

If you prefer Ext4 FS

Ext4 filesystem created with default option supports d_type, so there is no special parameter to use when you create Ext4 filesystem on your partition/volume. Easy!

Tips for Docker and Discourse

Since Docker puts its files under /var/lib/docker directory, you only need to make sure d_type is supported for this specific directory. So if you have free space on your disk, you don’t have to touch your whole root filesystem. Just allocate some space, create a new filesystem with correct parameter, then mount it under /var/lib/docker and it’s done.

As regards to Discourse, this procedure won’t even hurt your Discourse data. Discourse puts all data under /var/discourse/share directory. When you get a new /var/lib/docker directory, only the container definition is gone. You just need to recreate the container with launcher script, then the site will just back to normal.

With that said, backup data before doing any filesystem or disk related operation is still a good practice!


My post regarding this issue on Discourse official forum.

Issue: Centos 7 fails to remove files or directories which are created on build of the image using overlay or overlay2.

Issue: overlayfs: Can’t delete file moved from base layer to newly created dir even on ext4

1 Comment