I want to run an application on my CoreOS clusters that uses hostnames to communicate between machines. This is a problem, because out of the box CoreOS machines cannot resolve the hostnames of other machines in the cluster. So I wrote a small fleet service that manages the /etc/hosts file on every machine so that they can correctly resolve each other's hostnames. In this post I will briefly describe that service.

The Hosts Service

The hosts.service:

```
# hosts.service
[Unit]
Description=Hosts Manager
After=etcd2.service

[Service]
EnvironmentFile=/etc/environment
Restart=always

ExecStartPre=-/usr/bin/etcdctl mkdir /hosts

ExecStart=/bin/sh -c 'while true; do etcdctl watch --recursive /hosts; \
  sleep 1;\
  echo "127.0.0.1 localhost" > /etc/hosts; \
  for i in $(etcdctl ls /hosts); do \
    echo $(etcdctl get $i) $(echo $i | cut -c 8-); \
  done >> /etc/hosts; \
done'

ExecStartPost=/usr/bin/etcdctl set /hosts/%H $PRIVATEIP
ExecStopPost=/usr/bin/etcdctl rm /hosts/%H

[X-Fleet]
Global=true
```
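To make the body of the ExecStart loop easier to follow, here is a minimal sketch of a single rebuild pass written as a standalone shell function. The function name rebuild_hosts and the HOSTS_FILE override are hypothetical, introduced here only so the logic can be tried against a scratch file; the etcdctl calls are the same ones the unit uses.

```shell
#!/bin/sh
# Sketch of one rebuild pass from the ExecStart loop.
# rebuild_hosts and HOSTS_FILE are illustrative names, not part of the unit.
rebuild_hosts() {
    hosts_file="${HOSTS_FILE:-/etc/hosts}"
    # Start from a fresh file that contains only the localhost entry.
    echo "127.0.0.1 localhost" > "$hosts_file"
    # Each key looks like /hosts/<hostname>; its value is the address.
    for key in $(etcdctl ls /hosts); do
        # cut -c 8- drops the 7-character "/hosts/" prefix from the key.
        echo "$(etcdctl get "$key") $(echo "$key" | cut -c 8-)"
    done >> "$hosts_file"
}
```

In the unit itself this pass runs each time etcdctl watch returns, after the one-second pause.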

  • This service uses etcdctl, so it must start after etcd2.
  • The /etc/environment file defines PRIVATEIP, the address the hostname should resolve to. This can be set in the cloud-config with:

    ```
    write_files:
      - path: /etc/environment
        content: |
          PRIVATEIP=$private_ipv4
    ```
  • Restart=always tells systemd to restart this service whenever it exits.
  • Before the service starts (ExecStartPre=-) it will create the etcd directory hosts (etcdctl mkdir /hosts). Note: the =- means that if the directory creation fails (if it already exists) it will not stop the service
  • ExecStart= will start an infinite loop (while true) that waits for changes in the /hosts directory (etcdctl watch --recursive /hosts). When a change happens it will:
      • wait (sleep 1) in case there is a burst of changes all at once
      • overwrite the existing /etc/hosts file with the localhost entry
      • loop over all entries in the /hosts directory (for i in $(etcdctl ls /hosts)) and append the value of each entry (etcdctl get $i) and its path minus the /hosts/ prefix (echo $i | cut -c 8-) to the /etc/hosts file
  • After the service has started (ExecStartPost) it will then register its hostname (%H) and address ($PRIVATEIP) in the /hosts directory. This change will be detected by the service, which will rewrite the /etc/hosts file with itself in it.
  • When the service stops it will remove itself from the /hosts directory, which will cause other machines to update their /etc/hosts file.
  • Global=true is there so that this service automatically runs on machines newly added to the cluster. This means that the /etc/hosts file on all machines in the cluster will always be up to date.

For AWS users

AWS users may want to add the lines:

```
ExecStartPost=/bin/sh -c "etcdctl set /hosts/$(hostname | awk -F. '{print $1}') $PRIVATEIP"
ExecStopPost=/bin/sh -c "etcdctl rm /hosts/$(hostname | awk -F. '{print $1}')"
```

This is because calling hostname may return a name like node1.us-east-1.compute.internal, and some applications require only node1.

Further Reading

[CoreOS Essentials](http://amzn.to/1NW3xGG)
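Returning to the AWS note above: the awk expression keeps only the part of the hostname before the first dot. A small sketch that can be run in any shell to check this (the function name short_name is hypothetical; the awk call is the one from the AWS lines):

```shell
#!/bin/sh
# short_name: print everything before the first dot of a hostname.
# The name short_name is illustrative only.
short_name() {
    echo "$1" | awk -F. '{print $1}'
}

short_name "node1.us-east-1.compute.internal"   # prints node1
```

A hostname without any dots passes through unchanged, so the same lines should also be safe on machines that already report a short name.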