Setting up a development environment for Go

A short blog post on how I’ve set up my development environment so I can develop in Go.

I’ve limited myself to FreeBSD (amd64) and Vim. However, most of this setup should also work in other Unix environments and editors (e.g. Emacs).

First step: install the Go-specific Vim plugins. The Go source code ships with Vim plugins, which can be found in $GOROOT/misc/vim. The readme.txt explains the different ways to install them: you can soft-link to this directory or copy everything straight into your ~/.vim/ directory. These plugins enable syntax highlighting, indentation and documentation lookup.

The second step is to enable yourself to jump around the code and go to the definitions of specific functions. For this I use the (rather) well-known ctags tool. A version of ctags that supports Go can be found here: https://github.com/lyosha/ctags-go.git

Installation is simple: make install (watch out, this overwrites a previously installed ctags in /usr/local/bin). After this, run `ctags --recurse --sort=yes` at the top of your source tree. This generates a tags file which will be used by Vim to look up the locations of code definitions. (C-] : jump to definition, C-t : jump back)

The third (and last) step is code completion. This can be done with omni completion in combination with the Go autocompletion daemon, gocode. If your Go environment variables are set up correctly, the following command should do the trick: go get -u github.com/nsf/gocode. The daemon works in combination with the Go Vim plugins; in Vim, C-x C-o opens a menu with context-sensitive suggestions.

Happy coding!

 


Varnish reordering query string

(Update: now listed as a module on the official Varnish site)
(Update: this code is being used in production without any problems in several companies I worked for)

In Varnish the URL is the key used for caching. If Varnish recognises a previously requested URL it checks whether it is available in its cache and delivers it back. There is a small problem with URLs that have query parameters. Take a look at the following requests:

http://localhost/test?ddd=444&bbb=222&ccc=333&aaa=111
http://localhost/test?ccc=333&aaa=111&ddd=444&bbb=222
http://localhost/test?bbb=222&ccc=333&ddd=444&aaa=111

Each of them will return the same result: the parameters are the same, only their order is different. Varnish, however, treats each of them as a different request and will, in this case, send three separate requests to the backend and cache all three responses.

To deal with this I’ve written a small bit of C code that can be embedded in the Varnish configuration file. It orders the parameters, so URLs with unordered parameters become ordered and therefore end up with the same cache key.

Following our example, the three URLs all end up ordered like this:

http://localhost/test?aaa=111&bbb=222&ccc=333&ddd=444

How it works:
I tokenise the URL, put the parameters in a binary tree and do an in-order traversal to get them out again. The performance of this method is on average O(n log n).
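As an illustration, here is a simplified sketch of the idea in C. This is not the actual varnish-urlsort code; it skips the part of the URL before the ‘?’, escaping and memory cleanup, and just shows the tokenise/insert/traverse core:

/* Sketch: split the query string on '&', insert each parameter into a
 * binary search tree keyed on the parameter text, then rebuild the
 * query string with an in-order traversal. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char        *param;              /* e.g. "aaa=111" */
    struct node *left, *right;
};

static struct node *insert(struct node *root, char *param)
{
    if (root == NULL) {
        root = malloc(sizeof(*root));
        root->param = param;
        root->left = root->right = NULL;
    } else if (strcmp(param, root->param) < 0) {
        root->left = insert(root->left, param);
    } else {
        root->right = insert(root->right, param);
    }
    return root;
}

static void inorder(struct node *root, char *out)
{
    if (root == NULL)
        return;
    inorder(root->left, out);
    if (out[0] != '\0')
        strcat(out, "&");
    strcat(out, root->param);
    inorder(root->right, out);
}

int main(void)
{
    char query[] = "ddd=444&bbb=222&ccc=333&aaa=111";
    char sorted[sizeof(query)] = "";
    struct node *root = NULL;

    /* Tokenise the query string and build the tree. */
    for (char *p = strtok(query, "&"); p != NULL; p = strtok(NULL, "&"))
        root = insert(root, p);

    /* The in-order traversal gives the parameters back in sorted order. */
    inorder(root, sorted);
    printf("%s\n", sorted);          /* aaa=111&bbb=222&ccc=333&ddd=444 */
    return 0;
}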

Code can be found here: https://github.com/cyberroadie/varnish-urlsort
It has a FreeBSD license so please feel free to use it. One warning though: I haven’t used it in a live environment, so do take care!!

Olivier

Update: As suggested on the Varnish mailing list it now also compiles as a Varnish module and can be used like this (once installed):

import urlsort;
sub vcl_recv {
  set req.url = urlsort.sortquery(req.url);
}

Update:

Checked memory usage with valgrind, no leaks 🙂

valgrind -v --dsymutil=yes ./urlsort "http://localhost/test?ddd=444&bbb=222&ccc=333&aaa=111"
==66758== 
==66758== HEAP SUMMARY:
==66758== in use at exit: 6,241 bytes in 33 blocks
==66758== total heap usage: 39 allocs, 6 frees, 6,445 bytes allocated
==66758== 
==66758== Searching for pointers to 33 not-freed blocks
==66758== Checked 488,280 bytes
==66758== 
==66758== LEAK SUMMARY:
==66758== definitely lost: 0 bytes in 0 blocks
==66758== indirectly lost: 0 bytes in 0 blocks
==66758== possibly lost: 0 bytes in 0 blocks
==66758== still reachable: 6,241 bytes in 33 blocks
==66758== suppressed: 0 bytes in 0 blocks
==66758== Rerun with --leak-check=full to see details of leaked memory
==66758== 
==66758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
--66758-- 
--66758-- used_suppression: 1 OSX107:blah
==66758== 
==66758== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

London travel iPhone apps

So here is a list of essential iPhone apps you need when travelling in London.

iPhone London travel apps

Before you start your journey, check the tube status for line and station closures and the state of the service. This is particularly handy during weekends, when there is a lot of planned engineering work.

Plan your route with TubeMap; it also tells you departure times and has ‘find station’ and ‘shortest route calculator’ functionality.

Tube Exits

Tube Exits is pure genius. It tells you which tube carriage to get on so you walk the shortest distance underground when changing lines or finding the exit. The developer of this app travelled the whole London Underground to find the shortest route in every station.

Travel Deluxe app

Travel Deluxe helps you plan your journey through London on any public transport available (yes, Boris bikes too).

With the London Cycle app you can find the closest docking station of the Barclays Cycle Hire scheme near you. It also tells you how many bikes and empty spaces are available at each station, and it comes with a timer so you can keep an eye on the costs (under 30 minutes is free on top of the day’s access fee).

Bus Checker app

And last but not least: London Bus Checker. It helps you locate the bus stop you’re at and tells you how long it will take for the bus to arrive. Comes with a map and bus routes.

One-to-one or one-to-many? That is the question!

Overview

There are two models of programming in SCTP: one-to-one and one-to-many. The one-to-many model comes with everything SCTP has to offer, while with the one-to-one model only a limited set of features can be used. However, implementing the one-to-one model is very similar to how you would implement the same functionality in TCP, which makes migrating an existing application a relatively painless exercise. If you want to upgrade an existing TCP application to a one-to-many SCTP application, significant retooling is needed [1].

Spot the difference

The easiest way to spot the difference between the two is by looking at how the endpoint for communication is created:

One-to-One style

sd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);

SOCK_STREAM stands for stream socket; the data stream is clearly associated with one socket.

One-to-Many style

sd = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

SOCK_SEQPACKET stands for sequenced packet stream. The data stream is sequenced but there is no mention of a socket in the name.

The slightly less obvious way to see the difference is the way the connection between the client and server is set up.

Program Flow

Here is the one-to-one abstraction model:

One to One TCP lookalike

The server creates a passive socket with a listen() followed by an accept() and waits for a connection to come in. The client creates an active socket and establishes a connection with a connect(). The moment accept() receives a connection request it creates a new socket and allocates a new file descriptor. The association between the two systems is created explicitly and incoming connections are handled iteratively.
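To make the flow concrete, here is a minimal sketch of the one-to-one server side (error handling omitted; the port number is just an example). It is essentially the familiar TCP pattern with IPPROTO_SCTP swapped in:

/* One-to-one style SCTP server sketch: same pattern as TCP. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int listen_sd = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9999);                 /* example port */

    bind(listen_sd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listen_sd, 5);

    for (;;) {                                   /* iterative server */
        /* accept() creates a new socket per association. */
        int sd = accept(listen_sd, NULL, NULL);
        char buf[1024];
        ssize_t n;
        while ((n = read(sd, buf, sizeof(buf))) > 0)
            write(sd, buf, n);                   /* echo back */
        close(sd);                               /* tear the association down */
    }
}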

This is the one-to-many model:

One to many, full on SCTP

After the listen(), connections can come in from multiple clients. The association between the systems is set up implicitly as soon as the client sends a message. As soon as the client closes the association, the server releases the association resources as well.
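And a minimal sketch of the one-to-many server side, again with error handling omitted. sctp_recvmsg() and sctp_sendmsg() come from the SCTP socket API (on Linux provided by libsctp); the port number is just an example:

/* One-to-many style SCTP server sketch: one socket, many associations. */
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int sd = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9999);                 /* example port */

    bind(sd, (struct sockaddr *)&addr, sizeof(addr));
    listen(sd, 5);                               /* no accept() needed */

    for (;;) {
        char buf[1024];
        struct sockaddr_in from;
        socklen_t fromlen = sizeof(from);
        struct sctp_sndrcvinfo sinfo;
        int flags = 0;

        /* Messages from all associations arrive on the same socket;
         * sinfo.sinfo_assoc_id tells you which association sent it. */
        ssize_t n = sctp_recvmsg(sd, buf, sizeof(buf),
                                 (struct sockaddr *)&from, &fromlen,
                                 &sinfo, &flags);

        /* Echo the message back over the same association. */
        if (n > 0)
            sctp_sendmsg(sd, buf, n, (struct sockaddr *)&from, fromlen,
                         0, 0, sinfo.sinfo_stream, 0, 0);
    }
}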

So what’s the difference?

Sending data during connection setup

Only the one-to-many model is capable of sending data on the third leg of the four-way handshake SCTP performs during connection setup [3]. This speeds up data transmission.
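On the client side this is what makes the implicit setup attractive: with a one-to-many socket you can simply sctp_sendmsg() to a peer you have never connected to, and the stack can bundle the data with the association setup. A minimal sketch (error handling omitted; address and port are examples):

/* One-to-many style client sketch: no connect(), the association is
 * set up implicitly by the first sctp_sendmsg(). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int sd = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(9999);                  /* example port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

    const char *msg = "hello";
    /* The data is handed to the stack together with the association
     * setup, so it can ride on the third leg of the handshake. */
    sctp_sendmsg(sd, msg, strlen(msg), (struct sockaddr *)&srv,
                 sizeof(srv), 0, 0, 0, 0, 0);
    return 0;
}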

Iterative or Concurrent? (another question)

With the one-to-many model multiple associations can transport data over the same socket, and the different associations are handled iteratively. With the sctp_peeloff() function an association can be detached into its own separate socket and/or thread, so if you want to you can use a concurrent model instead.
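A minimal sketch of what that could look like; handle_association() and peel_off_and_serve() are hypothetical helpers called from the receive loop, and error handling is omitted:

/* Peel a single association off the one-to-many socket and hand it
 * to its own thread as a one-to-one style socket. */
#include <netinet/sctp.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void *handle_association(void *arg)
{
    int sd = *(int *)arg;                        /* one-to-one style socket */
    free(arg);
    char buf[1024];
    ssize_t n;
    while ((n = read(sd, buf, sizeof(buf))) > 0)
        write(sd, buf, n);
    close(sd);
    return NULL;
}

/* Called from the receive loop once we decide an association
 * deserves its own thread. */
static void peel_off_and_serve(int listening_sd, sctp_assoc_t assoc_id)
{
    int *sd = malloc(sizeof(*sd));
    *sd = sctp_peeloff(listening_sd, assoc_id);  /* detach the association */

    pthread_t tid;
    pthread_create(&tid, NULL, handle_association, sd);
    pthread_detach(tid);
}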

Connection State

Because of the connectionless nature of the one-to-many model, a lot of the connection state is handled by the underlying SCTP transport stack and is of no concern to the application.

Conclusion

The one-to-many model has many advantages: it gives you a clear choice in how to handle your connections, and it is even possible to combine an iterative server model with a concurrent one. Data can already be sent while the association is being set up. And last but not least, the application has less connection state to maintain.

The one-to-one model, on the other hand, not only gives you an easy migration path from an existing TCP application, it also makes it easy to switch between TCP and SCTP in the same application; the only difference is the socket() call and maybe a setsockopt().
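To illustrate how small that difference is, here is a sketch with the protocol as a toggle. The SCTP_INITMSG setsockopt() is only an example of the kind of SCTP-specific tuning you might add:

/* The same one-to-one code path for TCP and SCTP; only the socket()
 * call and an optional setsockopt() differ. */
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <string.h>
#include <sys/socket.h>

static int make_socket(int use_sctp)
{
    int proto = use_sctp ? IPPROTO_SCTP : IPPROTO_TCP;
    int sd = socket(PF_INET, SOCK_STREAM, proto);

    if (use_sctp) {
        /* Example SCTP-only tuning: request a number of streams
         * for the association at setup time. */
        struct sctp_initmsg init;
        memset(&init, 0, sizeof(init));
        init.sinit_num_ostreams = 5;
        init.sinit_max_instreams = 5;
        setsockopt(sd, IPPROTO_SCTP, SCTP_INITMSG, &init, sizeof(init));
    }
    return sd;
}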

In a following post I will go into more detail on how to implement this. A good book with lots of examples and detailed explanations of how SCTP (and other protocols) work is Unix Network Programming; well worth a read.

References

[1][2] W. R. Stevens, B. Fenner, and A. M. Rudoff, Unix Network Programming, Vol. 1: The Sockets Networking API, 3rd ed. Addison-Wesley, 2003, pp. 267, 271.

[3] “sctp,” FreeBSD Man Pages. [Online]. Available: http://www.freebsd.org/cgi/man.cgi?query=sctp&manpath=FreeBSD+8.2-RELEASE. [Accessed: 24-Dec-2011].

Zookeeper Single File Leader Election with Retry Logic

In a previous post I explained how to implement leader election as suggested on the Zookeeper website. I posted my solution on the Zookeeper mailing list and got some useful tips. The first one was about how to do leader election: the suggested method is single file leader election, which works like this:

  1. Try to create an ephemeral ZNode
    1. Success: become the leader
    2. Failure: stay inactive
  2. Set a watch on the ZNode
  3. If the ZNode disappears, go to 1.

All clients try to create the same ephemeral node, which has to be unique, so only one client will be able to create it. The client that creates the node first becomes the leader; the rest of the clients stay inactive and wait for the node to disappear before trying to create the node again.

The second suggestion I was given was to wrap the connection to Zookeeper in some sort of retry logic. Assuming things will go wrong, we have to make sure there is a system in place that can recover from a bad situation.
Here is a class diagram of all the objects involved:
Class diagram single file leader election
The following sequence diagram explains the flow of the program in the case of a successful election, followed by a signal indicating the node has disappeared.

The ZNodeMonitor implements the Watcher interface, which has the process method. As soon as the ZNodeMonitor is set as the Watcher, Zookeeper can talk back to it when something changes.

First the client connects to Zookeeper by constructing a RecoverableZookeeper. This happens in the start method, called by the SpeakerServer:

public void start() throws IOException {
    this.zk = new RecoverableZookeeper(connectionString, this);
}

The ZNodeMonitor sets itself as the Watcher (this). This gives Zookeeper the opportunity to call the ZNodeMonitor back as soon as it is connected to the server. It does this by sending a None event type with a SyncConnected state.

@Override
public void process(WatchedEvent watchedEvent) {
    switch (watchedEvent.getType()) {
        case None:
            processNoneEvent(watchedEvent);
            break;
        case NodeDeleted:
            listener.stopSpeaking();
            createZNode();
    }
    try {
        zk.exists(ROOT, this);
    } catch (Exception e) {
        shutdown(e);
    }
}

public void processNoneEvent(WatchedEvent event) {
    switch (event.getState()) {
        case SyncConnected:
            createZNode();
            break;
        case AuthFailed:
        case Disconnected:
        case Expired:
        default:
            listener.stopSpeaking();
            break;
    }
}

Next the client tries to create a ZNode:

public void createZNode() {
    try {
        zk.create(ROOT, listener.getProcessName().getBytes());
        listener.startSpeaking();
    } catch (Exception e) {
        // Something went wrong; let's try to set a watch first
        // before we take any action
    }
}

After this (successful or not) it tries to set a watch via the exists() method (also in the process method, see above).

The retry logic is encapsulated in the RecoverableZookeeper. The create() method wraps the original Zookeeper create method in the following retry logic:

public String create(String path, byte[] data) throws KeeperException, InterruptedException {
    RetryCounter retryCounter = retryCounterFactory.create();
    while (true) {
        try {
            return zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (KeeperException e) {
            logger.debug("Error code: " + e.code());
            switch (e.code()) {
                case NODEEXISTS:
                    if (retryCounter.shouldRetry()) {
                        // A previous (retried) create may have succeeded:
                        // if the node holds our data, treat it as ours.
                        byte[] currentData = zk.getData(path, false, null);
                        if (currentData != null && Arrays.equals(currentData, data)) {
                            return path;
                        }
                    }
                    throw e;
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                    // Keep retrying until the retries are exhausted.
                    if (!retryCounter.shouldRetry()) {
                        throw e;
                    }
                    break;
                default:
                    throw e;
            }
        }
        retryCounter.sleepUntilNextRetry();
        retryCounter.useRetry();
    }
}

If the creation of the ZNode fails it retries until the retries are exhausted. In case the node already exists, it first checks whether it created the node itself; otherwise it throws an exception. In case of a connection loss or an operation timeout it keeps retrying.

The complete code is available in my GitHub account. Any questions or suggestions, please feel free to send me an email or post a comment. The code also includes some simple Ruby scripts to create configuration files and to start and stop multiple Zookeeper servers on a single machine; handy if you want to test different scenarios.

For a full implementation of retry logic with Zookeeper I recommend the Netflix Zookeeper library (Curator), which implements all of this and much more and has been tested in a large-scale environment.

Code Gem: Retry Logic in HBase

I found this retry logic whilst reading the HBase codebase. It encapsulates retry logic in what I think is an elegant manner: it keeps track of the retries, and the time between retries grows exponentially.

This is how you can use it in your code:

public String doAndRetryThis() throws InterruptedException {
    RetryCounter retryCounter = retryCounterFactory.create();
    while (true) {
        try {
            return doThis();
        } catch (Exception e) {
            if (!retryCounter.shouldRetry()) {
                throw e;
            }
        }
        retryCounter.sleepUntilNextRetry();
        retryCounter.useRetry();
    }
}

And this is how it is implemented:


public class RetryCounter {
    private static final Log LOG = LogFactory.getLog(RetryCounter.class);
    private final int maxRetries;
    private int retriesRemaining;
    private final int retryIntervalMillis;
    private final TimeUnit timeUnit;

    public RetryCounter(int maxRetries, int retryIntervalMillis, TimeUnit timeUnit) {
        this.maxRetries = maxRetries;
        this.retriesRemaining = maxRetries;
        this.retryIntervalMillis = retryIntervalMillis;
        this.timeUnit = timeUnit;
    }

    public int getMaxRetries() {
        return maxRetries;
    }

    /**
     * Sleep for a exponentially back off time
     * @throws InterruptedException
     */
    public void sleepUntilNextRetry() throws InterruptedException {
        int attempts = getAttemptTimes();
        long sleepTime = (long) (retryIntervalMillis * Math.pow(2, attempts));
        LOG.info("The " + attempts + " times to retry after sleeping " + sleepTime + " ms");
        timeUnit.sleep(sleepTime);
    }

    public boolean shouldRetry() {
        return retriesRemaining > 0;
    }

    public void useRetry() {
        retriesRemaining--;
    }

    public int getAttemptTimes() {
        return maxRetries - retriesRemaining + 1;
    }
}

And it comes with its little Factory:

public class RetryCounterFactory {
    private final int maxRetries;
    private final int retryIntervalMillis;

    public RetryCounterFactory(int maxRetries, int retryIntervalMillis) {
        this.maxRetries = maxRetries;
        this.retryIntervalMillis = retryIntervalMillis;
    }

    public RetryCounter create() {
        return new RetryCounter(
            maxRetries, retryIntervalMillis, TimeUnit.MILLISECONDS
        );
    }
}

Code can be found here and here

Implementing Leader Election with Zookeeper

Update: An alternative implementation can be found in my next blogpost: Single file leader election with retry logic.

In the following post I will demonstrate how to create a distributed application where one of the running processes gets elected as the master process with the help of Apache Zookeeper. The application is very straightforward: a single process writes a unique identifier to a file. Multiple instances of this application can be started, but only one is allowed to write to the output file. In case of failure a new leader is elected.

(For a brief explanation on how Apache Zookeeper works see my previous post)

Recipe

To implement this I followed a recipe from the Apache Zookeeper website, which suggests the following algorithm for leader election:

Let ELECTION be a path of choice of the application. To volunteer to be a leader:

  1. Create znode z with path “ELECTION/n_” with both SEQUENCE and EPHEMERAL flags;
  2. Let C be the children of “ELECTION”, and i be the sequence number of z;
  3. Watch for changes on “ELECTION/n_j”, where j is the smallest sequence number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

  1. Let C be the new set of children of ELECTION;
  2. If z is the smallest node in C, then execute leader procedure;
  3. Otherwise, watch for changes on “ELECTION/n_j”, where j is the smallest sequence number such that j < i and n_j is a znode in C;

Prerequisites

  • Object representing the Zookeeper server (or the connection to it)
  • Object implementing the Watcher interface: Something the Zookeeper server can send messages to
  • A running program: main() {while(true) doTask()}
  • A running instance of Zookeeper

The whole thing will run on localhost.

Querying Zookeeper

Before we start, a quick overview of how to query Zookeeper so you can see what’s going on:
The Zookeeper distribution contains a command line tool which you can use to connect to a Zookeeper server and query it. It’s somewhat similar to the mysql client connecting to mysqld. To start the client and connect to a Zookeeper server running on localhost:
./zkCli.sh  -server localhost:2181
And here is a list of useful commands to query the data tree; the client has tab completion to make life easier:
  • create [path] [data] – creates a znode:
    create /test test_data
  • ls [path] – lists the children of a znode:
    ls /test
  • stat [path] – gets the metadata of a specific znode:
    stat /test
  • delete [path] – deletes a znode:
    delete /test

Running the Code

The whole application is available on GitHub.
It’s a Maven project and can be built with the following command:
mvn clean install
It can be run from the command line or from within your favourite IDE.
Once you’ve started Zookeeper and the SpeakerServer you can see what’s happening in Zookeeper by issuing the following command in zkCli:

ls /ELECTION

It will list the nodes created by every running instance.

Design

The whole application consists of three classes. The first class, the Speaker, does the work and writes its process id and an incremental counter to a file (out.txt). It writes to the file if the canSpeak flag is set to true (it defaults to false).
The second class is a monitoring class (NodeMonitor). It talks asynchronously to the Zookeeper server and, depending on the response and the state of the connection, decides whether the Speaker has to start or stop writing to the output file.
The third class is the server. It instantiates the Speaker and the NodeMonitor, adds the Speaker as a listener to the NodeMonitor and starts the whole thing up in a separate thread on a fixed schedule. The scheduled thread runs the Speaker every second (1 second is the default, but the delay can be specified on the command line).

How it works

Starting the thread:

(SpeakerServer.java)
Speaker speaker = null;
try {
    speaker = new Speaker(args[0]);
    monitor = new NodeMonitor();
    monitor.setListener(speaker);
} catch (Exception e) {
    e.printStackTrace();
    System.exit(1);
}

scheduler.scheduleWithFixedDelay(speaker, 0, delay, TimeUnit.MILLISECONDS);

First the Speaker (which implements the Runnable interface) gets set as a listener on the monitor. This gives the monitor, which talks with Zookeeper, a callback point.
The Monitor defines an interface which a listener (in this case the Speaker) has to implement, so the monitor knows how to talk back to the listener:
(NodeMonitor.java)
public interface NodeMonitorListener {
    public void startSpeaking();
    public void stopSpeaking();
    public String getProcessName();
}
In the NodeMonitor constructor we create a Zookeeper instance:
(NodeMonitor.java):
public NodeMonitor() throws IOException, InterruptedException, KeeperException {
        this.zooKeeper = new ZooKeeper("localhost:2181", 3000, this);
}
After connecting to the Zookeeper server, a SyncConnected event gets sent back to the Watcher (the NodeMonitor implements the Watcher interface) via the process method:
(NodeMonitor.java)
 

public void processNoneEvent(WatchedEvent event) {
    switch (event.getState()) {
        case SyncConnected:
            try {
                createRootIfNotExists();
                zooKeeper.getChildren(ROOT, true, this, null);
                sequenceNumber = createZnode();
...
If the root znode doesn’t exist it gets created, and after that we set a watch on the root node (/ELECTION) via the zooKeeper.getChildren() method. We pass ourselves to this method so Zookeeper can do an asynchronous callback via the processResult() method. After that we create the znode underneath /ELECTION representing this running instance.
The essence of the asynchronous callback method used by Zookeeper to signal any changes to the children of the watched node:
(NodeMonitor.java)
@Override
public void processResult(int i, String s, Object o, List children) {
....
    if(getLowestNumber(children) == sequenceNumber)
        listener.startSpeaking();
    else
        listener.stopSpeaking();
....
}
If the lowest sequence number equals its own sequence number, the server becomes the leader and will start speaking.
The NodeMonitor implements the AsyncCallback.ChildrenCallback interface because the watch has been set via the getChildren() method. Every method that sets a watch has its own callback interface; e.g. if you use exists() to set the watch you need AsyncCallback.StatCallback.
It’s a very straightforward application and I hope it clarifies the workings of Zookeeper and how to implement it in your own Java application. Any questions, ping me and I’ll be more than happy to answer them.