Performance testing our Asynchronous PostgreSQL library

13/11/18 — capitol

We have done some performance testing between the pgjdbc driver and the pgAdba driver for PostgreSQL.

results

The test was performed 20 times for each number of threads, with a warm-up set before. The raw data can be found as text or csv

A linear regression gives that pgAdba is 16.6% faster than pgjdbc.

Method

The test was designed to expose the connection library, rather than the postgresql database itself. So if your application does a large number of very cheap sql queries then this might be relevant.

The test is done by performing a large number of http requests against an spring boot web server, for each request the server takes the url of the requests, sends it to the database server as a string, which echo the request right back without any disk I/O, and the server returns the url as a string to the client.

The test code for the async library pgAdba is:

    @RequestMapping("/{val}")
    public CompletableFuture<String> index(@PathVariable("val") String val) {
        Submission<List<RowColumn>> sub = session.<List<RowColumn>>rowOperation("select $1 as t")
            .set("$1", val, AdbaType.VARCHAR)
            .collect(Collectors.toList())
            .submit();

        return sub.getCompletionStage().thenApply(rc -> rc.get(0).at("t").get(String.class)).toCompletableFuture();
    }

and the code for the blocking pgjdbc driver looks like this:

    @RequestMapping("/{val}")
    public String index(@PathVariable("val") String val) throws SQLException {
        PreparedStatement ps = connection.prepareStatement("select ? as t");
        ps.setString(1, val);
        ResultSet rs = ps.executeQuery();
        if(rs.next()) {
            return rs.getString("t");
        } else {
            throw new RuntimeException("error");
        }
    }

The machine that the test ran on had a local database server, that the web server communicated with over localhost, for minimal latency between the web server and the database server.

Configuration and setup of the test

System configuration to ensure that we don’t run out of open sockets while the test runs: echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range

sysctl -w net.ipv4.tcp_tw_reuse=1

System Hardware:

  • CPU: Intel i7-4500U 1.8 GHz
  • 8 gig of ram

Software versions:

  • OS: Ubuntu 18.04.1 LTS
  • Spring boot: 2.0.5.RELEASE
  • pgjdbc: 42.2.5
  • pgAdba: 0.1.0-ALPHA
  • postgresql: 10.5-0ubuntu0.18.04

jmeter command: jmeter -n -t jmeter-performance-test-5.jmx -l /tmp/res-5 -e -o /tmp/out-5

Java command to run spring boot: java -server -XX:+UseParallelGC -XX:+AggressiveOpts -XX:+UseLargePages -Xmn1g -Xms2g -Xmx2g -jar target/pgadba-example-application-spring-boot-0.1.0.jar

pgAdba code here

pgjdbc code here

The image was produced with R like this

mydata <- read.csv(file="/home/capitol/project/hackeriet/blog/assets/performance-data-pgadba.csv", header=TRUE, sep=",")

library(ggplot2)

ggplot() + geom_smooth(data=mydata, aes(x = threads, y = async, colour = "pgAdba"))
 + geom_smooth(data=mydata, aes(x = threads, y = sync, colour = "pgjdbc"))
  + xlab("Number of threads") + ylab("Queries per second") + ggtitle("")
   + geom_point(data=mydata, aes(x = threads, y = async, colour = "pgAdba"))
    + geom_point(data=mydata, aes(x = threads, y = sync, colour = "pgjdbc"))
     + expand_limits(x = 0, y = 0) + theme_light()

Simple and Extended queries in PostgreSQL

11/11/18 — capitol

elephant-family

Simple and Extended queries in PostgreSQL

The postgresql wire format have two mechanism to send queries to the database, the simple and the extended protocol. Lets go over how they work and the difference between them.

Simple protocol

Queries are sent as plain text to the server, without any parameters. It’s also allowed to send multiple queries in the same frame by separating them with ;.

This makes the simple protocol a bit more vulnerable to sql injection attacks.

The query looks like this on the wire:

type marker length query
byte int32 str
‘Q’ 8 select 1

The server can respond with one or more of

  • CommandComplete
  • CopyInResponse
  • CopyOutResponse
  • RowDescription
  • DataRow
  • EmptyQueryResponse
  • ErrorResponse
  • ReadyForQuery
  • NoticeResponse

We will not go into detail about the server responses in this blog post

Extended protocol

The extended protocol is a bit more complex, but in return you get better type safety and less exposure to sql injections.

The sending of queries is split into five steps

  • Parse
  • Describe
  • Bind
  • Execute
  • Sync

Parse

The parse message contains the query, with placeholders on the format $<number>, for example $1.

The parse message looks like this on the wire:

type marker length name query num params param oid
byte int32 str str int16 int32
‘P’ 23 q1 select $1 1 5

In hex: 500000001771310073656c65637420243100000100000017

An oid is an id of a type in postgresql. A complete list of all the standard types can be retrieved by doing select * from pg_type.

The name is optional, if it’s specified the query is saved in the connection context and the parsed query can be reused, otherwise it’s saved in the unnamed statement slot and will be overwritten by the next unnamed parse message or simple query.

Describe

The describe message is used to get a description of the statement or portal in order to know the types of the returned columns.

type marker length type name
byte int32 ‘S’ or ‘P’ str
‘D’ 8 P p1

In hex: 440000000853713100

Bind

Once the query is parsed it’s possible to bind parameters to it with a bind message.

This creates an output portal.

type marker length portal query num formats format num parameters parameter length value
byte int32 str str int16 int16 int16 int16 str or bytes
‘B’ 26 p1 q1 1 1 1 1 1

In hex: 420000001a70310071310000010001000100000004000000010000

The format of a parameter can either be 1 or 0 indicating if the parameter is formatted as text or binary.

Execute

Once everything has been setup by the other frames it’s time to execute the query and let the server do the actual work.

type marker length portal row limit
byte int32 str int32
‘E’ 11 p1 0

In hex: 450000000b70310000000000

Sync

The sync message signals that the frontend is finished and that the backend can close the current transaction unless the query is in a begin transaction block.

type marker length
byte int32
‘S’ 4

In hex: 5300000004

And after this the query is finished and it’s possible to start over with the next query.

Release of pgAdba, an asynchronous connection library for PostgreSQL

09/11/18 — capitol

coffee-elephant

Release of pgAdba, an asynchronous connection library for PostgreSQL

The jdbc group have been working on building an api for doing asynchronous sql queries against a database, called ADBA.

The purpose of this api is to reduce overhead when doing database intensive tasks. Take a standard http request as an example, we can break down it’s lifecycle into four phases

1) request arrives to the application server over http 2) application code collects data to be able to render the output 3) output is rendered into a string 4) said string is sent over the network to the client

Point 1 and 4 is already handled asynchronous on modern application servers and 3 is only limited by cpu speed. Left is the collection of data to fulfill the rendering and in many cases this involve sending queries to a database in a blocking manner. That means that we have an application thread that starts processing, send a query to the database and then sits and waits until the database responds.

If we instead could hand the query/queries of to the connection library and receive a future that the connection library completes when the database answers we can let the thread work on something else in the meantime. That way reducing the number of threads needed, thus minimizing memory overhead and context switching.

The pgAdba library implements a useful subset of this proposed API for postgresql, the plan is to implement the whole api, but it’s better to release early and often than waiting for perfection.

Architectural changes from JDBC

Asynchronous

A useful mental model for how the programming against the api work are two parallel threads, the one the user controls build up one or more queries, called Operations, fills them with parameter data and everything else that’s needed and hands them over to the other thread. It gets a future in return, in the form of a Submission.

The other thread is responsible for the network communication, and sends the query to the database. On return it completes the future, either normally or in case of an error exceptionally.

Queries are not parsed by the connection library

That both the connection library and the database should parse the query is a waste of effort, therefor the pgAdba library never parses the sql query string and instead sends it as-is to the database server.

This removes the need for complicated parsing and escaping logic in the driver, and makes debugging problems with the query simpler, as you can know that what you as a developer see and what the server see are the same.

Minimal amount of state inside the library

The library manages the state of the connections to the database, but everything related to updatable ResultSets have been removed.

Technical improvements

Query pipelining

By using the extended query protocol in the postgresql wire format we can ensure that we get one answer for every query we send. This enables us to start sending query number two before we have received the answer to the first query.

This really helps with throughput, especially in situations where there is increased latency between the database server and the application server.

Use of java.time date classes

The new time classes that arrived in java 8 are a huge improvement over java.util.Date and they are first class citizens in this library.

What remains

A lot! Suggestions, bug reports and pull requests are very welcome.

The version number includes the -ALPHA denomination, this has two meanings. The first is that the API published under the jdk.incubator.sql2 namespace isn’t stable and will change in future releases. The other is that the driver itself isn’t tested under production workloads and there will be bugs.

But regardless of that it’s possible to experiment and get a feeling for how the proposed API works.

How to test it out

Here is a complete example of an small REST based webserver application built with Spring Boot.

Or just include the maven dependency in your project:

Maven

<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>pgadba</artifactId>
  <version>0.1.0-ALPHA</version>
</dependency>

Gradle

compile 'org.postgresql:pgadba:0.1.0-ALPHA'

Talk 15 Sept: Functional Embedded Programming on the ESP8266

15/09/18 — sgo

esp-alien

Jonas S Karlsson[1] is a visiting hacker from Hong Kong, and he’s been part of creating hackerspaces in San Fransisco, Sydney, Hong Kong, Shanghai and various makerfaires.

He will show his implementation[2] of a LISP-based, funny functional programming language on the ESP8266 $3 wifi computer.

The talk is not a typical functional type-system theorem proving session, but more about the practical implementation using no-nonsense C, with efficient memory usage. Including a practical demo.

Hope to see you there.

Workshop 10-11 Oct: Build an open source mesh node

28/08/18 — sgo

mesh-workshop

WORKSHOP:

(10-11 Oct. at Hackeriet, 18:00++, bring a raspi, free)

benhylau from tomesh.net will be in oslo, so we’re going to do a few days of building mesh nodes that work without centralized architecture.

More info: Oslo Mesh Week

KEYWORDS:

  • cjdns, yggdrasil, ipv6
  • ipfs, secure scuttlebutt
  • raspberry pi, wifi, 802.11s

What do I need to bring?

  • laptop
  • (optional) raspberry pi 3b or 3b+ with power adapter

Interesting..!

ok. go to #oslohackerspace on Freenode

TODO: post meetup.com link