Better password hashing in PostgreSQL with SCRAM-SHA-256

19/11/18 — capitol

elephant-hashing

Many connections to PostgreSQL servers are not protected by TLS and for those it’s important that the password isn’t sent as clear text over the network.

PostgreSQL has supported MD5 hashing with salt for a long time, and as we all know MD5 is considered very broken by today’s standards. Not only is MD5 weak, the login sequence only contains 32 bits of new entropy per connection, so if you can listen to multiple connection attempts then you can easily perform a replay attack on the MD5 packet.

Let’s look on how an MD5 based login sequence looks.

Client sends a startup message

length protocol param name param value
int32 int32 str str
88 3 user test

The first element is just a standard length, the second is the protocol version that the client wants to use. PostgreSQL has been on protocol version 3 since version 7.4. One interesting thing about the protocol field is that it’s reused when trying to start a TLS connection. The value 80877103 means that a TLS connection will be initiated instead.

After that a number of parameter name/value pairs are sent. The most important for this topic is the user one, that sets the username.

The packet looks like this as a hex dump:

0000   00 00 00 58 00 03 00 00 75 73 65 72 00 74 65 73
0010   74 00 64 61 74 61 62 61 73 65 00 74 65 73 74 00
0020   61 70 70 6c 69 63 61 74 69 6f 6e 5f 6e 61 6d 65
0030   00 6a 61 76 61 5f 73 71 6c 32 5f 63 6c 69 65 6e
0040   74 00 63 6c 69 65 6e 74 5f 65 6e 63 6f 64 69 6e
0050   67 00 55 54 46 38 00 00

Server responds with an Authentication request

type marker length type type specific data
byte int32 int32 various, int32 with md5
‘R’ 12 5 0xfce5c980

What the server responds with is controlled by the pg_hba.conf file. In this case it has a line that looks like this that instructs it to use MD5:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             ::1/128                 md5

The type specific data in case of MD5 is a salt value that should be hashed together with the password.

The packet looks like this as a hex dump:

0000   52 00 00 00 0c 00 00 00 05 fc e5 c9 80

Client sends a Password message

type marker length value
byte int32 str
‘p’ 40 md5374c5f834de33f6297af8f17aa050229

The hashing method in use here is "md5" + md5(md5(user + password) + salt).

The packet looks like this as a hex dump:

0000   70 00 00 00 28 6d 64 35 33 37 34 63 35 66 38 33
0010   34 64 65 33 33 66 36 32 39 37 61 66 38 66 31 37
0020   61 61 30 35 30 32 32 39 00

Improved security with SCRAM-SHA-256

In order to improve the situation the hashing system SCRAM-SHA-256 was introduced, as defined in RFC 5802 and RFC 7677.

It’s a more complicated protocol, but it’s significantly more secure.

Let’s go over the packet exchange of that also.

Client sends a startup message

Same as above, not repeated.

Server responds with an Authentication request

type marker length type type specific data
byte int32 int32 various, str with SASL
‘R’ 23 10 SCRAM-SHA-25

This authentication system uses the SASL packets, types SASL (10), SASL_CONTINUE (11) and SASL_FINAL (12).

The type specific data is a list of mechanisms which the client can choose one of, in this case it has only one field.

The packet looks like this as a hex dump:

0000   52 00 00 00 17 00 00 00 0a 53 43 52 41 4d 2d 53
0010   48 41 2d 32 35 36 00 00

Client responds with the first password message

type marker length mechanism length parameter string
byte int32 str int32 str
‘p’ 54 SCRAM-SHA-256 32 n,,n=,r=/z+giZiTxAH7r8sNAeHr7cvp

When sending a password packet back as part of a SASL exchange it has a few more fields. Most notable is the parameter string, whose content is determined by RFC 5802.

Notable is that this message also contains a username in the attribute n, but the server doesn’t use that so we leave it blank.

The attribute r is the client specified nounce, in other words the entropy that the client supplies.

The packet looks like this as a hex dump:

0000   70 00 00 00 36 53 43 52 41 4d 2d 53 48 41 2d 32
0010   35 36 00 00 00 00 20 6e 2c 2c 6e 3d 2c 72 3d 2f
0020   7a 2b 67 69 5a 69 54 78 41 48 37 72 38 73 4e 41
0030   65 48 72 37 63 76 70

Server sends a SASL CONTINUE message

type marker length type type specific data
byte int32 int32 various, str with SASL CONTINUE
‘R’ 92 11 r=/z+giZiTxAH7r8sNAeHr7cvpqV3uo7G/bJBIJO3pjVM7t3ng,s=4UV68bIkC8f9/X8xH7aPhg==,i=4096

The server asks us to perform 4096 iterations of the hashing in attribute i, the RFC specifies that the complete exchange without network lag should take at least 0.1 seconds. So this value will likely increase with future versions of PostgreSQL.

In attribute r the server has taken the entropy that the client supplied and added its own.

s is a server generated salt for the user.

The packet looks like this as a hex dump:

0000   52 00 00 00 5c 00 00 00 0b 72 3d 2f 7a 2b 67 69
0010   5a 69 54 78 41 48 37 72 38 73 4e 41 65 48 72 37
0020   63 76 70 71 56 33 75 6f 37 47 2f 62 4a 42 49 4a
0030   4f 33 70 6a 56 4d 37 74 33 6e 67 2c 73 3d 34 55
0040   56 36 38 62 49 6b 43 38 66 39 2f 58 38 78 48 37
0050   61 50 68 67 3d 3d 2c 69 3d 34 30 39 36

Client performs the hashing and returns another password message

type marker length parameter string
byte int32 str
‘p’ 108 c=biws,r=/z+giZiTxAH7r8sNAeHr7cvpqV3uo7G/bJBIJO3pjVM7t3ng,p=AFpSYH/K/8bux1mRPUwxTe8lBuIPEyhi/7UFPQpSr4A=

Here the client has done the actual work of hashing the password and sending the proof of that in attribute p.

c is channel binding data. This isn’t used by the pgAdba connection code yet, so we will come back to this in the future.

The packet looks like this as a hex dump:

0000   70 00 00 00 6c 63 3d 62 69 77 73 2c 72 3d 2f 7a
0010   2b 67 69 5a 69 54 78 41 48 37 72 38 73 4e 41 65
0020   48 72 37 63 76 70 71 56 33 75 6f 37 47 2f 62 4a
0030   42 49 4a 4f 33 70 6a 56 4d 37 74 33 6e 67 2c 70
0040   3d 41 46 70 53 59 48 2f 4b 2f 38 62 75 78 31 6d
0050   52 50 55 77 78 54 65 38 6c 42 75 49 50 45 79 68
0060   69 2f 37 55 46 50 51 70 53 72 34 41 3d

Server ends the authentication exchange with a SASL FINISH message

type marker length type type specific data
byte int32 int32 various, str with SASL FINISH
‘R’ 54 12 v=d1PXa8TKFPZrR3MBRjLy3+J6yxrfw/zzp8YT9exV7s8=

The server sends a server signature so that the client can verify the server in attribute v.

The packet looks like this as a hex dump:

0000   52 00 00 00 36 00 00 00 0c 76 3d 64 31 50 58 61
0010   38 54 4b 46 50 5a 72 52 33 4d 42 52 6a 4c 79 33
0020   2b 4a 36 79 78 72 66 77 2f 7a 7a 70 38 59 54 39
0030   65 78 56 37 73 38 3d

Performance testing our Asynchronous PostgreSQL library

13/11/18 — capitol

We have done some performance testing between the pgjdbc driver and the pgAdba driver for PostgreSQL.

results

The test was performed 20 times for each number of threads, with a warm-up set before. The raw data can be found as text or csv

A linear regression gives that pgAdba is 16.6% faster than pgjdbc.

Method

The test was designed to expose the connection library, rather than the postgresql database itself. So if your application does a large number of very cheap sql queries then this might be relevant.

The test is done by performing a large number of http requests against an spring boot web server, for each request the server takes the url of the requests, sends it to the database server as a string, which echo the request right back without any disk I/O, and the server returns the url as a string to the client.

The test code for the async library pgAdba is:

    @RequestMapping("/{val}")
    public CompletableFuture<String> index(@PathVariable("val") String val) {
        Submission<List<RowColumn>> sub = session.<List<RowColumn>>rowOperation("select $1 as t")
            .set("$1", val, AdbaType.VARCHAR)
            .collect(Collectors.toList())
            .submit();

        return sub.getCompletionStage().thenApply(rc -> rc.get(0).at("t").get(String.class)).toCompletableFuture();
    }

and the code for the blocking pgjdbc driver looks like this:

    @RequestMapping("/{val}")
    public String index(@PathVariable("val") String val) throws SQLException {
        PreparedStatement ps = connection.prepareStatement("select ? as t");
        ps.setString(1, val);
        ResultSet rs = ps.executeQuery();
        if(rs.next()) {
            return rs.getString("t");
        } else {
            throw new RuntimeException("error");
        }
    }

The machine that the test ran on had a local database server, that the web server communicated with over localhost, for minimal latency between the web server and the database server.

Configuration and setup of the test

System configuration to ensure that we don’t run out of open sockets while the test runs: echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range

sysctl -w net.ipv4.tcp_tw_reuse=1

System Hardware:

  • CPU: Intel i7-4500U 1.8 GHz
  • 8 gig of ram

Software versions:

  • OS: Ubuntu 18.04.1 LTS
  • Spring boot: 2.0.5.RELEASE
  • pgjdbc: 42.2.5
  • pgAdba: 0.1.0-ALPHA
  • postgresql: 10.5-0ubuntu0.18.04

jmeter command: jmeter -n -t jmeter-performance-test-5.jmx -l /tmp/res-5 -e -o /tmp/out-5

Java command to run spring boot: java -server -XX:+UseParallelGC -XX:+AggressiveOpts -XX:+UseLargePages -Xmn1g -Xms2g -Xmx2g -jar target/pgadba-example-application-spring-boot-0.1.0.jar

pgAdba code here

pgjdbc code here

The image was produced with R like this

mydata <- read.csv(file="/home/capitol/project/hackeriet/blog/assets/performance-data-pgadba.csv", header=TRUE, sep=",")

library(ggplot2)

ggplot() + geom_smooth(data=mydata, aes(x = threads, y = async, colour = "pgAdba"))
 + geom_smooth(data=mydata, aes(x = threads, y = sync, colour = "pgjdbc"))
  + xlab("Number of threads") + ylab("Queries per second") + ggtitle("")
   + geom_point(data=mydata, aes(x = threads, y = async, colour = "pgAdba"))
    + geom_point(data=mydata, aes(x = threads, y = sync, colour = "pgjdbc"))
     + expand_limits(x = 0, y = 0) + theme_light()

Simple and Extended queries in PostgreSQL

11/11/18 — capitol

elephant-family

Simple and Extended queries in PostgreSQL

The postgresql wire format have two mechanism to send queries to the database, the simple and the extended protocol. Lets go over how they work and the difference between them.

Simple protocol

Queries are sent as plain text to the server, without any parameters. It’s also allowed to send multiple queries in the same frame by separating them with ;.

This makes the simple protocol a bit more vulnerable to sql injection attacks.

The query looks like this on the wire:

type marker length query
byte int32 str
‘Q’ 8 select 1

The server can respond with one or more of

  • CommandComplete
  • CopyInResponse
  • CopyOutResponse
  • RowDescription
  • DataRow
  • EmptyQueryResponse
  • ErrorResponse
  • ReadyForQuery
  • NoticeResponse

We will not go into detail about the server responses in this blog post

Extended protocol

The extended protocol is a bit more complex, but in return you get better type safety and less exposure to sql injections.

The sending of queries is split into five steps

  • Parse
  • Describe
  • Bind
  • Execute
  • Sync

Parse

The parse message contains the query, with placeholders on the format $<number>, for example $1.

The parse message looks like this on the wire:

type marker length name query num params param oid
byte int32 str str int16 int32
‘P’ 23 q1 select $1 1 5

In hex: 500000001771310073656c65637420243100000100000017

An oid is an id of a type in postgresql. A complete list of all the standard types can be retrieved by doing select * from pg_type.

The name is optional, if it’s specified the query is saved in the connection context and the parsed query can be reused, otherwise it’s saved in the unnamed statement slot and will be overwritten by the next unnamed parse message or simple query.

Describe

The describe message is used to get a description of the statement or portal in order to know the types of the returned columns.

type marker length type name
byte int32 ‘S’ or ‘P’ str
‘D’ 8 P p1

In hex: 440000000853713100

Bind

Once the query is parsed it’s possible to bind parameters to it with a bind message.

This creates an output portal.

type marker length portal query num formats format num parameters parameter length value
byte int32 str str int16 int16 int16 int16 str or bytes
‘B’ 26 p1 q1 1 1 1 1 1

In hex: 420000001a70310071310000010001000100000004000000010000

The format of a parameter can either be 1 or 0 indicating if the parameter is formatted as text or binary.

Execute

Once everything has been setup by the other frames it’s time to execute the query and let the server do the actual work.

type marker length portal row limit
byte int32 str int32
‘E’ 11 p1 0

In hex: 450000000b70310000000000

Sync

The sync message signals that the frontend is finished and that the backend can close the current transaction unless the query is in a begin transaction block.

type marker length
byte int32
‘S’ 4

In hex: 5300000004

And after this the query is finished and it’s possible to start over with the next query.

Release of pgAdba, an asynchronous connection library for PostgreSQL

09/11/18 — capitol

coffee-elephant

Release of pgAdba, an asynchronous connection library for PostgreSQL

The jdbc group have been working on building an api for doing asynchronous sql queries against a database, called ADBA.

The purpose of this api is to reduce overhead when doing database intensive tasks. Take a standard http request as an example, we can break down it’s lifecycle into four phases

1) request arrives to the application server over http 2) application code collects data to be able to render the output 3) output is rendered into a string 4) said string is sent over the network to the client

Point 1 and 4 is already handled asynchronous on modern application servers and 3 is only limited by cpu speed. Left is the collection of data to fulfill the rendering and in many cases this involve sending queries to a database in a blocking manner. That means that we have an application thread that starts processing, send a query to the database and then sits and waits until the database responds.

If we instead could hand the query/queries of to the connection library and receive a future that the connection library completes when the database answers we can let the thread work on something else in the meantime. That way reducing the number of threads needed, thus minimizing memory overhead and context switching.

The pgAdba library implements a useful subset of this proposed API for postgresql, the plan is to implement the whole api, but it’s better to release early and often than waiting for perfection.

Architectural changes from JDBC

Asynchronous

A useful mental model for how the programming against the api work are two parallel threads, the one the user controls build up one or more queries, called Operations, fills them with parameter data and everything else that’s needed and hands them over to the other thread. It gets a future in return, in the form of a Submission.

The other thread is responsible for the network communication, and sends the query to the database. On return it completes the future, either normally or in case of an error exceptionally.

Queries are not parsed by the connection library

That both the connection library and the database should parse the query is a waste of effort, therefor the pgAdba library never parses the sql query string and instead sends it as-is to the database server.

This removes the need for complicated parsing and escaping logic in the driver, and makes debugging problems with the query simpler, as you can know that what you as a developer see and what the server see are the same.

Minimal amount of state inside the library

The library manages the state of the connections to the database, but everything related to updatable ResultSets have been removed.

Technical improvements

Query pipelining

By using the extended query protocol in the postgresql wire format we can ensure that we get one answer for every query we send. This enables us to start sending query number two before we have received the answer to the first query.

This really helps with throughput, especially in situations where there is increased latency between the database server and the application server.

Use of java.time date classes

The new time classes that arrived in java 8 are a huge improvement over java.util.Date and they are first class citizens in this library.

What remains

A lot! Suggestions, bug reports and pull requests are very welcome.

The version number includes the -ALPHA denomination, this has two meanings. The first is that the API published under the jdk.incubator.sql2 namespace isn’t stable and will change in future releases. The other is that the driver itself isn’t tested under production workloads and there will be bugs.

But regardless of that it’s possible to experiment and get a feeling for how the proposed API works.

How to test it out

Here is a complete example of an small REST based webserver application built with Spring Boot.

Or just include the maven dependency in your project:

Maven

<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>pgadba</artifactId>
  <version>0.1.0-ALPHA</version>
</dependency>

Gradle

compile 'org.postgresql:pgadba:0.1.0-ALPHA'

Talk 15 Sept: Functional Embedded Programming on the ESP8266

15/09/18 — sgo

esp-alien

Jonas S Karlsson[1] is a visiting hacker from Hong Kong, and he’s been part of creating hackerspaces in San Fransisco, Sydney, Hong Kong, Shanghai and various makerfaires.

He will show his implementation[2] of a LISP-based, funny functional programming language on the ESP8266 $3 wifi computer.

The talk is not a typical functional type-system theorem proving session, but more about the practical implementation using no-nonsense C, with efficient memory usage. Including a practical demo.

Hope to see you there.