Show HN: WAL Implementation in Golang (opens in new tab)

(github.com)

59 pointsstym061y ago29 comments

I wrote this simple WAL library in Golang that I use to write data that my kafka producer fails due to errors like Broker going down or some other issue. Took inspiration from etcd/wal

29 comments

while1malloc01y ago

Cool library. Two small generic Go library issues:

1. The rebuf.Init function panics. I almost never want a library to call panic, and when it does, I want the library function to denote that. The convention I’ve seen most often is to start the function name with Must, so MustInit instead of Init. In this case though, I think it’d be safe to be a little more lenient in what you accept as input and trim the trailing slash.

2. I never (not almost, actually never) want library code to call any of the fmt.Print functions unless the library is explicitly for writing output, or that behavior is strictly opt in. If the library really must print things, it should take a user supplied os.Writer and write to that. Let the user control what gets printed or not.

stym06OP1y ago

1. You're right! Will fix it to handle this as well as the support for relative directories

2. Yes, Will integrate a logging library instead of fmt (https://github.com/uber-go/zap)

while1malloc01y ago

Based on the child thread about zap vs slog I think I might not have been clear in my phrasing. The issue isn’t the specific functions used to print to the screen, it’s that library code is doing it at all. As the user of a library, I don’t want that library printing things to the screen if I don’t explicitly tell it to; decisions on logging/printing text to the screen are the responsibility of the person writing the end-user application code, not the library author. If the library author feels really strongly about printing stuff on the screen, they should make that behavior opt in, either with a configuration option or by providing some other mechanism that gives the user as much control over that behavior as possible (hence my example of throwing printing behavior into a user-supplied io.Writer)

arp2421y ago

What if I don't use zap?

Ideally you want to add an option via a function or interface:

  func New(optOne bool, log func(string, ...any)) {
    if log == nil {
        log = func(m string, a ...any) { fmt.Printf(m, a...) }
    }
    log("starting; optOne=%v", optOne)
  }

Or something along those lines.

1 more reply

antook1y ago

FYI, Go already has a structured logging package called [slog](https://go.dev/blog/slog) in the standard library since V1.21.

vlowther1y ago

Having written one of these, a few optimizations will go a long way:

1. syscall.Iovec allows you to build up multiple batches semi independently and then write them all in a single syscall and sync the file with the next one. It is a good basis for allowing multiple pending writes to proceed in independent go routines and have another one have all the responsibility for flushing data.

2. It is better to use larger preallocated files than a bunch of smaller ones, along with batching, fixed size headers and padding write blocks to a known size. 16 megabytes per wal and a 128 byte padding worked well for me.

3. Batching writes until they reach a max buffer size and/or a max buffer age can also massively increase throughput. 1 megabyte max pending write or 50 ms time passed worked pretty well for me for batching and throughput to start with, then dynamically tuning the last bound to the rolling average of the time the last 16 write+sync operations (and a hard upper bound to deal with 99th percentile latency badness) worked better. Bounded channels and a little clever math makes parallelizing all of this pretty seamless.

4. Mmap'ing the wals makes consistency checking and byte level fiddling much easier on replay. No need to seek or use a buffered reader, just use slice math and copy() or append() to pull out what you need.

stym06OP1y ago

Thanks! Could you please point me to a reference for (1)

etcd/wal actually does do preallocations (https://github.com/etcd-io/etcd/blob/24e05998c68f481af2bd567...)

Yet to implement max buffer age! Any references for this would be bomb!

Is mmap() really needed here? Came across a similar project that does this? Really gotta dig deep here! https://github.com/jhunters/bigqueue

vlowther1y ago

Can't share my references with you directly, the implementation I wrote is closed-source and is heavily intermingled with other internal bits. But I can provide examples:

1. syscall.Iovec is a struct that the writev() systemcall uses. You build it up something like this:

    func b2iov(bs [][]byte) []syscall.Iovec {
        res := []syscall.Iovec{}
        for i := range bs {
            res = append(res, syscall.Iovec{Base: &bs[i][0], Len: uint64(len(bs[i])}
        }
        return res
    }

Then, once you are ready to write:

    func write(fi *os.File, iov []syscall.Iovec, at int64) (written int64, err error) {
        if _, err = fi.Seek(at, io.SeekStart); err != nil {
            return
        }
        wr, _, errno := syscall.Syscall(syscall.SYS_WRITEV, fi.Fd(), uintptr(unsafe.Pointer(&iov[0])), uintptr(len(iov)))
        if errno != 0 {
            err = errno
            return
        }
        written = int64(wr)
        err = fi.Sync()
        return
    }

These are not tested and omit some more advanced error checking, but the basic idea is that you use the writev() system call (POSIX standard, so if you want to target Windows you will need to find its equivalent) to do the heavy lifting of writing a bunch of byte buffers as a single unit to the backing file at a known location.

2. Yeah, I just zero-filled a new file using the fallocate as well.

3. I handled max buffer age by feeding writes to the WAL using a channel, then the main reader loop for that channel select on both the main channel and a time.Timer.C channel. Get clever with the Reset() method on that timer and you can implement whatever timeout scheme you like.

4. No, it is not needed, but my WAL implementation boiled down to a bunch of byte buffers protected by a rolling CRC64, and for me just mmap'ing the whole file into a big slice and sanity-checking the rolling crcs along with other metadata was easier and faster that way.

tjungblut1y ago

Besides what Phil mentioned below, I can't write more than one record to the WAL. You're closing the file after every write, the second time you write the error `seek data/rebuf.tmp: file already closed` is returned.

I also think your rotation will delete the wrong segment when you have more than ten segments - imagine you're writing rebuf-1 to rebuf-10 - what's the "oldest file" to delete now? Besides, should you really delete those files?

stym06OP1y ago

Yes there are a lot of bugs since I just wrote this in one sitting today. Will be fixing all of this. For log rotation, I'll sort by the last_modified_at ts and then purge those

tjungblut1y ago

Your generational approach to segment numbering is fine, if you prepend enough zeros to format the files properly then you're also able to sort them correctly. etcd uses the same trick.

Smaug1231y ago

This is one of the absolutely classic cases where I'd expect a very small amount of property-based testing to flush out a very large number of bugs, by the way.

stym06OP1y ago

Will do! Thanks for the comment

stym06OP1y ago

OP here! Pls feel free to raise any bugs you encounter! I'll be doing the following immmediate fixes:

1. Use fsync for durable writes in case of system crashes

2. Fix log-rotation-purging logic

3. Fix `file already closed` bug on consecutive writes

4. Add CRC checksum

0xjnml1y ago

Another simple WAL: https://pkg.go.dev/modernc.org/file#WAL

DLion1y ago

Having some tests is necessary to avoid mostly of the bugs that other comments are pointing out.

Perhaps it's just me, but I don't trust code that hasn't been tested.

stym06OP1y ago

Yet to implement linting and unit tests. This is kind of a rough draft/v0

eatonphil1y ago

Did I miss it or is there no call to os.File.Sync(), i.e. fsync, anywhere?

Since you mention etcd/wal:

https://github.com/etcd-io/etcd/blob/v3.3.27/wal/wal.go#L671

https://github.com/etcd-io/etcd/blob/v3.3.27/pkg/fileutil/sy...

stym06OP1y ago

I used the bufio flush mechanism https://pkg.go.dev/bufio#Writer.Flush

Thanks for your comment, I'll definitely check it out. It was my first attempt at this. How can I make it better?

sakras1y ago

Essentially, unless you `fsync`, there's no guarantee that your data will be durably written to disk. This is because the operating system keeps data buffered in in-memory caches, so if the machine crashes you may lose some data. The `fsync` system call forces the data to be flushed from the in-memory OS cache to the disk. As far as I could tell, the Flush you use does not `fsync`.

1 more reply

neonsunset1y ago

This reminds me of ZoneTree which is persistent LSM tree project based on top of WAL, written in C#: https://github.com/koculu/ZoneTree

Similar to RocksDB.

stym06OP1y ago

Absolutely loved your project ZoneTree. Gotta dig deep on this

Smaug1231y ago

golangci-lint finds three errors in rebuf.go at commit 615209d. It's never safe to write golang without the linters!

drgo1y ago

in line 200 of rebuf.go, did you mean to return err (instead of returning nil even when an error occurs)?

stym06OP1y ago

I guess so! Will be fixed

j / k navigate · click thread line to collapse

29 comments

while1malloc01y ago

Cool library. Two small generic Go library issues:

stym06OP1y ago

1. You're right! Will fix it to handle this as well as the support for relative directories

2. Yes, Will integrate a logging library instead of fmt (https://github.com/uber-go/zap)

while1malloc01y ago

arp2421y ago

What if I don't use zap?

Ideally you want to add an option via a function or interface:

  func New(optOne bool, log func(string, ...any)) {
    if log == nil {
        log = func(m string, a ...any) { fmt.Printf(m, a...) }
    }
    log("starting; optOne=%v", optOne)
  }

Or something along those lines.

1 more reply

antook1y ago

FYI, Go already has a structured logging package called [slog](https://go.dev/blog/slog) in the standard library since V1.21.

vlowther1y ago

Having written one of these, a few optimizations will go a long way:

stym06OP1y ago

Thanks! Could you please point me to a reference for (1)

etcd/wal actually does do preallocations (https://github.com/etcd-io/etcd/blob/24e05998c68f481af2bd567...)

Yet to implement max buffer age! Any references for this would be bomb!

Is mmap() really needed here? Came across a similar project that does this? Really gotta dig deep here! https://github.com/jhunters/bigqueue

vlowther1y ago

Can't share my references with you directly, the implementation I wrote is closed-source and is heavily intermingled with other internal bits. But I can provide examples:

1. syscall.Iovec is a struct that the writev() systemcall uses. You build it up something like this:

    func b2iov(bs [][]byte) []syscall.Iovec {
        res := []syscall.Iovec{}
        for i := range bs {
            res = append(res, syscall.Iovec{Base: &bs[i][0], Len: uint64(len(bs[i])}
        }
        return res
    }

Then, once you are ready to write:

    func write(fi *os.File, iov []syscall.Iovec, at int64) (written int64, err error) {
        if _, err = fi.Seek(at, io.SeekStart); err != nil {
            return
        }
        wr, _, errno := syscall.Syscall(syscall.SYS_WRITEV, fi.Fd(), uintptr(unsafe.Pointer(&iov[0])), uintptr(len(iov)))
        if errno != 0 {
            err = errno
            return
        }
        written = int64(wr)
        err = fi.Sync()
        return
    }

2. Yeah, I just zero-filled a new file using the fallocate as well.

tjungblut1y ago

stym06OP1y ago

Yes there are a lot of bugs since I just wrote this in one sitting today. Will be fixing all of this. For log rotation, I'll sort by the last_modified_at ts and then purge those

tjungblut1y ago

Your generational approach to segment numbering is fine, if you prepend enough zeros to format the files properly then you're also able to sort them correctly. etcd uses the same trick.

Smaug1231y ago

This is one of the absolutely classic cases where I'd expect a very small amount of property-based testing to flush out a very large number of bugs, by the way.

stym06OP1y ago

Will do! Thanks for the comment

stym06OP1y ago

OP here! Pls feel free to raise any bugs you encounter! I'll be doing the following immmediate fixes:

1. Use fsync for durable writes in case of system crashes

2. Fix log-rotation-purging logic

3. Fix `file already closed` bug on consecutive writes

4. Add CRC checksum

0xjnml1y ago

Another simple WAL: https://pkg.go.dev/modernc.org/file#WAL

DLion1y ago

Having some tests is necessary to avoid mostly of the bugs that other comments are pointing out.

Perhaps it's just me, but I don't trust code that hasn't been tested.

stym06OP1y ago

Yet to implement linting and unit tests. This is kind of a rough draft/v0

eatonphil1y ago

Did I miss it or is there no call to os.File.Sync(), i.e. fsync, anywhere?

Since you mention etcd/wal:

https://github.com/etcd-io/etcd/blob/v3.3.27/wal/wal.go#L671

https://github.com/etcd-io/etcd/blob/v3.3.27/pkg/fileutil/sy...

stym06OP1y ago

I used the bufio flush mechanism https://pkg.go.dev/bufio#Writer.Flush

Thanks for your comment, I'll definitely check it out. It was my first attempt at this. How can I make it better?

sakras1y ago

1 more reply

neonsunset1y ago

This reminds me of ZoneTree which is persistent LSM tree project based on top of WAL, written in C#: https://github.com/koculu/ZoneTree

Similar to RocksDB.

stym06OP1y ago

Absolutely loved your project ZoneTree. Gotta dig deep on this

Smaug1231y ago

golangci-lint finds three errors in rebuf.go at commit 615209d. It's never safe to write golang without the linters!

drgo1y ago

in line 200 of rebuf.go, did you mean to return err (instead of returning nil even when an error occurs)?

stym06OP1y ago

I guess so! Will be fixed

j / k navigate · click thread line to collapse