slowfs

Slowing down storage for fun and profit

$ whoami
Working on RHV storage since 2013
Tinkering with Python since 2003
Free software enthusiast
Father of two
          

Agenda

  • Why slowfs?
  • FUSE
  • Quick tutorial
  • Demo time
  • Future work
  • Questions

The real agenda

  • /me getting a T-Shirt
  • You contribute to slowfs!

Why slowfs?

Bug 1270220

  • SPM in problem
  • Data center down
  • Manual cleanup required

Digging...

  • unlink() takes 90 seconds!
  • Lots of deletes

Steps to reproduce

  • Set up an overloaded NetApp filer
  • Create huge images
  • Set up 30 hosts
  • Delete one VM per minute

Vote!

  1. Close WONTFIX, fix your storage
  2. Dig deeper?

Digging more...

  • Try to simulate with iptables by delaying packets
  • No way to get slow unlink without bringing the entire system down

FUSE

Filesystem in Userspace

Fwd: NetApp slow file delete simulation

From: Tim Speetjens <tim.speetjens@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>
Date: Tue, 20 Oct 2015 10:12:18 -0400 (EDT)

> I was thinking about the following setup:
>
> Step 1: Setup a RHEL filer which slows down deletes of
> large files artificially
> Based on the example FUSE filesystem in
> http://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/
            

FUSE Tutorial

bbfs.c
/*
  Big Brother File System
  Copyright (C) 2012 Joseph J. Pfeiffer, Jr., Ph.D. 
  ...
*/
#include "config.h"
#include "params.h"

#include <ctype.h>
#include <dirent.h>
...
#ifdef HAVE_SYS_XATTR_H
#include <sys/xattr.h>
#endif
...
[900 lines of C]
            

Vote no2

What is the best language for writing a file system?

  1. C
  2. Python
  3. Ansible

Python!

  • Python is faster (to write)
  • First version was written in one evening

Re: NetApp slow file delete simulation

From: Nir Soffer <nsoffer@redhat.com>
To: Tim Speetjens <tim.speetjens@redhat.com>
Date: Wed, 21 Oct 2015 00:16:49 +0300

OK, we have now a very slow file system for testing:
https://github.com/nirs/slowfs

$ mkdir /realfs /slowfs
$ python slowfs.py /realfs /slowfs

$ touch /slowfs/test
$ time rm /slowfs/test

real 0m10.013s
user 0m0.001s
sys 0m0.002s
            

slowfs v0

import os
import sys
import time
import fuse
...
class SlowFS(fuse.Operations):
    ...
    def unlink(self, path):
        time.sleep(10)
        return os.unlink(self._full_path(path))
    ...

def main(root, mountpoint):
    fuse.FUSE(SlowFS(root), mountpoint, foreground=True)

if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])
            

Next morning - reproduced!

Elad 2015-10-21 10:38:10 EDT                    Comment 26

We've simulated customer case using a 3.5 setup with
vdsm-4.16.20-1.el6ev.x86_64 installed on host.

I used NFS storage server with Nir's code that simulates a
slow file system.
...
            

Quick tutorial

Installing

Tested only on Fedora

# dnf install fuse fuse-devel

# git clone https://github.com/nirs/slowfs.git

# cd slowfs

# pip install -r requirements.txt
            

Creating directories

Files under /realfs will be exposed
(slowly) under /slowfs

# mkdir /realfs /slowfs
            

How slow do you want to go today?

Create a configuration file

# cat slowfs.cfg
unlink = 60
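
A minimal sketch of how such a file could be turned into per-operation delays. This is not the actual slowfs code: parse_config and delay are hypothetical names, and the "#" comment syntax is an assumption.

```python
import time


def parse_config(text):
    """Parse 'name = seconds' lines into a dict of per-operation delays."""
    delays = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (comment syntax is assumed)
        name, value = line.split("=", 1)
        delays[name.strip()] = float(value)
    return delays


def delay(delays, operation, sleep=time.sleep):
    """Sleep for the delay configured for operation; return the seconds slept."""
    seconds = delays.get(operation, 0.0)
    if seconds > 0:
        sleep(seconds)
    return seconds
```

A file system operation would call delay(delays, "unlink") before doing the real work; operations missing from the file are not delayed.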
            

Starting the file system

# python slowfs.py -c slowfs.cfg /realfs /slowfs
            

Testing locally

# touch /slowfs/test

# time rm /slowfs/test

real    1m0.063s
user    0m0.000s
sys     0m0.001s
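
The same check can be scripted instead of using time rm. A minimal sketch; timed_unlink is a hypothetical helper, not part of slowfs.

```python
import os
import time


def timed_unlink(path):
    """Remove path and return how long os.unlink() took, in seconds."""
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start
```

With unlink = 60 in the configuration, timed_unlink("/slowfs/test") should return a value slightly above 60.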
            

It works!

How can we use this from another host?

Exporting via NFS

# cat /etc/exports
/slowfs    *(rw,sync,no_subtree_check,fsid=0)
            

Note: you must export the /slowfs directory itself; exporting its parent directory will not work

Restart NFS server

# systemctl restart nfs-server
            

Maybe there is a nicer way

Mounting on a client

# NFS 3
# mount my.server:/slowfs mountpoint

# NFS 4
# mount -t nfs4 my.server:/ mountpoint
            

Testing remotely

# touch mountpoint/test

# time rm mountpoint/test

real    1m0.063s
user    0m0.000s
sys     0m0.001s
            

I want to change the configuration

without stopping the file system!

slowfsctl

# ../slowfs/slowfsctl help
Available commands:
  disable     disable configuration
  enable      enable configuration
  get         get config value
  help        show this help message
  log         change log level
  reload      reload configuration
  set         set config value
  status      show current status
            

Must be run from the same directory where you started the file system

Tuning slowness

# ../slowfs/slowfsctl get unlink
60

# ../slowfs/slowfsctl set unlink 1

# touch slowfs/test

# time rm -f slowfs/test

real	0m1.005s
user	0m0.000s
sys	0m0.002s
            

My configuration is too slow!

# ../slowfs/slowfsctl set write 1

# time dd if=/dev/zero of=slowfs/test bs=4k count=10
10+0 records in
10+0 records out
40960 bytes (41 kB, 40 KiB) copied, 10.0285 s, 4.1 kB/s

real	0m10.036s
user	0m0.002s
sys	0m0.003s
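
Why 10 seconds and 4.1 kB/s? The delay is paid per operation, not per byte. A small worked calculation, assuming each 4 KiB dd block reaches the file system as a single write call (the function names are illustrative only):

```python
def total_time(count, delay_per_write):
    """Seconds spent sleeping when each of count writes is delayed."""
    return count * delay_per_write


def throughput(block_size, delay_per_write):
    """Upper bound on bytes per second when every write sleeps first."""
    return block_size / delay_per_write


print(total_time(10, 1))    # 10 writes, 1 second each
print(throughput(4096, 1))  # 4096 bytes per second, about 4.1 kB/s
```

Larger blocks pay the same delay per write, so the measured throughput grows with the block size.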
            

slowfsctl disable

# ../slowfs/slowfsctl disable

# ../slowfs/slowfsctl status
Disabled

# time dd if=/dev/zero of=slowfs/test bs=4k count=10
10+0 records in
10+0 records out
40960 bytes (41 kB, 40 KiB) copied, 0.00640398 s, 6.4 MB/s

real	0m0.013s
user	0m0.002s
sys	0m0.001s
            

slowfsctl log

[root@slowfs test]# ../slowfs/slowfsctl log debug
[root@slowfs test]# echo "looking under the hood!" > slowfs/test 
            

Example debug log

INFO [ctl] Setting log level to 'debug'
DEBUG [fs] -> getattr u'/test' (None,)
DEBUG [fs] <- getattr {'st_ctime': 1512251116.465498 ...
DEBUG [fs] -> open u'/test' (32769,)
DEBUG [fs] <- open 6
DEBUG [fs] -> getxattr u'/test' (u'security.capability',)
DEBUG [fs] <- getxattr [Errno 95] Operation not supported
DEBUG [fs] -> truncate u'/test' (0,)
DEBUG [fs] <- truncate None
DEBUG [fs] -> getattr u'/test' (None,)
DEBUG [fs] <- getattr {'st_ctime': 1512251336.2405853, ...
DEBUG [fs] <- flush None
DEBUG [fs] -> getxattr u'/test' (u'security.capability',)
DEBUG [fs] <- getxattr [Errno 95] Operation not supported
DEBUG [fs] -> write u'/test' ('looking under the hood!\n', 0, 6L)
DEBUG [fs] <- write 24
DEBUG [fs] -> flush u'/test' (6L,)
DEBUG [fs] <- flush None
DEBUG [fs] -> release u'/test' (6L,)
DEBUG [fs] <- release None
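
Log lines in this "->" / "<-" style can be produced by wrapping every operation in a decorator. A minimal sketch, not the actual slowfs code; logged and Example are hypothetical names.

```python
import functools
import logging

log = logging.getLogger("fs")


def logged(func):
    """Log each call as '-> name path args' and its outcome as '<- name ...'."""
    @functools.wraps(func)
    def wrapper(self, path, *args):
        log.debug("-> %s %r %r", func.__name__, path, args)
        try:
            result = func(self, path, *args)
        except OSError as e:
            # Failed calls are logged too, then passed on to the caller
            log.debug("<- %s %s", func.__name__, e)
            raise
        log.debug("<- %s %r", func.__name__, result)
        return result
    return wrapper


class Example:
    @logged
    def truncate(self, path, length):
        return None  # stand-in for the real operation
```

Calling Example().truncate("/test", 0) with the logger at debug level emits one "->" line and one "<-" line.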
            

Demo time

Future work

FUSE Locking

  • libfuse takes a lock when calling unlink
  • Slowing down unlink will block any operation in the same directory

Tests

  • We have no tests
  • Hard to change without tests

How slow is Python?

libfuse passthrough.c example

$ ./passthrough modules=subdir,subdir=/realfs /slowfs

$ dd if=/dev/zero of=/slowfs/test bs=8M count=128
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.23748 s, 480 MB/s
            

Almost 9x slower

$ python slowfs.py /realfs /slowfs

$ dd if=/dev/zero of=/slowfs/test bs=8M count=128
128+0 records in
128+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 19.8493 s, 54.1 MB/s
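
The slowdown follows directly from the two dd runs, which copied the same 1 GiB. A short worked calculation using only the times printed by dd:

```python
passthrough_seconds = 2.23748  # libfuse passthrough.c
slowfs_seconds = 19.8493       # python slowfs.py

# Same amount of data in both runs, so the ratio of times is the slowdown
slowdown = slowfs_seconds / passthrough_seconds
print(f"{slowdown:.1f}x slower")
```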
            

Rewrite in C?

What else can we do?

  • Faking errors
  • Corrupting data
  • Tracing syscalls
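
Faking errors could follow the same pattern as faking slowness: decide per operation, before doing the real work. A minimal sketch of the idea only; nothing like this exists in slowfs today, and maybe_fail and error_rates are hypothetical names.

```python
import errno
import random


def maybe_fail(operation, error_rates, rng=random.random):
    """Raise EIO for operation with the configured probability (0.0 to 1.0)."""
    rate = error_rates.get(operation, 0.0)
    if rng() < rate:
        raise OSError(errno.EIO, "injected I/O error", operation)
```

With error_rates = {"unlink": 0.1}, roughly one unlink in ten would fail with EIO.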

How can I contribute?

Fork slowfs on GitHub
https://github.com/nirs/slowfs

Fork this talk on GitHub
https://github.com/nirs/slowfs-qecamp

Questions?