I was looking for a way to improve my system's responsiveness, knowing that I/O is a major factor that can influence it. Yes, I could replace my HDD with an affordable SSD, but I wanted something that would not cost me money. So I read a little about kernel I/O schedulers.
In Linux kernel 3.0.6 I found just three options:
- noop
- deadline
- cfq
How would I know which one is best for my hardware and for the type of work I do on my system? Hard to say, but you can test each of them to see which one gives you the best result.
So, the first step is to set your kernel to use one of these three I/O schedulers. The second step is to run an I/O benchmark on your hard disk using that particular scheduler.
How to choose the right kernel I/O scheduler
How can you check which kernel I/O scheduler is currently in use? Run the following command at the console:
cat /sys/block/sda/queue/scheduler
It will return something like:
noop deadline [cfq]
The one enclosed in square brackets is the current kernel I/O scheduler. To change the kernel I/O scheduler at runtime, run the following command at the console (as root):
echo "<i/o scheduler>" > /sys/block/sda/queue/scheduler
where <i/o scheduler> is one of those your kernel supports.
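The two commands above can be wrapped in a few helper functions. This is only a sketch: the function names and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT exists only so the code can be tried against a fake directory instead of the real /sys).

```shell
# Print the active scheduler from a line such as "noop deadline [cfq]".
active_scheduler() {
    # Keep only the text between the square brackets.
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print every scheduler named on the line, one per line, without brackets.
available_schedulers() {
    printf '%s\n' "$1" | sed 's/[][]//g' | tr ' ' '\n' | sed '/^$/d'
}

# Switch a device (e.g. sda) to a scheduler, but only if the kernel offers it.
# Needs root on a real system. SYSFS_ROOT is a hypothetical override for dry runs.
set_scheduler() {
    local dev want file
    dev=$1
    want=$2
    file="${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"
    if [ ! -r "$file" ]; then
        echo "no such device: $dev" >&2
        return 1
    fi
    if available_schedulers "$(cat "$file")" | grep -qx -- "$want"; then
        echo "$want" > "$file"
    else
        echo "$want is not offered for $dev" >&2
        return 1
    fi
}
```

For example, active_scheduler "$(cat /sys/block/sda/queue/scheduler)" prints the current scheduler, and set_scheduler sda deadline refuses to write anything if deadline is not in the list.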
Run an I/O benchmark on your disk
There are many I/O benchmark tools on the net. My favourite is IOzone. Simply running the iozone -h command at your terminal gives you a comprehensive list of all available options (and there are dozens).
I found an interesting post about how to stress your hard disk and get benchmark numbers that you can load into a spreadsheet and plot as a chart, which will help you understand which of these I/O schedulers behaves better on your hardware:
https://bbs.archlinux.org/viewtopic.php?pid=969117
So I used the following bash script (saved as, e.g., iozone-scheduler), which calls IOzone sequentially and then compiles a log file (in a tab-delimited format) that one can use to plot a chart:
#!/bin/bash
# Test schedulers with iozone
# See https://bbs.archlinux.org/viewtopic.php?pid=969117
# by fackamato, Aug 1, 2011
# changelog:
# 03082011
# Added: Support for Linux MD devices
# Added/fixed: take no. of threads as argument and test accordingly (big rewrite)
# 02082011
# Added: Should now output to a file with the syntax requested by graysky
# Fixed: Add support for HP RAID devices
# Fixed: Drop caches before each test run

if [ "$EUID" -ne "0" ]; then echo "Needs su, exiting"; exit 1; fi

unset ARGS;ARGS=$#
if [ ! $ARGS -lt "5" ]; then
  DEV=$1
  DIR=`echo $2 | sed 's/\/$//g'` # Remove trailing slashes from path
  OUTPUTDIR=`echo $4 | sed 's/\/$//g'` # Remove trailing slashes from path
  # Create the log file directory if it doesn't exist
  if [ ! -d "$OUTPUTDIR" ]; then mkdir -p $OUTPUTDIR;fi
  # Check the test directory
  if [ ! -d "$DIR" ]; then
    echo "Error: Is $DIR a directory?"
    exit 1
  fi
  # Check the device name
  MDDEV="md*"
  HPDEV="c?d?"
  case "$DEV" in
    $HPDEV ) # HP RAID
      unset SYSDEV;SYSDEV="/sys/block/cciss!$DEV/queue/scheduler"
      unset MD;declare -i MD;MD=0
    ;;
    $MDDEV ) # mdadm RAID
      echo "Found a Linux MD device, checking for schedulers..."
      unset MD;declare -i MD;MD=1
      unset SYSDEV
      SYSDEV=$(mdadm -D /dev/md0 | grep active | awk -F '/' '{print $3}' | sed 's/[0-9]//g')
    ;;
    * )
      unset SYSDEV;SYSDEV="/sys/block/$DEV/queue/scheduler"
      unset MD;declare -i MD;MD=0
    ;;
  esac
  # Check for the output log
  unset OUTPUTLOG;OUTPUTLOG="$OUTPUTDIR/iozone-$DEV-all-results.log"
  if [ -e "$OUTPUTLOG" ]; then echo "$OUTPUTLOG exists, aborting"; exit 1;fi
  # Find available schedulers
  if [ $MD -eq 0 ]; then
    echo "not md device"
    declare -a SCHEDULERS
    SCHEDULERS=`cat $SYSDEV | sed 's/\[//g' | sed 's/\]//g'`
  else
    declare -a SCHEDULERS; unset MDMEMBER
    for MDMEMBER in ${SYSDEV[@]}; do
      unset SYSDEVMD;SYSDEVMD="/sys/block/"$MDMEMBER"/queue/scheduler"
    done
    SCHEDULERS=`cat $SYSDEVMD | sed 's/\[//g' | sed 's/\]//g'`
  fi
  if [ -z "$SCHEDULERS" ]; then
    echo "No schedulers found! Wrong device specified? Tried looking in $SYSDEV"
    exit 1
  else
    echo "Schedulers found under $DEV: "$SCHEDULERS
    SIZE=$(($3*1024)) # Size is now MB per thread
    unset RUNS; declare -i RUNS;RUNS=$5
  fi
  # Set record size
  if [ -z "$6" ]; then
    echo "Using the default record size of 16MiB"
    RECORDSIZE="16384" # Set default to 16MB
  else
    RECORDSIZE=$6"m"
  fi
  # Set no. threads
  if [ -z "$7" ]; then
    echo "Testing with 1, 2 & 3 threads (default)"
    THREADS=3
  else
    THREADS=$7
  fi
  SHELL=`which bash`
else
  echo "# Usage:"
  echo "`basename $0` <device> <test dir> <test size in MB> <log dir> <#runs> <record size in MB> <#threads>"
  echo "time `basename $0` sda /mnt 20480 /dev/shm/server1 3 16 3"
  echo "# The above command will test sda with 1, 2 & 3 threads 3 times per scheduler with 20GiB of data using"
  echo "# 16MiB record size and save logs in /dev/shm/server1/ ."
  echo "# If the record size is omitted the default of 16MiB will be used. (should be buffer size of device)"
  echo "# For HP RAID controllers use device name format c0d0 or c1d2 etc."
  exit 1
fi

function createOutputLog () {
  unset FILE
  echo -e "Test\tThroughput (KB/s)\tI/O Scheduler\tThreads\tn" > $OUTPUTLOG
  for FILE in $OUTPUTDIR/$DEV*.txt; do
    # results
    unset WRITE;unset REWRITE; unset RREAD; unset MIXED; unset RWRITE
    # Scheduler, threads, iteration
    unset SCHED;unset T; unset I;unset IT
    SCHED=`echo "$FILE" | awk -F'-' '{print $2}'`
    T=`echo "$FILE" | awk -F'-' '{print $3}' | sed 's/t//g'`
    # FIXME, it's ugly
    IT=`echo "$FILE" | awk -F'-' '{print $4}'`
    I=`expr ${IT:1:1}`
    # Get values
    WRITE=`grep " Initial write " $FILE | awk '{print $5}'`
    REWRITE=`grep " Rewrite " $FILE | awk '{print $4}'`
    RREAD=`grep " Random read " $FILE | awk '{print $5}'`
    MIXED=`grep " Mixed workload " $FILE | awk '{print $5}'`
    RWRITE=`grep " Random write " $FILE | awk '{print $5}'`
    # echo "iwrite $WRITE rwrite $REWRITE rread $RREAD mixed $MIXED random $RWRITE"
    # Print to the file
    if [ -z "$WRITE" -o -z "$REWRITE" -o -z "$RREAD" -o -z "$MIXED" -o -z "$RWRITE" ]; then
      # Something's wrong with our input file, or bug in script
      echo "BUG, unable to parse result:"
      echo "write $WRITE rewrite $REWRITE random read $RREAD mixed $MIXED random write $RWRITE"
      exit 1
    else
      echo -e "Initial write\t$WRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
      # The script as originally posted printed $RWRITE here, which made the
      # Rewrite row repeat the Random write figure.
      echo -e "Rewrite\t$REWRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
      echo -e "Random read\t$RREAD\t$SCHED\t$T\t$I" >> $OUTPUTLOG
      echo -e "Mixed workload\t$MIXED\t$SCHED\t$T\t$I" >> $OUTPUTLOG
      echo -e "Random write\t$RWRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
    fi
  done
}

unset ITERATIONS; declare -i ITERATIONS; ITERATIONS=0
unset CURRENTTHREADS; declare -i CURRENTTHREADS
unset IOZONECMD
cd "$DIR"
echo "Using iozone at `which iozone`"
until [ "$ITERATIONS" -ge "$RUNS" ]; do
  let ITERATIONS=$ITERATIONS+1
  for SCHEDULER in $SCHEDULERS; do
    # Change the scheduler
    if [ $MD -eq 1 ]; then
      unset MEMBER
      for MEMBER in $SYSDEV; do
        echo $SCHEDULER > /sys/block/$MEMBER/queue/scheduler
      done
    else
      echo $SCHEDULER > $SYSDEV
    fi
    CURRENTTHREADS=1
    # Repeat until we've tested with all requested threads
    until [ $CURRENTTHREADS -gt $THREADS ]; do
      unset IOZONECMDAPPEND
      IOZONECMDAPPEND="$OUTPUTDIR/$DEV-$SCHEDULER-t$CURRENTTHREADS-i$ITERATIONS.txt"
      #echo "iozonecmdappend is $IOZONECMDAPPEND"
      # Append all test files to the command line (threads/processes)
      unset I; unset IOZONECMD_FILES
      for I in `seq 1 $CURRENTTHREADS`; do
        IOZONECMD_FILES="$IOZONECMD_FILES$DIR/iozone-temp-$I "
      done
      # Drop caches
      echo 3 > /proc/sys/vm/drop_caches
      echo "Testing $SCHEDULER with $CURRENTTHREADS thread(s), run #$ITERATIONS"
      IOZONECMD="iozone -R -i 0 -i 2 -i 8 -s $SIZE -r $RECORDSIZE -b $OUTPUTDIR/$DEV-$SCHEDULER-t$CURRENTTHREADS-i$ITERATIONS.xls -l 1 -u $CURRENTTHREADS -F $IOZONECMD_FILES"
      # Run the command
      echo time $IOZONECMD
      time $IOZONECMD | tee -a $IOZONECMDAPPEND
      # Done testing $CURRENTTHREADS threads/processes, increase to test one more in the loop (if applicable)
      let CURRENTTHREADS=$CURRENTTHREADS+1
    done
  done
  echo "Run #$ITERATIONS done" | tee -a $IOZONECMDAPPEND
done
echo
createOutputLog
echo "Done, logs saved in $OUTPUTDIR"
exit 0
So, to test my disk I used the following command at the terminal:
sudo iozone-scheduler sda /mnt 1024 /dev/shm 3 8 3
where:
- sda is the name of my disk device as recognized by Linux (check your /dev/)
- /mnt is the folder where the test files will be saved temporarily
- 1024 is the size in MB of the test file (where the I/O will run)
- /dev/shm is the folder where the XLS spreadsheets and log files will be saved
- the first 3 is the number of runs (cycles) of the test
- 8 is the record size in MB used for the IOzone test (I set 8 because my HDD buffer size is 8MB)
- the last 3 is the maximum number of concurrent threads that will perform I/O operations
This gave me my iozone-sda-all-results.log, which has the following structure (tab-delimited):
Test            Throughput (KB/s)  I/O Scheduler  Threads  n
Initial write   45654.45           cfq            1        1
Rewrite         49748.07           cfq            1        1
Random read     915288.75          cfq            1        1
Mixed workload  1243356.50         cfq            1        1
Random write    49748.07           cfq            1        1
Initial write   60800.41           cfq            1        2
Rewrite         64921.82           cfq            1        2
Random read     1242507.88         cfq            1        2
Mixed workload  1251540.75         cfq            1        2
...........
Using the above information I drew a chart for each of the following 5 I/O tests I ran:
- Initial write
- Re-write
- Random write
- Random read
- Mixed workload
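Before plotting, it can help to condense the log into one number per test, scheduler and thread count. The function below is a sketch of one way to do that with awk; the name average_throughput is made up, and averaging over the runs is my assumption about how one might summarize the data, not something the benchmark script does itself.

```shell
# Average the throughput over all runs for each (test, scheduler, threads)
# combination in a tab-delimited log produced by the script above.
average_throughput() {
    awk -F '\t' '
        NR == 1 || NF < 5 { next }            # skip the header and blank lines
        {
            key = $1 "\t" $3 "\t" $4          # test, scheduler, threads
            sum[key] += $2
            count[key]++
        }
        END {
            for (key in sum)
                printf "%s\t%.2f\n", key, sum[key] / count[key]
        }
    ' "$1" | sort
}
```

Running average_throughput /dev/shm/iozone-sda-all-results.log prints tab-delimited lines (test, scheduler, threads, mean KB/s) that can be pasted straight into a spreadsheet.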
I have 5 tests on 3 distinct runs, which means a total of 15 charts to plot. I am not going to post them all here (it would make no sense), but I can tell you that sometimes cfq behaves better than deadline, which behaves better than noop; other times it is the other way around, and so on, so every run gave me some interesting information. It is hard to decide which one is best, because one is better at reading, another at writing, another is better in the 2nd run than in the 1st, etc. To determine which one to choose I approached the problem with the following naive method:
- I looked at each run individually
- for each test I compared how the I/O schedulers behaved relative to each other
- I gave 2 points to the best, 1 to the average and 0 to the worst one
- I added up the points that each I/O scheduler obtained
- the one with the most points I declared the winner.
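The point system above can be sketched in awk as well. This is not the procedure I followed by hand, just an approximation of it: the function name score_schedulers is made up, and adding the throughput of all thread counts together before ranking is an assumption. Each scheduler earns one point for every other scheduler it strictly beats in a given test and run, which yields 2/1/0 when there are three schedulers.

```shell
# Rank the schedulers in every (test, run) group of a tab-delimited log and
# print the total points per scheduler, highest first.
score_schedulers() {
    awk -F '\t' '
        NR == 1 || NF < 5 { next }            # skip the header and blank lines
        {
            group = $1 "\t" $5                # one comparison per test and run
            total[group, $3] += $2            # assumption: sum over thread counts
            groups[group] = 1
            scheds[$3] = 1
        }
        END {
            for (g in groups)
                for (s in scheds) {
                    if (!((g, s) in total)) continue
                    for (o in scheds)
                        if (o != s && ((g, o) in total) && total[g, o] < total[g, s])
                            points[s]++       # s beat o in this group
                }
            for (s in scheds)
                printf "%s\t%d\n", s, points[s]
        }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Because ties earn no points for either side, the totals of all schedulers do not always add up to the theoretical maximum.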
I tested my laptop and my desktop workstation.
On my laptop, which has a TOSHIBA MK1637GSX disk, the cfq I/O scheduler was the winner (20 points) and noop was the loser (11 points).
On my desktop, which has a WDC WD5000AAKS-60A7B2 disk, the noop I/O scheduler was the winner (18 points) and deadline was the loser (11 points). cfq came very close (15 points), but noop was a little better on all tests, so I decided to use noop on that system from now on.
Another interesting question is: how much does I/O improve when you change the kernel scheduler?
Well, the difference is not an order of magnitude, but sometimes it is 30% better, other times just 18% or only 2%. So the improvement can vary from 0 to 30% or even more. Any improvement is welcome, so when you can get one, why not take it?
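Such percentages are just the relative difference between two throughput figures from the log. A tiny helper (the name improvement_percent is made up, and the numbers in the usage note are illustrative, not my measurements) could look like this:

```shell
# Print by how many percent the second throughput exceeds the first.
improvement_percent() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}
```

For instance, improvement_percent 50000 59000 compares a baseline of 50000 KB/s with 59000 KB/s under another scheduler; a negative result means the second scheduler was slower.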
How to permanently change your kernel I/O scheduler
Well, the method I would prefer is to recompile the kernel, so:
- Enable the block layer --->
  - IO schedulers --->
    - enable Deadline I/O scheduler (if you intend to use it)
    - enable CFQ I/O scheduler (if you intend to use it)
    - Default I/O scheduler --->
      - check the one of the Deadline, CFQ or No-op schedulers that fits your needs.
After you recompile and install your new kernel, your I/O operations should (hopefully) behave a little better.
Now, if you think this article was interesting, don't forget to rate it. It shows me that you care, and so I will continue to write about these things.
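To double-check what a kernel configuration actually contains, you can grep it for the scheduler symbols. The sketch below assumes the symbol names used by kernels of this generation (CONFIG_IOSCHED_*, CONFIG_DEFAULT_*); the function name show_iosched_config is made up, and /proc/config.gz only exists if the kernel was built with support for it.

```shell
# List the enabled I/O scheduler options in a kernel config file
# (plain text such as /usr/src/linux/.config, or gzipped such as /proc/config.gz).
show_iosched_config() {
    case "$1" in
        *.gz) zcat "$1" ;;
        *)    cat "$1" ;;
    esac | grep -E '^CONFIG_(IOSCHED_|DEFAULT_(IOSCHED|NOOP|DEADLINE|CFQ))'
}
```

A line such as CONFIG_DEFAULT_IOSCHED="cfq" in the output tells you which scheduler the kernel will select at boot.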
Eugen Mihailescu
Hi Eugen
Thanks for such an informative post. I was actually searching for comparative studies of different I/O schedulers on Android when I came across it. The results are informative, but I have one question: why are there no results with real-world tasks/applications? Benchmarks are only good up to a certain extent and can't always show the true picture. Do you know of any links or pages that provide a detailed comparison between I/O schedulers, especially for Android?
Check out this article: http://andrux-and-me.blogspot.se/2014/05/io-schedulers-and-performance-2.html
But of course, as always GIYF.