
sed introduction

sed

sed was originally written in 1973 or 1974 by Lee E. McMahon as a stream editor, and this is exactly what sed does: it modifies streams of text on the fly. For each line of input, sed runs the following cycle:

  1. read an entire line of input (from the input files, or from stdin if none are given) into its pattern buffer
  2. modify the pattern buffer according to the supplied commands
  3. print the pattern buffer to stdout

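sed uses Basic Regular Expressions (BREs), as opposed to the Extended Regular Expressions (EREs) or Perl Compatible Regular Expressions used by most other programs. BREs are very similar to the other types of Regular Expressions. The main difference is that in a BRE the characters (, ), { and } only have their special meaning when preceded by a backslash, and + and ? are not special at all (GNU sed accepts \+ and \? as an extension).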
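A small illustration of the difference, assuming a sed that supports the -E option for EREs (GNU sed and the BSD seds do): the same interval expression needs escaped braces in a BRE and plain braces in an ERE, while unescaped braces in a BRE are ordinary characters.

```shell
# BRE: the interval \{2\} means "exactly two a's"
echo 'aaab' | sed -e 's/a\{2\}/X/'     # prints Xab
# ERE (-E): the same interval is written without backslashes
echo 'aaab' | sed -E -e 's/a{2}/X/'    # prints Xab
# BRE: unescaped braces are literal characters
echo 'a{2}' | sed -e 's/a{2}/X/'       # prints X
```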

sed Synopsis

sed [options] program [inputfile]

The following simple program consists of one command only: 'd'. The command 'd' tells sed to delete the pattern buffer.

bash$ sed -e 'd' /etc/hosts

When you launch this script, (apparently) nothing happens. Remember the general workflow of sed: read a line into the pattern buffer, process it according to the script and then print it to stdout. This is exactly what happened: the command 'd' deletes the pattern buffer and immediately starts the next cycle, so no output is generated.

Another command is 'p'. It tells sed to print the pattern buffer.

bash$ sed -e 'p' /etc/hosts

The effect of this script is to print each line twice. Remember the operation mode of sed:

  1. read a line of input into the pattern buffer
  2. print the pattern buffer (because of the 'p' command)
  3. print the pattern buffer to stdout (unless the option -n is given)
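The cycles of 'd', 'p' and 'p' with -n can be compared side by side. This sketch feeds two sample lines to sed with printf instead of using /etc/hosts, so that the output is predictable:

```shell
printf 'one\ntwo\n' | sed -e 'd'       # prints nothing
printf 'one\ntwo\n' | sed -e 'p'       # prints one, one, two, two
printf 'one\ntwo\n' | sed -n -e 'p'    # -n drops the automatic print: one, two
```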

Addresses

We do not always want to apply a command to every single line. Sometimes we want to apply a command to a single line, or to a block of lines. The mechanism sed provides to select specific lines is called an address.

An address is one of the following:

n selects line number n
$ selects the last line
/re/ selects the lines matching the Regular Expression re
\crec selects the lines matching the Regular Expression re; the delimiter character c can be freely chosen
first~step (GNU extension!) selects every step'th line, starting with line first
addr1,addr2 address range: selects all lines from the line matching addr1 up to and including the next line matching addr2
addr! selects the lines that do not match addr (the "!" goes between the address and the command)
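A few one-liners exercising each kind of address. They use seq (assumed to be available, as on most Linux and BSD systems) to generate numbered input, and the first~step form only works with GNU sed:

```shell
seq 10 | sed -n -e '3p'                              # line number: 3
seq 10 | sed -n -e '$p'                              # last line: 10
seq 10 | sed -n -e '/^1/p'                           # regex: 1 and 10
printf '/usr/bin\n/etc\n' | sed -n -e '\%^/usr%p'    # regex with % as delimiter: /usr/bin
seq 10 | sed -n -e '0~3p'                            # GNU first~step: 3, 6, 9
seq 10 | sed -n -e '4,6p'                            # range: 4, 5, 6
seq 10 | sed -n -e '2,9!p'                           # negated range: 1 and 10
```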

Examples

The command '=' prints the current line number. A substitute program for "wc -l" (count the number of lines) might be:

bash$ sed -n -e '$='

Both examples that follow emulate the UNIX program "head":

bash$ sed -n -e '1,10p'
bash$ sed -e '10q'

The first example uses the address pair '1,10' to select the lines to print. The second example relies on the implicit print at the end of each cycle: when line 10 is reached, the command 'q' prints it and terminates sed, so the rest of the input is never read.
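The emulations can be checked against seq-generated input (seq is assumed to be available). One difference worth knowing: on empty input sed -n -e '$=' prints nothing, because no line ever matches $, whereas wc -l prints 0.

```shell
seq 25 | sed -n -e '$='      # prints 25, like wc -l
seq 25 | sed -n -e '1,10p'   # prints 1 to 10, like head
seq 25 | sed -e '10q'        # prints 1 to 10 and stops reading
printf '' | sed -n -e '$='   # empty input: prints nothing at all
```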

Substitution Command

Eliminate comments

bash$ sed -e 's/#.*//' /etc/inetd.conf

Eliminate comments and empty lines

bash$ sed -e 's/#.*//;/^$/d' /etc/inetd.conf
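The s command has the form s/regexp/replacement/flags. Here is a sketch of the comment filter on a few made-up inetd-style lines (the sample data is hypothetical): the pure comment line and the empty line disappear, and a blank that stood before the "#" survives as a trailing blank.

```shell
# 1st line: trailing comment, 2nd: pure comment, 3rd: empty, 4th: no comment
printf 'ftp stream tcp # file transfer\n# a comment line\n\ntelnet stream tcp\n' |
    sed -e 's/#.*//;/^$/d'
# prints "ftp stream tcp " (with a trailing blank) and "telnet stream tcp"
```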

Have a 133t prompt

bash$ ls -l | sed -e 's/o/0/;s/l/1/;s/e/3/'
bash$ ls -l | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'
bash$ ls -l | sed -e 'y/ole/013/'
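Without the g flag, s replaces only the first match on each line; with g, or with y (which transliterates every occurrence of each character and takes no flags), all matches are replaced. The three variants, compared on a fixed string instead of ls output:

```shell
echo 'hello world' | sed -e 's/o/0/;s/l/1/;s/e/3/'       # prints h31l0 world
echo 'hello world' | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'    # prints h3110 w0r1d
echo 'hello world' | sed -e 'y/ole/013/'                 # prints h3110 w0r1d
```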

Convert a file from DOS to UNIX and vice versa

# Under UNIX: convert DOS newlines (CR/LF) to Unix format
bash$ sed 's/.$//' file    # assumes that all lines end with CR/LF
bash$ sed 's/^M$//' file    # in bash/tcsh, press Ctrl-V then Ctrl-M
# Under DOS: convert Unix newlines (LF) to DOS format
C:\> sed 's/$//' file    # method 1
C:\> sed -n p file       # method 2

Alternatively use the utilities dos2unix and unix2dos, or the command

tr -d '\r' < inputfile > outputfile

for a conversion from DOS to UNIX, or

:set fileformat=unix

from within vim (:set fileformat=dos converts in the opposite direction), or...
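With GNU sed the carriage return can be written as \r, which avoids typing a literal Ctrl-M; this escape is a GNU extension and may not work in other seds. A sketch, using od -c to make the carriage returns visible:

```shell
# DOS -> UNIX: delete the carriage return at the end of each line
printf 'a\r\nb\r\n' | sed -e 's/\r$//' | od -c
# UNIX -> DOS: append a carriage return to each line
printf 'a\nb\n' | sed -e 's/$/\r/' | od -c
```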

Comments

The character "#" is a command (which cannot have any address) that starts a comment. This is useful if the sed program is stored in a file. The whole program can then be executed with

bash$ sed -f programfile < inputdata
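A minimal sketch of such a program file with "#" comments. The file is created with mktemp (assumed to be available) and its contents are only an illustration: it deletes empty lines and prints the line number before each remaining line.

```shell
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
# delete empty lines
/^$/d
# print the line number, then (automatically) the line itself
=
EOF
printf 'alpha\n\nbeta\n' | sed -f "$prog"    # prints 1, alpha, 3, beta
```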

The "{" and "}" commands group several commands. Because "}" is itself a command, it must be separated from the preceding command by a semicolon or a newline (GNU sed is more lenient).

bash$ sed -ne '/gimme this line number/{=;q;}'
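For example, with made-up input in which the wanted line is the third one, sed prints its number and quits without reading the rest of the input:

```shell
printf 'a\nb\ngimme this line number\nc\n' |
    sed -ne '/gimme this line number/{=;q;}'    # prints 3
```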

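The command "n" prints the pattern buffer (unless -n is given) and replaces it with the next line of input; the rest of the script then works on that new line. This can be used to let a line pass without further processing:

/skip this line/n
 # do some ugly stuff
 ...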
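A small demonstration with made-up input: the matching line is printed unchanged by n, and the substitution that follows only sees the lines after it.

```shell
printf 'skip this line: foo\nfoo\nfoo\n' |
    sed -e '/skip this line/n' -e 's/foo/bar/'
# prints "skip this line: foo", then bar, bar
```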

REs are greedy

Example: eliminating HTML-tags from a file

bash$ sed -e 's/<.*>//g' text.html

If the file contains a line like:

This <b> is </b> an <i>example</i>.

then the result will be:

This .

because ".*" matches as much as possible, from the first "<" to the last ">" on the line.

Solution:

bash$ sed -e 's/<[^>]*>//g' text.html
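Both scripts side by side on the sample line: [^>]* cannot run past a ">", so each tag is removed on its own and the text between the tags survives.

```shell
line='This <b> is </b> an <i>example</i>.'
echo "$line" | sed -e 's/<.*>//g'       # greedy: prints "This ."
echo "$line" | sed -e 's/<[^>]*>//g'    # tag by tag: prints "This  is  an example."
```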

References

The "elleff"-Language:

Every vowel (or run of vowels) c in a word is replaced with clcfc. The ampersand (&) in the replacement holds the matched string:

bash$ sed -e 's/[aeiou]\+/&l&f&/g'
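On a couple of sample words (GNU sed is assumed, because \+ is a GNU extension). Note that a run of vowels such as "oa" is treated as one unit:

```shell
echo 'hello' | sed -e 's/[aeiou]\+/&l&f&/g'    # prints helefellolofo
echo 'boat' | sed -e 's/[aeiou]\+/&l&f&/g'     # prints boaloafoat
```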

Referencing a substring

Substrings enclosed with "\(" and "\)" can be referenced with "\n" (n is a digit from 1 to 9)

bash$ sed -e 's/\([^ ]\+\)  *\([^ ]\+\)  *\([^ ]\+\)/\3 \2 \1/'
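On a sample line the script swaps the first and the third word and leaves the rest of the line alone (GNU sed assumed for \+):

```shell
echo 'one two three four' |
    sed -e 's/\([^ ]\+\)  *\([^ ]\+\)  *\([^ ]\+\)/\3 \2 \1/'    # prints three two one four
```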

The "elleff"-Backtransform

The following RE also matches strings which are not "elleff" vowels, such as "alefi", because nothing forces the three vowels to be the same:

[aeiou]l[aeiou]f[aeiou]

Basic REs can use the backreference in the RE itself!

bash$ sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'
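A round trip through both scripts should give back the original text, at least for ordinary words like these (GNU sed assumed):

```shell
echo 'hello boat' |
    sed -e 's/[aeiou]\+/&l&f&/g' |          # encode: helefellolofo boaloafoat
    sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'     # decode: hello boat
```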

Space Balls

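D If the pattern space contains a newline, delete text up to the first newline and restart the cycle with the remaining pattern space, without reading new input; otherwise behave like d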
N Add a newline to the pattern space, then append the next line of input to the pattern space
P Print out the portion of the pattern space up to the first newline
h Replace the contents of the hold space with the contents of the pattern space
H Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space
g Replace the contents of the pattern space with the contents of the hold space
G Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space
x Exchange the contents of the hold and pattern spaces
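N, P and D are often used together to look at two consecutive lines at once. As an illustration (a well-known one-liner, not taken from the text above), this emulates uniq by printing a line only if it differs from the following one:

```shell
# $!N              : append the next line (except on the last line)
# /^\(.*\)\n\1$/!P : print the first line unless both lines are equal
# D                : drop the first line, restart the cycle with the second
printf 'a\na\nb\nb\nb\nc\n' |
    sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D'    # prints a, b, c
```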

Space Balls: Example

Print the first line as last

bash$ sed -n -e '1h;1!p;${g;p;}'

h: hold space <- pattern space

g: pattern space <- hold space
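Tried on three sample lines, the first line is held back and printed after the last one:

```shell
printf 'first\nsecond\nthird\n' | sed -n -e '1h;1!p;${g;p;}'    # prints second, third, first
```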

Emulation of tac

bash$ sed -n -e 'G;h;$p'

G: pattern space <- pattern space + '\n' + hold space

Problem: the output ends with an extra empty line. This is because "G" appends a newline followed by the contents of the hold space to the pattern space even on the first line, when the hold space is still empty; that first line is printed last, followed by the empty remainder.

tac improved

bash$ sed -n -e 'G;h;$s/.$//p'
bash$ sed -n -e '1!G;h;$p'
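Checking both versions on three lines: the first prints four lines, the last of them empty, while the improved one (1!G) prints exactly the input lines in reverse order.

```shell
printf '1\n2\n3\n' | sed -n -e 'G;h;$p' | sed -n -e '$='    # prints 4
printf '1\n2\n3\n' | sed -n -e '1!G;h;$p'                    # prints 3, 2, 1
```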

Example: a counter in sed

/^[[:digit:]][[:digit:]]*$/!b;         # the line must contain only digits (other lines are printed unchanged)
x;s/.*//;x;                            # clear the hold space
: add
/9$/{s/9$//;x;s/.*/0&/;x;b add;};      # eliminate the last 9 from the p.s.
                                       # and add a 0 in front of the h.s.
# increment the last digit; the descending order ensures at most one s applies
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G;s/\n//g;            # add the content of the h.s to the p.s
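To try the counter, the script can be saved to a file and run with sed -f. This copy is slightly adapted for portability: the grouped commands are on separate lines, and the first line uses !b (branch to the end, i.e. print the line unchanged) so that lines which are not numbers pass through untouched. The temporary file created with mktemp is an assumption of this sketch.

```shell
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
# lines that are not numbers are printed unchanged
/^[[:digit:]][[:digit:]]*$/!b
# clear the hold space
x
s/.*//
x
# move trailing 9s from the pattern space to the hold space as 0s
:add
/9$/{
s/9$//
x
s/.*/0&/
x
b add
}
# increment the last digit (descending order: at most one s applies)
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
# the number consisted only of 9s
s/^$/1/
# append the 0s from the hold space
G
s/\n//g
EOF
printf '5\n199\n9\n1234\nabc\n' | sed -f "$prog"    # prints 6, 200, 10, 1235, abc
```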

Branches

: label Definition of label (portable scripts keep labels to at most 8 characters)
b label unconditionally branch to label
t label branch to label only if there has been a successful 's'ubstitution since the last input line was read or 't' branch was taken

If the label is omitted in the b or t command, sed branches to the end of the script: the pattern buffer is printed (unless -n is given) and the next cycle is started.
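A typical use of t is a loop that repeats a substitution until it no longer applies. This sketch (a classic one-liner, not from the text above) inserts thousands separators into a number: each pass adds one comma, and 'ta' jumps back to the label a as long as a substitution succeeded.

```shell
echo 1234567 |
    sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'    # prints 1,234,567
```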

Eliminate K/K++ comments

#!/bin/sed -f

# delete K++ comments
/^[[:blank:]]*kk.*/d
s/kk.*//

# If no comment is found, then start a new cycle
: test
/ko/!b

# Append new lines to the pattern space until an entire K-comment is in the
# pattern space
: append
/ok/!{N;b append;}

# delete every K-comment (but don't be greedy!)
s/ko\([^o]\|o[^k]\)*o\?ok//g

t test
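
To see the script at work, it can be saved to a file and fed a few made-up lines. The script relies on GNU extensions (\| and \? in the RE, a semicolon after a label), so GNU sed is assumed; the temporary file is only part of this sketch.

```shell
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
cat > "$prog" <<'EOF'
# delete K++ comments
/^[[:blank:]]*kk.*/d
s/kk.*//
# if no K-comment starts here, start a new cycle
: test
/ko/!b
# collect lines until the K-comment is closed
: append
/ok/!{N;b append;}
# delete every K-comment (but don't be greedy!)
s/ko\([^o]\|o[^k]\)*o\?ok//g
t test
EOF
printf '%s\n' 'x = 1;kk trailing comment' '   kk whole-line comment' \
    'y = 2;ko inline ok z = 3;' 'w = 4;ko spans' 'two lines ok v = 5;' |
    sed -f "$prog"
# prints "x = 1;", "y = 2; z = 3;" and "w = 4; v = 5;"
```
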
Copyright (C) 2007–2009 by Thomas Pircher