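sed was originally written in 1973 or 1974 by Lee E. McMahon as a stream editor, and this is exactly what sed does: it modifies streams of text on the fly. The work cycle of sed is, for each line: read the line into the pattern buffer, apply the commands of the program to the pattern buffer, and print the pattern buffer to stdout.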
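sed uses Basic Regular Expressions (as opposed to the Extended Regular Expressions or Perl Compatible Regular Expressions used by most other programs). Basic Regular Expressions are very similar to the other flavors; the most visible difference is that some metacharacters must be escaped with a backslash: groups are written \( \) and intervals \{ \}, and GNU sed additionally understands \+, \? and \|. Apart from that, many users won't notice any difference at all.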
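A quick way to see the difference between the two flavors; this sketch assumes GNU sed, since \+ in a BRE and the -E option are GNU features:

```shell
# BRE (default): "one or more" must be written \+ (GNU extension)
echo 'aaab' | sed -e 's/a\+/X/'     # prints: Xb
# ERE (-E): + is special without a backslash
echo 'aaab' | sed -E -e 's/a+/X/'   # prints: Xb
# In a BRE an unescaped + is an ordinary character
echo 'a+b' | sed -e 's/a+/X/'       # prints: Xb
```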
sed [options] program [inputfile]
The following simple program consists of one command only: 'd'. The command 'd' tells sed to delete the pattern buffer.
bash$ sed -e 'd' /etc/hosts
When you launch this script, (apparently) nothing happens. Remember the general workflow of sed: read a line into the pattern buffer, process it according to the script, and print it to stdout. This is exactly what happened: the 'd' command deleted the pattern buffer and started the next cycle without printing anything, so no output was generated.
Another command is 'p'. It tells sed to print the pattern buffer.
bash$ sed -e 'p' /etc/hosts
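The effect of this script is to print every line twice. Remember the operation mode of sed: the 'p' command prints the pattern buffer once, and sed prints it again automatically at the end of the cycle. The option -n suppresses this automatic print, so sed -n -e 'p' prints every line exactly once.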
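A two-line input makes the double print visible; printf is used here only to generate sample input:

```shell
printf 'one\ntwo\n' | sed -e 'p'      # one, one, two, two
printf 'one\ntwo\n' | sed -n -e 'p'   # one, two (behaves like cat)
```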
We do not always want to apply a command to every single line. Sometimes we want to apply it to a single line, or to a block of lines. sed provides a mechanism to work only on specific lines, and this mechanism is called an address.
An address is one of the following:
| Address | Selects |
|---|---|
| n | line number n |
| $ | the last line |
| /RE/ | the lines matching the Regular Expression RE |
| \cREc | the lines matching the Regular Expression RE; the delimiter c can be any character (useful when RE contains slashes) |
| first~step | (GNU extension!) every step-th line, starting with line first |
| addr1,addr2 | address range: all input lines from the line matching addr1 through the line matching addr2, inclusive |
| addr! | the lines that do not match addr |
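Each address form can be tried on the numbers 1 to 10. This sketch assumes GNU sed (for first~step) and the seq utility to generate input:

```shell
seq 10 | sed -n -e '3p'        # 3
seq 10 | sed -n -e '$p'        # 10
seq 10 | sed -n -e '/^1/p'     # 1 10
seq 10 | sed -n -e '\,^1,p'    # 1 10 (same RE, "," as delimiter)
seq 10 | sed -n -e '1~3p'      # 1 4 7 10 (GNU only)
seq 10 | sed -n -e '2,4p'      # 2 3 4
seq 10 | sed -n -e '2,4!p'     # 1 5 6 7 8 9 10
```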
The command '=' prints the current line number. A replacement for "wc -l" (count the number of lines) might be:
bash$ sed -n -e '$='
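A small check of the line counter. Two edge cases differ from wc -l: empty input produces no output at all (there is no last line for '$' to match) rather than 0, and a final line without a trailing newline is still counted:

```shell
printf 'a\nb\nc\n' | sed -n -e '$='   # 3
: | sed -n -e '$='                    # no output for empty input
printf 'a\nb' | sed -n -e '$='        # 2 (wc -l would report 1)
```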
Both examples that follow emulate the UNIX program "head":
bash$ sed -n -e '1,10p'
bash$ sed -e '10q'
The first example uses the address pair '1,10' to select the lines to print. The second example relies on the automatic print at the end of each cycle to produce the output; when line 10 is reached, 'q' prints it and terminates sed, so the rest of the input is not processed.
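Both variants can be compared against head itself; seq is assumed here only as a source of 20 numbered lines:

```shell
range=$(seq 20 | sed -n -e '1,10p')
quit=$(seq 20 | sed -e '10q')
[ "$range" = "$quit" ] && [ "$quit" = "$(seq 20 | head -n 10)" ] && echo "both behave like head"
```

The 'q' variant has a practical advantage on large files: sed stops after line 10 instead of reading the remaining input just to discard it.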
Eliminate comments
bash$ sed -e 's/#.*//' /etc/inetd
Eliminate comments and empty lines
bash$ sed -e 's/#.*//;/^$/d' /etc/inetd
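Running both commands on a small hypothetical config snippet shows one limitation: blanks before the "#" survive, and an indented comment leaves a whitespace-only line that /^$/ does not match. The second command below is a variant using the POSIX class [[:blank:]]:

```shell
# plain version: "service a " keeps its trailing space
printf 'service a # comment\n# only a comment\nservice b\n' | sed -e 's/#.*//;/^$/d'
# variant: also drop blanks before "#" and lines that are blank afterwards
printf 'service a # comment\n   # indented comment\nservice b\n' |
  sed -e 's/[[:blank:]]*#.*//;/^[[:blank:]]*$/d'
```

Note that neither version knows about quoting: a "#" inside a quoted string is treated as a comment too.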
Make your output l33t
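bash$ ls -l | sed -e 's/o/0/;s/l/1/;s/e/3/'      # replaces only the first o, l and e of each line
bash$ ls -l | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'   # the g flag replaces every occurrence
bash$ ls -l | sed -e 'y/ole/013/'                # y transliterates characters; it takes no flags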
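On a fixed string the difference between the three commands is easy to see (a fixed input is used instead of ls -l so the output is predictable):

```shell
echo 'hello world' | sed -e 's/o/0/;s/l/1/;s/e/3/'      # h31l0 world
echo 'hello world' | sed -e 's/o/0/g;s/l/1/g;s/e/3/g'   # h3110 w0r1d
echo 'hello world' | sed -e 'y/ole/013/'                # h3110 w0r1d
```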
Convert a file from DOS to UNIX and vice versa
# Under UNIX: convert DOS newlines (CR/LF) to Unix format
bash$ sed 's/.$//' file     # assumes that all lines end with CR/LF
bash$ sed 's/^M$//' file    # in bash/tcsh, press Ctrl-V then Ctrl-M
# Under DOS: convert Unix newlines (LF) to DOS format
C:\> sed 's/$//' file       # method 1
C:\> sed -n p file          # method 2
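With GNU sed the carriage return can be written as \r, both in the RE and in the replacement, so no Ctrl-V Ctrl-M is needed. This sketch assumes GNU sed; od -c is only there to make the CR characters visible:

```shell
# DOS -> UNIX: strip a trailing CR
printf 'one\r\ntwo\r\n' | sed -e 's/\r$//' | od -c
# UNIX -> DOS, run under UNIX: append a CR to every line
printf 'one\ntwo\n' | sed -e 's/$/\r/' | od -c
```

Unlike 's/.$//', the \r version leaves lines that have no CR untouched.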
Alternatively, use the utilities dos2unix and unix2dos, or the command
tr -d '\r' < inputfile > outputfile
for a conversion from DOS to UNIX, or
:set fileformat=unix
:set fileformat=dos
from within vim (followed by :w) to convert to UNIX or DOS format respectively.
The character "#" is a command (which cannot have any address) that starts a comment: everything up to the next newline is ignored. This is useful if the sed program is stored in a file. The whole program can be executed with
bash$ sed -f programfile < inputdata
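A sketch of a commented program file, written to a temporary file created with mktemp (the program itself is a hypothetical example). It also shows a pitfall: if the first two characters of the file are "#n" followed by a newline, sed behaves as if -n had been given:

```shell
prog=$(mktemp)
cat > "$prog" <<'EOF'
# remove comments
s/#.*//
# remove lines that became empty
/^$/d
EOF
out=$(printf 'a # x\n# y\nb\n' | sed -f "$prog")
printf '%s\n' "$out"          # "a " and "b"

# "#n" on the first line switches off the automatic print, like -n
printf '#n\np\n' > "$prog"
out2=$(printf 'x\ny\n' | sed -f "$prog")
printf '%s\n' "$out2"         # x and y, each printed once
rm -f "$prog"
```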
The "{" and "}" commands group several commands. Because "}" is itself a command, it must be separated from the preceding command by a semicolon (or a newline).
bash$ sed -ne '/gimme this line number/{=;q;}'
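The grouped commands run only on the first matching line: '=' prints its number and 'q' stops sed before any later match is seen. Because of -n, 'q' does not print the line itself. A small sample input illustrates this:

```shell
printf 'a\nb\nc\nb\n' | sed -n -e '/b/{=;q;}'   # 2
```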
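The command "n" prints the pattern buffer (unless -n is given) and replaces it with the next line of input, so the commands that follow operate on the next line. The following script passes every line matching "skip this line" through unchanged and applies the rest of the script to the line after it:
/skip this line/n
# do some ugly stuff
...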
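A small run showing what "n" does, with a hypothetical s/^/> / standing in for the "ugly stuff": the matching line is printed as is, and the line after it is processed by the rest of the script:

```shell
printf 'skip this line\nfoo\nbar\n' | sed -e '/skip this line/n' -e 's/^/> /'
# skip this line
# > foo
# > bar
```

Note that a 'd' placed before 'n' in the same group would end the cycle immediately, so 'n' would never run.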
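Regular expressions are greedy: they match the longest possible string. This is a problem when, for example, removing HTML tags:
bash$ sed -e 's/<.*>//g' text.html
If the file contains a line like:
This <b> is </b> an <i>example</i>.
then the result will be:
This .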
To match a single tag only, exclude ">" from the characters allowed between the angle brackets:
bash$ sed -e 's/<[^>]*>//g' text.html
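Both substitutions applied to the sample line, read from a shell variable instead of text.html:

```shell
line='This <b> is </b> an <i>example</i>.'
echo "$line" | sed -e 's/<.*>//g'      # This .
echo "$line" | sed -e 's/<[^>]*>//g'   # This  is  an example.
```

Since sed works line by line, a tag that is split across two lines is not removed by either command.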
Every vowel (or run of vowels) c in a word is replaced by clcfc, so "sed" becomes "selefd". The ampersand (&) holds the matched string:
bash$ sed -e 's/[aeiou]\+/&l&f&/g'
Substrings enclosed in "\(" and "\)" can be referenced with "\n" (n is a digit from 1 to 9). The following program reverses the order of the first three words of each line:
bash$ sed -e 's/\([^ ]\+\) *\([^ ]\+\) *\([^ ]\+\)/\3 \2 \1/'
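Trying the program on a four-word line shows that only the first three words are reordered. Because " *" also matches zero spaces, a single word such as "abc" would be split into pieces as well; the variant below uses " \+" (GNU) to require at least one space between words:

```shell
echo 'one two three four' | sed -e 's/\([^ ]\+\) *\([^ ]\+\) *\([^ ]\+\)/\3 \2 \1/'
# three two one four
echo 'abc' | sed -e 's/\([^ ]\+\) \+\([^ ]\+\) \+\([^ ]\+\)/\3 \2 \1/'
# abc (left unchanged: there are no three separate words)
```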
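The following RE matches the "ellef" syllables, but also strings that are not "ellef" syllables, because the three vowels need not be the same (it matches "alafa", the encoding of "a", but also "elafo"):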
[aeiou]l[aeiou]f[aeiou]
Basic REs can use a backreference in the RE itself! The following program uses one to decode the "ellef" syllables, replacing each clcfc with c:
bash$ sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'
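A round trip through the encoder and the decoder gives back the original word, while a string whose vowels differ is not touched by the decoder (GNU sed assumed for \+):

```shell
echo 'hello' | sed -e 's/[aeiou]\+/&l&f&/g' | sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'   # hello
echo 'elafo' | sed -e 's/\([aeiou]\+\)l\1f\1/\1/g'                                 # elafo
```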
| Command | Effect |
|---|---|
| D | Delete text in the pattern space up to the first newline, then restart the cycle with the remaining text without reading a new line of input (if there is no newline, act like d) |
| N | Add a newline to the pattern space, then append the next line of input to the pattern space |
| P | Print out the portion of the pattern space up to the first newline |
| h | Replace the contents of the hold space with the contents of the pattern space |
| H | Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space |
| g | Replace the contents of the pattern space with the contents of the hold space |
| G | Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space |
| x | Exchange the contents of the hold and pattern spaces |
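N, P and D are often used together as a sliding two-line window. A well-known one-liner of this kind deletes consecutive duplicate lines, like uniq: N appends the next line, P prints the first line only if it differs from the second, and D removes the first line and restarts the cycle with the second one still in the pattern space ($!N avoids running N on the last line):

```shell
printf 'a\na\nb\nc\nc\n' | sed -e '$!N;/^\(.*\)\n\1$/!P;D'
# a
# b
# c
```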
The following program moves the first line to the end of the output:
bash$ sed -n -e '1h;1!p;${g;p;}'
h: hold space <- pattern space
g: pattern space <- hold space
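On three numbered lines: line 1 is saved in the hold space, lines 2 and 3 are printed, and at the last line the saved line is copied back and printed:

```shell
printf '1\n2\n3\n' | sed -n -e '1h;1!p;${g;p;}'
# 2
# 3
# 1
```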
The following program prints the lines in reverse order, like tac:
bash$ sed -n -e 'G;h;$p'
G: pattern space <<- '\n' hold space
Problem: the output ends with an extra empty line. This is because "G" appends a newline followed by the content of the hold buffer to the pattern buffer even on the first line, when the hold buffer is still empty, and that line is printed last. Two possible fixes:
bash$ sed -n -e 'G;h;$s/.$//p'   # remove the trailing newline before printing
bash$ sed -n -e '1!G;h;$p'       # do not apply G to the first line
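The extra line and both fixes can be checked on three input lines. The line count is taken with sed -n '$=' itself; the $s/.$// fix relies on "." matching the final newline in the pattern space, which GNU sed does:

```shell
printf 'a\nb\nc\n' | sed -n -e 'G;h;$p' | sed -n -e '$='   # 4 lines: c, b, a and an empty one
printf 'a\nb\nc\n' | sed -n -e 'G;h;$s/.$//p'              # c b a
printf 'a\nb\nc\n' | sed -n -e '1!G;h;$p'                  # c b a
```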
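The following sed script (run it with sed -f) increments a decimal number:
/^[[:digit:]][[:digit:]]*$/!b; # lines that do not consist of digits only are printed unchanged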
x;s/.*//;x; # clear the hold space
: add
/9$/{s/9$//;x;s/.*/0&/;x;b add;}; # eliminate the last 9 from the p.s.
# and add a 0 in front of the h.s.
s/8$/9/
s/7$/8/
s/6$/7/
s/5$/6/
s/4$/5/
s/3$/4/
s/2$/3/
s/1$/2/
s/0$/1/
s/^$/1/
G;s/\n//g; # add the content of the h.s to the p.s
| Command | Effect |
|---|---|
| : label | Definition of label (portable scripts should keep labels unique within their first 8 characters) |
| b label | Branch unconditionally to label |
| t label | Branch to label only if there has been a successful s substitution since the last input line was read or the last t branch was taken |
If label is omitted in the b or t command, sed branches to the end of the script: the pattern space is printed (unless -n is given) and the next cycle is started.
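A common use of "t" is a loop that repeats a substitution until it no longer succeeds. This well-known one-liner inserts thousands separators, one comma per pass; the label name "a" is arbitrary:

```shell
echo 1234567 | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'   # 1,234,567
echo 123 | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'       # 123
```

Each pass turns the rightmost group of four digits into "d,ddd"; when no such group is left, the substitution fails, "t" does not branch, and the line is printed.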
#!/bin/sed -f
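# delete K++ comments: "kk" starts a comment that runs to the end of the line,
# "ko" ... "ok" encloses a K-comment that may span several lines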
/^[[:blank:]]*kk.*/d
s/kk.*//
# If no K-comment is found, then start a new cycle
: test
/ko/!b
# Append new lines to the pattern space until an entire K-comment is in the
# pattern space
: append
/ok/!{N;b append;}
# delete every K-comment (but don't be greedy!)
s/ko\([^o]\|o[^k]\)*o\?ok//g
t test
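Why the K-comment RE must not be greedy: with "ko.*ok" two comments on one line are removed together with the code between them, while the RE from the script stops at the first "ok". The last command shows that the same RE also works across the newline that N puts into the pattern space. This sketch assumes GNU sed, since \| and \? are GNU extensions:

```shell
echo 'a ko 1 ok b ko 2 ok c' | sed -e 's/ko.*ok//g'                       # a  c
echo 'a ko 1 ok b ko 2 ok c' | sed -e 's/ko\([^o]\|o[^k]\)*o\?ok//g'       # a  b  c
printf 'x ko a\nb ok y\n' | sed -e 'N;s/ko\([^o]\|o[^k]\)*o\?ok//g'        # x  y
```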