SED – a journey through a stream without getting wet

So I am converting some rather lengthy Word documents into usable web formatting, and rather than continuing to do them by hand, I have opted to use the command line to work more efficiently. There is a learning curve to pay for up front, but I am confident it will more than pay for itself later on.

Currently I am trying to get a better grasp of the sed command; sed stands for Stream EDitor. It is possible that another kind of script using regex might be a better solution to this problem, but I picked up the “sed hammer”, so now every problem I see is going to look like a “sed nail”. This is by choice, to force me to become better versed with this tool before moving on to another.

My current problem requires reading multiple lines at once and then adding appropriate tags based on what is read in, if a matching pattern is found. Sed is not the best candidate for this, as it typically reads just one line at a time. However, with some specific commands it can get the job done, and do a fine job of it as well.

I am currently reading through a great sed “manual” at http://www.grymoire.com/Unix/Sed.html that covers all the details I need and I will highlight some of the more pertinent items as I come across them.

Since I will be performing several sed commands on the same document, I should use the -e option, described in the excerpt below:

Multiple commands with -e command

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new

A “-e” isn’t needed in the earlier examples because sed knows that there must always be one command. If you give sed one argument, it must be a command, and sed will edit the data read from standard input.

The long argument version is

sed --expression='s/a/A/' --expression='s/b/B/' <old >new
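To check my understanding of how the -e commands chain, here is a minimal sketch. The sample text and the variable names are made up for illustration; only the sed invocation comes from the excerpt above.

```shell
# Hypothetical sample input, invented for this illustration.
input='a banana bread'

# Each -e adds one command; they are applied in order to every input line.
# Without the g flag, each substitution only changes the first match on the line.
result=$(printf '%s\n' "$input" | sed -e 's/a/A/' -e 's/b/B/')

printf '%s\n' "$result"   # prints: A Banana bread
```

Only the first “a” and the first “b” change, because neither substitution uses the g flag.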

Since I want to mimic grep’s behavior in some instances I will need to use the -n flag as detailed below:

sed -n: no printing

The “-n” option will not print anything unless an explicit request to print is found. I mentioned the “/p” flag to the substitute command as one way to turn printing back on. Let me clarify this. The command

sed  's/PATTERN/&/p' file

acts like the cat program if PATTERN is not in the file: i.e. nothing is changed. If PATTERN is in the file, then each line that has this is printed twice. Add the “-n” option and the example acts like grep:

sed -n 's/PATTERN/&/p' file

Nothing is printed, except those lines with PATTERN included.

The long argument of the -n command is either

sed --quiet 's/PATTERN/&/p' file

or

sed --silent 's/PATTERN/&/p' file
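A quick sketch of the difference, assuming a three-line sample I made up for the purpose (the variable names are mine as well):

```shell
# Hypothetical three-line sample; only the middle line contains PATTERN.
sample='first line
has PATTERN here
last line'

# Without -n: every line is printed once, and the p flag prints matches again.
doubled=$(printf '%s\n' "$sample" | sed 's/PATTERN/&/p')

# With -n: automatic printing is off, so only the p flag produces output.
quiet=$(printf '%s\n' "$sample" | sed -n 's/PATTERN/&/p')

printf '%s\n' "--- without -n ---" "$doubled" "--- with -n ---" "$quiet"
```

The first run shows the matching line twice; the second shows only the matching line.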

Also note:

Using ‘sed -n /pattern/p’ to duplicate the function of grep

If you want to duplicate the functionality of grep, combine the -n (noprint) option with the /p print flag:

sed -n '/PATTERN/p' file

Sed can act like grep by combining the print operator to function on all lines that match a regular expression:

sed -n '/match/ p'

which is the same as:

grep match

Reversing the restriction with !

Sometimes you need to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. The “!” character, which often means not in UNIX utilities, inverts the address restriction. You remember that

sed -n '/match/ p'

acts like the grep command. The “-v” option to grep prints all lines that don’t contain the pattern. Sed can do this with

sed -n '/match/ !p' </tmp/b
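To see both forms side by side with grep, here is a small sketch. The sample lines and variable names are assumptions of mine; the sed commands are the ones quoted above.

```shell
# Hypothetical sample input, invented for this comparison.
sample='keep me
match this
keep me too'

# p on matching lines behaves like grep.
sed_match=$(printf '%s\n' "$sample" | sed -n '/match/ p')
grep_match=$(printf '%s\n' "$sample" | grep match)

# !p inverts the address, which behaves like grep -v.
sed_invert=$(printf '%s\n' "$sample" | sed -n '/match/ !p')
grep_invert=$(printf '%s\n' "$sample" | grep -v match)

printf '%s\n' "$sed_match" "$sed_invert"
```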

Because I will be using multiple sed commands there are some available options to make this happen:

sed -f scriptname

If you have a large number of sed commands, you can put them into a file and use

sed -f sedscript <old >new

where sedscript could look like this:

# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g

When there are several commands in one file, each command must be on a separate line.

The long argument version is

sed --file=sedscript <old >new
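Here is a self-contained sketch of the script-file approach. The temporary file created with mktemp and the sample sentence are assumptions for illustration; in real use the script would live in a named file such as sedscript.

```shell
# Write the vowel script to a temporary file (a stand-in for "sedscript").
script=$(mktemp)
cat > "$script" <<'EOF'
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

# Hypothetical sample sentence, run through the script with -f.
result=$(printf '%s\n' 'sed is a stream editor' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$result"   # prints: sEd Is A strEAm EdItOr
```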

Also see:

sed in shell scripts

If you have many commands and they won’t fit neatly on one line, you can break up the line using a backslash:

sed -e 's/a/A/g' \
    -e 's/e/E/g' \
    -e 's/i/I/g' \
    -e 's/o/O/g' \
    -e 's/u/U/g' <old >new

Because I may want to combine multiple files later to output a single file I am including:

Reading in a file with the ‘r’ command

There is also a command for reading files. The command

sed '$r end' <in>out

will append the file “end” at the end of the file (address “$”). The following will insert a file after the line with the word “INCLUDE”:

sed '/INCLUDE/ r file' <in >out

You can use the curly braces to delete the line having the “INCLUDE” command on it:

#!/bin/sh
sed '/INCLUDE/ {
	r file
	d
}'
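To try the include-and-delete idea without touching real documents, here is a sketch that builds its own throwaway files. The directory from mktemp and the file contents are made up; the names “file” and “in” mirror the excerpt.

```shell
# Build a scratch directory with a hypothetical include file and input file.
workdir=$(mktemp -d)
printf 'included text\n' > "$workdir/file"
printf 'top\nINCLUDE\nbottom\n' > "$workdir/in"

# On the INCLUDE line: queue the contents of "file" for output,
# then delete the INCLUDE line itself.
result=$(cd "$workdir" && sed '/INCLUDE/ {
    r file
    d
}' <in)
rm -rf "$workdir"

printf '%s\n' "$result"
```

The INCLUDE marker line is gone and the contents of the included file appear in its place, between “top” and “bottom”.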

An important detail about the delete (‘d’) command:

The other subtlety is the “d” command deletes the current data in the pattern space. Once all of the data is deleted, it does make sense that no other action will be attempted. Therefore a “d” command executed in a curly brace also aborts all further actions. As an example, a substitute command placed after the “d” inside the same curly braces is never executed.
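A minimal sketch of that behaviour, using a two-line input and a replacement string that I made up:

```shell
# On line 1, d deletes the pattern space and ends processing of that line,
# so the s and p commands that follow it inside the braces never run.
result=$(printf 'one\ntwo\n' | sed '1 {
    d
    s/.*/replaced/
    p
}')

printf '%s\n' "$result"   # prints: two
```

If the commands after “d” had run, the word “replaced” would appear in the output; it never does.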

Because I want to insert tags before and after a matched pattern it might be convenient to use:

Append a line with ‘a’

The “a” command appends a line after the range or pattern. This example will add a line after every line with “WORD:”

#!/bin/sh
sed '
/WORD/ a\
Add this line after every line with WORD
'

You could eliminate two lines in the shell script if you wish:

#!/bin/sh
sed '/WORD/ a\
Add this line after every line with WORD'

I prefer the first form because it’s easier to add a new command by adding a new line and because the intent is clearer. There must not be a space after the “\”.

Insert a line with ‘i’

You can insert a new line before the pattern with the “i” command:

#!/bin/sh
sed '
/WORD/ i\
Add this line before every line with WORD
'

Adding more than one line

All three commands will allow you to add more than one line. Just end each line with a “\”:

#!/bin/sh
sed '
/WORD/ a\
Add this line\
This line\
And this line
'
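Since my goal is to wrap matched lines in tags, here is a rough sketch that combines “i” and “a”. The tag names and sample text are placeholders I picked, not the tags my documents will need.

```shell
# Insert an opening tag before each WORD line, and append two lines after it.
# Every added line except the last one of each block ends with a backslash.
result=$(printf 'intro\nWORD here\noutro\n' | sed '
/WORD/ i\
<p>
/WORD/ a\
</p>\
<hr>
')

printf '%s\n' "$result"
```

With this input the matched line comes out wrapped: “<p>” before it, then “</p>” and “<hr>” after it, with the other lines untouched.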


Debugging Tips:

Displaying control characters with a l

The “l” command prints the current pattern space. It is therefore useful in debugging sed scripts. It also converts unprintable characters into printing characters by outputting the value in octal preceded by a “\” character. I found it useful to print out the current pattern space, while probing the subtleties of sed.
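A small sketch of what “l” shows, assuming an input line of my own invention that contains a tab and a control character (byte 001):

```shell
# Feed sed a line containing a tab and the control character \001.
# -n suppresses normal output so only the l listing is shown.
result=$(printf 'col1\tcol2\001\n' | sed -n 'l')

# l writes the tab as \t, the control byte as octal \001,
# and marks the end of the line with a $.
printf '%s\n' "$result"   # prints: col1\tcol2\001$
```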
