Monthly Archives: April 2013

SED – a journey through a stream without getting wet

So I am parsing some rather lengthy Word documents into usable web formatting, and rather than continuing to do them by hand, I have opted to use the command line to work more efficiently. There is a learning-curve price to pay up front, but I am confident it will more than pay for itself later on.

Currently I am trying to get a better grasp of the sed command; sed stands for Stream EDitor. It is possible that another kind of script using regex might be a better solution to this problem, but I picked up the “sed hammer”, so now every problem I see is going to look like a “sed nail”. This is by choice, to force me to become better versed with this tool before moving on to another.

My current problem requires reading multiple lines at once and then adding appropriate tags based on what is read in if a matching pattern is found. Sed is not the best candidate for this, as it typically reads just one line at a time. However, with some specific commands it can get the job done, and do a fine job of it as well.
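
As a rough sketch of what "reading multiple lines at once" can look like, the N command appends the next input line to the one sed is already holding, so a single substitution can see both. The "Heading:" marker, the sample text, and the h2/p tags are made up for illustration and are not from my actual documents:

```shell
# Hypothetical input: a "Heading:" line followed by one line of body text.
input='Heading: Safety
Wear goggles.
Plain line'

# On a line starting with "Heading: ", N pulls the next line into the pattern
# space (the two lines are joined by a newline), then s tags both parts.
out=$(printf '%s\n' "$input" | sed '/^Heading: /{
N
s/^Heading: \(.*\)\n\(.*\)$/<h2>\1<\/h2><p>\2<\/p>/
}')
printf '%s\n' "$out"
```

Lines that do not start with the marker pass through untouched, which is the behavior I want for the rest of the document.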

I am currently reading through a great sed “manual” at http://www.grymoire.com/Unix/Sed.html that covers all the details I need and I will highlight some of the more pertinent items as I come across them.

Since I will be performing several sed commands on the same document, I should use the -e option described in the excerpt below:

Multiple commands with -e command

One method of combining multiple commands is to use a -e before each command:

sed -e 's/a/A/' -e 's/b/B/' <old >new

A “-e” isn’t needed in the earlier examples because sed knows that there must always be one command. If you give sed one argument, it must be a command, and sed will edit the data read from standard input.

The long argument version is

sed --expression='s/a/A/' --expression='s/b/B/' <old >new
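
A quick way to convince myself that the -e commands run in order on each line is to feed sed a single made-up word instead of a file (the word "banana" is just sample input):

```shell
# Each -e adds one command; both are applied, in order, to every input line.
# Without the g flag, each s command replaces only the first match on the line.
out=$(printf 'banana\n' | sed -e 's/a/A/' -e 's/b/B/')
printf '%s\n' "$out"
```

As far as I can tell, the long --expression form is a GNU sed option, so the stock sed on a Mac may only accept the short -e.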

Since I want to mimic grep’s behavior in some instances I will need to use the -n flag as detailed below:

sed -n: no printing

The “-n” option will not print anything unless an explicit request to print is found. I mentioned the “/p” flag to the substitute command as one way to turn printing back on. Let me clarify this. The command

sed  's/PATTERN/&/p' file

acts like the cat program if PATTERN is not in the file: i.e. nothing is changed. If PATTERN is in the file, then each line that has this is printed twice. Add the “-n” option and the example acts like grep:

sed -n 's/PATTERN/&/p' file

Nothing is printed, except those lines with PATTERN included.

The long argument of the -n command is either

sed --quiet 's/PATTERN/&/p' file

or

sed --silent 's/PATTERN/&/p' file

Also note:

Using ‘sed -n /pattern/p’ to duplicate the function of grep

If you want to duplicate the functionality of grep, combine the -n (noprint) option with the /p print flag:

sed -n '/PATTERN/p' file

Sed can act like grep by combining the print operator to function on all lines that match a regular expression:

sed -n '/match/ p'

which is the same as:

grep match

Reversing the restriction with !

Sometimes you need to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. The “!” character, which often means not in UNIX utilities, inverts the address restriction. You remember that

sed -n '/match/ p'

acts like the grep command. The “-v” option to grep prints all lines that don’t contain the pattern. Sed can do this with

sed -n '/match/ !p' </tmp/b
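
Here is a small sketch that checks both claims against grep itself. The three fruit names are sample input I made up:

```shell
input='apple
banana
cherry'

# -n silences the default output; p prints only the lines the address selects.
matched=$(printf '%s\n' "$input" | sed -n '/an/p')

# The ! inverts the address, so p runs on every line that does NOT match.
inverted=$(printf '%s\n' "$input" | sed -n '/an/!p')

printf 'matched:\n%s\ninverted:\n%s\n' "$matched" "$inverted"
```

The first result should be the same as the output of grep an, and the second the same as grep -v an.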

Because I will be using multiple sed commands there are some available options to make this happen:

sed -f scriptname

If you have a large number of sed commands, you can put them into a file and use

sed -f sedscript <old >new

where sedscript could look like this:

# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g

When there are several commands in one file, each command must be on a separate line.

The long argument version is

sed --file=sedscript <old >new
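
To try the script-file approach without touching any real documents, this sketch writes the vowel script into a temporary directory and runs it on one sample sentence. The file name vowels.sed and the sentence are my own choices:

```shell
# Work in a throwaway directory so nothing real is modified.
tmpdir=$(mktemp -d)

# One sed command per line; lines starting with # are comments.
cat > "$tmpdir/vowels.sed" <<'EOF'
# upper-case the lower-case vowels
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

out=$(printf 'sed is a stream editor\n' | sed -f "$tmpdir/vowels.sed")
printf '%s\n' "$out"
rm -rf "$tmpdir"
```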

Also see:

sed in shell scripts

If you have many commands and they won’t fit neatly on one line, you can break up the line using a backslash:

sed -e 's/a/A/g' \
    -e 's/e/E/g' \
    -e 's/i/I/g' \
    -e 's/o/O/g' \
    -e 's/u/U/g'  <old >new

Because I may want to combine multiple files later to output a single file I am including:

Reading in a file with the ‘r’ command

There is also a command for reading files. The command

sed '$r end' <in>out

will append the file “end” at the end of the file (address “$”). The following will insert a file after the line with the word “INCLUDE:”

sed '/INCLUDE/ r file' <in >out

You can use the curly braces to delete the line having the “INCLUDE” command on it:

#!/bin/sh
sed '/INCLUDE/ {
	r file
	d
}'
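
A sketch of the curly-brace version in action, using two throwaway files. The names "in" and "part" and their contents are invented for the demo:

```shell
tmpdir=$(mktemp -d)
printf 'header\nINCLUDE\nfooter\n' > "$tmpdir/in"
printf 'included line 1\nincluded line 2\n' > "$tmpdir/part"

# r queues the contents of "part" for output at the end of the cycle;
# d then deletes the INCLUDE line itself. The queued file is still written.
out=$(cd "$tmpdir" && sed '/INCLUDE/ {
r part
d
}' in)
printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Note that the file name after r runs to the end of the line, which is why each command sits on its own line here.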

An important detail about the delete (‘d’) command:

The other subtlety is the “d” command deletes the current data in the pattern space. Once all of the data is deleted, it does make sense that no other action will be attempted. Therefore a “d” command executed in a curly brace also aborts all further actions. As an example, the substitute command below is never executed:
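
The excerpt's own example did not survive my copy and paste, so here is a small sketch of the same idea with input I made up. The substitution after d never gets a chance to run:

```shell
# On line 1, d empties the pattern space and starts the next cycle at once,
# so the s command after it is never reached.
out=$(printf 'first\nsecond\n' | sed '1 {
d
s/.*/this substitution never runs/
}')
printf '%s\n' "$out"
```

Only the second line comes out, unchanged.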

Because I want to insert tags before and after a matched pattern it might be convenient to use:

Append a line with ‘a’

The “a” command appends a line after the range or pattern. This example will add a line after every line with “WORD:”

#!/bin/sh
sed '
/WORD/ a\
Add this line after every line with WORD
'

You could eliminate two lines in the shell script if you wish:

#!/bin/sh
sed '/WORD/ a\
Add this line after every line with WORD'

I prefer the first form because it’s easier to add a new command by adding a new line and because the intent is clearer. There must not be a space after the “\”.

Insert a line with ‘i’

You can insert a new line before the pattern with the “i” command:

#!/bin/sh
sed '
/WORD/ i\
Add this line before every line with WORD
'

Adding more than one line

All three commands will allow you to add more than one line. Just end each line with a “\”:

#!/bin/sh
sed '
/WORD/ a\
Add this line\
This line\
And this line
'
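
Since my goal is to wrap a matched line in tags, this sketch combines i and a on the same pattern. The sample lines and the p tags are placeholders, not my real markup:

```shell
# i\ writes its text before the matching line, a\ writes its text after it.
# The backslash must be the last character on the command line.
out=$(printf 'intro\nWORD here\noutro\n' | sed '
/WORD/ i\
<p>
/WORD/ a\
</p>
')
printf '%s\n' "$out"
```

The matched line ends up between the opening and closing tag, and the other lines are left alone.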

Debugging Tips:

Displaying control characters with a l

The “l” command prints the current pattern space. It is therefore useful in debugging sed scripts. It also converts unprintable characters into printing characters by outputting the value in octal preceded by a “\” character. I found it useful to print out the current pattern space, while probing the subtleties of sed.
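
A tiny sketch of l at work on a made-up line containing a tab. Common characters like tab are shown with a short escape rather than octal, and the end of the line is marked with a $:

```shell
# -n suppresses the normal output so only the l listing is shown.
out=$(printf 'tab\there\n' | sed -n 'l')
printf '%s\n' "$out"
```

This makes stray tabs and trailing spaces visible, which is handy with text pasted out of Word.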

Working in OSX terminal behind a proxy

I am still a noob when it comes to the terminal so I am still trying to get a lay of the land. Fortunately for me I have some friends from Upfront Wichita who are much more advanced and have all kinds of tips and tricks to inspire and challenge me.

One of the biggest challenges I run into is that I have to work through a proxy on my work computer. I am on a Mac using the terminal and often can’t use the same exact commands my compatriots can due to said proxy. I am going to document the various cases that I run into periodically for posterity and to possibly help someone else who finds themself in the same boat.

Case 1: Trying to install tmux following the instructions found here, but can’t run a simple curl command. Here is the output of my failed attempt:

mac:build user$ curl -OL http://downloads.sourcforge.net/tmux/tmux-1.5.tar.gz
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:--  0:00:17 --:--:--     0^C

I need to go THROUGH the proxy to download this file, but currently curl doesn’t know to do that. I am sure there are several ways to do this, but this is the one I found and it works:

mac:build user$ curl -x 99.99.99.99:9999 -OL http://downloads.sourcforge.net/tmux/tmux-1.5.tar.gz
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
Dload  Upload   Total   Spent    Left  Speed
100  7936  100  7936    0     0  12467      0 --:--:-- --:--:-- --:--:-- 12497

You can read more about the -x option by typing ‘man curl’ in the terminal. Man is just short for manual, for all my noob comrades.

Bonus tip: Even better than the option listed above is setting up a proxy variable to be used every time curl is run. To do so, go to your home directory by typing: cd ~/
Next create a new file with your text editor of choice (I will use vim) by typing: vim .curlrc

Note: the .curlrc is a hidden file that holds a set of run commands to be run every time the program starts up. It’s like priming the engine of a lawn mower so it is ready to go when you yank the cord. In this case we are making it “ready to go” by defining the proxy so we don’t have to every time the command is run.

Next, in your text editor, type this and save: proxy = 99.99.99.99:9999 (except, of course, you need to replace the 9’s with your actual proxy address). Don’t forget to set the appropriate port, denoted by my ‘:9999’ at the tail of the address.
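
If you would rather not open an editor, the same one-line file can be written from the shell. This sketch deliberately writes into a temporary stand-in directory (demo_home is a name I made up) so it cannot clobber a real ~/.curlrc, and the address is still the placeholder one:

```shell
# Stand-in for the home directory, so the real ~/.curlrc is left untouched.
demo_home=$(mktemp -d)

# The whole config is one "proxy = host:port" line.
printf 'proxy = 99.99.99.99:9999\n' > "$demo_home/.curlrc"

saved=$(cat "$demo_home/.curlrc")
printf '%s\n' "$saved"

# curl can also be pointed at a config file explicitly with -K, e.g.
#   curl -K "$demo_home/.curlrc" -OL <url>
rm -rf "$demo_home"
```

For real use, replace "$demo_home" with "$HOME" and the 9’s with your own proxy address and port.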

I believe a similar setup can work for other commands where you run into similar proxy issues. I will document each case as I come upon them, and if an instance occurs that differs I will be sure to document my solution.

*******************
****UPDATE****
*******************

So I was unable to follow the later instructions listed above: when I tried to unpack the tar.bz files I received errors that I couldn’t resolve. I found another set of instructions that required homebrew first…which has a whole set of other issues in itself, but I have encountered this brew command enough times that I thought I would give it a shot.

So far, it’s a no go with my proxy. Here is my input command and the output that follows:

mac:local user$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
==> This script will install:
/usr/local/bin/brew
/usr/local/Library/...
/usr/local/share/man/man1/brew.1
Press ENTER to continue or any other key to abort
==> Downloading and Installing Homebrew...
error: Could not resolve host: github.com; nodename nor servname provided, or not known while accessing https://github.com/mxcl/homebrew/info/refs?service=git-upload-pack
fatal: HTTP request failed
Failed during: git fetch origin master:refs/remotes/origin/master -n
mac:local user$

So… git needs to be configured for the proxy as well. Type the following line to set up the proxy:

$ git config --global http.proxy http://proxyuser:proxypwd@proxy.server.com:8080

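…and for https (as far as I can tell, git only documents the http.proxy setting, which also applies to https:// URLs, so this second line may not be needed)…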

$ git config --global https.proxy https://proxyuser:proxypwd@proxy.server.com:8080

Of course you will need to replace the proxy address with your own. In my case it is an IP address, so I use the form 99.99.99.99:9999, and I do not need to include my username or password, perhaps because this is already set up in my Mac network settings?
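
To check what git actually stored, or to remove the setting again, the sketch below uses git config with --get and --unset. It works on a scratch config file via --file (the cfg variable is my own) so it does not touch the real global settings. For the real thing, swap --file "$cfg" for --global:

```shell
# Scratch config file standing in for ~/.gitconfig.
cfg=$(mktemp)

# Store the placeholder proxy, then read it back.
git config --file "$cfg" http.proxy http://99.99.99.99:9999
current=$(git config --file "$cfg" --get http.proxy)
printf 'proxy is set to: %s\n' "$current"

# Remove it again; --get then prints nothing and exits non-zero.
git config --file "$cfg" --unset http.proxy
after=$(git config --file "$cfg" --get http.proxy || true)
rm -f "$cfg"
```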

Once that is set, then I just type:

brew install tmux

And BOOM! Done!

GEM INSTALL:

So I have run into yet another issue with my proxy and another subtle variation in how to deal with it:
I went to install tmuxinator using the following command:

mac:Safety-Concerns user$ gem install tmuxinator

But this resulted in a timeout. So after some reading, I found that I have to set up yet another proxy setting, the http_proxy environment variable, as follows:
I tried: mac:~ user$ set HTTP_PROXY=99.99.99.99:9999 - but this didn’t work.
Then I found: mac:~ user$ export http_proxy=99.99.99.99:9999 - and that was the ticket!
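
My best understanding of why the first attempt failed: in sh and bash, set with a NAME=value argument does not create a variable at all, and even a plain assignment is invisible to child programs until it is exported. This sketch shows the difference using throwaway names (HTTP_PROXY_DEMO and demo_proxy are made up so the real proxy variables are not disturbed):

```shell
# `set NAME=value` just replaces the positional parameters: the text lands
# in "$1" and no variable called HTTP_PROXY_DEMO is created.
set HTTP_PROXY_DEMO=99.99.99.99:9999
first_arg=$1

# A plain assignment stays private to the current shell ...
unset demo_proxy
demo_proxy=99.99.99.99:9999
seen_plain=$(sh -c 'printf "%s" "${demo_proxy:-}"')

# ... while export puts it in the environment of child processes.
export demo_proxy
seen_exported=$(sh -c 'printf "%s" "${demo_proxy:-}"')

printf 'plain: [%s] exported: [%s]\n' "$seen_plain" "$seen_exported"
```

The child shell sees nothing until the export, which matches what I ran into with gem.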

After that I did one more command and installed tmuxinator successfully:

mac:~ user$ sudo gem install tmuxinator
Password:
[...]
Thank you for installing tmuxinator

Good luck!