[TriLUG] Need some help parsing a file

John Vaughters jvaughters04 at yahoo.com
Mon Dec 30 14:50:20 EST 2013


I realize this is mostly a dead thread, but I wanted to point out one neat little trick that I find very useful. As I said earlier in the thread, I reach for cut first and awk second, and there is a reason for this. If you are dealing with a file whose fields are not consistently spaced, you need other options. Both cut and awk work with fields, which I find an easy and quick way to get what I need. When I cannot easily get what I need with cut, I use awk, because awk has a tremendous feature: it lets you use almost any delimiter you like, even a REGEX delimiter. This can be very handy. The cut command only allows a single delimiter character.
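To see why the single-character limit matters, here is a small sketch using a made-up line with a doubled space (the sample text is an assumption for illustration, not from the original thread). cut treats every single space as a separator, so two spaces in a row produce an empty field. awk's default splitting treats a run of blanks as one separator, and with -F it also accepts a regular expression.

```shell
# Hypothetical sample line with two spaces between the first two words.
line='alpha  beta gamma'

# cut splits on every single space, so field 2 is the empty string
# sitting between the two spaces (this prints an empty line).
printf '%s\n' "$line" | cut -d ' ' -f 2

# awk's default field splitting collapses runs of blanks, so field 2 is "beta".
printf '%s\n' "$line" | awk '{print $2}'

# awk with a regular expression delimiter: one or more spaces.
printf '%s\n' "$line" | awk -F ' +' '{print $2}'
```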

I find ruling things out is much easier than worrying about including every possibility in a REGEX and unless you are very proficient with REGEX, you will struggle trying to include everything.
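As a rough illustration of "including" versus "ruling out" (the sed pattern below is my own guess at the line layout, not something from the original post): to strip everything in front of the file name with sed, the pattern has to spell out the date, the time, the AM/PM marker and the size. Ruling out only the size column takes a much shorter pattern.

```shell
line='11/09/2013  11:49 AM  7,887,098 this is filename 1.txt'

# "Including" approach (portable basic regex): describe every piece of the
# prefix so it can be deleted: date, spaces, time, AM/PM, spaces, size, space.
printf '%s\n' "$line" | sed 's/^[0-9/]*  *[0-9:]* [AP]M  *[0-9,]* //'

# "Ruling out" approach: only describe the size column and let awk split on it.
printf '%s\n' "$line" | awk -F ' [,0-9]+ ' '{print $2}'
```

Both commands print the same file name for this line, but the sed version breaks as soon as the date or time format changes, while the awk version only cares about the size column.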

So in this case I am using awk with a very simple REGEX as the delimiter: a space, one or more digits or commas, and another space, written " [,0-9]+ ". Then I print the second field. In effect I am ruling out a number (commas allowed) with a space on each end, which in this case is the file size.

echo '11/09/2013  11:49 AM  7,887,098 this is filename 1.txt' | awk -F " [,0-9]+ " '{print $2}'
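If you want to see exactly how awk cut the line up, a quick sketch like this one prints every field in brackets so stray spaces are visible. The second command uses a made-up file name to show one limitation of the trick: a file name that itself contains a standalone number gets split again.

```shell
line='11/09/2013  11:49 AM  7,887,098 this is filename 1.txt'

# Dump every field produced by the regex delimiter; the brackets make
# leading and trailing spaces visible.
printf '%s\n' "$line" |
  awk -F ' [,0-9]+ ' '{ for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i }'

# Caveat (hypothetical file name): " 2 " inside the name also matches the
# delimiter, so $2 is only "report" and "final.txt" ends up in $3.
printf '%s\n' '11/09/2013  11:49 AM  1,024 report 2 final.txt' |
  awk -F ' [,0-9]+ ' '{print $2}'
```

For the first line this shows field 1 as the date, time and AM marker (with one trailing space) and field 2 as the file name.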


Using a file

cat file.txt | awk -F " [,0-9]+ " '{print $2}'
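A small end-to-end sketch, assuming file.txt holds lines in the same layout as the example above (the two sample lines are made up for illustration). awk can also take the file name as an argument and read it directly.

```shell
# Create a hypothetical sample listing (contents made up for illustration).
cat > file.txt <<'EOF'
11/09/2013  11:49 AM  7,887,098 this is filename 1.txt
11/10/2013  09:15 PM        512 notes.txt
EOF

# awk reads file.txt itself; each line is split on the size column
# and the part after it (the file name) is printed.
awk -F ' [,0-9]+ ' '{print $2}' file.txt
```

This prints "this is filename 1.txt" and "notes.txt". The run of padding spaces before 512 does not matter, because only the last space in front of the number can start a match.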


Sometimes I use them together

cat file.txt | cut -d ':' -f 2 | awk -F " [,0-9]+ " '{print $2}'


This is not necessary in this case, but it does eliminate a portion of the string that I do not care about (here, the date and the hour in front of the first colon), and that is sometimes useful.
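To see what each stage of that pipeline hands to the next, here is a sketch on the same example line, using echo-style input instead of a file.

```shell
line='11/09/2013  11:49 AM  7,887,098 this is filename 1.txt'

# Stage 1: cut keeps only field 2 of the ':'-separated line, dropping the
# date and the hour ("11/09/2013  11"). Prints "49 AM  7,887,098 this is filename 1.txt".
printf '%s\n' "$line" | cut -d ':' -f 2

# Stage 2: awk splits what is left on the size column and prints the file name.
printf '%s\n' "$line" | cut -d ':' -f 2 | awk -F ' [,0-9]+ ' '{print $2}'
```

One thing to watch: -f 2 keeps only the second field, so anything after a second colon on the line would be dropped. cut -d ':' -f 2- keeps everything from the second field to the end of the line.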

Using this technique over the years may explain why I am poor at REGEX: between cut with simple delimiters and awk with complex ones, I have never had to learn REGEX well. It's not that I don't use them; I just don't use them enough to be proficient.

One other advantage to using some of these older commands is that you may find yourself on a minimal embedded system or legacy system that only has the older commands. I personally deal with that limitation on a regular basis.

John Vaughters

