[TriLUG] Need some help parsing a file
John Vaughters
jvaughters04 at yahoo.com
Mon Dec 30 14:50:20 EST 2013
I realize this is mostly a dead thread, but I just wanted to point out one very neat little trick that I find very useful. As I had established, I like cut first and awk second. There is a reason for this. If you are dealing with a file that does not have consistently spaced characters, you need other options. Both cut and awk use the field concept, which makes it easy and quick to get what I need. When I cannot get what I need easily with cut, I use awk. The reason is that awk has a tremendous feature: it lets you use any type of delimiter you like, even a REGEX delimiter. This can be very handy. The cut command, by contrast, only allows a single delimiter character.
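A quick sketch of that difference, using a made-up line with uneven spacing (the sample text and the variable names are only for illustration). Plain awk with its default separator would also cope with this input; the regex form is used here just to show what -F accepts.

```shell
# Hypothetical sample line with uneven runs of spaces between words.
line='alpha   beta gamma'

# cut treats every single space as a delimiter, so field 2 is the
# empty string between the first and second space.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# awk with a regex delimiter treats the whole run of spaces as one separator.
awk_field=$(printf '%s\n' "$line" | awk -F ' +' '{print $2}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
```

With cut the second field comes back empty, while awk returns the second word.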
I find ruling things out is much easier than worrying about including every possibility in a REGEX, and unless you are very proficient with REGEX, you will struggle trying to include everything.
So in this case I am using awk with a very simple REGEX as a delimiter. Basically a space, one or more digits or commas, and another space: " [,0-9]+ " is the delimiter. Then I print the second field. So I am ruling out a number, possibly containing commas, with a space on each side, which in this case is the file size.
echo '11/09/2013 11:49 AM 7,887,098 this is filename 1.txt' | awk -F " [,0-9]+ " '{print $2}'
Using a file
cat file.txt | awk -F " [,0-9]+ " '{print $2}'
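To see how the delimiter behaves across several lines, here is a small sketch with a made-up listing (the file contents and the names `listing` and `names` are assumptions for illustration). It passes the file name straight to awk, which works the same as piping from cat. The last sample line shows a limit of the approach: a standalone number inside the filename also matches the delimiter, so $2 stops there.

```shell
# Hypothetical listing written to a temporary file; the lines are made up.
listing=$(mktemp)
cat > "$listing" <<'EOF'
11/09/2013 11:49 AM 7,887,098 this is filename 1.txt
11/10/2013 02:15 PM 512 notes.txt
11/11/2013 09:05 AM 1,024 report 2013 final.txt
EOF

# Same delimiter as above: a space, one or more digits or commas, a space.
names=$(awk -F " [,0-9]+ " '{print $2}' "$listing")
printf '%s\n' "$names"

rm -f "$listing"
```

The first two lines give the full filename. The third gives only "report", because " 2013 " is treated as another delimiter.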
Sometimes I use them together
cat file.txt | cut -d ':' -f 2 | awk -F " [,0-9]+ " '{print $2}'
This is not necessary in this case, but it does eliminate a portion of the string that I do not care about, and that is sometimes useful.
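Here is the combined pipeline broken into its two steps on the sample line from above, so the intermediate result is visible (the variable names are only for illustration):

```shell
line='11/09/2013 11:49 AM 7,887,098 this is filename 1.txt'

# Step 1: cut on ':' and keep field 2, the text between the first and
# second colon. This line has only one colon, so that is the rest of it.
after_cut=$(printf '%s\n' "$line" | cut -d ':' -f 2)

# Step 2: awk splits what is left on the file size and prints what follows.
name=$(printf '%s\n' "$after_cut" | awk -F " [,0-9]+ " '{print $2}')

printf 'after cut: %s\nafter awk: %s\n' "$after_cut" "$name"
```

Note that a filename containing a colon would be truncated by the cut step, since -f 2 keeps only one field.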
Using this technique over the years may account for why I am poor at REGEX, because between cut with simple delimiters and awk with complex delimiters, I have never had to learn REGEX well. Not that I don't use them, but just not enough to be proficient.
One other advantage to using some of these older commands is that you may find yourself on a minimal embedded system or legacy system that only has the older commands. I personally deal with that limitation on a regular basis.
John Vaughters