[TriLUG] Need some help parsing a file

Mon Dec 30 22:49:09 EST 2013

On Sun, 29 Dec 2013 21:04:16 -0500
Brian Blater <brb.lists at gmail.com> wrote:

> This has never been my forte and just can't seem to figure out what I
> need to do.
> 
> I've got a file that basically has a directory listing. I need to
> parse out everything but the filenames. The format of the document is
> basically like this:
> 
> 11/09/2013  11:49 AM         7,887,098 this is filename 1.txt
> 11/05/2013  08:09 PM        11,652,690 this is filename 2.sh
> 
> Basically I need to strip the date, time and bytes and just leave the
> filename. Filenames will have spaces and various characters, but it is
> always after the bytes and spaces are what separate everything.

I'd take advantage of the fact that you want to get rid of the first
whitespace and everything after it, keeping only the first field on
each line:

cat junk.txt | sed -e"s/\s.*//"

I tried the preceding, and it worked perfectly.
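
If the filename is the last field rather than the first, as in the
sample lines quoted above, the same regex idea can be anchored on the
four leading fields instead. This is only a sketch: it assumes every
line starts with a date, a time, AM or PM, and a comma-grouped byte
count, exactly as in the sample, and that your sed accepts -E (GNU and
BSD sed do). The function name strip_listing_prefix is made up for
illustration.

```shell
# Hypothetical helper: drop the leading date, time, AM/PM and byte-count
# fields, leaving only the filename (spaces in the filename are kept).
# Assumes the four-field layout shown in the sample lines.
# Usage: strip_listing_prefix junk.txt   (or pipe the listing into it)
strip_listing_prefix() {
    # ^[^ ]+ +     date, then the spaces after it
    # [^ ]+ +      time, then the spaces after it
    # [AP]M +      AM or PM, then the padding before the byte count
    # [0-9,]+      byte count, followed by the single separating space
    sed -E 's/^[^ ]+ +[^ ]+ +[AP]M +[0-9,]+ //' "$@"
}
```

Lines that don't match the pattern pass through unchanged, so a header
or blank line in the listing would still show up in the output.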

Personally, I think AWK's a little bit overkill for this (but I use AWK
all the time for tougher parsing), and using Perl for this (or Python
or Ruby) is insanity.

The cut option's also excellent, but I remember regex a lot better than
cut's arguments and options. And as someone says, by removing the
first space and everything after it, you get around implementation
problems, unless some version of ls prepends lines with spaces, or you
use ls -l.
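
For comparison, a cut-based take on the sample lines from the original
question might look like the sketch below. It assumes the filename
starts at the fifth space-separated field (after date, time, AM/PM and
bytes), and it squeezes runs of spaces first because cut treats every
single space as a delimiter. The function name filenames_with_cut is
made up for illustration.

```shell
# Hypothetical helper: print everything from the fifth space-separated
# field onward. Assumes date, time, AM/PM and byte count come first.
# Caveat: tr -s also collapses runs of spaces inside the filename itself.
# Usage: filenames_with_cut junk.txt   (or pipe the listing into it)
filenames_with_cut() {
    cat "$@" | tr -s ' ' | cut -d ' ' -f 5-
}
```

The squeeze step is the main drawback here: a filename containing two
consecutive spaces would come out with one.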

Thanks,

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance