10-16-04 02:29 AM
robert <root@wheel.invalid> wrote:
> Hello all.
>
> Does there exist a utility similar to split which will split by regex?
>
> I'm trying to massage some data gathered from varied sources in
> preparation for inputting it into a database. Currently the records
> are in a single file.
>
> Picture a large flat-file database that has records divided by a
> certain known token pattern. 99 percent of these records are say,
> five lines long, with the rest of varying length because of some
> accident in creation of the file. I don't know which ones are wrong,
> so I would like to split along the separator tokens and then wc -l on
> the split output files so I can readily see which records are broken.
1. awk -v FS=... -v RS=...
2. csplit, which splits a file at lines matching a regex.
3. Read the entire file into a string, and cut/slice based on regex.
   You can use Python, Perl, or a patched Bash shell
   (freshmeat.net/projects/bashdiff).
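
A rough Python sketch of option 3, in case it helps. It assumes the
separator sits on a line by itself and that its regex contains no
capturing groups (re.split would otherwise return the captured text as
extra items). The "%%" separator, the function names, and the sample
data are made up for illustration; substitute your own token pattern.

```python
import re


def split_records(text, separator_regex):
    """Split the whole text into records wherever separator_regex matches."""
    # re.MULTILINE makes ^ match at the start of every line, so a pattern
    # such as r"^%%\n" consumes an entire separator line.
    pattern = re.compile(separator_regex, re.MULTILINE)
    # Drop empty chunks, e.g. when the file begins or ends with a separator.
    return [chunk for chunk in pattern.split(text) if chunk]


def broken_records(records, expected=5):
    """Return (record_number, line_count) for records of unexpected length."""
    return [
        (number, len(record.splitlines()))
        for number, record in enumerate(records, 1)
        if len(record.splitlines()) != expected
    ]


if __name__ == "__main__":
    # Hypothetical data: records should be two lines long, separated by "%%".
    sample = "a\nb\n%%\nc\nd\ne\n%%\nf\ng\n"
    records = split_records(sample, r"^%%\n")
    print(broken_records(records, expected=2))
```

This reports the odd-length records directly instead of writing each
one to its own file and running wc -l on the pieces.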
--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada