split by regex?

    split by regex?  
robert




 
10-16-04 02:29 AM

Hello all.

Does there exist a utility similar to split which will split by regex?

I'm trying to massage some data gathered from varied sources in preparation
for inputting it into a database.  Currently the records are in a single
file.

Picture a large flat-file database that has records divided by a certain
known token pattern.  99 percent of these records are, say, five lines
long, with the rest of varying length because of some accident in creation
of the file.  I don't know which ones are wrong, so I would like to split
along the separator tokens and then wc -l on the split output files so I can
readily see which records are broken.

Thanks.
-Robert









    Re: split by regex?  
Stephane CHAZELAS




 
10-16-04 02:29 AM

2004-10-13, 03:05(-05), robert:
> Hello all.
>
> Does there exist a utility similar to split which will split by regex?
>
> I'm trying to massage some data gathered from varied sources in preparation
> for inputting it into a database.  Currently the records are in a single
> file.
[...]

With nawk you can use a regexp for FS.

With GNU awk, I think, you can also use a regexp for RS.

Or you can use perl.

--
Stephane








    Re: split by regex?  
Liam Cunningham




 
10-16-04 02:29 AM

On Wed, 13 Oct 2004 03:05:02 -0500, robert wrote:

> Hello all.
>
> Does there exist a utility similar to split which will split by regex?
>
[snip]
>
> Thanks.
> -Robert

Try csplit (i.e., context split). The man page should help with usage.
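
For readers who want to see the line-oriented idea spelled out, here is a minimal Python sketch, not csplit itself: it starts a new record at every line that matches a separator pattern and keeps the separator line as the first line of that record. The function name `split_records`, the `%%` separator, and the sample data are all assumptions made up for illustration.

```python
import re
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str], separator: str) -> Iterator[List[str]]:
    """Yield records; a new record starts at each line matching `separator`."""
    pattern = re.compile(separator)
    record: List[str] = []
    for line in lines:
        # A matching line closes the current record (if any) and opens a new one.
        if pattern.search(line) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


# Hypothetical sample data: each record begins with a "%%" separator line.
sample = ["%%\n", "a\n", "b\n", "%%\n", "c\n", "%%\n", "d\n", "e\n"]
records = list(split_records(sample, r"^%%$"))

# Rough equivalent of running wc -l on each split piece.
line_counts = [len(record) for record in records]
print(line_counts)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so a large file does not have to be loaded into memory at once.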


--

If at first you don't succeed,
read the manual......









    Re: split by regex?  
William Park




 
10-16-04 02:29 AM

robert <root@wheel.invalid> wrote:
> Hello all.
>
> Does there exist a utility similar to split which will split by regex?
>
> I'm trying to massage some data gathered from varied sources in
> preparation for inputting it into a database.  Currently the records
> are in a single file.
>
> Picture a large flat-file database that has records divided by a
> certain known token pattern.  99 percent of these records are say,
> five lines long, with the rest of varying length because of some
> accident in creation of the file.  I don't know which ones are wrong,
> so I would like to split along the separator tokens and then wc -l on
> the split output files so I can readily see which records are broken.

1. awk -v FS=... -v RS=...

2. csplit

3. Read the entire file into a string, and cut/slice based on regex.
You can use Python, Perl, or patched Bash shell
(freshmeat.net/projects/bashdiff).
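
As a rough illustration of option 3 in Python, the sketch below holds the whole text in one string, cuts it wherever a separator regex matches, and then counts the lines in each piece. The separator `--8<--`, the sample text, the expected record length of two lines, and the name `split_by_regex` are hypothetical, chosen only to keep the example small.

```python
import re
from typing import List


def split_by_regex(text: str, separator: str) -> List[str]:
    """Cut `text` at every match of `separator` and drop empty pieces."""
    # MULTILINE lets ^ and $ anchor at line boundaries inside the string.
    # Note: capturing groups in `separator` would be included in the output.
    pattern = re.compile(separator, re.MULTILINE)
    return [chunk for chunk in pattern.split(text) if chunk.strip()]


# Hypothetical sample; a real run might use open("records.txt").read() instead.
text = "a\nb\n--8<--\nc\n--8<--\nd\ne\nf\n"
chunks = split_by_regex(text, r"^--8<--\n")

# Line count per record, then the indexes of records that are not 2 lines long.
counts = [len(chunk.splitlines()) for chunk in chunks]
broken = [index for index, count in enumerate(counts) if count != 2]
print(counts, broken)
```

Unlike a line-by-line approach, the separator here may span several lines, at the cost of reading the entire file into memory.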

--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada








    Re: split by regex?  
robert




 
10-16-04 02:29 AM

begin  Liam Cunningham <liam@consumercontact.com> wrote:
> On Wed, 13 Oct 2004 03:05:02 -0500, robert wrote:
> 
> [snip] 
> Try csplit (ie Context Split). The man pages should help with usage.
>
>


THANK YOU!!!
csplit works perfectly for my needs.









    Re: split by regex?  
robert




 
10-16-04 02:29 AM

begin  William Park <opengeometry@yahoo.ca> wrote:
> robert <root@wheel.invalid> wrote: 
> [...]
>
> 1. awk -v FS=... -v RS=...
>
> 2. csplit
>
> 3. Read the entire file into a string, and cut/slice based on regex.
>   You can use Python, Perl, or patched Bash shell
>   (freshmeat.net/projects/bashdiff).
>


Thanks, William.
My first thought was reading into a C string and chopping it up, but I didn't
want to reinvent the wheel.  csplit is exactly what I was looking for.
I may need more than that in the future though, so thanks for the awk FS/RS
pointer.

BTW, bashdiff sounds awesome.  I'm going to try it out next week when I get
some free time.  I'm currently using a lot of bash scripts that call psql,
so the bashdiff builtin PostgreSQL operations sound particularly exciting
to me.









    Re: split by regex?  
robert




 
10-16-04 02:29 AM

begin  Stephane CHAZELAS <this.address@is.invalid> wrote:
> 2004-10-13, 03:05(-05), robert: 
> [...]
>
> With nawk you can use a regexp for FS.
>
> With GNU awk, I think, you can also use a regexp for RS.
>
> Or you can use perl.
>


Thanks, Stephane.
csplit appears to meet my needs for now.

Coming from a mostly C background, I've never really been able to wrap
my brain around Perl.  Ironically, though, I do most of my trivial regex
work with simple Perl scripts.  I've never used awk, but the
FS/RS stuff looks promising.













 





All times are GMT.