01-17-06 11:05 PM
I have some data in an Excel sheet in the following format:
----------------------------------------------------------------------------
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-5 2004 Univ
Illinois
Argonne Natl Lab CNM CNM-AR-5 2004 Univ
Illinois
Argonne Natl Lab CNM CNM-AR-9 2004 Univ
Chicago
Argonne Natl Lab CNM CNM-AR-9 2004 Univ
Chicago
----------------------------------------------------------------------------
I want the output in the following format:
----------------------------------------------------------------------------
CNM-AR-3 CNM
CNM-AR-3 BioDelivery Sciences Intl Inc
CNM-AR-5 CNM
CNM-AR-5 Univ Illinois
CNM-AR-9 CNM
CNM-AR-9 Univ Chicago
----------------------------------------------------------------------------
Can I use AWK for this, or should I use a database such as MS Access?
01-17-06 11:05 PM
friend.05@gmail.com wrote:
> I have some data in an Excel sheet in the following format:
>
> ----------------------------------------------------------------------------
>
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
>
> ----------------------------------------------------------------------------
>
> I want the output in the following format:
>
> ----------------------------------------------------------------------------
>
> CNM-AR-3 CNM
> CNM-AR-3 BioDelivery Sciences Intl Inc
> CNM-AR-5 CNM
> CNM-AR-5 Univ Illinois
> CNM-AR-9 CNM
> CNM-AR-9 Univ Chicago
>
>
> ----------------------------------------------------------------------------
>
> Can I use AWK for this, or should I use a database such as MS Access?
>
You can use awk, but your line-wrapping makes it unclear whether or not
your data's all on one line. Please post a small example of your problem
that doesn't wrap around lines, and also come up with a better subject
line than "AWK" which you've used repeatedly and so mixes up threads in
newsreaders.
Ed.
01-17-06 11:05 PM
My data is in one line as follows:
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-9 2004 Chicago
Argonne Natl CNM CNM-AR-9 2004 Chicago
01-17-06 11:05 PM
2006-01-17, 10:58(-08), friend.05@gmail.com:
> I have some data in an Excel sheet in the following format:
>
> ----------------------------------------------------------------------------
>
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
>
> ----------------------------------------------------------------------------
>
> I want the output in the following format:
>
> ----------------------------------------------------------------------------
>
> CNM-AR-3 CNM
> CNM-AR-3 BioDelivery Sciences Intl Inc
> CNM-AR-5 CNM
> CNM-AR-5 Univ Illinois
> CNM-AR-9 CNM
> CNM-AR-9 Univ Chicago
>
>
> ----------------------------------------------------------------------------
>
> Can I use AWK for this, or should I use a database such as MS Access?
You can use awk, not AWK (remember case is sensitive on Unix),
but it may not be the best tool as awk has no logic to say
"what's after field 6".
POSIXLY_CORRECT=1 awk '
match(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){6}/) {
print $5, $4
print $5, substr($0, RLENGTH+1)
}' < file.in > file.out
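If interval expressions are a concern, one alternative is to rebuild "everything after field 6" with a plain loop over the fields. The following is only a sketch under some assumptions: each record sits on one unwrapped line in the layout of the first post (ID in field 5, short name in field 4), the function name print_id_pairs is made up for illustration, and the `seen` array is added to print each ID once. Unlike the substr() approach, the loop collapses runs of blanks inside the trailing name to single spaces.

```shell
# Hypothetical helper: prints "ID short-name" and "ID long-name" once per ID.
print_id_pairs() {
    awk '
    NF >= 6 && !seen[$5]++ {           # first record seen for this ID only
        rest = ""
        for (i = 7; i <= NF; i++)      # fields 7..NF form the trailing name
            rest = rest (i > 7 ? " " : "") $i
        print $5, $4
        if (rest != "") print $5, rest # skip when nothing follows field 6
    }' "$@"
}

# Sample records taken from the first post, unwrapped by hand.
print_id_pairs <<'EOF'
Argonne Natl Lab CNM CNM-AR-3 2004 BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004 BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-5 2004 Univ Illinois
EOF
```

Because it uses only field splitting and an associative array, this should behave the same in any POSIX awk without extra options or environment variables.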
--
Stéphane
01-17-06 11:05 PM
friend.05@gmail.com wrote:
> My data is in one line as follows:
>
>
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-9 2004 Chicago
> Argonne Natl CNM CNM-AR-9 2004 Chicago
>
Please read http://cfaj.freeshell.org/google before posting again as
you're falling foul of google..
Now, from your previous post, I think what you want to output from the
above is this:
CNM-AR-3 CNM
CNM-AR-3 BioDelivery
CNM-AR-5 CNM
CNM-AR-5 Illinois
CNM-AR-9 CNM
CNM-AR-9 Chicago
so that might just be (untested):
awk '$4!=prev{print $4,$3; print $4,$NF; prev=$4}' file
depending on what your more general requirements are.
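To try the one-liner without creating a file, here is a small sketch that feeds it the sample records through a here-document. The wrapper name first_per_id is made up for illustration. The approach assumes that records sharing an ID are adjacent, since it only compares each ID with the previous line's.

```shell
# Hypothetical wrapper around the one-liner; reads files given as
# arguments, or standard input when there are none.
first_per_id() {
    awk '$4 != prev { print $4, $3; print $4, $NF; prev = $4 }' "$@"
}

first_per_id <<'EOF'
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-9 2004 Chicago
EOF
```

With this input it should reproduce the six output lines listed above. If the same ID can reappear later in a file that is not sorted, tracking IDs in an array (for example `!seen[$4]++`) would be a safer condition than comparing with `prev`.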
Regards,
Ed.
01-17-06 11:05 PM
On 2006-01-17, friend.05@gmail.com wrote:
> My data is in one line as follows:
What do you want to do with the data?
Please read: <http://cfaj.freeshell.org/google>
And please use a more specific subject than "AWK".
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-9 2004 Chicago
> Argonne Natl CNM CNM-AR-9 2004 Chicago
>
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
01-17-06 11:05 PM
2006-01-17, 19:49(+00), Stephane CHAZELAS:
[...]
> POSIXLY_CORRECT=1 awk '
> match(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){6}/) {
match($0, /...
sorry.
> print $5, $4
> print $5, substr($0, RLENGTH+1)
> }' < file.in > file.out
>
Note that the POSIXLY_CORRECT is in case your awk is gawk which
doesn't handle the {} POSIX ERE intervals without that
environment variable.
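Putting the correction together, the complete script would look roughly like the sketch below. The function name split_after_field6 and the here-document are illustrative assumptions; the sample records are unwrapped by hand from the first post. Two things to keep in mind: the pattern needs at least one blank after field 6, so a record with nothing after the year is skipped, and every matching record produces a pair of lines, so duplicates are not removed.

```shell
# Hypothetical wrapper around the corrected script.
split_after_field6() {
    POSIXLY_CORRECT=1 awk '
    # Match the first six blank-separated fields plus the blanks after them;
    # match() sets RLENGTH to the length of that prefix.
    match($0, /^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){6}/) {
        print $5, $4
        print $5, substr($0, RLENGTH + 1)   # everything after field 6
    }' "$@"
}

split_after_field6 <<'EOF'
Argonne Natl Lab CNM CNM-AR-3 2004 BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-9 2004 Univ Chicago
EOF
```

On an awk that lacks interval support the pattern simply never matches and nothing is printed, so an empty result is worth checking for.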
--
Stéphane
01-17-06 11:05 PM
Stephane CHAZELAS wrote:
> 2006-01-17, 19:49(+00), Stephane CHAZELAS:
> [...]
>
>
>
> match($0, /...
>
> sorry.
>
>
>
>
> Note that the POSIXLY_CORRECT is in case your awk is gawk which
> doesn't handle the {} POSIX ERE intervals without that
> environment variable.
>
If you're using gawk and you just want to use RE intervals, then as a
rule you should use "gawk --re-interval" rather than setting
POSIXLY_CORRECT or "gawk --posix" because if you do the former you
retain the gawk extensions (e.g. gensub()) that aren't supported in
POSIX compliance mode:
$ echo "hello" | gawk '{print gensub(/l{2}/,"LL","")}'
hello
$ echo "hello" | POSIXLY_CORRECT=1 gawk '{print gensub(/l{2}/,"LL","")}'
gawk: warning: regexp constant for parameter #1 yields boolean value
gawk: (FILENAME=- FNR=1) fatal: function `gensub' not defined
$ echo "hello" | gawk --posix '{print gensub(/l{2}/,"LL","")}'
gawk: warning: regexp constant for parameter #1 yields boolean value
gawk: (FILENAME=- FNR=1) fatal: function `gensub' not defined
$ echo "hello" | gawk --re-interval '{print gensub(/l{2}/,"LL","")}'
heLLo
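The same `l{2}` test can be turned into a quick probe for whether a particular awk understands interval expressions at all. This is a sketch using only standard awk syntax; the function name has_re_intervals is made up for illustration. Note that gawk 4.0 and later enable interval expressions by default, so the first command above behaves differently on newer installations.

```shell
# Hypothetical probe: succeeds if the given awk treats {2} as an interval.
has_re_intervals() {
    # $1: awk command to test (defaults to "awk").
    # Without interval support, /l{2}/ is taken literally and does not
    # match "hello", so the awk program prints 0 instead of 1.
    [ "$(echo hello | "${1:-awk}" '{ print ($0 ~ /l{2}/) }')" = 1 ]
}

if has_re_intervals awk; then
    echo "intervals supported"
else
    echo "intervals not supported"
fi
```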
Regards,
Ed
01-18-06 07:50 AM
2006-01-17, 17:47(-06), Ed Morton:
[...]
>
> If you're using gawk and you just want to use RE intervals, then as a
> rule you should use "gawk --re-interval" rather than setting
> POSIXLY_CORRECT or "gawk --posix" because if you do the former you
> retain the gawk extensions (e.g. gensub()) that aren't supported in
> POSIX compliance mode:
[...]
The point was to use awk and be portable. POSIXLY_CORRECT fixes
gawk in that case and is harmless for other awks. That makes my
awk script portable to systems where awk is a GNU awk and to
other systems that have a UNIX/POSIX compliant awk.
Now it's true that gensub could be useful for that particular
problem, so you could write a gawk solution for it. But UNIX
systems generally don't have gawk. It's more likely that they
have perl, so I would prefer giving a PERL solution over a gawk
solution and an awk solution over a PERL one (in c.u.*).
--
Stéphane
01-18-06 10:55 PM
Stephane CHAZELAS wrote:
> 2006-01-17, 17:47(-06), Ed Morton:
> [...]
>
>
> [...]
>
> The point was to use awk and be portable.
That's fine, but not everyone cares about portability to other awks.
Many of us are just fine being dependent on gawk's MANY useful features
over other awks, not least of which is gensub().
> POSIXLY_CORRECT fixes
> gawk in that case and is harmless for other awks. That makes my
> awk script portable to systems where awk is a GNU awk and to
> other systems that have a UNIX/POSIX compliant awk.
Yes, I know, but again that wasn't my point. My point was about the way
to get RE intervals in gawk without sacrificing non-POSIX functionality.
> Now it's true that gensub could be useful for that particular
> problem, so you could write a gawk solution for it. But UNIX
> systems generally don't have gawk. It's more likely that they
> have perl, so I would prefer giving a PERL solution over a gawk
> solution and an awk solution over a PERL one (in c.u.*).
>
If gawk isn't available on your machine you can always install it. I'm
sure PERL is a fine tool. It's not available on many of the UNIX
machines I use daily at work, while gawk is, but that's really beside the
point which, once again, is:
"If you're using gawk and you just want to use RE intervals, then as a
rule you should use "gawk --re-interval""
Ed.