| friend.05@gmail.com 2006-01-17, 6:05 pm |
| I have some data in an Excel sheet in the following form.
-------------------------------------------------------------------------------------------------------------------------------------------
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-3 2004
BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-5 2004 Univ
Illinois
Argonne Natl Lab CNM CNM-AR-5 2004 Univ
Illinois
Argonne Natl Lab CNM CNM-AR-9 2004 Univ
Chicago
Argonne Natl Lab CNM CNM-AR-9 2004 Univ
Chicago
-------------------------------------------------------------------------------------------------------------------------------------------
I want the output in the following form.
-------------------------------------------------------------------------------------------------------------------------------------------
CNM-AR-3 CNM
CNM-AR-3 BioDelivery Sciences Intl Inc
CNM-AR-5 CNM
CNM-AR-5 Univ Illinois
CNM-AR-9 CNM
CNM-AR-9 Univ Chicago
-------------------------------------------------------------------------------------------------------------------------------------------
Can I use AWK for this, or should I use a database such as MS Access?
| |
| Ed Morton 2006-01-17, 6:05 pm |
|
friend.05@gmail.com wrote:
> I have some data in an Excel sheet in the following form.
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
>
>
-------------------------------------------------------------------------------------------------------------------------------------------
>
> I want the output in following manner.
>
>
-------------------------------------------------------------------------------------------------------------------------------------------
>
> CNM-AR-3 CNM
> CNM-AR-3 BioDelivery Sciences Intl Inc
> CNM-AR-5 CNM
> CNM-AR-5 Univ Illinois
> CNM-AR-9 CNM
> CNM-AR-9 Univ Chicago
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Can I use AWK for this, or should I use a database such as MS Access?
>
You can use awk, but your line-wrapping makes it unclear whether or not
your data's all on one line. Please post a small example of your problem
that doesn't wrap around lines, and also come up with a better subject
line than "AWK" which you've used repeatedly and so mixes up threads in
newsreaders.
Ed.
| |
| friend.05@gmail.com 2006-01-17, 6:05 pm |
| My data is in one line as follows:
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-9 2004 Chicago
Argonne Natl CNM CNM-AR-9 2004 Chicago
| |
| Stephane CHAZELAS 2006-01-17, 6:05 pm |
| 2006-01-17, 10:58(-08), friend.05@gmail.com:
> I have some data in an Excel sheet in the following form.
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-3 2004
> BioDelivery Sciences Intl Inc
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-5 2004 Univ
> Illinois
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
> Argonne Natl Lab CNM CNM-AR-9 2004 Univ
> Chicago
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> I want the output in the following form.
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> CNM-AR-3 CNM
> CNM-AR-3 BioDelivery Sciences Intl Inc
> CNM-AR-5 CNM
> CNM-AR-5 Univ Illinois
> CNM-AR-9 CNM
> CNM-AR-9 Univ Chicago
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>
> Can I use AWK for this, or should I use a database such as MS Access?
You can use awk, not AWK (remember case is sensitive on Unix),
but it may not be the best tool as awk has no logic to say
"what's after field 6".
POSIXLY_CORRECT=1 awk '
match(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){6}/) {
print $5, $4
print $5, substr($0, RLENGTH+1)
}' < file.in > file.out
--
Stéphane
| |
| Ed Morton 2006-01-17, 6:05 pm |
|
friend.05@gmail.com wrote:
> My data is in one line as follows:
>
>
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-9 2004 Chicago
> Argonne Natl CNM CNM-AR-9 2004 Chicago
>
Please read http://cfaj.freeshell.org/google before posting again, as
you're falling foul of Google.
Now, from your previous post, I think what you want to output from the
above is this:
CNM-AR-3 CNM
CNM-AR-3 BioDelivery
CNM-AR-5 CNM
CNM-AR-5 Illinois
CNM-AR-9 CNM
CNM-AR-9 Chicago
so that might just be (untested):
awk '$4!=prev{print $4,$3; print $4,$NF; prev=$4}' file
depending on what your more general requirements are.
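A quick way to check the one-liner above is to feed it a few of the sample rows. This is only a sketch: the here-document and the `result` variable are illustrative scaffolding, not part of the original post, and the rows are copied from the single-line data posted earlier in the thread.

```shell
# Run the one-liner over sample rows taken from the thread.
# "result" is a hypothetical variable name used only for this demonstration.
result=$(awk '$4 != prev { print $4, $3; print $4, $NF; prev = $4 }' <<'EOF'
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-9 2004 Chicago
Argonne Natl CNM CNM-AR-9 2004 Chicago
EOF
)
printf '%s\n' "$result"
```

Because the test is `$4 != prev`, only duplicates on adjacent lines are collapsed, and `$NF` keeps just the last word of a multi-word name.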
Regards,
Ed.
| |
| Chris F.A. Johnson 2006-01-17, 6:05 pm |
| On 2006-01-17, friend.05@gmail.com wrote:
> My data is in one line as follows:
What do you want to do with the data?
Please read: <http://cfaj.freeshell.org/google>
And please use a more specific subject than "AWK".
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-3 2004 BioDelivery
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-5 2004 Illinois
> Argonne Natl CNM CNM-AR-9 2004 Chicago
> Argonne Natl CNM CNM-AR-9 2004 Chicago
>
--
Chris F.A. Johnson, author | <http://cfaj.freeshell.org>
Shell Scripting Recipes: | My code in this post, if any,
A Problem-Solution Approach | is released under the
2005, Apress | GNU General Public Licence
| |
| Stephane CHAZELAS 2006-01-17, 6:05 pm |
| 2006-01-17, 19:49(+00), Stephane CHAZELAS:
[...]
> POSIXLY_CORRECT=1 awk '
> match(/^[[:blank:]]*([^[:blank:]]+[[:blank:]]+){6}/) {
match($0, /...
sorry.
> print $5, $4
> print $5, substr($0, RLENGTH+1)
> }' < file.in > file.out
>
Note that the POSIXLY_CORRECT is in case your awk is gawk which
doesn't handle the {} POSIX ERE intervals without that
environment variable.
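If intervals are the only sticking point, the same "everything after field 6" idea can probably be written without them. The following is a sketch under stated assumptions, not the script posted above: it swaps the single match() call for six sub() calls, uses [ \t] in place of [[:blank:]], and assumes every record sits on one line with at least seven fields. The sample rows and the `result` variable are illustrative.

```shell
# Interval-free variant: peel off the first six fields one at a time.
# Assumes one record per line and at least seven fields per record.
result=$(awk '{
    rest = $0
    # Drop any leading blanks so the first field starts the string.
    sub(/^[ \t]+/, "", rest)
    # Remove six "field plus trailing blanks" chunks from the front.
    for (i = 1; i <= 6; i++)
        sub(/^[^ \t]+[ \t]+/, "", rest)
    print $5, $4
    print $5, rest
}' <<'EOF'
Argonne Natl Lab CNM CNM-AR-3 2004 BioDelivery Sciences Intl Inc
Argonne Natl Lab CNM CNM-AR-5 2004 Univ Illinois
EOF
)
printf '%s\n' "$result"
```

Like the script it is modelled on, this prints a pair of lines for every input row, so repeated rows would still need to be collapsed separately.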
--
Stéphane
| |
| Ed Morton 2006-01-17, 6:05 pm |
|
Stephane CHAZELAS wrote:
> 2006-01-17, 19:49(+00), Stephane CHAZELAS:
> [...]
>
>
>
> match($0, /...
>
> sorry.
>
>
>
>
> Note that the POSIXLY_CORRECT is in case your awk is gawk which
> doesn't handle the {} POSIX ERE intervals without that
> environment variable.
>
If you're using gawk and you just want to use RE intervals, then as a
rule you should use "gawk --re-interval" rather than setting
POSIXLY_CORRECT or "gawk --posix" because if you do the former you
retain the gawk extensions (e.g. gensub()) that aren't supported in
POSIX compliance mode:
$ echo "hello" | gawk '{print gensub(/l{2}/,"LL","")}'
hello
$ echo "hello" | POSIXLY_CORRECT=1 gawk '{print gensub(/l{2}/,"LL","")}'
gawk: warning: regexp constant for parameter #1 yields boolean value
gawk: (FILENAME=- FNR=1) fatal: function `gensub' not defined
$ echo "hello" | gawk --posix '{print gensub(/l{2}/,"LL","")}'
gawk: warning: regexp constant for parameter #1 yields boolean value
gawk: (FILENAME=- FNR=1) fatal: function `gensub' not defined
$ echo "hello" | gawk --re-interval '{print gensub(/l{2}/,"LL","")}'
heLLo
Regards,
Ed
| |
| Stephane CHAZELAS 2006-01-18, 2:50 am |
| 2006-01-17, 17:47(-06), Ed Morton:
[...]
>
> If you're using gawk and you just want to use RE intervals, then as a
> rule you should use "gawk --re-interval" rather than setting
> POSIXLY_CORRECT or "gawk --posix" because if you do the former you
> retain the gawk extensions (e.g. gensub()) that aren't supported in
> POSIX compliance mode:
[...]
The point was to use awk and be portable. POSIXLY_CORRECT fixes
gawk in that case and is harmless for other awks. That makes my
awk script portable to systems where awk is a GNU awk and to
other systems that have a UNIX/POSIX compliant awk.
Now it's true that gensub could be useful for that particular
problem, so you could write a gawk solution for it. But UNIX
systems generally don't have gawk. It's more likely that they
have perl, so I would prefer giving a PERL solution over a gawk
solution and an awk solution over a PERL one (in c.u.*).
--
Stéphane
| |
| Ed Morton 2006-01-18, 5:55 pm |
| Stephane CHAZELAS wrote:
> 2006-01-17, 17:47(-06), Ed Morton:
> [...]
>
>
> [...]
>
> The point was to use awk and be portable.
That's fine, but not everyone cares about portability to other awks.
Many of us are just fine being dependent on gawk's MANY useful features
over other awks, not least of which is gensub().
> POSIXLY_CORRECT fixes
> gawk in that case and is harmless for other awks. That makes my
> awk script portable to systems where awk is a GNU awk and to
> other systems that have a UNIX/POSIX compliant awk.
Yes, I know, but again that wasn't my point. My point was about the way
to get RE intervals in gawk without sacrificing non-POSIX functionality.
> Now it's true that gensub could be useful for that particular
> problem, so you could write a gawk solution for it. But UNIX
> systems generally don't have gawk. It's more likely that they
> have perl, so I would prefer giving a PERL solution over a gawk
> solution and an awk solution over a PERL one (in c.u.*).
>
If gawk isn't available on your machine you can always install it. I'm
sure PERL is a fine tool. It's not available on many of the UNIX
machines I use daily at work, while gawk is, but that's really beside the
point which, once again, is:
"If you're using gawk and you just want to use RE intervals, then as a
rule you should use "gawk --re-interval""
Ed.
| |
| Sven Mascheck 2006-01-18, 5:55 pm |
| Ed Morton wrote:
> If gawk isn't available on your machine you can always install it.
- It's often not my machine,
- It might be cumbersome to install something behind firewalls,
- ...
Fine to use gawk. Fine to discuss specific solutions here, too.
But IMHO it's questionable to generally advocate for _GNU-only_
in this group - your statement is close to this. (intentionally?)
One of the finest purposes of these groups (here and *.shell) for me
is learning what implementations of shell utilities _actually_ exist.
In Usenet i can learn what's "common unix" and what's proprietary.
Where else, if not here?
That's why posting portable solutions and discussing the reasons
makes sense even if one is not emphasizing portable programming all day.
Otherwise this group would be nothing but a support-forum for homework.
(Manuals, References and Guides are better to be found elsewhere.)
| |
| Ed Morton 2006-01-18, 5:55 pm |
|
Sven Mascheck wrote:
> Ed Morton wrote:
>
>
>
>
> - It's often not my machine,
> - It might be cumbersome to install something behind firewalls,
> - ...
>
> Fine to use gawk. Fine to discuss specific solution here, too.
> But IMHO it's questionable to generally advocate for _GNU-only_
> in this group - your statement is close to this. (intentionally?)
This is getting absolutely ridiculous. Once again, for the last time, my
point (stated repeatedly in the part you snipped) was:
"If you're using gawk and you just want to use RE intervals, then as a
rule you should use "gawk --re-interval""
That's all.
Ed.
| |
| Sven Mascheck 2006-01-18, 5:55 pm |
| Ed Morton wrote:
> This is getting absolutely ridiculous. Once again, for the last time,
> my point (stated repeatedly in the part you snipped) was: [snip]
I snipped the rest, because i was not referring to it!
I am not objecting to anything else but the sentence above - and you
_did_ want to tell us something with that sentence, didn't you?
(And i did point out, _why_ i bothered to join here.)
| |
| Ed Morton 2006-01-18, 5:55 pm |
|
Sven Mascheck wrote:
> Ed Morton wrote:
>
>
>
>
>
>
>
>
> I snipped the rest, because i was not referring to it!
>
> I am not objecting to anything else but the sentence above - and you
> _did_ want to tell us something with that sentence, didn't you?
Yes, but what that was was related to the context you snipped. By just
taking that one line above out of context and then saying "But IMHO it's
questionable to generally advocate for _GNU-only_
in this group - your statement is close to this. (intentionally?)" you
created a totally false impression of the real point I was trying to
make in my post.
> (And i did point out, _why_ i bothered to join here.)
And you've now again snipped the context for why I responded as I did to
you, but I put it back in my above response.
Ed.
| |
| Sven Mascheck 2006-01-18, 5:55 pm |
| Ed Morton wrote:
> Yes, but what that was was related to the context you snipped.
Sorry, then i just don't get it; never mind 
| |
|
| try this ..
sort -u inputfile | awk '{ print $4 " " $3 "\n" $4 " " $6 }'
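To see what the sort -u variant buys, it may help to run it on rows where the duplicates are not adjacent. This is a sketch only: the interleaved ordering below is a made-up rearrangement of the thread's sample rows, and the here-document and `result` variable are illustrative.

```shell
# sort -u removes repeated rows wherever they occur, then awk prints
# the two output lines for each remaining row.
# The row order here is a hypothetical shuffle of the thread's sample data.
result=$(sort -u <<'EOF' | awk '{ print $4 " " $3 "\n" $4 " " $6 }'
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
Argonne Natl CNM CNM-AR-5 2004 Illinois
Argonne Natl CNM CNM-AR-9 2004 Chicago
Argonne Natl CNM CNM-AR-3 2004 BioDelivery
EOF
)
printf '%s\n' "$result"
```

The trade-off is that output follows sort order rather than input order, and `$6` still captures only a single word of the trailing name.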
| |
| friend.05@gmail.com 2006-01-24, 6:23 pm |
| can use awk with excel sheet
| |
| Ed Morton 2006-01-24, 6:23 pm |
|
friend.05@gmail.com wrote:
> can use awk with excel sheet
>
If that's supposed to be a question, you'll have to do a lot better than
that to get an answer. If it's a statement, you'll have to do a lot
better than that to convey its meaning.
Ed.
| |
| simonp@nospam.com 2006-02-22, 2:48 am |
| friend.05@gmail.com <hirenshah.05@gmail.com> wrote:
> can use awk with excel sheet
>
Google has a lot to answer for.
Simon
--
www.simonpole.ca
Two-fisted reflections from around the world and outer space.