
[ale] sed regexp question

Subject: [ale] sed regexp question
From: esoteric at denali.atlnet.com (Wandered Inn)
Date: Tue, 10 Jul 2001 19:18:43 -0400

"Joseph A. Knapka" wrote:
> 
> Christopher Bergeron wrote:
> >
> > That would only get websites that start with www;  I can't predict all the
> > possible names that might arise.  i do know that the url is always encoded
> > in a page as:
> >
> > <A HREF="http://xxx.pornsite.com/pictures1.html/">
> >
> > so, all I need to do is take everything between the "http:// and the ">
> >
> > any suggestions?
> 
> Here's a briefish Tcl script that will do it:

You've heard people scream FOOD FIGHT, well, LANGUAGE WAR!!

Here's a Perl near-one-liner; it worked on my data file, at least.  Put
it into a script and run it with the file(s) as command-line arguments.
It matches any case combination of the string 'href=':

#!/usr/bin/perl

while (<>) {

        chomp;
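        # Print the text between the first '="' and the first '">' on the
        # line; assumes both belong to the HREF tag.
        /[Hh][Rr][Ee][Ff]=/ && printf "%s\n",
                substr($_, index($_, "=") + 2,
                       index($_, ">") - index($_, "=") - 3);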
}
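
The index/substr arithmetic above relies on the first "=" and the first
">" on the line both belonging to the HREF tag.  As a rough sketch of an
alternative (not tested against the original data file), a capturing
regexp can pull out every double-quoted HREF value on a line.  The
helper name extract_urls is made up for illustration, and the sketch
assumes the URL is always wrapped in double quotes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every double-quoted HREF value in one line.
sub extract_urls {
    my ($line) = @_;
    # /i matches "href" in any case; /g in list context returns all captures.
    return $line =~ /href\s*=\s*"([^"]*)"/gi;
}

# Process the files named on the command line, one URL per output line.
# (Does nothing when no file names are given.)
if (@ARGV) {
    while (my $line = <>) {
        print "$_\n" for extract_urls($line);
    }
}
```

To drop the leading http:// from the output, as the original question
asked, the pattern could be changed to /href\s*=\s*"http:\/\/([^"]*)"/gi.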

--
Until later: Geoffrey		esoteric at denali.atlnet.com

"Great spirits have always found violent opposition from mediocre minds.
The latter cannot understand it when a man does not thoughtlessly submit
to hereditary prejudices but honestly and courageously uses his
intelligence." - Albert Einstein