Wednesday, June 13, 2012

Extracting links from web pages using Fuzuli

We have written a lot about Fuzuli and its core components, but we have not yet published any real-world application that runs on it.

Fuzuli, our new general-purpose programming language and interpreter, was first introduced on the Practical Code Solutions blog and has an official web page.

Here is an example that extracts links from a page fetched over an HTTP connection. The program asks for a domain name; pressing the Enter key without typing anything selects the default one. The program then sends an HTTP GET request to the server and reads the response. Once all of the content has been collected, it parses the HTML and prints every element that begins with an A tag. The Fuzuli code is shown below:

# Loading required packages
(require "/usr/lib/fuzuli/nfl/")
(require "/usr/lib/fuzuli/nfl/")
(require "/usr/lib/fuzuli/nfl/")

# Getting a domain name from user.
(puts "Please give a domain (for default just type enter):")
(let word (readline))

# If user did not type anything
# set the default page to
(if (< (strlen word) 3)
   (let word "")
)
(print "Doing " word "\n")

# Open a socket connection to host
(print "Connecting " word "\n")
(let socket (fsockopen word 80))

# Sending HTTP Request
(print "Sending request\n")
(fsockwrite socket (strcat (list "GET /\n\n")))

# Reading html content
(print "Retrieving result\n")
(def htmllist LIST)
(while 1
   (let c (fsockread socket 1))
   (if (= (typeof c) NULL) (break))
   (append htmllist c)
)

# Closing socket
(fsockclose socket)

(print "Constructing string\n")

(let html (strcat htmllist))
(let len (strlen html))
(print len " bytes read.\n")

(def part STRING)
(def i INTEGER)

# Parsing loaded content
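(for (let i 0) (< i len) (inc i)
   (let part (substr html i (+ i 7)))
   (if (= part "<a href")
      # block groups the statements below into a single branch
      (block
         (print "link found: \n")
         # Print characters one by one until the closing tag is reached
         (while (!= part "</a>")
            (let part (substr html i (+ i 4)))
            (print (substr html i (+ i 1)))
            (inc i)
         )
         (print part "\n")
      )
   )
)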

The Fuzuli example given above combines variable definitions and scopes, loops, sockets and basic I/O. For more detailed information about the keywords, commands and functions, please see Fuzuli's documentation site.
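
For readers who want to check the fetch-and-scan logic outside Fuzuli, here is a rough Python counterpart of the steps described above. It is only a sketch, not part of Fuzuli: the function names fetch_html and extract_anchor_elements are made up for illustration, the request is sent as HTTP/1.0 with a Host header (an assumption, where the Fuzuli listing sends a bare "GET /"), and the scan uses str.find instead of advancing one character at a time.

```python
import socket


def fetch_html(host, port=80):
    """Send a minimal GET request and read until the server closes the socket.

    Hypothetical helper: the returned text also contains the HTTP status
    line and headers, because nothing here strips them.
    """
    with socket.create_connection((host, port), timeout=10) as conn:
        request = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_anchor_elements(html):
    """Return each substring running from '<a href' through the next '</a>'."""
    links = []
    pos = 0
    while True:
        start = html.find("<a href", pos)
        if start == -1:
            break
        end = html.find("</a>", start)
        if end == -1:
            # Unterminated anchor: keep the rest of the text and stop.
            links.append(html[start:])
            break
        end += len("</a>")
        links.append(html[start:end])
        pos = end
    return links


# Demonstration on a fixed string, so no network access is needed.
sample = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>'
for link in extract_anchor_elements(sample):
    print("link found:", link)
```

To run it against a live server, pass the result of fetch_html to extract_anchor_elements. Like the Fuzuli version, this is plain substring matching rather than real HTML parsing, so anchors written as "<A HREF" or with other attributes before href are missed.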
