The next step is to retrieve the HTML pages created in the previous step. Here I use the Ruby library 'open-uri' to retrieve each web page and another library, 'hpricot', to edit the pages and translate HTML markup into ConTeXt markup.
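To make the idea of translating HTML markup into ConTeXt markup concrete, here is a minimal sketch that uses only plain string substitution from the Ruby standard library. It is not the hpricot-based filter used in this tutorial: the function name `html_to_context` and the particular tag-to-command mapping are assumptions chosen for illustration, and real pages need a proper parser to cope with nesting and attributes.

```ruby
# Hypothetical helper for illustration only; it is NOT the scrape_page
# filter used by the script below. It maps a few simple, attribute-free
# HTML tags to ConTeXt commands with regular expressions.
def html_to_context(html)
  # <h1>...</h1> becomes a ConTeXt chapter title
  text = html.gsub(%r{<h1>(.*?)</h1>}m) { "\\chapter{#{$1}}\n" }
  # <b>...</b> becomes a bold group
  text = text.gsub(%r{<b>(.*?)</b>}m) { "{\\bf #{$1}}" }
  # <i>...</i> becomes an emphasized group
  text = text.gsub(%r{<i>(.*?)</i>}m) { "{\\em #{$1}}" }
  # <p>...</p> becomes a paragraph ended by a blank line
  text.gsub(%r{<p>(.*?)</p>}m) { "#{$1}\n\n" }
end

puts html_to_context("<h1>Report</h1><p>A <b>bold</b> claim.</p>")
```

The block form of `gsub` is used so that the backslashes in the ConTeXt commands are not interpreted as replacement escapes.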
#!/usr/bin/ruby
# scan_page.rb: retrieves the HTML page of interest from the server,
# navigates to the links within the main page, and constructs a
# ConTeXt document
require 'rubygems'
require 'open-uri' # the open-uri library
require 'hpricot' # the hpricot library
require 'scrape_page' # user-defined file (scrape_page.rb) that filters HTML into ConTeXt
# scans the home page and lists
# all the directories and subdirectories
# note: on Ruby 3.0 and later, open-uri requires URI.open instead of open
doc = Hpricot(open("http://ipa.dd.re.ss/AnnRep07"))