In this example, we created new pages for chapters and sections so that each part of the document could
be authored by a different person. In Informl, new pages are indicated by the CSS class name "existingWikiWord",
as shown in the following markup.
<p>
<a class="existingWikiWord"
href="http://localhost:3010/AnnRep07/pages/APCC+Research+and+Development+Projects">
APCC Research and Development Projects
</a>
</p>
Knowing this, I used the following Hpricot code to follow the chapter and section links and retrieve their contents.
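The listing assumes that the top-level wiki page has already been parsed into doc and that the master ConTeXt file is open as fil (the listing closes it with \stoptext at the end). As a minimal sketch of that setup, where the URL and the file name AnnRep07.tex are assumptions based on the example link above:

require 'rubygems'
require 'hpricot'
require 'open-uri'

# Assumed top-level page of the report (taken from the example above)
doc = Hpricot(open("http://localhost:3010/AnnRep07"))

# Assumed master ConTeXt file; \starttext pairs with the
# \stoptext written at the end of the listing below.
fil = File.new("AnnRep07.tex", "w")
fil.write "\\starttext \n"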
chapters = (doc/"p/a.existingWikiWord")
# We need to navigate one more level into the web page,
# so let us follow the links we just found.
chapters.each do |ch|
  chap_link = ch.attributes['href']
  # Using inner_html we can name subdirectories after the
  # chapter titles (with the whitespace stripped out).
  chap_name = ch.inner_html.gsub(/\s+/, "")
  chap_name_org = ch.inner_html
  # We create chapter directories
  system("mkdir -p #{chap_name}")
  fil.write "\\input #{chap_name} \n"
  # Mode "w" truncates any previous version of the file,
  # so a separate shell call to rm is not needed.
  chapFil = "#{chap_name}.tex"
  cFil = File.new(chapFil, "w")
  cFil.write "\\chapter{#{chap_name_org}} \n"
  # We navigate to the sections now
  doc2 = Hpricot(open(chap_link))
  sections = (doc2/"p/a.existingWikiWord")
  sections.each do |sc|
    sec_link = sc.attributes['href']
    sec_name = sc.inner_html.gsub(/\s+/, "")
    secFil = "#{chap_name}/#{sec_name}.tex"
    sFil = File.new(secFil, "w")
    sechFil = "#{chap_name}/#{sec_name}.html"
    shFil = File.new(sechFil, "w")
After navigating to the sections (the h1 elements in the HTML), we retrieve their contents and pass them to the Ruby function scrape_the_page, defined in "scrape_page.rb", for filtering; a possible sketch of that function is shown after the listing.
    # scrape_the_page(sec_link, "#{chap_name}/#{sec_name}")
    scrape_the_page(sec_link, sFil, shFil)
    cFil.write "\\input #{chap_name}/#{sec_name} \n"
    sFil.close
    shFil.close
  end
  cFil.close
end
fil.write "\\stoptext \n"
fil.close
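The function scrape_the_page itself lives in "scrape_page.rb" and is not shown on this page. As a rough illustration only, here is a minimal sketch of what such a function might look like, assuming it stores the raw HTML for reference and writes a crude ConTeXt rendering of the page text; the real filtering in scrape_page.rb is presumably more elaborate:

require 'rubygems'
require 'hpricot'
require 'open-uri'

# Hypothetical sketch of the function defined in scrape_page.rb
def scrape_the_page(link, texFil, htmlFil)
  page = Hpricot(open(link))
  # Keep the raw HTML alongside the generated ConTeXt source
  htmlFil.write page.to_html
  # The h1 element carries the section title
  heading = page.at("h1")
  texFil.write "\\section{#{heading.inner_html}} \n" if heading
  # Naive filtering: emit the plain text of every paragraph
  (page/"p").each do |par|
    texFil.write par.inner_text + "\n\n"
  end
end

Running the whole script then yields one directory per chapter, a .tex/.html pair per section, and a master file whose \input lines pull everything together.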
