I'm Michael Suodenjoki - a software engineer living in Kgs. Lyngby, north of Copenhagen, Denmark. This is my personal site containing my blog, photos, articles and main interests.
Updated 2011.01.23 17:22 +0100
Version 1.2 August 2002.
This article describes how you can improve the quality of your HTML/XHTML pages by integrating offline HTML/XHTML validation into Microsoft FrontPage using James Clark's SGML Conforming Parser (SP).
Related Article: Integrating HTML Tidy into Microsoft FrontPage.
The code in this article requires:
Download the validation.zip source file for this article.
The validation.zip file contains:
1 Introduction
1.1 Introduction to Validation
1.2 When to Validate
2 The Validator
3 Integration with FrontPage
3.1 Customizing the VBA code
3.2 Customizing the FrontPage menu
4 Conclusion
Appendix A: Integration with your ASP based web server.
How often have you written web documents in editors or word processors that simply couldn't produce the underlying markup correctly? You may not be aware of it, but most of today's HTML editors are not very good at producing valid HTML. As an author of web documents you have an interest in writing your pages so that they can be read in any of the available browsers.
Most of the pages on my personal homepage are written as XHTML documents; XHTML is the emerging standard for web documents (see www.w3.org). It is an XML-based version of the HTML standard, with some important differences (among others):
If you want to be sure that your XHTML documents can be viewed in browsers that only support HTML, you may follow the guidelines in the XHTML specification, Appendix C: HTML Compatibility Guidelines.
For me it is most important that the code is "pretty", commented and valid with respect to the relevant standards. This should be true whether you have written the code by hand in a plain text editor (like Notepad) or generated it with a WYSIWYG editor (like FrontPage).
This article describes how you can improve web documents written with the Microsoft FrontPage editor. I will mainly focus on the XHTML part. Microsoft FrontPage is just one of many editors in which you can create web documents or manage and edit entire webs (collections of web documents). FrontPage is a fairly decent editor that produces good-quality XHTML code; however, it is not perfect.
Whenever you write a web page you do so using a language such as HTML or XHTML. There are in fact several "languages" available that are more or less strict about how you write your code. A web page is said to be valid when it conforms to the syntax (and semantics) of the language specified in its DOCTYPE declaration.
The DOCTYPE declaration, typically the first line of your web page, defines which "language" the page is written in (or is supposed to be written in). There are a few different languages that you may use. The differences between them lie in their maturity (the version) and in the strictness of the syntax. Three variants are typically available for each language. For HTML 4.01, for example, these are:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
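To make the role of the DOCTYPE a little more concrete, here is a minimal Python sketch (my own illustration, not part of validation.zip) that reads the public identifier from a page and reports the language and variant. The function name `describe_doctype` is made up for this example, and it assumes identifiers shaped like the HTML 4.01 and XHTML 1.0 ones, where a missing variant word means Strict.

```python
import re

# Matches the public identifier inside a DOCTYPE declaration, e.g.
# "-//W3C//DTD HTML 4.01 Transitional//EN".
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def describe_doctype(page_source):
    """Return (language, variant) for the page's DOCTYPE, or None if absent.

    Hypothetical helper: it only understands identifiers shaped like the
    HTML 4.01 and XHTML 1.0 ones shown in the article.
    """
    match = DOCTYPE_RE.search(page_source)
    if match is None:
        return None
    public_id = match.group(1)
    # The language name sits between "//DTD " and the following "//".
    name = public_id.split("//DTD ", 1)[-1].split("//", 1)[0]
    for variant in ("Transitional", "Frameset", "Strict"):
        if name.endswith(variant):
            return name[: -len(variant)].strip(), variant
    # Assumption: HTML 4.01 Strict carries no variant word in its identifier.
    return name, "Strict"
```

Calling it on the first DOCTYPE above should give the pair `("HTML 4.01", "Transitional")`; a page without a DOCTYPE gives `None`.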
The table below summarizes some of the languages. Most web pages today are written in HTML 4.0 or XHTML 1.0.
Language | Description |
---|---|
HTML 3.2 | The first widespread HTML version, from 1997. Prior versions were not as standardized. |
HTML 4.0 | Intermediate version. |
HTML 4.01 | Latest/current HTML version. |
XHTML 1.0 | A rewritten HTML based on XML. Currently (June 2002) the latest version. |
While most WYSIWYG web page editors today do not validate the web page, this will definitely change in the future. Standardization is the right way forward and validation is a key step toward it. Until the editors catch up with the standards we, as web page authors, must validate our pages ourselves, either online through a web service or offline using external tools.
One such (offline) validator is James Clark's SGML Parser (SP for short) which will be used as validator in this article.
Validation should occur as close to the actual editing of the web page as possible. We can prioritize when to validate:
As of today most web pages are either not validated at all or validated by the author only after they have been uploaded. This corresponds to prioritization level 4 above (depending on the quality of your web editor). After reading this article and implementing the code you can raise that to level 2. While this is not perfect, it is at least a step in the right direction.
As mentioned previously, the validator that we will use is James Clark's SGML Conforming Parser (SP). But what is SGML? SGML is an acronym for Standard Generalized Markup Language and is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language. In fact most of the underlying languages used on the Internet are based on SGML.
If you want to read more about SGML vs. HTML you can read the first section of the SGML tutorial at http://www.w3.org/TR/html4/intro/sgmltut.html.
We use the SP validator's executable, named nsgmls.exe, to do the validation for us. It may be downloaded at http://www.jclark.com/sp/howtoget.htm. If you want to validate XHTML documents you must also retrieve the SGML definitions for XHTML 1.0 from W3 at http://www.w3.org/TR/xhtml1/xhtml1.zip (you may unzip them to SP's pubtext directory).
For our needs we must supply nsgmls.exe with a few arguments:
nsgmls.exe takes many more arguments, which will not be described here. Full documentation of the arguments can be found at http://www.jclark.com/sp/nsgmls.htm
Note that nsgmls.exe will try connecting to the Internet to retrieve the DTD from W3 if you do not have it locally - usually within the pubtext sub folder.
Example of a command prompt call validating an XHTML 1.0 file named myfile.xhtml (the path is quoted because it contains a space):
"c:\program files\validator\nsgmls" -s -c pubtext\xhtml.soc -f outputfile.log myfile.xhtml
You may omit the -f option if you want the output in the console.
Likewise you may validate one of your HTML documents; assuming here that you're using HTML 4.01:
"c:\program files\validator\nsgmls" -s -c pubtext\html4.soc myfile.html
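If you would rather drive the validator from a script than type the command, the calls above can be sketched in Python as follows. This is only an illustration under my own assumptions: the helper names `build_nsgmls_command` and `run_validation` are invented for this example, and only the -s, -c and -f options used above are passed.

```python
import subprocess


def build_nsgmls_command(executable, catalog, document, log_file=None):
    """Build the argument list for an nsgmls validation run.

    -s suppresses the normal parser output so only messages remain,
    -c names the catalog (.soc) file and -f redirects messages to a file.
    """
    command = [executable, "-s", "-c", catalog]
    if log_file is not None:
        command += ["-f", log_file]
    command.append(document)
    return command


def run_validation(command):
    """Run the command and return its exit code (needs nsgmls installed)."""
    # Passing a list avoids quoting problems with paths that contain spaces.
    return subprocess.run(command).returncode
```

Leaving out `log_file` corresponds to omitting the -f option, so the messages end up in the console.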
At http://ktmatu.com/info/do-it-yourself-offline-html-validator/ there is an excellent description of how you can easily incorporate validation into Windows Explorer.
SP's error messages are quite cryptic at first, but after a while they are not that hard to understand. They usually contain enough information to find the troublesome lines. Numbers like "3:12" indicate that something is wrong in line 3, column 12. We can always build a basic parser to extract the line, column and error message. I guess this is exactly what they have done at W3's validation service.
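Such a basic parser could look roughly like the Python sketch below. It assumes message lines laid out the way the ASP code in Appendix A expects them (program path, file path, line, column, a one-letter type and the message text, separated by colons); the sample line in the usage note is made up for illustration and not copied from a real run.

```python
import re

# Looks for the first ":line:column:TYPE:" group; everything before it is the
# program and file path (which may contain drive-letter colons on Windows).
MESSAGE_RE = re.compile(
    r"^(?P<source>.*?):(?P<line>\d+):(?P<column>\d+):(?P<kind>[A-Z]):\s*(?P<message>.*)$"
)


def parse_error_line(text):
    """Return a dict with line, column, kind and message, or None if no match."""
    match = MESSAGE_RE.match(text.strip())
    if match is None:
        return None
    return {
        "line": int(match.group("line")),
        "column": int(match.group("column")),
        "kind": match.group("kind"),        # "E" marks an error
        "message": match.group("message"),
    }
```

For a hypothetical line such as `C:\Validator\nsgmls.exe:C:\web\myfile.xhtml:3:12:E: element "FOO" undefined` this yields line 3, column 12, kind "E" and the message text. Lines that do not follow the layout return `None`, so the caller can show them unparsed.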
Now let's integrate it with FrontPage...
Microsoft FrontPage 2000 and 2002 both support Visual Basic for Applications (VBA), which we can use to integrate the validation into the menu system, so that a single menu command validates our web document.
I have made a single Visual Basic file available (Validate.bas) that essentially wraps the call to the validator within VBA. You may follow the guide in section 3.2 that describes how to incorporate it into FrontPage.
I will not comment on the code in detail; I will just mention that the basic principle is that the current active FrontPage document is saved to a temporary file, nsgmls.exe is called on that file, and the results from nsgmls are presented in a dialog.
    '
    ' Validate.bas - Integrating James Clark's SP in Microsoft FrontPage 2000/2002
    '
    Option Explicit

    ' Specifies path to where you have installed nsgmls
    Const NSGML_PATH = "C:\Program Files\Validator\" ' Remember trailing backslash

    ' Specifies path to the SP nsgmls executable...
    Const NSGML_PROGRAM_FILE = NSGML_PATH & "nsgmls.exe"

    ' Specifies paths to the temporary files...
    Const TEMP_INPUT_FILE = NSGML_PATH & "input.tmp"
    Const TEMP_OUTPUT_FILE = NSGML_PATH & "output.tmp"

    ' Specifies the catalog files given to SP (nsgmls.exe)
    Const XHTML1_SOC_FILE = NSGML_PATH & "Pubtext\XHTML.soc"
    Const HTML4_SOC_FILE = NSGML_PATH & "Pubtext\HTML4.soc"

    '************************************
    ' VALIDATE_FILE
    '
    Sub Validate_File()
        Dim bFlipToHTMLSource As Boolean
        bFlipToHTMLSource = False

        ' Route run-time errors to the handler at the end of the routine
        On Error GoTo ValidationError

        If ActivePageWindow Is Nothing Then
            MsgBox "Please open a file in the FrontPage Editor.", _
                   vbOKOnly Or vbCritical
            Exit Sub
        End If

        If Not ActivePageWindow.ViewMode = fpPageViewNormal Then
            bFlipToHTMLSource = True
            ActivePageWindow.ViewMode = fpPageViewNormal
        End If

        Dim doc As FPHTMLDocument
        Set doc = ActivePageWindow.Document

        Dim fs
        Set fs = CreateObject("Scripting.FileSystemObject")

        Dim ts
        Set ts = fs.CreateTextFile(TEMP_INPUT_FILE)

        ' Write the current FrontPage document into a temporary file
        ts.Write doc.DocumentHTML
        ts.Close

        Dim sSocFile ' As String
        Dim sLine    ' As String
        Dim nLine    ' As Integer

        ' Assume that we're going to validate XHTML 1.0
        sSocFile = XHTML1_SOC_FILE

        ' The following code tries (primitively) to find
        ' which DTD is specified in the input file, so
        ' that we can choose which SOC file to give nsgmls.exe
        Set ts = fs.OpenTextFile(TEMP_INPUT_FILE)
        nLine = 1
        While nLine <= 4 And Not ts.AtEndOfStream
            ' We're only looking in the first 4 lines
            sLine = ts.ReadLine()
            If InStr(sLine, "DTD HTML 4") > 0 Then
                sSocFile = HTML4_SOC_FILE
            End If
            nLine = nLine + 1
        Wend
        ts.Close

        Dim strCmd As String

        ' Build command line (every path is quoted because it may contain spaces)
        strCmd = """" & NSGML_PROGRAM_FILE & """" & " -s" & _
                 " -c """ & sSocFile & """" & _
                 " -f """ & TEMP_OUTPUT_FILE & """" & _
                 " """ & TEMP_INPUT_FILE & """"

        'MsgBox strCmd

        ' Execute the command line...
        ' ExecCmd is not built into VBA. For more information see
        ' <http://support.microsoft.com/support/kb/articles/Q129/7/96.asp>
        ExecCmd strCmd

        If bFlipToHTMLSource Then
            ActivePageWindow.ViewMode = fpPageViewHtml
        End If

        Dim es

        ' Read the TEMP_OUTPUT_FILE and copy the content into the output form
        Set es = fs.OpenTextFile(TEMP_OUTPUT_FILE, 1) ' 1=ForReading

        If es.AtEndOfStream Then
            Dim sOutput
            sOutput = "Document successfully validated"
            If sSocFile = XHTML1_SOC_FILE Then
                sOutput = sOutput & " as XHTML 1.0"
            Else
                sOutput = sOutput & " as HTML 4.0"
            End If
            sOutput = sOutput & ". No errors reported."
            Form_tidy_output.TextBox_tidy_output.Text = sOutput
        Else
            Form_tidy_output.TextBox_tidy_output.Text = es.ReadAll
        End If
        es.Close

        Form_tidy_output.Caption = "Validation Result"
        Form_tidy_output.Show

        Exit Sub

    ValidationError:
        MsgBox "Validation could not execute correctly. " & _
               "No changes have been carried out." & Chr(10) & _
               "Error # " & CStr(Err.Number) & " " & Err.Description, _
               vbOKOnly Or vbCritical
    End Sub
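For readers who want to try the same principle outside FrontPage, the flow of the macro (write the page to a temporary file, let a validator write its messages to an output file, read the messages back) can be sketched in Python. This is a rough illustration and not a port of Validate.bas: `validate_text`, `summarize` and the `run_validator` callback are names I made up, and the callback stands in for the call to nsgmls.exe.

```python
import os
import tempfile


def validate_text(page_source, run_validator):
    """Save page_source to a temp file, run the validator, return its messages.

    run_validator(input_path, output_path) is a hypothetical callback that is
    expected to write any messages to output_path (as nsgmls does with -f).
    """
    with tempfile.TemporaryDirectory() as work_dir:
        input_path = os.path.join(work_dir, "input.tmp")
        output_path = os.path.join(work_dir, "output.tmp")
        with open(input_path, "w", encoding="utf-8") as handle:
            handle.write(page_source)
        run_validator(input_path, output_path)
        if not os.path.exists(output_path):
            return ""  # nothing written: treat as "no messages"
        with open(output_path, encoding="utf-8", errors="replace") as handle:
            return handle.read()


def summarize(messages):
    """Mirror the macro's dialog text: empty output means success."""
    if messages:
        return messages
    return "Document successfully validated. No errors reported."
```

Using a fresh temporary directory for every run also avoids reading a stale output file left over from an earlier validation.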
The VBA code cannot be used as is; it must be customized to the location where you have installed SP. Six string constants are defined. If you have installed everything in a single folder with the 'pubtext' subfolder beneath it, it should be enough to adjust the NSGML_PATH constant.
    ' Specifies path to where you have installed NSGML
    Const NSGML_PATH = "C:\Program Files\Validator\" ' Remember trailing backslash

    ' Specifies Path to the SP NSGML executable...
    Const NSGML_PROGRAM_FILE = NSGML_PATH & "nsgmls.exe"

    ' Specifies path to a temporary file...
    Const TEMP_INPUT_FILE = NSGML_PATH & "input.tmp"
    Const TEMP_OUTPUT_FILE = NSGML_PATH & "output.tmp"

    ' Specifies the input files to SP (nsgmls.exe)
    Const XHTML1_SOC_FILE = NSGML_PATH & "Pubtext\XHTML.soc"
    Const HTML4_SOC_FILE = NSGML_PATH & "Pubtext\HTML4.soc"
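The idea of deriving every path from one installation folder carries over to other languages. Here is a small Python sketch of the same layout; the function name `validator_paths` and the dictionary keys are my own invention, while the file names are the ones used in the constants above.

```python
from pathlib import PureWindowsPath


def validator_paths(install_dir):
    """Derive all validator paths from the single installation folder."""
    base = PureWindowsPath(install_dir)
    pubtext = base / "Pubtext"
    return {
        "program": base / "nsgmls.exe",
        "temp_input": base / "input.tmp",
        "temp_output": base / "output.tmp",
        "xhtml1_soc": pubtext / "XHTML.soc",
        "html4_soc": pubtext / "HTML4.soc",
    }
```

With a path object the trailing backslash no longer matters, so `"C:\Program Files\Validator"` and `"C:\Program Files\Validator\"` lead to the same result.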
You may add an extra error level check after the call to ExecCmd. ExecCmd() returns the error level (exit code) of the executed program. For Tidy, "0" means "OK", "1" means "There are warnings" and "2" means "There are errors"; when errors occur, Tidy can't continue. These values are Tidy's, so check which exit codes nsgmls.exe returns before relying on them. One could simply add something like this:
    If ExecCmd(strCmd) > 1 Then
        ...
        Exit Sub
    End If
This section shows you how to add an extra menu item to the FrontPage menu that calls our VBA routine.
How to guide:
I have shown you how you can integrate the SP validator into FrontPage and thereby improve the overall quality of your web documents in an easy manner.
There are of course things that could be improved. Among other things it would be nice to:
I would like to thank James Clark (jjc@jclark.com) for making the SGML Conforming Parser (SP) freely available to everybody.
Nice authoring.
This appendix describes some help topics that may be of use if you want to put the validation into an ASP-based web server - like Microsoft's Internet Information Server (IIS). In this way you can provide validation on pages in your local Intranet.
First install the SP validator in a new folder beneath the main folder of your intranet web. You could, for example, name it 'executables'. See the figure below for the basic folder layout.
Ensure that the new folder is set up as an executable folder.
Remember to set up folder/file permissions for the IWAM_<server name> "user", so that "he" can access the files to validate. Furthermore, the IWAM_<server name> user must also have execute permissions on the folder where the nsgmls.exe file is located.
I'm not a web administrator myself, so be careful when tampering with the web's security setup.
Create a validate.asp ASP server file that you store in another folder, e.g. in the 'validate' folder as indicated in the figure above. The validate.asp may look something like:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Language" content="en-us" />
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
    <title>Validation</title>
    <meta name="author" content="Michael Suodenjoki" />
    <meta name="description" content="Validate Service" />
    <meta name="keywords" content="validate,validation,service" />
    <meta name="keywords" content="correctness,wellformness" />
    <meta name="robots" content="index,follow" />
    <!--#include virtual="/include/styles.inc"-->
    </head>
    <body>
    <!--#include virtual="/include/header.inc"-->
    <h1>Validation Service</h1>
    <div class="box180">
      Useful links:
      <ul>
        <li><a href="http://www.htmlhelp.com/reference/html40/alist.html">
            HTML 4.0 Elements (Alphabetical list)</a>
        </li>
      </ul>
    </div>
    <p>The following page has been validated for XHTML :</p>
    <%
      ' HTML-encode the request values before echoing them back to the browser
      Response.Write("<p>http://" & Server.HTMLEncode(Request.ServerVariables("SERVER_NAME")) & _
                     Server.HTMLEncode(Request.QueryString("url")) & "</p>")

      Dim sValidateFile
      sValidateFile = Server.MapPath(Request.QueryString("url"))

      ' Debug messages (currently commented out)
      'Response.Write("<p>" & Request.QueryString("url") & "</p>")
      'Response.Write("<p>" & sValidateFile & "</p>")
      'Response.Write("<p>" & Request.ServerVariables("SCRIPT_NAME")& "</p>")
      'Response.Write("<p>" & Server.MapPath(Request.ServerVariables("SCRIPT_NAME")) & "</p>")

      Dim oWSH,sCmdLine
      Set oWSH = Server.CreateObject("WScript.Shell")

      ' Debug message (currently commented out)
      'Response.Write("Executable: " & Server.MapPath("/executables/nsgmls.exe") & "<br/>" )

      ' Create a temporary file name (that is used for output)
      Dim oFSO      ' As Object - FileSystemObject.
      Dim sTempFile ' As String - temporary file name
      Set oFSO = Server.CreateObject("Scripting.FileSystemObject")
      sTempFile = Server.MapPath("/executables") & "\" & oFSO.GetTempName()

      ' Build command line (paths are quoted because they may contain spaces)
      '
      ' Example:
      '   C:\Validator\SP\bin\nsgmls -s -c C:\Validator\SP\pubtext\xhtml.soc
      '     -f %TEMP%\validation-results.txt %1
      '
      sCmdLine = """" & Server.MapPath("/executables/nsgmls.exe") & """" & " -s" & _
                 " -c """ & Server.MapPath("/executables/pubtext/xhtml.soc") & """" & _
                 " -f """ & sTempFile & """" & _
                 " """ & sValidateFile & """"

      ' Debug output - write command line...
      'Response.Write("<p>Command line: &quot;" & sCmdLine & "&quot;</p>" )

      ' Execute the command line
      Call oWSH.Run(sCmdLine,1,True)
      'Call oWSH.Run("notepad.exe",5,False)
    %>
    <%
      ' Read the result and present it
      Dim oFile
      Set oFile= oFSO.OpenTextFile(sTempFile, 1) '1=ForReading

      Dim sResult
      If oFile.AtEndOfStream <> True Then
        sResult = oFile.ReadAll()
      End If
      'Response.Write("<div>" & sResult & "</div>" )
      oFile.Close

      Set oFile = oFSO.OpenTextFile(sValidateFile,1) '1=ForReading
      Dim sSource
      sSource = oFile.ReadAll()
      oFile.Close

      ' Delete the temporary file...
      oFSO.DeleteFile(sTempFile)

      Set oFSO = Nothing
      Set oWSH = Nothing
    %>
    <h2>Validation Result</h2>
    <%
      Dim sSourceLines
      sSourceLines = Split(sSource,vbCrLf)

      Dim sLines, nLastLine
      sLines = Split(sResult, vbCrLf)
      nLastLine = ubound(sLines)

      '
      ' 'ReportError'
      '
      ' This function parses an SGML error line and reports it in a nicer way.
      '
      Function ReportError( sErrorLine )
        Dim sElems
        sElems = Split(sErrorLine, ":" )

        ' 0 = sElems(0) = Drive of exe
        ' 1 = sElems(1) = File name of exe
        ' 2 = sElems(2) = Drive of web page
        ' 3 = sElems(3) = File name of web page file
        ' 4 = sElems(4) = Line number
        ' 5 = sElems(5) = Column
        ' 6 = sElems(6) = Type of error 'E' = error
        ' 7 = sElems(7) = Error Message

        If sElems(2)="E" Then
          Response.Write( "<p>Error: " & sElems(3) & ":" & sElems(4) & "</p>" )
        Else
          If sElems(6)="E" Then
            Response.Write( "<li>Line <a href='#" & sElems(4) & "'>" & sElems(4) & _
                            "</a> Column " & sElems(5) & " - " )
            Response.Write( "Error: " )
            If 7 <= ubound(sElems) Then
              Response.Write( sElems(7) )
            End If
            Response.Write( "</li>" )
            Response.Write( "<blockquote class='syntax'>" )
            ' Show the offending source line with a caret below the column
            Response.Write( Replace(sSourceLines(sElems(4)-1),"<","&lt;") & "<br/>" )
            Response.Write( Replace(Space(sElems(5))," ","&nbsp;") & _
                            "<span style='color:red'>^</span>" )
            Response.Write( "</blockquote>" )
          End If
        End If
      End Function

      '
      ' 'ReportErrors'
      '
      ' Function to report errors from an SGML output file (assuming
      ' these are present in the sLines array).
      '
      Function ReportErrors()
        nLine = 0
        While nLine < nLastLine
          Call ReportError(sLines(nLine))
          'Response.Write nLine & ": " & sLines(nLine) & "<br/>"
          nLine=nLine+1
        Wend
      End Function

      If nLastLine > 0 Then
        Call ReportErrors()
      Else
        Response.Write("<p>No Errors Found.</p>")
      End If
    %>
    <h2>Source Listing</h2>
    <p>Below is the source input used for this validation:</p>
    <pre class="syntax">
    <%
      nLastLine = ubound(sSourceLines)
      nLine = 0
      While nLine < nLastLine
        Response.Write "<a name='" & nLine+1 & "'>" & nLine+1 & "</a>: " & _
                       Replace(sSourceLines(nLine),"<","&lt;") & "<br/>"
        nLine=nLine+1
      Wend
    %>
    </pre>
    <!--webbot bot="PurpleText" PREVIEW="Footer Included Here..." -->
    <!--#include virtual="/include/footer.inc"-->
    </body>
    </html>
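The part of the listing that shows the offending source line with a caret below the reported column is perhaps easier to follow in isolation. Here is a minimal Python sketch of that one step; `render_error` is a name I made up, and like the ASP code it assumes that line numbers start at 1 and that the caret is placed after as many spaces as the reported column.

```python
import html


def render_error(source_lines, line_number, column):
    """Return an HTML snippet: the escaped source line plus a caret marker."""
    line = source_lines[line_number - 1]   # reported line numbers start at 1
    marker = " " * column + "^"            # same offset the ASP code uses
    return "<pre>" + html.escape(line) + "\n" + marker + "</pre>"
```

Escaping the whole line with `html.escape` also covers `>` and `&`, which the `Replace` call on `<` alone does not.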
To validate a web page you should give the validate.asp a parameter named url, e.g. as:
http://www.myweb.com/validate.asp?url=/files_to_check/mypage.html
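If the links to the validation page are generated by a script, the path should be escaped so that spaces and other special characters survive the trip through the query string. A small Python sketch follows; the helper name `validation_url` is hypothetical.

```python
from urllib.parse import quote


def validation_url(service, page_path):
    """Build a link to the validation page for the given site-relative path."""
    # Keep "/" readable; everything else that is unsafe gets percent-encoded.
    return service + "?url=" + quote(page_path, safe="/")
```

For the example above this returns the same address unchanged, while a file name such as `my page.html` becomes `my%20page.html`.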
Hope you can use it.