/*                     MPARSER.C - A CSDGM METADATA PARSING PROGRAM 
*
*   This program parses a CSDGM metadata text file into a format that can be easily
* read into a relational database management system.  Accompanying the program are
* three text files: FIELD.TXT, SECTION.TXT and EXCL.TXT.
* The field.txt file contains the CSDGM field parameters mparser will look for in the
* metadata file.  The field.txt file uses the string "--" as a line separator.  The
* parsing routine ignores it, but it gives you a simple way to clarify the contents of
* this file.  Also notice that all section names within the field.txt file are prefixed
* by the '**' characters.  This is required.
*   The section.txt file contains the CSDGM section parameters mparser uses to keep track
* of its location within the metadata file.  The excl.txt file lists the sections
* you would like to exclude entirely as the metadata file is parsed.  Note that a section
* parameter must appear in both the section.txt and excl.txt files in order to
* exclude that section.  Failure to properly indicate which sections you want
* parsed versus which sections you want excluded will put more information than you
* originally required in the database.  All three files can be edited depending on what
* fields and sections you would like to include or exclude in your RDBMS.
*   I HAVE APPENDED EXAMPLE FIELD.TXT, SECTION.TXT AND EXCL.TXT FILES TO THE BOTTOM OF
* THIS PROGRAM. THE DATABASE I USED THEM FOR DOES NOT IMPLEMENT THE ENTIRE CSDGM
* STANDARD, AND THE FILES BELOW REFLECT THIS FACT. USE THESE BY CUTTING AND PASTING
* THEM INTO SEPARATE FILES OR CREATE YOUR OWN. JUST MAKE SURE THERE ARE NO TRAILING
* BLANK SPACES AT THE END OF ANY LINE AND NO BLANK LINES AT THE END OF ANY
* FILE.
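*   As a minimal sketch of the field.txt conventions described above (this is NOT
* mparser's own reading code), the fragment below classifies one line as a "--"
* separator, a '**' section name or a field name.  The names line_kind, trim_trailing
* and classify_line are hypothetical, and trimming trailing blanks is an added defensive
* assumption; mparser itself expects the files to contain none.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical line kinds for a control file such as field.txt.
enum line_kind { LINE_SEPARATOR, LINE_SECTION, LINE_FIELD, LINE_BLANK };

// Strip trailing blanks, tabs and line endings in place; returns the new length.
static size_t trim_trailing(char *line)
{
    size_t n = strlen(line);
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t' ||
                     line[n - 1] == '\r' || line[n - 1] == '\n'))
        line[--n] = '\0';
    return n;
}

// Classify one line; for LINE_SECTION and LINE_FIELD, *name points at the name.
static enum line_kind classify_line(char *line, const char **name)
{
    *name = line;
    if (trim_trailing(line) == 0)
        return LINE_BLANK;
    if (strcmp(line, "--") == 0)
        return LINE_SEPARATOR;
    if (strncmp(line, "**", 2) == 0) {
        *name = line + 2;          // skip the required section prefix
        return LINE_SECTION;
    }
    return LINE_FIELD;
}
```

* For example, classify_line applied to a buffer holding "**Lineage" yields LINE_SECTION
* with name pointing at "Lineage", and a buffer holding "--" yields LINE_SEPARATOR.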
*   The program works by scanning a metadata text file line by line and concatenating
* all of the lines into one large string.  The string is then systematically and
* repeatedly traversed, character by character, until every section and field parameter
* listed in the field.txt file has been processed.  The results of the
* parsing routine are written both to standard output and to a text file named
* argv[1]_loadfile.txt. This 'loadfile' (name taken from ORACLE's loader
* utility) will have a format defined as follows:
*                            Title                       
*                            Identification Information  
*                            1                           
*                            TIGER/Line Files, 1995
*                            **********
*                            .
*                            .
*                            . 
* Title                       <<<CSDGM field name, a database column
* Identification Information  <<<CSDGM section name, a database table
* 1                           <<<occurrence value in the loadfile, i.e. could 
*                                have more than one row in a particular table
* TIGER/Line Files, 1995      <<<the field value, a data value 
* **********                  <<<a delimiter, you would probably need another program to
*                                parse this file again prior to database entry
*
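* As a hedged illustration of the five-line record layout shown above, the sketch below
* formats one record into a caller-supplied buffer.  The function name format_record
* and its signature are hypothetical; this is not how mparser itself writes the loadfile.

```c
#include <stddef.h>
#include <stdio.h>

// Format one loadfile record: field name, section name, occurrence value,
// field value and the ten-asterisk delimiter, one per line.
// Returns the number of characters written, or -1 if buf is too small.
static int format_record(char *buf, size_t size, const char *field,
                         const char *section, int occurrence,
                         const char *value)
{
    int n = snprintf(buf, size, "%s\n%s\n%d\n%s\n**********\n",
                     field, section, occurrence, value);
    if (n < 0 || (size_t)n >= size)
        return -1;                 // encoding error or truncated output
    return n;
}
```

* Calling format_record with "Title", "Identification Information", 1 and
* "TIGER/Line Files, 1995" reproduces the sample record shown above.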
*
*   The successful operation of this parsing program depends on the two delimiters
* you use in your CSDGM metadata file.  The program's parameters can be
* changed in the \*Declarations*\ section under \*Change these to preference*\.  The
* parameters are as follows:
*                            int database=10000;
*                            char field_delimiter = ':';
*                            char value_delimiter = '|';
* The field_delimiter marks the end of a field or section name and the value_delimiter
* marks the end of a particular data value. So your entire metadata file should
* look something like this:
*                           Identification_Information:
*                             Citation
*                               Citation_Information
*                                 Originator:
*                                   U.S. Department of Commerce
*                                   Bureau of the Census
*                                   Geography Division|
* 
*   Notice that Citation and Citation_Information do not have to be delimited because, in my
* case, I didn't INCLUDE them in my field.txt and section.txt files below.  In practice, you
* should delimit only what you want parsed; delimit anything more and you might end up with
* more information than you needed.  Choice of delimiters is also crucial.  A
* ':' delimiter can create problems in parsing the document because strings
* such as "http://www..." and "8:30pm" also contain this character.  Usually, however, mparser
* will not have a problem with these because the program matches the entire field and/or
* section name including the delimiter (i.e. Originator:).  Also, all the string comparisons
* that are done by mparser are CASE INSENSITIVE.
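*   The matching rule described above can be sketched as follows: search the concatenated
* metadata string, ignoring case, for the field name followed by the field_delimiter, then
* copy everything up to the next value_delimiter.  This is only an illustration under
* stated assumptions, not mparser's actual routine; find_nocase and extract_value are
* hypothetical names, and the 128-byte key buffer is an arbitrary choice.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Case-insensitive search for needle in haystack; NULL if absent.
static const char *find_nocase(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return haystack;
    }
    return NULL;
}

// Copy the value that follows "name<field_delim>" into out, stopping at
// value_delim.  Returns the number of characters copied, or -1 if the field
// or its closing delimiter is missing or a buffer is too small.
static int extract_value(const char *text, const char *name,
                         char field_delim, char value_delim,
                         char *out, size_t out_size)
{
    char key[128];
    size_t len = strlen(name);
    if (len + 2 > sizeof key)
        return -1;
    memcpy(key, name, len);        // build the key "name:" so that a bare
    key[len] = field_delim;        // ':' inside a value is never matched
    key[len + 1] = '\0';

    const char *start = find_nocase(text, key);
    if (start == NULL)
        return -1;
    start += len + 1;
    const char *end = strchr(start, value_delim);
    if (end == NULL)
        return -1;
    while (start < end && isspace((unsigned char)*start))
        start++;                   // drop leading blanks of the value
    size_t n = (size_t)(end - start);
    if (n + 1 > out_size)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return (int)n;
}
```

* Because the whole key "Online_Linkage:" is matched, a value such as
* "http://www.census.gov" is copied intact even though it contains a ':' of its own.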
*   It is good practice within your CSDGM metadata file to use a '_' character between the
* words of section and field names that span more than one word, such as
* Identification_Information:, South_Bounding_Coordinate: or State_or_Province:.
* The parameter int database=10000; indicated above controls the maximum size of the
* FIELD VALUE.  For example, the largest possible
* data_type for the RDBMS that I use (ORACLE 7) is called VARCHAR2 and has a maximum length of
* 2000 characters. VARCHAR2 will not accept anything more than 2000 characters.  So in my
* case it makes sense to set the database variable above to 2000, i.e. int database=2000;.
* This ensures that at most 2000 characters are parsed for each data value in the metadata file.
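*   The effect of the database limit can be pictured with the short sketch below, which
* keeps at most a given number of characters of a value.  The helper name clip_value is
* hypothetical, and the code illustrates the idea only; it is not taken from mparser.

```c
#include <stddef.h>
#include <string.h>

// Copy at most `limit` characters of value into out, which must have room
// for limit + 1 bytes.  Returns the number of characters kept.
static size_t clip_value(const char *value, size_t limit, char *out)
{
    size_t n = strlen(value);
    if (n > limit)
        n = limit;                 // anything past the limit is dropped
    memcpy(out, value, n);
    out[n] = '\0';
    return n;
}
```

* With a limit of 2000, a 2500-character abstract would be cut to its first 2000
* characters, while shorter values pass through unchanged.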
*   Most CSDGM metadata files are in excess of 20000 characters, so it could take mparser
* anywhere from 3 to 5 minutes to process the document.  As a general rule of thumb, THE
* MORE COMPLICATED YOUR DATABASE (I.E. THE MORE ONE-TO-MANY AND MANY-TO-MANY RELATIONSHIPS)
* THE LESS ACCURATE THIS PARSING PROGRAM IS.  One of the most common problems associated
* with parsing a relatively complex metadata document is assigning the OCCURRENCE
* values (described above) in a one-to-many relationship.  Sometimes the columns won't
* line up with their proper data values.  The following example shows what
* happens:
*
*          SDTS_Terms_Description:
*                 SDTS_Point_and_Vector_Object_Type: Node, network|
*                 Point_and_Vector_Object_Count: 570 to 56,000|
*                 SDTS_Point_and_Vector_Object_Type: Entity point|
*                 SDTS_Point_and_Vector_Object_Type: Complete chain|
*                 Point_and_Vector_Object_Count: 790 to 83,000|
*                 SDTS_Point_and_Vector_Object_Type: GT-polygon composed of chains|
*                 Point_and_Vector_Object_Count: 290 to 33,000|
*
* The above metadata will look something like this after being parsed:
*
*          SDTS Point and Vector Object Type
*          Spatial Data Organization Information
*          1
*          Node, network
*          **********
*          SDTS Point and Vector Object Type
*          Spatial Data Organization Information
*          2
*          Entity point
*          **********
*          SDTS Point and Vector Object Type
*          Spatial Data Organization Information
*          3
*          Complete chain
*          **********
*          SDTS Point and Vector Object Type
*          Spatial Data Organization Information
*          4
*          GT-polygon composed of chains
*          **********
*          Point and Vector Object Count
*          Spatial Data Organization Information
*          1
*          570 to 56,000
*          **********
*          Point and Vector Object Count
*          Spatial Data Organization Information
*          2
*          790 to 83,000
*          **********
*          Point and Vector Object Count
*          Spatial Data Organization Information
*          3
*          290 to 33,000
*          **********
*
* Notice that 'Entity point', which doesn't have a count in the metadata file,
* was given one after parsing.  'Entity point' is occurrence 2 of its column, and
* occurrence 2 of 'Point and Vector Object Count' is '790 to 83,000'.  Therefore,
* row number 2 of the table 'Spatial Data Organization Information' under column
* 'Point and Vector Object Count' will contain a data value of '790 to 83,000' when
* in fact the field should be empty, and each later count is likewise shifted up one row.
* At present this is an unavoidable problem.  So it
* is essential that you go back and check the row/data values of all your one-to-many and
* many-to-many relationships against the resulting filename_loadfile.txt file.  Alternatively,
* you can enter 'DUMMY' values for fields within the metadata file so that the row and
* column values will line up after the document is parsed.
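*   The misalignment can be reproduced with a small sketch that numbers each field name
* independently, which is consistent with the parsed output shown above.  Whether mparser
* counts occurrences exactly this way is an assumption; occurrence_table, next_occurrence
* and MAX_FIELDS are hypothetical names.

```c
#include <string.h>

#define MAX_FIELDS 64

// Hypothetical per-field occurrence counter: each field name keeps its own
// running count, independent of every other field.
struct occurrence_table {
    const char *names[MAX_FIELDS];
    int counts[MAX_FIELDS];
    int used;
};

// Return the next occurrence number for name, or -1 if the table is full.
// The table stores the pointer, so name must outlive the table.
static int next_occurrence(struct occurrence_table *t, const char *name)
{
    for (int i = 0; i < t->used; i++)
        if (strcmp(t->names[i], name) == 0)
            return ++t->counts[i];
    if (t->used == MAX_FIELDS)
        return -1;
    t->names[t->used] = name;
    t->counts[t->used] = 1;
    t->used++;
    return 1;
}
```

* Feeding the seven example lines through next_occurrence in file order numbers
* 'Complete chain' as type occurrence 3 but its count '790 to 83,000' as count
* occurrence 2, because 'Entity point' never advanced the count.  A 'DUMMY' count
* entered after 'Entity point' would advance both counters together.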
*   The program is not easy to use, but hopefully it will cut the time
* it takes you to enter a CSDGM metadata file into an RDBMS.  Given that these metadata files
* are usually several pages long, typing one in yourself would literally take hours, and
* asking someone to type it into a WWW form or Java interface would be just as intolerable.
*
*   In addition to the MPARSER.C program file, the three text files below and this 
* mparser_readme.txt file, I have included the original TIGER metadata file, 
* the modified TIGER metadata file and the TIGER_LOADFILE file for your comparison.
* The zip file should include:
*                               MPARSER_README.TXT 
*                               MPARSER.C
*                               FIELD.TXT
*                               SECTION.TXT
*                               EXCL.TXT
*                               TIGER_ORIGINAL.TXT
*                               TIGER_MODIFIED.TXT
*                               TIGER_LOADFILE.TXT
*                               STANDARDI0.TXT 
*
*/




/*                                 EXCL.TXT file
-----------------------00----------------------------------00--------------------------
                       /\                                  /\

Process Step

                       \/                                  \/
-----------------------00----------------------------------00--------------------------
*/

/*                                SECTION.TXT file
-----------------------00----------------------------------00--------------------------
                       /\                                  /\

Identification Information
Lineage
Process Step
Spatial Data Organization Information
Distribution Information
Metadata Reference Information

                       \/                                  \/
-----------------------00----------------------------------00--------------------------
*/

/*                                FIELD.TXT file
-----------------------00----------------------------------00--------------------------
                       /\                                  /\

**Identification Information
Originator
Publication Date
Publication Time
Title
Edition
Geospatial Data Presentation Form
Series Name
Issue Identification
Publisher
Publication Place
Online Linkage
Other Citation Details
Source Scale Denominator
Type of Source Media
Source Currentness Reference
Source Citation Abbreviation
Source Contribution
--
Abstract
Purpose
Currentness Reference
Progress
Maintenance Update Frequency
Access Constraints
Use Constraints
Data Set Credit
Native Data Set Environment
West Bounding Coordinate
East Bounding Coordinate
North Bounding Coordinate
South Bounding Coordinate
Logical Consistency Report
Completeness Report
Percent Cloud Cover
--
Calendar Date
Time of Day
Beginning Date
Beginning Time
Ending Date
Ending Time
--
Theme Keyword
Place Keyword
Stratum Keyword
Temporal Keyword
--
Browse Graphic File Name
Browse Graphic File Description
Browse Graphic File Type
Browse Graphic File Online Linkage
--
**Lineage
Originator
Publication Date
Publication Time
Title
Edition
Geospatial Data Presentation Form
Series Name
Issue Identification
Publisher
Publication Place
Online Linkage
Other Citation Details
Source Scale Denominator
Type of Source Media
Source Currentness Reference
Source Citation Abbreviation
Source Contribution
--
Calendar Date
Time of Day
Beginning Date
Beginning Time
Ending Date
Ending Time
--
**Process Step
--
**Spatial Data Organization Information
Indirect Spatial Reference
Direct Spatial Reference Method
SDTS Point and Vector Object Type
Point and Vector Object Count
VPF Topology Level
VPF Point and Vector Object Type
Raster Object Type
Row Count
Column Count
Vertical Count
Latitude Resolution
Longitude Resolution
Geographic Coordinate Units
Map Projection Name
Other Projection's Definition
Grid Coordinate System Name
UTM Zone Number
UPS Zone Identifier
SPCS Zone Identifier
ARC System Zone Identifier
Other Grid System Definition
Local Planar Description
Local Planar Georeference Information
Local Description
Local Georeference Information
Altitude Datum Name
Altitude Resolution
Altitude Distance Units
Altitude Encoding Method
Depth Datum Name
Depth Resolution
Depth Distance Units
Depth Encoding Method
Entity And Attribute Overview
Entity and Attribute Detail Citation
--
**Distribution Information
Resource Description
Distribution Liability
Custom Order Process
Technical Prerequisites
Format Name
Format Version Number
Format Version Date
Format Specification
Format Information Content
File Decompression Technique
Transfer Size
Network Resource Name
Network Address
Access Instructions
Host Operating System
Fees
Ordering Instructions
Turnaround Time
--
Contact Person
Contact Organization
Contact Position
Address Type
Address
City
State or Province
Postal Code
Country
Contact Voice Telephone
Contact Facsimile Telephone
Contact Electronic Mail Address
Hours of Service
Contact Instructions
--
**Metadata Reference Information
Metadata Title
Metadata Date
Metadata Review Date
Metadata Future Review Date
Metadata Standard Name
Metadata Standard Version
Metadata Time Convention
Metadata Access Constraints
Metadata Use Constraints
--
Contact Person
Contact Organization
Contact Position
Address Type
Address
City
State or Province
Postal Code
Country
Contact Voice Telephone
Contact Facsimile Telephone
Contact Electronic Mail Address
Hours of Service
Contact Instructions

                       \/                                  \/
-----------------------00----------------------------------00--------------------------
*/
