Simple Offline USENET Packet Format (SOUP) Version 1.2

Copyright (c) 1992-1993 Rhys Weatherley
rhys@cs.uq.oz.au

Last Update: 14 August 1993
Formatted into HTML by Ben Combee on 29 June 1994

DISTRIBUTION

Permission to use, copy, and distribute this material for any purpose and without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies, and that the name of Rhys Weatherley not be used in advertising or publicity pertaining to this material without specific, prior written permission. RHYS WEATHERLEY MAKES NO REPRESENTATIONS ABOUT THE ACCURACY OR SUITABILITY OF THIS MATERIAL FOR ANY PURPOSE. IT IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.

NOTE: This document is NOT in the public domain. It is copyrighted. However, the free distribution of this document is unlimited.

If you create a product which uses this packet format, it is suggested that you include an UNMODIFIED copy of this document to inform your users as to the packet format. All queries about this format, or requests for the latest version should be directed to Rhys Weatherley at the above e-mail address.

INTRODUCTION

For many years, the FidoNet community has been using QWK and other formats to enable users to download their mail and conferences to be read while off-line. This not only saves phone charges and prevents tying up BBS lines for long periods of time; it also allows a user to use much more powerful tools on their own machine to process the downloaded "packets" than what can be made available in an on-line environment.

To date however, very little work has been done in the USENET and dial-in Unix community to facilitate the same user operations.
Some attempts have been made to use QWK, but due to QWK's limitations and unsuitability for the USENET message formats, such efforts have not been very successful.

Within USENET, the tendency seems to be either "dial-in to some other machine and put up with it", or "set up your own USENET site". The former keeps the user at the mercy of whatever user interfaces the admin of the other machine sees fit to install, and the latter requires far more computing knowledge than the average computer user is expected to have. Both of these can serve to lock out large portions of the computer-literate public from experiencing USENET. The latter option can also give rise to security problems in the form of forged USENET messages, which a more controlled dial-in system avoids.

The purpose of this document is to define a new packet format which is aware of the conventions used in the USENET community, forming a middle ground between dial-in user interfaces and full USENET connectivity. It is not limited to downloading USENET news however. The same format could be used to enable a Unix user to package up their Unix mailbox and download it for later perusal. The format is extensible to other kinds of news or conference systems, so it is feasible, although not yet defined, that QWK or FidoNet messages could be accommodated within the same packet as USENET messages.

REVISION HISTORY

1.2
    Add COMMANDS and ERRORS files. Renamed to "Simple Offline USENET Packet Format". A few extra fields and type codes for the AREAS and LIST files. Message area summaries.
1.1
    Add description of the LIST file. Everything else is identical to 1.0.
1.0
    Original version of the document.

Previously, this document was known as the "Helldiver Packet Format" (HDPF). A variant of HDPF, called the "Simple Local News Packet format" (SLNP) was created by Philippe Goujard (ppg@oasis.icl.co.uk). This document combines the features of both previous formats and the name was changed to make it less product-oriented.

TERMINOLOGY

Packet
    a set of files, collected into a compressed archive.
Message packet
    the primary kind of packet which contains messages for the user to read.
Reply packet
    a special kind of packet which contains replies composed by the user, usually in response to the messages in a message packet.
Packet generator
    a program which generates packets to be downloaded and read, and which processes uploaded reply packets.
Packet reader
    a program which reads packets, usually by presenting the messages in a packet to the user, and which generates reply packets.
Packet processor
    either a packet generator or a packet reader.
Generating host
    the computer on which the packet generator executes.
Reading host
    the computer on which the packet reader executes.
Download
    the transfer of a packet from the generating host to the reading host. This transfer may take place in any fashion, although the most common method is through the use of a file transfer protocol such as Zmodem or Kermit.
Upload
    the transfer of a packet from the reading host to the generating host.
Packet stream
    a logical link between the generating and reading hosts over which downloads and uploads of packets take place.
Message area
    a collection of messages which are related by a common topic or purpose. Examples of message areas include USENET newsgroups, Unix mailboxes, and FidoNet conferences.
Reply message area
    a special kind of message area which contains replies being uploaded to a generating host.
Text file
    an ASCII file consisting of lines terminated by linefeed characters (LF, 10 decimal). Some operating systems terminate lines in a text file by CRLF pairs: such files must be converted to LF-terminated lines for transmission in a packet.

ANATOMY OF A PACKET

A packet is a group of files, collected into a compressed archive. The standard compression technique defined by this document is ZIP. Other techniques such as ARJ, ZOO, ARC, LZH, etc can also be used. It is also possible for Unix's tar.Z format to be used to transmit packets.
The minimum requirement is a method to collect a group of files into a single packet, and a method to expand the packet back into the original files. ZIP is specified to provide a common compression format for packet processors.

Each of the filenames in a packet should be stored in upper case on those systems where case matters (e.g. Unix). The following file specifications may appear in a packet:

    INFO        Optional textual information.
    LIST        List of message areas on the generating host.
    AREAS       Index of the message areas within the packet.
    REPLIES     Index of the reply message areas from the reading host.
    *.MSG       Text of the messages in a particular message area.
    *.IDX       Index information for messages in a message area.
    COMMANDS    Extra commands sent along with a packet.
    ERRORS      Errors that occurred during the execution of commands.

Other filenames may also appear in the packet, but are not defined by this specification, so they should be avoided by generating software, and ignored by receiving software.

The INFO file is an optional text file which may contain any kind of textual information from the generating system. Typically this file would only be present if there is some kind of urgent message that must be sent to the receiving user. Use of this file to store the name of the generating host and other such static information is possible, but discouraged to save space and transmission time. If such information is required, then the COMMANDS file can be used to transfer it.

The LIST file is an optional text file which contains a list of all message areas that are available on the generating host, together with the format of the messages. It is specified further in the section "LIST FILE".

The AREAS file is a text file which contains an index of the message areas present within the packet, specifying the name of the message area, the filename the messages may be found in, and the message format.
This is specified further in the next section.

The REPLIES file is a text file which contains an index of the message areas present within the packet that contain replies from the user which should be mailed or posted on the generating host. In most cases, a packet will contain either an AREAS file or a REPLIES file, but both may be present. See the section "REPLIES FILE" below for more information.

The *.MSG files contain the text of the messages from a single message area. The actual format of this file depends on the type of message area specified in the AREAS file. See the section "MESSAGE FILES" below for more information.

The *.IDX files provide an index into the *.MSG files, usually specifying where each message starts and the contents of some of the common message header fields. These files are intended for use by reading software on the recipient's system to quickly display an overview of the messages present in a message area. See the section "INDEX FILES" below for more information.

The COMMANDS file is a text file which contains commands to be executed on the reading or generating hosts to change the behaviour of the hosts at each end of a packet stream. The ERRORS file contains textual error messages to report to a human at the host the packet is destined for. These two files are explained further in the section "SENDING COMMANDS BETWEEN SYSTEMS" below.

AREAS FILE

The AREAS file is a text file containing zero or more lines, each of which specifies a single message area, its encoding and the name of the message/index file pair in which the messages appear.
In particular, each line has the following form:

    prefix<TAB>area name<TAB>encoding[<TAB>description[<TAB>number]]

where "prefix" specifies the name of the message/index file pair, "area name" is the name of the message area, "encoding" specifies the formats of the message and index files and the type of message area, "description" is a descriptive name for the message area, and "number" is the number of messages in the message file. The last two fields are optional. Additional fields may be added in a future version of this specification.

The message and index files corresponding to the message area have the names "prefix.MSG" and "prefix.IDX" respectively. If "prefix" contains alphabetic characters, they must be upper case.

The message area name may be any sequence of printable ASCII characters (space through tilde). Under USENET, this is typically a dotted name like "comp.lang.c". Other networks may include spaces or other unusual characters in the area names, so the receiving software must be aware of this fact, and act accordingly. Also, receiving software must deal gracefully with characters that have the high bit set, or names that contain control characters, since people in other countries that speak a language other than English may wish to use their country's native encoding for the message area name. The only hard rule is that the name may not contain TAB, CR or LF. Receiving software should treat the name as an indivisible string to be displayed to the user.

The encoding field consists of two or three ASCII characters (usually alphabetic). The first specifies the format of the message file, the second specifies the format of the index file, and the optional third specifies the kind of area (private or public).
The following message file formats are currently defined (case is significant):

    u    USENET news articles
    m    Unix mailbox articles
    M    Mailbox articles in the MMDF format
    b    Binary 8-bit clean mail format
    B    Binary 8-bit clean news format
    i    Index file only

The individual message file encodings are explained further in the next section. The format 'i' indicates that no message file is present, and the index file should be used as a summary of the messages in the message area. This is explained further in the section "MESSAGE AREA SUMMARIES".

The following index file formats are currently defined (again, case is significant):

    n    No index file
    c    C-news overview database format
    C    Shorter C-news overview database format
    i    Offset/length pairs delineating the messages

These types are explained further in the section "INDEX FILES" below.

See the section "MINIMAL CONFORMANCE" for information on the minimal number of message and index formats that should be supported by packet generators and packet readers.

The following kinds of message areas are currently defined (again, case is significant):

    m    The message area contains private mail
    n    The message area contains public messages, or news
    u    The message area kind is unknown (the default)

This third letter is optional. If it is not present or unknown, the kind of area depends on the message file type. Message types 'm', 'M', and 'b' default to kind 'm', and message types 'u', 'B' and 'i' default to kind 'n'. It is not recommended that the value 'u' for this third letter be used, although future versions of this specification may add additional letters, necessitating 'u' to be placed in the third letter if the kind is unknown. If the message area kind can be solely determined from the message file type, it is recommended that the third letter be omitted to save space and transmission time.

Further types may be defined in future versions of this specification. If the packet processor does not recognise a message file type, it should ignore the corresponding message and index files.
If the packet processor does not recognise an index file type, it can either ignore the message file, or attempt to break down the message file into separate messages by some other means. If the packet processor does not recognise a message area kind, the kind should be treated as unknown. The user should be warned if a message area has been ignored.

The optional message area description in the AREAS file consists of any sequence of printable ASCII characters. This may be used to insert a "readable" name for the message area. It may not contain TAB, CR or LF.

A message area may appear more than once in the AREAS file, each time with a different prefix, but this is discouraged. This could be used to split large message areas across more than one message file, but this is more conveniently handled by generating a separate packet containing the area continuation.

The following examples demonstrate the capabilities of the AREAS file:

    0000000<TAB>Email<TAB>mn
    0000001<TAB>comp.lang.c<TAB>uc<TAB>C Programming Language Discussions<TAB>125
    0000002<TAB>news.future<TAB>Bc<TAB>Future of USENET<TAB>38
    EMAIL<TAB>/usr/spool/mail/fred<TAB>unm<TAB>Private e-mail for fred
    U000001<TAB>comp.bbs.misc<TAB>MCn
    U000002<TAB>comp.bbs.waffle<TAB>ui

MESSAGE FILES

The format of the message file depends on the message file format specified in the AREAS file. This version of the specification defines three formats, which are in common use in the USENET and Unix community, and two additional binary formats which permit messages to be stored with no modification or assumptions about line lengths and byte values.

For each of these formats, lines are terminated with LF characters. Any CR characters in the messages should be considered as data characters, or ignored on receipt. In particular, MS-DOS systems should strip CR characters from text messages before writing them to a packet.

A 'u' (USENET) message file is a text file consisting of one or more messages prefixed with an rnews header.
This header has the form "#!rnews n" where "n" is the number of bytes in the message that follows the header, excluding the line-feed character which terminates the header. If the number in the header is followed by white space and other characters, these other characters should be ignored, until the terminating LF character is encountered.

A note about the rnews header: although a terser separator could be used, the rnews header has the following advantages: (a) the messages can be extracted in the absence of index files, or where the index files have an unknown type, and (b) the message files can be imported into a USENET system as standard rnews batches. Thus, if the user wishes to set up a real USENET site, or simply use dedicated USENET software to read packets, they can use their existing packet provider as a convenient read-only newsfeed, with no extra burden placed on the system administrator of the generating system.

A 'm' (Unix mailbox) message file is a text file consisting of one or more messages. The first line of each message must start with the character sequence "From ". Any remaining lines in the message which start with "From " should have the character '>' prepended. Thus the "From " lines delimit the message file into separate messages.

A 'M' (MMDF mailbox) message file is a sequence of one or more messages, separated by at least 4 Control-A characters. The message file may optionally start and end with a sequence of such characters. If a sequence of 4 or more Control-A characters occurs in a message, it should be "adjusted" by the insertion of spaces to split the sequence. The use of Control-A characters within a message is discouraged.

The 'm' and 'M' formats were chosen for mail because of their common occurrence in the Unix community.
The generating system may elect to instead convert a mailbox into the USENET format if it wishes, and set the area kind to 'm' to inform the packet reader that the message area contains private e-mail rather than news.

The 'b' (binary mail) and 'B' (binary news) formats are identical. The contents of each message must conform to RFC-822/1036 and may contain content information compatible with RFC-1341 (MIME). The only difference between the messages of these formats and the preceding formats is that no assumption is made about line lengths, and any of the 256 values for a byte may be used in any position. Each message is preceded by a 4-byte value which indicates the length of the message in bytes, stored in big-endian order (i.e. high byte first, low byte last). The difference between 'b' and 'B' is a semantic one: message files of type 'b' are expected to contain mail messages, and message files of type 'B' are expected to contain news messages. Thus, reader software can make a distinction between the two if it desires.

For most practical purposes, 'u', 'm' and 'M' should be sufficient. The binary 'b' and 'B' types should be used for articles that contain 8-bit binary data. It is possible to use type 'u' for binary data as well, but 'm' and 'M' cannot be used because the message contents may be modified. When MIME becomes more wide-spread, it is expected that binary messages containing programs, sound, pictures and video will become popular, necessitating these binary types.

Note that MIME messages can be stored in 'u', 'm' and 'M' message files, but any binary components should be encoded with quoted-printable or base64 (which is expected to be the most common usage of MIME in the near future). The 'b' and 'B' types are not required for MIME messages in general, only for those containing raw unencoded binary data (as indicated by the Content-transfer-encoding header value "binary").
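
The length-prefixed formats described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the function names split_rnews and split_binary are hypothetical, the whole message file is assumed to fit in memory, and the 4-byte length of the 'b' and 'B' formats is read as an unsigned value (the text above gives only its size and byte order).

```python
import struct


def split_rnews(data: bytes) -> list[bytes]:
    """Split a 'u' message file into messages using its "#!rnews n" headers."""
    messages = []
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)  # the LF that terminates the header line
        header = data[pos:eol]
        if not header.startswith(b"#!rnews "):
            raise ValueError(f"expected an rnews header at offset {pos}")
        # Only the first token after "#!rnews" is the byte count; anything
        # after further white space is ignored, as the text requires.
        count = int(header.split()[1])
        start = eol + 1  # the count excludes the header's own LF
        if start + count > len(data):
            raise ValueError("message file is truncated")
        messages.append(data[start:start + count])
        pos = start + count
    return messages


def split_binary(data: bytes) -> list[bytes]:
    """Split a 'b' or 'B' message file into messages.

    Each message is preceded by a 4-byte big-endian length.
    Assumption: the length is unsigned.
    """
    messages = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        start = pos + 4
        if start + length > len(data):
            raise ValueError("message file is truncated")
        messages.append(data[start:start + length])
        pos = start + length
    return messages
```

Both functions return the raw message bytes untouched, so a reader could hand them to whatever RFC-822 header parser it already uses. Splitting 'm' and 'M' files is not shown, since those formats are delimited by content ("From " lines or Control-A runs) rather than by a length.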

INDEX FILES

This specification defines four index file types, which provide varying degrees of support for packet readers.

Type 'n' indicates that no index file is present, and it is up to the packet reader to extract messages from the message file. It is useful where the generating system is providing a USENET newsfeed using packets, and the receiving system is not interested in the index information.

A type 'c' index file is a text file (LF terminated lines), with one line per message that occurs in the message file. The lines in the index file should be in the same order as the corresponding messages. Each line has the following form:

    offset<TAB>subject<TAB>author<TAB>date<TAB>mesgid<TAB>refs<TAB>bytes<TAB>lines[<TAB>selector]

The fields have the following semantics:

offset
    Seek position in the message file of where the corresponding message starts. The first seek position is 0. For the 'u' format, this indicates the start of the line following the rnews header line. For the 'm' format, this indicates the start of the "From " line and for the 'M' format, this indicates the start of the article after the Control-A sequence. For the 'b' and 'B' formats, this indicates the first byte of the message after the 4-byte message length.
subject
    The "Subject:" line from the message.
author
    The "From:" line from the message.
date
    The "Date:" line from the message.
mesgid
    The "Message-Id:" line from the message.
refs
    The "References:" line from the message.
bytes
    The number of bytes in the message. If this field is zero, then it indicates that there is no corresponding message in the message file. This is used for summaries: see the section "MESSAGE AREA SUMMARIES" for more details.
lines
    The "Lines:" line from the message. Note that this field is pretty useless these days on USENET, but is still popular. It is meant to indicate the number of lines in the body of the message. Generating software may elect to re-generate this value if it is not present in the original message, but this is not required.
selector
    A string used for summaries to request that a message be sent in a future packet. See the section "MESSAGE AREA SUMMARIES" for more details. This string will usually be a number, but other values such as Message-IDs could be used. Packet readers should treat this string as an indivisible string to be sent in a "sendme" command in the COMMANDS file. A zero-length string indicates that there is no selector string.

If any of these fields contained TABs, newlines or other white space in the original articles, they should be converted into single spaces. All fields must be present, but some may be empty. The "bytes" field must not be empty, since it provides necessary information for packet readers. Each field must conform to the Internet RFC documents RFC-822 or RFC-1036.

Optionally, a header line may end with one or more extra TAB-separated fields for other RFC-compliant header fields, together with the header field names, e.g. "Supersedes: <1234@foovax>". These fields are not defined by this version of the specification, and are by arrangement between the generating host and the reading host only.

This format is compatible with the news overview (NOV) database format of C-news. The only differences are the substitution of an offset for the article number used by C-news, and the addition of the "selector" field. The C-news format was designed to assist threading newsreaders, so this packet format should provide similar assistance to threading packet readers.

The 'C' format is similar to 'c', except that the "mesgid" and "refs" fields are dropped. These fields can commonly be quite long and are mainly of use to packet readers which perform Message-ID based message threading. Packet readers which perform subject threading (i.e. sort on the subject line and then on the date and/or arrival order) do not require such information.
The format of the header lines in this case is as follows:

    offset<TAB>subject<TAB>author<TAB>date<TAB>bytes<TAB>lines[<TAB>selector]

Further TAB-separated fields may be added in future versions of this specification.

The "author" field is slightly different to the 'c' format. Instead of an RFC-822 format address, it is just the author's name, extracted from the "From:" line of the message. Most RFC-822 and RFC-1036 "From:" lines have one of the following forms:

    address
    address (name)
    name <address>

Names may sometimes be surrounded by double-quote characters, have embedded "(...)" sequences, or contain "useless" information after a comma (",") or slash ("/"). The main requirement is that the generating software produce some kind of (more or less) meaningful string for the name of the author which can be displayed to the user by a packet reader. See RFC-822 and RFC-1036 for more information on the syntax of the "From:" line in messages.

The 'i' index format is purely binary, using 8 bytes for each message in the corresponding message file. The first 4 bytes specify the offset into the message file of the message and the remaining 4 bytes specify the number of bytes in the message. Each 4-byte quantity is stored in big-endian order (high byte first). This format is supplied to provide a trade-off between transmission time and easy extraction of messages from a message file.

REPLIES FILE

One of the requirements for an off-line reading system is a mechanism for a user to upload replies or new messages to a generating system for mailing or posting. While it is possible to re-use the AREAS file for this purpose, keeping the download and upload sections separate will help prevent messages being fed back into a network erroneously.

The REPLIES file has a similar format to the AREAS file. Each line has the following form:

    prefix<TAB>reply kind<TAB>encoding

The "prefix" and "encoding" fields are as before. The "reply kind" field indicates the mechanism to use when transmitting the messages in the message file.
The following values are currently defined:

    mail    Transmit an RFC-822 compliant personal mail message
    news    Transmit an RFC-1036 compliant USENET news posting

On a Unix system, transmission of mail and news is usually performed with the "sendmail" and "inews" programs respectively. Additional kinds may be specified in a future version of this specification for other message formats. Note: using the kinds "mail" and "news" for anything other than RFC-compliant messages is discouraged. In particular, FidoNet or QWK messages should use a different reply kind. Messages of the same reply kind can be placed in the same message file, or in separate message files.

Further TAB-separated fields may be added to the lines in the REPLIES file in a future version of this specification.

It is recommended that a message file type of 'b' or 'B' be used for sending replies to minimise the chance of message corruption. The recommended index file types for replies are 'i' and 'n'. The index types 'c' and 'C' are discouraged because they do not provide useful information for reply purposes.

The format of the messages in the message files should follow the relevant RFC standards, with the following restriction: any "From:", "Sender:", "Control:", "Approved:" or other similar "dangerous" header lines should be ignored by the system transmitting the replies to prevent forgeries from occurring. In particular, the "From:" header should be determined from the user's login name, or some other similar means, rather than from any data supplied in the user's message.

In most cases, mail messages will contain "To:", "Subject:", "Cc:", "Bcc:" and "Reply-To:" header lines, and news messages will contain "Newsgroups:", "Subject:", "Followup-To:", "Keywords:", "Summary:" and "Reply-To:" header lines. Other optional headers (especially MIME content headers) may also be present.

The automatic addition of a signature by the generating host which receives the reply packet is discouraged.
Signatures should be added by the user's packet reading software instead, if desired.

A method for allowing replies from more than one person to be stored in the same packet was considered, but was rejected for security reasons.

The following example demonstrates the capabilities of the REPLIES file:

    R001<TAB>mail<TAB>bn
    R002<TAB>mail<TAB>bi
    R003<TAB>news<TAB>Bn
    R004<TAB>news<TAB>Bi

LIST FILE

The LIST file may be used to send a list of available message areas to the receiving system. Its format is similar to the AREAS file, with the prefix field deleted. Each line has the following form:

    area name<TAB>encoding[<TAB>description]

where "area name" is the name of the message area, "encoding" is a 2, 3 or 4 letter message, index, area kind, and subscription code, and "description" is an optional message area description. Further optional fields may be added in a future version of this specification.

The message, index, and area kind codes are the same as for the AREAS file. The subscription code has one of the following values:

    y    The user is subscribed to the message area
    n    The user is not subscribed to the message area

If this field is not present, it defaults to 'n'.

Note that the message areas in the LIST file should only be those that can be subscribed to or unsubscribed from using a request in the COMMANDS file. Private e-mail message areas will normally not appear in the list.

The following example demonstrates the capabilities of the LIST file:

    alt.flame<TAB>ucnn
    comp.bbs.misc<TAB>ucny
    comp.bbs.waffle<TAB>ucny
    comp.lang.c<TAB>ucnn<TAB>C Programming Language Discussions
    news.future<TAB>ucny<TAB>Future of USENET

SENDING COMMANDS BETWEEN SYSTEMS

The COMMANDS and ERRORS files contain information for changing the behaviour of each end of a packet stream, or for reporting errors in the execution of commands or the generation of packets. Each is a text file with LF-terminated lines.

The ERRORS file is the simplest: it consists of error messages from the program which generated the packet to report on the progress of previously executed commands.
The format of these error messages is not defined, but they should be human readable so that packet readers may present the errors to the user for perusal.

The COMMANDS file consists of a sequence of commands, one per line, which modify the behaviour of the packet processor at the other end of the packet stream. Usually these commands are sent from the packet reader to the packet generator to change the subscribed message areas, send files, etc. The names of the commands are NOT case significant, but SHOULD be sent in lower case. Any commands that are not understood by a program should be ignored.

    version n.m

This command specifies the version of this specification that the packet conforms to. For this document the version is "1.2".

    date dd mmm ccyy hh:mm:ss [zone]

The date and time when the packet was created. To prevent confusion with different countries' date formats, the date MUST always appear as "dd mmm ccyy". For example, "25 Jul 1993". This date format can be converted to local conventions if desired. "hh:mm:ss" is a 24-hour clock time value. The "zone" field is the number of hours and minutes that the timezone is offset from Greenwich Mean Time as "+HHMM" or "-HHMM". For example, US Eastern Standard Time (EST) is "-0500", and Australian Eastern Standard Time is "+1000". If the zone is omitted, it defaults to "local time", however the zone should only be omitted if there is no way to determine it.

    subscribe name

This command requests the packet generating program to subscribe to a new message area. The area name may contain spaces, but not TABs. Additional fields may be added in a future version of this specification after a separating TAB. For now, ignore anything after a TAB. This command may generate an error message if the message area does not exist, or cannot be subscribed to.

    unsubscribe name

This command requests the packet generating program to unsubscribe from a message area.
The same remarks about TABs and errors above also apply to this command.

    catchup [name]

This command requests the packet generating program to catch up on the nominated message area. That is, to mark all messages in the area as read and continue batching from the next message received. If the area name is not present, the packet generating program should catch up on all message areas.

    list [always|never]

This command requests the packet generating program to send a full list of all available message areas as a LIST file in the next packet. If the argument "always" is present, then the LIST file should be sent in every packet. The argument value "never" reverses this. For minimal compliance, "list always" should be treated as "list", and "list never" should be ignored.

    hostname string

This command specifies the name of the host or BBS the packet was generated on. It serves an informational role only. The string can be any sequence of printable ASCII characters.

    software string

This command specifies the name and version of the software which generated the packet. It serves an informational role only. The string can be any sequence of printable ASCII characters.

    sendme<TAB>area<TAB>selector[<TAB>selector[...]]

This command requests that the packet generator send a number of messages from the nominated message area. The "selector" arguments are taken from the "selector" fields in a 'c' or 'C' index file. Multiple "sendme" commands for the same message area may be present in a COMMANDS file. The maximum length for this command is 500 characters.
Note that other commands use spaces to separate arguments, but this command uses TABs.

    mail y
    mail n

This command changes whether or not private e-mail should be sent in generated packets.

    deletemail y
    deletemail n

This command changes whether or not the user's private mailbox should be deleted after being batched into a packet.

    mailindex x

Set the preferred mail index format, where 'x' is one of the values 'n', 'c', 'C' or 'i'.

    newsindex x

Set the preferred news index format, where 'x' is one of the values 'n', 'c', 'C' or 'i'.

    get filename [putname]

Request that a file on the generating side be placed into a packet and sent to the packet reader. "putname" specifies the "filename" argument for the corresponding "put" command. If "putname" is not specified, the default is to use the base name of "filename". If directory paths are specified, the separator must be '/'. It should be noted that security could be breached through the use of this command, so programs which support this command should be very careful, preferably restricting requests to a particular directory tree.

    put pktname filename

This command is usually sent in response to a "get" command, although it can be sent on its own. "pktname" specifies the name of the file in the packet which contains the requested file's contents. The "filename" argument specifies the destination file to write the contents to. Note that security could be breached with this command, so the destination filename should be checked, or restricted to a particular directory tree. It is also recommended that the user be prompted for confirmation before writing the file. If directory paths are specified in "filename", the separator must be '/'. It is recommended that the extension "FIL" be used for files in a packet which contain data sent with this command. For example, "put 001.FIL abc.zip".

    supported cmd ...

This command is usually sent from a packet generator to inform a packet reader as to which commands are supported by the generating program.
The argument is a space-separated list of command names. For example, "supported subscribe unsubscribe list", or "supported subscribe unsubscribe catchup list mail deletemail".

It is recommended that at least "subscribe", "unsubscribe" and "list" (with no arguments) be supported. Packet generators are recommended to add a "supported" line to all packets generated to inform the packet reader which commands can be used. In the absence of a "supported" line, only "subscribe", "unsubscribe" and "list" should be assumed to be supported.

If more than one command is received for the same item (e.g. "subscribe", "unsubscribe", "list", "mail", ...), then the last command in the COMMANDS file takes precedence over any previous commands.

The following example demonstrates a typical COMMANDS file sent from a packet generator:

    version 1.2
    date 25 Jul 1993 12:34:38 +1000
    hostname frobozz.domain.com
    software Fubar 1.3
    supported subscribe unsubscribe catchup list sendme get
    put 001.FIL abc.zip
    put 002.FIL def.txt

The following example demonstrates a typical COMMANDS file sent from a packet reader:

    subscribe comp.lang.c
    subscribe comp.lang.misc
    unsubscribe alt.swedish.chef.bork.bork.bork
    list
    get xyzzy.zip
    get /usr/local/lib/fubar.txt frobozz.txt

MESSAGE AREA SUMMARIES

The preceding sections have described a number of features for supporting message area summaries. This section provides greater detail.

Since some message areas, notably USENET newsgroups, can get quite large, the user may want to download a summary of a message area instead of all of the messages, and then request that messages of interest be sent at some later time for reading. Usually the summary will list the messages' subjects, authors, and other similar "header information".
Optionally, the user may request that the first few lines of the messages also be sent so that the user may peruse the beginning of the message and decide whether to retrieve the rest of the message.

This activity is supported in the following fashion in this packet format: summary information is sent in an index file of type 'c' or 'C', usually with no accompanying message file. Therefore, the message file format in the AREAS file will be set to 'i'. Each line in the index file has its "bytes" field set to 0 to indicate that the message is not present in the message file, and the "selector" field is set to some string that can be used to request the message by way of a "sendme" command. Usually this selection string will be the message number of the message on the generating host, but other values such as Message-IDs are allowable.

If the first few lines of each message are also desired, the message file format is set to something other than 'i', and the "offset" and "bytes" fields in the index file may be used to extract the trimmed-down messages for perusal. The "selector" field is once again used to request that an entire message be sent at some later time, by way of a "sendme" command.

It is possible to create a message area which contains both ordinary messages and summary messages. If the "selector" field is not present, or is zero-length, then the message should be processed in the usual way, and if the "selector" field is present and not zero-length, then it is a summary message and the "bytes" field can be used to determine if the first few lines of a message exist in the message file or not.
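The rule for telling ordinary messages from summary messages can be sketched in Python as follows. This is only an illustration: the function name classify_index_entry and the three returned labels are hypothetical, and the sketch assumes the "bytes" and "selector" fields have already been extracted from an index line (the layout of the index line itself is not repeated here).

```python
from typing import Optional


def classify_index_entry(byte_count: int, selector: Optional[str]) -> str:
    """Classify one index entry in a mixed message area.

    byte_count is the value of the "bytes" field; selector is the
    "selector" field, or None when the field is absent.
    The returned labels are hypothetical names used for illustration.
    """
    if not selector:
        # Field absent or zero-length: process the message as usual.
        return "ordinary"
    if byte_count == 0:
        # Summary with nothing stored in the message file.
        return "summary-only"
    # Summary whose first few lines are present in the message file.
    return "summary-with-preview"
```

Under this sketch, both summary cases would be offered to the user as candidates for a later "sendme" request using the entry's selector string.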
This mixture can be useful in some situations where the user wishes to download all messages less than a certain length, and download the larger messages as summaries, so that the larger messages can be explicitly requested only if the user really wants them.

MINIMAL CONFORMANCE

This section describes the minimal amount of work that a packet processor must do to be compliant with this specification.

Packet generators should be able to generate message areas for the 'b' and 'u' message formats for private and public message areas respectively, and process replies for the 'b' and 'B' message formats. For minimal conformance, index format 'n' must be supported, and if message area summaries are required, one of index formats 'c' or 'C' should be supported. It is recommended that either 'c' or 'C' be supported in all packet generators, even when message summaries are not required. If message summaries are supported, the minimal requirement is to send an index file with the message file format set to 'i'. Packet generators should support the "subscribe", "unsubscribe" and "list" commands, and also the "sendme" command if message area summaries are required.

Packet readers should be able to read all message and index formats, and generate replies for the 'b' and 'B' message formats. If message area summaries are not supported, all areas with message format 'i' should be flagged to the user as not understood. Packet readers should also be able to display the INFO and LIST files if they are present in a packet and be able to prompt the user for "subscribe" and "unsubscribe" requests to be sent to the packet generator.

FUTURE ENHANCEMENTS

The obvious enhancement that can be made is to support other message formats, especially FidoNet formats. Currently the message area file code 'q' is reserved for QWK-format messages.
This will be defined in a future version of this specification if demand warrants.

Experimentation with other formats and auxiliary files is encouraged, but please contact the author first to prevent double-ups from occurring. The author may be contacted via e-mail at rhys@cs.uq.oz.au.