PUBLIC unsigned char* INTL_FormatNNTPXPATInNonRFC1522Format ( int16 winCharSetID, unsigned char* searchString )
Returns a (hacky) XPAT pattern for an NNTP server, for searching pre-RFC 1522 message headers.
Returns a (hacky) XPAT pattern for an NNTP server, for searching pre-RFC 1522 message headers. This is a hacky function that tries to work around another HACK!!!

The problem it tries to solve is searching on an NNTP (Internet newsgroup) server. Unfortunately, NNTP has no command for searching non-ASCII text. The only functionality in the NNTP protocol we can use is the XPAT extension (see ftp://ds.internic.net/internet-drafts/draft-ietf-nntpext-imp-01.txt or ftp://ds.internic.net/internet-drafts/draft-barber-nntp-imp-07.txt ). XPAT uses wildmat regular expressions (see http://oac.hsc.uth.tmc.edu/oac_sysadmin/services/INN/man/wildmat.3.html for details) to provide string matching. Unfortunately, wildmat is not designed to support non-ASCII text. It works for English headers but not for headers in other languages such as Japanese, French, or German. The problem is that XPAT/wildmat cannot deal with (1) ISO-2022-xx encodings or (2) RFC 1522 headers.

To work around this limitation in the protocol, we put together this function to address the first problem as far as we can. It takes one search string and returns an XPAT pattern that can then be sent to the NNTP XPAT command as the search argument. However, there are some limitations:

(1) It may cause the NNTP server to return more messages than it should, because XPAT does not respect multibyte character boundaries when it tries to match the string. To improve this in the future, the client should double-check the headers after it receives the messages from the server and narrow the results down to the correct matches.

(2) The pattern it generates won't match RFC 1522 headers, so it may return fewer messages than it should. This is because more than one XPAT pattern could match the string in the case of RFC 1522 headers. To improve this in the future, the client should send several possible XPAT patterns (along with the pattern returned by this function), collect the results, and then double-check them on the client side.
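Limitation (1) can be illustrated with a small sketch. It is not part of this library: the function names are hypothetical, the byte values are purely illustrative, and the boundary-aware check assumes every character is exactly two bytes wide (as in the two-byte portion of an ISO-2022 encoded header). The first function reports a hit at any byte offset, roughly the way a matcher unaware of multibyte text behaves. The second shows the kind of client-side re-check suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-oriented search: reports a hit at ANY byte offset, which is
 * roughly how a matcher unaware of multibyte text behaves. */
int byte_level_match(const unsigned char *hay, size_t hay_len,
                     const unsigned char *needle, size_t needle_len)
{
    size_t i;

    assert(hay != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > hay_len)
        return 0;
    for (i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return 1;
    return 0;
}

/* Client-side re-check: only accepts hits that start on a character
 * boundary. ASSUMPTION: every character is exactly 2 bytes wide. */
int boundary_aware_match(const unsigned char *hay, size_t hay_len,
                         const unsigned char *needle, size_t needle_len)
{
    size_t i;

    assert(hay != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > hay_len)
        return 0;
    for (i = 0; i + needle_len <= hay_len; i += 2)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return 1;
    return 0;
}
```

For example, with the two-character header bytes 0x30 0x21 0x42 0x30 and the search bytes 0x21 0x42, byte_level_match reports a hit even though the search bytes straddle two characters, while boundary_aware_match rejects it.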
Of course, improving the NNTP protocol itself is the real solution. But the improvements stated above are still needed for servers that support only the current NNTP protocol.

This function (1) converts the text from the encoding specified by the argument into the encoding used in the corresponding Internet newsgroup, (2) strips out the leading or trailing ISO-2022 escape sequence if present, (3) escapes the wildmat special characters (any character that is not in 0-9, a-z, A-Z), and returns the result.
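Steps (2) and (3) can be sketched roughly as follows. This is not the actual implementation: all function names here are hypothetical, step (1) (the character set conversion) is left out entirely, and the sketch allocates with malloc rather than the library's own allocator. It also assumes an escape sequence has the general ISO 2022 shape of ESC, zero or more intermediate bytes (0x20-0x2F), and one final byte (0x30-0x7E).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ESC 0x1B

int sketch_is_intermediate(unsigned char c) { return c >= 0x20 && c <= 0x2F; }
int sketch_is_final(unsigned char c)        { return c >= 0x30 && c <= 0x7E; }

int sketch_is_ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Step (2): skip one leading and one trailing escape sequence, if present.
 * Returns the start of the remaining bytes and updates *len. */
const unsigned char *sketch_strip_escapes(const unsigned char *s, size_t *len)
{
    size_t n, i;

    assert(s != NULL && len != NULL);
    n = *len;

    /* Leading: ESC, intermediates, final. */
    if (n > 0 && s[0] == SKETCH_ESC) {
        i = 1;
        while (i < n && sketch_is_intermediate(s[i]))
            i++;
        if (i < n && sketch_is_final(s[i])) {
            s += i + 1;
            n -= i + 1;
        }
    }

    /* Trailing: walk back over the final byte and any intermediates,
     * then require an ESC in front of them. */
    if (n >= 2 && sketch_is_final(s[n - 1])) {
        i = n - 1;
        while (i > 0 && sketch_is_intermediate(s[i - 1]))
            i--;
        if (i > 0 && s[i - 1] == SKETCH_ESC)
            n = i - 1;
    }

    *len = n;
    return s;
}

/* Step (3): put a backslash before every byte that is not 0-9, a-z, A-Z.
 * Returns a NUL-terminated buffer the caller releases with free(). */
unsigned char *sketch_escape_wildmat(const unsigned char *s, size_t len)
{
    unsigned char *out, *p;
    size_t i;

    assert(s != NULL);
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;
    p = out;
    for (i = 0; i < len; i++) {
        if (!sketch_is_ascii_alnum(s[i]))
            *p++ = '\\';
        *p++ = s[i];
    }
    *p = '\0';
    return out;
}

/* Steps (2) and (3) combined; step (1) is assumed to have run already. */
unsigned char *sketch_xpat_pattern(const unsigned char *s, size_t len)
{
    const unsigned char *body = sketch_strip_escapes(s, &len);
    return sketch_escape_wildmat(body, len);
}
```

Given the bytes ESC $ B F | K \ ESC ( B, the sketch drops both escape sequences and escapes the two non-alphanumeric bytes, leaving F\|K\\ as the pattern.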
- The pattern that should be sent to the NNTP XPAT command for searching non-ASCII headers. The caller needs to free it by calling XP_FREE when the result is no longer needed.
- winCharSetID - Specifies the encoding of searchString.
searchString - Specifies the string to be searched for through the NNTP XPAT command.
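As a rough illustration of how a returned pattern might be placed on the wire, the sketch below formats an XPAT command line of the form "XPAT header range pattern". The helper name, the header name, and the article range are assumptions made for the example, not part of this library. The pattern argument stands in for the string returned by this function, which the caller would still release with XP_FREE afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "XPAT <header> <range> <pattern>\r\n" into buf.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if buf is too small. */
int sketch_build_xpat_command(char *buf, size_t buf_size,
                              const char *header, const char *range,
                              const char *pattern)
{
    int n;

    assert(buf != NULL && header != NULL && range != NULL && pattern != NULL);
    n = snprintf(buf, buf_size, "XPAT %s %s %s\r\n", header, range, pattern);
    if (n < 0 || (size_t)n >= buf_size)
        return -1;
    return n;
}
```

A caller might pass "Subject" as the header and "1-100" as the range. The -1 return lets it retry with a larger buffer rather than send a truncated command.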
generated by doc++