
Ticket #16 (closed Bug: Fixed)

Opened 8 years ago

Last modified 42 years ago

Collection file parsing fatal error

Reported by: pjlehtim Owned by: bflorat
Priority: 8, high Milestone:
Component: Core Version: R 0.1
Keywords: Cc:

Description

After the first run I shut down the software, restarted
it, and encountered the following problem:
XML character:  .
        at org.jajuk.base.Collection.fatalError(Unknown Source)
        at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1106)
        at org.apache.crimson.parser.InputEntity.getc(InputEntity.java:360)
        at org.apache.crimson.parser.Parser2.getc(Parser2.java:3203)
        at org.apache.crimson.parser.Parser2.parseLiteral(Parser2.java:879)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1583)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
        at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
        at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:667)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:223)
        at org.jajuk.base.Collection.load(Unknown Source)
        at org.jajuk.Main.main(Unknown Source)
2004/03/23 00:46:02 [ERROR] (005) Collection file parsing error
org.jajuk.util.error.JajukException: nullnull
        at org.jajuk.base.Collection.load(Unknown Source)
        at org.jajuk.Main.main(Unknown Source)
2004/03/23 00:46:02 [DEBUG] Exit with code: 1

System: MDK Linux 10.0 CE, Sun JRE 1.4.
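A minimal sketch to reproduce this kind of failure outside Jajuk, assuming a current JDK: it uses the JDK's default SAX parser rather than Crimson, and `StandardCharsets` needs Java 7 or later. `IllegalCharRepro` and `failsToParse` are hypothetical names, and the `<collection>`/`<file>` markup only imitates a collection file; it is not Jajuk's actual format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class IllegalCharRepro {
    // Returns true if the SAX parser reports a fatal (well-formedness) error.
    static boolean failsToParse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            // DefaultHandler.fatalError rethrows the SAXParseException
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        // U+0001 is outside the XML 1.0 Char production, so a conforming
        // parser must treat it as a fatal error.
        String bad = "<collection><file name=\"a" + (char) 0x01 + "b\"/></collection>";
        String good = "<collection><file name=\"ab\"/></collection>";
        System.out.println(failsToParse(bad));  // true
        System.out.println(failsToParse(good)); // false
    }
}
```

A single control character written unescaped into the file is enough to make the whole load fail, which matches the behavior reported above.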


Change History

Changed 8 years ago by bflorat

Logged In: YES 
user_id=363565

Fixed in 0.1.1

Changed 8 years ago by bflorat


There are still problems with some special characters like
, which are invalid under JRE 1.5 (it was OK with
JRE 1.4.2, but 1.5 is right; I have to fix that).

From the XML specification:

A parsed entity contains text, a sequence of characters,
which may represent markup or character data. A character is
an atomic unit of text as specified by ISO/IEC 10646
[ISO/IEC 10646]. Legal characters are tab, carriage return,
line feed, and the legal graphic characters of Unicode and
ISO/IEC 10646. The use of "compatibility characters", as
defined in section 6.8 of [Unicode], is discouraged.

Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

The mechanism for encoding character code points into bit
patterns may vary from entity to entity. All XML processors
must accept the UTF-8 and UTF-16 encodings of 10646; the
mechanisms for signaling which of the two is in use, or for
bringing other encodings into play, are discussed later, in
"4.3.3 Character Encoding in Entities". 


I am looking for an API to do the job...
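The Char production quoted above translates directly into a predicate. This is only a sketch; `XmlChars.isXmlChar` is a hypothetical helper, not an existing API. It takes an `int` code point rather than a Java `char`, because characters in [#x10000-#x10FFFF] are stored as two `char`s (a surrogate pair) in a Java `String` and cannot be tested one `char` at a time.

```java
class XmlChars {
    // XML 1.0, production [2]:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The surrogate block (#xD800-#xDFFF) and #xFFFE/#xFFFF fall in the gaps.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar('A'));     // true
        System.out.println(isXmlChar(0x1));     // false
        System.out.println(isXmlChar(0xFFFE));  // false
        System.out.println(isXmlChar(0x1F600)); // true
    }
}
```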

Changed 8 years ago by bflorat


Should be definitely fixed in 0.1.2 (I added more
checks, see below), but it is pretty hard to test these
strange cases. Please tell me if the problem still occurs.

public static String formatXML(String s){
	String sOut = s.replaceAll("&","&amp;"); //$NON-NLS-1$ //$NON-NLS-2$
	sOut = sOut.replaceAll("\'","&apos;"); //$NON-NLS-1$ //$NON-NLS-2$
	sOut = sOut.replaceAll("\"","&quot;"); //$NON-NLS-1$ //$NON-NLS-2$
	sOut = sOut.replaceAll("<","&lt;"); //$NON-NLS-1$ //$NON-NLS-2$
	sOut = sOut.replaceAll(">","&gt;"); //$NON-NLS-1$ //$NON-NLS-2$
	StringBuffer sbOut = new StringBuffer(sOut);
	/* Transform String to XML-valid characters. XML 1.0 specs;
	 * Character Range
	 * [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
	 * any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
	for (int i=0;i<sbOut.length();i++){
		char c = sbOut.charAt(i);
		if (c!='&' && c!='.' && c!='_' && c!='-' && c!=';' && c!='#' && c!=' ' && !Character.isLetterOrDigit(c)){
			//transform all special characters except some very common ones like "." to save space
			//remove this char; it will be replaced by the XML form &#x...; or by a space if it is invalid
			sbOut.deleteCharAt(i);
			if ( (c =='\u0009' || (c>='\u0020' && c<='\uD7FF') || (c>='\uE000' && c<='\uFFFD')) && (c!='\uFFFE' && c!='\uFFFF')){
				//some characters described in the XML specs, like xA, xD and x10000 and over, are not tested
				//because java can't handle them, so we can't get these chars in the incoming string
				sbOut.insert(i,"&#x");
				sbOut.insert(i+3,Integer.toHexString((int)c)+";");
			}
			else{
				sbOut.insert(i,' '); //replace invalid character by a space
			}
		}
	}
	return sbOut.toString();
}
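For comparison, a sketch of a code-point-aware variant (requires Java 5 or later for `String.codePointAt`, `Character.charCount` and `StringBuilder.appendCodePoint`). `XmlEscape.escapeXml` is a hypothetical helper, not Jajuk's code. Unlike `formatXML`, it walks the string by code point, so characters at #x10000 and above are kept instead of each surrogate half being replaced by a space, and it writes other valid characters literally instead of as `&#x...;`, which assumes the file is written with a UTF-8 or UTF-16 writer. Invalid characters still become a space, as in `formatXML`.

```java
class XmlEscape {
    // Hypothetical helper (not Jajuk's formatXML): escapes the five predefined
    // XML entities and replaces characters outside the XML 1.0 Char production
    // with a space. Other characters are written literally, so the result must
    // be saved with an encoding that can represent them (e.g. UTF-8).
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);    // a valid surrogate pair yields one code point
            i += Character.charCount(cp); // advance past 1 or 2 chars
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (isXmlChar(cp)) {
                        sb.appendCodePoint(cp);
                    } else {
                        sb.append(' ');   // invalid (including lone surrogates) -> space
                    }
            }
        }
        return sb.toString();
    }

    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Numeric literals are used on purpose: Java translates Unicode escapes
    // before lexing, so a Unicode-escaped line feed in a char literal
    // would not compile.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Tom & Jerry <live>")); // Tom &amp; Jerry &lt;live&gt;
    }
}
```

One likely reason `formatXML` could not test #xA and #xD directly is that Java translates Unicode escapes before compiling, so a Unicode-escaped line feed or carriage return inside a char literal is a compile error; numeric comparisons such as `cp == 0xA` avoid the problem.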

Changed 8 years ago by bflorat

  • status changed from assigned to closed
