I am extracting the text from an Excel cell doing some manipulation and then writing it out as a line to an XML file.
When I try and validate it using http://xmlvalidation.com/ it tells me there is a problem with a character.... The validator tells me it is "Unicode: 0x1"
I have tried pasting the character into the script and just using StringReplace - but that did not work.
I can manually fix the file - but I would prefer to handle this in the script.
Anybody have any ideas??
Unicode character - how to replace? XML validation error
Moderators: Dorian (MJT support), JRL
- andrewbarbour
- Newbie
- Posts: 14
- Joined: Thu Aug 26, 2010 7:09 pm
- Marcus Tettmar
- Site Admin
- Posts: 7393
- Joined: Thu Sep 19, 2002 3:00 pm
- Location: Dorset, UK
- Contact:
Re: Unicode character - how to replace? XML validation error
How are you writing it out? Can you share your code?
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
- andrewbarbour
- Newbie
- Posts: 14
- Joined: Thu Aug 26, 2010 7:09 pm
Re: Unicode character - how to replace? XML validation error
Thanks Marcus....
here is an extract of the relevant lines.
A few caveats:
- it is probably way to complicated - as I am a business guy - not a coder
- I know I should use subroutines - I tend to write my first proof of concept in a linear format - then start fresh and write it tight
The following are some of the character related issues I am having problem extracting from the Excel file.
RefID 404 - Authors
Non printing character between n and T of MartinT
accented character a in Fernandez
accented n in Penas
G. Bodes-Pardo, D. Pecos-MartinT. Gallego-Izquierdo, J. Salom-Moreno, C. Fernández-de-Las-Peñas, R. Ortega-Santiago
RefID 404 - Abstract
The +/- symbol does not get picked up properly
SD age, 39 ± 13 years),
RefID 505 - Authors
accented u and s
F. Büyükyılmaz, T. Aştı
Ref1077 - Adverse Events
Character between "P" and "0.73"
test p 0.73).
RefID 3571 - Authors
accented g
N. Yağci, F. Uygur and N. Bek
here is an extract of the relevant lines.
A few caveats:
- it is probably way to complicated - as I am a business guy - not a coder
- I know I should use subroutines - I tend to write my first proof of concept in a linear format - then start fresh and write it tight
The following are some of the character related issues I am having problem extracting from the Excel file.
RefID 404 - Authors
Non printing character between n and T of MartinT
accented character a in Fernandez
accented n in Penas
G. Bodes-Pardo, D. Pecos-MartinT. Gallego-Izquierdo, J. Salom-Moreno, C. Fernández-de-Las-Peñas, R. Ortega-Santiago
RefID 404 - Abstract
The +/- symbol does not get picked up properly
SD age, 39 ± 13 years),
RefID 505 - Authors
accented u and s
F. Büyükyılmaz, T. Aştı
Ref1077 - Adverse Events
Character between "P" and "0.73"
test p 0.73).
RefID 3571 - Authors
accented g
N. Yağci, F. Uygur and N. Bek
Code: Select all
Let>R=R-2
Let>ColToExtract=R
//Select extract file
Input>varDataExtract,Browse to the extracted data file,NA
Input>strURL,Which column contains a URL? If none - put "0",0
If>varDataExtract=NA,END
XLOpen>%varDataExtract%,0,xlBook2
//Go through the data extract file and insert each row into the XML
Let>R=2
Let>ID=0
Label>DataExtractLoop
Let>ID=ID+1
Let>C=1
XLGetCell>xlBook2,Sheet1,R,C,strExt
Message>Processing line %R% of file.
If>strExt=##END##,ENDDataExtractLoop
LTrim>strExt,strExt
RTrim>strExt,strExt
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Reference>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%>%strExt%</Head%C%>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<id>%ID%</id>
Label>WithinItemLoop
Let>C=C+1
If>C>ColToExtract,NextRow
XLGetCell>xlBook2,Sheet1,R,C,strExt
//Get rid of leading and trailing spaces
LTrim>strExt,strExt
RTrim>strExt,strExt
//Clean up some characters that cause issues in XML
StringReplace>strExt,",",strExt
StringReplace>strExt,<,<,strExt
StringReplace>strExt,>,>,strExt
StringReplace>strExt,',',strExt
//Tried the following replace for non-printing character - did not work
//StringReplace>strExt,,,strExt
// Check to see if currently at the URL column
If>C=StrURL,WriteURL,WriteNonURL
Label>WriteNonURL
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%>%strExt%</Head%C%>
//ADDED THE FOLLOWING FOR URLS
goto>WithinItemLoop
Label>WriteURL
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,<Head%C%><![CDATA[<a href="%strExt%">Link</a>]]></Head%C%>
goto>WithinItemLoop
Label>NextRow
Let>R=R+1
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Reference>
goto>DataExtractLoop
Label>ENDDataExtractLoop
// Close off the XML Report
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Intro>
WriteLn>%varPath%\%varShortName%.txt,nWLNRes,</Refreport>
//rename file as XML
copyfile>%varPath%\%varShortName%.txt,%varPath%\%varShortName%.xml
Label>END
- Dorian (MJT support)
- Automation Wizard
- Posts: 1378
- Joined: Sun Nov 03, 2002 3:19 am
- Contact:
Re: Unicode character - how to replace? XML validation error
Hi Andrew,
I just replied to your ticket, but for the benefit of others, have attached a test script which is successfully using StringReplace to replace those Unicode characters, including the non printing ones. I simply copied and pasted the characters from your Excel file into the script editor.
Kind regards,
Dorian Ellis
MJT Support.
I just replied to your ticket, but for the benefit of others, have attached a test script which is successfully using StringReplace to replace those Unicode characters, including the non printing ones. I simply copied and pasted the characters from your Excel file into the script editor.
Kind regards,
Dorian Ellis
MJT Support.
Yes, we have a Custom Scripting Service. Message me or go here
- andrewbarbour
- Newbie
- Posts: 14
- Joined: Thu Aug 26, 2010 7:09 pm
Re: Unicode character - how to replace? XML validation error
Works like a charm! Thanks.