Which characters need to be escaped in XML documents?

big-blog 2020. 9. 28. 09:29

Which characters must be escaped in XML documents, or where can I find such a list?


If you use an appropriate class or library, it will do the escaping for you. Many XML problems are caused by building documents with string concatenation.
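As a minimal sketch of that advice, assuming Python and its standard-library `xml.etree.ElementTree` module (the `note` element and `title` attribute are made-up names for illustration), the serializer does the escaping when the tree is written out:

```python
import xml.etree.ElementTree as ET

def build_note(text: str, title: str) -> str:
    """Serialize a <note title="..."> element; the library escapes the values."""
    note = ET.Element("note", {"title": title})  # hypothetical element/attribute names
    note.text = text
    return ET.tostring(note, encoding="unicode")

xml = build_note("a < b & c", 'He said "hi"')
print(xml)

# Round trip: parsing the serialized string gives back the original values.
parsed = ET.fromstring(xml)
print(parsed.text, parsed.get("title"))
```

No escaping code is written by hand here, which is the point: the values go in raw and come back unchanged after parsing.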

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
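If escaping by hand cannot be avoided, the five replacements can be sketched with Python's standard `xml.sax.saxutils` module. Its `escape` function handles `&`, `<` and `>` by default; the two quote characters are passed as extra entities. The helper names `escape_all` and `unescape_all` are made up for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; quotes are supplied explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}
QUOTES_REVERSED = {"&quot;": '"', "&apos;": "'"}

def escape_all(text: str) -> str:
    """Escape all five special characters (hypothetical helper)."""
    return escape(text, QUOTES)

def unescape_all(text: str) -> str:
    """Reverse escape_all (hypothetical helper)."""
    return unescape(text, QUOTES_REVERSED)

sample = '<a b="c">d & \'e\'</a>'
print(escape_all(sample))
print(unescape_all(escape_all(sample)) == sample)
```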

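Which characters have to be escaped depends on where the special character is used.

You can check the examples with the W3C Markup Validation Service.

Text

The safe way is to escape all five characters in text. However, the three characters ", ' and > do not need to be escaped in text: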

<?xml version="1.0"?>
<valid>"'></valid>
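As a rough check of this claim, assuming Python's standard `xml.etree.ElementTree` parser, the document above parses as is, while a literal `<` or `&` in text is rejected. The helper `is_well_formed` is a made-up name for this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

ok = ET.fromstring('<?xml version="1.0"?>\n<valid>"\'></valid>')
print(repr(ok.text))                           # the three characters, unescaped

print(is_well_formed("<invalid><</invalid>"))  # literal < in text
print(is_well_formed("<invalid>&</invalid>"))  # literal & in text
```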

Attributes

The safe way is to escape all five characters in attributes. However, the > character does not need to be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character does not need to be escaped in an attribute if the attribute is quoted with ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " character does not need to be escaped in an attribute if the attribute is quoted with ':

<?xml version="1.0"?>
<valid attribute='"'/>
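The quoting rules above can be left to a library. As a sketch, assuming Python's standard `xml.sax.saxutils.quoteattr`, the function picks the quote style that needs the least escaping and only falls back to `&quot;` when the value contains both kinds of quote. It also takes the safe route and escapes `>`, although that is not required.

```python
from xml.sax.saxutils import quoteattr

values = ["it's", 'say "hi"', 'both \' and "', "a > b"]

for value in values:
    # quoteattr returns the value already wrapped in the quotes it chose.
    print(f"<valid attribute={quoteattr(value)}/>")
```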

Comments

None of the five special characters should be escaped in comments:

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

None of the five special characters should be escaped in CDATA sections:

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
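The reason escaping is wrong inside CDATA is that references are not expanded there. A small sketch, assuming Python's standard `xml.etree.ElementTree` parser, shows both the literal characters and an (unwanted) escaped form coming back exactly as written:

```python
import xml.etree.ElementTree as ET

# The five special characters, written literally inside a CDATA section.
literal = ET.fromstring('<valid><![CDATA["\'<>&]]></valid>')
print(literal.text)

# Entity references inside CDATA are NOT expanded; they stay as plain text.
escaped = ET.fromstring("<valid><![CDATA[&lt;&amp;]]></valid>")
print(escaped.text)
```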

Processing instructions

None of the five special characters should be escaped in XML processing instructions:

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>

XML vs. HTML

HTML has its own set of escape codes, which covers many more characters.


Perhaps this will help.

List of XML and HTML character entity references:

In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference and a character entity reference. This article lists the character entity references that are valid in HTML and XML documents.

That article lists the following five predefined XML entities:

quot  "
amp   &
apos  '
lt    <
gt    >

According to the specifications of the World Wide Web Consortium (W3C), there are 5 characters that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all the other cases, these characters must be replaced either using the corresponding entity or the numeric reference according to the following table:

Original character   XML entity replacement   XML numeric replacement
<                    &lt;                     &#60;
>                    &gt;                     &#62;
"                    &quot;                   &#34;
&                    &amp;                    &#38;
'                    &apos;                   &#39;
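To illustrate that the two replacement columns are interchangeable, here is a small sketch assuming Python's standard `xml.etree.ElementTree` parser. It feeds each named entity and each numeric reference from the table to the parser and compares the result with the original character. The helper `decode` is a made-up name.

```python
import xml.etree.ElementTree as ET

# (character, named entity, numeric reference) rows copied from the table.
ROWS = [
    ("<", "&lt;", "&#60;"),
    (">", "&gt;", "&#62;"),
    ('"', "&quot;", "&#34;"),
    ("&", "&amp;", "&#38;"),
    ("'", "&apos;", "&#39;"),
]

def decode(reference: str) -> str:
    """Let the XML parser expand one reference placed in element content."""
    return ET.fromstring(f"<r>{reference}</r>").text

for char, named, numeric in ROWS:
    print(char, decode(named) == char, decode(numeric) == char)
```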

Notice that the aforementioned entities can also be used in HTML, with the exception of &apos;, which was introduced with XHTML 1.0 and is not declared in HTML 4. For this reason, and to ensure backward compatibility, the XHTML specification recommends the use of &#39; instead.
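Python's standard `html.escape` is one example of a library that avoids `&apos;` in this way. As far as I know it emits the hexadecimal reference `&#x27;` rather than the decimal `&#39;`; both denote the same character. The sketch below also checks that the output is still acceptable as XML content, using `xml.etree.ElementTree` as the parser.

```python
import html
import xml.etree.ElementTree as ET

value = "Tom's \"<b>\" & co"

# quote=True also escapes " and ', using a numeric reference for the apostrophe.
escaped = html.escape(value, quote=True)
print(escaped)

# The escaped string contains no &apos;, and an XML parser decodes it back.
round_trip = ET.fromstring(f"<r>{escaped}</r>").text
print(round_trip == value)
```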


Escaping is different for element content (the text between tags) and for attribute values.

For element content:

 < &lt;
 > &gt; (only for compatibility, read below)
 & &amp;

For attribute values, in addition to < and &:

" &quot;
' &apos;
http://www.w3.org/TR/2008/REC-xml-20081126/#syntax

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; ", and must, for compatibility, be escaped using either " &gt; " or a character reference when it appears in the string " ]]> " in content, when that string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as " &apos; ", and the double-quote character (") as " &quot; ".


New, simplified answer to an old, commonly asked question...

Simplified XML Escaping (prioritized, 100% complete)

  1. Always (90% important to remember)

    • Escape < as &lt; unless < is starting a <tag/>.
    • Escape & as &amp; unless & is starting an &entity;.
  2. Attribute Values (9% important to remember)

    • attr=" 'Single quotes' are ok within double quotes."
    • attr=' "Double quotes" are ok within single quotes.'
    • Escape " as &quot; and ' as &apos; otherwise.
  3. Comments, CDATA, and Processing Instructions (0.9% important to remember)

    • <!-- Within comments --> nothing has to be escaped but no -- strings are allowed.
    • <![CDATA[ Within CDATA ]]> nothing has to be escaped, but no ]]> strings are allowed.
    • <?PITarget Within PIs ?> nothing has to be escaped, but no ?> strings are allowed.
  4. Esoterica (0.1% important to remember)

    • Escape ]]> as ]]&gt; unless ]]> is ending a CDATA section.
      (This rule applies to character data in general – even outside a CDATA section.)
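Rules 1, 3 and 4 can be sketched in a few lines of Python. Both helpers are made-up names. Splitting `]]>` across two adjacent CDATA sections is a common workaround rather than something stated in the list above; `xml.etree.ElementTree` is used only to check the round trip.

```python
import xml.etree.ElementTree as ET

def escape_content(text: str) -> str:
    """Minimal escaping for element content: &, < and the string ]]>."""
    # & must be replaced first, because the other replacements introduce &.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("]]>", "]]&gt;"))

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ]]> across two sections (workaround)."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

sample = "a ]]> b < c & d"
for encoded in (escape_content(sample), wrap_cdata(sample)):
    print(encoded)
    print(ET.fromstring(f"<r>{encoded}</r>").text == sample)
```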

In addition to the commonly known five characters [<, >, &, ", '], I would also escape the vertical tab character (0x0B). It is valid UTF-8, but not valid XML 1.0, and even many libraries (including libxml2) miss it and silently output invalid XML.
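A hedged sketch of how such characters might be handled in Python: XML 1.0 also rejects a numeric reference to the vertical tab, so this example removes characters outside the XML 1.0 `Char` production instead of escaping them. Whether dropping them is acceptable depends on the application. The helper names are made up, and `xml.etree.ElementTree` is used only to show the parser's reaction.

```python
import re
import xml.etree.ElementTree as ET

# Everything NOT matched by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    "[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml10(text: str) -> str:
    """Drop characters that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML10.sub("", text)

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

dirty = "a\x0bb"  # contains a vertical tab
print(is_well_formed(f"<r>{dirty}</r>"))                       # literal form
print(is_well_formed("<r>a&#11;b</r>"))                        # numeric reference
print(is_well_formed(f"<r>{strip_invalid_xml10(dirty)}</r>"))  # after stripping
```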


Abridged from: http://en.wikipedia.org/wiki/XML#Escaping

There are five predefined entities:

&lt; represents "<"
&gt; represents ">"
&amp; represents "&"
&apos; represents '
&quot; represents "

"All permitted Unicode characters may be represented with a numeric character reference. " For example:

&#20013;
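As a small illustration, assuming Python's standard `xml.etree.ElementTree`, the reference above (decimal 20013, hexadecimal 4E2D) decodes to a single character, and serializing to an ASCII-only encoding turns that character back into a numeric reference:

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal forms of the same code point.
root = ET.fromstring("<r>&#20013;&#x4E2D;</r>")
print(root.text)  # 中中

# An ASCII-only serialization cannot hold the character directly,
# so the serializer writes numeric character references instead.
ascii_bytes = ET.tostring(root, encoding="us-ascii")
print(ascii_bytes)
```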

Most of the control characters and some other Unicode ranges are specifically excluded, meaning (I think) they can't occur either escaped or literally:

http://en.wikipedia.org/wiki/Valid_characters_in_XML


It depends on the context. For content, it is < and &, and ]]> (a string of three characters rather than a single one). For attribute values, it is < and &, plus " and '. For CDATA, it is ]]>.


Only < and & are required to be escaped if they are to be treated as character data and not markup:

http://www.w3.org/TR/xml11/#syntax

Reference URL: https://stackoverflow.com/questions/1091945/what-characters-do-i-need-to-escape-in-xml-documents