Normalization in DOM parsing with Java: how does it work?


big-blog 2020. 4. 16. 08:22


I saw the line below in the DOM parser code in this tutorial.

doc.getDocumentElement().normalize();

Why do we do this normalization?
I read the docs, but I could not understand a word.

Puts all Text nodes in the full depth of the sub-tree underneath this Node

So can someone show me (preferably with a picture) what this tree looks like?

Can anyone explain why normalization is needed?
What happens if we don't normalize?

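The rest of the sentence is:

where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.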
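
To make that sentence concrete, here is a minimal sketch, assuming the JDK's default DocumentBuilderFactory settings (in particular, CDATA sections are kept as separate nodes). The class name SeparatorDemo, its helper methods, and the sample XML are made up for illustration. It shows that a CDATA section keeps the Text nodes on either side of it apart, while two Text nodes that sit directly next to each other are merged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SeparatorDemo {

    /** Parses the XML string and returns its root element (checked exceptions are wrapped). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    /** Lists the direct children of an element as "TYPE:value" strings. */
    static List<String> children(Element element) {
        List<String> out = new ArrayList<>();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            String type = n.getNodeType() == Node.TEXT_NODE ? "TEXT"
                    : n.getNodeType() == Node.CDATA_SECTION_NODE ? "CDATA"
                    : "OTHER";
            out.add(type + ":" + n.getNodeValue());
        }
        return out;
    }

    /** Builds text, CDATA, text, text under <foo>, optionally normalizes, and lists the children. */
    static List<String> demo(boolean normalize) {
        Element foo = parseRoot("<foo>a<![CDATA[b]]>c</foo>");
        // Appending a second Text node right after "c" creates two adjacent Text nodes.
        foo.appendChild(foo.getOwnerDocument().createTextNode("d"));
        if (normalize) {
            foo.normalize();
        }
        return children(foo);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // [TEXT:a, CDATA:b, TEXT:c, TEXT:d]
        System.out.println(demo(true));  // [TEXT:a, CDATA:b, TEXT:cd]
    }
}
```

"a" and "cd" stay separate after normalization because the CDATA section between them counts as structure; only "c" and "d", which were truly adjacent, become one node.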

This basically means that the following XML element

<foo>Hello 
wor
ld</foo>

could be represented like this in a denormalized node:

Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"

When normalized, the node will look like this (line breaks inside the text are left out for readability):

Element foo
    Text node: "Hello world"

The same goes for attributes, such as <foo bar="Hello world"/>, for comments, and so on.
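
The denormalized tree above can be reproduced by hand, which also answers what happens without normalization. The following is a small sketch, not taken from the tutorial: the class name NormalizeDemo and its methods are hypothetical, and the four Text nodes are created programmatically rather than by a parser.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NormalizeDemo {

    /** Builds <foo> with four separate Text children, mimicking the denormalized tree. */
    static Element buildDenormalizedFoo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element foo = doc.createElement("foo");
        doc.appendChild(foo);
        foo.appendChild(doc.createTextNode(""));
        foo.appendChild(doc.createTextNode("Hello "));
        foo.appendChild(doc.createTextNode("wor"));
        foo.appendChild(doc.createTextNode("ld"));
        return foo;
    }

    /** Returns the values of foo's child nodes, before or after normalize(). */
    static List<String> textChildren(boolean normalize) {
        Element foo = buildDenormalizedFoo();
        if (normalize) {
            foo.normalize(); // merges adjacent Text nodes and removes the empty one
        }
        List<String> values = new ArrayList<>();
        for (Node n = foo.getFirstChild(); n != null; n = n.getNextSibling()) {
            values.add(n.getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(textChildren(false)); // [, Hello , wor, ld]
        System.out.println(textChildren(true));  // [Hello world]
    }
}
```

Without normalization, code that reads only foo.getFirstChild().getNodeValue() would get the empty string here instead of the full text. After normalization, the single remaining Text node holds "Hello world".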

As an extension to @JBNizet's answer for more technical users, here is what the implementation of the org.w3c.dom.Node interface in com.sun.org.apache.xerces.internal.dom.ParentNode looks like, to give you an idea of how it actually works.

public void normalize() {
    // No need to normalize if already normalized.
    if (isNormalized()) {
        return;
    }
    if (needsSyncChildren()) {
        synchronizeChildren();
    }
    ChildNode kid;
    for (kid = firstChild; kid != null; kid = kid.nextSibling) {
        kid.normalize();
    }
    isNormalized(true);
}

It traverses all the nodes recursively and calls kid.normalize() on each child.
This mechanism is overridden in org.apache.xerces.dom.ElementImpl:

public void normalize() {
     // No need to normalize if already normalized.
     if (isNormalized()) {
         return;
     }
     if (needsSyncChildren()) {
         synchronizeChildren();
     }
     ChildNode kid, next;
     for (kid = firstChild; kid != null; kid = next) {
         next = kid.nextSibling;

         // If kid is a text node, we need to check for one of two
         // conditions:
         //   1) There is an adjacent text node
         //   2) There is no adjacent text node, but kid is
         //      an empty text node.
         if ( kid.getNodeType() == Node.TEXT_NODE )
         {
             // If an adjacent text node, merge it with kid
             if ( next!=null && next.getNodeType() == Node.TEXT_NODE )
             {
                 ((Text)kid).appendData(next.getNodeValue());
                 removeChild( next );
                 next = kid; // Don't advance; there might be another.
             }
             else
             {
                 // If kid is empty, remove it
                 if ( kid.getNodeValue() == null || kid.getNodeValue().length() == 0 ) {
                     removeChild( kid );
                 }
             }
         }

         // Otherwise it might be an Element, which is handled recursively
         else if (kid.getNodeType() == Node.ELEMENT_NODE) {
             kid.normalize();
         }
     }

     // We must also normalize all of the attributes
     if ( attributes!=null )
     {
         for( int i=0; i<attributes.getLength(); ++i )
         {
             Node attr = attributes.item(i);
             attr.normalize();
         }
     }

     // changed() will have occurred when the removeChild() was done,
     // so does not have to be reissued.

     isNormalized(true);
 }
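
Because both methods recurse into child elements, a single call on the document element is enough to reach Text nodes at any depth. Here is a short sketch of that effect, using a made-up three-level tree and a hypothetical class name, RecursiveNormalizeDemo:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RecursiveNormalizeDemo {

    /**
     * Builds root > middle > leaf, puts two adjacent Text nodes into leaf,
     * optionally normalizes from the root, and returns leaf's child count.
     */
    static int leafChildCount(boolean normalizeFromRoot) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element middle = doc.createElement("middle");
        Element leaf = doc.createElement("leaf");
        doc.appendChild(root);
        root.appendChild(middle);
        middle.appendChild(leaf);
        leaf.appendChild(doc.createTextNode("wor"));
        leaf.appendChild(doc.createTextNode("ld"));

        if (normalizeFromRoot) {
            // Same call as in the question; leaf is two levels below the root.
            doc.getDocumentElement().normalize();
        }
        return leaf.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(leafChildCount(false)); // 2
        System.out.println(leafChildCount(true));  // 1
    }
}
```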

Hope this saves you some time.

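In simple terms, normalization is the reduction of redundancies.
Examples of redundancies:
a) white space outside the root/document tags ( ... <document></document> ... )
b) white space within a start tag (< ... >) and an end tag (</ ... >)
c) white space between attributes and their values (i.e. spaces between the key name and =")
d) superfluous namespace declarations
e) line breaks / white space in the text of attributes and tags
f) comments, etc.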
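
Not every item in this list is something Node.normalize() does by itself. Going by the documentation sentence quoted earlier and the Xerces code above, it only merges adjacent Text nodes and drops empty ones. The sketch below checks this on a made-up sample document. It assumes the JDK's default DocumentBuilderFactory settings (comments and whitespace-only text are kept by the parser), and the class name WhatNormalizeKeepsDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhatNormalizeKeepsDemo {

    /** Parses the XML string and returns its root element (checked exceptions are wrapped). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    /** Returns the node types of the root's direct children, before or after normalize(). */
    static List<String> childTypes(boolean normalize) {
        Element root = parseRoot("<doc> <!-- note --> <item>x</item> </doc>");
        if (normalize) {
            root.normalize();
        }
        List<String> types = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.TEXT_NODE:    types.add("TEXT");    break;
                case Node.COMMENT_NODE: types.add("COMMENT"); break;
                case Node.ELEMENT_NODE: types.add("ELEMENT"); break;
                default:                types.add("OTHER");   break;
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(childTypes(false)); // [TEXT, COMMENT, TEXT, ELEMENT, TEXT]
        System.out.println(childTypes(true));  // [TEXT, COMMENT, TEXT, ELEMENT, TEXT]
    }
}
```

The comment and the whitespace-only Text nodes are still there after normalize(), since none of the Text nodes are adjacent or empty. Removing comments or insignificant white space is a separate step, for example through parser settings or XML canonicalization.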

Source: https://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
