development

컴파일러가 do-while 루프에 대해 다른 유형의 루프에 비해 더 나은 코드를 생성합니까?

big-blog 2020. 9. 7. 21:12
반응형

컴파일러가 do-while 루프에 대해 다른 유형의 루프에 비해 더 나은 코드를 생성합니까?


zlib 압축 라이브러리 (Chromium 프로젝트에서 사용됨)에는 C의 do-while 루프가 대부분의 컴파일러에서 "더 나은"코드를 생성 함을 암시 하는 주석이 있습니다 . 다음은 표시되는 코드 스 니펫입니다.

do {
} while (*(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
         *(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
         *(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
         *(ushf*)(scan+=2) == *(ushf*)(match+=2) &&
         scan < strend);
/* The funny "do {}" generates better code on most compilers */

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/zlib/deflate.c&l=1225

대부분의 (또는 모든) 컴파일러가 더 나은 (예 : 더 효율적인) 코드를 생성 할 것이라는 증거가 있습니까?

업데이트 : 원래 저자 중 한 명인 Mark Adler 는 의견에 약간의 맥락 을 제공했습니다.


가장 먼저:

do-while루프는 동일하지 않다 while-loop 또는 for-loop.

  • while그리고 for루프는 전혀 루프 본문을 실행되지 않을 수 있습니다.
  • do-while루프는 항상 적어도 한 번 루프 본문을 실행 - 그것은 초기 조건 확인을 건너 뜁니다.

이것이 논리적 차이입니다. 즉, 모든 사람이 이것을 엄격하게 고수하는 것은 아닙니다. 항상 적어도 한 번은 반복되는 것이 보장되는 경우에도 for while또는 for루프를 사용하는 것은 매우 일반적입니다 . (특히 foreach 루프가 있는 언어에서 )

따라서 사과와 오렌지를 비교하지 않기 위해 루프가 항상 한 번 이상 실행된다고 가정하겠습니다. 또한 for루프는 본질적 while으로 루프 카운터에 대해 약간의 구문 설탕이있는 루프 이므로 다시 언급하지 않겠습니다 .

그래서 나는 질문에 답할 것입니다.

while루프가 한 번 이상 루프되는 것이 보장되는 경우 do-while대신 루프 를 사용하여 성능이 향상됩니까 ?


A do-while는 첫 번째 조건 검사를 건너 뜁니다. 따라서 평가할 분기와 조건이 하나 더 적습니다.

조건을 확인하는 데 비용이 많이 들고 한 번 이상 반복 할 수 있다는 것을 알고 있다면 do-while루프가 더 빠를 수 있습니다.

그리고 이것은 기껏해야 마이크로 최적화로 간주되지만 컴파일러가 항상 할 수있는 것은 아닙니다. 특히 컴파일러가 루프가 항상 적어도 한 번 입력된다는 것을 증명할 수없는 경우입니다.


즉, while-loop :

while (condition){
    body
}

다음과 같이 효과적으로 동일합니다.

if (condition){
    do{
        body
    }while (condition);
}

항상 적어도 한 번은 반복 할 것이라는 것을 알고 있다면 해당 if 문은 관련이 없습니다.


마찬가지로 어셈블리 수준에서 다음과 같이 서로 다른 루프가 다음과 같이 컴파일됩니다.

do-while 루프 :

start:
    body
    test
    conditional jump to start

while-loop:

    test
    conditional jump to end
start:
    body
    test
    conditional jump to start
end:

Note that the condition has been duplicated. An alternate approach is:

    unconditional jump to end
start:
    body
end:
    test
    conditional jump to start

... which trades away the duplicate code for an additional jump.

Either way, it's still worse than a normal do-while loop.

That said, compilers can do what they want. And if they can prove that the loop always enters once, then it has done the work for you.


But things are bit weird for the particular example in the question because it has an empty loop body. Since there is no body, there's no logical difference between while and do-while.

FWIW, I tested this in Visual Studio 2012:

  • With the empty body, it does actually generate the same code for while and do-while. So that part is likely a remnant of the old days when compilers weren't as great.

  • But with a non-empty body, VS2012 manages to avoid duplication of the condition code, but still generates an extra conditional jump.

So it's ironic that while the example in the question highlights why a do-while loop could be faster in the general case, the example itself doesn't seem to give any benefit on a modern compiler.

Considering how old the comment was, we can only guess at why it would matter. It's very possible that the compilers at the time weren't capable of recognizing that the body was empty. (Or if they did, they didn't use the information.)


Is there any evidence that most (or any) compilers would generate better (e.g. more efficient) code?

Not much, unless you look at the actual generated assembly of an actual, specific compiler on a specific platform with some specific optimization settings.

This was probably worth worrying about decades ago (when ZLib has been written), but certainly not nowadays, unless you found, by real profiling, that this removes a bottleneck from your code.


In a nutshell (tl;dr):

I'm interpreting the comment in OPs' code a little differently, I think the "better code" they claim to have observed was due to moving the actual work into the loop "condition". I completely agree however that it's very compiler specific and that the comparison they made, while being able to produce a slightly different code, is mostly pointless and probably obsolete, as I show below.


Details:

It's hard to say what the original author meant by his comment about this do {} while producing better code, but i'd like to speculate in another direction than what was raised here - we believe that the difference between do {} while and while {} loops is pretty slim (one less branch as Mystical said), but there's something even "funnier" in this code and that's putting all the work inside this crazy condition, and keeping the internal part empty (do {}).

I've tried the following code on gcc 4.8.1 (-O3), and it gives an interesting difference -

#include "stdio.h" 
int main (){
    char buf[10];
    char *str = "hello";
    char *src = str, *dst = buf;

    char res;
    do {                            // loop 1
        res = (*dst++ = *src++);
    } while (res);
    printf ("%s\n", buf);

    src = str;
    dst = buf;
    do {                            // loop 2
    } while (*dst++ = *src++);
    printf ("%s\n", buf);

    return 0; 
}

After compiling -

00000000004003f0 <main>:
  ... 
; loop 1  
  400400:       48 89 ce                mov    %rcx,%rsi
  400403:       48 83 c0 01             add    $0x1,%rax
  400407:       0f b6 50 ff             movzbl 0xffffffffffffffff(%rax),%edx
  40040b:       48 8d 4e 01             lea    0x1(%rsi),%rcx
  40040f:       84 d2                   test   %dl,%dl
  400411:       88 16                   mov    %dl,(%rsi)
  400413:       75 eb                   jne    400400 <main+0x10>
  ...
;loop 2
  400430:       48 83 c0 01             add    $0x1,%rax
  400434:       0f b6 48 ff             movzbl 0xffffffffffffffff(%rax),%ecx
  400438:       48 83 c2 01             add    $0x1,%rdx
  40043c:       84 c9                   test   %cl,%cl
  40043e:       88 4a ff                mov    %cl,0xffffffffffffffff(%rdx)
  400441:       75 ed                   jne    400430 <main+0x40>
  ...

So the first loop does 7 instructions while the second does 6, even though they're supposed to do the same work. Now, I can't really tell if there's some compiler smartness behind this, probably not and it's just coincidental but I haven't checked how it interacts with other compiler options this project might be using.


On clang 3.3 (-O3) on the other hand, both loops generate this 5 instructions code :

  400520:       8a 88 a0 06 40 00       mov    0x4006a0(%rax),%cl
  400526:       88 4c 04 10             mov    %cl,0x10(%rsp,%rax,1)
  40052a:       48 ff c0                inc    %rax
  40052d:       48 83 f8 05             cmp    $0x5,%rax
  400531:       75 ed                   jne    400520 <main+0x20>

Which just goes to show that compilers are quite different, and advancing at a far faster rate than some programmer may have anticipated several years ago. It also means that this comment is pretty meaningless and probably there because no one had ever checked if it still makes sense.


Bottom line - if you want to optimize to the best possible code (and you know how it should look like), do it directly in assembly and cut the "middle-man" (compiler) from the equation, but take into account that newer compilers and newer HW might make this optimization obsolete. In most cases it's far better to just let the compiler do that level of work for you, and focus on optimizing the big stuff.

Another point that should be made - instruction count (assuming this is what the original OPs' code was after), is by no means a good measurement for code efficiency. Not all instructions were created equal, and some of them (simple reg-to-reg moves for e.g.) are really cheap as they get optimized by the CPU. Other optimization might actually hurt CPU internal optimizations, so eventually only proper benchmarking counts.


A while loop is often compiled as a do-while loop with an initial branch to the condition, i.e.

    bra $1    ; unconditional branch to the condition
$2:
    ; loop body
$1:
    tst <condition> ; the condition
    brt $2    ; branch if condition true

whereas the compilation of a do-while loop is the same without the initial branch. You can see from that that it's inherently less efficient by the cost of the initial branch, which is however only paid once. [Compare to the naive way of implementing while, which requires both a conditional branch and an unconditional branch per iteration.]

Having said that, they aren't really comparable alternatives. It is painful to transform a while loop into a do-while loop and vice versa. They do different things. And in this case the several method calls would totally dominate whatever the compiler did with while as against do-while.


The remark is not about the choice of the control statement (do vs. while), it is about the loop unrolling !!!

As you can see, this is a string comparison function (string elements probably being 2 bytes long), which could have been written with a single comparison rather than four in a shortcut-and expression.

This latter implementation is for sure faster, as it does a single check of the end-of-string condition after every four element comparisons, whereas standard coding would involve one check per comparison. Said differently, 5 tests per 4 element vs. 8 tests per 4 element.

Anyway, it will work only if the string length is a multiple of four or has a sentinel element (so that the two strings are guaranteed to differ past the strend border). Pretty risky !


This discussion of while vs. do efficiency is completely pointless in this case, as there is no body.

while (Condition)
{
}

and

do
{
}
while (Condition);

are absolutely equivalent.

참고URL : https://stackoverflow.com/questions/20172402/do-compilers-produce-better-code-for-do-while-loops-versus-other-types-of-loops

반응형