"x <y <z"가 "x <y 및 y <z"보다 빠릅니까?
From this page, we know that:

Chained comparisons are faster than using the and operator. Write x < y < z instead of x < y and y < z.
However, I got a different result when testing the following code snippets:
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y < z"
1000000 loops, best of 3: 0.322 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.8" "x < y and y < z"
1000000 loops, best of 3: 0.22 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y < z"
1000000 loops, best of 3: 0.279 usec per loop
$ python -m timeit "x = 1.2" "y = 1.3" "z = 1.1" "x < y and y < z"
1000000 loops, best of 3: 0.215 usec per loop
It seems that x < y and y < z is faster than x < y < z. Why?
After searching some posts on this site (like this one), I know that "evaluated only once" is the key for x < y < z, but I'm still confused. For further study, I disassembled these two functions using dis.dis:
import dis

def chained_compare():
    x = 1.2
    y = 1.3
    z = 1.1
    x < y < z

def and_compare():
    x = 1.2
    y = 1.3
    z = 1.1
    x < y and y < z

dis.dis(chained_compare)
dis.dis(and_compare)
And the output is:
## chained_compare ##
4 0 LOAD_CONST 1 (1.2)
3 STORE_FAST 0 (x)
5 6 LOAD_CONST 2 (1.3)
9 STORE_FAST 1 (y)
6 12 LOAD_CONST 3 (1.1)
15 STORE_FAST 2 (z)
7 18 LOAD_FAST 0 (x)
21 LOAD_FAST 1 (y)
24 DUP_TOP
25 ROT_THREE
26 COMPARE_OP 0 (<)
29 JUMP_IF_FALSE_OR_POP 41
32 LOAD_FAST 2 (z)
35 COMPARE_OP 0 (<)
38 JUMP_FORWARD 2 (to 43)
>> 41 ROT_TWO
42 POP_TOP
>> 43 POP_TOP
44 LOAD_CONST 0 (None)
47 RETURN_VALUE
## and_compare ##
10 0 LOAD_CONST 1 (1.2)
3 STORE_FAST 0 (x)
11 6 LOAD_CONST 2 (1.3)
9 STORE_FAST 1 (y)
12 12 LOAD_CONST 3 (1.1)
15 STORE_FAST 2 (z)
13 18 LOAD_FAST 0 (x)
21 LOAD_FAST 1 (y)
24 COMPARE_OP 0 (<)
27 JUMP_IF_FALSE_OR_POP 39
30 LOAD_FAST 1 (y)
33 LOAD_FAST 2 (z)
36 COMPARE_OP 0 (<)
>> 39 POP_TOP
40 LOAD_CONST 0 (None)
43 RETURN_VALUE
It seems that x < y and y < z has fewer disassembled instructions than x < y < z. Should I consider x < y and y < z faster than x < y < z?
Tested with Python 2.7.6 on an Intel(R) Xeon(R) CPU E5640 @ 2.67GHz.
The difference is that in x < y < z, y is only evaluated once. This does not make a large difference if y is a variable, but it does when it is a function call, which takes some time to compute.
from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop
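(The %timeit lines above are IPython magics. To reproduce the measurement in plain Python, a minimal sketch, assuming Python 3.5+ where timeit.timeit accepts a globals argument, could look like this:)

import timeit
from time import sleep

def y():
    sleep(.2)
    return 1.3

# y() is called once per pass, so roughly 10 * 0.2 s in total:
print(timeit.timeit("1.2 < y() < 1.8", globals=globals(), number=10))
# y() is called twice per pass (1.2 < y() is true), so roughly 10 * 0.4 s:
print(timeit.timeit("1.2 < y() and y() < 1.8", globals=globals(), number=10))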
Optimal bytecode for both of the functions you defined would be
0 LOAD_CONST 0 (None)
3 RETURN_VALUE
because the result of the comparison is not used. Let's make the situation more interesting by returning the result of the comparison. Let's also have the result not be knowable at compile time.
def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z
Again, the two versions of the comparison are semantically identical, so the optimal bytecode is the same for both constructs. As best I can work it out, it would look like this. I've annotated each line with the stack contents before and after each opcode, in Forth notation (top of stack at right, -- divides before and after, trailing ? indicates something that might or might not be there). Note that RETURN_VALUE discards everything that happens to be left on the stack underneath the value returned.
0 LOAD_FAST 0 (y) ; -- y
3 DUP_TOP ; y -- y y
4 LOAD_CONST 0 (1.1) ; y y -- y y 1.1
7 COMPARE_OP 4 (>) ; y y 1.1 -- y pred
10 JUMP_IF_FALSE_OR_POP 19 ; y pred -- y
13 LOAD_CONST 1 (1.3) ; y -- y 1.3
16 COMPARE_OP 0 (<) ; y 1.3 -- pred
>> 19 RETURN_VALUE ; y? pred --
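To make the control flow concrete, a rough Python-level rendering of that bytecode might look like the sketch below (not the author's code; pred names the intermediate comparison result kept on the stack):

def interesting_compare_equiv(y):
    # y is loaded once (LOAD_FAST / DUP_TOP) and reused for both comparisons.
    pred = 1.1 < y        # COMPARE_OP; in the bytecode the operands are swapped, so it reads y > 1.1
    if not pred:          # JUMP_IF_FALSE_OR_POP: short-circuit, returning the falsey comparison result
        return pred
    return y < 1.3        # second COMPARE_OP, then RETURN_VALUE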
If an implementation of the language, CPython, PyPy, whatever, does not generate this bytecode (or its own equivalent sequence of operations) for both variations, that demonstrates the poor quality of that bytecode compiler. Getting from the bytecode sequences you posted to the above is a solved problem (I think all you need for this case is constant folding, dead code elimination, and better modeling of the contents of the stack; common subexpression elimination would also be cheap and valuable), and there's really no excuse for not doing it in a modern language implementation.
Now, it happens that all current implementations of the language have poor-quality bytecode compilers. But you should ignore that while coding! Pretend the bytecode compiler is good, and write the most readable code. It will probably be plenty fast enough anyway. If it isn't, look for algorithmic improvements first, and give Cython a try second -- that will provide far more improvement for the same effort than any expression-level tweaks you might apply.
Since the difference in the output seems to be due to a lack of optimization, I think you should ignore that difference in most cases; it could be that the difference will go away. The difference exists because y should only be evaluated once, and that is solved by duplicating it on the stack, which requires an extra POP_TOP; a solution using LOAD_FAST instead might be possible though.
The important difference, though, is that in x < y and y < z, y is evaluated twice whenever x < y evaluates to true. This has implications if evaluating y takes considerable time or has side effects.
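A minimal sketch of that point, with a hypothetical y that records every evaluation:

calls = []

def y():
    calls.append(1)            # side effect: record every evaluation
    return 1.3

print(1.2 < y() < 1.8, len(calls))            # True 1  -> y() evaluated once
calls[:] = []
print(1.2 < y() and y() < 1.8, len(calls))    # True 2  -> y() evaluated twice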
In most scenarios you should use x < y < z despite the fact that it's somewhat slower.
First of all, your comparison is pretty much meaningless because the two different constructs were not introduced to provide a performance improvement, so you shouldn't decide whether to use one in place of the other based on that.
The x < y < z construct:
- Is clearer and more direct in its meaning.
- Has the semantics you'd expect from the "mathematical meaning" of the comparison: evaluate x, y and z once and check whether the whole condition holds. Using and changes the semantics by evaluating y multiple times, which can change the result (see the sketch below).
So choose one in place of the other depending on the semantics you want and, if they are equivalent, whether one is more readable than the other.
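A small sketch of how the result itself can change, using a hypothetical y that returns a different value on each call:

def make_y():
    vals = iter([5, 100])
    return lambda: next(vals)   # 5 on the first call, 100 on the second

y = make_y()
print(1 < y() < 10)             # 1 < 5 < 10           -> True  (y() called once)

y = make_y()
print(1 < y() and y() < 10)     # 1 < 5, then 100 < 10 -> False (y() called twice)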
This said: more disassembled code does not imply slower code. However, executing more bytecode operations means that each operation is simpler, yet each one still requires an iteration of the interpreter's main loop. This means that if the operations you are performing are extremely fast (e.g. local variable lookups, as you are doing here), then the overhead of executing more bytecode operations can matter.
But note that this result does not hold in the more general situation, only in the "worst case" that you happen to profile. As others have noted, if you change y to something that takes even a bit more time, you'll see that the results change, because the chained notation evaluates it only once.
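For instance, making y even a trivial function call is typically enough to tip the balance back, since the and version calls it twice (a sketch; exact timings will vary):

import timeit

setup = "def y():\n    return 1.3"
print(timeit.timeit("1.2 < y() < 1.8", setup=setup))          # calls y() once per evaluation
print(timeit.timeit("1.2 < y() and y() < 1.8", setup=setup))  # calls y() twice per evaluation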
Summarizing:
- Consider semantics before performance.
- Take into account readability.
- Don't trust micro-benchmarks. Always profile with different kinds of parameters to see how a function's or expression's timing behaves in relation to those parameters, and consider how you plan to use it.
Source: https://stackoverflow.com/questions/34014906/is-x-y-z-faster-than-x-y-and-y-z