
How do I determine the size of an object in Python?

big-blog 2020. 10. 3. 11:20


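In C, we can find the size of an int, a char, and so on. I want to know how to get the size of objects such as strings and integers in Python.

I am using an XML file that contains size fields specifying the size of each value. I have to parse this XML and write my code against it. When I want to change the value of a particular field, I check the size field for that value, and I want to compare whether the new value I am about to enter has the same size as the one in the XML. So I need to check the size of the new value. For a string I can say it is the length, but for int, float, and so on I am confused.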

Just use the sys.getsizeof function defined in the sys module.

sys.getsizeof(object[, default]):

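Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions, as it is implementation specific.

The default argument lets you define a value that is returned if the object type does not provide a way to retrieve its size and would otherwise cause a TypeError.

getsizeof calls the object's __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Usage example, in Python 3.0: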

>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
24
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48
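The two behaviours described in the documentation excerpt above can be seen in a small sketch. The classes Sized and BadSized are hypothetical and exist only for illustration, and the exact number printed for Sized depends on the CPython build, so treat this as a demonstration rather than a reference.

```python
import sys


class Sized:
    """Hypothetical class that reports its own size."""

    def __sizeof__(self):
        return 1000


class BadSized:
    """Hypothetical class whose __sizeof__ is broken (returns a non-integer)."""

    def __sizeof__(self):
        return "not a number"


# getsizeof() starts from __sizeof__() and may add garbage collector overhead.
reported = sys.getsizeof(Sized())

# Without a default, the broken __sizeof__ surfaces as a TypeError ...
try:
    sys.getsizeof(BadSized())
    raised = False
except TypeError:
    raised = True

# ... with a default, the TypeError is swallowed and the default comes back.
fallback = sys.getsizeof(BadSized(), -1)

print(reported, raised, fallback)
```

The value reported for Sized is at least 1000. Anything above that is the collector overhead mentioned above.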

If you are on Python < 2.6 and don't have sys.getsizeof, you can use this extensive module instead. I have never used it, though.


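The answer "Just use sys.getsizeof" is not a complete answer.

That answer does work for built-in objects directly, but it does not account for what those objects may contain, specifically what types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings, and other objects.

A More Complete Answer

Using 64-bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I determined the minimum size of the following objects. Note that sets and dicts preallocate space, so empty ones do not grow again until after a set number of items (which may vary by implementation of the language):

Python 3: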

Empty
Bytes  type        scaling notes
28     int         +4 bytes about every 30 powers of 2
37     bytes       +1 byte per additional byte
49     str         +1-4 per additional character (depending on max width)
48     tuple       +8 per additional item
64     list        +8 for each additional
224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
136    func def    does not include default args and other attrs
1056   class def   no slots 
56     class inst  has a __dict__ attr, same scaling as dict above
888    class def   with slots
16     __slots__   seems to store in mutable tuple-like structure
                   first slot grows to 48, and so on.

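How do you interpret this? Say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 bytes, because it has resized once, to 736 bytes. Then you add the size of the items (10 x 100 = 1000 bytes), so that is 1736 bytes in total.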
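The same bookkeeping can be written out as a rough sketch. The ten padded strings below are stand-ins for the "100-byte items" (an assumption for illustration only, since a real str is larger than its character count), and the measured numbers will differ between Python versions.

```python
import sys

# Ten distinct 100-character strings stand in for the "100-byte items".
# Their real size is larger, because every str also carries a header.
items = {str(i).rjust(100, "x") for i in range(10)}

container = sys.getsizeof(items)                  # the set's own hash table
contents = sum(sys.getsizeof(i) for i in items)   # each element, counted once
total = container + contents

print(container, contents, total)

# The arithmetic from the text, using the sizes quoted there.
quoted_total = 736 + 10 * 100
print(quoted_total)
```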

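Some caveats for function and class definitions:

Each class definition has a proxy __dict__ structure (48 bytes) for class attributes. Each slot has a descriptor (like a property) in the class definition.

Slotted instances start out at 48 bytes with their first element and increase by 8 for each additional one. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.

Also, each function definition has a code object, maybe a docstring, and other possible attributes, even a __dict__.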
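A short sketch of the slot behaviour described above. The class names are hypothetical, and the 8-byte step assumes a 64-bit CPython build, so the code compares against the platform pointer size instead of hard-coding 8.

```python
import struct
import sys


class WithDict:
    """Ordinary class: each instance carries a __dict__."""

    def __init__(self):
        self.a = 1


class OneSlot:
    __slots__ = ("a",)


class TwoSlots:
    __slots__ = ("a", "b")


pointer = struct.calcsize("P")  # size of one reference, 8 on a 64-bit build

has_dict = hasattr(WithDict(), "__dict__")
slot_has_dict = hasattr(OneSlot(), "__dict__")

# Each additional slot costs one more reference in the instance.
per_slot = sys.getsizeof(TwoSlots()) - sys.getsizeof(OneSlot())

# The class itself holds one descriptor per slot name.
descriptor_type = type(OneSlot.__dict__["a"]).__name__

print(has_dict, slot_has_dict, per_slot, descriptor_type)
```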

Python 2.7 analysis, confirmed with guppy.hpy and sys.getsizeof:

Bytes  type        empty + scaling notes
24     int         NA
28     long        NA
37     str         + 1 byte per additional character
52     unicode     + 4 bytes per additional character
56     tuple       + 8 bytes per additional item
72     list        + 32 for first, 8 for each additional
232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
120    func def    does not include default args and other attrs
64     class inst  has a __dict__ attr, same scaling as dict above
16     __slots__   class with slots has no dict, seems to store in 
                   mutable tuple-like structure.
904    class def   has a proxy __dict__ structure for class attrs
104    old class   makes sense, less stuff, has real dict though.

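Note that dictionaries (but not sets) got a more compact representation in Python 3.6.

I think 8 bytes per additional item to reference makes a lot of sense on a 64-bit machine. Those 8 bytes point to the place in memory where the contained item lives. If I recall correctly, unicode in Python 2 has a fixed width of 4 bytes per character, but in Python 3 a str becomes a unicode string whose width equals the maximum width of its characters.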
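Both observations can be checked with a small sketch. It assumes CPython 3.3 or later, where the flexible string representation applies. The helper per_char is a hypothetical name introduced here for illustration.

```python
import struct
import sys


def per_char(ch, n=10):
    """Bytes added by one more copy of ch to a string made only of ch."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


ascii_cost = per_char("a")            # ASCII: 1 byte per character
bmp_cost = per_char("\u20ac")         # euro sign: 2 bytes per character
astral_cost = per_char("\U0001F600")  # emoji outside the BMP: 4 bytes

pointer = struct.calcsize("P")  # size of one reference, 8 on a 64-bit build

# (None,) * n builds a tuple with exactly n entries, so one more entry
# costs exactly one more reference.
tuple_cost = sys.getsizeof((None,) * 11) - sys.getsizeof((None,) * 10)

print(ascii_cost, bmp_cost, astral_cost, tuple_cost)
```

Note that the widest character decides the width for the whole string, so a single emoji in otherwise ASCII text makes every character cost 4 bytes.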

(And for more on slots, see this answer.)

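A More Complete Function

We want a function that searches the elements in lists, tuples, sets, dicts, obj.__dict__'s, and obj.__slots__, as well as other things we may not have thought of yet.

We want to rely on gc.get_referents to do this search because it works at the C level (which makes it very fast). The downside is that get_referents can return redundant members, so we need to make sure we do not double count.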
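The redundancy is easy to see in a tiny sketch (the variable names are illustrative only): a list that refers to the same object three times reports it three times, and keying on id() collapses the repeats.

```python
import gc

shared = object()
container = [shared, shared, shared]

referents = gc.get_referents(container)

# The same object shows up once per reference held by the list.
hits = sum(1 for r in referents if r is shared)

# Deduplicating by identity leaves a single entry to count.
unique_ids = {id(r) for r in referents}

print(hits, len(unique_ids))
```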

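Classes, modules, and functions are singletons: they exist once in memory. We are not very interested in their size, because there is not much we can do about them. They are part of the program. So we will avoid counting them if they happen to be referenced.

We are going to use a blacklist of types so that we do not include the entire program in our size count.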

import sys
from types import ModuleType, FunctionType
from gc import get_referents

# Custom objects know their class.
# Function objects seem to know way too much, including modules.
# Exclude modules as well.
BLACKLIST = type, ModuleType, FunctionType


def getsize(obj):
    """sum size of object & members."""
    if isinstance(obj, BLACKLIST):
        raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
    seen_ids = set()
    size = 0
    objects = [obj]
    while objects:
        need_referents = []
        for obj in objects:
            if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                seen_ids.add(id(obj))
                size += sys.getsizeof(obj)
                need_referents.append(obj)
        objects = get_referents(*need_referents)
    return size

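To contrast this with the whitelist function that follows: most objects know how to traverse themselves for the purposes of garbage collection, which is approximately what we are looking for when we want to know how expensive certain objects are in memory. (This functionality is what gc.get_referents uses.) However, this measure is going to be much more expansive in scope than we intended if we are not careful.

For example, functions know quite a lot about the modules they are created in.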
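A quick way to see how far a function reaches, as a sketch that assumes CPython's traversal behaviour. The function sample is hypothetical.

```python
import gc


def sample():
    """Hypothetical function, used only for illustration."""
    return None


referents = gc.get_referents(sample)

# The function refers to its module's global namespace, which in turn
# refers to everything defined or imported in that module.
sees_globals = any(r is sample.__globals__ for r in referents)
sees_code = any(r is sample.__code__ for r in referents)

print(sees_globals, sees_code)
```

Following the globals dict from here would pull most of the program into the count, which is why functions are excluded up front.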

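Another point of contrast is that strings used as dictionary keys are usually interned, so they are not duplicated. Checking id(key) also lets us avoid counting duplicates, which is what we do in the next section. The blacklist solution skips counting keys that are strings altogether.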
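A small sketch of the interning point. Interning is a CPython implementation detail rather than a language guarantee, so this is an illustration under that assumption.

```python
import sys

first = {"foo": 1}
second = {"foo": 2}

key_a = next(iter(first))
key_b = next(iter(second))

# Both dictionaries hold a reference to one and the same str object.
same_object = key_a is key_b

# Counting by identity therefore charges for the key only once.
seen = set()
key_bytes = 0
for key in (key_a, key_b):
    if id(key) not in seen:
        seen.add(id(key))
        key_bytes += sys.getsizeof(key)

print(same_object, key_bytes)
```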

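Whitelisted Types, Recursive Visitor (old implementation)

To cover most of these types myself, instead of relying on the gc module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, the types in the collections module, and custom types (slotted and otherwise).

This sort of function gives much more fine-grained control over the types we count toward memory usage, but it carries the danger of leaving types out: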

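import sys
from numbers import Number
from collections import deque

try:  # Python 3.3+: the ABCs live in collections.abc
    from collections.abc import Set, Mapping
except ImportError:  # Python 2
    from collections import Set, Mapping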

try: # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError: # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)

And I tested it rather casually (I should unit test it):

>>> getsize(['a', tuple('bcd'), Foo()])
344
>>> getsize(Foo())
16
>>> getsize(tuple('bcd'))
194
>>> getsize(['a', tuple('bcd'), Foo(), {'foo': 'bar', 'baz': 'bar'}])
752
>>> getsize({'foo': 'bar', 'baz': 'bar'})
400
>>> getsize({})
280
>>> getsize({'foo':'bar'})
360
>>> getsize('foo')
40
>>> class Bar():
...     def baz():
...         pass
>>> getsize(Bar())
352
>>> getsize(Bar().__dict__)
280
>>> sys.getsizeof(Bar())
72
>>> getsize(Bar.__dict__)
872
>>> sys.getsizeof(Bar.__dict__)
280

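This implementation breaks down on class definitions and function definitions, because we do not go after all of their attributes. But since they should exist only once in memory for the process, their size does not really matter much.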

For numpy arrays, getsizeof doesn't work - for me it always returns 40 for some reason:

from pylab import *
from sys import getsizeof
A = rand(10)
B = rand(10000)

Then (in ipython):

In [64]: getsizeof(A)
Out[64]: 40

In [65]: getsizeof(B)
Out[65]: 40

Happily, though:

In [66]: A.nbytes
Out[66]: 80

In [67]: B.nbytes
Out[67]: 80000

The Pympler package's asizeof module can do this.

Use as follows:

from pympler import asizeof
asizeof.asizeof(my_object)

Unlike sys.getsizeof, it works for your self-created objects. It even works with numpy.

>>> asizeof.asizeof(tuple('bcd'))
200
>>> asizeof.asizeof({'foo': 'bar', 'baz': 'bar'})
400
>>> asizeof.asizeof({})
280
>>> asizeof.asizeof({'foo':'bar'})
360
>>> asizeof.asizeof('foo')
40
>>> asizeof.asizeof(Bar())
352
>>> asizeof.asizeof(Bar().__dict__)
280
>>> A = rand(10)
>>> B = rand(10000)
>>> asizeof.asizeof(A)
176
>>> asizeof.asizeof(B)
80096

As mentioned,

The (byte)code size of objects like classes, functions, methods, modules, etc. can be included by setting option code=True.

And if you need other view on live data, Pympler's

module muppy is used for on-line monitoring of a Python application and module Class Tracker provides off-line analysis of the lifetime of selected Python objects.

This can be more complicated than it looks, depending on how you want to count things. For instance, if you have a list of ints, do you want the size of the list containing the references to the ints (i.e. the list only, not what is contained in it)? Or do you want to include the actual data pointed to? In that case you need to deal with duplicate references, and decide how to prevent double-counting when two objects contain references to the same object.
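The difference between the two ways of counting shows up clearly when many references point at one object. The sketch below is illustrative only, and the printed numbers depend on the Python build.

```python
import sys

big = 10 ** 100          # one int object, referenced many times below
numbers = [big] * 1000

# Shallow: the list header plus 1000 references, nothing else.
shallow = sys.getsizeof(numbers)

# Naive deep count: charges for `big` once per reference.
naive = shallow + sum(sys.getsizeof(n) for n in numbers)

# Deep count with deduplication: charges for each distinct object once.
seen = set()
deep = shallow
for n in numbers:
    if id(n) not in seen:
        seen.add(id(n))
        deep += sys.getsizeof(n)

print(shallow, naive, deep)
```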

You may want to take a look at one of the python memory profilers, such as pysizer to see if they meet your needs.

Having run into this problem many times myself, I wrote up a small function (inspired by @aaron-hall's answer) & tests that does what I would have expected sys.getsizeof to do:

https://github.com/bosswissam/pysize

If you're interested in the backstory, here it is

EDIT: Attaching the code below for easy reference. To see the most up-to-date code, please check the github link.

    import sys

    def get_size(obj, seen=None):
        """Recursively finds size of objects"""
        size = sys.getsizeof(obj)
        if seen is None:
            seen = set()
        obj_id = id(obj)
        if obj_id in seen:
            return 0
        # Important mark as seen *before* entering recursion to gracefully handle
        # self-referential objects
        seen.add(obj_id)
        if isinstance(obj, dict):
            size += sum([get_size(v, seen) for v in obj.values()])
            size += sum([get_size(k, seen) for k in obj.keys()])
        elif hasattr(obj, '__dict__'):
            size += get_size(obj.__dict__, seen)
        elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
            size += sum([get_size(i, seen) for i in obj])
        return size

Python 3.8 (Q1 2019) will change some of the results of sys.getsizeof, as announced here by Raymond Hettinger:

Python containers are 8 bytes smaller on 64-bit builds.

tuple ()   48 -> 40
list  []   64 -> 56
set()     224 -> 216
dict  {}  240 -> 232

This comes after issue 33597 and Inada Naoki (methane)'s work around Compact PyGC_Head, and PR 7043

This idea reduces PyGC_Head size to two words.

Currently, PyGC_Head takes three words; gc_prev, gc_next, and gc_refcnt.

gc_refcnt is used when collecting, for trial deletion.

gc_prev is used for tracking and untracking.

So if we can avoid tracking/untracking while trial deletion, gc_prev and gc_refcnt can share same memory space.

See commit d5c875b:

Removed one Py_ssize_t member from PyGC_Head.
All GC tracked objects (e.g. tuple, list, dict) size is reduced 4 or 8 bytes.
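The header can be observed indirectly, since sys.getsizeof adds it on top of __sizeof__ for garbage-collected types. This is a sketch. The helper gc_overhead is a hypothetical name, and the number of words it reports depends on the interpreter version and build.

```python
import struct
import sys

word = struct.calcsize("P")  # pointer size, 8 on a 64-bit build


def gc_overhead(obj):
    """Bytes that sys.getsizeof adds on top of obj.__sizeof__()."""
    return sys.getsizeof(obj) - obj.__sizeof__()


# Containers are managed by the collector and carry the extra header.
for sample in ((), [], set(), {}):
    overhead = gc_overhead(sample)
    print(type(sample).__name__, sys.getsizeof(sample), overhead // word, "words")

# Plain ints are not managed by the collector, so nothing is added.
untracked = gc_overhead(1)
print(untracked)
```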

Here is a quick script I wrote based on the previous answers to list sizes of all variables

import sys

for i in dir():
    print(i, sys.getsizeof(eval(i)))

If you don't need the exact size of the object but roughly to know how big it is, one quick (and dirty) way is to let the program run, sleep for an extended period of time, and check the memory usage (ex: Mac's activity monitor) by this particular python process. This would be effective when you are trying to find the size of one single large object in a python process. For example, I recently wanted to check the memory usage of a new data structure and compare it with that of Python's set data structure. First I wrote the elements (words from a large public domain book) to a set, then checked the size of the process, and then did the same thing with the other data structure. I found out the Python process with a set is taking twice as much memory as the new data structure. Again, you wouldn't be able to exactly say the memory used by the process is equal to the size of the object. As the size of the object gets large, this becomes close as the memory consumed by the rest of the process becomes negligible compared to the size of the object you are trying to monitor.
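A related in-process variant of this before-and-after comparison, sketched here with the standard library's tracemalloc module instead of an external process monitor (a different technique from the one described above, swapped in because it can be scripted). The generated strings are stand-ins for the words of a book.

```python
import tracemalloc

tracemalloc.start()
before, _peak = tracemalloc.get_traced_memory()

# Stand-in for the words of a large book.
words = {"word%d" % i for i in range(100000)}

after, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Bytes allocated by Python while the set was being built.
grown = after - before
print(grown)
```

Like the process-level measurement, this is an estimate of what building the structure cost, not an exact object size.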

First: an answer.

import sys

try:
    print(sys.getsizeof(object))
except AttributeError:
    print("sys.getsizeof exists in Python >= 2.6")

Discussion:
In Python, you cannot ever access "direct" memory addresses. Why, then, would you need or want to know how many such addresses are occupied by a given object?? It's a question that's entirely inappropriate at that level of abstraction. When you're painting your house, you don't ask what frequencies of light are absorbed or reflected by each of the constituent atoms within the paint, you just ask what color it is -- the details of the physical characteristics that create that color are beside the point. Similarly, the number of bytes of memory that a given Python object occupies is beside the point.

So, why are you trying to use Python to write C code? :)

Reference URL: https://stackoverflow.com/questions/449560/how-do-i-determine-the-size-of-an-object-in-python
