
HttpWebRequest and native GZip compression

big-blog 2020. 12. 7. 20:10



I frequently get the following error when requesting pages with GZip compression:

System.IO.InvalidDataException: The CRC in GZip footer does not match the CRC calculated from the decompressed data.

I'm using the native GZipStream to decompress. With that in mind, is there a workaround for this problem, or another (free?) GZip library that handles it properly?

I am checking that the webResponse ContentEncoding is GZIP.

Update 5/11: simplified snippet

//Caller
public void SOSampleGet(string url)
{
    // Initialize the WebRequest.
    var webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = WebRequestMethods.Http.Get;
    webRequest.KeepAlive = true;
    webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
    webRequest.Referer = WebUtil.GetDomain(url);

    // Dispose the response as well as the stream so the connection is released.
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (Stream stream = GetStreamForResponse(webResponse, READTIMEOUT_CONST))
    {
        //use stream
    }
}

//Method
private static Stream GetStreamForResponse(HttpWebResponse webResponse, int readTimeOut)
{
    // Set the timeout on the raw network stream so it applies to every branch,
    // not only to the uncompressed one.
    Stream raw = webResponse.GetResponseStream();
    raw.ReadTimeout = readTimeOut;

    switch (webResponse.ContentEncoding.ToUpperInvariant())
    {
        case "GZIP":
            return new GZipStream(raw, CompressionMode.Decompress);
        case "DEFLATE":
            return new DeflateStream(raw, CompressionMode.Decompress);
        default:
            return raw;
    }
}
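The encoding switch can also be written against a plain Stream, which makes the decode step testable without a network connection. This is a minimal sketch, assuming a hypothetical `ContentDecoding.Wrap` helper (not part of the original question) and a single Content-Encoding value rather than a comma-separated list.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not from the original post): chooses a decompression
// wrapper from a single Content-Encoding header value.
public static class ContentDecoding
{
    public static Stream Wrap(Stream raw, string contentEncoding)
    {
        // A missing header is treated like "identity" (no compression).
        string encoding = (contentEncoding ?? string.Empty).Trim().ToLowerInvariant();
        switch (encoding)
        {
            case "gzip":
                return new GZipStream(raw, CompressionMode.Decompress);
            case "deflate":
                // DeflateStream expects raw DEFLATE data. HTTP's "deflate" coding is
                // formally zlib-wrapped (RFC 1950), so this branch only works for
                // servers that actually send raw DEFLATE.
                return new DeflateStream(raw, CompressionMode.Decompress);
            default:
                return raw;
        }
    }
}
```

In the question's code, `ContentDecoding.Wrap(raw, webResponse.ContentEncoding)` would replace the switch statement.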

What about the WebRequest AutomaticDecompression property, available since .NET 2.0? Simply add:

webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

This also adds gzip and deflate to the Accept-Encoding header.

See http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.automaticdecompression.aspx
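A minimal sketch of the AutomaticDecompression approach, assuming a hypothetical `AutoDecompressExample` class. `CreateRequest` only configures the request, so it can be checked offline; `Download` needs network access. On .NET 6 and later, WebRequest.Create is marked obsolete (warning SYSLIB0014) but still compiles and works.

```csharp
using System.IO;
using System.Net;

public static class AutoDecompressExample
{
    // Configures HttpWebRequest to decompress gzip/deflate bodies itself,
    // so user code never touches GZipStream.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Sends the request and returns the body, already decompressed, as text.
    public static string Download(string url)
    {
        var request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this setting the manual `Accept-Encoding` header and the ContentEncoding switch from the question become unnecessary.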


For .NET Core things are a little more involved. A GZipStream is needed because there isn't an AutomaticDecompression property (as of writing). See my answer here: https://stackoverflow.com/a/44508724/2421277

Code from answer:

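using System;
using System.IO;
using System.IO.Compression;
using System.Net;

// 'uri' is the address being requested; this code runs inside an async method.
var req = WebRequest.CreateHttp(uri);

/*
 * Headers: only gzip is requested, because only a GZipStream is used below
 */
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";

/*
 * Execute
 */
try
{
    using (var resp = (HttpWebResponse)await req.GetResponseAsync())
    using (var str = resp.GetResponseStream())
    {
        // Decompress only if the server actually gzip-encoded the body
        bool isGzip = string.Equals(resp.Headers[HttpResponseHeader.ContentEncoding], "gzip", StringComparison.OrdinalIgnoreCase);
        using (var body = isGzip ? new GZipStream(str, CompressionMode.Decompress) : str)
        using (var sr = new StreamReader(body))
        {
            string s = await sr.ReadToEndAsync();
        }
    }
}
catch (WebException ex) when (ex.Response is HttpWebResponse response)
{
    // ex.Response is null for failures without an HTTP response (DNS, timeouts)
    using (response)
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        string respStr = sr.ReadToEnd();
        int statusCode = (int)response.StatusCode;
        string errorMsg = $"Request ({uri}) failed ({statusCode}) with error: {respStr}";
    }
}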

Are you flushing and closing the stream? Try wrapping your GZipStream in a using statement.
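To illustrate why disposal matters, here is a minimal in-memory round trip, assuming a hypothetical `GzipRoundTrip` class. The compressed bytes are read only after the compressing GZipStream has been disposed, because disposal flushes the last block and writes the gzip footer (CRC-32 and length) that the decompressor checks.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipRoundTrip
{
    public static byte[] Compress(string text)
    {
        var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the GZipStream is disposed.
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            gzip.Write(bytes, 0, bytes.Length);
        } // Dispose here flushes the final block and writes the footer.
        return output.ToArray();
    }

    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling `output.ToArray()` inside the first `using` block instead would return an incomplete gzip stream.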


I found some sample code that shows the complete request/response handling for GZip-encoded pages. It uses GZipStream.

http://www.know24.net/blog/Decompress+GZip+Deflate+HTTP+Responses.aspx


See my comment above, but this is usually a symptom of a corrupted file. If the site is your own, replace the file you are trying to access.


The native GZipStream can read a compressed GZIP (RFC 1952) stream, but it can't handle the ZIP file format.

From http://www.geekpedia.com/tutorial190_Zipping-files-using-GZipStream.html:

The disadvantage of using the GZipStream class over a 3rd party product is that it has limited capabilities. One of the limitations is that you cannot give a name to the file that you place in the archive. When GZipStream compresses the file into a ZIP archive, it takes the sequence of bytes from that file and uses compression algorithms that create a smaller sequence of bytes. The new sequence of bytes is put into the new ZIP file. When you open the ZIP file you will open the archived file itself; most popular ZIP extractors (WinZip, WinRar, etc.) will show you the content of the ZIP as a file that has the same [name] as the archive itself.


EDIT: The above note is incorrect. GZipStream does not produce a ZIP file. It is not a "Single file ZIP stream". It is a GZIP Stream. They are different things. There's no guarantee that tools that handle ZIP archives will handle a .gz file.
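The two formats can be told apart by their leading bytes: a GZIP stream (RFC 1952) always starts with 0x1F 0x8B, while a ZIP file that begins with a local file header starts with "PK\x03\x04". A minimal sketch, assuming a hypothetical `FormatCheck` class:

```csharp
using System.IO;
using System.IO.Compression;

public static class FormatCheck
{
    // GZIP magic number (RFC 1952): ID1 = 0x1F, ID2 = 0x8B.
    public static bool LooksLikeGzip(byte[] data) =>
        data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;

    // ZIP local file header signature: "PK" followed by 0x03 0x04.
    public static bool LooksLikeZip(byte[] data) =>
        data.Length >= 4 && data[0] == 0x50 && data[1] == 0x4B && data[2] == 0x03 && data[3] == 0x04;

    // Produces GZipStream output for the payload so its header can be inspected.
    public static byte[] GzipBytes(byte[] payload)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            gzip.Write(payload, 0, payload.Length);
        return output.ToArray();
    }
}
```

GZipStream output passes `LooksLikeGzip` and fails `LooksLikeZip`, which is why ZIP tools are not guaranteed to open a .gz file.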


For an implementation that can read ZIP archives, as opposed to single-file ZIP streams, try #ziplib (SharpZipLib, formerly NZipLib).
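On .NET Framework 4.5 and later and on .NET Core, System.IO.Compression also includes a ZipArchive class that reads and writes multi-entry ZIP archives. A minimal sketch, assuming a hypothetical `ZipArchiveExample` class, that builds a two-entry archive in memory and lists its entries:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ZipArchiveExample
{
    // Builds a ZIP archive with two text entries, entirely in memory.
    public static byte[] BuildArchive()
    {
        var output = new MemoryStream();
        using (var zip = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var name in new[] { "a.txt", "b.txt" })
            {
                ZipArchiveEntry entry = zip.CreateEntry(name);
                using (var writer = new StreamWriter(entry.Open(), Encoding.UTF8))
                    writer.Write("contents of " + name);
            }
        }
        return output.ToArray();
    }

    // Returns the entry names stored in a ZIP archive.
    public static List<string> ListEntries(byte[] archive)
    {
        var names = new List<string>();
        using (var zip = new ZipArchive(new MemoryStream(archive), ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
                names.Add(entry.FullName);
        }
        return names;
    }
}
```

Unlike GZipStream, each entry here keeps its own file name.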

Reference URL: https://stackoverflow.com/questions/839888/httpwebrequest-native-gzip-compression
