development

WebView에서 웹 페이지 콘텐츠를 가져 오려면 어떻게합니까?

big-blog 2020. 10. 4. 11:57
반응형

WebView에서 웹 페이지 콘텐츠를 가져 오려면 어떻게합니까?


Android에서는 WebView페이지를 표시하는 것이 있습니다.

페이지를 다시 요청하지 않고 페이지 소스를 얻으려면 어떻게합니까?

보인다 WebView어떤 종류가 있어야 getPageSource()문자열을 반환하는 방법을하지만, 슬프게도 그렇지 않습니다.

JavaScript를 활성화하면 콘텐츠를 가져 오기 위해이 호출에 넣을 적절한 JavaScript는 무엇입니까?

webview.loadUrl("javascript:(function() { " +  
    "document.getElementsByTagName('body')[0].style.color = 'red'; " +  
    "})()");  

나는 이것이 늦은 대답이라는 것을 알고 있지만 같은 문제가 있었기 때문에이 질문을 찾았습니다. lexandera.com 의이 게시물 에서 답을 찾은 것 같습니다. 아래 코드는 기본적으로 사이트에서 잘라내어 붙여 넣은 것입니다. 트릭을하는 것 같습니다.

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call inject JavaScript into the page which just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");

문제 12987 , Blundell은의 대답은 (내 2.3 VM에 적어도) 충돌합니다. 대신 특수 접두사가있는 console.log 호출을 가로 챕니다.

// intercept calls to console.log
web.setWebChromeClient(new WebChromeClient() {
    public boolean onConsoleMessage(ConsoleMessage cmsg)
    {
        // check secret prefix
        if (cmsg.message().startsWith("MAGIC"))
        {
            String msg = cmsg.message().substring(5); // strip off prefix

            /* process HTML */

            return true;
        }

        return false;
    }
});

// inject the JavaScript on page load
web.setWebViewClient(new WebViewClient() {
    public void onPageFinished(WebView view, String address)
    {
        // have the page spill its guts, with a secret prefix
        view.loadUrl("javascript:console.log('MAGIC'+document.getElementsByTagName('html')[0].innerHTML);");
    }
});

web.loadUrl("http://www.google.com");

This is an answer based on jluckyiv's, but I think it is better and simpler to change Javascript as follows.

browser.loadUrl("javascript:HTMLOUT.processHTML(document.documentElement.outerHTML);");

Have you considered fetching the HTML separately, and then loading it into a webview?

String fetchContent(WebView view, String url) throws IOException {
    HttpClient httpClient = new DefaultHttpClient();
    HttpGet get = new HttpGet(url);
    HttpResponse response = httpClient.execute(get);
    StatusLine statusLine = response.getStatusLine();
    int statusCode = statusLine.getStatusCode();
    HttpEntity entity = response.getEntity();
    String html = EntityUtils.toString(entity); // assume html for simplicity
    view.loadDataWithBaseURL(url, html, "text/html", "utf-8", url); // todo: get mime, charset from entity
    if (statusCode != 200) {
        // handle fail
    }
    return html;
}

I managed to get this working using the code from @jluckyiv's answer but I had to add in @JavascriptInterface annotation to the processHTML method in the MyJavaScriptInterface.

class MyJavaScriptInterface
{
    @SuppressWarnings("unused")
    @JavascriptInterface
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

You also need to annotate the method with @JavascriptInterface if your targetSdkVersion is >= 17 - because there is new security requirements in SDK 17, i.e. all javascript methods must be annotated with @JavascriptInterface. Otherwise you will see error like: Uncaught TypeError: Object [object Object] has no method 'processHTML' at null:1


If you are working on kitkat and above, you can use the chrome remote debugging tools to find all the requests and responses going in and out of your webview and also the the html source code of the page viewed.

https://developer.chrome.com/devtools/docs/remote-debugging

참고URL : https://stackoverflow.com/questions/2376471/how-do-i-get-the-web-page-contents-from-a-webview

반응형