I am trying to read this HTTPS page using Java:

https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250602e.htm

but I always get "Read timed out". Here is my code:

try {
    URL url = new URL("https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250620e.htm");
    HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
    }
    in.close();
} catch (Exception e) {
    e.printStackTrace();
}

I've tried raising the read timeout with conn.setReadTimeout(240000), but it still times out. Other HTTPS pages work fine; only this one fails. I've also tried curl and Invoke-WebRequest, and neither receives a response; only a browser works. I would appreciate any ideas.

As suggested, I tried Selenium's getPageSource, but it only returns a page saying the site is down or has moved. Viewing the page source manually in Chrome works, though. I then tried Selenium with Network.getResponseBody, like this:

devTools.addListener(Network.responseReceived(), responseReceived -> {
    String responseUrl = responseReceived.getResponse().getUrl();
    RequestId requestId = responseReceived.getRequestId();
    if (responseUrl.contains("smstat/dayquot")) {
        System.out.println("Url: " + responseUrl);
        System.out.println("Response headers: " + responseReceived.getResponse().getHeaders().toString());
        // The next line is TestResponse.java:25 in the stack trace below
        System.out.println("Response body: " + devTools.send(Network.getResponseBody(requestId)).getBody());
    }
});

A Chrome window pops up showing the information I want. However, the program aborts with:

Jul 03, 2025 8:20:24 AM org.openqa.selenium.devtools.Connection$Listener lambda$onText$0
WARNING: Unable to process: {"method":"Network.responseReceived","params":{"requestId":"D32E463F42A6CAC6BE6A7E39FA5C1023","loaderId":"D32E463F42A6CAC6BE6A7E39FA5C1023","timestamp":4344.313103,"type":"Document","response":{"url":"https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250618e.htm","status":200,"statusText":"","headers":{"accept-ranges":"bytes","content-encoding":"gzip","content-type":"text/html","date":"Thu, 03 Jul 2025 00:20:23 GMT","etag":"\"07ab78648e0db1:0\"","last-modified":"Wed, 18 Jun 2025 12:00:04 GMT","strict-transport-security":"max-age=480","vary":"Accept-Encoding","x-akamai-transformed":"0 - 0 -"},"mimeType":"text/html","charset":"","connectionReused":false,"connectionId":52,"remoteIPAddress":"23.35.148.143","remotePort":443,"fromDiskCache":false,"fromServiceWorker":false,"fromPrefetchCache":false,"encodedDataLength":1281,"timing":{"requestTime":4342.845654,"proxyStart":-1,"proxyEnd":-1,"dnsStart":-1,"dnsEnd":-1,"connectStart":-1,"connectEnd":-1,"sslStart":-1,"sslEnd":-1,"workerStart":-1,"workerReady":-1,"workerFetchStart":-1,"workerRespondWithSettled":-1,"sendStart":1110.392,"sendEnd":1110.758,"pushStart":0,"pushEnd":0,"receiveHeadersStart":1462.685,"receiveHeadersEnd":1463.48},"responseTime":1.751502024562685e+12,"protocol":"h2","alternateProtocolUsage":"unspecifiedReason","securityState":"secure","securityDetails":{"protocol":"TLS 1.3","keyExchange":"","keyExchangeGroup":"X25519","cipher":"AES_256_GCM","certificateId":0,"subjectName":"*.hkex.com.hk","sanList":["*.hkex.com.hk","hkex.com.hk"],"issuer":"Sectigo RSA Organization Validation Secure Server CA","validFrom":1728432000,"validTo":1761609599,"signedCertificateTimestampList":[{"status":"Verified","origin":"Embedded in certificate","logDescription":"Google 'Xenon2025h2' 
log","logId":"DDDCCA3495D7E11605E79532FAC79FF83D1C50DFDB003A1412760A2CACBBC82A","timestamp":1728445455641,"hashAlgorithm":"SHA-256","signatureAlgorithm":"ECDSA","signatureData":"3044022042FB38EF079A0BE8355ABEA16524CCE9FB1F4433806E00D735D097683638B512022015F93705DECE739B4EC30CCDEA42F9791B86F1B4D6634BC6BF045279785784D3"},{"status":"Verified","origin":"Embedded in certificate","logDescription":"Cloudflare 'Nimbus2025'","logId":"CCFB0F6A85710965FE959B53CEE9B27C22E9855C0D978DB6A97E54C0FE4C0DB0","timestamp":1728445455647,"hashAlgorithm":"SHA-256","signatureAlgorithm":"ECDSA","signatureData":"304502200689FBEB54868A58DB0EBCB9241A617E814CE0A9339AB7D2CDDD7AECF91497AD022100EAB140F15E621C1ECAB97A2152AF795BE73C2C0CCFC4C4306AF48D121D9CF8AD"},{"status":"Verified","origin":"Embedded in certificate","logDescription":"Google 'Argon2025h2' log","logId":"12F14E34BD53724C840619C38F3F7A13F8E7B56287889C6D300584EBE586263A","timestamp":1728445455617,"hashAlgorithm":"SHA-256","signatureAlgorithm":"ECDSA","signatureData":"304502203C2D26137292013CC4BEE5CDCE99587A9375E47A68D984C2B2DECCBECEB3EA7502210096A74DF6E7E2F0CE1D2B2BC2D897419266E1579187395DE59A22A43952854453"}],"certificateTransparencyCompliance":"compliant","serverSignatureAlgorithm":2052,"encryptedClientHello":false}},"hasExtraInfo":true,"frameId":"FDDF984E462E75D6E90D0327769D2CF5"},"sessionId":"BAB7F6480E49DF21284E1F9BF9B0D16C"}
org.openqa.selenium.devtools.DevToolsException: {"id":6,"error":{"code":-32000,"message":"No data found for resource with given identifier"},"sessionId":"BAB7F6480E49DF21284E1F9BF9B0D16C"}
Build info: version: '4.34.0', revision: '2a4c61c498'
System info: os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.1'
Driver info: driver.version: unknown
    at org.openqa.selenium.devtools.Connection.sendAndWait(Connection.java:195)
    at org.openqa.selenium.devtools.DevTools.send(DevTools.java:94)
    at org.openqa.selenium.devtools.DevTools.send(DevTools.java:89)
    at testselenium.TestResponse.lambda$main$0(TestResponse.java:25)
    at org.openqa.selenium.devtools.DevTools.lambda$addListener$0(DevTools.java:108)
    at org.openqa.selenium.devtools.Connection.lambda$handle$5(Connection.java:348)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:442)
    at java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1746)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.openqa.selenium.devtools.Connection.handle(Connection.java:311)
    at org.openqa.selenium.devtools.Connection$Listener.lambda$onText$0(Connection.java:239)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.openqa.selenium.WebDriverException: {"id":6,"error":{"code":-32000,"message":"No data found for resource with given identifier"},"sessionId":"BAB7F6480E49DF21284E1F9BF9B0D16C"}

Any further help is much appreciated.

  • If you only observe browsers to successfully load the page then perhaps the server is discriminating by user agent. Perhaps it's even doing so badly. Commented Jun 24 at 2:51
  • Assuming that this is some kind of anti-scraper defense, you should probably read the site's Terms of Use ... and conform to them. Attempting to evade the existing blocks may lead to escalating counter-measures. Commented Jun 24 at 4:21

1 Answer

https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250602e.htm likely uses security measures (e.g., bot protection, user-agent filtering, header inspection, JavaScript challenges) that block non-browser clients such as HttpsURLConnection.

You could try sending browser-like request headers:

import javax.net.ssl.HttpsURLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class HKEXFetcher {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250620e.htm");
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            conn.setReadTimeout(240000);
            conn.setConnectTimeout(15000);

            // Spoof browser headers
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36");
            conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            conn.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
            conn.setRequestProperty("Connection", "keep-alive");
            conn.setInstanceFollowRedirects(true);

            int responseCode = conn.getResponseCode();
            if (responseCode != 200) {
                System.out.println("Failed : HTTP error code : " + responseCode);
                return;
            }

            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String inputLine;
                while ((inputLine = in.readLine()) != null) {
                    System.out.println(inputLine);
                }
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
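
A variant of the same idea, offered as a sketch rather than a confirmed fix: HttpsURLConnection only speaks HTTP/1.1, whereas java.net.http.HttpClient (Java 11+) can negotiate HTTP/2, which is what Chrome used for this page according to the DevTools log in the question ("protocol":"h2"). Whether the protocol version matters to this server is an assumption. The class name HkexHttpClientFetcher, the helper names, and the 60-second request timeout are made up for illustration. The Connection header is left out because HttpClient does not allow setting it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class HkexHttpClientFetcher {
    // URL taken from the question's code
    static final String PAGE_URL =
            "https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250620e.htm";

    // Client that prefers HTTP/2 and falls back to HTTP/1.1 if the server refuses
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .connectTimeout(Duration.ofSeconds(15))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // GET request carrying the same browser-like headers as the example above
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(60)) // illustrative value, not a recommendation
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36")
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "en-US,en;q=0.5")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response =
                newClient().send(buildRequest(PAGE_URL), HttpResponse.BodyHandlers.ofString());
        // response.version() reports which protocol was actually negotiated
        System.out.println("HTTP " + response.statusCode() + " via " + response.version());
        System.out.println(response.body());
    }
}
```

If the request still times out, the block is probably happening below the HTTP layer, and changing headers or protocol version will not help.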

If that doesn't work, use Selenium WebDriver instead, with something like this:

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class HKEXSeleniumFetcher {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver"); // Update path

        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless", "--disable-gpu");

        WebDriver driver = new ChromeDriver(options);
        driver.get("https://www.hkex.com.hk/eng/stat/smstat/dayquot/d250620e.htm");

        String pageSource = driver.getPageSource();
        System.out.println(pageSource);

        driver.quit();
    }
}

2 Comments

Your list is missing the most important measure: TLS fingerprinting. It matters because it takes effect much earlier than the other measures, and it explains why some clients work and others do not even when the request is identical.
Thank you Altxxr0, I've tried both. Adding request properties didn't make any difference. Selenium didn't time out, but it only returned an HTML page with an error message saying that "The web page at ...... might be temporarily down or it may have moved permanently to a new web address".
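
To illustrate the TLS fingerprinting point raised in the comments above: fingerprints of the JA3 kind are derived from fields of the TLS ClientHello, such as the protocol version, the offered cipher suites and their order, and the extensions. The sketch below prints the protocol and cipher-suite defaults of the JVM's default SSLContext, which can be compared against what a browser offers. It does not compute an actual fingerprint, and whether this site uses TLS fingerprinting is not confirmed; the class name TlsClientProfile and its helper names are made up for illustration.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

class TlsClientProfile {

    // Default parameters of the JVM's default SSLContext
    static SSLParameters defaultParameters() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // TLS protocol versions enabled by default
    static List<String> protocols() {
        return Arrays.asList(defaultParameters().getProtocols());
    }

    // Cipher suites enabled by default, in preference order
    static List<String> cipherSuites() {
        return Arrays.asList(defaultParameters().getCipherSuites());
    }

    public static void main(String[] args) {
        System.out.println("Protocols: " + protocols());
        System.out.println("Cipher suites, in preference order:");
        for (String suite : cipherSuites()) {
            System.out.println("  " + suite);
        }
    }
}
```

Because these values come from the TLS stack rather than from the HTTP request, setting request headers on HttpsURLConnection leaves them unchanged, which would be consistent with the request properties making no difference.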
