How to do pagination in postgres sql?

Question

I have a Python script that I am using to make SQL queries. The problem is that my VM only has 2 GB of RAM, and some of the SQL queries are so RAM intensive that the kernel automatically kills the script. How can I make this code more RAM efficient? I would like to implement pagination in my PostgreSQL code. How would I do that? Does anyone know an easy implementation of that? I would greatly appreciate your help!

Updated Code

from __future__ import print_function

import re
import sys
import json
import pprint
import time

try:
    import psycopg2
except ImportError:
    # sys is imported above, so the script can report the missing dependency and exit cleanly
    print('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)

outfilepath = "crtsh_output/crtsh_flat_file"

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'

# DELAY = 0


def connect_to_db():
    start = 0
    offset = 10
    flag = True
    while flag:
        filepath = 'forager.txt'
        with open(filepath) as fp:
            unique_domains = ''
            try:
                conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
                cursor = conn.cursor()
                cursor.itersize = 10000
                for cnt, domain_name in enumerate(fp):
                    print("Line {}: {}".format(cnt, domain_name))
                    print(domain_name)
                    domain_name = domain_name.rstrip()

                    cursor.execute('''SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate), x509_notBefore(c.certificate), x509_notAfter(c.certificate), x509_issuerName(c.certificate), x509_keyAlgorithm(c.certificate), x509_keySize(c.certificate), x509_publicKeyMD5(c.certificate), x509_publicKey(c.certificate), x509_rsaModulus(c.certificate), x509_serialNumber(c.certificate), x509_signatureHashAlgorithm(c.certificate), x509_signatureKeyAlgorithm(c.certificate), x509_subjectName(c.certificate), x509_name(c.certificate), x509_name_print(c.certificate), x509_commonName(c.certificate), x509_subjectKeyIdentifier(c.certificate), x509_extKeyUsages(c.certificate), x509_certPolicies(c.certificate), x509_canIssueCerts(c.certificate), x509_getPathLenConstraint(c.certificate), x509_altNames(c.certificate), x509_altNames_raw(c.certificate), x509_cRLDistributionPoints(c.certificate), x509_authorityInfoAccess(c.certificate), x509_print(c.certificate), x509_anyNamesWithNULs(c.certificate), x509_extensions(c.certificate), x509_tbscert_strip_ct_ext(c.certificate), x509_hasROCAFingerprint(c.certificate)
                    FROM certificate c, certificate_identity ci WHERE
                    c.id= ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) =
                    lower(%s) AND x509_notAfter(c.certificate) > statement_timestamp()''', (domain_name,))


                # query db with start and offset
                unique_domains = cursor.fetchall()
                if not unique_domains:
                    flag = False
                else:
                        # do processing with your data

                    pprint.pprint(unique_domains)

                    outfilepath = "crtsh2" + ".json"
                    with open(outfilepath, 'a') as outfile:
                            outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4, default=str, ensure_ascii = False))
                    offset += limit


            except Exception as error:
                print(str(error))

if __name__ == "__main__":
    connect_to_db()

Use something like cur.fetchmany(n). It returns the next n rows from your query.
– Gurmokh, Commented Aug 8, 2018 at 13:18
@Mokadillion Thank you for your response! In which section of my code should I implement cur.fetchmany(n)?
– bedford, Commented Aug 8, 2018 at 13:41
One odd thing is that cursor.fetchall() is called outside the loop, which means the database never gets a chance to close resources consumed by prior runs of the query. You should process query results for each execute() call: append to a list, update a set, etc.
– bimsapi, Commented Aug 8, 2018 at 13:59
@bimsapi Thank you for your help! For cursor.fetchall() to be inside the loop, does it need to be indented once?
– bedford, Commented Aug 8, 2018 at 14:05
Yes, indent it to the same level as the other statements in the loop. I would also avoid opening and closing crtsh2.json on each iteration. For simplicity, manage both files at the same time via with open(filepath) as fp, open('crtsh2.json', 'a') as outfile:
– bimsapi, Commented Aug 8, 2018 at 14:17
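
The fetchmany suggestion from the comments could look roughly like the sketch below. It is only an illustration: it uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere, and the certs table, its data, and the iter_batches helper are made up for the example.

```python
import sqlite3


def iter_batches(cursor, batch_size=1000):
    """Yield lists of at most batch_size rows until the result set is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Hypothetical demo data: 25 rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE certs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO certs (name) VALUES (?)",
    [("host{}.example".format(i),) for i in range(25)],
)

cur = conn.cursor()
cur.execute("SELECT id, name FROM certs ORDER BY id")

# Each batch is processed (here: only counted) before the next one is fetched.
batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

One caveat for psycopg2: a default cursor transfers the whole result set to the client when execute() runs, so fetchmany alone does not reduce memory use there. A named cursor, for example conn.cursor(name='cert_stream') (the name is arbitrary), is kept on the server and streams rows in chunks; cursor.itersize only takes effect when iterating over such a named cursor.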

Ajay Gupta · Accepted Answer · 2018-08-08 12:18:15Z

3

Maybe something like this:

limit = 10
offset = 0
flag = True
while flag:
    # Query the db with limit and offset. This assumes an open cursor;
    # the domains table and its id column are placeholders for your own query.
    cursor.execute("SELECT * FROM domains ORDER BY id LIMIT %s OFFSET %s", (limit, offset))
    unique_domains = cursor.fetchall()
    if not unique_domains:
        flag = False
    else:
        # do processing with your data
        offset += limit
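
For anyone who wants to run the LIMIT/OFFSET loop end to end, here is a minimal self-contained sketch. It assumes SQLite in place of Postgres purely so it can run without a server; the domains table, its columns, and the fetch_pages helper are hypothetical.

```python
import sqlite3


def fetch_pages(conn, page_size):
    """Yield successive pages of rows using LIMIT/OFFSET."""
    offset = 0
    while True:
        rows = conn.execute(
            # ORDER BY keeps the page boundaries stable between queries.
            "SELECT id, name FROM domains ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += page_size


# Hypothetical demo data: 7 rows, ids 1..7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO domains (name) VALUES (?)",
    [("d{}.example".format(i),) for i in range(7)],
)

pages = list(fetch_pages(conn, page_size=3))
page_ids = [[row[0] for row in page] for page in pages]
print(page_ids)  # [[1, 2, 3], [4, 5, 6], [7]]
```

With psycopg2 the placeholders would be %s instead of ?. Note that the database still has to skip over all the rows before the offset, so later pages get slower as the offset grows.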


5 Comments

bedford Over a year ago

@Ajay_Gupta Thank you for your response. Are you saying I should put this code in place of where I have "unique_domains = cursor.fetchall()"?

Ajay Gupta Over a year ago

@bedford This is pseudocode. You have to write the query yourself so that it fetches the number of records defined in limit.

bedford Over a year ago

@Ajay_Gupta Thank you again for your response! What exactly do you mean by this being pseudo code?

Ajay Gupta Over a year ago

@bedford It just means that you have to insert your own code into this logical skeleton.

bedford Over a year ago

@Ajay_Gupta I updated my code above. However, my code does not work. It just stalls on the first query and does nothing. Any help would be appreciated!

li Anna (edited by AS Mackay) · Accepted Answer · 2019-07-23 19:52:08Z

1

I found an article on paginating in Postgres: "Five ways to paginate in Postgres, from the basic to the exotic".

Here's an example, quoted from the article:

Keyset Pagination

The techniques above can paginate any kind of query, including queries without order clauses. If we are willing to forgo this generality we reap optimizations. In particular when ordering by indexed column(s) the client can use values in the current page to choose which items to show in the next page. This is called keyset pagination.

For example let’s return to the medley example:

-- Add an index for keyset pagination (btrees support inequality)
CREATE INDEX n_idx ON medley USING btree (n);
SELECT * FROM medley ORDER BY n ASC LIMIT 5;
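
The query above only returns the first page. A rough sketch of how a client might continue from there, using the largest n of the current page as the starting point for the next one, is shown below. It runs against SQLite instead of Postgres so that it is self-contained; the description column, the sample rows, and the keyset_pages helper are assumptions made for the example, and n is assumed to be unique.

```python
import sqlite3


def keyset_pages(conn, page_size):
    """Yield pages ordered by n, resuming after the last n already seen."""
    last_n = None
    while True:
        if last_n is None:
            # First page: no lower bound yet.
            rows = conn.execute(
                "SELECT n, description FROM medley ORDER BY n ASC LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # Later pages: seek past the last key instead of using OFFSET.
            rows = conn.execute(
                "SELECT n, description FROM medley "
                "WHERE n > ? ORDER BY n ASC LIMIT ?",
                (last_n, page_size),
            ).fetchall()
        if not rows:
            break
        yield rows
        last_n = rows[-1][0]


# Hypothetical demo data: n = 1..12.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medley (n INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO medley (n, description) VALUES (?, ?)",
    [(i, "row {}".format(i)) for i in range(1, 13)],
)

keys_per_page = [[row[0] for row in page] for page in keyset_pages(conn, 5)]
print(keys_per_page)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12]]
```

Because each page is located through the index on n, the cost of a page does not grow with how far into the result set it is, unlike LIMIT/OFFSET.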

answered Jul 23, 2019 at 19:26 by li Anna
edited Jul 23, 2019 at 19:52 by AS Mackay
