
I have the function below to pull the required columns from DynamoDB, and it is working fine.

The problem is that it pulls only a few of the rows from the table.

For example, the table has 26,000+ rows but I'm able to get only about 3,000 rows here. Did I miss anything?

from boto3.dynamodb.conditions import Key

def get_columns_dynamodb():
    try:
        response = table.query(
            ProjectionExpression="id, name, date",
            KeyConditionExpression=
                Key('opco_type').eq('cwc') and Key('opco_type').eq('cwp')
        )
        return response['Items']
    except Exception as error:
        logger.error(error)
  • Google "DynamoDB 1MB limit" & "DynamoDB Boto3 pagination" and you'll figure out why (: Commented Jan 4, 2022 at 11:09
  • Yes, I checked. Is there any way to get only the required columns from the DB along with all the rows, @ErmiyaEskandary? Commented Jan 4, 2022 at 11:40
  • By using pagination, I will get all the columns, right? But in my case I need only a few columns with all the rows (items). Commented Jan 4, 2022 at 11:41
  • Pagination is separate from your projection expression; they stack on top of each other, so yes - if you paginate the results, you will be able to get all the items. However, if you are pulling 26k items regularly, I'd be thinking of a more suitable DB solution. Commented Jan 4, 2022 at 12:03
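The 1 MB limit and pagination hints from the comments can be sketched as follows, assuming `table` is the same boto3 Table resource as in the question. The `#i`/`#n`/`#d` aliases are needed because `name` and `date` are DynamoDB reserved words, and `opco_type = 'cwc'` stands in for whichever single partition key value is actually queried (a Query can only match one partition key value, so chaining two `Key(...).eq(...)` conditions with Python's `and` silently keeps just the second one):

```python
def get_columns_dynamodb(table):
    """Collect every matching item, following LastEvaluatedKey until the
    Query has walked the whole partition (each response is capped at
    1 MB, which is why only ~3,000 of the 26,000 rows came back)."""
    items = []
    kwargs = {
        # "name" and "date" are reserved words, so alias them through
        # ExpressionAttributeNames.
        "ProjectionExpression": "#i, #n, #d",
        "ExpressionAttributeNames": {"#i": "id", "#n": "name", "#d": "date"},
        "KeyConditionExpression": "opco_type = :v",
        "ExpressionAttributeValues": {":v": "cwc"},
    }
    while True:
        response = table.query(**kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```

Note that the projection still does not reduce what DynamoDB reads or charges for; it only trims the response payload, as the answer below explains.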

1 Answer


In DynamoDB, there's no such thing as "select only these columns". Or, there sort of is, but the projection happens after the data is fetched from storage. The entire item is always fetched, and the entire item counts toward the various limits in DynamoDB, such as the 1 MB maximum per response.

One way to solve this is to write your data in a way that's more optimized for this query. Generally speaking, in DynamoDB you optimize "queries" (in quotes, since they're more of a key/value read than a dynamic query with joins and selects etc.) by writing optimized data.

So, when you write data to your table, you can either use a transaction to write companion items to the same or a separate table, or you can use DynamoDB Streams to write the same data in a similar fashion, except asynchronously (i.e. eventually consistent).

Let's say you roll with two tables: one table, my_things, contains the full items. Another table, my_things_for_query_x, holds only the exact data you need for that query, which allows you to read more data in each chunk, since the data in storage contains only what you actually need in your situation.
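The streams variant could be sketched like this, with hypothetical handler and attribute names: a Lambda function subscribed to the my_things stream copies only the needed attributes into my_things_for_query_x. Stream records arrive in DynamoDB-typed JSON, which the low-level client's put_item accepts as-is:

```python
def project_stream_records(event, client, table_name="my_things_for_query_x"):
    """For each item written to my_things, write a slim companion item
    holding only the attributes the query needs."""
    for record in event["Records"]:
        # Skip deletions; only mirror newly written item images.
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]  # DynamoDB-typed JSON
        slim = {k: image[k] for k in ("opco_type", "id", "name", "date")
                if k in image}
        client.put_item(TableName=table_name, Item=slim)
```

Since the stream is processed asynchronously, my_things_for_query_x lags my_things slightly; the transaction approach avoids that lag at the cost of doing both writes on the request path.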
