382 questions
1 vote · 0 answers · 107 views
Closing a channel from inside an FnMut callback
I'm trying to fill a certain number of samples from a cpal microphone source (about 10 seconds worth). I want to process those samples as they come in with low latency and in regularly sized blocks of ...
1 vote · 0 answers · 62 views
Scaling RAG QA with Large Docs, Tables, and 30K+ Chunks (No LangChain)
I'm building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages ...
3 votes · 1 answer · 153 views
Reading through multiple files in chunks in R
I'm trying to read through multiple compressed tables that are 5GB+ in size in R, and because I have insufficient memory to read them into memory all at once I need to process them one chunk at a time,...
0 votes · 0 answers · 39 views
How to implement StreamResponse in ASP.NET
This is an ASP.NET API function that processes real-time chatbot responses from a local Qwen2.5 setup served with FastAPI.
This is the current code, but StreamResponse doesn't work correctly.
How can I optimize this code?
[Authorize]
...
0 votes · 0 answers · 93 views
Perlin noise procedural generation - issue with generating chunks within Unity
I am trying to develop a map generation system in Unity. I want to have a system for chunks and have simplified the problem down to a small amount of code, which is not attempting a great deal. I just ...
0 votes · 0 answers · 53 views
Finding a way to iterate using the input of two xarray dataarrays when chunked
I am developing a relatively large model using xarray and therefore want to make use of chunks. Most of my operations run a lot faster when chunked but there is one that keeps running (a lot) slower ...
0 votes · 1 answer · 170 views
Remote Partitioning in Spring Batch - Job completion for small number of records can't be done until Job completes for large number of records
I am trying Spring Batch with remote partitioning [master-slave approach].
I have one master step which sends records to worker nodes via Kafka.
All was working fine until parallel job executions ...
0 votes · 1 answer · 287 views
How to set a proper chunk size in HDF5
According to this answer, a proper chunk size is important for optimizing I/O performance.
I have 3000 JPG images, whose sizes vary from 180 kB to 220 kB. I am going to save them as bytes.
I know 2 methods ...
0 votes · 2 answers · 375 views
RAG with LlamaIndex SubDocument, how to persist embeddings
I'm doing a RAG model with some documents.
Testing LlamaIndex's SubDocSummaryPack, which seems to be a good choice for document chunking instead of simply chunking the original information.
After using ...
2 votes · 0 answers · 183 views
System.Text.Json.JsonSerializer DeserializeAsyncEnumerable works slowly with large json files (10+mb)
I'm using .NET 8, System.Text.Json 8 HttpClient to make request, HttpConnection.ChunkedEncodingReadStream as input stream from the response, and JsonSerializer.DeserializeAsyncEnumerable to ...
1 vote · 0 answers · 465 views
Is AI21SemanticTextSplitter (from langchain_ai21) Deprecated?
Has anyone tried using Langchain's AI21 integration AI21SemanticTextSplitter?
There is a mention of it on Langchain's Text Splitters Page.
This is its documentation.
I tried the examples given there ...
-1 votes · 1 answer · 2k views
Semantic Chunking with Langchain on FAISS vectorstore
I have this Langchain code for my own dataset:
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
docs, ...
1 vote · 2 answers · 7k views
How to select chunk size of data for embedding with an LLM?
I have structured data (CSV) that has a column of semantically rich text of variable length. I could mine the data so the CSV file has a max length per row of data by using an LLM to summarize ...
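One baseline worth measuring against is a plain character-window splitter with overlap; the function below is a minimal stdlib sketch (the name `chunk_text` and the default sizes are made up for illustration), which a token-aware or semantic splitter can then be compared against.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already covers the tail
    return chunks
```

Sweeping `chunk_size` over a small grid and scoring retrieval quality on a held-out set of questions is usually more informative than any fixed rule of thumb.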
0 votes · 1 answer · 420 views
Explain Dask-cuDF behavior
I'm trying to read and process an 8 GB CSV file using cuDF. Reading the whole file at once fits neither into GPU memory nor into my RAM. That's why I use the dask_cudf library. Here is the code:
import ...
0 votes · 0 answers · 123 views
How to handle Spring Batch remote chunking in a Kubernetes pod?
I am trying to implement Spring Batch remote chunking for a heavy file.
What would be the ideal physical deployment setup for remote chunking in Kubernetes?
1) Can we set up worker and manager in the same ...
0 votes · 1 answer · 2k views
LlamaIndex small-to-big chunking strategy in RAG pipeline, limits the chunk size a lot
I am working on a RAG system using LlamaIndex. I'm trying to adopt the small-to-big chunking strategy for the retrieval stage. I have numerous articles as inputs and some metadata about them. Here is the list of ...
1 vote · 0 answers · 393 views
Avoid gRPC message exceeds maximum size
I'm using gRPC to send messages from Java to a Python service. Recently I started getting larger messages that sometimes exceed the maximum message size for gRPC. Increasing that size is not possible ...
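When the limit can't be raised, the usual workaround is to split the payload into sub-messages below the limit and send them over a stream, reassembling on the other side. A stdlib sketch of just the byte-splitting step (the 3 MB default is an arbitrary value below gRPC's 4 MB default limit):

```python
def iter_chunks(payload: bytes, max_chunk: int = 3 * 1024 * 1024):
    """Yield successive slices of payload, each at most max_chunk bytes."""
    for offset in range(0, len(payload), max_chunk):
        yield payload[offset:offset + max_chunk]

# Receiver side: reassemble with b"".join(...) once the stream is exhausted.
```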
-1 votes · 1 answer · 307 views
I want to send a "mixed", chunked response of JSON and a string with Express
I'm working on a ChatGPT integration in Node/Express and would like to first respond to my client with some metadata in JSON, then start streaming ChatGPT's response as it streams in.
Currently, there ...
0 votes · 0 answers · 322 views
Context based chunks using Adobe PDF Extract Python
So, recently I came across the Adobe PDF Extract API. I'm using Python, and for those who aren't aware of Adobe's extraction methods: given a PDF, the API returns the extracted text with each ...
0 votes · 2 answers · 2k views
How to perform document retrieval in large database to augment prompt of LLM?
I have a large database of documents (these “documents” are essentially web pages and they are all in HTML). They have information regarding the business itself and can contain a lot of similar ...
0 votes · 1 answer · 245 views
Merging two chunked dataframes
I'm working with large datasets in Python via pandas and initially chunked the two datasets so they could load into memory, but I'm not sure how to merge them given they are turned into TextFileReader objects instead ...
0 votes · 1 answer · 499 views
DOMException Error when splitting large file blob into chunks
Can someone please help me debug this function?
const chunkify = async (file: Blob) => {
const totalSize = file.size;
const chunkSize = 1024 * 1024 * 100; // 100MB
const chunks = [] as ...
2 votes · 1 answer · 375 views
Reading large csv file in chunks with pandas
I am trying to read a large CSV file (84 GB) in chunks with pandas, filter out the necessary rows, and convert them to a DataFrame:
import pandas as pd
chunk_size = 1000000 # Number of rows to read per chunk
my_df = pd....
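The pattern generally looks like the sketch below: filter each chunk as it arrives and concatenate only the survivors, so peak memory stays near one chunk plus the filtered rows (the column names and threshold are invented, and an in-memory CSV stands in for the 84 GB file):

```python
import io

import pandas as pd

csv_file = io.StringIO("id,value\n1,10\n2,200\n3,30\n4,400\n")  # stand-in for the real file

kept = []
for chunk in pd.read_csv(csv_file, chunksize=2):  # 2 rows per chunk for the demo
    filtered = chunk[chunk["value"] > 100]        # keep only the rows you need
    if not filtered.empty:
        kept.append(filtered)

my_df = pd.concat(kept, ignore_index=True)
```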
3 votes · 1 answer · 2k views
SQLAlchemy isn't batching rows / using server side cursor via `yield_per`
Following documentation, and the code snippet provided from https://docs.sqlalchemy.org/en/14/core/connections.html#streaming-with-a-fixed-buffer-via-yield-per (posted directly below), my query is not ...
0 votes · 1 answer · 212 views
Uploaded video file with JS chunked upload and PHP not working for files above 10 MB
I tried writing chunked-upload code with JS and PHP. It works fine when you upload a video file smaller than 10 MB, but when uploading a 17.6 MB video file, the file uploads but it maintains its appropriate ...
0 votes · 1 answer · 2k views
Download a large file >100MB from a gRPC server in chunks using gRPC streaming? How can I track which chunk to send next on the server?
I am trying to download a large file >100MB from a gRPC server using gRPC bidirectional streaming. I need to break the file into chunks on the server and stream the bytes. I am not sure how to ...
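One way to avoid tracking state on the server at all is to make the chunk index part of each request: the server seeks to `index * chunk_size` and reads one chunk, so any chunk can be re-requested after a failure. A stdlib sketch of that server-side read (the helper name is hypothetical):

```python
CHUNK_SIZE = 64 * 1024  # bytes per streamed message; tune to stay under message limits

def read_chunk(path: str, index: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return the index-th chunk of the file; empty bytes signal end of file."""
    with open(path, "rb") as f:
        f.seek(index * chunk_size)
        return f.read(chunk_size)
```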
1 vote · 0 answers · 186 views
efficient partitioning column-wise when converting from dask dataframe to xarray (dask array)
A common task in my daily data wrangling is converting tab-delimited text files to xarray datasets and continuing analysis on the dataset and saving to zarr or netCDF format.
I have developed a data ...
0 votes · 1 answer · 405 views
Spring Batch Remote Chunking Chunk Response
I have implemented Spring Batch remote chunking with Kafka, with both manager and worker configurations. I want to send some DTO or object in the ChunkResponse from the worker side to the manager and ...
0 votes · 0 answers · 252 views
Send arrays of objects using express?
Can't figure out how to send arrays of objects.
Currently I have a back-end MongoDB database with multiple collections which I have to query for matching data and send the data back to the client as ...
1 vote · 1 answer · 2k views
How to optimize webpack chunking with vue.config.js to speed up GitLab build time
My original configuration in vue.config.js uses the default chunking strategy, which takes about 5 minutes to build locally and 35 minutes in the GitLab pipeline, and results in one chunk being > 50 MB ...
0 votes · 1 answer · 350 views
Chunking - regular expressions and trees
I'm a total noob so sorry if I'm asking something obvious. My question is twofold, or rather it's two questions in the same topic:
I'm studying nltk in Uni, and we're doing chunks. In the grammar I ...
6 votes · 0 answers · 538 views
Laravel 8 chunk memory leak
I have an artisan command that is scheduled to run every 15 minutes to update a model's calculated field. The command processes thousands of rows and ends up eating all of the memory.
My understanding of ...
1 vote · 0 answers · 896 views
How to use chunking to reduce file size
I'm trying to read a ~2.3 GB file that exceeds the RAM of my hardware. I want to apply chunking to read this file.
I tried chunksize of 10**2, 10**3, 10**4, and 10**5 but all still exceed the RAM ...
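Shrinking `chunksize` alone won't help if every chunk is kept (for example, appended into one big DataFrame); each chunk has to be reduced to a small result and then discarded before the next is read. A stdlib sketch of that reduce-per-chunk shape, here counting lines in bounded memory:

```python
def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newlines while holding at most chunk_size bytes in memory."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")  # reduce the chunk, then drop it
    return total
```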
3 votes · 1 answer · 3k views
What is the difference between chunk and pagination in Laravel?
I already read some articles but I am still confused.
In pagination it will execute a query when loading a page, but what happens with chunk?
I read https://laravel-school.com/posts/laravel-pagination-vs-...
3 votes · 1 answer · 3k views
tqdm progress for processing sequence in chunks
I am processing a sequence in chunks, where the last chunk may be shorter, and would like to show a progress bar counting the number of items. The straightforward approach is
import tqdm, math
total=567
...
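A simpler route than pre-computing a chunk count is to keep the bar in units of items and advance it by each chunk's actual length, so the short last chunk is handled automatically (sizes below match the question's `total=567`):

```python
import math

from tqdm import tqdm

total = 567
chunk_size = 100
data = range(total)  # stand-in for the real sequence

# The bar counts items, not chunks: update by the chunk's true length,
# so the final short chunk advances it by only 67.
with tqdm(total=total) as bar:
    for start in range(0, total, chunk_size):
        chunk = data[start:start + chunk_size]
        # ... process chunk here ...
        bar.update(len(chunk))

n_chunks = math.ceil(total / chunk_size)  # 6 chunks; the last holds 67 items
```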
1 vote · 0 answers · 204 views
How to disable chunking in webpack 4
I am running a React project developed using CRA on localhost; in the network tab it shows chunk.js files, as shown in the image.
I want it to show the original file names, i.e. which component file is rendering, so is ...
1 vote · 2 answers · 2k views
How to overwrite file with sequential chunks in Golang
How do I read a large file by chunking it, process each chunk sequentially, then overwrite the resulting chunk exactly where it came from (the same position or offset in the file)?
E.g.: I want to read 1 ...
0 votes · 1 answer · 67 views
Pandas: compute daily statistics when chunking
Consider a postgres table where for the date 2022-05-01 we have 200 values for various times:
time value ...
1 vote · 0 answers · 145 views
Generate .rds files below 100MB to avoid Git LFS
I have a bunch of fit.objects which produce *.rds files larger than 100 MB. GitHub comes with a file size limitation of 100 MB, and the Git LFS solution doesn't suit me since I don't want to pay for ...
0 votes · 1 answer · 315 views
Partition a large list into chunks with convenient I/O
I have a large list with size of approx. 1.3GB. I'm looking for the fastest solution in R to generate chunks and save them in any convenient format so that :
a) every saved file of the chunk is less ...
-1 votes · 1 answer · 356 views
How can I insert 1,000,000 rows from a textarea into the database in Laravel?
How can I insert 1,000,000 rows from a textarea into the database in Laravel 8?
I wrote this code and can only insert 30,000 rows before the browser gives me HTTP ERROR 500.
I set max_execution_time to 300 in ...
2 votes · 0 answers · 2k views
xarray chunk dataset PerformanceWarning: Slicing with an out-of-order index is generating more chunks
I am trying to run a simple calculation based on two big gridded datasets in xarray (around 5 GB altogether, daily data from 1850-2100). I keep running out of memory when I try it this way
import ...
0 votes · 0 answers · 261 views
Process Large Data in Chunks
I have a large program where I am trying to read approximately 30,000 lines of data and process them. I know that I can use the chunksize functionality to do this, but I think I am not executing this ...
0 votes · 1 answer · 259 views
Loading small amounts of data from Lumen API time
I have an API built in Lumen and I plan to consume the API's JSON response in the frontend using a single-page framework like Angular.
The problem is that the response from some routes contains a huge amount of ...
3 votes · 0 answers · 559 views
Chunking a django-import-export
I am reading this article about chunking a large database operation. I am also using django-import-export and django-import-export-celery in my admin site and I would like to integrate chunking into ...
0 votes · 2 answers · 3k views
Chunk data from array of arrays
Before marking this as answered by another question, please note this is an array of arrays, not a flat array. Also, the numbers I have given are an example; I have just shown them so you can visually ...
0 votes · 1 answer · 527 views
Read in large text file (~20m rows), apply function to rows, write to new text file
I have a very large text file, and a function that does what I want it to do to each line. However, when reading line by line and applying the function, it takes roughly three hours. I'm wondering if ...
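If the three hours are dominated by I/O rather than the per-line function itself, buffering output lines and writing them in batches is a cheap first improvement; a stdlib sketch (the helper name and batch size are arbitrary):

```python
def transform_file(src: str, dst: str, fn, batch_size: int = 100_000) -> int:
    """Apply fn to every line of src, writing results to dst in batches."""
    written = 0
    batch = []
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            batch.append(fn(line.rstrip("\n")) + "\n")
            if len(batch) >= batch_size:
                fout.writelines(batch)  # one big write per batch
                written += len(batch)
                batch.clear()
        fout.writelines(batch)          # flush the final partial batch
        written += len(batch)
    return written
```

If that is still too slow, `multiprocessing.Pool.imap` over batches of lines parallelizes the per-line function while keeping output order.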
2 votes · 0 answers · 1k views
Reading Data in Chunks Using SQLAlchemy
I have a block of code that looks like this,
entities = self.session.query(Entities).filter(Entities.parent_id == 0)
index_data = {}
for entity in entities:
data = entity.__dict__
data['...
1 vote · 1 answer · 2k views
How does loading HTML5 video chunks actually work?
Basically my understanding is this: whenever a video player is playing media, it is downloading it in chunks, defined by the RANGE header. The server serves only the bytes requested from the file.
...
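That understanding is essentially right for plain progressive download: the browser issues `Range: bytes=start-end` requests and the server answers `206 Partial Content` with just that byte slice. A stdlib sketch of the server-side header math (adaptive-streaming protocols like HLS/DASH work differently, serving pre-cut segment files instead):

```python
import re

def parse_range(header: str, file_size: int) -> tuple[int, int]:
    """Turn a 'bytes=...' Range header into inclusive (start, end) offsets."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m or not (m.group(1) or m.group(2)):
        raise ValueError("unsupported Range header")
    if m.group(1):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else file_size - 1
    else:                                  # suffix form: the last N bytes
        start = file_size - int(m.group(2))
        end = file_size - 1
    return start, min(end, file_size - 1)

# The 206 response then carries:
#   Content-Range: bytes {start}-{end}/{file_size}
#   Content-Length: end - start + 1
```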
0 votes · 0 answers · 357 views
Chunking >100mb file on Flask
I would like to upload files between 100-200mb using Flask on PythonAnywhere (which has a 100mb upload limit). I am trying to implement chunking but am still getting a 413 Request Entity Too Large ...