382 questions
1 vote · 0 answers · 107 views
Closing a channel from inside an FnMut callback
I'm trying to fill a certain number of samples from a cpal microphone source (about 10 seconds worth). I want to process those samples as they come in with low latency and in regularly sized blocks of ...
1 vote · 0 answers · 62 views
Scaling RAG QA with Large Docs, Tables, and 30K+ Chunks (No LangChain)
I'm building a RAG-based document QA system using Python (no LangChain), LLaMA (50K context), PostgreSQL with pgvector, and Docling for parsing. Users can upload up to 10 large documents (300+ pages ...
3 votes · 1 answer · 153 views
Reading through multiple files in chunks in R
I'm trying to read through multiple compressed tables that are 5GB+ in size in R, and because I have insufficient memory to read them into memory all at once I need to process them one chunk at a time,...
0 votes · 0 answers · 39 views
How to implement StreamResponse in ASP.NET
This is an ASP.NET API function that processes real-time chatbot responses from a local Qwen2.5 setup served with FastAPI.
This is the current code, but StreamResponse doesn't work correctly.
How can I optimize this code?
[Authorize]
...
0 votes · 0 answers · 93 views
Perlin noise procedural generation - issue with generating chunks within Unity
I am trying to develop a map generation system in Unity. I want to have a system for chunks and have simplified the problem down to a small amount of code, which is not attempting a great deal. I just ...
0 votes · 0 answers · 53 views
Finding a way to iterate using the input of two xarray dataarrays when chunked
I am developing a relatively large model using xarray and therefore want to make use of chunks. Most of my operations run a lot faster when chunked but there is one that keeps running (a lot) slower ...
0 votes · 1 answer · 170 views
Remote Partitioning in Spring Batch - Job completion for small number of records can't be done until Job completes for large number of records
I am trying Spring Batch with remote partitioning [master-slave approach].
I have one master step which sends records to worker nodes via Kafka.
All was working fine until parallel job executions ...
0 votes · 1 answer · 287 views
How to set a proper chunk size in HDF5
According to this answer, a proper chunk size is important for optimizing I/O performance.
I have 3000 JPG images, whose sizes vary from 180 kB to 220 kB. I am going to save them as bytes.
I know 2 methods ...
0 votes · 2 answers · 375 views
RAG with LlamaIndex SubDocument, how to persist embeddings
I'm doing a RAG model with some documents.
Testing LlamaIndex's SubDocSummaryPack, which seems to be a good choice for document chunking instead of simply chunking the original information.
After using ...
2 votes · 0 answers · 183 views
System.Text.Json.JsonSerializer DeserializeAsyncEnumerable works slowly with large json files (10+mb)
I'm using .NET 8, System.Text.Json 8 HttpClient to make request, HttpConnection.ChunkedEncodingReadStream as input stream from the response, and JsonSerializer.DeserializeAsyncEnumerable to ...
1 vote · 0 answers · 465 views
Is AI21SemanticTextSplitter (from langchain_ai21) Deprecated?
Has anyone tried using Langchain's AI21 integration AI21SemanticTextSplitter?
There is a mention of it on Langchain's Text Splitters Page.
This is its documentation.
I tried the examples given there ...
-1 votes · 1 answer · 2k views
Semantic Chunking with Langchain on FAISS vectorstore
I have this Langchain code for my own dataset:
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
docs, ...
1 vote · 2 answers · 7k views
How to select chunk size of data for embedding with an LLM?
I have structured data (CSV) that has a column of semantically rich text of variable length. I could mine the data so the CSV file has a max length per row of data by using an LLM to summarize ...
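One baseline worth measuring against is a plain character-window splitter with overlap; the function below is a minimal stdlib sketch (the name `chunk_text` and the default sizes are made up for illustration), which a token-aware or semantic splitter can then be compared against.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, each overlapping the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already covers the tail
    return chunks
```

Sweeping `chunk_size` over a small grid and scoring retrieval quality on a held-out set of questions is usually more informative than any fixed rule of thumb.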
0 votes · 1 answer · 420 views
Explain Dask-cuDF behavior
I'm trying to read and process an 8 GB CSV file using cuDF. Reading the whole file at once fits neither into GPU memory nor into my RAM. That's why I use the dask_cudf library. Here is the code:
import ...
0 votes · 0 answers · 123 views
How to handle Spring Batch remote chunking in a Kubernetes pod?
I am trying to implement Spring Batch remote chunking for a heavy file.
What would be the ideal physical deployment setup for remote chunking in Kubernetes?
1) Can we set up worker and manager in the same ...
0 votes · 1 answer · 2k views
LlamaIndex small-to-big chunking strategy in RAG pipeline, limits the chunk size a lot
I am working on a RAG system using LlamaIndex. I'm trying to adopt the small-to-big chunking strategy for the retrieval stage. I have numerous articles as inputs and some metadata about them. Here is the list of ...
1 vote · 0 answers · 393 views
Avoid gRPC message exceeds maximum size
I'm using gRPC to send messages from Java to a Python service. Recently I started getting larger messages that sometimes exceed the maximum message size for gRPC. Increasing that size is not possible ...
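When the limit can't be raised, the usual workaround is to split the payload into sub-messages below the limit and send them over a stream, reassembling on the other side. A stdlib sketch of just the byte-splitting step (the 3 MB default is an arbitrary value below gRPC's 4 MB default limit):

```python
def iter_chunks(payload: bytes, max_chunk: int = 3 * 1024 * 1024):
    """Yield successive slices of payload, each at most max_chunk bytes."""
    for offset in range(0, len(payload), max_chunk):
        yield payload[offset:offset + max_chunk]

# Receiver side: reassemble with b"".join(...) once the stream is exhausted.
```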
-1 votes · 1 answer · 307 views
I want to send a "mixed", chunked response of JSON and a string with Express
I'm working on a ChatGPT integration in Node/Express and would like to first respond to my client with some metadata in JSON, then start streaming ChatGPT's response as it streams in.
Currently, there ...
0 votes · 0 answers · 322 views
Context based chunks using Adobe PDF Extract Python
So, recently I came across the Adobe PDF Extract API. I'm using Python, and for those who aren't aware of Adobe's extraction methods: given a PDF, the API returns the extracted text with each ...
0 votes · 2 answers · 2k views
How to perform document retrieval in large database to augment prompt of LLM?
I have a large database of documents (these “documents” are essentially web pages and they are all in HTML). They have information regarding the business itself and can contain a lot of similar ...
0 votes · 1 answer · 245 views
Merging two chunked dataframes
I'm working with large datasets in Python via pandas and initially chunked the two datasets so they could load into memory, but I'm not sure how to merge them given they are turned into TextFileReader objects instead ...
0 votes · 1 answer · 499 views
DOMException Error when splitting large file blob into chunks
Can someone please help me debug this function?
const chunkify = async (file: Blob) => {
const totalSize = file.size;
const chunkSize = 1024 * 1024 * 100; // 100MB
const chunks = [] as ...
2 votes · 1 answer · 375 views
Reading large csv file in chunks with pandas
I am trying to read a large CSV file (84 GB) in chunks with pandas, filter out the necessary rows, and convert them to a DataFrame:
import pandas as pd
chunk_size = 1000000 # Number of rows to read per chunk
my_df = pd....
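The pattern generally looks like the sketch below: filter each chunk as it arrives and concatenate only the survivors, so peak memory stays near one chunk plus the filtered rows (the column names and threshold are invented, and an in-memory CSV stands in for the 84 GB file):

```python
import io

import pandas as pd

csv_file = io.StringIO("id,value\n1,10\n2,200\n3,30\n4,400\n")  # stand-in for the real file

kept = []
for chunk in pd.read_csv(csv_file, chunksize=2):  # 2 rows per chunk for the demo
    filtered = chunk[chunk["value"] > 100]        # keep only the rows you need
    if not filtered.empty:
        kept.append(filtered)

my_df = pd.concat(kept, ignore_index=True)
```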
3 votes · 1 answer · 2k views
SQLAlchemy isn't batching rows / using server side cursor via `yield_per`
Following documentation, and the code snippet provided from https://docs.sqlalchemy.org/en/14/core/connections.html#streaming-with-a-fixed-buffer-via-yield-per (posted directly below), my query is not ...
0 votes · 1 answer · 212 views
Uploaded video file with JS chunked upload and PHP not working for files above 10 MB
I tried writing chunked-upload code with JS and PHP. It works fine when you upload a video file smaller than 10 MB, but when uploading a 17.6 MB video file, the file uploads but it maintains its appropriate ...
0 votes · 1 answer · 2k views
Download a large file >100MB from a gRPC server in chunks using gRPC streaming? How can I track which chunk to send next on the server?
I am trying to download a large file >100MB from a gRPC server using gRPC bidirectional streaming. I need to break the file into chunks on the server and stream the bytes. I am not sure how to ...
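One way to avoid tracking state on the server at all is to make the chunk index part of each request: the server seeks to `index * chunk_size` and reads one chunk, so any chunk can be re-requested after a failure. A stdlib sketch of that server-side read (the helper name is hypothetical):

```python
CHUNK_SIZE = 64 * 1024  # bytes per streamed message; tune to stay under message limits

def read_chunk(path: str, index: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return the index-th chunk of the file; empty bytes signal end of file."""
    with open(path, "rb") as f:
        f.seek(index * chunk_size)
        return f.read(chunk_size)
```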
1 vote · 0 answers · 186 views
efficient partitioning column-wise when converting from dask dataframe to xarray (dask array)
A common task in my daily data wrangling is converting tab-delimited text files to xarray datasets and continuing analysis on the dataset and saving to zarr or netCDF format.
I have developed a data ...
0 votes · 1 answer · 405 views
Spring Batch Remote Chunking Chunk Response
I have implemented Spring Batch remote chunking with Kafka, with both manager and worker configurations. I want to send some DTO or object in the ChunkResponse from the worker side to the manager and ...
0 votes · 0 answers · 252 views
Send arrays of objects using express?
Can't figure out how to send arrays of objects.
Currently I have a back-end MongoDB database with multiple collections which I have to query for matching data and send the data back to the client as ...
1 vote · 1 answer · 2k views
How to optimize webpack chunking with vue.config.js to speed up GitLab build time
My original configuration in vue.config.js uses the default chunking strategy, which takes about 5 minutes to build locally and 35 minutes in the GitLab pipeline, and results in one chunk being > 50 MB ...
0 votes · 1 answer · 350 views
Chunking - regular expressions and trees
I'm a total noob so sorry if I'm asking something obvious. My question is twofold, or rather it's two questions in the same topic:
I'm studying nltk in Uni, and we're doing chunks. In the grammar I ...
6 votes · 0 answers · 538 views
Laravel 8 chunk memory leak
I have an artisan command that is scheduled to run every 15 minutes to update a model's calculated field. The command processes thousands of rows and ends up eating all of the memory.
My understanding of ...
1 vote · 0 answers · 896 views
How to use chunking to reduce file size
I'm trying to read a ~2.3 GB file that exceeds the RAM of my hardware. I want to apply chunking to read this file.
I tried chunksize of 10**2, 10**3, 10**4, and 10**5 but all still exceed the RAM ...
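Shrinking `chunksize` alone won't help if every chunk is kept (for example, appended into one big DataFrame); each chunk has to be reduced to a small result and then discarded before the next is read. A stdlib sketch of that reduce-per-chunk shape, here counting lines in bounded memory:

```python
def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newlines while holding at most chunk_size bytes in memory."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")  # reduce the chunk, then drop it
    return total
```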
3 votes · 1 answer · 3k views
What is the difference between chunk and pagination in Laravel?
I already read some articles but I am still confused.
In pagination it will execute a query when loading a page, but what happens with chunk?
I read https://laravel-school.com/posts/laravel-pagination-vs-...
3 votes · 1 answer · 3k views
tqdm progress for processing sequence in chunks
I am processing a sequence in chunks, where the last chunk may be shorter, and would like to show a progress bar counting the number of items. The straightforward approach is
import tqdm, math
total=567
...
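A simpler route than pre-computing a chunk count is to keep the bar in units of items and advance it by each chunk's actual length, so the short last chunk is handled automatically (sizes below match the question's `total=567`):

```python
import math

from tqdm import tqdm

total = 567
chunk_size = 100
data = range(total)  # stand-in for the real sequence

# The bar counts items, not chunks: update by the chunk's true length,
# so the final short chunk advances it by only 67.
with tqdm(total=total) as bar:
    for start in range(0, total, chunk_size):
        chunk = data[start:start + chunk_size]
        # ... process chunk here ...
        bar.update(len(chunk))

n_chunks = math.ceil(total / chunk_size)  # 6 chunks; the last holds 67 items
```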
1 vote · 0 answers · 204 views
How to disable chunking in webpack 4
I am running a React project developed using CRA on localhost; in the network tab it shows chunk.js files, as shown in the image.
I want it to show the original file names, i.e. which component file is rendering, so is ...
1 vote · 2 answers · 2k views
How to overwrite file with sequential chunks in Golang
How do I read a large file by chunking it, process each chunk sequentially, then overwrite the resulting chunk exactly where it came from (the same position or offset in the file)?
E.g.: I want to read 1 ...
0 votes · 1 answer · 67 views
Pandas: compute daily statistics when chunking
Consider a postgres table where for the date 2022-05-01 we have 200 values for various times:
time value ...
1 vote · 0 answers · 145 views
Generate .rds files below 100MB to avoid Git LFS
I have a bunch of fit.objects which produce *.rds files larger than 100 MB. GitHub comes with a file size limitation of 100 MB, and the Git LFS solution doesn't suit me since I don't want to pay for ...
0 votes · 1 answer · 315 views
Partition a large list into chunks with convenient I/O
I have a large list with size of approx. 1.3GB. I'm looking for the fastest solution in R to generate chunks and save them in any convenient format so that :
a) every saved file of the chunk is less ...
-1 votes · 1 answer · 356 views
How can I insert 1,000,000 rows from a textarea into the database in Laravel?
How can I insert 1,000,000 rows from a textarea into the database in Laravel 8?
I wrote this code and can only insert 30,000 rows before the browser gives me HTTP ERROR 500.
I set max_execution_time to 300 in ...
2 votes · 0 answers · 2k views
xarray chunk dataset PerformanceWarning: Slicing with an out-of-order index is generating more chunks
I am trying to run a simple calculation based on two big gridded datasets in xarray (around 5 GB altogether, daily data from 1850-2100). I keep running out of memory when I try it this way
import ...
0 votes · 0 answers · 261 views
Process Large Data in Chunks
I have a large program where I am trying to read approximately 30,000 lines of data and process them. I know that I can use the chunksize functionality to do this, but I think I am not executing this ...
0 votes · 1 answer · 259 views
Loading small amounts of data from Lumen API time
I have an API built in Lumen and I plan to consume the API's JSON response in the frontend using a single-page framework like Angular.
The problem is that the response from some routes contains a huge amount of ...
3 votes · 0 answers · 559 views
Chunking a django-import-export
I am reading this article about chunking a large database operation. I am also using django-import-export and django-import-export-celery in my admin site and I would like to integrate chunking into ...
0 votes · 2 answers · 3k views
Chunk data from array of arrays
Before marking this as answered by another question, please note this is an array of arrays, not a flat array. Also, the numbers I have given are an example; I have just shown them so you can visually ...
0 votes · 1 answer · 527 views
Read in large text file (~20m rows), apply function to rows, write to new text file
I have a very large text file, and a function that does what I want it to do to each line. However, when reading line by line and applying the function, it takes roughly three hours. I'm wondering if ...
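If the three hours are dominated by I/O rather than the per-line function itself, buffering output lines and writing them in batches is a cheap first improvement; a stdlib sketch (the helper name and batch size are arbitrary):

```python
def transform_file(src: str, dst: str, fn, batch_size: int = 100_000) -> int:
    """Apply fn to every line of src, writing results to dst in batches."""
    written = 0
    batch = []
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            batch.append(fn(line.rstrip("\n")) + "\n")
            if len(batch) >= batch_size:
                fout.writelines(batch)  # one big write per batch
                written += len(batch)
                batch.clear()
        fout.writelines(batch)          # flush the final partial batch
        written += len(batch)
    return written
```

If that is still too slow, `multiprocessing.Pool.imap` over batches of lines parallelizes the per-line function while keeping output order.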
2 votes · 0 answers · 1k views
Reading Data in Chunks Using SQLAlchemy
I have a block of code that looks like this,
entities = self.session.query(Entities).filter(Entities.parent_id == 0)
index_data = {}
for entity in entities:
data = entity.__dict__
data['...
1 vote · 1 answer · 2k views
How does loading HTML5 video chunks actually work?
Basically my understanding is this: whenever a video player is playing media, it is downloading it in chunks, defined by the RANGE header. The server serves only the bytes requested from the file.
...
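That understanding is essentially right for plain progressive download: the browser issues `Range: bytes=start-end` requests and the server answers `206 Partial Content` with just that byte slice. A stdlib sketch of the server-side header math (adaptive-streaming protocols like HLS/DASH work differently, serving pre-cut segment files instead):

```python
import re

def parse_range(header: str, file_size: int) -> tuple[int, int]:
    """Turn a 'bytes=...' Range header into inclusive (start, end) offsets."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m or not (m.group(1) or m.group(2)):
        raise ValueError("unsupported Range header")
    if m.group(1):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else file_size - 1
    else:                                  # suffix form: the last N bytes
        start = file_size - int(m.group(2))
        end = file_size - 1
    return start, min(end, file_size - 1)

# The 206 response then carries:
#   Content-Range: bytes {start}-{end}/{file_size}
#   Content-Length: end - start + 1
```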
0 votes · 0 answers · 357 views
Chunking >100mb file on Flask
I would like to upload files between 100-200mb using Flask on PythonAnywhere (which has a 100mb upload limit). I am trying to implement chunking but am still getting a 413 Request Entity Too Large ...