Best practices
1 vote
1 reply
85 views

I am trying to build an OCR feature in my Flutter app that can read hotel bills in multiple formats. The challenge is that these bills do not follow a fixed layout. From each bill, I need to extract ...
Manish sahu's user avatar
Advice
0 votes
1 reply
31 views

I'm extracting tables from financial statement PDFs (like 10-Ks). Textract has a feature that allows extracting them in Markdown format. However, some of the tables don't have headers. For example:...
Jason Pereira's user avatar
0 votes
0 answers
171 views

I'm working on extracting bank transaction data from a PDF shown below using Python. Each transaction includes two dates, amounts (Money In, Money Out, Balance), and a description. The challenge is ...
Sand's user avatar
  • 1
1 vote
0 answers
93 views

An ASP.NET Core 9 MVC / C# controller extracts text from a PDF using PdfPig, based on the code in the answer to "How to group text to lines if there is small difference in Y position". If thousands are separated by ...
Andrus's user avatar
  • 28.3k
0 votes
0 answers
62 views

I'm trying to import users' daily duties, which are in PDF format. The PDF contains no more than 20 dates and airport/time combinations, which is what I need to capture from the document. I have the text ...
Paul Wilson's user avatar
0 votes
0 answers
45 views

I'm using PyMuPDF (fitz) to search for and highlight text in a PDF. However, the PDF text contains various control characters between sentences, which makes it difficult to match multi-sentence ...
Shantanu's user avatar
0 votes
0 answers
109 views

I am trying to extract the table text of a PDF. With the following code I get: page 0 of page-1-ocr.pdf Tables rowsasf 49 texysdft [['', '', 'Staatlic', 'he Fische', 'rprüfung', 'in Bayern - Prü', '...
Marc's user avatar
  • 4,049
0 votes
0 answers
55 views

I'm building a RAG system for a platform where the primary content consists of videos and slides. My approach involves extracting keyframes from videos using OpenCV diff = cv2.absdiff(prev_image, ...
Daniel's user avatar
  • 13
1 vote
0 answers
63 views

I have 25–30 different types of PDF documents, each containing tables with varying structures. My ultimate goal is to extract table data from specific headings (i.e., between certain titles) and ...
Requiet's user avatar
  • 85
0 votes
0 answers
110 views

I am working on an email extraction process using Java, Spring Boot, and IMAP to read emails from Gmail. The process works fine for most emails, extracting only the text content. However, one specific ...
Irfan Abdul Salam's user avatar
0 votes
1 answer
70 views

Without getting into too much detail about how we ended up in this situation (a lot of poor business decisions), I need to find the text: "SomeID=[Integer]" from a PDF file (e.g. SomeID=...
user3121062's user avatar
0 votes
4 answers
300 views

I need to extract the video ID and the start time from any kind of YouTube URL that users can input. I have a working solution, but it is not right. Questions: Could someone help me to fix the ...
Zoltán Süle's user avatar
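A minimal Python sketch of one way to approach the question above, assuming the standard library's `urllib.parse` is acceptable. The question's own code and language are not visible in the excerpt, so `extract_video` and `parse_start` are hypothetical names. The set of handled URL shapes (`watch?v=`, `youtu.be/`, `/embed/`, `/shorts/`) and time formats (`90`, `90s`, `1m30s`) is also an assumption, not an exhaustive list.

```python
import re
from urllib.parse import urlparse, parse_qs


def parse_start(value):
    """Convert '90', '90s' or '1h2m3s' into seconds; return None if unparseable."""
    if value.isdigit():
        return int(value)
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def extract_video(url):
    """Return (video_id, start_seconds); either element is None when not found."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    video_id = None
    if host == "youtu.be":
        # Short links carry the ID as the path.
        video_id = parts.path.lstrip("/") or None
    elif host in ("youtube.com", "m.youtube.com"):
        if parts.path == "/watch":
            video_id = query.get("v", [None])[0]
        elif parts.path.startswith(("/embed/", "/shorts/")):
            video_id = parts.path.split("/")[2] or None

    # The start time may appear as 't' (watch/short links) or 'start' (embeds).
    raw = (query.get("t") or query.get("start") or [""])[0]
    start = parse_start(raw) if raw else None
    return video_id, start
```

Parsing the URL into host, path, and query first avoids a single large regular expression, which tends to break when parameters appear in a different order.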
0 votes
1 answer
110 views

I have a SQLite DB that contains fullnames (i.e., parentpath\filename), e.g. C:\Users\Public\My Music\Classic Queen\16 - Who Wants to Live Forever.mp3. I want to query and get the filename separate from ...
Jason Blue's user avatar
0 votes
0 answers
124 views

I have created an API that calls a function whose task is to extract web content from a URL passed as a parameter. I am facing a problem when my API gets multiple requests, the ...
Irfanali Shaikh's user avatar
0 votes
2 answers
84 views

I need to extract Wööörd_03 from this string: "https://Word01.com/Word_02/Wööörd_03/Word_04/Word_05=0" My code doesn't work, because I get different results: Sub ExtractWord() Dim sString As ...
Jasco's user avatar
  • 253
0 votes
2 answers
184 views

I have a cell (D2) in Google Sheets containing a title, and I want to extract everything up to the first punctuation mark (if any) and display it in another cell. For example, if D2 contains: &...
Emanuele Benatti's user avatar
0 votes
1 answer
87 views

I have a column in my table in string format that contains different types of discounts: integers, decimal numbers, and compound discounts, i.e. whole numbers interspersed with the + symbol (e.g. 10+3, 5+3+...
Matilde's user avatar
  • 53
1 vote
1 answer
476 views

I am working with a simple custom extractor in Document AI, which tries to find the following fields in any pdf uploaded: Country Nombre Adress Country Mail Adress City And i am using the following ...
Javier Romero Garcia's user avatar
0 votes
1 answer
428 views

I have built a composed model in Document Intelligence Studio (formerly known as Form Recognizer). It is built to extract different fields from different types of documents with different patterns. ...
Aakash Nakarmi's user avatar
1 vote
2 answers
106 views

I have multiple lines like this one where I need to extract the value associated with the utm_campaign field. As you can see, the value comprises digits, letters, and other characters (e.g. "-") https:...
Alan Benlolo's user avatar
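A short Python sketch for the question above, assuming each line is a well-formed URL and that the standard library is available. The excerpt does not show the asker's language or the full URL, so the function name `utm_campaign` and the example URL in the usage note are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def utm_campaign(url):
    """Return the first utm_campaign value in the URL's query string, or None."""
    values = parse_qs(urlparse(url).query).get("utm_campaign")
    return values[0] if values else None
```

For a hypothetical input such as `https://example.com/landing?utm_source=news&utm_campaign=spring-sale_2024`, the function returns `spring-sale_2024`. Because the query string is parsed rather than pattern-matched, digits, letters, and characters such as "-" in the value need no special handling.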
0 votes
1 answer
73 views

I'm trying to use Selenium to scroll to a specific section on a webpage and retrieve the text from that section. Context: I’m working with a webpage that disables text highlighting through CSS ...
poe trenton's user avatar
0 votes
1 answer
331 views

Column A        Column B        Column C
Iam18yearsold   Iam17yearsold   7
thereisagirl    therearegirls   are,s
I need to compare two cells and then extract only the difference into the third cell. I want to have the result ...
jajangjaras's user avatar
0 votes
1 answer
86 views

I have this text in Excel: Wed Aug 04 00:00:00 WIB 2021, and I need to extract the date into the cell beside it as 04-Aug-21, which I find complicated. Can anyone help? I can already ...
Toya Tanaj's user avatar
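The question above is about Excel. The following is only a Python sketch of the same transformation, not an Excel formula, and it assumes the text always has six space-separated fields in the order shown (weekday, month, day, time, zone, year). The function name `reformat_date` is hypothetical. The `WIB` zone token is ignored rather than parsed, and month abbreviations are assumed to be English.

```python
from datetime import datetime


def reformat_date(text):
    """Turn 'Wed Aug 04 00:00:00 WIB 2021' into '04-Aug-21'."""
    tokens = text.split()
    if len(tokens) != 6:
        raise ValueError(f"unexpected format: {text!r}")
    # Only month, day, and year are needed; weekday, time, and zone are dropped.
    _, month, day, _, _, year = tokens
    parsed = datetime.strptime(f"{day} {month} {year}", "%d %b %Y")
    return parsed.strftime("%d-%b-%y")
```

Going through `datetime` rather than slicing the string means an invalid day or month raises an error instead of producing a wrong date silently.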
-1 votes
1 answer
148 views

I am trying to extract most of the information found on a government website (CFIA-CFIT Part I and Part II) and create a table in excel. This table is to have three columns; ID, Name, and Detail. The ...
Feketenyek's user avatar
0 votes
1 answer
122 views

I am trying to extract text from docx files, but the extracted text comes out in the wrong order: text at the bottom of the document or in a random text box is extracted first, and then the texts from ...
vignesh's user avatar
1 vote
0 answers
56 views

I am trying to extract text from a CV in PDF format. I came up with this script, but I have a problem. The script does not extract all the text, and I have trouble identifying the different blocks of the ...
emma's user avatar
  • 363
0 votes
0 answers
202 views

I'm working on a Python project where I need to extract text from DOCX files, preserving the formatted numbering. I've encountered a peculiar issue that I'm hoping someone can help me solve. The ...
Anshuman Sharma's user avatar
0 votes
0 answers
157 views

I need some guidance on extracting large compliance items from raw PDF documents. I have a CSV with these compliance items and I want to fine-tune an LLM such that if it reads any new PDF documents it can ...
Daremitsu's user avatar
  • 655
3 votes
1 answer
363 views

I am trying to parse a series of mathematical formulas and need to extract variable names efficiently using Polars in Python. Regex support in Polars seems to be limited, particularly with look-around ...
Olibarer's user avatar
  • 423
-1 votes
1 answer
123 views

I have been trying to extract text from PDF files to automate a significant and tedious part of my job using Python. With the help of ChatGPT, I have written multiple lines of code. However, I am ...
MDMT's user avatar
  • 1
1 vote
1 answer
383 views

I'm trying to detect text from items, which may be rotated in various directions. I've tried using Tesseract, EasyOCR, and EAST for text detection and extraction, but I am encountering issues with ...
Agura's user avatar
  • 11
1 vote
0 answers
108 views

I have three credentials from AWS: host, acckey, and secretkey. I am using the AWS Signature Version 4 method, and I want to use the Textract feature from AWS with Golang. I have built the code and have a ...
Hafi Ihza Farhana's user avatar
0 votes
2 answers
93 views

I have a pandas dataframe with only one column containing symbols. I need to separate those symbols in groups of 13 and 39 inside a single string. symbol 3IINFOTECH 3MINDIA 3PLAND 20MICRONS 3RDROCK ...
Hamza Ahmed's user avatar
  • 1,841
1 vote
0 answers
130 views

def get_string(img_path):
    img = cv2.imread(img_path)
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ...
Myat Thet's user avatar
2 votes
0 answers
236 views

I'm brand new to Rust and am trying to read a PDF file with lopdf. I've tried various examples, but I am just getting characters. I need all the characters, such as spaces, tabs, and line breaks, for regex matching. Is ...
diogenes's user avatar
  • 2,191
0 votes
0 answers
93 views

Trying to use iText7 to extract text from a PDF file outputs only question marks: ???????? ?????????? ? ???????????????????????? ??????????????????? ???????? ???????????????????????????? ?????????????????...
Andrus's user avatar
  • 28.3k
0 votes
0 answers
138 views

Trying to use iText7 8.0.4 to extract text from a PDF file outputs only question marks: ???????? ?????????? ? ???????????????????????? ??????????????????? ???????? ???????????????????????????? ???????????...
Andrus's user avatar
  • 28.3k
0 votes
0 answers
58 views

How can I extract only the needed info from a web page using UiPath table extraction? When I try to select the specific info, other unwanted info is also selected because it follows a similar pattern to the ...
Dilip's user avatar
  • 1
0 votes
0 answers
128 views

I am stuck on an iText7 custom strategy. My goal is to extract data from a PDF to a text file without losing the table format. My PDF has a different table structure, some table columns are horizontal ...
Ibad Ur Rehman's user avatar
2 votes
2 answers
80 views

I have a data frame with a string column, and a list of words/phrases which I would like to extract from the column. I have used the following code. df <- data.frame(string = c("A rose is a ...
ayeh's user avatar
  • 68
0 votes
2 answers
54 views

I am trying to extract data from PDF documents in R using "str_extract_all" function. I am trying to look for a date time field, which is displayed in the document in the below format: Est ...
Ram Subramanian's user avatar
1 vote
4 answers
503 views

I have a column that has phone numbers. They are usually formatted in (555) 123-4567 but sometimes they are in a different format or they are not proper numbers. I am trying to convert this field to ...
Bijan's user avatar
  • 8,826
2 votes
1 answer
1k views

I am using AWS Textract in order to extract text and tables from a pdf document. I need code that can parse the text extracted, and tables extracted and print everything in one string in the order ...
diegofigueroa79's user avatar
0 votes
0 answers
144 views

I'm facing issues with misaligned text extraction from images. I suspect the problem lies in formatting rather than extraction. Can I utilize bounding box coordinates to improve text alignment? see ...
code_comm's user avatar
0 votes
4 answers
902 views

I have a list of address data as below. But they don't follow any pattern. Comma, dot or space was used to separate words. I applied the formula =TRIM(RIGHT(A1,FIND(" ",A1,FIND(" ",...
Ngan Huynh's user avatar
0 votes
1 answer
896 views

When using a Python library to extract text from a PDF, why doesn't the order of the selected text match what you see on screen? For instance, when I copy some text at the top of the page, then a ...
Phalgun's user avatar
0 votes
1 answer
78 views

This is some of my text: PalHebron Governorate, Palestine31°31′27″N 35°6′32″E / 31.52417°N 35.10889°E / / 31.52417; 35.10889 (Hebron/Al-Khalil Old Town) Cultural:(ii), (iv), (vi) 20.6 (51) 2017 2017– ...
Midas Estanislao's user avatar
0 votes
1 answer
39 views

I am facing the following case: I need to extract a very large tar.xc archive, which contains one .dat file close to 15 GB in size, and save the file to a folder. But if I use tarfile.open(path/to/archive)...
Amali Yarmukhametov's user avatar
1 vote
1 answer
70 views

I tried to extract the number from the attached image, but I am not getting the number 8 as output. I tried different PSM values as well, like 6, 10, etc. This is what I have so far: image = ...
Mukul Saini's user avatar
0 votes
0 answers
287 views

I'm using 'Fitz' library to extract text from a pdf file. Bounding boxes/rectangles will be drawn around tables from which text is supposed to be extracted. The current extraction is returning the ...
Apoorva's user avatar
  • 115