I'm building an app to serve data from a PostgreSQL database via a REST API (with Spring MVC) and a PWA (with Vaadin).
The PostgreSQL database stores files up to 2GB using Large Objects (I'm not in control of that); the JDBC driver provides streamed access to their binary content via Blob#getBinaryStream, so data does not need to be read entirely into memory.
The only constraint is that the stream obtained from the blob must be consumed within the same transaction; otherwise, the JDBC driver throws an exception.
The problem is that even if I retrieve the stream in a transactional repository method, both Spring MVC and Vaadin's StreamResource consume it only after that transaction has ended, so the JDBC driver throws.
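For reference, the entity looks roughly like this (a simplified sketch, assuming a jakarta.persistence mapping with a java.sql.Blob field; the real entity may differ):

import java.sql.Blob;

import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Lob;

// sketch only: simplified stand-in for the real entity
@Entity
public class SomeEntity {

    @Id
    private Long id;

    // backed by a PostgreSQL Large Object (an oid column); java.sql.Blob
    // exposes the content as a stream instead of materializing a byte[]
    @Lob
    private Blob blob;

    public Blob getBlob() {
        return blob;
    }
}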
For example, given
public interface SomeRepository extends JpaRepository<SomeEntity, Long> {

    @Transactional(readOnly = true)
    default InputStream getStream() {
        try {
            // findById returns an Optional, and getBinaryStream declares SQLException
            return findById(1L).orElseThrow().getBlob().getBinaryStream();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}
this Spring MVC method will fail
@RestController
public class SomeController {

    private final SomeRepository repository;

    public SomeController(SomeRepository repository) {
        this.repository = repository;
    }

    @GetMapping
    public ResponseEntity<InputStreamResource> getStream() {
        // the stream is only consumed by Spring MVC after this method,
        // and with it the repository transaction, has returned
        var stream = repository.getStream();
        var resource = new InputStreamResource(stream);
        return new ResponseEntity<>(resource, HttpStatus.OK);
    }
}
and so will this Vaadin StreamResource
public class SomeView extends VerticalLayout {

    public SomeView(SomeRepository repository) {
        // the factory is only invoked when the user clicks the link,
        // long after any repository transaction has ended
        var resource = new StreamResource("x", repository::getStream);
        var anchor = new Anchor(resource, "Download");
        add(anchor);
    }
}
with the same exception:
org.postgresql.util.PSQLException: ERROR: invalid large-object descriptor: 0
which indicates that the transaction has already been closed by the time the stream is read.
I see two possible solutions to this:
- keep the transaction open during the download;
- write the stream to disk during the transaction and then serve the file from disk during the download.
Solution 1 is an anti-pattern and a security risk: the transaction duration is left in the hands of the client, and a slow reader or an attacker could keep the transaction, and with it a database connection, open indefinitely, blocking data access.
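To make that concrete, solution 1 would amount to something like the following sketch, with the transaction stretched around the whole HTTP response (BlockingController is a hypothetical name, and I'm assuming Spring Boot 3 / Jakarta Servlet):

import java.io.IOException;

import jakarta.servlet.http.HttpServletResponse;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// sketch only: illustrates the anti-pattern, not a proposed design
@RestController
public class BlockingController {

    private final SomeRepository repository;

    public BlockingController(SomeRepository repository) {
        this.repository = repository;
    }

    // the transaction now spans the whole download: the database connection
    // stays checked out until the client has read the last byte
    @GetMapping("/blocking")
    @Transactional(readOnly = true)
    public void getStream(HttpServletResponse response) throws IOException {
        repository.getStream().transferTo(response.getOutputStream());
    }
}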
Solution 2 introduces a significant delay between the client request and the first byte of the response, since the entire stream must be read from the database and written to disk before the download can start.
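What I mean by solution 2 is roughly the following (a sketch; CopyService is a hypothetical helper, not existing code):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// sketch only: spools the blob to a file inside one short transaction
@Service
public class CopyService {

    private final SomeRepository repository;

    public CopyService(SomeRepository repository) {
        this.repository = repository;
    }

    // the whole copy runs in one short-lived, read-only transaction whose
    // duration depends only on the database, not on the client
    @Transactional(readOnly = true)
    public void copyTo(Path target) throws IOException {
        try (InputStream in = repository.getStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}

The controller (or the Vaadin StreamResource) would create a temporary file, call copyTo, and only then start serving the file, which is exactly where the delay comes from.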
One idea might be to start reading the file from disk while it is still being written with data from the database: the transfer would start almost immediately, yet the transaction duration would be decoupled from the client download. However, I don't know which side effects this might have.
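A sketch of what I have in mind, reusing the hypothetical CopyService from above (deliberately simplified: a failed writer is not propagated, the polling is crude, and cleanup is best-effort):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.springframework.stereotype.Service;

// sketch only: serves a file that is still being written
@Service
public class TailingDownloadService {

    private final CopyService copyService;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public TailingDownloadService(CopyService copyService) {
        this.copyService = copyService;
    }

    public InputStream download() throws IOException {
        Path file = Files.createTempFile("blob-", ".bin");
        // the copy runs on another thread and therefore in its own
        // transaction, decoupled from the client's reading speed
        Future<?> writer = executor.submit(() -> {
            copyService.copyTo(file);
            return null;
        });
        InputStream raw = Files.newInputStream(file);
        // "tail" the growing file: EOF is only final once the writer is done
        return new InputStream() {
            @Override
            public int read() throws IOException {
                int b = raw.read();
                while (b == -1 && !writer.isDone()) {
                    try {
                        Thread.sleep(50); // crude polling, for the sketch only
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IOException(e);
                    }
                    b = raw.read();
                }
                // re-read once: the writer may have finished between the
                // last read and the isDone() check
                return b == -1 ? raw.read() : b;
            }

            @Override
            public void close() throws IOException {
                raw.close();
                Files.deleteIfExists(file); // best-effort cleanup
            }
        };
    }
}

This starts the response immediately and keeps the transaction only as long as the database needs, but the simplifications above (a failed writer silently truncating the download, the polling, temp-file cleanup) are exactly the side effects I can't assess.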
How can I achieve the goal of serving PostgreSQL large objects in a secure and performant way?