
I'm building an app to serve data from a PostgreSQL database via a REST API (with Spring MVC) and a PWA (with Vaadin).

The PostgreSQL database stores files up to 2GB using Large Objects (I'm not in control of that); the JDBC driver provides streamed access to their binary content via Blob#getBinaryStream, so data does not need to be read entirely into memory.

The only requirement is that the stream from the blob must be consumed in the same transaction, otherwise the JDBC driver will throw.

The problem is that even if I retrieve the stream in a transactional repository method, both Spring MVC and Vaadin's StreamResource will consume it outside the transaction, so the JDBC driver throws.

For example, given

public interface SomeRepository extends JpaRepository<SomeEntity, Long> {

    @Transactional(readOnly = true)
    default InputStream getStream() {
        try {
            return findById(1L).orElseThrow().getBlob().getBinaryStream();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}

this Spring MVC method will fail

@RestController
public class SomeController {

    private final SomeRepository repository;

    public SomeController(SomeRepository repository) {
        this.repository = repository;
    }

    @GetMapping
    public ResponseEntity<Resource> getStream() {
        var stream = repository.getStream();
        var resource = new InputStreamResource(stream);
        return ResponseEntity.ok(resource);
    }
}

and so will this Vaadin StreamResource

public class SomeView extends VerticalLayout {

    public SomeView(SomeRepository repository) {
        var resource = new StreamResource("x", repository::getStream);
        var anchor = new Anchor(resource, "Download");
        add(anchor);
    }
}

with the same exception:

org.postgresql.util.PSQLException: ERROR: invalid large-object descriptor: 0

which means the transaction is already closed when the stream is read.

I see two possible solutions to this:

  1. keep the transaction open during the download;
  2. write the stream to disk during the transaction and then serve the file from disk during the download.

Solution 1 is an anti-pattern and a security risk: the transaction duration is left in the hands of the client, and either a slow reader or an attacker could hold the transaction open and block data access.

Solution 2 creates a huge delay between the client request and the server response, since the whole stream must first be read from the database and written to disk before the download can begin.
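For illustration, a sketch of solution 2 (assuming a Spring service bean that wraps the repository; method and file names are made up) might look like this:

@Transactional(readOnly = true)
public Path copyBlobToTempFile() throws IOException {
    // copy the blob to a temporary file while the transaction is still open
    Path tempFile = Files.createTempFile("blob-", ".bin");
    try (InputStream in = repository.getStream()) {
        Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
    }
    // the transaction can now end; serve the file, e.g. with new FileSystemResource(tempFile.toFile())
    return tempFile;
}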

One idea might be to start reading from disk while the file is still being written with data from the database, so that the transfer starts immediately while the transaction duration stays decoupled from the client download; but I don't know what side effects this might have.

How can I achieve the goal of serving PostgreSQL large objects in a secure and performant way?

2 Answers


We solved this problem in Spring Content by using threads + piped streams and a special InputStream wrapper, ClosingInputStream, that delays closing the connection/transaction until the consumer closes the input stream. Maybe something like this would help you too?
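A minimal sketch of that piped-streams idea (this is not Spring Content's actual ClosingInputStream; the service name, executor handling and buffer size below are illustrative assumptions) could look like this:

import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.stereotype.Service;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

@Service
public class BlobStreamingService {

    private final SomeRepository repository;
    private final TransactionTemplate transactionTemplate;
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public BlobStreamingService(SomeRepository repository,
                                PlatformTransactionManager transactionManager) {
        this.repository = repository;
        this.transactionTemplate = new TransactionTemplate(transactionManager);
        this.transactionTemplate.setReadOnly(true);
    }

    public InputStream openStream() throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        // copy the blob into the pipe on a worker thread; the transaction is
        // opened and closed on that thread, not on the request thread
        executor.submit(() -> transactionTemplate.executeWithoutResult(status -> {
            try (out; InputStream blobStream = repository.getStream()) {
                blobStream.transferTo(out);
            } catch (IOException e) {
                status.setRollbackOnly();
            }
        }));
        return in; // hand this to InputStreamResource or Vaadin's StreamResource
    }
}

Note that with plain pipes the worker's transaction still stays open until the consumer has drained the pipe, so the two sides are only decoupled by the pipe's buffer; the ClosingInputStream approach additionally ties the cleanup to the moment the consumer closes the stream.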

Just as an FYI: we have found using Postgres's OIDs and the Large Object API to be extremely slow compared with similar databases.

Perhaps you could also retrofit Spring Content JPA to your solution and use its HTTP endpoints (and the solution I just outlined) instead of creating your own? Something like this:

pom.xml

   <!-- Java API -->
   <dependency>
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-jpa-boot-starter</artifactId>
      <version>0.4.0</version>
   </dependency>

   <!-- REST API -->
   <dependency>
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-rest-boot-starter</artifactId>
      <version>0.4.0</version>
   </dependency>

SomeEntity.java

@Entity
public class SomeEntity {
   @Id
   @GeneratedValue
   private long id;

   @ContentId
   private String contentId;

   @ContentLength
   private long contentLength = 0L;

   @MimeType
   private String mimeType = "text/plain";

   ...
}

SomeEntityContentStore.java

@StoreRestResource(path="someEntityContent")
public interface SomeEntityContentStore extends ContentStore<SomeEntity, String> {
}

That is all you need to get REST endpoints that allow you to associate content with your SomeEntity entity. There is a working example in our examples repo.

One option is to decouple reading from the database and writing the response to the client, as you mentioned. The downside is the complexity of the solution: you would need to synchronize the reader and the writer.

Another option is to first get the large object id in the main transaction and then read the data in chunks, each chunk in a separate transaction.

byte[] getBlobChunk(Connection connection, long lobId, long start, long chunkSize) throws SQLException, IOException {
    // PgBlob is the PostgreSQL driver's Blob implementation backed by a large object OID
    Blob blob = new PgBlob(connection.unwrap(BaseConnection.class), lobId);
    InputStream is = blob.getBinaryStream(start, chunkSize);
    return IOUtils.toByteArray(is);
}

This solution is much simpler, but it has the overhead of establishing a new connection for each chunk, which shouldn't be a big deal if you use connection pooling.
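For example, a rough sketch of driving getBlobChunk from a Spring MVC endpoint (assuming the large object OID and its total length have already been looked up in the main, short transaction, and that this method sits next to getBlobChunk in the same class) could be:

// streams the large object to the HTTP response, one chunk per short-lived
// transaction taken from the connection pool
StreamingResponseBody chunkedBody(DataSource dataSource, long lobId, long totalLength) {
    long chunkSize = 1024 * 1024; // 1 MiB per chunk/transaction
    return outputStream -> {
        for (long pos = 1; pos <= totalLength; pos += chunkSize) { // Blob positions are 1-based
            try (Connection connection = dataSource.getConnection()) {
                connection.setAutoCommit(false);
                byte[] chunk = getBlobChunk(connection, lobId, pos,
                        Math.min(chunkSize, totalLength - pos + 1));
                connection.commit();
                outputStream.write(chunk);
            } catch (SQLException e) {
                throw new IOException(e);
            }
        }
    };
}

A controller method can then return ResponseEntity.ok(chunkedBody(dataSource, lobId, totalLength)), so Spring MVC streams the response while each database read stays in its own short transaction.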

1 Comment

Thank you Roman for the answer. I'd prefer reading the stream in a single transaction, since between multiple transactions the underlying data may change.
