Java EE: Make an @Entity “empty” using a @Transient proxy

By | February 8, 2017

When optimizing a Java EE application to minimize its database footprint, we sooner or later reach the point of trying to get rid of duplicate data.
Several steps of a processing flow often store partially processed data or intermediate states of the data, and this tends to generate unwanted duplicates. When the duplicated data are BLOBs, the footprint in the database can be huge, and over time we may discover that hundreds of GB of database space are wasted. Besides the wasted space, the duplicate data adds overhead that degrades database performance.

Let's take the concrete case of two entities defined as below:

An image entity:

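@Entity
public class Image {
   @Id @GeneratedValue
   private Long id;            // primary key, required for every JPA entity
   private String imageName;
   @Lob
   private byte[] imageData;   // binary content, mapped to a BLOB column
}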

And a container entity that refers to it:

@Entity
public class CollectionItem {
   protected Image image;

   public void setImage(Image image) {
      this.image = image;
   }

   public Image getImage() {
      return (Image) JPAUtils.getInstance().getBehindProxyObject(image);
   }
   ....
}

We also know that the actual image data exists in another entity as well: Message, which represents an incoming data package that also contains the image file binary. This is common in enterprise systems, where the input data must be kept for audit purposes in the exact form in which it was received.

So in our case we have three entities that correspond to three tables in the database: Message, Image and CollectionItem, with redundant data in the Message and Image tables. To eliminate the data duplication we make the following changes.

Define a new proxy class that sits in front of the real Image entity. Note that it is a plain subclass with no @Entity annotation of its own.

public class TransientImage extends Image {
}

Change the container entity by adding a new field marked with the @Transient annotation, which specifies that the property or field is not persistent. Then change the setter and getter of the image member to use the proxy object instead.

@Entity
public class CollectionItem {
   protected Image image;                       // persistent; stays null for new items
   @Transient protected Image transientImage;   // in-memory cache, never persisted

   public void setImage(Image image) {
      // cache only: nothing is written to the Image table
      this.transientImage = image;
   }

   public Image getImage() {
      if (image == null) {
         if (transientImage != null) {
            return transientImage;
         }
         // try to get it from the Message
         transientImage = getItemPicturesFromZip(sourceMessage, this);
         return transientImage;
      }
      // legacy item: the image is still stored in the Image table
      return (Image) JPAUtils.getInstance().getBehindProxyObject(image);
   }
   ....
}

With the new setter we only “fake” saving the binary data into the entity: the data is just cached in a transient object. When a new CollectionItem is created, no data is saved in the Image entity.

With the new getter, image == null means that no row exists in the Image table for that image data. In that case we extract the real data from the input message (getItemPicturesFromZip(sourceMessage, this)) and cache it in the transient field in case we need it for further operations in the current transaction.
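The body of getItemPicturesFromZip is not shown here. As a rough sketch of what such an extraction step could look like, assuming the message payload is available as a byte[] holding a ZIP archive and that the image can be found by its entry name (both are assumptions, as are the names ZipImageExtractor, extractEntry and zipOf), something along these lines would do, using only java.util.zip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not the article's actual getItemPicturesFromZip.
class ZipImageExtractor {

    // Returns the bytes of the entry named entryName, or null if there is no such entry.
    static byte[] extractEntry(byte[] zipPayload, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipPayload))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int read;
                    // read() returns -1 at the end of the current entry
                    while ((read = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                    return out.toByteArray();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a one-entry archive in memory, only to make the sketch runnable.
    static byte[] zipOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = zipOf("photo.png", new byte[] {10, 20, 30});
        byte[] image = extractEntry(payload, "photo.png");
        System.out.println(image.length); // prints 3
    }
}
```

In the real entity the extracted bytes would be wrapped in an Image (or TransientImage) before being cached in the transient field.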

For legacy images (image != null), which do have an entry in the Image table, the new getter keeps the old behaviour.
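The setter and the two getter branches described above can be tried out without a JPA container. The following is only a plain-Java sketch: the classes are simplified stand-ins with no annotations, the Supplier named messageLoader is a hypothetical replacement for getItemPicturesFromZip(sourceMessage, this), and hasPersistentImage exists only so the demo can inspect the state.

```java
import java.util.function.Supplier;

// Simplified stand-in for the Image entity (no JPA annotations).
class Image {
    final String imageName;
    final byte[] imageData;

    Image(String imageName, byte[] imageData) {
        this.imageName = imageName;
        this.imageData = imageData;
    }
}

class CollectionItem {
    // Persistent association in the real entity; null for newly created items.
    private final Image image;
    // Would carry @Transient in the real entity: an in-memory cache only.
    private Image transientImage;
    // Hypothetical stand-in for getItemPicturesFromZip(sourceMessage, this).
    private final Supplier<Image> messageLoader;

    CollectionItem(Image legacyImage, Supplier<Image> messageLoader) {
        this.image = legacyImage;
        this.messageLoader = messageLoader;
    }

    // "Fake" save: only the transient cache is filled.
    void setImage(Image image) {
        this.transientImage = image;
    }

    Image getImage() {
        if (image != null) {
            return image; // legacy item: old behaviour
        }
        if (transientImage == null) {
            transientImage = messageLoader.get(); // rebuild from the message once
        }
        return transientImage;
    }

    boolean hasPersistentImage() {
        return image != null;
    }
}

class TransientProxyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        CollectionItem item = new CollectionItem(null, () -> {
            loads[0]++;
            return new Image("a.png", new byte[] {1, 2, 3});
        });
        item.getImage();
        item.getImage();
        System.out.println("loads from message: " + loads[0]);                    // 1
        System.out.println("persistent image set: " + item.hasPersistentImage()); // false
    }
}
```

The second getImage() call is served from the transient cache, so the message is read only once per object instance.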

This is a useful trick that can be applied in many cases where duplicate data can be eliminated.
