org.apache.tools.bzip2

Class CBZip2OutputStream

Implemented Interfaces:
BZip2Constants

public class CBZip2OutputStream
extends OutputStream
implements BZip2Constants

An output stream that compresses data into the BZip2 format (without the file header chars) and writes the result to another stream.

Compression requires large amounts of memory, so you should call the close() method as soon as possible to force CBZip2OutputStream to release the allocated memory.

You can reduce the amount of allocated memory, and possibly increase the compression speed, by choosing a smaller blocksize, which in turn may lower the compression ratio. You can avoid unnecessary memory allocation by not using a blocksize that is larger than the size of the input.

You can compute the memory usage for compressing by the following formula:

 400k + (9 * blocksize).
 

To get the memory required for decompression by CBZip2InputStream, use:

 65k + (5 * blocksize).
 
Memory usage by blocksize

Blocksize   Compression memory usage   Decompression memory usage
100k        1300k                      565k
200k        2200k                      1065k
300k        3100k                      1565k
400k        4000k                      2065k
500k        4900k                      2565k
600k        5800k                      3065k
700k        6700k                      3565k
800k        7600k                      4065k
900k        8500k                      4565k
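
The two formulas can be evaluated directly to reproduce the figures in the table. The following is a minimal sketch: the class and method names are hypothetical and not part of this package, and the block size is assumed to be given in 100k units, as it is for the constructor.

```java
// Sketch only: evaluates the two memory formulas from this page.
// BZip2MemoryEstimate and its methods are hypothetical helpers, not library API.
final class BZip2MemoryEstimate {
    private BZip2MemoryEstimate() {
    }

    /** Estimated compression memory in k: 400k + (9 * blocksize). */
    static int compressionKilobytes(int blockSize100k) {
        checkRange(blockSize100k);
        return 400 + 9 * (blockSize100k * 100);
    }

    /** Estimated decompression memory in k: 65k + (5 * blocksize). */
    static int decompressionKilobytes(int blockSize100k) {
        checkRange(blockSize100k);
        return 65 + 5 * (blockSize100k * 100);
    }

    // The documented range is MIN_BLOCKSIZE (1) to MAX_BLOCKSIZE (9).
    private static void checkRange(int blockSize100k) {
        if (blockSize100k < 1 || blockSize100k > 9) {
            throw new IllegalArgumentException("blockSize must be in 1..9: " + blockSize100k);
        }
    }
}
```

For example, BZip2MemoryEstimate.compressionKilobytes(9) evaluates to 8500 and BZip2MemoryEstimate.decompressionKilobytes(9) to 4565, matching the 900k row.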

For decompression CBZip2InputStream allocates less memory if the bzipped input is smaller than one block.

Instances of this class are not threadsafe.
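
If one stream has to be shared between threads, the callers must serialize access themselves. One possible approach is a small wrapper that synchronizes every call on a delegate stream. The sketch below is an assumption about how that could be done, not something this package provides; SynchronizedOutputStream is a hypothetical name.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: serializes calls to a delegate stream that is not threadsafe.
// This class is hypothetical and not part of org.apache.tools.bzip2.
final class SynchronizedOutputStream extends OutputStream {
    private final OutputStream delegate;

    SynchronizedOutputStream(OutputStream delegate) {
        if (delegate == null) {
            throw new NullPointerException("delegate");
        }
        this.delegate = delegate;
    }

    @Override
    public synchronized void write(int b) throws IOException {
        delegate.write(b);
    }

    @Override
    public synchronized void write(byte[] buf, int offs, int len) throws IOException {
        delegate.write(buf, offs, len);
    }

    @Override
    public synchronized void flush() throws IOException {
        delegate.flush();
    }

    @Override
    public synchronized void close() throws IOException {
        delegate.close();
    }
}
```

Each call is atomic with respect to the others, but data written by different threads still interleaves at call boundaries, so a thread that needs a contiguous record should pass it in a single write(byte[], int, int) call.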

TODO: Update to BZip2 1.0.1

Field Summary

protected static int
CLEARMASK
This constant is accessible by subclasses for historical purposes.
protected static int
DEPTH_THRESH
This constant is accessible by subclasses for historical purposes.
protected static int
GREATER_ICOST
This constant is accessible by subclasses for historical purposes.
protected static int
LESSER_ICOST
This constant is accessible by subclasses for historical purposes.
static int
MAX_BLOCKSIZE
The maximum supported blocksize == 9.
static int
MIN_BLOCKSIZE
The minimum supported blocksize == 1.
protected static int
QSORT_STACK_SIZE
This constant is accessible by subclasses for historical purposes.
protected static int
SETMASK
This constant is accessible by subclasses for historical purposes.
protected static int
SMALL_THRESH
This constant is accessible by subclasses for historical purposes.
protected static int
WORK_FACTOR
This constant is accessible by subclasses for historical purposes.

Fields inherited from interface org.apache.tools.bzip2.BZip2Constants

G_SIZE, MAX_ALPHA_SIZE, MAX_CODE_LEN, MAX_SELECTORS, NUM_OVERSHOOT_BYTES, N_GROUPS, N_ITERS, RUNA, RUNB, baseBlockSize, rNums

Constructor Summary

CBZip2OutputStream(OutputStream out)
Constructs a new CBZip2OutputStream with a blocksize of 900k.
CBZip2OutputStream(OutputStream out, int blockSize)
Constructs a new CBZip2OutputStream with specified blocksize.

Method Summary

static int
chooseBlockSize(long inputLength)
Chooses a blocksize based on the given length of the data to compress.
void
close()
protected void
finalize()
Overridden to close the stream.
void
flush()
int
getBlockSize()
Returns the blocksize parameter specified at construction time.
protected static void
hbMakeCodeLengths(char[] len, int[] freq, int alphaSize, int maxLen)
This method is accessible by subclasses for historical purposes.
void
write(byte[] buf, int offs, int len)
void
write(int b)

Field Details

CLEARMASK

protected static final int CLEARMASK
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
-2097153

DEPTH_THRESH

protected static final int DEPTH_THRESH
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
10

GREATER_ICOST

protected static final int GREATER_ICOST
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
15

LESSER_ICOST

protected static final int LESSER_ICOST
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
0

MAX_BLOCKSIZE

public static final int MAX_BLOCKSIZE
The maximum supported blocksize == 9.
Field Value:
9

MIN_BLOCKSIZE

public static final int MIN_BLOCKSIZE
The minimum supported blocksize == 1.
Field Value:
1

QSORT_STACK_SIZE

protected static final int QSORT_STACK_SIZE
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.

If you are ever unlucky enough to get a stack overflow whilst sorting, however improbable that is, increase this constant and try again. In practice I have never seen the stack go above 27 elements, so this limit seems very generous.

Field Value:
1000

SETMASK

protected static final int SETMASK
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
2097152

SMALL_THRESH

protected static final int SMALL_THRESH
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
20

WORK_FACTOR

protected static final int WORK_FACTOR
This constant is accessible by subclasses for historical purposes. If you don't know what it means then you don't need it.
Field Value:
30

Constructor Details

CBZip2OutputStream

public CBZip2OutputStream(OutputStream out)
            throws IOException
Constructs a new CBZip2OutputStream with a blocksize of 900k.

Attention: The caller is responsible for writing the two BZip2 magic bytes "BZ" to the specified stream prior to calling this constructor.

Parameters:
out - the destination stream.

CBZip2OutputStream

public CBZip2OutputStream(OutputStream out,
                          int blockSize)
            throws IOException
Constructs a new CBZip2OutputStream with specified blocksize.

Attention: The caller is responsible for writing the two BZip2 magic bytes "BZ" to the specified stream prior to calling this constructor.

Parameters:
out - the destination stream.
blockSize - the block size in units of 100k, from MIN_BLOCKSIZE (1) to MAX_BLOCKSIZE (9).
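
Because neither constructor writes the file header, the magic bytes have to reach the destination stream first. The following sketch shows one way to do that; BZip2Header and writeMagic are hypothetical names, and the construction of the compressing stream appears only in a comment because it requires this package on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch only: writes the two BZip2 magic bytes "BZ" that the caller must emit
// before constructing a CBZip2OutputStream. BZip2Header is a hypothetical helper.
final class BZip2Header {
    private BZip2Header() {
    }

    static void writeMagic(OutputStream out) throws IOException {
        out.write('B'); // 0x42
        out.write('Z'); // 0x5A
    }
}

// Intended call order (not compiled here; "raw" is any destination stream):
//   BZip2Header.writeMagic(raw);
//   CBZip2OutputStream bz = new CBZip2OutputStream(raw, blockSize);
//   ... write data to bz, then call bz.close() as soon as possible ...
```

Keeping the header write in a separate helper makes the required ordering explicit at the call site.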

Method Details

chooseBlockSize

public static int chooseBlockSize(long inputLength)
Chooses a blocksize based on the given length of the data to compress.
Parameters:
inputLength - The length of the data which will be compressed by CBZip2OutputStream.
Returns:
The blocksize, between MIN_BLOCKSIZE and MAX_BLOCKSIZE both inclusive. For a negative inputLength this method returns MAX_BLOCKSIZE always.
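
To illustrate the contract described above, here is a sketch of one heuristic that satisfies it. It is not necessarily the rule this method actually uses: the sketch assumes that one block size unit holds 100,000 bytes of input, picks the smallest block size that holds the whole input, and treats a length of zero like a tiny input. BlockSizeSketch is a hypothetical class.

```java
// Sketch only: an illustrative heuristic that honours the documented contract
// (result within MIN_BLOCKSIZE..MAX_BLOCKSIZE, MAX_BLOCKSIZE for negative lengths).
// It is an assumption, not the library's implementation.
final class BlockSizeSketch {
    static final int MIN_BLOCKSIZE = 1;
    static final int MAX_BLOCKSIZE = 9;
    // Assumption: one block size unit corresponds to 100,000 bytes of input.
    static final long UNIT_BYTES = 100_000L;

    private BlockSizeSketch() {
    }

    static int chooseBlockSize(long inputLength) {
        if (inputLength < 0) {
            return MAX_BLOCKSIZE; // unknown length: fall back to the largest block
        }
        // Ceiling division written so it cannot overflow for very large lengths.
        long units = inputLength / UNIT_BYTES + (inputLength % UNIT_BYTES == 0 ? 0 : 1);
        return (int) Math.max(MIN_BLOCKSIZE, Math.min(MAX_BLOCKSIZE, units));
    }
}
```

With this heuristic an input no larger than one unit gets the minimum block size, which follows the class-level advice not to use a blocksize bigger than the input.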

close

public void close()
            throws IOException

finalize

protected void finalize()
            throws Throwable
Overridden to close the stream.

flush

public void flush()
            throws IOException

getBlockSize

public final int getBlockSize()
Returns the blocksize parameter specified at construction time.

hbMakeCodeLengths

protected static void hbMakeCodeLengths(char[] len,
                                        int[] freq,
                                        int alphaSize,
                                        int maxLen)
This method is accessible by subclasses for historical purposes. If you don't know what it does then you don't need it.

write

public void write(byte[] buf,
                  int offs,
                  int len)
            throws IOException

write

public void write(int b)
            throws IOException