<< Serial Number Generation | Home | Grizzly Death >>

BigArray class

Arrays in Java with a long index

When Java was being developed it was decided to use integer indexed arrays, leading to an upper limit for the total size of an array to be a little over 2 billion entries. Since arrays are used for the backing store of all of the Collections classes and because much of the Collections API assumes that this will be the case, Collections are also limited to around 2 billion entries as well.

Now with our 64 bit machines sometimes we want bigger Collections than will fit into our cramped JVM.

There are a few bug reports and requests for enhancement located on the Sun bug tracking system, but none of them are open. So it looks like Sun is just going to abandon us on this one. (after writing this I found the relevant bug report at Sun).

So, I decided to solve this problem, at least for me, in the future. I've begun development on a set of Collection classes that are quite happy dealing in quantities bigger than 2G. To get started the first step was to implement an Array that did indexing with longs.

The following code is my first draft for this class. It doesn't include everything I want it to, and some parts are untested, but if you need a long indexed array right now, this should get you started down the right path.

package com.squeakydolphin.bigcollections;

import java.util.Iterator;

/**
 *
 * @author Kenneth P. Turvey
 */
public class BigArray<E> implements Iterable<E> {
    private long length; 

    private  final int maxArrayElements; 

    private Object[][] values; 

    private class BigArrayIterator<E> implements Iterator<E>  {
        private long position = 0; 

        public boolean hasNext() {
            return position < length;
        }

        public E next() {
            return (E) get(position++); 
        }

        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }
        
    }
    
    public BigArray(long length) {
        this(length, Integer.MAX_VALUE); 
    }

    BigArray(long length, int maxArrayElements) {
        this.maxArrayElements = maxArrayElements; 
        this.length = length; 
        int arrays = (int) (length / (long) maxArrayElements); 
        int lastArraySize = (int) (length % maxArrayElements); 
        if (lastArraySize > 0) {
            values = new Object[arrays + 1][]; 
        } else { 
            values = new Object[arrays][]; 
        }

        for (int index = 0; index < arrays; index++) {
            values[index] = new Object[maxArrayElements]; 
        }
        if (lastArraySize > 0) { 
            values[arrays] = new Object[lastArraySize]; 
        }
    }

    public void put(long index, E value) {
        setObject(values, index, value);
    }

    public E get(long index) {
        return (E) getObject(values, index);  
    }

    public long getLength() {
        return length;
    }

    /**
     * Resize the array preserving the contents of the array, where the indexes
     * still exist in the new array.  Copies as little data as possible when 
     * making the new array.  
     * 
     * @param size The new size of the array.
     */
    public void resize(long size) {
        int fullArrays = (int) (size / (long) maxArrayElements); 
        int oldFullArrays = values.length - 1;
        int lastArraySize = (int) (size % maxArrayElements); 
        int oldLastArraySize = values[values.length - 1].length; 
        Object[][] newValues; 
        if (lastArraySize > 0) {
            newValues = new Object[fullArrays + 1][]; 
        } else { 
            newValues = new Object[fullArrays][]; 
        }

        // Copy over full arrays
        int array;
        for (array = 0; array < oldFullArrays && array < fullArrays; array++) {
            newValues[array] = values[array]; 
        }

        for (; array < fullArrays; array++) { 
            newValues[array] = new Object[maxArrayElements]; 
        }

        if (lastArraySize > 0) {
            newValues[fullArrays] = new Object[lastArraySize]; 
        }

        for (int index = oldFullArrays * maxArrayElements; 
                index < length && index < size; index++) {
            setObject(newValues, index, get(index));             
        }

        values = newValues;
        length = size; 
    }

    private Object getObject(Object[][] values, long index) {
        int array = (int) (index / maxArrayElements);
        int element = (int) (index % maxArrayElements); 
        return values[array][element]; 
    }

    private void setObject(Object[][] values, long index, Object value) {
        int array = (int) (index / maxArrayElements);
        int element = (int) (index % maxArrayElements); 
        values[array][element] = value; 
    }

    public Iterator<E> iterator() {
        return new BigArrayIterator<E>(); 
    }
}

After posting this, I was thinking about how much memory an object of this class will use at the point where it becomes useful, that is an array with more than 2 billion entries. If we assume that the array holds bytes we will have 2GB of data, but since the underlying arrays contain references, not bytes, we would need 2 billion references too. If we assume that we are running on a 64 bit machine and that references are 64 bits, then we see that we will need 16 GB of references. So the array of bytes will require 18 GB at the earliest point when there is an advantage for using this class over the normal arrays. It will probably be a few years before this makes sense. Hopefully Sun will have provided us with arrays indexed with longs by then.

Another thing to note is that a class for holding primitive types would quite probably be useful already. To store 2 billion integers would only require 8GB. This is well within the reach of many computers these days. So if we could just specialize our class on primitive types the class above would be useful. Unfortunately we're stuck with a generic mechanism that doesn't allow one to specialize a class on primitive types.

Not being able to specialize classes on primitive types is a real disadvantage. If you need this kind of thing now, it would be a simple matter to modify this code to work on any primitive type.

Given the memory requirements that this class has for useful applications, I think I'm going to set this aside and hope that Sun will work things out for us in the near future. Without modifications the the Java language, we really can't do this kind of thing now.




Add a comment Send a TrackBack