BigArray class
Arrays in Java with a long index
When Java was being developed it was decided to use integer indexed arrays, leading to an upper limit for the total size of an array to be a little over 2 billion entries. Since arrays are used for the backing store of all of the Collections classes and because much of the Collections API assumes that this will be the case, Collections are also limited to around 2 billion entries as well.
Now with our 64 bit machines sometimes we want bigger Collections than will fit into our cramped JVM.
There are a few bug reports and requests for enhancement located on the Sun bug tracking system, but none of them are open. So it looks like Sun is just going to abandon us on this one. (after writing this I found the relevant bug report at Sun).
So, I decided to solve this problem, at least for me, in the future. I've begun development on a set of Collection classes that are quite happy dealing in quantities bigger than 2G. To get started the first step was to implement an Array that did indexing with longs.
The following code is my first draft for this class. It doesn't include everything I want it to, and some parts are untested, but if you need a long indexed array right now, this should get you started down the right path.
package com.squeakydolphin.bigcollections;
import java.util.Iterator;
/**
*
* @author Kenneth P. Turvey
*/
public class BigArray<E> implements Iterable<E> {
private long length;
private final int maxArrayElements;
private Object[][] values;
private class BigArrayIterator<E> implements Iterator<E> {
private long position = 0;
public boolean hasNext() {
return position < length;
}
public E next() {
return (E) get(position++);
}
public void remove() {
throw new UnsupportedOperationException("Not supported.");
}
}
public BigArray(long length) {
this(length, Integer.MAX_VALUE);
}
BigArray(long length, int maxArrayElements) {
this.maxArrayElements = maxArrayElements;
this.length = length;
int arrays = (int) (length / (long) maxArrayElements);
int lastArraySize = (int) (length % maxArrayElements);
if (lastArraySize > 0) {
values = new Object[arrays + 1][];
} else {
values = new Object[arrays][];
}
for (int index = 0; index < arrays; index++) {
values[index] = new Object[maxArrayElements];
}
if (lastArraySize > 0) {
values[arrays] = new Object[lastArraySize];
}
}
public void put(long index, E value) {
setObject(values, index, value);
}
public E get(long index) {
return (E) getObject(values, index);
}
public long getLength() {
return length;
}
/**
* Resize the array preserving the contents of the array, where the indexes
* still exist in the new array. Copies as little data as possible when
* making the new array.
*
* @param size The new size of the array.
*/
public void resize(long size) {
int fullArrays = (int) (size / (long) maxArrayElements);
int oldFullArrays = values.length - 1;
int lastArraySize = (int) (size % maxArrayElements);
int oldLastArraySize = values[values.length - 1].length;
Object[][] newValues;
if (lastArraySize > 0) {
newValues = new Object[fullArrays + 1][];
} else {
newValues = new Object[fullArrays][];
}
// Copy over full arrays
int array;
for (array = 0; array < oldFullArrays && array < fullArrays; array++) {
newValues[array] = values[array];
}
for (; array < fullArrays; array++) {
newValues[array] = new Object[maxArrayElements];
}
if (lastArraySize > 0) {
newValues[fullArrays] = new Object[lastArraySize];
}
for (int index = oldFullArrays * maxArrayElements;
index < length && index < size; index++) {
setObject(newValues, index, get(index));
}
values = newValues;
length = size;
}
private Object getObject(Object[][] values, long index) {
int array = (int) (index / maxArrayElements);
int element = (int) (index % maxArrayElements);
return values[array][element];
}
private void setObject(Object[][] values, long index, Object value) {
int array = (int) (index / maxArrayElements);
int element = (int) (index % maxArrayElements);
values[array][element] = value;
}
public Iterator<E> iterator() {
return new BigArrayIterator<E>();
}
}
After posting this, I was thinking about how much memory an object of this class will use at the point where it becomes useful, that is an array with more than 2 billion entries. If we assume that the array holds bytes we will have 2GB of data, but since the underlying arrays contain references, not bytes, we would need 2 billion references too. If we assume that we are running on a 64 bit machine and that references are 64 bits, then we see that we will need 16 GB of references. So the array of bytes will require 18 GB at the earliest point when there is an advantage for using this class over the normal arrays. It will probably be a few years before this makes sense. Hopefully Sun will have provided us with arrays indexed with longs by then.
Another thing to note is that a class for holding primitive types would quite probably be useful already. To store 2 billion integers would only require 8GB. This is well within the reach of many computers these days. So if we could just specialize our class on primitive types the class above would be useful. Unfortunately we're stuck with a generic mechanism that doesn't allow one to specialize a class on primitive types.
Not being able to specialize classes on primitive types is a real disadvantage. If you need this kind of thing now, it would be a simple matter to modify this code to work on any primitive type.
Given the memory requirements that this class has for useful applications, I think I'm going to set this aside and hope that Sun will work things out for us in the near future. Without modifications the the Java language, we really can't do this kind of thing now.
Serial Number Generation
How to get a short random sequence in Java
This is a problem that comes up frequently on comp.lang.java.programmer and I happened to stumble across some code I wrote a while back to solve the problem, so I thought I would post it here in the hope that it might help others that need it.
The basic problem is usually that some poster needs to generate a number that is short, unique and not easy to guess. The idea being that it will be used to reference a web page or a database entry or some such thing. Usually, for one reason or another, they don't want a UUID, often because it is too long.
The following code will generate a 12 alphanumeric long sequence that is (at least in theory) uniformly distributed over the possible values (4,738,381,338,321,616,896 of them). It usually returns very quickly. It isn't secure, in the sense of being cryptographically secure, but normally for these kinds of applications it isn't really necessary anyway. It doesn't guarantee a unique value, but for a database with a million entries in it this will only produce a duplicate, on average, once every 4.7 trillion calls. I think that's probably good enough. If you want to be safe, check to see if the number you get back is a duplicate, if it is call it again. You'll have to modify the code below and wrap it up in whatever class you intend to use it. I would probably reuse the random instead of creating a new one too.
import java.util.Random;
public class SerialGenerator {
public static void main(String args[]) {
long val = (long) Math.pow(36.0, 12);
long maxVal = (Long.MAX_VALUE / val) * val - 1;
Random generator = new Random();
long serial;
do {
serial = Math.abs(generator.nextLong());
} while (serial > maxVal);
serial %= val;
String serialString = Long.toString(serial, 36).toUpperCase();
while (serialString.length() < 12) {
serialString = "0" + serialString;
}
System.out.println(serialString);
}
}
A few example serials generated:
ZZDAXJWTJSNM 5KSUH4LGXEXF OJE2UV3OGSS4 FJX248722M3R 2X3K2Q1FYNIK Q0K40FQ51TK1