Java Performance - ArrayLists versus Arrays for lots of fast reads

I have a program where I need to make 100,000 to 1,000,000 random-access reads to a List-like object in as little time as possible (as in milliseconds) for a cellular automata-like program. I think the update algorithm I'm using is already optimized (keeps track of active cells efficiently, etc). The Lists do need to change size, but that performance is not as important. So I am wondering if the performance from using Arrays instead of ArrayLists is enough to make a difference when dealing with that many reads in such short spans of time. Currently, I'm using ArrayLists.

Edit: I forgot to mention: I'm just storing integers, so another factor is using the Integer wrapper class (in the case of ArrayLists) versus ints (in the case of arrays). Does anyone know if using ArrayList will actually require 3 pointer lookups (one for the ArrayList, one for the underlying array, and one for the Integer->int), whereas the array would only require 1 (array address + offset to the specific int)? Would HotSpot optimize the extra lookups away? How significant are those extra lookups?

Edit2: Also, I forgot to mention I need to do random access writes as well (writes, not insertions).

Answers


Now that you've mentioned that your arrays are actually arrays of primitive types, consider using the collection-of-primitive-type classes in the Trove library.

@viking reports significant (ten-fold!) speedup using Trove in his application - see comments. The flip-side is that Trove collection types are not type compatible with Java's standard collection APIs. So Trove (or similar libraries) won't be the answer in all cases.


Try both, but measure.

Most likely you could hack something together to make the inner loop use arrays without changing all that much code. My suspicion is that HotSpot will already inline the method calls and you will see no performance gain.

Also, try Java 6 update 14 with the -XX:+DoEscapeAnalysis flag.
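
As a rough illustration of "try both, but measure", here is a minimal sketch that times the same random-access reads against an int[] and an ArrayList<Integer>. The class name, method names, sizes and warm-up count are all made up for this example. Timing with System.nanoTime like this is crude; a proper harness such as JMH would give more trustworthy numbers, so treat the output as a hint rather than a result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Naive timing sketch; all names and sizes here are illustrative assumptions.
final class ReadBenchmark {

    // Sums the elements of a primitive array at the given indices.
    static long sumArray(int[] data, int[] indices) {
        long sum = 0;
        for (int idx : indices) {
            sum += data[idx];
        }
        return sum;
    }

    // Same reads through List.get, which also unboxes Integer to int.
    static long sumList(List<Integer> data, int[] indices) {
        long sum = 0;
        for (int idx : indices) {
            sum += data.get(idx);
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 100_000;
        int reads = 1_000_000;
        Random rnd = new Random(42);

        int[] array = new int[size];
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            int v = rnd.nextInt(1000);
            array[i] = v;
            list.add(v);
        }
        int[] indices = new int[reads];
        for (int i = 0; i < reads; i++) {
            indices[i] = rnd.nextInt(size);
        }

        // Warm up so the JIT has a chance to compile both methods.
        long sink = 0;
        for (int round = 0; round < 20; round++) {
            sink += sumArray(array, indices) + sumList(list, indices);
        }

        long t0 = System.nanoTime();
        long a = sumArray(array, indices);
        long t1 = System.nanoTime();
        long b = sumList(list, indices);
        long t2 = System.nanoTime();

        // Printing the sums keeps the JIT from discarding the loops as dead code.
        System.out.printf("int[]: %d us, ArrayList<Integer>: %d us (sums %d / %d, warm-up %d)%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000, a, b, sink);
    }
}
```

Both methods take the same precomputed index array, so the cost of generating random numbers stays out of the timed region.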


ArrayLists are slower than arrays, but most people consider the difference to be minor. In your case it could matter, though, since you're dealing with hundreds of thousands of reads.

By the way, duplicate: Array or List in Java. Which is faster?


I would go with Kevin's advice.

Stay with the lists first and measure your performance. If your program is too slow, compare it to a version that uses an array. If that gives you a measurable performance boost, go with the arrays; if not, stay with the lists, because they will make your life much, much easier.


There will be an overhead from using an ArrayList instead of an array, but it is very likely to be small. In fact, the useful bit of data in the ArrayList (the reference to its backing array) can be kept in a register, although you will probably use more registers than with a plain array (for the list size, for instance).

You mention in your edit that you are using wrapper objects. These do make a huge difference. If you are typically using the same value repeatedly, then a sensible cache policy may be useful (Integer.valueOf returns the same cached objects for values from -128 to 127). For primitives, primitive arrays usually win comfortably.
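
To illustrate the Integer.valueOf cache mentioned above, here is a small sketch that checks whether boxing the same value twice yields the same object. The class and method names are invented for the example. Values from -128 to 127 are guaranteed to be cached; outside that range the outcome depends on the JVM and its settings, so the sketch makes no claim about it.

```java
// Illustrative sketch; the class and method names are assumptions.
final class IntegerCacheDemo {

    // Returns true when two boxings of the same value are the same object.
    static boolean sameBoxedObject(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println("127  -> " + sameBoxedObject(127));
        System.out.println("-128 -> " + sameBoxedObject(-128));
        // Outside the guaranteed range the answer is JVM-dependent.
        System.out.println("128  -> " + sameBoxedObject(128));
    }
}
```

For a cellular automaton with only a handful of distinct cell states, every boxed value would come from this cache, which avoids allocation but still leaves the extra dereference on each read.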

As a refinement, you might want to make sure that cells which are adjacent in the grid tend to be adjacent in the array (you can do better than a plain row-by-row layout by using a space-filling curve).
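
One common space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the x and y coordinates. The sketch below is only one possible choice, not necessarily what the answer had in mind. It assumes a square grid whose side is a power of two and coordinates between 0 and 32767, and the class and method names are made up.

```java
// Z-order (Morton) indexing sketch; names and the coordinate range are assumptions.
final class MortonIndex {

    // Spreads the low 16 bits of v so that they occupy the even bit positions.
    static int spreadBits(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleaves x (even bits) and y (odd bits) into a single array index.
    static int encode(int x, int y) {
        return spreadBits(x) | (spreadBits(y) << 1);
    }

    public static void main(String[] args) {
        int side = 4; // must be a power of two for the indices to fill 0..side*side-1
        int[] cells = new int[side * side];
        cells[encode(3, 3)] = 1; // the far corner lands in the last slot, index 15
        System.out.println(java.util.Arrays.toString(cells));
    }
}
```

With this layout each 2x2 block of cells sits in four consecutive array slots, so neighbour lookups are more likely to hit the same cache line than with row-by-row storage.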


One possibility would be to re-implement ArrayList (it's not that hard), but expose the backing array via a lock/release call cycle. This gets you convenience for your writes, but exposes the array for a large series of read/write operations that you know in advance won't impact the array size. If the list is locked, add/delete is not allowed - just get/set.

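For example:

  // myArrayList is the hypothetical re-implemented list, here holding ints
  int[] directArray = myArrayList.lockArray();
  try {
    // myArrayList.add() and delete() would throw an IllegalStateException here
    int n = myArrayList.size();  // only the first n slots hold valid data
    for (int i = 0; i < n; i++) {
      directArray[i] += 1;
    }
  } finally {
    myArrayList.unlockArray();
  }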

This approach continues to encapsulate the array growth and related behaviors of ArrayList.
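
A minimal sketch of such a list could look like the following. It is specialised to int, it is not thread-safe, and the class name LockableIntList and its methods are invented for illustration rather than taken from any library.

```java
import java.util.Arrays;

// Hypothetical growable int list with a lock/release cycle; single-threaded use only.
final class LockableIntList {
    private int[] data = new int[16];
    private int size = 0;
    private boolean locked = false;

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (locked) {
            throw new IllegalStateException("list is locked");
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        checkIndex(index);
        return data[index];
    }

    void set(int index, int value) {
        checkIndex(index);
        data[index] = value;
    }

    int size() {
        return size;
    }

    // Hands out the backing array; only the first size() slots hold valid data.
    int[] lockArray() {
        if (locked) {
            throw new IllegalStateException("already locked");
        }
        locked = true;
        return data;
    }

    void unlockArray() {
        locked = false;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

Because add() is refused while the list is locked, the backing array cannot be reallocated behind the caller's back, which is what makes handing it out safe within a single thread.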


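Early JVMs used double indirection (handles) for objects so that they could be moved about in memory while references to them stayed valid. HotSpot uses direct pointers and updates them when the collector moves an object, so a reference lookup is a single pointer dereference. Even so, reading an Integer out of an ArrayList needs more dereferences than reading an int out of an int[], and these extra lookups cannot always be optimised away completely.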

Perhaps even worse, your cache performance will be terrible. Accessing values in cache is going to be many times faster (perhaps 10x) than accessing values in main memory. If you have an int[], you know the values will be consecutive in memory and thus load into cache readily. With an Integer[], however, the individual Integer objects can be scattered across memory and are much more likely to cause cache misses. An Integer also takes roughly 16 to 24 bytes on a typical 64-bit JVM, depending on its settings, which means Integers are much less likely to fit into your caches than 4-byte int values.

If you update an Integer, this often results in a new object being created, which is far more expensive than updating an int value.


If you're creating the list once, and doing thousands of reads from it, the overhead from ArrayList may well be slight enough to ignore. If you're creating thousands of lists, go with the standard array. Object creation in a loop adds up quickly, simply because of all the overhead of instantiating the member variables, calling the constructors up the inheritance chain, etc.

Because of this -- and to answer your second question -- stick with standard ints rather than the Integer class. Profile both and you'll quickly (or, rather, slowly) see why.


If you're not going to be doing a lot more than reads from this structure, then go ahead and use an array as that would be faster when read by index.

However, consider how you're going to get the data in there, and if sorting, inserting, deleting, etc, are a concern at all. If so, you may want to consider other collection based structures.


Primitives are much (much, much) faster. Always. Even with JIT escape analysis, etc. Skip wrapping things in java.lang.Integer. Also, skip the extra range check that most ArrayList implementations do on get(int). Most JITs can recognize simple loop patterns and remove the bounds check, but there isn't much reason to muck with it if you're worried about performance.

You don't have to code primitive access yourself. I'd bet you could cut over to using IntArrayList from the COLT library in a few minutes of refactoring. See http://acs.lbl.gov/~hoschek/colt/ ("Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java").


The options are:
1. To use an array
2. To use the ArrayList, which internally uses an array

It is obvious that the ArrayList introduces some overhead (look into the ArrayList source code). For 99% of use cases this overhead can easily be ignored. However, if you implement time-sensitive algorithms and do tens of millions of reads from a list by index, then using bare arrays instead of lists should bring noticeable time savings. USE COMMON SENSE.

Please take a look here: http://robaustin.wikidot.com/how-does-the-performance-of-arraylist-compare-to-array. I would personally tweak the test to keep the compiler from optimizing the loop away, e.g. I would change "j = " into "j += " and then use "j" after the loop.


An array will be faster simply because, at a minimum, it skips a method call (i.e. get(i)), unless the JIT inlines that call.

If you have a static size, then Arrays are your friend.

