What is making my code use so much memory?

I have solved in various ways a simple problem on CodeEval, which specification can be found here (only a few lines long).

I have made 3 working versions (one of them in Scala) and I don't understand the difference of performances for my last Java version which I expected to be the best time and memory-wise.

I also compared this to a code found on Github. Here are the performance stats returned by CodeEval :

. Version 1 is the version found on Github . Version 2 is my Scala solution :

object Main extends App {
    val p = Pattern.compile("\\d+")
    scala.io.Source.fromFile(args(0)).getLines
        .filter(!_.isEmpty)
        .map(line => {
            val dists = new TreeSet[Int]
            val m     = p.matcher(line)
            while (m.find) dists += m.group.toInt

            val list  = dists.toList
            list.zip(0 +: list).map { case (x,y) => x - y }.mkString(",")
        })
        .foreach(println)
}

. Version 3 is my Java solution which I expected to be the best :

public class Main {
    public static void main(String[] args) throws IOException {
        Pattern        p    = Pattern.compile("\\d+");
        File           file = new File(args[0]);
        BufferedReader br   = new BufferedReader(new FileReader(file));

        String line;
        while ((line = br.readLine()) != null) {
            Set<Integer> dists = new TreeSet<Integer>();
            Matcher      m     = p.matcher(line); 
            while (m.find()) dists.add(Integer.parseInt(m.group()));

            Iterator<Integer> it   = dists.iterator();
            int               prev = 0;
            StringBuilder     sb   = new StringBuilder();
            while (it.hasNext()) {
                int curr = it.next();
                sb.append(curr - prev);
                sb.append(it.hasNext() ? "," : "");
                prev = curr;
            }
            System.out.println(sb);
        }
        br.close();
    }
}
  • Version 4 is the same as version 3 except I don't use a StringBuilder to print the output and do like in version 1

Here is how I interpreted those results :

  • version 1 is too slow because of the too high number of System.out.print calls. Moreover, using split on very large lines (that's the case in the tests performed) uses a lot of memory.

  • version 2 seems slow too but it is mainly because of an "overhead" on running Scala code on CodeEval, even very efficient code run slowly on it

  • version 2 uses unnecessary memory to build a list from the set, which also takes some time but should not be too significant. Writing more efficient Scala would probably like writing it in Java so I preferred elegance to performance

  • version 3 should not use that much memory in my opinion. The use of a StringBuilder has the same impact on memory as calling mkString in version 2

  • version 4 proves the calls to System.out.println are slowering down the program

Does someone see an explanation to those results ?

Answers


I conducted some tests.

There is a baseline for every type of language. I code in java and javascript. For javascript here are my test results:

  • Rev 1: Default empty boilerplate for JS with a message to standard output
  • Rev 2: Same without file reading
  • Rev 3: Just a message to the standard output

You can see that no matter what, there will be at least 200 ms runtime and about 5 megs of memory usage. This baseline depends on the load of the servers as well! There was a time when codeevals was heavily overloaded, thus making impossible to run anything within the max time(10s).

Check this out, a totally different challenge than the previous:

  • Rev4: My solution
  • Rev5: The same code submitted again now. Scored 8000 more ranking point. :D

Conclusion: I would not worry too much about CPU and memory usage and rank. It is clearly not reliable.


Your scala solution is slow, not because of "overhead on CodeEval", but because you are building an immutable TreeSet, adding elements to it one by one. Replacing it with something like

val regex = """\d+""".r // in the beginning, instead of your Pattern.compile
...
.map { line => 
    val dists = regex.findAllIn(line).map(_.toInt).toIndexedSeq.sorted
...

Should shave about 30-40% off your execution time.

Same approach (build a list, then sort) will, probably, help your memory utilization in "version 3" (java sets are real memory hogs). It is also a good idea to give your list an initial size while you are at it (otherwise, it'll grow by 50% every time it runs out of capacity, which is wasteful in both memory and performance). 600 sounds like a good number, since that's the upper bound for the number of cities from the problem description.

Now, since we know the upper boundary, an even faster and slimmer approach is to do away with lists and boxed Integeres, and just do int dists[] = new int[600];.

If you wanted to get really fancy, you'd also make use of the "route length" range that's mentioned in the description. For example, instead of throwing ints into an array and sorting (or keeping a treeset), make an array of 20,000 bits (or even 20K bytes for speed), and set those that you see in input as you read it ... That would be both faster and more memory efficient than any of your solutions.


I tried solving this question and figured that you don't need the names of the cities, just the distances in a sorted array.

It has much better runtime of 738ms, and memory of 4513792 with this.

Although this may not help improve your piece of code, it seems like a better way to approach the question. Any suggestions to improve the code further are welcome.

import java.io.*;
import java.util.*;

public class Main {
public static void main (String[] args) throws IOException {
    File file = new File(args[0]);
    BufferedReader buffer = new BufferedReader(new FileReader(file));
    String line;
    while ((line = buffer.readLine()) != null) {
        line = line.trim();

        String out = new Main().getDistances(line); 

        System.out.println(out);
    }
}

public String getDistances(String s){

    //split the string
    String[] arr = s.split(";");

    //create an array to hold the distances as integers
    int[] distances = new int[arr.length];

    for(int i=0; i<arr.length; i++){
         //find the index of , - get the characters after that - convert to integer - add to distances array
        distances[i] = Integer.parseInt(arr[i].substring(arr[i].lastIndexOf(",")+1));    
    }

    //sort the array
    Arrays.sort(distances);

    String output = "";
    output += distances[0]; //append the distance to the closest city to the string

    for(int i=0; i<arr.length-1; i++){

        //get distance between current element(city) and next
        int distance_between = distances[i+1] - distances[i];

        //append the distance to the string
        output += "," + distance_between;
    }


    return output;   
 }
 }

Need Your Help

Adding WCF reference to web app can't find endpoint in config?

c# .net web-services wcf

UPDATE: I got this exact same config to work from another web server. Any thoughts on why it works for one but not the other?

Odd Binding behavior in WPF

wpf binding combobox master-detail selectedvalue

I will try and explain this as concise as possible. I have 2 objects, the first which we will call object A that has an Id property and the second we will call object B, which has a ParentId proper...