Storing strings: final class vs. file in external storage

I'm building a Java program. I have a list of person names that is used to check whether a word is a name or not. Should I store these names in an external file, or put them as public static fields in a final class? Now my list has only 1000 names but what if it's up to 100,000 items? Thanks for your help!

Edit - I don't really want to use something like a database as it requires other installations when executing in other machines. A simple txt file sounds good but in order to check the word names, I still have to load all list items (say 10,000 names) into my program and search for names, which makes me a little confused: if it's better than the aforementioned way of pre-defined an array of constants in a final class. Can you help me figure out one effective way to implement this?

Answers


There's no programming benefit in having the strings in code, so externalize the data to a database.

However, even a flat file is a "database" of sorts, so I would put the task of loading the names behind an interface and leave it to the implementation to decide how the data is loaded.

Some implementation options are:

  • return hard coded values
  • read values from a file
  • read from an SQL database
  • read from a NoSQL database
  • read from a web service

Pick the implementation that works best for you now - it's easy to change later if you need to.

An example interface:

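import java.util.List;

// Abstraction over wherever the names come from (file, database, web service, ...).
public interface NameLoader {
    List<String> getNames();
}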

Have your main program use one of these and supply the implementation using a factory method:

public class NameLoaderFactory {
    public static NameLoader createNameLoader() {
        // return whichever implementation you're currently using
        return new FileBasedNameLoader();
    }
}

You could go further and externalise the choice of implementation class via Spring, a properties file, etc.
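As a rough sketch of what a file-backed implementation might look like: the code below assumes a UTF-8 text file with one name per line, and the default file name names.txt is purely hypothetical. The interface is repeated only so the snippet compiles on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Same shape as the interface above, repeated so the sketch is self-contained.
interface NameLoader {
    List<String> getNames();
}

// Hypothetical file-backed loader: one name per line, UTF-8 encoded.
class FileBasedNameLoader implements NameLoader {
    private final Path file;

    // Assumed default location; adjust to wherever the list actually lives.
    FileBasedNameLoader() {
        this(Paths.get("names.txt"));
    }

    FileBasedNameLoader(Path file) {
        this.file = file;
    }

    @Override
    public List<String> getNames() {
        // try-with-resources closes the underlying file handle
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty()) // skip blank lines
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + file, e);
        }
    }
}
```

Because callers only see NameLoader, swapping this for a database-backed or hard-coded loader later should not touch the rest of the program.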


For anything over 100 or so strings, I would suggest some form of database or other external storage. If the list of names, currently 1,000, could grow or shrink, a database is the better choice, because you do not have to modify the program each time an entry is added, edited, or removed.

If you are trying to keep the application itself very basic and do not wish to resort to something like SQL or another DB, you might want to look at flat-file databases or something along those lines.

EDIT - Even something as simple as a *.txt file containing a comma-separated list of all the names can be simple and effective if done right.
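A minimal sketch of reading such a comma-separated file, assuming UTF-8 content and that no name itself contains a comma. The class and method names here are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a file such as: Alice, Bob, Carol
final class CommaSeparatedNames {

    // Splits the raw text on commas and keeps the trimmed, non-empty pieces.
    static List<String> parse(String content) {
        List<String> names = new ArrayList<>();
        for (String part : content.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Reads the whole file into memory and parses it.
    static List<String> load(Path file) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return parse(content);
    }
}
```

If names can contain commas (for example "Smith, John"), a one-name-per-line layout is probably the safer format.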


Use an external persistence mechanism of some kind; e.g. a file, a database, etcetera. There are many ways to do this, and which is best depends on your (complete) application's requirements.

Now my list has only 1000 names but what if it's up to 100,000 items?

Well, right now, it will work (sort of). But when you scale up, you will start to run into hard limits imposed by the JVM specification. For instance:

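  • There is a hard limit on the number of constants (e.g. String literals) in the constant pool: the entry count is a 16-bit field in the class file, so a single class cannot hold more than about 65,535 entries.
  • There is a hard limit on the size of a single method or constructor: its bytecode must be smaller than 64 KB. (That includes the synthetic method that does a class's static initialization.)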

Another problem with the "hard-wired list in the code" approach is that you would need to edit the source code and recompile it each time you changed the list. That is not practical for a production-quality system.


A simple txt file sounds good ....

It is, if you don't need the power of a database; see below.

... but in order to check the word names, I still have to load all list items (say 10,000 names) into my program ...

That is correct. But it is hardly a problem! You should be able to implement the file loading in 10 or so lines of code. And it shouldn't be appreciably slower than embedding the data in your class file. (After all, the class loader has to read the entire class file and turn all of the embedded data into interned String objects.)

... and search for names

But that's no different from having the list of names hard-wired. You have to search the list whether it is loaded from a file into an array or list, or whether the array or list is created and initialized from a bunch of string literals in your code.

which makes me a little confused: if it's better than the aforementioned way of pre-defined an array of constants in a final class

It is. For the reasons that were listed above, and in other answers.


For the record:

  • if you need to repeatedly search your long list of names, then it is better (more efficient) to load them into a HashSet. That gives faster search times.

  • if you only need to do the search a small number of times and/or if the list is really large, then a database is more efficient. It will have fast search times, AND it will avoid the need to read the entire dataset into memory.
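To illustrate the HashSet point, here is a small sketch. NameChecker is a hypothetical class name, and the case-insensitive matching is an assumption that the question does not actually specify.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical wrapper: load the names once, then answer lookups in
// (expected) constant time instead of scanning a list on every check.
final class NameChecker {
    private final Set<String> names = new HashSet<>();

    NameChecker(Collection<String> loadedNames) {
        for (String name : loadedNames) {
            names.add(normalize(name));
        }
    }

    boolean isName(String word) {
        return names.contains(normalize(word));
    }

    // Assumption: "alice" and "Alice" should both match.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

Usage would be along the lines of new NameChecker(names).isName("Alice"), where names is whatever list your loader returned.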

