I have a folder that contains 400 images. What I want is to split it into two folders: train_images and test_images.

train_images should contain 150 randomly selected images, all different from each other. test_images should also contain 150 randomly selected images, all different from each other and from the images selected for train_images.

I began by writing code that selects a random set of images from the Faces folder and writes them to the train_images folder. I need your help to achieve the behavior described above.

clear all;
close all;

% List the folder and drop the '.' and '..' directory entries
ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]);
totalNumberOfImages = length(ImageFiles);
scrambledList = randperm(totalNumberOfImages); % random order, no repeated index
numberIWantToUse = 150;
for index = scrambledList(1:numberIWantToUse)
    baseFileName = ImageFiles(index).name;
    str = fullfile('Faces', baseFileName); % Better than STRCAT

    face = imread(str);

    % The output folder 'train_images' must already exist
    imwrite(face, fullfile('train_images', ['hello' num2str(index) '.jpg']));
end

Any help would be much appreciated.


Your approach looks sound to me. For the test set, you could re-run scrambledList = randperm(totalNumberOfImages); and again take the first 150 elements of scrambledList, as you did for training. Be aware, though, that a fresh permutation can pick images that are already in the training set.

Alternatively, keep the same scrambledList and loop over its next 150 entries:

for index = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse)
   ... % same thing you wrote in your training loop, writing to the test folder instead
end


With this approach, your test sample is guaranteed to be completely different from the training sample, because randperm never repeats an index within one permutation.
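As a minimal sketch of the idea above, the split can be done on indices alone and checked before touching any files. The count of 400 is taken from the question; trainIdx, testIdx and overlap are names introduced here for illustration only.

```matlab
% Assumed sizes, taken from the question: 400 images, 150 per set
totalNumberOfImages = 400;
numberIWantToUse = 150;

% One permutation of 1..400; every index appears exactly once
scrambledList = randperm(totalNumberOfImages);

% Two non-overlapping slices of the same permutation
trainIdx = scrambledList(1:numberIWantToUse);
testIdx  = scrambledList(numberIWantToUse+1 : 2*numberIWantToUse);

% Sanity check: the two index sets share no element
overlap = intersect(trainIdx, testIdx);
assert(isempty(overlap), 'Train and test sets overlap');
```

The remaining 100 indices, scrambledList(301:end), are simply left unused.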

If you have the Bioinformatics Toolbox, you can use crossvalind with the HoldOut method.

In the example below, train and test are logical arrays, so you can use find to get the actual indices:

ImageFiles = dir('Faces');
ImageFiles = ImageFiles(~[ImageFiles.isdir]); % drop the '.' and '..' entries
ImageFilesIndexes = ones(1, length(ImageFiles)); % use a numeric array instead of the char array of names
proportion = 150/400; % fraction held out as the testing set
[train, test] = crossvalind('HoldOut', ImageFilesIndexes, proportion);
training_files = ImageFiles(train); % about 250 files: it is better to use more data to train
testing_files = ImageFiles(test); % about 150 files

%Then do whatever you like with the files

Other possibilities are dividerand (Neural Network Toolbox) and cvpartition (Statistics Toolbox).
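For instance, a rough sketch with cvpartition might look like the following. It assumes the Statistics Toolbox is installed and that there are 400 image files; numImages, trainMask, testMask, trainIdx and testIdx are illustrative names, not part of the original code.

```matlab
% Assumed number of image files (from the question)
numImages = 400;

% Hold out 150 observations for testing; the rest go to training
c = cvpartition(numImages, 'HoldOut', 150);

% Logical masks over 1..numImages, one per set
trainMask = training(c);
testMask  = test(c);

% Convert the masks to plain indices into the file list
trainIdx = find(trainMask);
testIdx  = find(testMask);

% Optional: keep only 150 of the training indices, chosen at random
trainIdx = trainIdx(randperm(numel(trainIdx), 150));
```

The indices can then be used on the struct array returned by dir, for example ImageFiles(testIdx), once the directory entries have been filtered out.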

