Read files searching for a string and print its path

I am trying to write a Perl script that searches a particular directory and all of its subdirectories. The script has to read every file in the tree looking for a particular text string (any string I define). If the string is found in a file, the script writes the path and name of that file to a new text file, then continues with the remaining files in the directory tree.

I have something like this, but I am not sure how to continue. I am a beginner with Perl and do not know all of the options available.

#!/usr/bin/perl
use strict;
use File::Find;

my $dir = 'C:\PATH\TO\DIR';
my $string = "defined";

find(\&printFile, $dir);
sub printFile {
   my $element = $_;
   open FILE, "+>>Results.txt";
   if(-f $elemento && $elemento =~ /\.txt$/) {
       my $boolean = 0;
       open CFILE, $elemento;
       while(<CFILE>) {  
           if ($string) {
               print FILE "$File::Find::name\n"; 
           }
           close CFILE;
      }
   }
   close FILE;
}

sleep(5);

Answers


You are not too far off, but there are some things you need to change.

#!/usr/bin/perl
use strict;
use warnings;  # never go without warnings
use File::Find;

my $dir = 'C:\PATH\TO\DIR';
my $string = "defined";
open my $out, ">>", "Results.txt" or die $!;  # move outside, change mode, 
                                              # 3-arg open, check return value
find(\&printFile, $dir);

sub printFile {
   my $element = $_;
   if(-f $element && $element =~ /\.txt$/) { # $elemento doesn't exist
       open my $in, "<", $element or die $!;
       while(<$in>) {
           if (/\Q$string\E/) {  # make a regex and quote metachars 
               print $out "$File::Find::name\n"; 
               last;             # stop searching once found
           }
      }
   }  # lexical file handles auto close when they go out of scope
}

Even better would be to forgo the hard coded values and skip the specific output file:

my $dir = shift;
my $string = shift;

And then just print output to STDOUT.

print "$File::Find::name\n"; 

Usage:

perl script.pl c:/path/to/dir defined > output.txt
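
Putting the pieces of this answer together, the command-line version might look roughly like the sketch below. The sub name find_matching_files and the two-argument check are assumptions made for illustration; the .txt filter and the \Q...\E match are carried over from the code above. The line is read into a lexical variable rather than $_ so that the file name File::Find placed in $_ is left alone.

```perl
use strict;
use warnings;
use File::Find;

# Collect the paths of all .txt files under $dir that contain $string.
# (find_matching_files is a hypothetical helper name, not part of File::Find.)
sub find_matching_files {
    my ($dir, $string) = @_;
    my @matches;

    find(sub {
        return unless -f $_ && /\.txt$/;

        open my $in, '<', $_ or do {
            warn "Cannot open $File::Find::name: $!\n";
            return;
        };

        while (my $line = <$in>) {
            if ($line =~ /\Q$string\E/) {
                push @matches, $File::Find::name;
                last;    # one hit is enough for this file
            }
        }
    }, $dir);

    return @matches;
}

# Assumed invocation: perl script.pl DIR STRING
if (@ARGV == 2) {
    print "$_\n" for find_matching_files(@ARGV);
}
```

Returning a list instead of printing inside the callback keeps the search reusable from a larger program; redirecting STDOUT still produces the results file.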

As others have noted in the comments, this would easily be solved with a recursive grep. But unfortunately you seem to be using Windows, in which case it is not an option (as far as I know).


If this is truly all you need to do, you might look at ack. It searches subdirectories by default and offers other enhancements over grep. Of course, if this is part of a larger Perl script, you can shell out to it or use one of the other posted answers.

$ ack include

will return something like

src/draw.c
27:#include <stdio.h>
28:#include <stdlib.h>
29:#include "parsedef.h"
31:#include "utils.h"
32:#include "frac.h"
33:#include "sscript.h"

src/utils.c
27:#include <stdio.h>
28:#include <stdlib.h>
29:#include <string.h>

... and so on

If instead you only want the names of the files with matches, use the -l flag:

$ ack -l include

lib/Text/AsciiTeX.xs
src/limit.c
src/sscript.c
src/dim.c
src/frac.c
src/brace.c
src/symbols.c
src/sqrt.c
src/array.c
src/ouline.c
src/draw.c
src/utils.c
src/asciiTeX.c
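
If you do shell out to ack from a larger Perl script, a minimal sketch could look like the following. It assumes an ack executable is on the PATH; ack_matching_files is a hypothetical helper name. The list form of a piped open avoids shell quoting problems, though on Windows it is only available in reasonably recent versions of perl. The -Q flag asks ack to treat the pattern as a literal string.

```perl
use strict;
use warnings;

# Return the files under $dir that ack reports as containing $string.
# Assumption: `ack` is installed and reachable through PATH.
sub ack_matching_files {
    my ($string, $dir) = @_;

    # List-form pipe open: the arguments go straight to ack, no shell involved.
    # -l prints only the names of matching files, -Q disables regex metacharacters.
    open my $ack, '-|', 'ack', '-l', '-Q', $string, $dir
        or die "Unable to run ack: $!";

    chomp(my @files = <$ack>);

    # ack exits non-zero when nothing matched, so the close status is not treated as fatal.
    close $ack;

    return @files;
}

# Assumed invocation: perl script.pl STRING DIR
if (@ARGV == 2) {
    print "$_\n" for ack_matching_files(@ARGV);
}
```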

The #! line is mostly irrelevant on Windows platforms (perl only reads option switches from it), and only a convenience on Unix. It is best if you omit it here.

Your program is mostly correct, but avoids a lot of conveniences that Perl provides to make the code more concise and comprehensible.

You should always add use warnings to your use strict as it will pick up simple errors that you may otherwise overlook.

Your file opens should use lexical file handles and the three-parameter form of open, and you should check their success as a failure to open a file invalidates most subsequent code. An idiomatic open looks like this

open my $fh, '<', 'myfile' or die $!;

It is also worth pointing out that an open mode of +>> opens the file for both read and append, which is difficult to handle. In this case you mean just >>, but it is best to open the file once and leave it open for the duration of the program run.

This is a reworking of your program, which I hope helps you. It uses a regular expression to check whether the string appears in the current line of the file. /\Q$string/ is identical to $_ =~ /\Q$string/, i.e. it tests the $_ variable by default. The \Q in the regex is a quotemeta, which escapes any characters in the string that might otherwise behave as special characters in a regex and change the meaning of the search.
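
To see why \Q matters, here is a small sketch using a made-up search string that contains a regex metacharacter. The string 'a.c' and the test inputs are purely illustrative.

```perl
use strict;
use warnings;

my $string = 'a.c';    # hypothetical search string; the dot is a regex metacharacter

# Without \Q the dot matches any character, so "abc" is a false positive.
my $unquoted = 'abc' =~ /$string/   ? 1 : 0;

# With \Q the dot is escaped, so only a literal "a.c" matches.
my $quoted   = 'abc' =~ /\Q$string/ ? 1 : 0;
my $literal  = 'a.c' =~ /\Q$string/ ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted literal=$literal\n";

# A bare /.../ tests $_ implicitly, just as it does inside the read loop.
my $implicit = 0;
for ('path with a.c in it') {
    $implicit = 1 if /\Q$string/;
}
print "implicit=$implicit\n";
```

This prints unquoted=1 quoted=0 literal=1, followed by implicit=1.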

Note that, within the File::Find wanted subroutine, the current working directory is set to the directory containing the file currently being reported. $_ is set to the file name (without a path) and $File::Find::name is set to the full path to the file (absolute here, because $dir is absolute). Because the current directory is the one containing the file, it is easy just to open the file $_ as the path isn't needed.
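
A quick way to check these values for yourself is a throwaway sketch like the one below. It builds a temporary tree with File::Temp (the names sub and example.txt are invented for the demonstration) and records what the wanted subroutine sees for the one file it finds.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Build a small temporary tree: <root>/sub/example.txt (hypothetical names).
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir failed: $!";
open my $fh, '>', "$root/sub/example.txt" or die "open failed: $!";
print $fh "hello\n";
close $fh;

my %seen;
find(sub {
    return unless -f;                    # -f tests $_ by default
    %seen = (
        basename => $_,                  # file name only, no path
        dir      => $File::Find::dir,    # directory containing the file
        name     => $File::Find::name,   # directory plus file name
        cwd      => getcwd(),            # find() has chdir'd into that directory
    );
}, $root);

print "$_: $seen{$_}\n" for sort keys %seen;
```

The cwd entry may be spelled differently from dir on systems where the temporary directory is reached through a symbolic link, but both refer to the same place.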

use strict;
use warnings;

use File::Find;

my $dir = 'C:\path\to\dir';
my $string = 'defined';

open my $results, '>', 'results.txt' or die "Unable to open results file: $!";

find (\&printFile, $dir);

sub printFile {

  return unless -f and /\.txt$/;

  open my $fh, '<', $_ or do {
    warn qq(Unable to open "$File::Find::name" for reading: $!);
    return;
  };

  while (<$fh>) {
    if (/\Q$string/) {
       print $results "$File::Find::name\n";
       return;
    }
  }
}
