HDFS API - count the number of directories, files and bytes
How do you get the DIR_COUNT, FILE_COUNT, CONTENT_SIZE and FILE_NAME values in HDFS programmatically in Scala/Java (not through the shell)?
val fileStatus = fileSystem.getFileStatus(new Path(path))
val fileByteSize = fileStatus.getLen
The FileSystem API doesn't seem to expose that information. I can only get the size of a single file (code above), not the file count and byte size per directory.
I'm looking for a similar behavior to:
hdfs dfs -count [-q] <paths>
which counts the number of directories, files and bytes under the path provided.
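You can use the FileSystem.listStatus method to get information about the files and directories in a given HDFS directory. It returns an array of FileStatus objects for the immediate children only, so to cover subdirectories you need to recurse into every entry that is itself a directory.
From those FileStatus objects you can calculate the total size, the number of files, and so on.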
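A minimal sketch of that approach in Scala, assuming the path you pass in is a directory and that the default Configuration on your classpath points at your cluster. The Counts case class and the HdfsCount object are hypothetical names used only for illustration; they are not part of the Hadoop API. The last lines also show FileSystem.getContentSummary, which, as far as I can tell, returns the same three numbers in a single call and may be simpler if you don't need to walk the tree yourself.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical holder for the three totals; not a Hadoop class.
final case class Counts(dirs: Long, files: Long, bytes: Long) {
  def +(o: Counts): Counts = Counts(dirs + o.dirs, files + o.files, bytes + o.bytes)
}

object HdfsCount {
  // Walks the tree below `dir` with listStatus.
  // Assumption: `dir` is a directory. It is counted as one directory itself,
  // and anything that is not a directory is counted as a file.
  def count(fs: FileSystem, dir: Path): Counts =
    fs.listStatus(dir).foldLeft(Counts(1L, 0L, 0L)) { (acc, status) =>
      if (status.isDirectory) acc + count(fs, status.getPath)
      else acc + Counts(0L, 1L, status.getLen)
    }

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val path = new Path(args(0)) // path to inspect, taken from the command line

    // Manual walk based on listStatus
    val c = count(fs, path)
    println(s"${c.dirs} ${c.files} ${c.bytes} $path")

    // Alternative: let the file system compute the summary
    val summary = fs.getContentSummary(path)
    println(s"${summary.getDirectoryCount} ${summary.getFileCount} ${summary.getLength} $path")
  }
}
```

The manual walk issues one listStatus call per directory, so on large trees getContentSummary is probably the cheaper option; the recursive version is mainly useful when you want to filter entries or collect per-directory numbers along the way.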