Monday, November 5, 2012

A Few File Operations || Building a Synchronizer

Recently, in the project I’ve been working on for the last couple of months, we had to find a way to keep local copies of some repositories up to date when the originals live on remote servers. Because of the kind of operations we need to run, we couldn’t afford to run them directly on the remote servers. If you are thinking right now, “well duh! Use SVN, you dummy”, let’s just say that our client does not have it and there is no real chance he will install it for us. We only had access to the shared drives, where we could read the file systems. That’s all we had. The kind of operation we required is known (at least that’s what we call it) as directory synchronization. In our case we needed to keep our local version of the remote files as up to date as possible. Synchronization is very useful when the cost of copying everything is too high.

For example, if you have a remote directory with 200GB, you don’t want to copy everything every time you want to update your local copy. It just takes too much time. I did some research to find a tool that could do what I wanted, and I did find some good ones, but the catch was that I needed something I could customize to our processes. So I started playing a bit with the java.io.File class and realized that I could program a Synchronizer.

In this post I want to share some useful operations of the File class in light of the problem that my team needed to solve. Let’s look at an example of what the Synchronizer needs to do.

Let’s suppose we have this remote directory:
gsolano_remote
 + 20100514
++ calculations.xls
+ 20100514
++ HelloWorld.java
+ readme.txt

And we have the local copy that needs to be updated:
 gsolano_local
+ 20100514
++ calculations.xls
++ deletelater
+ bck-ups
+ readme.txt
+ dir.txt

If we compare the two directories, the local copy needs the following actions (shown in parentheses).
 gsolano_local
+ 20100514
++ calculations.xls
++ deletelater (remove)
+ bck-ups (remove)
+ readme.txt (update)
+ dir.txt (remove)
+ 20100514 (add)
++ HelloWorld.java (add)
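
To make the comparison above concrete, here is a minimal sketch of how the three kinds of actions could be derived from two maps of relative path to last-modified time. The class name SyncPlanSketch, the sample paths, and the timestamps are all made up for illustration; the real Synchronizer class appears later in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original classes.
class SyncPlanSketch {

    /** Paths present in source but missing from target. */
    static List<String> toAdd(Map<String, Long> source, Map<String, Long> target) {
        List<String> result = new ArrayList<>();
        for (String path : source.keySet()) {
            if (!target.containsKey(path)) {
                result.add(path);
            }
        }
        return result;
    }

    /** Paths present on both sides whose timestamps differ. */
    static List<String> toUpdate(Map<String, Long> source, Map<String, Long> target) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : source.entrySet()) {
            Long targetTime = target.get(entry.getKey());
            if (targetTime != null && !targetTime.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Paths present in target but no longer in source. */
    static List<String> toRemove(Map<String, Long> source, Map<String, Long> target) {
        return toAdd(target, source); // same difference, with the roles swapped
    }

    public static void main(String[] args) {
        // Sample data: relative path -> last-modified time (arbitrary numbers).
        Map<String, Long> source = new LinkedHashMap<>();
        source.put("readme.txt", 200L);
        source.put("calc/calculations.xls", 100L);
        source.put("src/HelloWorld.java", 100L);

        Map<String, Long> target = new LinkedHashMap<>();
        target.put("readme.txt", 100L);
        target.put("calc/calculations.xls", 100L);
        target.put("dir.txt", 100L);

        System.out.println("add:    " + toAdd(source, target));    // [src/HelloWorld.java]
        System.out.println("update: " + toUpdate(source, target)); // [readme.txt]
        System.out.println("remove: " + toRemove(source, target)); // [dir.txt]
    }
}
```

Note that removal is just the "add" difference computed in the opposite direction, which is why one helper can serve both.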

The logic needed to run those actions is very simple. First we list the files from source (gsolano_remote) and target (gsolano_local), then we compare them to extract:
+ The list of new files to copy from source to target.
+ The list of files that need to be updated because they were modified in the source.
+ The files and directories that are no longer present in the source and therefore need to be removed from the target.
Once we have these lists, we just have to execute the respective copies and deletions. Let’s first examine how to scan files from a directory.
package gsolano;
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class Dir { 
 
 /**
  * Returns a map from each file path (relative to the provided path,
  * lower-cased) to its last modified date.
  * @param path root directory to scan
  * @return map of relative paths to last modified dates.
  */
 public static Map<String, Long> scan(String path) {
  Map<String, Long> fileList = new LinkedHashMap<String, Long>();
  scanFiles(path.toLowerCase(), path, fileList);
  return fileList;
 }
 
 /**
  * Recursively scans a directory.
  * @param rootSource
  * @param path
  * @param fileList
  */
 private static void scanFiles(String rootSource, String path, Map<String, Long> fileList) {
   File folder = new File(path); // This is the root directory.
  // List files from first level of root directory.
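   File[] listOfFiles = folder.listFiles(); 
   
   if (listOfFiles == null) {
    // listFiles() returns null when the path is not a directory
    // or cannot be read, so there is nothing to scan.
    return;
   }
   if (listOfFiles.length == 0) {
    // Used to keep record of empty folders.
    fileList.put(path.toLowerCase().replace(rootSource, "") 
        + File.separator + ".", Long.valueOf(0L));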
   }
   else {
    for (int i = 0; i < listOfFiles.length; i++) {
     if (listOfFiles[i].isFile()) { // Is it a file?     
      try {
       // Add it to the file list with the last modified date.
      fileList.put(listOfFiles[i].getAbsolutePath().toLowerCase()
        .replace(rootSource, ""), listOfFiles[i].lastModified());
      
     } catch (Exception e) {  
      e.printStackTrace();
     }
     } else if (listOfFiles[i].isDirectory()) { // Is it a directory?
      try {
       // Recursively call for new found directory.
       scanFiles(rootSource, listOfFiles[i].getCanonicalPath(), fileList);
     } catch (IOException e) {     
      e.printStackTrace();
     }
     }   
    }
   }
  
 }
}

In this code we start exploring some capabilities of the File class. The first one is the ability to list the files in a directory. We simply create an instance of the File class with the path of a directory and then call the function “listFiles()”.

File folder = new File(path); 
File[] listOfFiles = folder.listFiles();
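
One detail worth knowing about “listFiles()”: it returns null (not an empty array) when the path is not a directory or cannot be read, which can easily happen with a share drive that drops out. The sketch below shows one way to guard against that; the class name ListFilesSketch and the made-up directory name are assumptions for illustration only.

```java
import java.io.File;

// Hypothetical helper, not part of the original classes.
class ListFilesSketch {

    /**
     * Counts the direct children of a directory.
     * Returns -1 when the path cannot be listed (not a directory, or an I/O error).
     */
    static int countChildren(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return -1; // calling children.length here would throw a NullPointerException
        }
        return children.length;
    }

    public static void main(String[] args) {
        File tempDir = new File(System.getProperty("java.io.tmpdir"));
        File missing = new File("no-such-directory-for-this-sketch");

        System.out.println(tempDir + ": " + countChildren(tempDir));
        System.out.println(missing + ": " + countChildren(missing)); // -1 if the path does not exist
    }
}
```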
Now, this function only gets the files and directories at the first level; it does not retrieve the files in the deeper levels. That’s why the file scanning in the Dir class works with a recursive function. To determine whether we need to make a recursive call, we use the functions “isFile()” and “isDirectory()”. If the file that is read is a directory (sounds weird, I agree), then a recursive call is made. If it is a file, then it is added to the list. In this class we are also using the function “lastModified()” to store the last modified date of each scanned file. This is used later to determine whether a file changed in the source and therefore has to be updated in the target. Before jumping to the main class, let’s take a look at the class used to copy files. I slightly modified a class that I found on the Internet:
package gsolano;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class FileCopy {

 /**
  * Copies one file from the source to specified target.
  * @param fromFileName
  * @param toFileName
  * @param overrideFiles
  * @throws IOException
  */
 public static void copy(String fromFileName, String toFileName,
   boolean overrideFiles) throws IOException {
  
  File toFile = new File(toFileName);
  if (toFile.exists() && !overrideFiles) {
   return;
  }
  File fromFile = new File(fromFileName);

  if (!fromFile.exists())
   throw new IOException("FileCopy: " + "no such source file: "
     + fromFileName);
  if (!fromFile.isFile())
   throw new IOException("FileCopy: " + "can't copy directory: "
     + fromFileName);
  if (!fromFile.canRead())
   throw new IOException("FileCopy: " + "source file is unreadable: "
     + fromFileName);

  if (toFile.isDirectory()) {
   toFile = new File(toFile, fromFile.getName());
  }
  if (toFile.exists()) {
   if (!toFile.canWrite()) {
    throw new IOException("FileCopy: "
      + "destination file is unwriteable: " + toFileName);
   }
   String parent = toFile.getParent();
   if (parent == null)
    parent = System.getProperty("user.dir");
   File dir = new File(parent);
   if (!dir.exists())
    throw new IOException("FileCopy: "
      + "destination directory doesn't exist: " + parent);
   if (dir.isFile())
    throw new IOException("FileCopy: "
      + "destination is not a directory: " + parent);
   if (!dir.canWrite())
    throw new IOException("FileCopy: "
      + "destination directory is unwriteable: " + parent);
  } else {
   // Create directory structure (getParentFile() is null for a bare file name).
   File parentDir = toFile.getParentFile();
   if (parentDir != null) {
    parentDir.mkdirs();
   }
  }
  createCopy(toFile, fromFile);
 }

 /**
  * Writes the copy from source to target.
  * @param toFile
  * @param fromFile
  * @throws FileNotFoundException
  * @throws IOException
  */
 private static void createCopy(File toFile, File fromFile)
   throws FileNotFoundException, IOException {
  // try-with-resources (Java 7+) closes both streams, even if the copy fails.
  try (FileInputStream from = new FileInputStream(fromFile);
    FileOutputStream to = new FileOutputStream(toFile)) {
   byte[] buffer = new byte[4096];
   int bytesRead;

   while ((bytesRead = from.read(buffer)) != -1) {
    to.write(buffer, 0, bytesRead); // write
   }
  }
  // Reached only after a complete copy, so a partial file never
  // gets the source's timestamp and is picked up again on the next run.
  toFile.setLastModified(fromFile.lastModified());
 }
}
The FileCopy class uses six more functions of the Java File class:
1. “exists()”: used to double-check that the source file really exists.
2. “canRead()”: used to determine whether the source file can be read.
3. “canWrite()”: used to determine whether the target file can be overwritten. This is used in the cases where we need to update the file.
4. “getParent()”: used to get the parent path of the file.
5. “mkdirs()”: I have to say this is my favorite one; it creates the whole directory hierarchy of the file’s path.
6. “setLastModified()”: when we finish copying the file to the target directory, we want it to keep the same modified date as the source.
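
For comparison, on Java 7 or later much of this can also be expressed with the java.nio.file API. The following is only a rough sketch of that alternative, not the FileCopy class from this post: the class name NioCopySketch is made up, and unlike FileCopy it does not handle the case where the target path is an existing directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical alternative to FileCopy, assuming Java 7+.
class NioCopySketch {

    /** Copies one file, creating parent directories and keeping the source timestamp. */
    static void copy(Path from, Path to, boolean overwrite) throws IOException {
        if (Files.exists(to) && !overwrite) {
            return; // same early exit as FileCopy.copy
        }
        Path parent = to.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // counterpart of mkdirs()
        }
        // COPY_ATTRIBUTES carries over the last-modified time (where the file system supports it).
        Files.copy(from, to,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio-copy-sketch");
        Path source = dir.resolve("readme.txt");
        Files.write(source, "hello".getBytes(StandardCharsets.UTF_8));

        Path target = dir.resolve("local").resolve("docs").resolve("readme.txt");
        copy(source, target, false);

        System.out.println("copied: " + Files.exists(target));
        System.out.println("same timestamp: "
                + Files.getLastModifiedTime(source).equals(Files.getLastModifiedTime(target)));
    }
}
```

The trade-off is less control over the individual checks (readable, writable, and so on), since Files.copy reports those problems through the exceptions it throws.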

To wrap up, the Synchronizer class needs just one more function and a constant:
+ “delete()”: deletes the file or directory.
+ “File.separator”: system-dependent character used to separate directories in a path. In this example the separator is the backslash (“\”).
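
A caveat about “delete()”: it only removes a directory that is already empty, and it reports failure by returning false rather than by throwing. If a whole subtree has to go, the children must be deleted first. Below is a minimal sketch of that idea; the class name DeleteTreeSketch is made up, and the code does not treat symbolic links specially, so take it as an illustration rather than something production-ready.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the original classes.
class DeleteTreeSketch {

    /** Deletes a file, or a directory together with everything below it. */
    static boolean deleteTree(File file) {
        File[] children = file.listFiles(); // null for regular files
        if (children != null) {
            for (File child : children) {
                if (!deleteTree(child)) {
                    return false; // stop at the first entry that cannot be removed
                }
            }
        }
        return file.delete(); // the directory is empty by now
    }

    public static void main(String[] args) throws IOException {
        File root = new File(System.getProperty("java.io.tmpdir"),
                "delete-sketch-" + System.nanoTime());
        File nested = new File(root, "a" + File.separator + "b");
        nested.mkdirs();
        new File(nested, "file.txt").createNewFile();

        System.out.println("plain delete(): " + root.delete());    // false: directory is not empty
        System.out.println("deleteTree():   " + deleteTree(root)); // true
        System.out.println("still exists:   " + root.exists());    // false
    }
}
```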

The Synchronizer class does all the basic steps required to synchronize the two directories. The lists of copies and deletions are calculated with simple comparisons of collections (sets and maps). I would like to test whether this works on other operating systems, but I'm kind of lazy for that. Theoretically it does work ;).

package gsolano;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * 
 * Class used to synchronize two directories. One directory (source)
 * is used as base of another directory (target).
 * The class determines the operations required to leave the target
 * with the same structure as the source.
 * 
 * @author gsolano
 *
 */
public class Synchronizer {
 
 public static void main(String[] args) {
  Synchronizer.run("c:\\gsolano_remote\\", "c:\\gsolano_local\\");
 } 
 
 public static void run(String source, String target) {
  System.out.println("Scanning source directory...");
  Map<String, Long> sourceFiles = Dir.scan(source);
  System.out.println("[DONE]");
  
  System.out.println("Scanning target directory...");
  Map<String, Long> targetFiles = Dir.scan(target);
  System.out.println("[DONE]");
  
  List<String> newFilesToCopy = getNewFilesToCopy(sourceFiles.keySet(), targetFiles.keySet());
  System.out.println("Total new files to copy: " + newFilesToCopy.size());
  
  List<String> filesToUpdate = getFilesToUpdate(sourceFiles, targetFiles);
  System.out.println("Total files to update: " + filesToUpdate.size());
  
  List<String> filesToRemove = getFilesToRemove(sourceFiles.keySet(), targetFiles.keySet());
  System.out.println("Total files to remove: " + filesToRemove.size());
  
  List<String> dirsToRemove = getDirectoriesToRemove(sourceFiles.keySet(), targetFiles.keySet());
  System.out.println("Total dirs to remove: " + dirsToRemove.size());
  
  System.out.println("Copying new files...");
  for(String fileToCopy : newFilesToCopy) {
   try {
    FileCopy.copy(source + File.separator + fileToCopy, 
      target + File.separator + fileToCopy, false);
   } catch (IOException e) {
    System.out.println("Couldn't copy file: " + fileToCopy + "(" + e.getMessage() + ")");
   }
  }
  
  System.out.println("Updating files...");
  for(String fileToUpdate : filesToUpdate) {
   try {
    FileCopy.copy(source + File.separator + fileToUpdate, 
      target + File.separator +fileToUpdate, true);
   } catch (IOException e) {
    System.out.println("Couldn't copy file: " + fileToUpdate + "(" + e.getMessage() + ")");
   }
  }
  
  System.out.println("Removing files from target...");
  for(String fileToRemove : filesToRemove) {   
   new File(target + fileToRemove).delete();   
  }
  
  System.out.println("Removing directories from target...");
  for(String dirToRemove : dirsToRemove) {
   new File(target  + dirToRemove).delete();   
  }  
 }
 
 /**
  * Return the list of directories to be removed. A directory is removed
  * if it is present in the target but not in the source.
  * @param sourceFiles
  * @param targetFiles
  * @return
  */
 private static List<String> getDirectoriesToRemove(Set<String> sourceFiles, 
    Set<String> targetFiles) {
  List<String> directoriesToRemove = new ArrayList<String>();
  
  Set<String> sourceDirs = buildDirectorySet(sourceFiles);
  Set<String> targetDirs = buildDirectorySet(targetFiles);
  
  for(String dir : targetDirs) {
   if (!sourceDirs.contains(dir)) {
    directoriesToRemove.add(dir);
   }
  }  
  return directoriesToRemove;  
 }
 
 /**
  * Return the list of files to be removed.
  * A file is removed if it is present in the target
  * but not in the source.
  * @param sourceFiles
  * @param targetFiles
  * @return
  */
 private static List<String> getFilesToRemove(Set<String> sourceFiles, 
   Set<String> targetFiles) {
   List<String> filesToRemove = new ArrayList<String>();   
      
   for (String filePath : targetFiles) {       
         if (!sourceFiles.contains(filePath) && 
           !filePath.endsWith(File.separator + ".")) {
          filesToRemove.add(filePath);          
         }
      }
  return filesToRemove;
 }
 
 /**
  * Gets the list of files missing in the target directory.
  * @param sourceFiles
  * @param targetFiles
  * @return
  */
 private static List<String> getNewFilesToCopy(Set<String> sourceFiles, 
   Set<String> targetFiles) {
   List<String> filesToCopy = new ArrayList<String>();  
        
   for (String filePath : sourceFiles) {
          if (!targetFiles.contains(filePath)) {
           if(!filePath.endsWith(File.separator + ".")) {
           filesToCopy.add(filePath);
           }
         }
      }  
  return filesToCopy;
 }
 
 /**
  * Gets the list of files to be updated according to the last
  * modified date.
  * @param sourceFiles
  * @param targetFiles
  * @return
  */
 private static List<String> getFilesToUpdate(Map<String, Long> sourceFiles, 
   Map<String, Long> targetFiles) {
   List<String> filesToUpdate = new ArrayList<String>();  
   Iterator<Map.Entry<String, Long>> it = sourceFiles.entrySet().iterator();
      
   while (it.hasNext()) {
         Map.Entry<String, Long> pairs = it.next();
         String filePath = pairs.getKey();
         if (targetFiles.containsKey(filePath) &&
           !filePath.endsWith(File.separator + ".")) {
          long sourceModifiedDate = sourceFiles.get(filePath);
          long targetModifiedDate = targetFiles.get(filePath);
          
          if(sourceModifiedDate != targetModifiedDate) {
           filesToUpdate.add(filePath);
          }                    
         }
      } 
  return filesToUpdate;
 }

 /**
  * Returns the set of directories contained in the set of file paths.
  * @param files
  * @return Set of directories representing the directory structure.
  */
 private static Set<String> buildDirectorySet(Set<String> files) {
  Set<String> directories = new HashSet<String>();  
  for(String filePath : files) { 
   if (filePath.contains(File.separator)) {
    directories.add(filePath.substring(0, 
      filePath.lastIndexOf(File.separator)));
   } 
  }  
  return directories;
 }
}
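
On the question of other operating systems, one thing in the code above looks Windows-specific: Dir lower-cases every path and later rebuilds file names from those lower-cased keys. That is harmless on a case-insensitive file system, but on a case-sensitive one a file such as Readme.txt would be looked up as readme.txt and not found. A possible way around it is to compute relative paths with Path.relativize() instead of string replacement. This is only a sketch under that assumption: the class name RelativeScanSketch is made up, and, unlike Dir, it does not record empty folders.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical variant of Dir.scan that keeps the original case of file names.
class RelativeScanSketch {

    /** Maps each file path, relative to root, to its last modified date. */
    static Map<String, Long> scan(File root) {
        Path rootPath = root.toPath().toAbsolutePath().normalize();
        Map<String, Long> files = new TreeMap<>();
        collect(rootPath, rootPath.toFile(), files);
        return files;
    }

    private static void collect(Path root, File dir, Map<String, Long> files) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(root, child, files);
            } else if (child.isFile()) {
                // relativize() strips the root prefix without touching the case.
                files.put(root.relativize(child.toPath()).toString(), child.lastModified());
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, Long> entry : scan(root).entrySet()) {
            System.out.println(entry.getValue() + "  " + entry.getKey());
        }
    }
}
```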

