Why {syncdr}?

{syncdr} is an R package for handling and synchronizing files and directories. Its primary objectives are:

  1. To provide a clear snapshot of the content and synchronization status of two directories under comparison, including their tree structure, their common files, and the files that are exclusive to either directory.
  2. To make file organization and management in R easier, by enabling content-based and modification-date-based file comparisons and by facilitating tasks such as duplicate identification, file copying, moving, and deletion.

💡
This article does not offer a comprehensive overview of {syncdr} functionality. Rather, it provides a sample workflow for the package's main functions. After familiarizing yourself with this general workflow, read the other articles on this website; they explore all features of {syncdr} in a structured way.


Synchronizing with {syncdr}

Learn how to work with {syncdr} and compare and synchronize directories in R

Suppose you are working with two directories, let's call them left and right, each containing certain files and folders/sub-folders.

Let's first call the syncdr function toy_dirs(). It generates two toy directories in the .syncdrenv environment, left and right, that we can use to showcase syncdr functionality.


# Create syncdr env with left and right directories
.syncdrenv <- toy_dirs()
#> ■■■■■■■■■                         27% | ETA:  8s
#> ■■■■■■■■■■■■■■■■■■■               60% | ETA:  5s

# Get left and right directories' paths 
left  <- .syncdrenv$left
right <- .syncdrenv$right

You can start by quickly comparing the two directories' tree structures by calling display_dir_tree(). By default, it recurses fully, i.e., it shows the directory tree of all sub-directories. However, you can also specify the number of levels to recurse with the recurse argument.


# Visualize left and right directories' tree structure 
display_dir_tree(path_left  = left,
                 path_right = right)
#> (←)Left directory structure:
#> /tmp/RtmphkzTu8/left
#> ├── A
#> │   ├── A1.Rds
#> │   ├── A2.Rds
#> │   └── A3.Rds
#> ├── B
#> │   ├── B1.Rds
#> │   ├── B2.Rds
#> │   └── B3.Rds
#> ├── C
#> │   ├── C1.Rds
#> │   ├── C2.Rds
#> │   └── C3.Rds
#> ├── D
#> │   ├── D1.Rds
#> │   └── D2.Rds
#> └── E
#> (→)Right directory structure:
#> /tmp/RtmphkzTu8/right
#> ├── A
#> ├── B
#> │   ├── B1.Rds
#> │   └── B2.Rds
#> ├── C
#> │   ├── C1.Rds
#> │   ├── C1_duplicate.Rds
#> │   ├── C2.Rds
#> │   └── C3.Rds
#> ├── D
#> │   ├── D1.Rds
#> │   ├── D2.Rds
#> │   └── D3.Rds
#> └── E
#>     ├── E1.Rds
#>     ├── E2.Rds
#>     └── E3.Rds

Step 1: Compare Directories

The most important function in syncdr is compare_directories(). It takes the paths of left and right directories and compares them to determine their synchronization status (see below). This function represents the backbone of syncdr: you can utilize the syncdr_status object it generates both:

  • to inspect the synchronization status of files present in both directories as well as those exclusive to either directory

  • as the input for all other functions within syncdr that allow synchronization between the directories under comparison.

Before diving into the resulting syncdr_status object, note that compare_directories() lets you compare directories in three ways:

  1. By date only (the default). With by_date = TRUE, files present in both directories are compared by their date of last modification.
     sync_status (all common files):
       • older in left, newer in right dir
       • newer in left, older in right dir
       • same date
  2. By date and content. Set by_content = TRUE (by_date remains TRUE unless explicitly set to FALSE). Files are first compared by date, and only those that are newer in either directory are then compared by content.
     sync_status (common files that are newer in either left or right, i.e., not of the same date):
       • different content
       • same content
  3. By content only. Set by_date = FALSE and by_content = TRUE. This option is discouraged, however, because comparing the contents of all files can be slow and computationally expensive.
     sync_status (all common files):
       • different content
       • same content
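
The date-based classification can be sketched in base R. This is a minimal illustration and not syncdr's implementation: classify_by_date() and the demo directory and file names are hypothetical, introduced only to show the idea of comparing last-modification times of files found in both directories.

```r
# Hypothetical helper for illustration; NOT part of syncdr.
classify_by_date <- function(left, right) {
  # Relative paths of all files in each directory
  left_files  <- list.files(left,  recursive = TRUE)
  right_files <- list.files(right, recursive = TRUE)
  common      <- intersect(left_files, right_files)

  mtime_left  <- file.mtime(file.path(left,  common))
  mtime_right <- file.mtime(file.path(right, common))

  status <- ifelse(
    mtime_left > mtime_right, "newer in left, older in right dir",
    ifelse(mtime_left < mtime_right,
           "older in left, newer in right dir",
           "same date")
  )
  data.frame(path = common, sync_status = status, stringsAsFactors = FALSE)
}

# Demo on throwaway directories (all names are made up)
demo_root  <- tempfile("date_demo_")
demo_left  <- file.path(demo_root, "left")
demo_right <- file.path(demo_root, "right")
dir.create(demo_left,  recursive = TRUE)
dir.create(demo_right, recursive = TRUE)

t0 <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
# Offsets (in seconds) applied to the modification time of the right-hand copy
offsets <- c("a.txt" = -3600, "b.txt" = 3600, "c.txt" = 0)
for (f in names(offsets)) {
  writeLines("x", file.path(demo_left,  f))
  writeLines("x", file.path(demo_right, f))
  Sys.setFileTime(file.path(demo_left,  f), t0)
  Sys.setFileTime(file.path(demo_right, f), t0 + offsets[[f]])
}

classify_by_date(demo_left, demo_right)
```

In this demo, a.txt ends up newer in left, b.txt newer in right, and c.txt has the same date in both directories.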

Also, regardless of which options you choose, the sync_status of files that are exclusive to either directory is determined as:

sync_status (non-common files):
  • only in left
  • only in right
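
Conceptually, files exclusive to one directory are a set difference of relative paths. The following base-R sketch shows that idea under stated assumptions: find_non_common() and the demo paths are hypothetical, and syncdr may determine non-common files differently.

```r
# Hypothetical helper for illustration; NOT part of syncdr.
find_non_common <- function(left, right) {
  left_files  <- list.files(left,  recursive = TRUE)
  right_files <- list.files(right, recursive = TRUE)
  list(
    only_in_left  = setdiff(left_files, right_files),
    only_in_right = setdiff(right_files, left_files)
  )
}

# Demo on throwaway directories (all names are made up)
nc_root  <- tempfile("non_common_demo_")
nc_left  <- file.path(nc_root, "left")
nc_right <- file.path(nc_root, "right")
dir.create(file.path(nc_left,  "A"), recursive = TRUE)
dir.create(file.path(nc_right, "A"), recursive = TRUE)
file.create(file.path(nc_left,  "A", c("shared.txt", "left_only.txt")))
file.create(file.path(nc_right, "A", c("shared.txt", "right_only.txt")))

find_non_common(nc_left, nc_right)
```

Because the comparison uses paths relative to each root, a file counts as common only if it sits in the same sub-folder on both sides.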

Let's now take a closer look at the output of compare_directories(), which is intended to contain comprehensive information on the directories under comparison. It is a list of class syncdr_status with four elements: (1) common files, (2) non-common files, (3) the left path, and (4) the right path.

1. Comparing by date

# Compare by date only (the default)
sync_status_date <- compare_directories(left, 
                                        right)

sync_status_date
#> 
#> ── Synchronization Summary ─────────────────────────────────────────────────────
#> • Left Directory: /tmp/RtmphkzTu8/left
#> • Right Directory: /tmp/RtmphkzTu8/right
#> • Total Common Files: 7
#> • Total Non-common Files: 9
#> • Compare files by: date
#> 
#> ── Common files ────────────────────────────────────────────────────────────────
#>             path modification_time_left modification_time_right modified
#> 1 /left/B/B1.Rds    2024-11-04 20:18:10     2024-11-04 20:18:11    right
#> 2 /left/B/B2.Rds    2024-11-04 20:18:13     2024-11-04 20:18:14    right
#> 3 /left/C/C1.Rds    2024-11-04 20:18:11     2024-11-04 20:18:17    right
#> 4 /left/C/C2.Rds    2024-11-04 20:18:14     2024-11-04 20:18:15    right
#> 5 /left/C/C3.Rds    2024-11-04 20:18:16     2024-11-04 20:18:17    right
#> 6 /left/D/D1.Rds    2024-11-04 20:18:13     2024-11-04 20:18:12     left
#> 7 /left/D/D2.Rds    2024-11-04 20:18:16     2024-11-04 20:18:15     left
#> 
#> ── Non-common files ────────────────────────────────────────────────────────────
#> 
#> ── Only in left ──
#> 
#> # A tibble: 4 × 1
#>   path_left     
#>   <fs::path>    
#> 1 /left/A/A1.Rds
#> 2 /left/A/A2.Rds
#> 3 /left/A/A3.Rds
#> 4 /left/B/B3.Rds
#> ── Only in right ──
#> # A tibble: 5 × 1
#>   path_right               
#>   <fs::path>               
#> 1 /right/C/C1_duplicate.Rds
#> 2 /right/D/D3.Rds          
#> 3 /right/E/E1.Rds          
#> 4 /right/E/E2.Rds          
#> 5 /right/E/E3.Rds

2. Comparing by date and content

# Compare by date and content 
sync_status_date_content <- compare_directories(left, 
                                                right,
                                                by_content = TRUE)

sync_status_date_content
#> 
#> ── Synchronization Summary ─────────────────────────────────────────────────────
#> • Left Directory: /tmp/RtmphkzTu8/left
#> • Right Directory: /tmp/RtmphkzTu8/right
#> • Total Common Files: 7
#> • Total Non-common Files: 9
#> • Compare files by: date & content
#> 
#> ── Common files ────────────────────────────────────────────────────────────────
#>             path modification_time_left modification_time_right modified
#> 1 /left/B/B1.Rds    2024-11-04 20:18:10     2024-11-04 20:18:11    right
#> 2 /left/B/B2.Rds    2024-11-04 20:18:13     2024-11-04 20:18:14    right
#> 3 /left/C/C1.Rds    2024-11-04 20:18:11     2024-11-04 20:18:17    right
#> 4 /left/C/C2.Rds    2024-11-04 20:18:14     2024-11-04 20:18:15    right
#> 5 /left/C/C3.Rds    2024-11-04 20:18:16     2024-11-04 20:18:17    right
#> 6 /left/D/D1.Rds    2024-11-04 20:18:13     2024-11-04 20:18:12     left
#> 7 /left/D/D2.Rds    2024-11-04 20:18:16     2024-11-04 20:18:15     left
#>         sync_status
#> 1 different content
#> 2 different content
#> 3      same content
#> 4 different content
#> 5 different content
#> 6 different content
#> 7 different content
#> 
#> ── Non-common files ────────────────────────────────────────────────────────────
#> 
#> ── Only in left ──
#> 
#> # A tibble: 4 × 1
#>   path_left     
#>   <fs::path>    
#> 1 /left/A/A1.Rds
#> 2 /left/A/A2.Rds
#> 3 /left/A/A3.Rds
#> 4 /left/B/B3.Rds
#> ── Only in right ──
#> # A tibble: 5 × 1
#>   path_right               
#>   <fs::path>               
#> 1 /right/C/C1_duplicate.Rds
#> 2 /right/D/D3.Rds          
#> 3 /right/E/E1.Rds          
#> 4 /right/E/E2.Rds          
#> 5 /right/E/E3.Rds
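
To see how the date and content columns combine, here is a small sketch that filters the common-files table. So that it runs on its own, the data frame is hand-copied from the printed output above rather than taken from the real object; with the real object you would presumably start from sync_status_date_content$common_files, assuming it is a data frame with the columns shown in the printout.

```r
# Hand-copied from the printed output above so the snippet is self-contained.
common_files <- data.frame(
  path = c("/left/B/B1.Rds", "/left/B/B2.Rds", "/left/C/C1.Rds",
           "/left/C/C2.Rds", "/left/C/C3.Rds", "/left/D/D1.Rds",
           "/left/D/D2.Rds"),
  modified = c("right", "right", "right", "right", "right", "left", "left"),
  sync_status = c("different content", "different content", "same content",
                  "different content", "different content",
                  "different content", "different content"),
  stringsAsFactors = FALSE
)

# Files that are newer in left AND differ in content
to_update_right <- subset(
  common_files,
  modified == "left" & sync_status == "different content"
)
to_update_right$path
#> [1] "/left/D/D1.Rds" "/left/D/D2.Rds"
```

Note that C1.Rds is newer in right by date yet has the same content in both directories, which a date-only comparison cannot reveal.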

3. Comparing by content only

# Compare by content only
sync_status_content <- compare_directories(left, 
                                            right,
                                            by_date    = FALSE,
                                            by_content = TRUE)

sync_status_content
#> 
#> ── Synchronization Summary ─────────────────────────────────────────────────────
#> • Left Directory: /tmp/RtmphkzTu8/left
#> • Right Directory: /tmp/RtmphkzTu8/right
#> • Total Common Files: 7
#> • Total Non-common Files: 9
#> • Compare files by: content
#> 
#> ── Common files ────────────────────────────────────────────────────────────────
#>             path       sync_status
#> 1 /left/B/B1.Rds different content
#> 2 /left/B/B2.Rds different content
#> 3 /left/C/C1.Rds      same content
#> 4 /left/C/C2.Rds different content
#> 5 /left/C/C3.Rds different content
#> 6 /left/D/D1.Rds different content
#> 7 /left/D/D2.Rds different content
#> 
#> ── Non-common files ────────────────────────────────────────────────────────────
#> 
#> ── Only in left ──
#> 
#> # A tibble: 4 × 1
#>   path_left     
#>   <fs::path>    
#> 1 /left/A/A1.Rds
#> 2 /left/A/A2.Rds
#> 3 /left/A/A3.Rds
#> 4 /left/B/B3.Rds
#> ── Only in right ──
#> # A tibble: 5 × 1
#>   path_right               
#>   <fs::path>               
#> 1 /right/C/C1_duplicate.Rds
#> 2 /right/D/D3.Rds          
#> 3 /right/E/E1.Rds          
#> 4 /right/E/E2.Rds          
#> 5 /right/E/E3.Rds

*️⃣ Comparing directories with verbose = TRUE

When calling compare_directories(), you can enable verbose mode by setting verbose = TRUE. This displays both directories' tree structures and, when comparing files by content, provides progress updates, including the time spent hashing the files.


compare_directories(left,
                    right,
                    by_date    = FALSE,
                    by_content = TRUE,
                    verbose    = TRUE)
#> ⠙ cli-147-153
#> ✔ B1.Rds [5ms]
#> 
#> ⠙ cli-147-153
#> ✔ B2.Rds [4ms]
#> 
#> ⠙ cli-147-153
#> ✔ C1.Rds [4ms]
#> 
#> ⠙ cli-147-153
#> ✔ C2.Rds [4ms]
#> 
#> ⠙ cli-147-153
#> ✔ C3.Rds [4ms]
#> 
#> ⠙ cli-147-153
#> ✔ D1.Rds [4ms]
#> 
#> ⠙ cli-147-153
#> ✔ D2.Rds [4ms]
#> 
#> ── Hashing completed! Total time spent: 0.08494329 secs ──
#> 
#> ⠙ cli-147-199
#> ✔ B1.Rds [4ms]
#> 
#> ⠙ cli-147-199
#> ✔ B2.Rds [4ms]
#> 
#> ⠙ cli-147-199
#> ✔ C1.Rds [4ms]
#> 
#> ⠙ cli-147-199
#> ✔ C2.Rds [4ms]
#> 
#> ⠙ cli-147-199
#> ✔ C3.Rds [7ms]
#> 
#> ⠙ cli-147-199
#> ✔ D1.Rds [4ms]
#> 
#> ⠙ cli-147-199
#> ✔ D2.Rds [4ms]
#> 
#> ── Hashing completed! Total time spent: 0.07256794 secs ──
#> 
#> (←)Left directory structure:
#> /tmp/RtmphkzTu8/left
#> ├── A
#> │   ├── A1.Rds
#> │   ├── A2.Rds
#> │   └── A3.Rds
#> ├── B
#> │   ├── B1.Rds
#> │   ├── B2.Rds
#> │   └── B3.Rds
#> ├── C
#> │   ├── C1.Rds
#> │   ├── C2.Rds
#> │   └── C3.Rds
#> ├── D
#> │   ├── D1.Rds
#> │   └── D2.Rds
#> └── E
#> (→)Right directory structure:
#> /tmp/RtmphkzTu8/right
#> ├── A
#> ├── B
#> │   ├── B1.Rds
#> │   └── B2.Rds
#> ├── C
#> │   ├── C1.Rds
#> │   ├── C1_duplicate.Rds
#> │   ├── C2.Rds
#> │   └── C3.Rds
#> ├── D
#> │   ├── D1.Rds
#> │   ├── D2.Rds
#> │   └── D3.Rds
#> └── E
#>     ├── E1.Rds
#>     ├── E2.Rds
#>     └── E3.Rds
#> ── Synchronization Summary ─────────────────────────────────────────────────────
#> • Left Directory: /tmp/RtmphkzTu8/left
#> • Right Directory: /tmp/RtmphkzTu8/right
#> • Total Common Files: 7
#> • Total Non-common Files: 9
#> • Compare files by: content
#> 
#> ── Common files ────────────────────────────────────────────────────────────────
#>             path       sync_status
#> 1 /left/B/B1.Rds different content
#> 2 /left/B/B2.Rds different content
#> 3 /left/C/C1.Rds      same content
#> 4 /left/C/C2.Rds different content
#> 5 /left/C/C3.Rds different content
#> 6 /left/D/D1.Rds different content
#> 7 /left/D/D2.Rds different content
#> 
#> ── Non-common files ────────────────────────────────────────────────────────────
#> 
#> ── Only in left ──
#> 
#> # A tibble: 4 × 1
#>   path_left     
#>   <fs::path>    
#> 1 /left/A/A1.Rds
#> 2 /left/A/A2.Rds
#> 3 /left/A/A3.Rds
#> 4 /left/B/B3.Rds
#> ── Only in right ──
#> # A tibble: 5 × 1
#>   path_right               
#>   <fs::path>               
#> 1 /right/C/C1_duplicate.Rds
#> 2 /right/D/D3.Rds          
#> 3 /right/E/E1.Rds          
#> 4 /right/E/E2.Rds          
#> 5 /right/E/E3.Rds
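
The verbose output refers to hashing, which is a common way to compare file contents without reading them side by side. The sketch below illustrates the general idea with tools::md5sum() from base R. It is an assumption-laden illustration: this article does not state which hash syncdr uses, and same_content() and the demo files are hypothetical.

```r
# Hypothetical helper for illustration; NOT part of syncdr.
# Two files are treated as identical when their MD5 digests match.
same_content <- function(file_left, file_right) {
  unname(tools::md5sum(file_left)) == unname(tools::md5sum(file_right))
}

# Demo on throwaway files (all names are made up)
hash_dir <- tempfile("hash_demo_")
dir.create(hash_dir)
f1 <- file.path(hash_dir, "one.txt")
f2 <- file.path(hash_dir, "two.txt")
f3 <- file.path(hash_dir, "three.txt")
writeLines("same text",  f1)
writeLines("same text",  f2)
writeLines("other text", f3)

same_content(f1, f2)  # TRUE: identical contents, different names
same_content(f1, f3)  # FALSE: contents differ
```

Hashing every file requires reading it in full, which is why comparing all files by content is slower than comparing modification dates.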

Step 2: Visualize Synchronization Status

The best way to read through the output of compare_directories() is to visualize it with the display_sync_status() function.

For example, let's visualize the sync status of the common files in the left and right directories, when compared by date:


display_sync_status(sync_status_date$common_files,
                    left_path  = left,
                    right_path = right)

Or, to display the sync status of the non-common files:


display_sync_status(sync_status_date$non_common_files,
                    left_path  = left,
                    right_path = right)

Step 3: Synchronize directories

syncdr enables users to perform different actions such as copying, moving, and deleting files using specific synchronization functions. Refer to the vignette("asymmetric-synchronization") and vignette("symmetric-synchronization") articles for detailed information.

For the purpose of this general demonstration, we will perform a 'full asymmetric synchronization to right'. This specific function does the following:

  • On common files:
    • If by date only (by_date = TRUE): Copy files that are newer in the left directory to the right directory.
    • If by date and content (by_date = TRUE and by_content = TRUE): Copy files that are newer and different in the left directory to the right directory.
    • If by content only (by_date = FALSE and by_content = TRUE): Copy files that are different in the left directory to the right directory.
  • On non common files:
    • Copy to the right directory those files that exist only in the left directory
    • Delete from the right directory those files that are exclusive in the right directory (i.e., missing in the left directory)

# Compare directories

sync_status <- compare_directories(left,
                                   right,
                                   by_date = TRUE)

# Synchronize directories 
full_asym_sync_to_right(sync_status = sync_status)
#> ✔ synchronized
#>
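
To make the date-only rules listed above concrete, here is a base-R sketch of the same three actions: copy common files that are newer in left, copy files that exist only in left, and delete files that exist only in right. It is not syncdr's implementation; mirror_to_right() and the demo directories are hypothetical, and the sketch ignores empty folders and error handling.

```r
# Hypothetical helper for illustration; NOT part of syncdr.
mirror_to_right <- function(left, right) {
  left_files  <- list.files(left,  recursive = TRUE)
  right_files <- list.files(right, recursive = TRUE)
  common      <- intersect(left_files, right_files)

  # Common files: keep only those that are newer in left
  newer_left <- common[file.mtime(file.path(left,  common)) >
                       file.mtime(file.path(right, common))]
  # Non-common files: copy left-only files, delete right-only files
  to_copy   <- c(newer_left, setdiff(left_files, right_files))
  to_delete <- setdiff(right_files, left_files)

  for (f in to_copy) {
    dest <- file.path(right, f)
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    file.copy(file.path(left, f), dest, overwrite = TRUE, copy.date = TRUE)
  }
  unlink(file.path(right, to_delete))
  invisible(list(copied = to_copy, deleted = to_delete))
}

# Demo on throwaway directories (all names are made up)
sync_root  <- tempfile("sync_demo_")
sync_left  <- file.path(sync_root, "left")
sync_right <- file.path(sync_root, "right")
dir.create(sync_left,  recursive = TRUE)
dir.create(sync_right, recursive = TRUE)

writeLines("new", file.path(sync_left,  "a.txt"))  # common, newer in left
writeLines("old", file.path(sync_right, "a.txt"))
writeLines("b",   file.path(sync_left,  "b.txt"))  # only in left
writeLines("c",   file.path(sync_right, "c.txt"))  # only in right

t0 <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
Sys.setFileTime(file.path(sync_left,  "a.txt"), t0)
Sys.setFileTime(file.path(sync_right, "a.txt"), t0 - 3600)

mirror_to_right(sync_left, sync_right)
list.files(sync_right)
#> [1] "a.txt" "b.txt"
```

Because this kind of synchronization deletes files from the right directory, it is worth inspecting the comparison result before running it on real data.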