I am trying to do what should be a very simple task. I have two HDDs (spinning drives) and I am trying to move the data from one to the other using rsync.

The command itself is very simple:

rsync -r --info=progress2 /mnt/disk1/backupfolder /mnt/disk2/backupfolder

The amount of data to move is around 4 TB.
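
For reference, a restart-friendly variant of the same command might look like the sketch below. The assumptions are mine: the log path is a made-up example, and the trailing slashes are added so that the contents of the source folder land directly in the destination folder instead of in a nested `backupfolder/backupfolder`. Plain `-r` does not preserve modification times, so a rerun after a crash would treat every file as changed and copy it again; `-a` keeps the times, which lets a second run skip whatever already arrived intact.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) just prints the command;
# set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

SRC=/mnt/disk1/backupfolder/          # trailing slash: copy the folder's contents
DST=/mnt/disk2/backupfolder/
LOG=/var/tmp/rsync-backupfolder.log   # hypothetical path, ideally on a third disk

# -a          archive mode: recursive, keeps mtimes/permissions/ownership
# --partial   keep a half-transferred file instead of deleting it
# --log-file  record each transferred file, to see roughly where a run stopped
run rsync -a --partial --info=progress2 --log-file="$LOG" "$SRC" "$DST"
```

After a hard hang the last few log lines may never have reached the disk, so the log gives a rough position at best.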

Twice now, once at around 89% and once at 94%, the process has died and taken the whole server down with it, leaving it completely unavailable and unresponsive (pings don’t work, nothing hosted works, SSH does not work). Only a reset via the button on the case brings it back.
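
A rough sketch of how the kernel log from the boot that hung could be checked after the reset. It assumes a systemd system with a persistent journal (otherwise the previous boot's log is not kept), and the keyword list is only a guess at what disk, link, or controller trouble tends to look like. On a truly hard freeze the last messages may never have been written at all.

```shell
#!/bin/sh
# Keep only lines that commonly accompany disk, link, memory, or CPU trouble.
# The pattern list is illustrative, not exhaustive.
suspicious() {
    grep -Ei 'ata[0-9]+|i/o error|resetting link|timeout|blocked for more than|machine check|out of memory'
}

if command -v journalctl >/dev/null 2>&1; then
    # -k: kernel messages only; -b -1: the boot before the current one
    journalctl -k -b -1 --no-pager 2>/dev/null | suspicious | tail -n 50 || true
fi
```

An empty result does not prove the hardware is fine; it may just mean nothing was flushed to disk before the hang.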

At first I suspected temperature. After checking constantly with beszel during the second run, everything seems to be within normal ranges.

Has anyone else experienced such bizarre system shutdowns/hangs? In the meantime I am going to test the memory with memtest, just to be sure it is not that.

Edit: forgot to mention, the SMART data for both drives gives a pass, although they were bought second-hand (with warranty).

Edit2: memtest finished and found nothing (thank goodness, because RAM is just stupidly priced right now). Some commenters pointed to the disks as a possible cause. I will follow that lead next.

  • frongt@lemmy.zip
    8 hours ago

    If both drives exhibit the behavior, I’d suspect the drive controller.

    • SpikesOtherDog@ani.social
      7 hours ago

      True, but it’s not clear to me that both drives are exhibiting the behavior; it sounds more like a single copy between two drives. I wouldn’t rule the controller out and do think it is a possibility, but in my professional experience drives fail much more frequently than controllers.

      It makes sense to me to test the drives individually, preferably in another system, using a SMART long test, which is non-destructive. Next, test other drives in this system. If there are errors, try swapping out the SATA cables, too. If you can shuffle the data off the drives, do so and then try running them through a secure erase in another system. A bad drive should fail the same way in another system.
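
      The SMART long test step could look roughly like this with smartmontools. It is a sketch under assumptions: `/dev/sdX` and `/dev/sdY` are placeholders for the two drives, the commands need root, and the `run` wrapper is only a hypothetical helper that prints each command unless `DRY_RUN=0` is set.

      ```shell
      #!/bin/sh
      # DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
      run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

      DRIVES="/dev/sdX /dev/sdY"   # placeholders, replace with the real devices

      # Step 1: start the extended self-test. It runs inside the drive itself
      # and can take many hours on a large HDD.
      for dev in $DRIVES; do
          run smartctl -t long "$dev"
      done

      # Step 2 (run again later, once the tests have had time to finish):
      for dev in $DRIVES; do
          run smartctl -l selftest "$dev"   # self-test log, one result per test
          run smartctl -A "$dev"            # attributes, e.g. reallocated or pending sectors
      done
      ```

      A quick overall "PASSED" health status does not mean a long test has ever been run, so the self-test log is the part worth reading.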

      My other reason for thinking it is probably not the controller is that getting roughly 90% of the way through 4TB is a very long sustained transfer for a flaky component to survive before failing. Also, there are no reports of other errors.