It was already known before the whistleblower that:
- Siri inputs (all STT at that time, really) were processed off device
- Siri had false activations
The “sinister” thing that we learned was that Apple was reviewing those activations to see if they were false, with the stated intent (as confirmed by the whistleblower) of using them to reduce false activations.
There are also black box methods to verify that data isn’t being sent and that particular hardware (like the microphone) isn’t being used, and there are people who look for vulnerabilities as a hobby. If the microphones on the most/second most popular phone brand (iPhone, Samsung) were secretly recording all the time, evidence of that would be easy to find and would be a huge scoop - why haven’t we heard about it yet?
Snowden and Wikileaks dumped a huge amount of info about governments spying, but nothing in there involved always on microphones in our cell phones.
To be fair, an individual phone is a single compromise away from actually listening to you, so it still makes sense to avoid having sensitive conversations within earshot of a wirelessly connected microphone. But generally that’s not the concern most people should have.
Advertising tracking is much more sinister and complicated and harder to wrap your head around than “my phone is listening to me” and as a result makes for a much less glamorous story, but there are dozens, if not hundreds or thousands, of stories out there about how invasive advertising companies’ methods are, about how they know too much, etc… Think about what LLMs do with text. The level of prediction that they can do. That’s what ML algorithms can do with your behavior.
If you’re misattributing what advertisers know about you to the phone listening and reporting back, then you’re not paying attention to what they’re actually doing.
So yes - be vigilant. Just be vigilant about the right thing.
I think the best way to handle this would be to just encode everything and upload all files. If I wanted some amount of history, I’d use some file system with automatic snapshots, like ZFS.
If I wanted to do what you’ve outlined, I would probably use rclone with filtering for the extension types or something along those lines.
If I wanted to do this with Git specifically, though, this is what I would try first:
First, add lossless extensions (
*.flac
,*.wav
) to my repo’s.gitignore
Second, schedule a job on my local machine that:
.mp3
,.ogg
- possibly also with a confirmation that the codec is up to my standards with a call to ffprobe, avprobe, mediainfo, exiftool, or something similar), it encodes the file to your preferred lossy format.git status --porcelain
to if there have been any changes.git add --all && git commit --message "Automatic commit" && git push
Added album: "Satin Panthers - EP" by Hudson Mohawke
orRemoved album: "Brat" by Charli XCX; Added album "Brat and it's the same but there's three more songs so it's not" by Charli XCX
Third, schedule a job on my
remote machineserver that runsgit pull
at regular intervals.One issue with this approach is that if you delete a file (as opposed to moving it), the space is not recovered on your local or your server. If space on your server is a concern, you could work around that by running something like the answer here (adjusting the depth to an appropriate amount for your use case):
Another potential issue is that what I described above involves having an intermediary git to push to and pull from, e.g., running on a hosted Git forge, like GitHub, Codeberg, etc… This could result in getting copyright complaints or something along those lines, though.
Alternatively, you could use your server as the git server (or check out forgejo if you want a Git forge as well), but then you can’t use the above trick to prune file history and save space from deleted files (on the server, at least - you could on your local, I think). If you then check out your working copy in a way such that Git can use hard links, you should at least be able to avoid needing to store two copies on your server.
The other thing to check out, if you take this approach, is git lfs.EDIT: Actually, I take that back - you probably don’t want to use Git LFS.