Off-and-on trying out an account over at @[email protected] due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 57 Posts
  • 1.42K Comments
Joined 2 years ago
Cake day: October 4th, 2023

  • tal@lemmy.today to Selfhosted@lemmy.world · Where to start with backups?

    If databases are involved they usually offer some method of dumping all data to some kind of text file. Usually relying on their binary data is not recommended.

    It’s not so much text versus binary. The problem is that a normal backup program that treats a live database file as just another file to copy is liable to have the DBMS write to the database while it’s being backed up, leaving a backed-up file that’s a mix of old and new versions, and may be corrupt.

    Either:

    1. The DBMS needs to have a way to create a dump — possibly triggered by the backup software, if it’s aware of the DBMS — that won’t change during the backup (there’s a sketch of this just after the list),

    or:

    2. One needs to have filesystem-level support to grab an atomic snapshot (e.g. one takes an atomic snapshot using something like btrfs and then backs up the snapshot rather than the live filesystem). This avoids the issue of the database file changing while the backup runs.
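
    As a sketch of option 1 (assuming PostgreSQL purely as an example, with a hypothetical database name), the dump is written out as a separate, static file, and that file is what actually gets backed up:

    # Write a consistent dump of the hypothetical database "mydb" somewhere the
    # backup job will pick it up; pg_dump reads a consistent snapshot internally.
    $ pg_dump --format=custom --file=/var/backups/mydb.dump mydb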

    In general, if this is a concern, I’d tend to favor #2 as an option, because it’s an all-in-one solution that deals with all of the problems of files changing while being backed up: DBMSes are just a particularly thorny example of that.
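
    A minimal sketch of option 2, assuming btrfs with the data living on a subvolume at /srv (the paths and the backup tool here are just placeholders):

    # Take a read-only snapshot, back up the snapshot, then drop it.
    # The backup tool only ever sees the frozen copy, never the live files.
    $ sudo btrfs subvolume snapshot -r /srv /srv/.backup-snap
    $ sudo restic backup /srv/.backup-snap    # or whatever backup program you use
    $ sudo btrfs subvolume delete /srv/.backup-snap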

    Full disclosure: I mostly use ext4 myself, rather than btrfs. But I also don’t run live DBMSes.

    EDIT: Plus, #2 also provides consistency across different files on the filesystem, though that’s usually less critical. Like, you won’t run into a situation where software on your computer updates File A, then does a sync(), then updates File B, but your backup program grabs the new version of File B and the old version of File A. Absent help from the filesystem, your backup program won’t know where write barriers spanning different files are happening.

    In practice, that’s not usually a huge issue, since fewer software packages are gonna be impacted by this than by write ordering internal to a single file, but it is permissible for a program, under Unix filesystem semantics, to expect that write order to persist and to kerplode if it doesn’t…and a traditional backup won’t preserve it the way that a backup with help from the filesystem can.
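
    As a sketch of the pattern in question (the file names are just placeholders):

    # The program assumes file-A is durable before it updates file-B.
    $ echo "new data"        > file-A
    $ sync                   # write barrier: file-A must hit disk before file-B changes
    $ echo "points at new A" > file-B
    # A file-by-file backup can still capture the new file-B alongside the old file-A;
    # a backup taken from an atomic snapshot can't.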



  • tal@lemmy.today to Technology@beehaw.org · Move Over, ChatGPT

    In all fairness, while this is a particularly bad case, the fact that it’s often very difficult to safely fiddle with environment variables at runtime in a process, but very convenient to use them as a way to cram extra parameters into a library, has meant that a lot of human programmers who should know better have created problems like this too.

    IIRC, setting the timezone for some of the POSIX time APIs on Linux has the same problem, and that’s a system library. And IIRC SDL and some other graphics libraries, as well as Linux 3D stuff, have used this as a way to pass parameters out-of-band to libraries, which becomes a problem when programs start dicking with the environment at runtime. I remember reading an article from someone who had been banging into this in Linux gaming, about how various programs and libraries for games would setenv() to fiddle with these variables, and the races associated with that were responsible for a substantial number of crashes they’d seen.
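
    A couple of examples of that out-of-band style, set from the shell before the process starts, which is the safe way to do it (the game binary here is a placeholder):

    $ TZ=UTC date                        # the POSIX time functions read TZ
    $ SDL_VIDEODRIVER=wayland ./mygame   # SDL picks its video backend from the environment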

    setenv() is not thread-safe or signal-safe. In general, reading environment variables in a program is fine, but messing with them in very many situations is not.

    searches

    Yeah, the first thing I see is someone talking about how its lack of thread-safety is a problem for TZ, which is the time thing that’s been a pain for me a couple times in the past.

    https://news.ycombinator.com/item?id=38342642

    Back on your issue:

    Claude, being very smart and very good at drawing a straight line between two points, wrote code that took the authentication token from the HTTP request header, modified the process’s environment variables, then called the library

    for the uninitiated - a process’s environment variables are global. and HTTP servers are famously pretty good at dealing with multiple requests at once.

    Note also that a number of webservers used to fork to handle requests — and I’m sure that there are still some now that do so, though it’s certainly not the highest-performance way to do things — and in that situation, this code could avoid problems.

    searches

    It sounds like Apache used to and apparently still can do this:

    https://old.reddit.com/r/PHP/comments/102vqa2/why_does_apache_spew_a_new_process_for_each/

    But it does highlight one of the “LLMs don’t have a broad, deep understanding of the world, and that creates problems for coding” issues that people have talked about. Like, part of what someone is doing when writing software is identifying situations where behavior isn’t defined and clarifying that, either via asking for requirements to be updated or via looking out-of-band to understand what’s appropriate. An LLM that’s working by looking at what’s commonly done in its training set just isn’t in a good place to do that, and that’s kinda a fundamental limitation.

    I’m pretty sure that the general case of writing software is AI-hard, where the “AI” referred to by the term is an artificial general intelligence that incorporates a lot of knowledge about the world. That is, you can probably make an AI that can write software, but it won’t be just an LLM of the “generative AI” sort that we have now.

    There might be ways that you could incorporate an LLM into software that can itself write software. But I don’t think that it’s just going to be a raw “rely on an LLM taking in a human-language set of requirements and spitting out code”. There are just things that that approach can’t handle reasonably.


  • I think that the problem will be if software comes out that doesn’t target home PCs. That’s not impossible. I mean, that happens today with Web services. Closed-weight AI models aren’t going to be released to run on your home computer. I don’t use Office 365, but I understand that at least some of that is a cloud service.

    Like, say the developer of Video Game X says “I don’t want to target a ton of different pieces of hardware. I want to tune for a single one. I don’t want to target multiple OSes. I’m tired of people pirating my software. I can reduce cheating. I’m just going to release for a single cloud platform.”

    Nobody is going to take your hardware away. And you can probably keep running Linux or whatever. But…not all the new software you want to use may be something that you can run locally, if it isn’t released for your platform. Maybe you’ll use some kind of thin-client software — think telnet, ssh, RDP, VNC, etc. for past iterations of this — to use that software remotely on your Thinkpad. But…you can’t run it yourself.

    If it happens, I think that that’s what you’d see. More and more software would just be available only to run remotely. Phones and PCs would still exist, but they’d increasingly run a thin client, not run software locally. Same way a lot of software migrated to web services that we use with a Web browser, but with a protocol and software more aimed at low-latency, high-bandwidth use. Nobody would ban existing local software, but a lot of it would stagnate. A lot of new and exciting stuff would only be available as an online service. More and more people would buy computers that are only really suitable for use as a thin client — fewer resources, closer to a smartphone than what we conventionally think of as a computer.

    EDIT: I’d add that this is basically the scenario that the AGPL is aimed at dealing with. The concern was that people would just run open-source software as a service. They could build on that base, make their own improvements. They’d never release binaries to end users, so they wouldn’t hit the traditional GPL’s obligation to release source to anyone who gets the binary. The AGPL requires source distribution to people who even just use the software.


  • I will say that, realistically, in terms purely of physical distance, a lot of the world’s population is in a city and probably isn’t too far from a datacenter.

    https://calculatorshub.net/computing/fiber-latency-calculator/

    It’s about five microseconds of latency per kilometer down fiber optics, or about ten microseconds per kilometer for a round trip.
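
    Rough arithmetic behind that figure: light in fiber travels at roughly c/1.5, about 200,000 km/s, so:

    $ echo '1000000 / 200000' | bc    # microseconds per kilometer, one way
    5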

    I think a larger issue might be bandwidth for some applications. Like, if you want to unicast uncompressed video to every computer user, say, you’re going to need an ungodly amount of bandwidth.

    DisplayPort looks like it’s currently up to 80Gb/sec. Okay, not everyone is currently saturating that, but if you want comparable capability, that’s what you’re going to have to be moving from a datacenter to every user. For video alone. And that’s assuming that they don’t have multiple monitors or something.
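
    Ballpark arithmetic, ignoring blanking intervals and the like: one uncompressed 4K stream at 60 Hz with 30-bit color is already around 15 Gb/sec, so that 80 Gb/sec ceiling only covers a handful of such streams:

    $ echo '3840 * 2160 * 60 * 30' | bc    # bits per second for one uncompressed 4K60 stream
    14929920000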

    I can believe that it is cheaper to have many computers in a datacenter. I am not sold that any gains will more than offset the cost of the staggering fiber rollout that this would require.

    EDIT: There are situations where it is completely reasonable to use (relatively) thin clients. That’s, well, what a lot of the Web is — browser thin clients accessing software running on remote computers. I’m typing this comment into Eternity before it gets sent to a Lemmy instance on a server in Oregon, much further away than the closest datacenter to me. That works fine.

    But “do a lot of stuff in a browser” isn’t the same thing as “eliminate the PC entirely”.


  • tal@lemmy.today to Linux@lemmy.world · Setting up LaTeX on debian?

    I’ve always just written single-file LaTeX, but it looks like the settings.sty failure you’re getting is because of this:

    % Most commands and style definitions are in settings.sty.
    \usepackage{settings}
    

    By installing texlive from source, and installing CurVe to the working directory, I was able to fix that problem.

    I’m not sure how this would resolve the issue — I’d think that you’d still need settings.sty. It looks to me like Debian trixie packages CurVe in texlive-pictures, so I don’t think that you need to manually install texlive or CurVe from source:

    $ apt-file search curve.cls
    texlive-pictures: /usr/share/texlive/texmf-dist/tex/latex/curve/curve.cls
    $ apt show texlive-pictures
    [snip]
    curve -- A class for making curriculum vitae
    [snip]
    $ sudo apt install texlive texlive-pictures
    [snip]
    $ pdflatex test.tex
    [snip]
    ! LaTeX Error: File `settings.sty' not found.
    

    I think that that example CV you have is missing some of the LaTeX source, the stuff that’s in its settings.sty. Like, it might not be the best starting point, unless you’ve resolved that bit.
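
    If you just want to get past the missing-file error while you sort that out, a bare placeholder is enough for \usepackage{settings} to resolve, though any macros the CV actually pulls from the real settings.sty will still be undefined:

    # Placeholder only: it declares the package so \usepackage{settings} stops failing,
    # but the template's real settings.sty presumably defined commands the CV still uses.
    $ printf '%s\n' '\ProvidesPackage{settings}' > settings.sty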

    EDIT: If you just want a functioning CurVe example, I can render this one:

    https://github.com/ArwensAbendstern/CV-LaTeX/tree/master/simple CurVe CV English

    Need to download CV.ltx and experience.ltx. Then $ pdflatex CV.ltx renders it to a PDF for me.



  • It looks like I was wrong about it being the default journaling mode for ext3; the default is apparently to journal only metadata. However, if you’re journaling data, it gets pushed out to the disk in a new location rather than on top of where the previous data existed.

    https://linux.die.net/man/1/shred

    CAUTION: Note that shred relies on a very important assumption: that the file system overwrites data in place. This is the traditional way to do things, but many modern file system designs do not satisfy this assumption. The following are examples of file systems on which shred is not effective, or is not guaranteed to be effective in all file system modes:

    • log-structured or journaled file systems, such as those supplied with AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)

    • file systems that write redundant data and carry on even if some writes fail, such as RAID-based file systems

    • file systems that make snapshots, such as Network Appliance’s NFS server

    • file systems that cache in temporary locations, such as NFS version 3 clients

    • compressed file systems

    In the case of ext3 file systems, the above disclaimer applies (and shred is thus of limited effectiveness) only in data=journal mode, which journals file data in addition to just metadata. In both the data=ordered (default) and data=writeback modes, shred works as usual. Ext3 journaling modes can be changed by adding the data=something option to the mount options for a particular file system in the /etc/fstab file, as documented in the mount man page (man mount).
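
    For reference, per that man page, the mode is set via the mount options in /etc/fstab; on a system where, say, /home had been switched to full data journaling (the one ext3/ext4 case where shred is of limited effectiveness), you’d see something like this (the UUID is a placeholder):

    $ grep data= /etc/fstab
    UUID=1234-abcd  /home  ext4  defaults,data=journal  0  2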





  • I assume you’ve tried multiple USB ports?

    That’s a thought.

    Check fdisk -l and see if it shows up in there.

    It won’t hurt, but if he’s not seeing anything with lsblk, fdisk -l probably won’t show it either, as they’re both iterating over the block devices.

    Honestly, if he doesn’t know that the hard drive itself functions, the drive not working would be my prime theory as to the culprit. I have had drive enclosures not present a USB Mass Storage device to a computer if they can’t talk to the hard drive over SATA.
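
    One quick way to separate “the enclosure isn’t showing up at all” from “the enclosure shows up but can’t talk to the drive”:

    $ sudo dmesg -w    # watch kernel messages, then plug the enclosure in
    $ lsusb            # the enclosure's USB-to-SATA bridge should be listed even if the drive is dead
    $ lsblk            # a block device (sdb, sdc, ...) only appears if the bridge can talk to the drive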


  • usb-storage was loaded when I manually started the service (sorry, forgot to state that).

    Ahhh, gotcha. Yeah, then that kinda kills that line of reasoning.

    It looks like past versions of the product have worked with Linux.

    https://forums.linuxmint.com/viewtopic.php?t=299894

    I’m a little surprised that you aren’t at least seeing something showing up as a USB device in your before-after difference. Looking at this, it looks like the device uses wall power and has a DC power supply with a barrel connector. It’s possible that it might not function at all without that being powered up. Are you sure that you have the barrel connector in and the plug plugged into an outlet that is hot (i.e. not, say, an outlet controlled by a light switch that is off)?
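
    For reference, the kind of before-after check I mean is just:

    $ lsusb > before.txt
    # plug the enclosure in (with the barrel connector powered), wait a few seconds
    $ lsusb > after.txt
    $ diff before.txt after.txt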

    EDIT: Also, are you sure that the drive works? My past experience with JBOD drive enclosures has been that a non-functional drive, something that the drive enclosure can’t talk to, won’t be presented as a Mass Storage device.