But .txt is not the same as .rs, and .txt is not the same as .docx either, even though these files can look the same to the human eye.

  • Aniki@feddit.org
    53 minutes ago

    no, nice take though

    so you’re right that text, audio and video are important data types. if you look at peripheral devices of computers, you can find: keyboard, printers, microphone, speakers, camera, screen. they put text / audio / video data in / out of the computer. so there’s that

    however, internally, databases are incredibly important. a lot of what's stored inside the computer, if it’s not media that’s displayed to the user directly, is organized in databases. therefore, a lot of files are database files (often sqlite3 files). they hold tabular data and you can view them with the sqlite3 command-line program.
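
    a rough sketch of the same idea from Python instead of the command line, using the standard-library sqlite3 module. the table name `bookmarks` and the helper `list_tables` are made up for illustration, and an in-memory database stands in for a real file:

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# An in-memory database stands in for a real .sqlite3 file here;
# pass a file path to sqlite3.connect() to open one from disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO bookmarks (url) VALUES ('https://example.org')")

print(list_tables(conn))
for row in conn.execute("SELECT id, url FROM bookmarks"):
    print(row)  # each row comes back as a tuple, one per table row
```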

  • sbeak@sopuli.xyz
    4 hours ago

    Not just those. Files are just a method of storing digital data, so it’s not just those four. You can have files storing databases, software (think exe, AppImage, deb, rpm, etc.), design files, projects, and more!

    And file extensions are a method of telling different programs how to handle different files, since the data is formatted a bit differently. For instance, a “.txt” file is stored in plain text, while an executable file is compiled code that needs to be run.

    For your example, I would like to note that you are comparing a plain text file type to a rich text file type. Plain text file types, like .txt, .md (Markdown), and the different code files (like .json, .py, .rs, etc.), can be viewed and edited with a simple Notepad-style text editor. The data is stored, as the name suggests, in plain text. In comparison, rich text file types, like .odt and .docx, encode additional data like fonts, styles, images, animations, etc., and require a rich text processor (like LibreOffice, MS Office, etc.) to read them. You can’t view them through a notepad-style application, for example.

    And for images, video, and audio, you have to take compression, codecs, and that sort of thing into account. You might have heard that a PNG can store transparent images and is a lossless format, while a JPEG cannot and is a lossy format. “Lossless” means that, after compression, no data has been removed (or “lost”), while “lossy” means that some data is discarded during compression. For audio, MP3s are lossy, while WAV files usually store uncompressed audio and FLAC is a lossless compressed format. You might have also heard of “raw” photos and “raw” videos; those contain the data from the camera sensor with minimal processing.
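
    To make the lossless/lossy distinction concrete, here is a small sketch. The lossless half uses zlib from the standard library. The lossy half is only a toy stand-in (rounding every byte down to a multiple of 16) and is not how JPEG or MP3 actually work; the helper name `quantize` is my own.

```python
import zlib

# Some sample "data": every byte value from 0 to 255, repeated four times.
data = bytes(range(256)) * 4

# Lossless: decompressing gives back exactly the bytes we started with.
compressed = zlib.compress(data)
restored = zlib.decompress(compressed)
print(restored == data)


def quantize(samples, step=16):
    """Toy lossy step: round each byte down to a multiple of `step`."""
    return bytes((s // step) * step for s in samples)


# Lossy: the fine detail is gone and cannot be recovered afterwards.
lossy = quantize(data)
print(lossy == data)
```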

    For most file types, you can’t just change the extension to convert them, as the data stored is arranged differently! This is why renaming a .txt file to .odt will not produce a valid rich text document, for example.
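
    One way to see that the extension is only a label: many formats start with a fixed byte signature, and a program can check that instead of trusting the name. A minimal sketch, where the `SIGNATURES` table and the `sniff` helper are illustrative names of my own and the list is nowhere near complete:

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP container (also used by .docx and .odt)",
    b"%PDF-": "PDF document",
    b"SQLite format 3\x00": "SQLite database",
}


def sniff(data: bytes) -> str:
    """Guess a file type from its first bytes, ignoring the file name."""
    for magic, label in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return "unknown"


# A plain text file renamed to .odt still starts with ordinary text,
# not with the ZIP signature a real .odt would have.
print(sniff(b"Dear diary, today I renamed a file."))
print(sniff(b"PK\x03\x04" + b"\x00" * 16))
```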

    • sbeak@sopuli.xyz
      4 hours ago

      Oh, and you also have files like .zip or .tar(.gz), which are used to store a collection of files in a single, usually compressed, archive. And they can differ in compression techniques, how data is arranged, etc.
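
      A short sketch of what such an archive does, using Python's standard zipfile module. The file names and contents are invented, and the archive is built in memory rather than on disk:

```python
import io
import zipfile

# Two made-up files; the first one is very repetitive, so it compresses well.
files = {"notes.txt": b"hello " * 1000, "todo.txt": b"buy milk\n"}

# Pack both files into one compressed archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Reopen the archive and unpack it again.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    restored = {name: archive.read(name) for name in names}

print(names)
print(len(buffer.getvalue()), "bytes archived vs",
      sum(len(c) for c in files.values()), "bytes original")
```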

  • Treczoks@lemmy.world
    11 hours ago

    Nope. Wrong. There are thousands of file types, and while a handful of them fall somehow under your four categories, most of them actually don’t.

    And calling .docx a “text file” is an insult to all honest text files.

    • cheese_greater@lemmy.world
      14 hours ago

      I find it so incredible u can record random audio rn and it’s turned, like sound into stone, into text that 1000 monkeys could eventually type up at random, like Shakespeare

  • MoonManKipper@lemmy.world
    15 hours ago

    There are thousands of types of file. They all contain data as a long sequence of numbers, and how those numbers are interpreted depends on the type of file - text characters, floating point numbers, pixel colour information or compressed data
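
    A quick sketch of that point: the same four bytes read as text, as an integer, as a floating point number and as a pixel. The byte values are arbitrary ones I picked, and the variable names are just for illustration.

```python
import struct

# Four arbitrary bytes, as they might sit in a file.
raw = bytes([0x48, 0x69, 0x21, 0x00])

# Interpreted as ASCII text (first three bytes).
as_text = raw[:3].decode("ascii")

# Interpreted as one little-endian unsigned 32-bit integer (0x00216948).
as_uint32 = struct.unpack("<I", raw)[0]

# Interpreted as one little-endian 32-bit float.
as_float = struct.unpack("<f", raw)[0]

# Interpreted as one RGBA pixel, one byte per channel.
as_pixel = tuple(raw)

print(as_text, as_uint32, as_float, as_pixel)
```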

      • Aniki@feddit.org
        44 minutes ago

        an image, technically, is an array of pixels. specifically a 2-dimensional array. this means, it’s just a long list of lists of pixels. so if you have a 1920x1080 image, it’s just a list of 1080 lists of 1920 pixels each.

        each pixel, again, is a tuple (i.e. a list with fixed length) of numbers which specify the brightness of the red / green / blue lamps. so if you want to display a yellow pixel, the data would be (1.0, 1.0, 0.0) which turns red and green on and blue off.

        so if you have a 1920x1080 image, technically you have 1920*1080*3 ≈ 6.2 million numbers. if each number is stored as a 32-bit float, that's about 25 MB uncompressed (many formats use 8 bits per channel instead). you can read it here
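
        a small sketch of that layout in Python, assuming plain nested lists and float channels. `make_image` and `count_numbers` are names made up for this example, and the image is kept tiny so it's cheap to build:

```python
def make_image(width, height, colour):
    """Build an image as a list of `height` rows of `width` pixels each."""
    return [[colour for _ in range(width)] for _ in range(height)]


# A pixel is a fixed-length tuple: (red, green, blue).
yellow = (1.0, 1.0, 0.0)

# A tiny 4x3 image filled with yellow pixels.
image = make_image(4, 3, yellow)
print(len(image), "rows of", len(image[0]), "pixels")


def count_numbers(width, height, channels=3):
    """How many channel values an uncompressed image of this size holds."""
    return width * height * channels


full_hd = count_numbers(1920, 1080)
print(full_hd, "numbers,", full_hd * 4, "bytes at 32 bits each")
```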

      • MoonManKipper@lemmy.world
        14 hours ago

        Depends on the file format. There is compressed and uncompressed audio. Sometimes the numbers just represent the audio waveform (e.g. .wav), and sometimes they are stored with lossy or lossless compression. Most, but not all, video formats are compressed due to the data size.
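
        For the uncompressed case, a sketch with Python's standard wave module: it writes a short sine tone as 16-bit samples and reads the same numbers back. The sample rate, tone frequency and length are arbitrary choices for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (arbitrary for this example)

# 10 ms of a 440 Hz sine wave as signed 16-bit integers.
samples = [
    int(32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE // 100)
]

# Write the samples into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)   # mono
    wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# Read it back: the stored numbers are the waveform itself.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    frames = wav.readframes(wav.getnframes())
decoded = list(struct.unpack(f"<{len(frames) // 2}h", frames))

print(decoded[:5])
```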

  • xx3rawr@sh.itjust.works
    13 hours ago

    I learned in computing that there are two: binary and text. If you open the file with a text editor and you can read some stuff, it’s text. If it’s just random characters, it’s binary.
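
    A rough version of that test in code, as a heuristic only: treat the data as text if it has no NUL bytes and decodes as UTF-8. The function name `looks_like_text` is made up, and real tools use fancier checks.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: no NUL bytes and valid UTF-8 means 'probably text'."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_text("fn main() {}\n".encode("utf-8")))
print(looks_like_text(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```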

  • etchinghillside@reddthat.com
    14 hours ago

    Someone will probably correct me – but the funny thing is that stuff like docx and xlsx are just a zipped/tarballed collection of different kinds of files. So I would add whatever the official term for “zipped/tarballed” is to your list.

  • Zwuzelmaus@feddit.org
    15 hours ago

    There is structured text and formatted text.

    There are index files.

    There are databases.

    There are mixed media files.

    There are combined databases with mixed media and indexes, a.k.a. NoSQL databases.

    t.b.c.

  • TheDarkQuark@lemmy.world
    13 hours ago

    If you have a .docx file, rename it to .zip, and extract it. You’ll see the .docx is just packaged text (and image) files.
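
    You can also peek inside without renaming anything. A sketch with Python's zipfile module: `list_members` is a name I made up, and `fake_docx` is only a stand-in built in memory with two member names that real .docx files contain, not a valid Word document.

```python
import io
import zipfile


def list_members(data: bytes) -> list:
    """List the files packed inside a ZIP-based document."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Build a stand-in for a .docx (NOT a valid Word document).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document>Hello</document>")
fake_docx = buffer.getvalue()

print(list_members(fake_docx))

# With a real file you would read the bytes from disk first, e.g.
# list_members(open("report.docx", "rb").read())  # hypothetical path
```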