Now the extension is the first difference which everyone sees and is arguably the first difference to be noticed. It is simply text appended to the end of the filename. In terms of image formats you might have BMP, PNG, GIF, JPG, TIFF, WEBP, etc...
The extension though is only related to the file name, it is NOT related to the actual data of the file. The file name doesn't reside as data in the image, if you look at the file in a hex editor you will not find the filename. The filename and extension is for allocation in the file system used. If you open the allocation table of your drive you then will see it there. So changing the extension doesn't even change a single bit in the image file.
The header of an image format or any file format specification is actually in the data. Changing this might or might not affect the use of the file depending on where in the header you mess with. There are several parts in the header and each format has a different type and number of headers but typically the Magic/Signature is the first thing you will see.
The Magic is the first few bytes which identifies what format the file is in and in some cases provide metadata. Parsing this will allow the ability to know the file's format but even more importantly its structure. Here you can see the Magic between the same image but one in JPG and one in PNG.
I'm not going to explain the entire header of these two formats but I will present them so that the difference can be seen.
Note: JPG has several different headers depending on options and type. You can see from the image I posted above that the JPG used here is of the EXIF type.
Now we can see that the structures of the image is completely different. A JPG decoder can not interpret a meaningful image if it tried to decode a PNG and vice versa.
Even if the headers were the same the inner workings of the two formats would be completely different which leads use to the encoding and decoding of the image data.
JPG works by use of huffman tables and quantization tables. These tables are computed and is what gives the image the compression and lossy factors. This is also what allows different encoded images to be decoded the same way.
PNG on the other hand uses compressed chunks. Chunks are used so that PNG can be presented in a streamed manner but the image is encoded typically using a compression algorithm of the lossless kind. This gives per-pixel integrity at the cost of usually producing a larger file size (than JPG).