I have three questions. First, are .pnml files created by the compiler?
No, they are the actual source code.
The compiler takes a single NML file, which is not very convenient if you make many similar entities. As a result, people started writing one file per entity (a .pnml file) and added cpp (the C pre-processor) as a pre-processing step, which merges all those small files into the single .nml file the compiler takes. As a bonus, you also get the C macro processor this way, for generating unique labels, replacing hard-coded numbers with readable names, etc.
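To make the merge step more concrete, here is a minimal Python sketch of what cpp does for such a project, assuming a main file that #includes one .pnml file per vehicle. It only mimics two cpp features (#include "file" and simple #define NAME value replacement). The file names, the SPEED_FAST macro and the NML snippets are hypothetical placeholders; a real project simply runs cpp itself.

```python
import re

# Hypothetical in-memory project: a main file pulling in one .pnml per vehicle.
# The NML text is placeholder content, only there to show macro replacement.
FILES = {
    "main.pnml": '#define SPEED_FAST 120\n#include "bus_a.pnml"\n#include "bus_b.pnml"\n',
    "bus_a.pnml": "item(FEAT_ROADVEHS, bus_a) { property { speed: SPEED_FAST; } }\n",
    "bus_b.pnml": "item(FEAT_ROADVEHS, bus_b) { property { speed: SPEED_FAST; } }\n",
}

INCLUDE = re.compile(r'^#include\s+"([^"]+)"\s*$')
DEFINE = re.compile(r"^#define\s+(\w+)\s+(.*)$")


def preprocess(name, files, macros=None):
    """Toy stand-in for cpp: splice in #include'd files and replace #define'd names."""
    if macros is None:
        macros = {}  # shared across included files, like cpp's macro table
    out = []
    for line in files[name].splitlines():
        inc = INCLUDE.match(line)
        if inc:
            # Recursively merge the included file at this position.
            out.append(preprocess(inc.group(1), files, macros))
            continue
        dfn = DEFINE.match(line)
        if dfn:
            # Remember the macro; the #define line itself is not emitted.
            macros[dfn.group(1)] = dfn.group(2)
            continue
        for key, value in macros.items():
            # Replace whole-word occurrences only; the lambda avoids escape handling.
            line = re.sub(rf"\b{re.escape(key)}\b", lambda _m, v=value: v, line)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(preprocess("main.pnml", FILES))
```

The output is one merged text with both items and every SPEED_FAST replaced by 120, which is the kind of single file the compiler expects.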
Next, what is an easy way to determine spriteset offsets, like the [x, x, y, y] offsets? I find that the hardest part of coding, as the rest is kind of like Java syntax.
I have no experience with it myself, but afaik people make a graphics template for each vehicle length, with a fixed position for each view. That way, the offsets are always the same; only the name of the graphics file changes. Figure out the offsets once, and by copy/pasting them they work for every vehicle of that length. Better still, use the macro processor from above or nmlc's own templates: the copies are then generated on every build, so if you ever need to change a template, everything changes along with it automagically, without having to update all your pasted code.
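As an illustration of "figure out the offsets once", here is a small Python sketch that keeps one view layout as data and emits a single NML template plus one spriteset per vehicle. It assumes NML's real-sprite order [left, top, width, height, x_offset, y_offset]. All numbers, the CELL_WIDTH spacing, the gfx/ paths and the tmpl_bus/bus_a names are made-up placeholders, not values from any real template sheet.

```python
# Hypothetical layout: (width, height, x_offset, y_offset) for each of the 8 views.
# These are placeholder numbers; work them out once for your own template sheet.
VIEWS = [
    (8, 18, -3, -10),
    (20, 16, -14, -9),
    (28, 12, -14, -6),
    (20, 16, -6, -9),
    (8, 18, -3, -10),
    (20, 16, -14, -9),
    (28, 12, -14, -6),
    (20, 16, -6, -9),
]
CELL_WIDTH = 40  # hypothetical horizontal distance between views on the sheet
TOP = 0          # hypothetical row the views sit on


def template_nml(name):
    """Emit one NML template; each view's box position follows from its index."""
    lines = [f"template {name}() {{"]
    for i, (w, h, xoff, yoff) in enumerate(VIEWS):
        lines.append(f"    [{i * CELL_WIDTH}, {TOP}, {w}, {h}, {xoff}, {yoff}]")
    lines.append("}")
    return "\n".join(lines)


def spriteset_nml(vehicle, template):
    """Emit a spriteset reusing the shared template; only the file name differs."""
    return f'spriteset(spriteset_{vehicle}, "gfx/{vehicle}.png") {{ {template}() }}'


if __name__ == "__main__":
    print(template_nml("tmpl_bus"))
    for vehicle in ["bus_a", "bus_b"]:
        print(spriteset_nml(vehicle, "tmpl_bus"))
```

Since every spriteset just calls the same template, correcting one entry in VIEWS fixes the offsets for every vehicle of that length on the next build.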
Last, why doesn't NML come with something like grfcodec, which decodes grf files? NML only has an encoder.
grfcodec doesn't do much; it mostly transforms numbers written as text (so you can edit them in an editor) into binary values. There is some minor extra work, such as computing a few lengths, but there is a simple 1-to-1 mapping between each number in the input and a binary value in the output. As you can imagine, doing the reverse isn't terribly difficult either. In addition, at the time grfcodec was written, people needed to examine and modify existing grfs, so converting back and forth between both forms was a fundamental requirement for grfcodec.
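To show why the reverse direction is easy for a grfcodec-style tool, here is a toy Python round-trip: hex numbers written as text become bytes, a length is computed and prepended, and decoding just undoes each step. The 2-byte little-endian length prefix is a hypothetical stand-in for "computing a few lengths"; this is not grfcodec's actual file format.

```python
def encode(text):
    """Turn whitespace-separated hex numbers into bytes, prefixed by their count.

    Raises ValueError if a token is not hex or does not fit in one byte.
    """
    payload = bytes(int(tok, 16) for tok in text.split())
    # Hypothetical length field: 2 bytes, little-endian.
    return len(payload).to_bytes(2, "little") + payload


def decode(blob):
    """Reverse of encode(): read the length, then print each byte back as hex."""
    length = int.from_bytes(blob[:2], "little")
    payload = blob[2:2 + length]
    return " ".join(f"{b:02X}" for b in payload)


if __name__ == "__main__":
    source = "00 08 01 FF 10"
    blob = encode(source)
    print(blob.hex(" "))
    print(decode(blob))
```

Because every input number maps to exactly one output byte, decode() recovers the text exactly; only the letter case of the hex digits is normalized.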
For NML, the story is different. Its aim is to simplify coding a grf by acting as a high-level language, much like Java is so much simpler to write than bytecode. The price is that the gap between NML source code and the actual grf binary values is much larger. For example, NML allows mostly arbitrary expressions, which don't exist in grf, so NML generates heaps of grf code for a single line of NML. In addition, some grf code doesn't exist in NML at all: grf action C and advanced uses of action 6 cannot be expressed in NML, and there is simply no way to write down an action C in NML.
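A toy illustration of why one NML line turns into many grf instructions: the Python sketch below lowers a single arithmetic expression into a list of steps that each perform exactly one operation. Python's own expression syntax stands in for NML expressions, and the "t0 = MUL a, b" step format is a hypothetical instruction set, not the real grf encoding.

```python
import ast

# Hypothetical one-operation-per-step instruction names.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.FloorDiv: "DIV"}


def compile_expr(source):
    """Lower one arithmetic expression into single-operation steps."""
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp):
            # Compute both operands first, then one step combining them.
            left = emit(node.left)
            right = emit(node.right)
            op = OPS.get(type(node.op))
            if op is None:
                raise ValueError(f"unsupported operator: {type(node.op).__name__}")
            dest = f"t{len(code)}"  # fresh temporary for this step's result
            code.append(f"{dest} = {op} {left}, {right}")
            return dest
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    emit(ast.parse(source, mode="eval").body)
    return code


if __name__ == "__main__":
    for step in compile_expr("(speed * 3 + capacity) // 2 - age"):
        print(step)
```

Here one source line becomes four steps (MUL, ADD, DIV, SUB), each with its own temporary, and real grf output also has to encode variable access and result handling around such steps.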
This means some grfs cannot be faithfully decoded to NML at all, since they may contain grf code that has no NML equivalent. Even if you limit yourself to the subset that can be expressed, you have a heap of grf code on one side that must be recognized as some (unknown) lines of NML, which is far from trivial: unlike in grfcodec, there is no simple 1-to-1 mapping here. This is a decompiler problem, which is somewhat solvable, but it's a lot of work, and the results are not that good.
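The information loss a decompiler faces can be shown with constant folding, one ordinary way a compiler discards source detail. In this minimal Python sketch, Python expressions again stand in for NML, and the folder is illustrative rather than a description of nmlc internals. Three different source lines compile to the same output, so nothing in the output tells a decompiler which one was written.

```python
import ast
import operator

# Operators the toy folder knows how to evaluate at compile time.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace every constant-only binary operation by its computed value."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        op = OPS.get(type(node.op))
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            value = op(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node


def fold_constants(source):
    """'Compile' an expression by folding constants; needs Python 3.9+ for ast.unparse."""
    tree = ast.parse(source, mode="eval")
    return ast.unparse(ConstantFolder().visit(tree))


if __name__ == "__main__":
    for src in ["speed * (2 * 30)", "speed * (120 // 2)", "speed * 60"]:
        print(f"{src!r:24} -> {fold_constants(src)}")
```

A decompiler given "speed * 60" can only guess which of the three lines the author actually wrote, and real compilers throw away far more than this (names, comments, macros, templates).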
A second consideration is that source code is far more useful than anything decompiled, and it is generally readily available. For example, you get the collection of .pnml files, neatly organized in directories with all the vehicles in their templates, or a bunch of Python code with easy-to-edit property tables. Since most projects are open source anyway, it's far easier, and gives better results, to just ask the author for the original commented source code than to try to reverse-engineer a grf.