NMR software and file formats
05 Apr 2023
I recently helped my previous PI analyze some NMR (Nuclear Magnetic Resonance) data for some molecules. I am used to analyzing NMR data with Sparky. In this post I discuss some NMR-related software and file formats.
Short background
The raw data are usually obtained from Bruker or Varian NMR machines. Varian was trying to produce a 1 GHz NMR machine and failed. It went bankrupt and was bought by Agilent. The software from Bruker is called TopSpin. TopSpin is free for academic use. The software from Varian is called Vnmrj. It was not free, but there is now an open source version called OpenVnmrj.
UCSF format
The ucsf format is the format used by Sparky and its successor Poky.
Conversion from Bruker
A ucsf file can easily be produced from processed Bruker data with the following command:
bruk2ucsf.exe .\USF_BH_20230328_TOCSY\27\pdata\1\2rr usf_bh_20230328_tocsy.ucsf
bruk2ucsf.exe comes with the Sparky software distribution.
The raw data from Bruker contain only a ser file. We have to use TopSpin to process the raw data to get the 2rr file.
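When several experiments need converting, the command above can be assembled from the dataset folder, the experiment number and the processing number. The following is only a sketch: the function names are hypothetical, and the `<dataset>/<expno>/pdata/<procno>/2rr` layout is inferred from the example path in the command above. It also assumes bruk2ucsf.exe is on the PATH.

```python
import subprocess
from pathlib import Path


def bruk2ucsf_command(dataset_dir, expno, procno, output, exe="bruk2ucsf.exe"):
    """Build the argument list for bruk2ucsf (hypothetical helper)."""
    # Processed 2D real data is expected at <dataset>/<expno>/pdata/<procno>/2rr.
    processed = Path(dataset_dir) / str(expno) / "pdata" / str(procno) / "2rr"
    return [exe, str(processed), str(output)]


def convert(dataset_dir, expno, procno, output):
    """Run the conversion, failing early if TopSpin has not produced 2rr yet."""
    cmd = bruk2ucsf_command(dataset_dir, expno, procno, output)
    if not Path(cmd[1]).is_file():
        raise FileNotFoundError(
            f"{cmd[1]} not found; process the ser file in TopSpin first"
        )
    subprocess.run(cmd, check=True)
```

Calling `convert("USF_BH_20230328_TOCSY", 27, 1, "usf_bh_20230328_tocsy.ucsf")` would then run the same command as shown above.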
Conversion from Vnmrj
Getting the ucsf format from Vnmrj is a little more complicated. You need to have the Vnmrj software installed on your system. The current OpenVnmrj can be installed on Ubuntu 20.04. After opening your data in OpenVnmrj, run trace='f1' and flush in Vnmrj. Then run:
vnmr2ucsf exp7/procpar exp7/datdir/phasefile exp7.ucsf H H
to generate the ucsf file.
Here exp7 means you have opened your data in experiment 7 in your Vnmrj software.
The exp7 folder should be in the ~/vnmrsys/data folder.
UCSF data format
Detailed information about the data format can be found here.
Let’s analyze the headers of a C13HSQC file:
00000000 55 43 53 46 20 4E 4D 52 00 00 02 01 00 02 ** ** ** ** ** 00 00 00 00 4D 6F 6E 20 41 UCSF NMR......*****....Mon A
0000001C 70 72 20 30 33 20 31 31 3A 33 32 3A 33 34 20 32 30 32 33 00 00 00 00 00 00 00 00 00 pr 03 11:32:34 2023.........
00000038 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
00000054 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 01 B4 00 00 00 00 .....................@......
0000008C 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
000000A8 00 00 00 00 00 00 00 00 00 00 00 00 31 33 43 00 00 00 00 00 00 00 04 00 00 00 04 00 ............13C.............
000000C4 00 00 00 80 42 FB 8B 64 46 A2 20 D8 42 96 03 2B 00 00 00 00 00 00 00 00 00 00 00 00 ....B..dF. .B..+............
000000E0 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
000000FC 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
00000118 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
00000134 31 48 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00 00 80 43 FA 13 85 45 DF 36 DB 1H..................C...E.6.
00000150 40 A0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 @..@........................
0000016C 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
00000188 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............................
000001A4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 C6 39 F0 00 C7 22 5F 00 C7 74 0B 00 .................9..."_..t..
Position (decimal byte offset) | Hex | Meaning |
---|---|---|
00-09 | 55 43 53 46 20 4E 4D 52 00 00 | UCSF NMR |
10-13 | 02 01 00 02 | 2 dimensions, 1 component (real data), format version 2 (the current format) |
180(B4)-185 | 31 33 43 00 00 00 | 13C (Carbon 13) |
188-191 | 00 00 04 00 | 1024 data points |
196-199 | 00 00 00 80 | tile size 128 |
200-203 | 42 FB 8B 64 | 125 MHz |
204-207 | 46 A2 20 D8 | Spectrum width 20752.4219 Hz |
208-211 | 42 96 03 2B | Center of spectrum 75 ppm |
308(134)-313 | 31 48 00 00 00 00 | 1H (proton) |
316-319 | 00 00 04 00 | 1024 data points |
324-327 | 00 00 00 80 | tile size 128 |
328-331 | 43 FA 13 85 | 500 MHz |
332-335 | 45 DF 36 DB | Spectrum width 7142.857 Hz |
336-339 | 40 A0 00 40 | Center of Spectrum 5 ppm |
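The values in the table can be checked by decoding the hex bytes directly. A minimal sketch, assuming the numeric fields are big-endian 32-bit integers and IEEE 754 floats (which is consistent with the values listed above); the helper names `be_float` and `be_int` are made up for this example.

```python
import struct


def be_float(hex_bytes):
    """Decode 4 hex-encoded bytes as a big-endian 32-bit float."""
    return struct.unpack(">f", bytes.fromhex(hex_bytes))[0]


def be_int(hex_bytes):
    """Decode 4 hex-encoded bytes as a big-endian 32-bit signed integer."""
    return struct.unpack(">i", bytes.fromhex(hex_bytes))[0]


# Hex strings copied from the table above.
carbon = {
    "npoints": be_int("00000400"),
    "tile_size": be_int("00000080"),
    "frequency_mhz": be_float("42FB8B64"),
    "width_hz": be_float("46A220D8"),
    "center_ppm": be_float("4296032B"),
}
proton = {
    "npoints": be_int("00000400"),
    "tile_size": be_int("00000080"),
    "frequency_mhz": be_float("43FA1385"),
    "width_hz": be_float("45DF36DB"),
    "center_ppm": be_float("40A00040"),
}

for name, axis in (("13C", carbon), ("1H", proton)):
    print(name, axis)
```

Decoded this way, the frequencies come out slightly above the nominal 125 MHz and 500 MHz, and the centers are very close to 75 ppm and 5 ppm.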
The header for carbon 13 starts at 180. The header for proton starts at 180 + 128 = 308. The data starts at 180 + 128 + 128 = 436.
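The offsets above are enough to read the axis headers programmatically. The sketch below is not Sparky's own code: the `Axis` class and `parse_ucsf_header` are names invented here, and the field positions inside each 128-byte axis header (+8 for the number of points, +16 for the tile size, +20 for the three floats) are taken from the table above.

```python
import struct
from dataclasses import dataclass

FILE_HEADER_SIZE = 180  # the first axis header starts here
AXIS_HEADER_SIZE = 128  # one axis header per dimension


@dataclass
class Axis:
    nucleus: str
    npoints: int
    tile_size: int
    frequency_mhz: float
    width_hz: float
    center_ppm: float


def parse_ucsf_header(buf):
    """Return (axes, data_offset) for the raw bytes of a ucsf file."""
    if buf[:8] != b"UCSF NMR":
        raise ValueError("not a UCSF file")
    naxis = buf[10]  # number of dimensions
    axes = []
    for i in range(naxis):
        base = FILE_HEADER_SIZE + i * AXIS_HEADER_SIZE
        # Nucleus name is a NUL-padded 6-byte string.
        nucleus = buf[base:base + 6].split(b"\x00", 1)[0].decode("ascii")
        (npoints,) = struct.unpack_from(">i", buf, base + 8)
        (tile_size,) = struct.unpack_from(">i", buf, base + 16)
        freq, width, center = struct.unpack_from(">3f", buf, base + 20)
        axes.append(Axis(nucleus, npoints, tile_size, freq, width, center))
    data_offset = FILE_HEADER_SIZE + naxis * AXIS_HEADER_SIZE
    return axes, data_offset


# Usage with a hypothetical file name:
#   with open("c13hsqc.ucsf", "rb") as fh:
#       axes, data_offset = parse_ucsf_header(fh.read(1024))
```

For the 2D file above, `data_offset` comes out as 436, matching the calculation in the text.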
The number of points along each axis is 1024, and the tile size is 128. So the number of tiles along each axis is 1024 / 128 = 8.
The tiles are saved as a 1D vector. If we reorganize this 1D vector into a 2D matrix, the tile order looks like this:
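The same arithmetic gives the expected size of the whole file, which is a handy sanity check after a conversion. This sketch assumes each data point is a 4-byte float and that a partial tile at the edge of an axis is padded to a full tile (hence the rounding up); neither assumption matters for this file, where 1024 divides evenly by 128. The function names are illustrative only.

```python
def tiles_per_axis(npoints, tile_size):
    """Number of tiles along one axis, rounding up for a partial last tile."""
    return -(-npoints // tile_size)  # ceiling division on integers


def expected_file_size(axes, header_size=180, axis_header_size=128, bytes_per_point=4):
    """axes is a list of (npoints, tile_size) pairs, one per dimension."""
    n_tiles = 1
    points_per_tile = 1
    for npoints, tile_size in axes:
        n_tiles *= tiles_per_axis(npoints, tile_size)
        points_per_tile *= tile_size
    data_bytes = n_tiles * points_per_tile * bytes_per_point
    return header_size + len(axes) * axis_header_size + data_bytes


hsqc_axes = [(1024, 128), (1024, 128)]
print(tiles_per_axis(1024, 128))      # 8 tiles along each axis
print(expected_file_size(hsqc_axes))  # 436 header bytes + 64 tiles of 65536 bytes
```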
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 |
49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 |
57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 |
The data in each tile are also saved in row order. So if each tile had 64 data points and we reorganized them into a 2D matrix, the layout would also look like the table above.
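Put together, the two levels of row ordering (tiles in row order, then points inside each tile in row order) determine where any point of the spectrum sits in the flat data vector. Below is a minimal pure-Python sketch of that mapping. The function names are made up, indices are 0-based (so tile 1 in the table above is tile 0 here), and it assumes the matrix dimensions are exact multiples of the tile size.

```python
def tiled_index(row, col, ncols, tile_rows, tile_cols):
    """Position of matrix element (row, col) in the tiled 1D vector."""
    tiles_across = ncols // tile_cols
    tile_r, in_r = divmod(row, tile_rows)  # which tile row, row inside tile
    tile_c, in_c = divmod(col, tile_cols)  # which tile column, column inside tile
    tile_number = tile_r * tiles_across + tile_c
    return tile_number * tile_rows * tile_cols + in_r * tile_cols + in_c


def untile(flat, nrows, ncols, tile_rows, tile_cols):
    """Rebuild the full 2D matrix (list of rows) from the tiled 1D vector."""
    return [
        [flat[tiled_index(r, c, ncols, tile_rows, tile_cols)] for c in range(ncols)]
        for r in range(nrows)
    ]


# Small demonstration: a 4 x 4 matrix stored as four 2 x 2 tiles.
demo = untile(list(range(16)), nrows=4, ncols=4, tile_rows=2, tile_cols=2)
for line in demo:
    print(line)
```

For the 1024 x 1024 spectrum above, the byte offset of a point in the file would be 436 plus 4 times `tiled_index(row, col, 1024, 128, 128)`, again assuming 4-byte values.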