The CSD is a widely used repository of small-molecule organic and metal-organic crystal structures. Structures deposited with the Cambridge Crystallographic Data Centre (CCDC) are publicly available for download at the point of publication or with the depositor's consent. They are also scientifically enriched and included in the database used by software offered by the centre. Targeted subsets of the CSD are freely available to support teaching and other activities.[2]
History
The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at this time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).
The CSD was one of the first numerical scientific databases to begin operations anywhere in the world, and received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the United States, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries. As of 2014, the CSD System was distributed to academics in 70 countries.
During the 1980s, interest in the CSD System from pharmaceutical and agrochemical companies increased significantly. This led to the establishment of the Cambridge Crystallographic Data Centre (CCDC) as an independent company in 1987, with the legal status of a non-profit charitable institution, and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.
Kennard retired as Director in 1997 and was succeeded by David Hartley (1997-2002) and Frank Allen (2002-2008). Colin Groom served as executive director from 1 October 2008[3] to September 2017.[4] Most recently, Juergen Harter was appointed CEO in June 2018.[5]
CCDC software products have diversified into applications of crystallographic data in the life sciences and crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary which covenants all of its profits back to the CCDC.
Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).
Figure: The one millionth structure added to the CSD (CSD ID: XOPCAJ).
The CSD is updated with about 50,000 new structures each year,[8] and with improvements to existing entries. Entries (structures) in the repository are released for public access as soon as the corresponding structure has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD without an accompanying scientific article, in what is known as a CSD Communication.
General statistics about the breadth of CSD holdings are reported periodically, for example in the January 2014 report.[9] As of January 2019, the summary statistics are as follows:[10]
| Query | Structures | % of CSD |
|---|---|---|
| Total # of structures | 995,907 | 100.0 |
| # of different compounds | 900,984 | - |
| # of literature sources | 2,004 | - |
| Organic structures | 431,037 | 43.5 |
| Transition metal present | 478,138 | 48.2 |
| Alkali or alkaline earth metal present | 48,056 | 4.8 |
| Main group metal present | 101,948 | 10.3 |
| 3D coordinates present | 937,809 | 94.6 |
| Error-free coordinates | 926,422 | 98.8 |
| Neutron studies | 2,142 | 0.2 |
| Powder diffraction studies | 4,761 | 0.5 |
| Low/high temp. studies | 503,368 | 50.8 |
| Absolute configuration determined | 28,834 | 2.9 |
| Disorder present in structure | 256,019 | 25.8 |
| Polymorphic structures | 29,817 | 3.0 |
| R-factor < 0.100 | 935,419 | 94.4 |
| R-factor < 0.075 | 845,708 | 85.3 |
| R-factor < 0.050 | 553,042 | 55.8 |
| R-factor < 0.030 | 121,806 | 12.3 |
| No. of atoms with 3D coordinates | 85,791,623 | - |
As of January 2019, the top 25 scientific journals in terms of publication of structures in the CSD repository were:[11]
1. 73,070 structures were reported in Inorg. Chem.
These 25 journals account for 704,541 of the 996,193 structures in the CSD (70.7%).
These data show that most structures are determined by X-ray diffraction, with fewer than 1% of structures being determined by neutron diffraction or powder diffraction. The percentage of error-free coordinates is calculated relative to the number of structures for which 3D coordinates are present in the CSD, not relative to the total number of structures.
The significance of structure factor files is that, for CSD structures determined by X-ray diffraction that have a structure factor file, a crystallographer can verify the interpretation of the observed measurements.
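The choice of denominator can be illustrated with a short calculation. This is a minimal sketch using only the counts quoted in the January 2019 summary statistics; the function and variable names are illustrative and not part of any CCDC tool.

```python
# Counts taken from the January 2019 summary statistics.
total_structures = 995_907
with_3d_coordinates = 937_809
error_free_coordinates = 926_422
neutron_studies = 2_142
powder_studies = 4_761


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Error-free share measured against entries that have 3D coordinates.
error_free_of_3d = percent(error_free_coordinates, with_3d_coordinates)

# The same count measured against every entry gives a lower figure.
error_free_of_all = percent(error_free_coordinates, total_structures)

# Combined share of neutron and powder diffraction studies.
neutron_and_powder = percent(neutron_studies + powder_studies, total_structures)

print(error_free_of_3d, error_free_of_all, neutron_and_powder)
```

The first value reproduces the 98.8% reported for error-free coordinates, while the combined neutron and powder share stays below 1%.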
Growth trend
Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001, 500,000 in 2009,[12][13][14] and 1,000,000 on June 8, 2019.[15] The one millionth structure added to the CSD is the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one.
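One way to quantify "approximately exponential" is to fit a straight line to the logarithm of the milestone counts and read off a doubling time. The following is a minimal sketch, assuming an ordinary least-squares fit over the six milestones listed above; it is an illustration, not an analysis published by the CCDC.

```python
import math

# (year, number of structures) milestones quoted in the text.
milestones = [
    (1977, 25_000),
    (1983, 50_000),
    (1992, 125_000),
    (2001, 250_000),
    (2009, 500_000),
    (2019, 1_000_000),
]


def fit_growth_rate(points):
    """Least-squares slope of ln(count) against year (per-year growth rate)."""
    years = [year for year, _ in points]
    logs = [math.log(count) for _, count in points]
    mean_year = sum(years) / len(years)
    mean_log = sum(logs) / len(logs)
    covariance = sum((y - mean_year) * (l - mean_log) for y, l in zip(years, logs))
    variance = sum((y - mean_year) ** 2 for y in years)
    return covariance / variance


growth_rate = fit_growth_rate(milestones)
doubling_time = math.log(2) / growth_rate  # years needed to double the holdings

print(f"growth rate: {growth_rate:.3f} per year")
print(f"doubling time: {doubling_time:.1f} years")
```

Under these assumptions the fit gives a doubling time of roughly eight years, although the gaps between successive doublings (6, 9, 8 and 10 years) suggest that growth has slowed somewhat relative to a pure exponential.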
Figure: Growth trend of structures in the CSD, 1965-2018.[11]
Number of published structures per year

| Year | # published | Total |
|---|---|---|
| 2018 | 53,429 | 974,653 |
| 2017 | 55,031 | 921,224 |
| 2016 | 54,975 | 866,193 |
| 2015 | 53,610 | 811,218 |
| 2014 | 50,759 | 757,608 |
| 2013 | 48,025 | 706,849 |
| 2012 | 45,199 | 661,121 |
| 2011 | 43,882 | 615,922 |
| 2010 | 41,240 | 572,040 |
| 2009 | 40,627 | 530,800 |
| 2008 | 36,802 | 490,173 |
| 2007 | 36,569 | 453,371 |
| 2006 | 34,713 | 416,802 |
| 2005 | 31,733 | 382,089 |
| 2004 | 27,988 | 350,356 |
| 2003 | 26,287 | 322,368 |
| 2002 | 24,306 | 296,081 |
| 2001 | 21,781 | 271,775 |
| 2000 | 19,998 | 249,994 |
| 1999 | 18,780 | 229,996 |
| 1998 | 17,289 | 211,216 |
| 1997 | 15,896 | 193,927 |
| 1996 | 15,487 | 178,031 |
| 1995 | 13,001 | 162,544 |
| 1994 | 12,290 | 149,543 |
| 1993 | 12,032 | 137,253 |
| 1992 | 10,691 | 125,221 |
| 1991 | 9,941 | 114,530 |
| 1990 | 8,935 | 104,589 |
| 1989 | 7,750 | 95,654 |
| 1988 | 7,644 | 87,904 |
| 1987 | 7,472 | 80,260 |
| 1986 | 6,873 | 72,788 |
| 1985 | 6,911 | 65,915 |
| 1984 | 6,511 | 59,004 |
| 1983 | 5,250 | 52,493 |
| 1982 | 5,233 | 47,243 |
| 1981 | 4,666 | 42,010 |
| 1980 | 4,252 | 37,344 |
| 1979 | 3,876 | 33,092 |
| 1978 | 3,415 | 29,216 |
| 1977 | 3,092 | 25,801 |
| 1976 | 2,735 | 22,709 |
| 1975 | 2,171 | 19,974 |
| 1974 | 2,142 | 17,803 |
| 1973 | 1,991 | 15,661 |
| 1972 | 1,969 | 13,670 |
| 1971 | 1,548 | 11,701 |
| 1970 | 1,261 | 10,153 |
| 1969 | 1,130 | 8,892 |
| 1968 | 975 | 7,762 |
| 1967 | 936 | 6,787 |
| 1966 | 683 | 5,851 |
| 1965 | 656 | 5,168 |
| 1923-1964 | 4,512 | 4,512 |
Note: data for 1923-1964 are aggregated together in the last line of the table.
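The Total column reads as a running sum of the yearly counts, so each row can be checked against the one before it. The following is a minimal sketch of that check on a few rows copied from the table; the function name is illustrative.

```python
# (year, structures published that year, cumulative total) from the table,
# listed in chronological order.
rows = [
    (2011, 43_882, 615_922),
    (2012, 45_199, 661_121),
    (2013, 48_025, 706_849),
    (2014, 50_759, 757_608),
    (2015, 53_610, 811_218),
]


def inconsistent_years(table_rows):
    """Return years whose total differs from previous total + yearly count."""
    flagged = []
    for previous, current in zip(table_rows, table_rows[1:]):
        previous_total = previous[2]
        year, published, total = current
        if previous_total + published != total:
            flagged.append(year)
    return flagged


print(inconsistent_years(rows))
```

For the rows as listed here, the check flags 2013, where 661,121 + 48,025 = 709,146 rather than the stated 706,849. The table alone does not show whether the yearly count or the total is the figure in error; every other pair of adjacent rows in the table is consistent.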
File format
Figure: 3D printed model of benzoic acid, taken from a crystal structure determination, created using coordinates from the Cambridge Structural Database via the CCDC program Mercury. The top model shows a single molecule of benzoic acid. The bottom model shows a hydrogen-bonded dimer.
The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.
The CCDC uses two different codes to distinguish between the deposited dataset and the curated CSD entry. For example, one specific 'CSD Communication' of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327'. This number allows free public access to the data as deposited. From the deposited data, selected information is extracted to prepare the validated and curated CSD entry, which in this case was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help the editors assign chemistry to structures when those representations (e.g. bond types and charge assignments) are missing from the original CIF files submitted.[8] The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
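The two kinds of identifier can be told apart by their shape. The sketch below assumes that a deposition number is the prefix "CCDC" followed by digits (optionally separated by a hyphen or space) and that a refcode is six letters optionally followed by two digits; these patterns are inferred from examples such as 'CCDC-991327' and 'MITGUT' and are not an official CCDC specification. The function name is hypothetical.

```python
import re

# Assumed pattern: "CCDC", optional hyphen or space, then the number.
DEPOSITION_RE = re.compile(r"CCDC[- ]?\d+")
# Assumed pattern: six letters, optionally followed by two digits.
REFCODE_RE = re.compile(r"[A-Z]{6}(\d{2})?")


def classify_identifier(text: str) -> str:
    """Label a string as a deposition number, a refcode, or unknown."""
    cleaned = text.strip().upper()
    if DEPOSITION_RE.fullmatch(cleaned):
        return "deposition number"
    if REFCODE_RE.fullmatch(cleaned):
        return "refcode"
    return "unknown"


for identifier in ["CCDC-991327", "MITGUT", "XOPCAJ", "benzoic acid"]:
    print(identifier, "->", classify_identifier(identifier))
```

A check of this kind only recognises the format of an identifier; it does not confirm that the identifier exists in the database.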
Viewing the data
Figure: 3D printed model of the 1-methyl-2,3,4,5-tetrakis((trimethylsilyl)ethynyl)-1H-pyrrole structure (CSD identifier: XURZAN).
Each data set in the CSD can be openly viewed and retrieved using the free Access Structure service. Through this browser-based service, users can view the data set in 2D and 3D, obtain some basic information about the structure, and download the deposited data set. More advanced search functions and curated information are available through the subscription-based CSD System.
References
"CCDC Homepage". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16.
Groom C, Allen F (July 2009). "CCDC well groomed: an interview with Colin Groom, Executive Director, Cambridge Crystallographic Data Centre, and Frank Allen, Emeritus Fellow". Journal of Computer-Aided Molecular Design. 23 (7): 391–4. Bibcode:2009JCAMD..23..391W. doi:10.1007/s10822-009-9272-5. PMID 19421719.
Farrugia LJ (1 August 1999). "WinGX suite for small-molecule single-crystal crystallography". Journal of Applied Crystallography. 32 (4): 837–838. doi:10.1107/S0021889899006020.