Protein sequences bound to mineral surfaces persist into deep time

  1. Beatrice Demarchi  Is a corresponding author
  2. Shaun Hall
  3. Teresa Roncal-Herrero
  4. Colin L Freeman  Is a corresponding author
  5. Jos Woolley
  6. Molly K Crisp
  7. Julie Wilson
  8. Anna Fotakis
  9. Roman Fischer
  10. Benedikt M Kessler
  11. Rosa Rakownikow Jersie-Christensen
  12. Jesper V Olsen
  13. James Haile
  14. Jessica Thomas
  15. Curtis W Marean
  16. John Parkington
  17. Samantha Presslee
  18. Julia Lee-Thorp
  19. Peter Ditchfield
  20. Jacqueline F Hamilton
  21. Martyn W Ward
  22. Chunting Michelle Wang
  23. Marvin D Shaw
  24. Terry Harrison
  25. Manuel Domínguez-Rodrigo
  26. Ross DE MacPhee
  27. Amandus Kwekason
  28. Michaela Ecker
  29. Liora Kolska Horwitz
  30. Michael Chazan
  31. Roland Kröger
  32. Jane Thomas-Oates
  33. John H Harding  Is a corresponding author
  34. Enrico Cappellini
  35. Kirsty Penkman
  36. Matthew J Collins  Is a corresponding author
  1. University of York, United Kingdom
  2. University of Sheffield, United Kingdom
  3. University of Copenhagen, Denmark
  4. University of Oxford, United Kingdom
  5. Bangor University, United Kingdom
  6. Arizona State University, United States
  7. Nelson Mandela Metropolitan University, South Africa
  8. University of Cape Town, South Africa
  9. New York University, United States
  10. Complutense University of Madrid, Spain
  11. American Museum of Natural History, United States
  12. National Museum of Tanzania, Tanzania
  13. The Hebrew University, Israel
  14. University of Toronto, Canada
  15. University of the Witwatersrand, South Africa
  16. Centre of Excellence in Mass Spectrometry, University of York, United States

Abstract

Proteins persist longer in the fossil record than DNA, but the longevity, survival mechanisms and substrates remain contested. Here, we demonstrate the role of mineral binding in preserving the protein sequence in ostrich (Struthionidae) eggshell, including from the palaeontological sites of Laetoli (3.8 Ma) and Olduvai Gorge (1.3 Ma) in Tanzania. By tracking protein diagenesis back in time we find consistent patterns of preservation, demonstrating authenticity of the surviving sequences. Molecular dynamics simulations of struthiocalcin-1 and -2, the dominant proteins within the eggshell, reveal that distinct domains bind to the mineral surface. It is the domain with the strongest calculated binding energy to the calcite surface that is selectively preserved. Thermal age calculations demonstrate that the Laetoli and Olduvai peptides are 50 times older than any previously authenticated sequence (equivalent to ~16 Ma at a constant 10°C).

https://doi.org/10.7554/eLife.17092.001

eLife digest

The pattern of chemical reactions that break down the molecules that make our bodies is still fairly mysterious. Archaeologists and geologists hope that dead organisms (or artefacts made from them) might not decay entirely, leaving behind clues to their lives. We know that some molecules are more resistant than others; for example, fats are tough and survive for a long time while DNA is degraded very rapidly. Proteins, which are made of chains of smaller molecules called amino acids, are usually sturdier than DNA. Since the amino acid sequence of a protein reflects the DNA sequence that encodes it, proteins in fossils can help researchers to reconstruct how extinct organisms are related in cases where DNA cannot be retrieved.

Time, temperature, burial environment and the chemistry of the fossil all influence how quickly a protein decays. However, it is not clear what mechanisms slow down decay so that full protein sequences can be preserved and identified after millions of years. As a result, it is difficult to know where to look for these ancient sequences.

In the womb of ostriches, several proteins are responsible for assembling the minerals that make up the ostrich eggshell. These proteins become trapped tightly within the mineral crystals themselves. In this situation, proteins can potentially be preserved over geological time. Demarchi et al. have now studied 3.8 million-year-old eggshells found close to the equator and, despite the extent to which the samples have degraded, discovered fully preserved protein sequences.

Using a computer simulation method called molecular dynamics, Demarchi et al. calculated that the protein sequences that are able to survive the longest are stabilized by strong binding to the surface of the mineral crystals. The authenticity of these sequences was tested thoroughly using a combination of several approaches that Demarchi et al. recommend using as a standard for ancient protein studies.

Overall, it appears that biominerals are an excellent source of ancient protein sequences because mineral binding ensures survival. A systematic survey of fossil biominerals from different environments is now needed to assess whether these biomolecules have the potential to act as barcodes for interpreting the evolution of organisms.

https://doi.org/10.7554/eLife.17092.002

Introduction

Unknown mechanisms of survival of proteins into deep time

Ancient protein and DNA sequences are revolutionising our understanding of the past, providing information on phylogeny, migration, evolution, domestication and extinction (Hagelberg et al., 2015; Cappellini et al., 2014). However, the absence of data from warm regions and deep time (Wade, 2015) highlights the fragility of these biomolecules and has so far hampered our ability to answer fundamental evolutionary questions, such as resolving the phylogenetic tree of the genus Homo in Africa. The survival of proteins and DNA in tropical environments and in fossils that go back a few million years (Ma) is deemed extremely unlikely and therefore the impact of the 'biomolecular revolution' in palaeontology and palaeoanthropology has so far been relatively limited.

Claims for exceptional preservation in the fossil record have been put forward in a number of studies (Towe and Urbanek, 1972; Bertazzo et al., 2015; Schweitzer et al., 2013; Cleland et al., 2015), but these have not been satisfactorily substantiated. Morphological (Towe and Urbanek, 1972; Bertazzo et al., 2015), immunological (Schweitzer et al., 2013) and spectroscopic (Bertazzo et al., 2015) evidence of preserved tissues in dinosaurs and other fossils seems to be inconsistent with the observed levels of hydrolysis, dehydration and racemization (Penkman et al., 2013) in intracrystalline proteins from the fossil mollusc shell (Sykes et al., 1995) and eggshell (Brooks et al., 1990). The mechanisms that might allow preservation over palaeontological and geological time scales are also poorly understood: crosslinking, organo-metallic complexing, including with iron, compression/confinement (Logan et al., 1991; Schweitzer et al., 2014), and mineral stabilization (Collins et al., 2000) have all been proposed as mechanisms that enhance the survival of ancient biomolecules.

The role of temperature in accelerating diagenesis

A confounding factor when evaluating the authenticity and antiquity of biomolecular sequences is the geographic area of provenance of the fossils and therefore the combined effect of time and temperature on the extent of degradation. Here we have used kinetic estimates of degradation rates of DNA (Allentoft et al., 2012), collagen in bones (Buckley et al., 2008), and intracrystalline amino acids (Crisp et al., 2013) to normalize their numerical (chronological) ages to thermal age (Wehmiller, 1977) (Figure 1, Figure 1—source data 1, Appendix 1). Thermal age is a measure which enables simple comparison between ancient biomolecular targets by normalising them to an equivalent (thermal) age, allowing all samples to be treated as having experienced a constant temperature of 10°C. Thus samples from cooler sites, which experience slower rates of chemical reaction, will have thermal ages younger than their geochronological age, whilst samples from warmer sites will be thermally ‘older’. Various factors can affect the effective diagenetic temperature experienced by a sample (and therefore impact on its thermal age), from burial depth to seasonal and interglacial / glacial cycles (Wehmiller, 1977; Huang et al., 2000; Eischeid et al., 1995). The greatest absolute ages for recovered DNA (Orlando et al., 2013) (0.7 Ma = 0.08 Ma@10°C) and for protein (Rybczynski, 2013) (3.5 Ma = 0.3 Ma@10°C) sequences are from high latitudes and their survival is consistent with predictions from the kinetic data. Younger samples from more temperate latitudes will have greater thermal ages, yet the oldest of these which has preserved protein (Weybourne Crag: 1.5 Ma = 0.2 Ma@10°C) has a thermal age similar to that of Middle Pleistocene DNA at Sima de los Huesos (0.4 Ma = 0.2 Ma@10°C) (Meyer et al., 2014).

Eggshell peptide sequences from Africa have thermal ages two orders of magnitude older than those reported for DNA or bone collagen.

(A) Sites reporting the oldest DNA and collagen sequences are from high latitude sites compared to ostrich eggshell samples from sites in Africa illustrated in (B) for which the current mean annual air temperatures are much higher. (C) Kinetic estimates of rates of decay for DNA (Lindahl and Nyberg, 1972), collagen (Buckley and Collins, 2011) and ostrich eggshell proteins (Crisp et al., 2013) were used to estimate thermal age assuming a constant 10°C (Figure 1—source data 1; Appendix 1. For Elands Bay Cave and Pinnacle Point the oldest samples are shown). Note log scale on the z-axis: struthiocalcin-1 peptide survival is two orders of magnitude greater than any previously reported sequence, offering scope for the survival of peptide sequences into deep time.

https://doi.org/10.7554/eLife.17092.003
Figure 1—source data 1

Data and calculations for thermal ages reported in Figure 1 and in Supplementary file 1.

https://doi.org/10.7554/eLife.17092.004

Aim of the study: understanding protein survival in ostrich eggshell from hot environments

Here we explore the impact of strong protein binding in biominerals and its effect on sequence survival, by targeting ancient ostrich eggshell (Struthio camelus; Struthioniformes), which is abundant in archaeological and palaeontological sites throughout Africa (Materials and methods). Our aim was to elucidate a mechanism of preservation and to set out a rigorous methodology for the authentication of ancient protein sequences. We isolated and characterised the intracrystalline proteins (Figure 2, Figure 2—figure supplement 1, Appendix 2) and tracked their diagenesis back in time to 3.8 Ma ago. Using a systematic approach, we validated the sequences from each of the eggshell samples analysed using amino acid racemization (AAR), organic volatile compounds, ancient DNA and proteomic analyses. All our results are supported by in-depth analysis of patterns of diagenesis in both samples and blanks as well as the evaluation of potential contamination and carry-over.

Figure 2 with 2 supplements see all
Proteome persistence and patterns of degradation.

(A) Amino acid racemization in ostrich eggshell up to 3.8 Ma old from sites in South Africa and Tanzania. (B) Linear increase of THAA Val D/L values with the log of thermal age. (C) Exponential decrease of the number of identified MS/MS spectra with age (THAA Val D/L). (D) The average hydropathicity of the peptides identified remains stable up to Val D/Ls ~1. Note that Val D/Ls > 1.0 are unexpected and may be due to decomposition processes occurring in the closed system. The intracrystalline proteome composition in modern eggshell does not vary across microstructural layers (Figure 2—figure supplement 1), but patterns of degradation are different between fossil samples and purified proteins degraded at high temperature in the absence of the mineral (Figure 2—figure supplement 2).

https://doi.org/10.7554/eLife.17092.005

Results

Fossil eggshell from Africa, 0–3.8 Ma: provenance and thermal age calculations

Twenty-four eggshell samples were sourced from well-dated sites in South Africa and Tanzania (Figure 1, Table 1): Elands Bay Cave (0.3–16 ka BP, Table 3), Pinnacle Point Caves PP 5/6 and PP 30 (50–80 ka BP and ~150 ka BP, respectively, Table 4), Wonderwerk Cave (1 Ma, Table 5), Olduvai Gorge (1.34 Ma, Table 6) and Laetoli (2.6–4.3 Ma, Table 7). The age and stratigraphy of the oldest fossils, from Laetoli, is well-constrained despite the eggshell fragments being surface finds: their morphology shows no evidence of long-distance transport and the fossil-bearing horizons are well identified within the stratigraphy. The absence of lava flows in stratigraphic proximity also excludes the possibility that these fragments had been exposed to additional heat. The provenance of the fragments from Olduvai Gorge is also secure, as these were found in situ during the excavation of the Bell Korongo site, which overlies directly a volcanic tuff dated by 40Ar/39Ar (Domínguez-Rodrigo et al., 2013).

Table 1

Summary of archaeological and paleontological eggshell samples analysed in this study. See also Tables 37.

https://doi.org/10.7554/eLife.17092.008
SiteApproximate age range (ka)Approximate thermal age range (ka@10°C)Number of specimens
Elands Bay Cave0.3–160.5–458
Pinnacle Point 5/650–150120–4708
Wonderwerk~1000~36001
Olduvai~1340~163004
Laetoli2600–43008900–204003
Table 2

Binding of proteins to the (10.4) calcite surface. The binding energies calculated as (a) mean for the full protein (by minimization, see Appendix 3); (b) for four individual domains within the proteins (by molecular dynamics [ovocleidin]; by minimization [struthiocalcin]); (c) for the four domains treated as peptides (by molecular dynamics, see Appendix 3).

https://doi.org/10.7554/eLife.17092.009
OC-17SCA-1SCA-2
Charge on the protein+7 (balanced
by Cl-)
−11 (balanced by Na+, Ca2+)−10 (balanced byCa2+)
Binding energy (mean): kJ mol−1−197 ± 22−142 ± 33
Binding energy (domains): kJ mol−1−422 ± 43−423 ± 42 (YHHGEEEEDVWI)
−611 ± 44 (YSALDDDDYPKG)
−255 ± 72 (SDSEEEAGEEVW)
−231 ± 68 (ASIHSEEEHQAIV)
Binding energy (peptides only) kJ mol−1−142 ± 19 (YHHGEEEEDVWI)
−219 ± 24 (YSALDDDDYPKG)
−131 ± 32 (SDSEEEAGEEVW)
−122 ± 41 (ASIHSEEEHQAIV)
Water molecules displaced21.320.223.1
Estimated binding free energy: kJ mol−1−188 ± 37−159 ± 24−99 ± 39
Residence times
for water (ps)
[average surface bound water molecule = 120 ps]
130 ± 3 (YHHGEEEEDVWI)
135 ± 3 (YSALDDDDYPKG)
124 ± 4 (SDSEEEAGEEVW)
123 ± 5 (ASIHSEEEHQAIV)
Table 3

Summary of samples from Elands Bay Cave, South Africa. The stratigraphic layers have been independently dated by radiocarbon. Unpublished uncalibrated dates provided by J. Parkington. Date calibration was performed with OxCal v.4.2 (Ramsey, 2009. Calibration curves: IntCal13 for dates obtained on charcoal; Marine13 for dates obtained on shells/crayfish, DeltaR = 93 ± 28 [Dewar et al., 2012]). Age estimates for undated layers based on estimating the median (mid-point) of two dates obtained on layers bracketing the layer with OES samples.

https://doi.org/10.7554/eLife.17092.010
LOTNEaarLayerAge (cal BP) 95.4%Material used for 14C dating/notes
18686887Kaunda<323 (estimate)Layer above dates on layer NKOM
18726888George Best322–1008Layer between dates on layers NKOM and EDDI
18666889D. Lamour906–2282Layer between dates on layers EDDI and LARM
18496891Maroon Robson8773 ± 125Charcoal
18506893Nero8748–10096Layer between dates on layers Maroon Robson / Burnt Robeson
18236896Crayfish11545 ± 441Crayfish
18196899Smoke12589 ± 104Charcoal
18406907OBS 115208–15940Layer between dates on layers Smoke and SOSE
Table 4

Sample details for sub-fossil OES analysed for LC-MS/MS; from Pinnacle Point, South Africa. Stratigraphic information and weighted mean OSL age estimates (ka) for PP 5–6 (Karkanas et al., 2015) and PP 30 (Rector and Reed, 2010).

https://doi.org/10.7554/eLife.17092.011
SiteLOTNEaarArchaeological sample informationStratigraphic aggregateAge (ka)
PP5-646137676Plotted Find 102627, Lot 3151RBSR51 ± 2
PP5-646497283Plotted Find 165702, Lot 8038SGS64 ± 3
PP5-646717316Specimen 273467, Lot 3255SADBS71 ± 3
PP5-646057198Specimen 273489, Lot 3277SADBS71 ± 3
PP5-646527286Plotted Find 178331, Lot 8172ALBS72 ± 3
PP5-646757320Specimen 273514, Lot 7980LBSR81 ± 4
PP 3046837328Specimen 66008, Lot 1795Single horizon~151
PP 3046977342Specimen 65168, Lot 1750Single horizon~151
Table 5

Sample details for sub-fossil OES samples from Wonderwerk Cave, South Africa. Ages based on cosmogenic isotope burial dating and magnetostratigraphy, from Matmon et al. (2012).

https://doi.org/10.7554/eLife.17092.012
LOTNEaarStratumIndependent age (Ma)
1442610581ME46, SPF#4390, Exc. 1, stratum 10, square Q33, depth 15–20 cm1.07–0.99
Table 6

Sample details for fossil OES samples from Olduvai, Tanzania.

https://doi.org/10.7554/eLife.17092.013
LOTNEaarLocality/StratumIndependent age (Ma)
1557510955Sample BK09-31501.338 ± 0.024
1557810958Sample BK10-53091.338 ± 0.024
1557910959Sample BK09-26271.338 ± 0.024
1558210962Sample BK09-27061.338 ± 0.024
Table 7

Sample details for fossil OES samples from Laetoli, Tanzania. Ages of the strata and localities (40Ar/39Ar) from Deino (2011). LOT 13901 is attributed to Struthio camelus. LOTs 13902 and 13898 are attributed to Struthio kakesiensis (Harrison and Msuya, 2005).

https://doi.org/10.7554/eLife.17092.014
LOTNEaarLocality/StratumIndependent age (Ma)
1390110574Loc 15, Upper Ndolanya Beds~2.66
1390210573Loc 10 West, Upper Laetolil Beds~3.8–3.85
1389810575Kakesio 1−6, Lower Laetolil Beds~3.85 -> 4.3

The chronological ages of the samples were normalised to thermal ages: the mean annual air temperature (MAT) for each site was estimated from the NOAA NCDC GCPS monthly weather station (Eischeid et al., 1995; Karl et al., 1990) and borehole data (Huang et al., 2000; National Climatic Data Center (NCDC), 2012) (Appendix 1—table 1). Samples on the surface or buried at shallow depth will have experienced an effective temperature which is higher than the MAT, as rates of reaction scale exponentially with temperature (Wehmiller, 1977). The greater the seasonal range at the site, the older the thermal age will be, but the effect of seasonal fluctuations will be mitigated by burial depth, which dampens temperature changes. Holocene sites which today have a MAT of exactly 10°C will have been cooler in the past 500 years due to recent anthropogenic warming. In this study, we used borehole temperature estimates (Huang et al., 2000) or long-term historic records (Eischeid et al., 1995) to counter this effect. Pre-Holocene samples from sites which today have an MAT of 10°C will have an even younger thermal age due to the reduction in temperature during glacial periods. This retards the rate of chemical degradation, and therefore slows the advance of thermal age. While we did not correct for seasonal fluctuation, a correction was applied for altitude. The long-term temperature model of Hansen et al. (2013), scaled to local or regional estimates of present day values and predicted temperature decline at the last glacial maximum (LGM), was used to project MATs from present day to the time of deposition (Appendix 1—tables 1, 2, 4).

Ostrich eggshell protein degradation was compared with the extent of degradation of DNA and bone collagen detected in a variety of Northern Hemisphere sites (Figure 1). Published kinetic parameters for the degradation of the molecules (Appendix 1—table 3; [Crisp et al., 2013; Lindahl and Nyberg, 1972; Holmes et al., 2005]) were used to calculate the relative rate difference over a given interval of the long-term temperature record and to quantify the offset from the reference temperature of 10°C, thus estimating the thermal age in years@10°C for each sample (Figure 1C). It is clear that Northern Hemisphere samples are thermally younger than their chronological age (e.g. Ellesmere Island is ~0.02 Ma@10°C), while the age of the eggshell samples considered here increases, e.g. the 3.8 Ma sample from Laetoli and the 1.34 Ma Olduvai samples are estimated to have thermal ages of ~16 Ma@10°C (Appendix 1—table 4; Figure 1—source data 1). The difference in chronological age between our two oldest sites is therefore minimised by the effect of temperature, which is dampened in Laetoli due to the greater altitude relative to Olduvai.

Eggshell contains proteins (struthiocalcin-1 and -2) that bind very strongly to the calcite surface: good candidates for long-term survival?

This sample set, spanning the last ~16 Ma@10°C (Table 1), was chosen in order to explore patterns of diagenesis and protein survival using a well-established experimental approach that can isolate the intracrystalline fraction of proteins enclosed in biominerals, including ostrich eggshell (bleaching; Crisp et al., 2013). The intracrystalline fraction of avian eggshell typically contains C-lectins; in ostrich these are struthiocalcin 1 & 2 (SCA-1 & SCA-2) (Mann and Siedler, 2004). The eggshell proteins were characterised in terms of their amino acid composition across microstructural layers (Appendix 2, sections A and B) and the main proteins sequenced and identified (Appendix 2, section C) in modern eggshell, revealing uniform composition across the eggshell layers. Therefore, samples of the archaeological and paleontological eggshell, usually recovered in a fragmentary state, can be considered to be representative of the overall proteome.

The crystallography of SCA-1 (Ruiz-Arellano et al., 2015) reveals a similar overall structure to ovocleidin-17 (OC-17), which has previously been proposed to play a catalytic role in the calcification of chicken eggshell via the positively charged cluster of arginine residues interacting with the carbonates on the (10.4) calcite surface (Freeman et al., 2010). OC-17 is, however, absent in ostrich; instead, SCA-1 and 2 are negatively charged (Table 2) and thus likely to bind to calcium ions.

A molecular dynamics (MD) study of the binding of whole SCA molecules at the mineral surface allowed the strongest binding regions of SCA-1 and SCA-2 to be identified, two for each of the two proteins (Appendix 3). In MD simulations the four peptide sequences that cover the binding regions (Table 2) were moved close to the (10.4) calcite surface from aqueous (bulk) solution to determine their respective binding energies (Appendix 3). All four peptides showed negative binding energies, indicating it was energetically favourable for them to bind to the calcite surface, rather than to remain in solution. SCA-1 bound more strongly than SCA-2 and the binding energies for all four peptides had the same relative order as in the simulations with full proteins. This indicates that the peptides operate as effective proxies for the binding of SCA. The differences between bindings of the different peptides are probably due to the individual amino acids and the primary structure of the peptide enabling favoured binding configurations.

When a molecule binds at the surface there will also be changes in entropy - an entropy loss for the molecule as it becomes bound and an entropy gain as water molecules on the surface are displaced. Given only one molecule binds, compared to the displacement of multiple water molecules, this will be an entropically favourable process. We have previously estimated the entropy associated with the water molecules and use this as a correction to the internal energy to estimate the free energy of binding (Freeman and Harding, 2014). These estimated free energy values (including the influence of water displacement) demonstrate the same trends as the configurational energy, since the number of water molecules displaced in all cases is similar.

The structure of the water close to the interface is also more ordered than bulk water and has lower energy. Thus, when hydrolysis of the bound protein or peptide occurs, it must react with the stabilized water at the interface, not the water in the bulk. This will raise the barrier to hydrolysis and thus promote the survival of the sequence (see also the discussion below). We would therefore expect that the stronger the peptide binding, the more likely the sequence is to survive in the fossil record, as it is best stabilized by its interactions with the mineral surface and must react with stabilized water. The MD simulations thus predict that the YSALDDDDYPKG sequence, with the lowest binding energy (Table 2) will survive the longest.

Tracking protein breakdown in fossil ostrich eggshell: a multi-analytical approach to validate ancient sequences

For fossil eggshell samples the extent of degradation, quantified by chiral amino acid analysis (AAR), shows that both hydrolysis and racemization increase with time, and that racemic equilibrium is reached in samples older than 1 Ma (~3.6 Ma@10°C; Figure 2; and Appendix 4—table 1 and 2). As degradation proceeds, the complexity of the proteome decreases, until only SCA-1 and SCA-2 are detected by LC-MS/MS in the oldest samples (Supplementary file 1). These two proteins are extremely well preserved in samples up to 150,000 years old, but by 3.8 Ma few peptides are recovered (Figure 3, Figure 3—figure supplement 1 and 2). A total of 22 peptide sequences was recovered from SCA-1 & SCA-2 in samples from Laetoli (Appendix 5, section A; Supplementary file 2), consistent with the idea that dehydration, in addition to mineral binding, may also play a role in retarding degradation of non-binding peptides (Collins and Riley, 2000). However, 80% of the spectra, consistently identified in ten independent LC-MS/MS analyses of three ostrich eggshell samples from Laetoli were assigned to charged species that contained the four Asp residues found in the peptide YSALDDDDYPKG. The survival of this Asp-rich peptide region is not limited to the samples from Laetoli; the eggshells from Olduvai (~16 Ma@10°C) and Wonderwerk (~3.6 Ma@10°C) also show that this region of SCA-1 is preferentially preserved.

Figure 3 with 2 supplements see all
Survival of struthiocalcin 1 and struthiocalcin 2 peptides.

Over time (and increasing THAA Val D/L values) the spectral count decreases as degradation progresses. Blue bars highlight the peptides investigated computationally (represented by the filled atoms in the models). Highly degraded samples (Val D/L ~0.8–1.1) preserve the DDDD-containing peptide. The full time series is shown for SCA-1 in Figure 3—figure supplement 1 and for SCA-2 in Figure 3—figure supplement 2.

https://doi.org/10.7554/eLife.17092.015

This peptide does not survive in the absence of the mineral, as shown by the artificial degradation experiments we conducted on purified eggshell proteins heated at 140°C in water (Figure 2—figure supplement 2, Appendix 4, sections B and C). Indeed, the same region of the protein was too flexible to be determined when the crystallographic structure of pure SCA-1 was solved (Ruiz-Arellano et al., 2015). The patterns of degradation of the same proteins heated in water vs after demineralisation of the eggshell mineral also differ significantly from each other. It is noteworthy that the hydropathicity calculated for the surviving peptides decreases in eggshell mineral and increases in water (Figure 2 and Figure 2—figure supplement 2), consistent with the hypothesis that mineral binding plays a crucial role in the survival of selected peptide sequences.

The authenticity of the peptide sequences recovered in the oldest samples was thoroughly assessed (Appendix 5). The amino acid concentration was analysed in all bleached eggshell samples and controls (procedural blanks); concentrations in the blanks were negligible, while the samples retain the original organic fraction within the intracrystalline environment (Figure 4A). In addition, the presence of volatile organic compounds in 2.7 Ma ostrich eggshell demonstrates the stability of ostrich eggshell as a closed system (Appendix 5, Section E). Ratite eggshell has previously proven to be an excellent source of ancient DNA (Oskam et al., 2010) but, unsurprisingly, NGS sequencing failed to recover avian DNA from the Laetoli eggshell we tested (Appendix 5, Section F). Water blanks were injected between each LC-MS/MS eggshell sample analysis to assess carry-over (Figure 4B–F). Despite low levels of SCA-1 being occasionally detected in some of the blanks (Figure 4D), the effective carry-over from sample to sample can be estimated to be below 0.01%. We also stress that each batch of fossil eggshell was analysed separately in time (Figure 4C) and that therefore carry-over between younger and older eggshell samples is impossible. Finally, independent analyses of the results in a second laboratory (Copenhagen) also demonstrated the replicability of our results (Appendix 5, Section A). All peptides and proteins detected in this study presented damage patterns (i.e. diagenesis-induced modifications, such as deamidation, oxidation) that are entirely consistent with the age of the samples (Appendix 5, Section D; Supplementary file 3).

Authenticity of the ancient sequences.

Amino acid analyses (A): Total concentrations in all eggshell samples (sum of Asx, Glx, Gly, Ala, Val and Ile). Carry-over: (B) Total ion chromatogram for one eggshell sample (EBC_1823) and the blank analysed immediately after (blank_post_EBC1823). (C) Spectral abundance of SCA-1 and SCA-2 in LC-MS/MS blanks. (D) SCA-1 coverage in the blank analysed after a Pinnacle Point eggshell sample PP_4652. Note 'DDDD-' and 'EEEED-' peptides and Asn deamidation. (E) Extracted ion chromatogram for LDDDDYPK in EBC_1823, blank_post_EBC1823 and EBC_1819. (F) Absolute and relative total abundance of 'DDDD' peptides in Laetoli samples/blanks. Signal reduction is at least 100-fold (more often 1000- or 10,000-fold). Independent replication and manual de novo sequencing of the peptides from Laetoli (Appendix 5, section A; Supplementary file 2), consistency of diagenesis-induced modifications (Appendix 5, section D; Supplementary file 3) and volatile organic compound analyses (Appendix 5, section E) were also used to validate the results obtained.

https://doi.org/10.7554/eLife.17092.018

Discussion

Surface stabilization is the mechanism for long-term survival of biomineralizing peptides

The breakdown of the proteins and peptides should primarily occur via hydrolysis, involving water and proteins or peptides as reactants. The rate-determining step is the attack of a water molecule (or molecules, see Pan et al., 2011 for an extended discussion). We schematically map out the pathway in Figure 5 (red line) where the reaction coordinate denotes approach of water to the peptide and their subsequent reaction. The process requires energy (heat) to be given to the system in order to overcome the energy barrier. In hot environments, such as Tanzania, the high ambient heat means that many interactions have sufficient energy to overcome this barrier, yet our experimental findings demonstrate that some peptides survive.

Schematic diagram of energy barriers for peptide hydrolysis.

A pictorial representation of the energy barriers associated with the lysis of the peptide. The process in bulk water is depicted in red and the process at the surface is depicted in blue. The surface process shows a larger barrier due to the stabilization of the reactants at the surface.

https://doi.org/10.7554/eLife.17092.019

We argue that the mechanism allowing the survival of the ancient sequences over ~4 Ma (~16 Ma@10°C) at equatorial sites is the stabilization of optimally configured peptides and associated water molecules by surface binding at this interface. The low, negative free energy of binding (Table 2) of the amino acid residues means that they will readily bind to the calcite surface and remain bound indefinitely and this binding stabilizes the peptides by lowering their configurational energy (Table 2). Thus, both the position of the ground state and the top of the barrier will be lowered with respect to the situation when the peptide is in bulk water (Figure 5). The binding of the peptide also forces the hydrolysis reaction to take place with the stabilized water close to the calcite surface.

Furthermore, the presence of the calcite surface significantly stabilizes the water molecules surrounding the peptide. Estimates of the residence times (Table 2) and diffusion values of water molecules trapped between the protein and mineral surface indicate that these water molecules have greater residence times and lower diffusion rates than water molecules on the surface with no protein present. This large stabilization of water molecules selectively lowers the ground state energy of the reactants (protein or peptide plus water) at the interface with respect to the bulk. Thus the energy barrier will be significantly larger for the bound protein or peptide than for the unbound one. Our surface molecules would therefore need more energy in the system (i.e. a higher temperature) to overcome the augmented barrier. The net effect of the binding of the protein or peptide is therefore to retard hydrolysis and prolong peptide sequence survival, albeit of a select (mineral-binding) region of the protein.

Translating this concept to real samples in geological settings, the burial temperature in Tanzania, which may be high enough to allow rapid hydrolytic breakdown of most proteins, would not be enough to hydrolyse mineral bound peptides over corresponding timescales because of their own stabilization but particularly because they are surrounded by the stabilized water. This is effectively equivalent to a localised 'cooling' effect: the water molecules at the calcite surface would therefore be expected to operate as if they were 'cooler' in terms of reactivity and rates of peptide bond hydrolysis.

Conclusions: biominerals are a source of ancient protein sequences, preserved over geological timescales

The survival of 3.8 Ma old peptide sequences in equatorial Africa corresponds to an estimated thermal age of ~16 Ma@10°C, two orders of magnitude beyond the oldest recovered DNA (Figure 1). We explain this exceptional preservation in terms of surface stabilization of both the peptide and water molecules involved in the hydrolytic breakdown of the peptide. Our discovery identifies mineral-binding proteins as the most likely source of ancient biomolecular sequences in the fossil record.

In this study, we also set out parameters for the authentication of ancient sequences: the combination of the consistency in patterns of protein degradation and survival of particular peptide regions, independent replication of the results and an in-depth analysis of analytical blanks provide overwhelming evidence for the endogeneity and integrity of the peptides recovered. We suggest that all ancient proteomics studies undertake a similar approach to verify the authenticity of the sequences reported.

We anticipate that this study will open up new avenues in palaeontology and palaeoanthropology, for the first time enabling direct comparison between morphological and molecular records of fossils in deep time. Furthermore, the selective preservation of domains associated with biomineralization offers a novel strategy for uncovering functional regions governing mineral formation.

Materials and methods

Summary

Ostrich eggshell samples were ground and bleached for 72 hr (NaOCl 12% wt/vol) and rinsed thoroughly before demineralization. Amino acid and mass spectrometry analysis of ancient proteins was conducted using published techniques (Buckley et al., 2009; Crisp et al., 2013). Modifications include the use of trypsin and elastase as digestion enzymes for separate preparations (Welker et al., 2015). Liquid chromatography tandem mass spectrometry (LC-MS/MS) analyses were performed on Thermo Scientific Orbitrap platforms. Resulting spectra were searched against the Struthioniformes genomes using PEAKS (version 7.5 [Ma et al., 2003]). For PEAKS, FDR rate was set at 0.5%, with proteins accepted with −10lgp scores ≥ 40 and ALC (%) ≥ 80. A combination of minimization and conventional MD using the DL_Poly Classic code was used to explore possible protein–calcite binding geometries (Freeman et al., 2011).

Materials

Modern ostrich eggshell (OES)

Request a detailed protocol

Modern OES was purchased from Oslinc, an ostrich farm based in Boston, Lincolnshire, UK (www.oslinc.co.uk). The complete shell was less than 1 year old; for the purposes of reproducibility all analyses were performed using the same eggshell. Modern OES was used for the investigation of the amino acid composition of the microstructural layers as well as high-temperature studies on purified proteins (kinetic experiments).

Archaeological and palaeontological OES

Request a detailed protocol

The details of the archaeological and palaeontological samples included in this study are detailed in Tables 37. LOT numbers and individual NEaar (sample) numbers were attributed at the NEaar laboratory, University of York.

Elands Bay Cave

Request a detailed protocol

Elands Bay Cave (EBC) is located on the present coastline about 200 km north of Cape Town (South Africa). Human occupation occurred repeatedly since the terminal Pleistocene. Ostrich eggshell is present throughout the sequence. The site’s chronology has been firmly established through multiple radiocarbon dates; the samples analysed in this study can each be assigned to an age range on the basis of dates obtained on each layer and/or bracketing the OES (Parkington, 1980; Stowe and Sealy, 2016).

Pinnacle Point

Request a detailed protocol

The caves at Pinnacle Point (PP) have been in the spotlight of archaeological research for the past few years, and they have yielded extraordinary evidence for early modern human behaviour as well as detailed palaeoclimatic information (Karkanas et al., 2015; Bar-Matthews et al., 2010; Brown et al., 2009; Marean, 2010; Marean et al., 2007). The OES fragments analysed in this study come from two sites in the PP complex, PP 5–6 and PP 30, and were selected from excavations done up to 2010. PP 5–6 is a well-dated sequence, spanning ca. 50–90 ka. PP 30 is a hyena den (~150 ka BP) and the OES material reflects a single depositional episode.

Wonderwerk Cave

Request a detailed protocol

Wonderwerk Cave (WW) is located in the arid interior of South Africa, near the southern border of the Kalahari Desert. The site has yielded a unique ca. 2 million years long archaeological sequence (Horwitz and Chazan, 2015; Berna et al., 2012). Stratum 10, from which the OES samples analysed here are derived, bears the earliest evidence of intentional use of fire during the Acheulean, constrained to the Jaramillo subchron (1.07–0.99 Ma) based on a combination of paleomagnetic and cosmogenic burial age dating (Horwitz and Chazan, 2015; Berna et al., 2012). The OES fragments from the cave have been used as an effective proxy for refining palaeoclimatic and environmental reconstructions, especially for the early-mid Pleistocene and Holocene levels (Ecker et al., 2015; Lee-Thorp and Ecker, 2015).

Olduvai

Request a detailed protocol

Olduvai Gorge (Tanzania) contains an extensive record of the past two million years of human evolution. The eggshell samples analyzed in the present study were found in situ during the excavation of the BK (Bell Korongo) site located in uppermost Bed II. The site is exceptional by the large amount of ostrich eggshell fragments that were found throughout all its stratigraphic sequence. A volcanic tuff just underlying BK was recently dated to ~1.34 Ma (Domínguez-Rodrigo et al., 2013). The samples analyzed were found in Level 4, which contains a wealth of fossil bones and associated stone tools. This level has been interpreted as a central-place where the butchery of several animal carcasses (including megafaunal remains from Sivatherium and Pelorovis) was carried out (Domínguez-Rodrigo et al., 2014).

Laetoli

Request a detailed protocol

Laetoli (Tanzania) is one of the most famous sites for palaeoanthropologists: it has yielded hominin and other animal remains (Harrison, 2011a, 2011b) and the first unequivocal evidence for bipedalism thanks to the footprints of Australopithecus afarensis preserved in Pliocene volcanic ash, discovered by Mary Leakey in 1976 (Leakey and Hay, 1979).

The eggshell at Laetoli are surface finds, but visual examinations show no evidence of rolling, transportation or weathering (having been exposed on the surface for only a very short period of time after having eroded out of the sediment). As a consequence, there is no likelihood of long-distance transport. The location and preservation of the fossils, the absence of significant spatial displacement of surface finds, the short stratigraphic sections at each collecting spot, and the identification of the fossil-bearing horizons in each of those sections, allow the fossils to be placed quite precisely in their original stratigraphic context. The age and stratigraphy given for each of the samples can be assigned with a high degree of confidence. There are no lava flows in stratigraphic proximity or direct superposition to the stratigraphic units from which the Lower Laetolil and Upper Ndolanya specimens were recovered. Given that more than 40 m of consolidated sediment, and a time difference of 1.5 million years, separate the overlying lava flow (the Ogol Lavas) from the stratum from which the Upper Laetolil fossils were obtained, and that the intervening fossil-bearing beds show no geological evidence of having been impacted by heating, we do not believe that the samples have been exposed to additional heating that would have made them thermally older than we predict (Table 7).

Methods

Bleaching pre-treatment

Request a detailed protocol

All analyses reported in this study were conducted on bleached ostrich eggshell (OES) in order to isolate the functionally intra-crystalline proteins. Based on the results of Crisp et al. (2013), powdered eggshell was submerged in NaOCl (12% w/v) for a minimum of 72 hr.

Chiral amino acid (AAR) analyses

Request a detailed protocol

Sample preparation for chiral amino acid analyses (total hydrolysable and free amino acids fractions: THAA and FAA) was carried out following the method of Crisp et al. (2013). Separation of the chiral forms (D and L) of multiple amino acids was performed by RP-HPLC with fluorescence detection using a modified method of Kaufman and Manley (1998). The amino acids reported here are among those detected routinely with good chromatographic resolution in OES: Asx and Glx (Asp+Asn and Glu +Gln due to irreversible deamidation during sample preparation), alanine (Ala), valine (Val) and isoleucine (Ile). Serine (Ser) is not reported as its decomposition patterns are complicated by decomposition and a reversal in D/L values, therefore its utility decreases for older samples (Vallentyne, 1964; Kimber et al., 1986).

Proteomics

High-temperature experiments: purified proteins from modern OES

Request a detailed protocol

Bleached modern OES powders were demineralized in cold dilute acetic acid (10% v/v, 4°C, overnight). The solution was centrifuged at 4500 RPM for 1 hr at 4°C, ultrafiltered (Amicon ultra-filters, 10 kDa) and rinsed repeatedly with ultrapure water. The concentrated proteins were lyophilized and resuspended in ultrapure water. 125 μL of suspension was transferred to four individual sterile hydrolysis vials, each sealed with a clean teflon cap and heated at 140°C for 2, 8, 24 and 120 hr respectively. Alkylation / reduction of disulphide bonds was carried out using dithiothreitol (60°C, 60 min; Sigma Aldrich, St Louis, MO) and iodoacetamide (room temperature, 45 min; Sigma Aldrich). The solutions were then dried down in a centrifugal evaporator to be analysed directly by LC-MS/MS.

Protein extraction: archaeological OES (York)

Request a detailed protocol

The average sample size for proteomics was 35 mg of OES powder. Two separate preparations were carried out on two subsamples (~17 mg each), for digestion with trypsin ('T') and elastase ('E'). All subsamples were demineralized in cold 0.6 M HCl and the solution neutralized, lyophilized and resuspended in ammonium bicarbonate or Tris-HCl buffer containing the RapiGest SF surfactant (1 mg/mL; Waters Ltd, Hertfordshire, UK), for 'T' and 'E' subsamples, respectively. Following reduction and alkylation of disulphide bonds with DTT and IAA according to the usual protocols, digestion was carried out overnight at 37°C by adding: 4 μL trypsin (0.5 μg/μL; Promega, 2800 Woods Hollow Road Madison, WI 53,711 USA) for 'T' subsamples or 4 μL elastase (1 μg/μL; Worthington, Lakewood, NJ, USA) for 'E' subsamples.

Digestion was stopped by adding trifluoroacetic acid (TFA) to a final concentration of ~0.1% (v/v) and RapiGest™ precipitated by incubating in an acidic environment at 37°C for 30 min. Samples were centrifuged on a bench-top centrifuge (13000 RPM, 30 min) and purified using C18 solid-phase extraction (Pierce zip-tip; Thermo-Fisher) according to the manufacturer’s instructions. Eluted peptides were evaporated to dryness using a centrifugal evaporator before LC-MS/MS analyses.

Protein extraction: Laetoli OES (Copenhagen)

Request a detailed protocol

The Laetoli eggshell samples (LOT 13901r, 13902r, 13898r), powdered and pre-bleached in York, were sent to the University of Copenhagen for replication. A negative control sample, prepared exactly like the ancient ones except for the initial addition of eggshell powder, was processed and analysed together with the ancient samples, following the same procedure. All samples, including the negative extraction control, were processed in laboratories regularly used for ancient DNA extraction, implementing all the measures necessary to avoid potential contamination from modern biomolecules. All surfaces were UV irradiated overnight, and repeatedly cleaned with bleach and ethanol. In addition, facemasks, nitrile gloves, hairnets and body suits were worn continuously by operators.

An aliquot of 58 mg, 48 mg and 64 mg was weighed from samples 13902r, 13901r and 13898r respectively, and placed in 1.5 mL Protein Lo Bind Tubes (Eppendorf). Subsequently, they were suspended in 1 mL 0.5 M EDTA pH 8.00, mechanically shaken for approximately one minute and incubated overnight under rotation at room temperature. The following day, after centrifugation at 17,000 g for 10 min, the EDTA supernatant was removed and stored in a −18˚C freezer. The demineralisation step with 1 mL 0.5 M EDTA was repeated one more time. The third day, after removal of EDTA supernatant, all demineralised pellets were re-suspended with 100 µL of 0.1 M Tris pH 8.00, mechanically shaken for approximately one minute, and precipitated by centrifugation at 17,000 g for 10 min. The wash step with 100 µL of 0.1 M Tris was repeated two more times.

The samples were then further processed in a guanidinium hydrochloride lysis buffer solution following published methods (Kulak et al., 2014; Jersie-Christensen et al., 2016), without sonication or equivalent steps. Samples were instead mechanically shaken for approximately one minute and a micro-pestle (Eppendorf) was used to manually disrupt the pellet. Ancient samples and negative control were initially diluted to 1:3 in dilution buffer (Kulak et al., 2014), 0.5 µg of rLysC (Promega) were added, and the solution was digested at 37˚C, with mechanical shaking at 900 rpm, for two hours. Samples were then further diluted 1:3 in dilution buffer, 0.5 µg of mass spectrometry grade trypsin (Promega) were added, and the solution was digested overnight at 37˚C, with mechanical shaking at 900 rpm. On the following day, samples were acidified, using 10% trifluoroacetic acid (TFA) in ultrapure water, to reach pH < 2.00, and then centrifuged at 17,000 g for 1 hr. The resulting peptide mixtures in the supernatant fraction were then concentrated using in house created C18 solid phase extraction stage tips as described by Cappellini et al. (2012).

LC-MS/MS analysis

Request a detailed protocol
Oxford TDI
Request a detailed protocol

Subsamples digested with trypsin and elastase were combined in a single LC-MS/MS run, with the following exceptions:

  • Subsamples E and T for all Olduvai and Laetoli OES

  • Subsample LOT 13901N: digestion step was not performed

  • Purified proteins heated at high temperature in water (kinetics): digestion step was not performed

Elastase and trypsin-digested samples were analysed by LC-MS/MS as described before (Fischer and Kessler, 2015). Briefly, peptides were separated on a PepMAP C18 column (75 μm × 500 mm, 2 μm particle size, Thermo) using a Dionex Ultimate 3000 UPLC at 250 nL/min and Acetonitrile gradient from 2–35% in 5% DMSO/0.1% formic acid. Peptides were detected with a Q-Exactive mass spectrometer (Thermo) at a resolution of 70000 @ m/z 200 and an ion target value of 3e6 between m/z 380 and 1800. Up to 15 precursors were selected for HCD fragmentation at a resolution of 17,500 with an ion target of 1e5 and a maximal injection time of 128 ms. Normalized collision energy was fixed at 28% and the isolation windows was 1.6 m/z units.

Copenhagen
Request a detailed protocol

The samples were separated on a 50 cm PicoFrit column (75 μm inner diameter) in-house packed with 1.9 μm C18 beads (Reprosil-AQ Pur, Dr. Maisch) on an EASY-nLC 1000 system connected to a Q-Exactive HF (Thermo Scientific, Bremen, Germany). The peptides were separated with a gradient going from 2% to 25% buffer B in 110 min followed by a 25 min step to 40%. After the gradient the column was washed by going to 60% in 5 min, held for 5 min and re-equilibrated back to 2% for 15 min, resulting in a final acquisition of 165 min. Buffers contained 0.1% TFA dissolved in either 80% acetonitrile for buffer B, or milli-Q water for buffer A. The flow rate was 200 nL/min throughout the gradient and wash.

The Q-Exactive HF was operated in data-dependent top 10 mode. Full scan mass spectra were recorded at a resolution of 120,000 at m/z 200 over the m/z range 300–1750 with a target value of 3e6 and a maximum injection time of 20 ms. HCD-generated product ions were recorded with a maximum ion injection time set to 108 ms through a target value set to 2e5 and recorded at a resolution of 60,000 with a fixed first mass set to m/z 100. Normalized collision energy was 28%. The isolation window was set at 1.3 m/z units and the dynamic exclusion to 30 s.

Identification of peptides and proteins

Request a detailed protocol

Product ion spectra were analysed using the software PEAKS Studio (v. 7.0, Bioinformatics Solutions Inc. (BSI) [Ma et al., 2003]). Mascot generic format (mgf) files were searched against a reference database containing the genomes of all Struthioniformes and common contaminants (40566 entries), assuming no digestion enzyme and with fragment ion mass tolerance of 0.050 Da and a parent ion tolerance of 5.0 ppm. Results obtained by SPIDER searches (i.e. including all modifications) were used for the investigation of protein survival in OES using the following threshold values for acceptance of high-quality peptides: false discovery rate (FDR) threshold 0.5%, protein scores −10lgP ≥ 40, de novo sequences scores (ALC% ) ≥ 80.

Volatiles

Request a detailed protocol

On crushing or demineralisation of the subfossil OES, a strong odour was emitted from some samples, so analysis of these volatiles was attempted using gas chromatography mass spectrometry (GC-MS). A sealed container was designed that allowed in-line crushing of a sample under N2. Volatiles emitted during crushing of the shells were measured using thermal desorption (Unity, Markes International, Llantrisant, UK) coupled to gas chromatography with a high-resolution quadrupole time of flight mass spectrometer (7200B GC/Q-TOFMS, Agilent Technologies,Wilmington, DE, USA). Volatile organic compounds (VOCs) were flushed from the shell crusher onto the trap using high purity nitrogen at 100 mL min−1 for 10 min. The trap was held at −30°C during sampling, then ballistically heated to 250°C and held for 5 min to ensure complete desorption. The heating of the trap triggered the start of the GC run. A 5% phenyl-polysilphenylene-siloxane capillary column was used (50 m × 0.32 mm × 1 mm BPX5, SGE, Australia) to separate the VOCs. The oven was held at 40°C for 5 min, followed by a ramp rate of 10°C min−1 to a final temperature of 230°C, which was held for 3 min. High purity helium gas was used as the mobile phase at a flow rate of 4.5 ml min−1. The mass spectrometer was operated in an electron ionisation mode at 70 eV and the ion source was at 250°C. Spectra were collected between m/z 35 and 500 at an acquisition rate of 5 spectra s−1. The mass spectra obtained were compared to the NIST MS database (NIST MS Search Program version 2.0) after background correction. A nitrogen blank, sampled through the shell crusher was used to determine the method background and identify unique VOCs emitted from the egg shells. Analysis was undertaken on a fragment of one of the subfossil OES from Laetoli (LOT 13901, ~2.7 Ma).

Ancient DNA

Request a detailed protocol

Two DNA extractions using ~0.05 g of eggshell (sample Laetoli LOT 13902) were made following Dabney et al. (2013) and the extracts combined before the final elution in 25 μL TET.

Data availability

Request a detailed protocol

The data discussed in the paper are archived in the following databases: the mass spectrometry proteomics datasets have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD003786; Illumina genetic data have been deposited in the NCBI Short Read Archive (SRA), BioProject ID PRJNA314978; computational modelling data can be found at DOI: 10.15131/shef.data.3491387 (this contains pdb files giving the initial configurations used for SCA-1, SCA-2 and the four peptide sequences and input files for DL_POLY that contain a complete specification of the forcefield used and other setting parameters for the simulations).

Appendix 1

Thermal age calculations

In this study we compared the thermal age of some of the oldest samples from which DNA and collagen sequence had been reported with the eggshell proteins from Africa (Figure 1). To estimate temperature fluctuation we estimated MATs for each site: each MAT was estimated from NOAA NCDC GCPS monthly weather station (Eischeid et al., 1995; Karl et al., 1990) and borehole data (Huang et al., 2000; National Climatic Data Center (NCDC), 2012) (Appendix 1—Table 1). Temperature was altitude-corrected using a lapse rate of 6.4°C /km−1; where altitudes were not given they were estimated using the Google Maps API Elevation Service. For simplicity and due to the antiquity of the samples, no corrections were made for seasonal variation in temperature. Temperatures were then projected back using estimates of temperature fluctuation from the time of deposition to the present day from the long-term modelled temperature record of Hansen et al. (2013). For the three youngest dates at Elands Bay Cave, the thermal ages were refined using the SST record from core MD02-2594 - Dyez14 (Dyez et al., 2014).

Appendix 1—table 1

Historical temperature data. Weather station temperature data derived from the NOAA Baseline Climatological Dataset - Eischeid et al. (1995): The quality control of long-term climatological data using objective data analysis. Preprints of AMS Ninth Conference on Applied Climatology, Dallas, TX., January 15–20, 1995.

https://doi.org/10.7554/eLife.17092.024
WMO IDWmo station namelatitudelongitudeAlt. (m)MAT °C
7191700Eureka,N.W.T.79.98−85.9310−19.6
100800Svalbard Lufthavn78.2515.4727−6.3
420201Dundas Radio Greenland76.6−68.820−10.5
420200Thule A.B.76.52−68.577−12.1
7195702Fort Mcpherson67.4−134.930−9.3
7196501Fort Selkirk62.8−137.4454−3.9
358600Honington52.330.77549.6
3605801Ust-Kan50.9284.7510370.6
807500Burgos/Villafria42.373.6389410.2
6832800Tsabong−26.0522.4596020.14
6871200Cape Columbine−32.8317.856015.6
892800Mossel Bay (Cape St.)−34.1822.155917.7
Borehole data: data from NOAA Paleoclimatology Borehole Datasets http://www.ncdc.noaa.gov/paleo/borehole
CA-289-260.99−13415240
UK-STOWLANGTOFT52.280.85479.1
ES-ROMANERA37.69−7.3316620.2
ES-AC-1BILLITON37.6−6.8311018.7
ES-PB1ADAROVALVERDE37.56−6.7823718.6
TZ-LONGIDO−2.6136.47131624.3
TZ-BASOTU−4.3835.17173623.9
TZ-KIZAGA−4.4234.37147223.2
TZ-SIUYU−4.934.88167823.1
ZA-SB1−27.2825.5135719.2
ZA-AP11−28.321.0592823.7
ZA-PC227−29.3321.78105221.5
Appendix 1—table 2

Long term climate records: cores used in this study to assess the extent of depression at LGM (used to scale).

https://doi.org/10.7554/eLife.17092.025
RecordlatitudelongitudeMethodSST (°C) during LGMRef
ODP98258−16UK3713(Lawrence et al., 2009)
ODP88250.35−167.58UK379.1(Martínez-Garcia et al., 2010)
MD02-2594−33.3017.30Mg/Ca~16doi.pangaea.de/10.1594/PANGAEA.810663
MD962077−33.1731.25Mg/Ca~16doi.pangaea.de/10.1594/PANGAEA.810716
ODP722A16.6259.80UK3720.2(Herbert et al., 2010)
ODP662−1.39−11.74UK3718.7(Herbert et al., 2010)
ODP846−3.10−90.82Mg/Ca18.6(Herbert et al., 2010)
IODP114619.46116.27UK3724.3(Herbert et al., 2010)

Because temperature changes likely increased with latitude and proximity to ice sheets the record of Hansen et al. (2013) was scaled to estimates of maximum decline in temperature for the site or the region at the last glacial maximum (LGM) (Appendix 1—Table 4). We estimate a fall of approximately 2.5°C at LGM in East Africa, consistent with observed reductions in Tex86 estimates of lake surface temperature reduction in Lake Tanganyika (Tierney et al., 2010) and Lake Malawi (Konecky et al., 2011). This decline of 2.5°C was used as the Δ°C LGM for Laetoli. Note that the offset between TEX86 and in situ temperature estimates in L. Tanganyika (Kraemer et al., 2015) does not alter estimates of Δ°C LGM. Late Pliocene global mean annual air temperature was estimated to have been 2–3°C warmer than today, but the increase was accentuated at higher latitudes. At Ellesmere Island (Beaver Pond) (Ballantyne et al., 2010) multiple proxies suggest Pliocene warming of +19 ± 1.9°C, significantly higher than mid-Pliocene simulations of the high Arctic (Hill, 2015).

Appendix 1—table 3

Kinetic parameters used for estimating thermal age.

https://doi.org/10.7554/eLife.17092.026
TargetChemical reactionActivation energy (kJ mol−1)Reference
Eggshell proteinsValine racemization117(Crisp et al., 2013)
DNADepurination123(Lindahl and Nyberg, 1972)
CollagenGelatinization173(Holmes et al., 2005)

Finally the thermal age (Timeyr@10°C) for each sample was calculated by examining the offset from 10°C at each time interval in the long-term temperature record and calculating the relative rate difference over this time interval using published kinetic parameters for the target molecule (Appendix 1—Table 3). For every year the site is warmer than 10°C, it accumulates thermal years faster than chronological years, while thermal years (yr@10°C) accrue more slowly for every year when the site is cooler than 10°C (e.g. Figure 1). To estimate the decomposition of ostrich eggshell proteins we used a conservative (low) activation energy, valine racemization (Crisp et al., 2013); the same diagenetic parameter used as the independent variable in e.g. Figure 2.

Thermal age highlights the differences in protein preservation between Ellesmere Island (MAT −19.7°C) and Laetoli (MAT 18.7°C), both sites which yield protein sequence, and both of which are estimated to be > 3.5 Ma. When corrected for thermal age, Laetoli is three orders of magnitude older than Ellesmere Island (Appendix 1—Table 4). This is actually a smaller difference than might be anticipated from the current difference between the sites (MAT Δ 39°C), as despite high latitude Pleistocene glacial stage cooling, Ellesmere Island was much warmer during the Pliocene (Ballantyne et al., 2010).

Appendix 1—table 4

Thermal age calculations.

https://doi.org/10.7554/eLife.17092.027
SiteLatLongAlti (m)MAT (°C)ScaleΔT LGM (°C)RefAge (ka)RateEa(kJ)ln AMean thermal age (Ma@10°C)Min thermal age (Ma@10°C)Max thermal age (Ma@10°C)
Ellesmere Island78.55−82.5292−19.75.0−21(Ballantyne et al., 2010)3600Collagen173610.0030.0010.007
Ellesmere Island78.55−82.5292−19.75.0−21(Ballantyne et al., 2010)3800Collagen173610.0040.0010.011
Poolepynten7911.434−6.34.0−16(Hubberten, 2004)130DNA12741.20.0020.0020.002
Thistle Creek63.11−139.54483−4.11.0−4(Elias, 2001)560DNA12741.20.020.020.02
Thistle Creek63.11−139.54483−4.11.0−4(Elias, 2001)700DNA12741.20.030.030.03
Happisburgh52.821.5329.12.0−9(Harrison et al., 2015)700Collagen173610.220.190.25
Sima de los Huesos42.35−3.5210049.51.0−4(Moreno et al., 2012)400DNA12741.20.250.220.28
Elands Bay Cave−32.3218.321015.71.0−4(Martínez-Méndez, 2010)16Valine118380.040.030.05
Pinnacle Point−34.222.0920181.0−4(Bar-Matthews et al., 2010)150Valine118380.420.380.47
Wonderwerk27.823.55164518.60.6−2.5(Ecker et al., 2015)1000Valine118383.63.14.1
Olduvai−2.9935.35576260.6−2.5(Tierney et al., 2010) (Berke et al., 2012)1338Valine1183816.314.418.5
Laetoli−3.2235.19170018.70.6−2.5(Tierney et al., 2010) (Berke et al., 2012)3800Valine1183816.013.618.7

Appendix 2

Ostrich eggshell bleached proteome

Microstructure of OES layers

We characterised the microstructure and the proteins extracted from modern S. camelus eggshell. The whole eggshell had a dimension of 25 cm in length with a diameter of 15 cm and an average thickness of 2 mm. We also examined one specimen of fossil eggshell from Laetoli: LOT 13898, a fragment.

The modern OES was first sectioned in three pieces using a diamond-blade saw. Cross-sections of approx. 10 × 4 × 2/3 mm in size were cut off with a wheel for SEM observations. Each cross section was manually polished down to approximately 50 μm thickness using a diamond lapping pad (30 to 0.1 μm) and a tripod holder, allowing a transversal view of each individual layer. The surface was slightly etched using 2% acetic acid to provide topography to the surface. The cross and transversal sections were mounted in a SEM aluminium stub covered by a sticky carbon pad. All samples were coated with a 10 nm Pt/Pd layer. Further, a subsample from each modern OES layer was prepared by polishing off the rest of the layer and then each individual layer was powdered and bleached for amino acid analyses. For the modern OES a FEI Sirion FEG FESEM (field emission, high resolution Scanning Electron Microscope) and for the Laetoli OES a JEOL JSM-7800 were used, both operated at 5 kV for imaging, and 5 and 10 mm working distance respectively.

The modern OES presented a well-defined structure consisting of three structural calcitic layers, as observed by others (Heredia et al., 2005; Figure 2—figure supplement 1). The external layer (in contact with the environment) is the thin crystalline layer with little organic content (Heredia et al., 2005). The middle layer is the palisade, and the innermost layer is the mammillary/cone layer (Feng et al., 2001; Patnaik, 2009). The palisade and cone layers contain organic components, both protein (Hincke et al., 1995) and polysaccharides (Baker and Balch, 1962), which help hold the calcite crystals together (Heredia et al., 2005). The SEM microphotograph of the cross section of the eggshell showed that the thickness of the crystal layer is about 40 μm, the palisade layer is the thicker with a thickness of 1200 μm and cone layer has a thickness of 750 μm. A deeper understanding of the microstructure of each individual layer is gained by observations in the direction perpendicular to the cross section. The crystalline layer is very compact and dense with porosity distributed along its surface. The palisade layer consists of an agglomeration of flakes more or less parallel to the surface of the eggshell; the cone layer consists of a pile of elongated crystals perpendicular to the surface of the eggshell; the organic membrane consists of a net of different fibres which lie parallel to the eggshell surface. Sample LOT 13898 from Laetoli (Figure 2—figure supplement 1A, right), attributed to S. kakesiensis, is 33% thicker than the modern S. camelus eggshell. It has been reported before (Harrison and Msuya, 2005) that samples from the Upper Laetolil beds present a thickness ranging from 2.5 to 4.4 mm. In this sample, the organic layer is not present, probably due to degradation after deposition.

AAR analyses on OES layers (modern)

Total hydrolysable amino acids (THAA) were analysed on one bleached and one unbleached modern eggshell subsample per microstructural layer (Appendix 2—table 1). While the absolute concentrations were higher in unbleached eggshell for all layers (Figure 2—figure supplement 1B), the overall amino acid composition is comparable between (a) bleached and unbleached powders and (b) the different microstructural layers (Figure 2—figure supplement 1C). This points towards a similar composition of the proteome enclosed in the intracrystalline fraction of the three main microstructural layers of OES.

Appendix 2—table 1

THAA concentrations (pmol/mg) detected in OES microstructural layers; bleached and unbleached powders were analysed. Analytical errors measured on replicate analyses are < 5% (Asx = 3.80%; Glx = 3.81%; Gly = 4.89%; Ala = 3.67%; Val = 3.98%; Ile = 3.63%).

https://doi.org/10.7554/eLife.17092.028
[Asx][Glx][Ser][L-Thr][L-His][Gly][L-Arg][Ala][Tyr][Val][Phe][Leu][Ile]
Cone-
bleached
4925540939922589152069232655431415792815198342642280
Cone-
unbleached
1445217835134559140489419136810810234434297344334105007676
Palisade-
bleached
5990643444792347207966413427518315252768265950323052
Palisade-
unbleached
1001811545796745073667126916247908530524843450584465102
Whole-
bleached
6323653346622445203269763322531412632943273252893177
Whole-
unbleached
124121377499275420421614951722210670436460035342101526342
Crystalline-
bleached
45184181308513011399409719813528491704183832992225
Crystalline-
unbleached
2442224212179908100674823365123712005564459912107551907912535

Bleached OES proteomes characterized by LC-MS/MS

66 LC-MS/MS analyses were conducted as part of this study (York/Oxford dataset): 33 bleached ostrich eggshell subsamples and 33 blanks. 71161 product ion spectra were matched to known peptide sequences (excluding common contaminant proteins).

Ostrich eggshell proteome

The modern bleached OES yielded 273 unique protein descriptions.(Appendix 2—Table 2) reports the most abundant protein groups, i.e. the ones identified on the basis of 20 or more unique peptides. A Gene Ontology (GO) analysis showed that most proteins’ function, cellular component and biological process are presently unknown, while 26 proteins are associated with binding and 14 with catalytic activities.

Here we summarize the roles and main characteristics of the most abundant protein groups found in the bleached eggshell proteome.

  • Struthiocalcin-1 and struthiocalcin-2 are mineralization-specific C-type lectins originally isolated and sequenced from OES matrices (Mann and Siedler, 2004); their structure and role in eggshell biomineralization have been recently investigated (Sánchez-Puig, 2012; Ruiz-Arellano and Moreno, 2014; Ruiz-Arellano et al., 2015).

  • Aggrecan proteins are important constituents of the cartilage and this large aggregating chondroitin sulfate proteoglycan binds to hyaluronic acid via an N-terminal globular region and may regulate the matrix assembly of the cartilage (Kiani et al., 2002; Watanabe et al., 1998).

  • Vitelline membrane outer layer protein 1 (VMO1): the vitelline membrane outer layer in hens is formed in the upper oviduct after ovulation. This membrane is a fibrous layer constructed from ovomucin (with soluble proteins bound) and proteins vitelline membrane outer layer I and II (Shimizu et al., 1994).

  • Serum albumin (ALB) is the main protein of plasma, has a good binding capacity for water, Ca2+, Na+, K+, fatty acids, hormones, bilirubin and drugs. Its main function is the regulation of the colloidal osmotic pressure of blood (UniProtKB - P19121 (ALBU_CHICK)). It has been identified as abundant in the eggshell of chicken, turkey and quail (Mann and Mann, 2015).

  • The tenascin (TNC) family of glycoproteins is expressed in the embryo, particularly during neural development, skeletogenesis, and vasculogenesis, and contains a serine-proline rich domain (Jones and Jones, 2000). They appear to accelerate collagen fibril formation and may play a role in supporting the growth of epithelial tumours (Egging et al., 2007).

  • Carbonic anhydrase is a family of enzymes with diverse amino acid sequences and structures that catalyse the slow conversion between carbon dioxide and bicarbonate (Liljas and Laurberg, 2000). The presence of this enzyme in the hen’s reproductive tract was linked very early to its role in making carbonate ions available for eggshell formation (Robinson and King, 1963; Gutowska and Mitchell, 1945).

  • Mucins (LOC 100859916)are gel-forming proteins that have been linked to biomineralization in mollusc shells (Marin et al., 2000) and have been identified in eggshell membranes, most recently in a proteomics study which identified a 1–10 fold increase of mucin 5AC in fertilized egg membranes (Cordeiro and Hincke, 2016).

  • Apolipoprotein D (APOD)is expressed in subsets of central nervous system neurons and glia during late chicken embryogenesis (Ganfornina et al., 2005). APOD can bind cholesterol, progesterone, pregnenolone, bilirubin and arachidonic acid in plasma and may be involved in repairing the nervous system (Rassart et al., 2000).

  • The antibody immunoglobulin from egg exists as IgG (or IgY) in yolk and as two forms in egg white (IgA, IgM) (Abdou et al., 2013). The polymeric immunoglobulin receptor is involved in the secretion of antibodies and was one of the proteins identified in the acid-soluble organic matrix of the chicken calcified eggshell layer additionally to the immunoglobulins (Mann et al., 2006).

  • Mesothelin (MSLNL) is expressed in eggshell membranes of fertilized chicken eggs (Cordeiro and Hincke, 2016). This cell-surface differentiation antigen is normally expressed at low levels in humans and is restricted to tissues such as the mesothelial cells lining some body cavities and epithelial cells of kidney, tonsil, trachea, and fallopian tube; it is overexpressed in and represents a marker for a range of tumours, including human and hen ovarian cancer (Yu et al., 2011).

  • Golgi apparatus protein 1 or cysteine-rich fibroblast growth factor receptor was isolated in chick embryos (Burrus and Olwin, 1989); it is located in the Golgi and may be involved in intracellular fibroblast growth factor trafficking and the regulation of cellular responses to these (Zuber et al., 1997).

  • Stanniocalcin-1 is a member of the stanniocalcins family, present in all vertebrates and linked to calcium homeostasis but also to embryogenesis and tumorigenesis (Trindade et al., 2009).

  • Ovomucoid (IOVO) is one of the major egg white proteins; it is a glycoprotein protease inhibitor well-characterized in chicken, turkey and quail (Mann and Mann, 2015).

  • Serotriflin belongs CRISP family protein with binding affinity for small serum protein-2 in snake serum (Aoki et al., 2008). BLAST analysis shows that serotriflin-like proteins have been identified in the genomes of 39 Archosauria. In ostrich the cysteine-rich secretory protein 2 (gene: N308_13534) displays 54% identity with snake serotriflin (P0CB15 CRIS_PROFL).

  • Delta and Notch-like epidermal growth factor-related receptor (DNER)is a calcium-binding clathrin-binding protein linked to central nervous system development (GO term).

  • BPI (bactericidal permeability-increasing) fold-containing family B member 4 is a lipid-binding protein identified in fertilized eggshell membrane of chicken (Cordeiro and Hincke, 2016). It shares sequence homology with ovocalyxin-36 (OCX-36), which has an immune role (Cordeiro et al., 2013). In S. camelus this protein (gene: N308_11956) shares only 37% identity with chicken OCX-36 (BLASTp), although this is likely due to missing regions of OCX-36. BLAST analysis also identified the uncharacterized protein LOC104140623 as a homologue to BPI-fold-containing family B member 4.

  • Pigment epithelium-derived factor (PEDF) or serpin F1 (SERPINF1) is a non-inhibitory serpin and neurite-promoting factor. Human SERPINF1 and ostrich PEDF share 63% identity (BLAST analysis).

  • Prosaposin (P07602|SAP_HUMAN) shares 56% identity with Struthio proactivator polypeptide (saposin A-D precursor); prosaposin is also found in chicken eggshell cuticle (Rose-Martel et al., 2012) and is abundant in the calcified eggshell of quail, turkey and chicken (Mann and Mann, 2015). The saposins are involved in lysosomal sphingolypid metabolism and bind glycolipids (Mann and Mann, 2015).

  • Pantetheinase-like isoform X1 (gene N308_06452 in Struthio camelus australis, 67% identity with human pantetheinase) is an amidohydrolase involved in the dissimilative pathway of CoA (Maras et al., 1999).

  • Beta-microglobulin (B2M) is involved in folding of the eggshell matrix proteins (Nys et al., 2011) and was identified in the calcified chicken eggshell (Mann et al., 2006) and in the membranes (Cordeiro and Hincke, 2016).

  • Cygnin is a small basic protein found in 62 bird genomes and analogue to Duck Basic Protein Small 2 isolated from duck egg white and shown to perform multiple biological functions related to reducing the risk of diseases usingin vitro experiments (Naknukool et al., 2011).

  • Signal peptide CUB and EGF-like domain-containing protein 1 partial (SCUBE1) binds calcium and may function as an adhesive molecule; it is involved in post-embryonic development and the Struthio analogue (gene: N308_04062) shares 83% of its sequence with human SCUBE1.

  • Ovalbumin (OVAL) is abundant both in egg white, in eggshell membranes and the calcified layers of a range of birds (Mann et al., 2006; Mann and Mann, 2015; Cordeiro and Hincke, 2016). Ovalbumin was found to stabilize the liquid precursor phase of calcium carbonate (Wolf et al., 2011).

  • Ovostatin-like (ovomacroglobulin) in Struthio shares 80% of its sequence with chicken ovostatin (OVST). Ovostatin is a proteinase inhibitor which reduces enzyme activity in the presence of high-molecular weight substrates.

Appendix 2—table 2

30 major protein groups identified in bleached modern OES (>20 unique peptides only).

https://doi.org/10.7554/eLife.17092.029
Protein groupAccession-lgPCoverage (%)#UniqueDescription
1gi|46396750344100346RecName: Full = Struthiocalcin-1; Short = SCA-1
2gi|4639675129398222RecName: Full = Struthiocalcin-2; Short = SCA-2
7gi|697501075, gi|67821762628139208Aggrecan core protein [Struthio camelus australis]
8gi|69750892426888126Vitelline membrane outer layer protein 1-like [Struthio camelus australis]
10gi|69750902926880113Serum albumin-like [Struthio camelus australis]
13gi|6974557832464089Tenascin isoform X3 [Struthio camelus australis]
11gi|6974772022347584Carbonic anhydrase 4 [Struthio camelus australis]
14gi|6974818282371582Mucin-5AC [Struthio camelus australis]
9gi|697523391, gi|6782215882286977Apolipoprotein D [Struthio camelus australis]
12gi|3751626482317576Immunoglobulin A heavy chain constant region secretory form partial [Struthio camelus]
15gi|697470179, gi|6974701772135163Mesothelin isoform X1 [Struthio camelus australis]
22gi|697433909, gi|6782065871954045Golgi apparatus protein 1 [Struthio camelus australis]
17gi|3751626441926342immunoglobulin M heavy chain constant region secretory form partial [Struthio camelus]
19gi|678209093, gi|6974411801976641Stanniocalcin-1 partial [Struthio camelus australis]
18gi|697514088, gi|6782198031978639Ovomucoid [Struthio camelus australis]
20gi|6974886111918939Serotriflin-like [Struthio camelus australis]
21gi|6782147781924238Delta and Notch-like epidermal growth factor-related receptor partial [Struthio camelus australis]
23gi|697475278, gi|697475274, gi|6974752801924137Polymeric immunoglobulin receptor [Struthio camelus australis]
26gi|6974309751813333BPI fold-containing family B member 4-like [Struthio camelus australis]
25gi|6974309361614232Uncharacterized protein LOC104140623 [Struthio camelus australis]
31gi|697485873, gi|6782148461726828Pigment epithelium-derived factor [Struthio camelus australis]
28gi|697430934, gi|6782057481533426BPI fold-containing family B member 4-like [Struthio camelus australis]
29gi|6974329181724126Prosaposin [Struthio camelus australis]
16gi|3751626541789624immunoglobulin lambda constant region partial [Struthio camelus]
30gi|697522911, gi|697522913, gi|6975229161725022Pantetheinase-like isoform X1 [Struthio camelus australis]
24gi|6974464191657122Beta-2-microglobulin [Struthio camelus australis]
27gi|678210026,gi|6782100251546322Cygnin [Struthio camelus australis]
34gi|6782163651733121Signal peptide CUB and EGF-like domain-containing protein 1 partial [Struthio camelus australis]
32gi|6974920531434221Ovalbumin {ECO:0000303|PubMed:21058653} [Struthio camelus australis]
37gi|6975056891471320Ovostatin-like [Struthio camelus australis]
Common contaminants in bleached OES

The common contaminant proteins identified in bleached OES samples and the procedural blanks are reported in Appendix 2—Table 3. The enzymes trypsin and elastase are detected in all samples except in the Laetoli subsamples: one of these was prepared without digestion step (13901N), while all the other subsamples were digested with trypsin or elastase but analysed separately by LC-MS/MS. Keratins were also detected in all samples, despite the thorough cleaning procedures adopted (including a long bleaching step). This highlights the need for careful evaluation of ancient protein data from fossil samples.

Appendix 2—table 3

List of contaminant proteins detected in the 66 analyses, with total number of spectra identified per protein.

https://doi.org/10.7554/eLife.17092.030
Row labelsCount of #Spec
sp|TRYP_PIG|771
sp|K2C1_HUMAN|493
Chymotrypsin-like elastase family member 1 OS = Sus scrofa GN = CELA1 PE = 1 SV = 1422
sp|K1C9_HUMAN|301
sp|K1C10_HUMAN|216
sp|K22E_HUMAN|213
sp|TRFE_HUMAN|98
sp|ALBU_HUMAN|94
PREDICTED: keratin type II cytoskeletal cochleal isoform X1 [Struthio camelus australis]80
PREDICTED: keratin type II cytoskeletal cochleal isoform X2 [Struthio camelus australis]32
PREDICTED: keratin type II cytoskeletal 5-like [Struthio camelus australis]25
sp|RS27A_HUMAN|23
Keratin type II cytoskeletal 75 partial [Struthio camelus australis]20
sp|GFP_AEQVI|19
sp|TRY1_BOVIN|19
Keratin type II cytoskeletal cochleal partial [Struthio camelus australis]16
PREDICTED: keratin type II cytoskeletal cochleal isoform X3 [Struthio camelus australis]16
sp|ANT3_HUMAN|8
sp|HBB_HUMAN|8
Keratin type II cytoskeletal 75 [Struthio camelus australis]5
sp|TRFL_HUMAN|5
Keratin type I cytoskeletal 14 partial [Struthio camelus australis]3
PREDICTED: keratin type II cytoskeletal 4-like [Struthio camelus australis]3
sp|HBA_HUMAN|3
sp|K1C15_SHEEP|2
sp|GSTP1_HUMAN|1

Appendix 3

Computational

Conventional molecular dynamics (MD) was used to explore possible protein–calcite binding geometries. In order to determine the most likely structure of SCA-1 and SCA-2 in solution, initial structural estimates were obtained from crystal structures. The model structure for SCA-2 was obtained using a homology model (Biasini et al., 2014) based on the published crystal structure (Ruiz-Arellano et al., 2015). Calcite structures were optimized using the Raiteri forcefield (Raiteri et al., 2010), selected due to its parameterization against the free energy of dissolution of calcium carbonate, which provides reliable descriptions of the structuring and interactions between water and the surface. The resultant structures possessed the known density of calcite, 2.71 g/mL, and their exposed surfaces maintained the (10.4) structure. Since the calcite blocks were of sufficient thickness, it was not necessary to freeze the atoms in the middle of the calcite block. Charges for SCA-1 and SCA-2 were calculated using Mulliken charges obtained from the AMBER Antechamber program (Wang et al., 2004). The overall charges for SCA-1 and SCA-2 were determined to be −11 and −10 respectively using the Avogadro code (Hanwell et al., 2012) at pH 8.0 to 10, the point of zero charge for calcite. Cross terms for the interaction of the atoms within the protein and peptide molecules were calculated using Lorentz-Berthelot mixing rules (Lorentz, 1881; Berthelot, 1898); cross terms for interactions between the protein and peptide molecules and the calcium carbonate surface were taken from the Raiteri potential (Raiteri et al., 2010) or otherwise fitted via scaling methods based on the charge differences of the ions as described previously in the literature (Freeman et al., 2007; Schröder et al., 1992). A mean cut-off of 12 Å was used for van der Waals interactions. All calculations were performed using the DL POLY Classic program (Todorov et al., 2006).

The surface unit cell for the (10.4) surface was defined by two perpendicular vectors of length 83.544 Å, which gives a total surface area of 6979 Å2, sufficient to accommodate the proteins or the peptide sequences. A block of 10 layers of calcite was used (giving a thickness of 32 Å). The water layer was a depth of 82 Å (about 30,000 water molecules), which is sufficient for an adequate coverage of the biomolecules.

Since performing full adsorption studies of these proteins in every possible binding configuration was not computationally tractable, it was necessary to perform an initial study of the relative binding energies of the proteins placed at the surface in various configurations. This was accomplished by performing Euler transformations on the proteins, rotating them in 5 degree increments over the angles alpha, beta, and gamma from 0 to 180 degrees. The reoriented protein was then placed at the calcite surface with 30,000 TIP3P water molecules, providing a water density of 0.99 g/mL, and subjected to an energy minimization routine in DL POLY Classic with the calcite block held rigid. Five calcium counterions were placed in each solution with one additional sodium counterion needed for SCA-1.

This enabled us to choose low energy configurations for more detailed investigation. Binding energy calculations were performed using the following set of simulations upon each: protein/amino acid sequence in aqueous solution, protein/amino acid sequence starting in an unbound position above the surface and adsorbing onto the surface and a simulation of the protein/amino acid sequence in aqueous solution bound to the surface. If compared to simulations of the calcite-water interface and pure water itself, it is possible to obtain information based solely upon the interaction of these proteins/amino acid sequences and the surfaces. The following equations were used, employing configurational energies in all cases:

(1) E(aqueous)=E(molecule; aqueous)E(water)E(molecules)
(2) E(molecule;binding)=E(molecule;hydratedsurface)E(hydratedsurface)E(molecule;aqueous)+E(water)

In Equations 1 and 2 the meaning of the symbols is as follows: E(aqueous) is the solvation energy of the protein/amino acid sequence in water; E(molecule; aqueous) is the total energy of the molecule in water; E(water) is the energy of a box of water containing the same number of water molecules as the calculation for E(molecule; aqueous) to ensure Equation (2) is balanced; and E(molecule) is the energy of the protein/amino acid sequence molecule in vacuum; E(molecule; binding) is the energy of binding of the molecule to the surface; E(molecule; hydrated surface) is the total energy of the molecule on the surface in the presence of water; E(hydrated surface) is the energy of the hydrated surface with the same number of calcium carbonate units as the previous calculation.

Additionally we have estimated the entropic contribution to the molecular binding arising from the displacement of water molecules from the surface to bulk water. From our previous work (Freeman and Harding, 2014) we expect this to be the dominant contribution to the entropy of binding for molecules of this kind. Our results show that the number of water molecules displaced is in the range 20.2–23.1 for the proteins and 6.6–8.3 for the amino acid sequences. This corresponds to an entropic contribution of 36.4–41.6 kJ mol−1 for the proteins and 11.8–14.9 kJ mol−1 for the amino acid sequences at 300 K, sufficiently small to justify the use of configurational energies in this work to estimate the strength of binding. It should also be noted that a smaller, negative contribution to the entropy is expected from the reduced freedom of motion of the protein/amino acid sequences at the calcite surfaces, which is considered to be constant between all binding profiles and expected to be smaller than that of many water molecules.

Simulations of these protein/amino acid sequences in water were performed by studying each of the systems for 10 ps in steps of 10 K from 10 K to 330 K. The lowest energy configurations of these proteins/amino acid sequences in aqueous solution were obtained from averages of 3 separate simulations over 3 ns of simulation time post equilibration at a temperature slightly above that used for other experiments, 330 K, which was selected as an efficient method to explore configurations at energies near the lowest energy configuration without having to resort to more expensive methods such as replica exchange or metadynamics. These structures were utilized for further calculations on surfaces, and for calculations in aqueous solution at 300 K. It was found that at this given temperature these protein/amino acid sequences would adopt a low energy conformation within approximately 500 ps and would not vary in energy by more than 5% of the total energy during the remainder of the simulation. 3 ns of simulation time at 300 K with a 1 fs timestep was sufficient to provide acceptable statistics for analysis. The simulations were performed at a constant volume (NVT ensemble) employing a Nose-Hoover thermostat with a relaxation time of 1 ps. For the systems containing surfaces, the lowest energy configuration of each protein or peptide sequence is placed such that the nearest atom to the surface is within 3 Å of the surface to ensure absorption to the surface within a reasonable timeframe (accounting for the slow kinetics of molecules diffusing through the adsorbed water layers) and again simulated for 3 ns. Molecules were placed at this distance due to the large barriers to crossing the organized water layers at the surface.

Water residence times were calculated and averaged within 1.1 Å regions surrounding atoms comprising each individual amino acid within the sequences identified as being most likely to be present at the surface during protein binding to calcite. Water residence times were calculated by fitting survival probability functions as defined below:

(3) Pα(t)=j=1Nw1Nm+1n=1mpαj(t0,t0+t/,δt)

where t=mδt and t=nδt. The binary function Pαj(t0; t0+t; δt) takes the value of 1 when the water molecule remains within the shell defined by α during both times t0 and t0+t and is otherwise equal to zero. t is set to 0.1 ps; N is the total number of configurations produced during the molecular dynamics simulation and Nw is the total number of water molecules in the system. Average residence times can be obtained by fitting these functions to a simple exponential assuming a single relaxation time, Pαt=P0 exp(t/τ). The values were further averaged for each amino acid in the relevant amino acid sequences in order to get an estimate of how much each amino acid within the sequence affected water mobility within its sphere of influence.

Loss of entropy of water molecules

We provide a conservative estimate of the loss of entropy due to surface stabilization in the main text. However, this might be even more significant than estimated, as suggested by further computational examination of the water on the mineral surface around the peptides, which shows that this value may underestimate the total entropy loss of the water molecules. The average residence time for the water molecules (120 ps) increases in the presence of the peptides (Table 2), particularly with the tightest binding peptide sequence (with the DDDD component) to 135 ± 3 ps. We also examined the self-diffusion coefficients of the water molecules in the different environments present within these simulations. In a simulation of pure water, the diffusion coefficient for TIP3P water is approximately 6 × 10−9 m2s−1, a value which drops significantly for molecules bound near the surface to approximately 1 × 10−9 m2s−1. For water molecules trapped between SCA-1 or its substituent peptides and the calcite surface, this diffusion coefficient is of the order of 1 × 10−10 m2s−1.

Appendix 4

Degradation patterns

A. AAR on archaeological OES

Here we report the results of chiral amino acid analyses on all OES samples included in this study. All FAA and THAA D/L values increase with increasing age of the samples (estimated from the numerical dates obtained on the archaeological layers and reported in Tables 37). The extent of hydrolysis (estimated as the percentage of free amino acids: FAA = [FAA]/[THAA]*100) also increases with time, although the values are more variable than the DL ratios, mainly due to imprecisions in the measurement of the small masses and volumes involved. (Appendix 4—table 1 and 2) report the amino acid racemization (D/Ls and % FAA data) for all samples analysed in this study. Thermal age estimates are also reported for each sample.

Appendix 4—table 1

D/L values of archaeological OES samples analysed in this study. b = bleached; H* = THAA obtained by 24-hr acid hydrolysis; F = FAA. Ile A/I values could not be calculated for Laetoli samples due to the presence of a compound co-eluting with D-alloisoleucine. Analytical errors measured on replicate analyses are <5% (D/Ls: Asx = 0.54%; Glx = 1.18%; Ala = 3.94%; Val = 2.09%; Ile = 3.77%. Concentrations: Asx = 3.80%; Glx = 3.81%; Gly = 4.89%; Ala = 3.67%; Val = 3.98%; Ile = 3.63%). (★) Sample 4605 yielded high D/L values because this sample had been exposed to high temperatures in the burial environment (burning [Crisp, 2013]). Note: thermal age calculations were performed on the basis of the Hansen model (Hansen et al., 2013); due to the absence of more continuous record for younger (last 2000 years) samples in the Hansen record, the Elands Bay Cave time points <1600 years refined using the SST record from core MD02-2594 - Dyez14 (Dyez et al. 2014).

https://doi.org/10.7554/eLife.17092.031
LOTNEaarAsx D/LGlx D/LAla D/LVal D/LIle A/IThermal age (years)
18686887bH*0.2910.0610.0660.0750.052401−564
18686887bF0.3460.0740.1190.0980.076
18726888bH*0.2680.0560.0610.0250.0321313–1932
18726888bF0.2120.0790.1000.0000.000
18666889bH*0.2780.0570.0660.0280.0523962−5704
18666889bF0.3310.0620.1160.0000.076
18496891bH*0.4140.0950.1420.0530.08119,759–27,951
18496891bF0.5550.1360.2260.1380.169
18506893bH*0.5240.1410.2150.0800.13319,759–27,951
18506893bF0.6980.1770.3680.1660.254
18236896bH*0.4620.1170.1850.0730.09125,657–36,249
18236896bF0.6210.1610.3010.1580.208
18196899bH*0.7140.2220.3170.1120.18426,878–37,877
18196899bF0.8060.2940.4420.2170.332
18406907bH*0.4690.1230.1870.0950.12232,379–44,863
18406907bF0.6650.1710.3270.1680.241
4605(★)7198bH*0.8660.7170.8400.5250.708166,094–198,795
4605(★)7198bF0.9310.7820.9310.6920.955
46137676bH*0.6610.2260.4110.2100.269122,139–148,223
46137676bF0.8000.3450.5700.3490.472
46497283bH*0.7120.3140.5000.2550.320151,169–181,807
46497283bF0.8460.3800.7290.4280.590
46527286bH*0.6780.2770.4630.2330.302168,899–202,088
46527286bF0.8220.3550.6980.3910.528
46717316bH*0.6810.2700.4680.2420.299166,094–198,795
46717316bF0.8630.3390.6870.3840.577
46757320bH*0.6870.2790.4940.2790.330192,147–230,607
46757320bF0.8320.3210.6880.4230.590
46837328bH*0.7640.3920.6760.3730.505378,398–467,602
46837328bF0.8790.4250.8210.5230.737
46977342bH*0.7520.3880.6530.3680.497378,398–467,602
46977342bF0.9040.6590.9200.7770.973
1442610581bH*0.7300.8700.8550.8550.9853,238,624–4,188,207
1442610581bF0.7900.9300.9601.0101.280
1557510955bH*0.8601.0350.9811.001>1.214,387,543–18,460,416
1557510955bF0.9241.0010.9611.005>1.2
1557810958bH*0.8911.0400.9841.007>1.214,387,543–18,460,416
1557810958bF0.9161.0090.9600.999>1.2
1557910959bH*0.8111.0210.9651.007>1.214,387,543–18,460,416
1557910959bF0.9261.0060.9691.005>1.2
1558210962bH*0.8811.0330.9761.012>1.214,387,543–18,460,416
1558210962bF0.9151.0020.9471.016>1.2
1390210573bH*0.9651.0500.9251.12>1.213,764,246–18,893,425
1390210573bF0.9351.0300.9251.065>1.2
1390110574bH*0.9201.0400.9451.16>1.28,943,148–11,841,107
1390110574bF0.9351.0150.9301.08>1.2
1389810575bH*0.9451.0500.9301.17>1.214,746,875–20,367,942
1389810575bF0.9351.0100.9301.095>1.2
Appendix 4—table 2

%FAA values (%FAA = [FAA]/[THAA] * 100). b = bleached. Total% FAA for Laetoli are calculated on the basis of Asx, Gly, Ala, Val only. * [Ala] and [Gly] > 100% are likely due to the effect of decomposition of other amino acids to FAA Gly and FAA Ala (e.g. Ser) (Walton, 1998). (★) Sample 4605 had been exposed to high temperatures in the burial environment (burning) (Crisp 2013).

https://doi.org/10.7554/eLife.17092.032
LOTNEaarAsxGlyAlaValIleAverage
186868877161819613
1872688841110326
18666889183539151324
18496891264141211830
18506893171457343030
18236896364147232134
18196899224556312836
18406907184051262332
4605(★)7198707483615669
46137676535766444152
46497283494963403948
46527286484761424048
46717316506172474557
46757320625974484758
46837328907191646676
46977342558396645871
144261058150122*138*908497
155751095553475546n/a50
155781095870627262n/a67
155791095929273126n/a28
155821096278627464n/a69
139021057374648265n/a71
139011057488789274n/a83
138981057587769373n/a82

B. Proteome persistence and patterns of degradation

Fossil OES

Systematic analyses of the bleached proteomes surviving in sub-fossil OES shows that:

  1. The complexity of the proteome decreases with increasing age of the sample (using the extent of racemization of Val, a slow racemizer, as a proxy for time).

  2. The loss of proteins (number of identified product ion spectra; number of identified proteins) follows an exponential pattern over time.

  3. The major proteins identified in modern OES are excellently preserved in samples up to 150,000 years old (Val THAA D/L ~0.5).

  4. Beyond Val D/L ~0.5, quasi-exponential degradation of the proteome proceeds and only the main mineralization proteins, SCA-1 and SCA-2, are detected in the OES analysed from Wonderwerk and Laetoli.

  5. The average length of the peptides identified appears to be stable (10–12 amino acids) across all samples, but the characteristics of the amino acid residues change drastically in the oldest samples, with mean hydropathicity index up to −17, due to the preferential preservation of 'DDDD'-containing peptides.

Purified OES proteome heated at 140°C

The major proteins identified from the organic fraction extracted by demineralization of bleached OES and, once purified and lyophilized, heated at high temperature (140°C) for 2, 8, 24 and 120 hr are a subset of the sequences identified in modern and fossil OES powders. SCA-1 and SCA-2 are the dominant proteins, followed by the vitelline-membrane outer layer protein 1, von Willebrand factor, immunoglobulin and apolipoprotein D. SCA-1 and SCA-2 do not survive beyond 8 hr heating, while peptide sequences from other proteins were identified in samples heated for 24 hr. However, this is likely to be arising from contamination, due to lengthy and more complex sample preparation. No peptides were detected in the 120-hr heated sample.

Figure 2—figure supplement 2(C–F) shows that while degradation proceeds rapidly andfollows an exponential trajectory (as for fossil OES), the peptides surviving duringaccelerated diagenesis in water display increasing hydrophobicity, contrary to that observed in OES fossil eggshell. This difference offers further support to the hypothesis that mineral binding of charged (acidic) peptides is key for survival.

Appendix 4—table 3

: Proteins identified in the purified extract from bleached OES, heated at 140°C in ultrapure water for 2, 8, 24 and 120 hr (identified by at least 2 unique peptide sequences). Values are number of peptide sequences identified.

https://doi.org/10.7554/eLife.17092.033
Protein description2 hr8 hr24 hr120 hrTotal
RecName: Full = Struthiocalcin-1; Short = SCA-129521316
RecName: Full = Struthiocalcin-2; Short = SCA-21248132
Vitelline membrane outer layer protein 1-like [Struthio camelus australis]22224
von Willebrand factor partial [Struthio camelus australis]2020
immunonoglobulin heavy chain variable region partial [Struthio camelus]1818
Apolipoprotein D [Struthio camelus australis]14216
Mitogen-activated protein kinase MLT [Struthio camelus australis]7916
Aggrecan core protein [Struthio camelus australis]1313
iron binding protein [Struthio camelus]99
Histone H4 [Struthio camelus australis]448
Cyclin-K [Struthio camelus australis]77
BPI fold-containing family B member 4 partial [Struthio camelus australis]426
Histone H2B 1/2/3/4/6 [Struthio camelus australis]336
Complement C3 [Struthio camelus australis]235
PREDICTED: histone H2B 1/2/3/4/6 [Struthio camelus australis]224
PREDICTED: polymeric immunoglobulin receptor [Struthio camelus australis]33
Carbonic anhydrase 4 partial [Struthio camelus australis]22

C. Persistence of SCA-1 and SCA-2

Fossil OES

Figure 3—figure supplement 1 displays the combined spectral count for the sequence of SCA-1. Unmodified peptides, as well as peptides with modifications, were considered for all samples. This allows us to visualize the frequency (intensity) at each amino acid position and to identify patterns of survival in time. It is obvious that the occurrence of diagenesis-induced modifications (e.g. deamidation) increases over time (see also Supplementary file 3), and that samples older than 150,000 years display a striking pattern of preservation, with the only low-frequency region detected consistently being the 'DDDD-' containing peptides. The main features of the protein structure also highlight that highly-structured regions are not preferentially preserved and that the DDDD-motif is part of a highly flexible (disordered) domain.

Figure 3—figure supplement 2 shows the degradation pattern of the second major protein in OES: SCA-2. Similarly to SCA-1, modifications increase in frequency over time, and the overall frequency decreases. SCA-2 is however not detected with high confidence in samples older than 150,000 years old. Furthermore, the 150-ka old SCA shows two main regions surviving: one, around positions 45−55, contains an 'EEE' motif and broadly corresponds to peptide ASIHSEEEHQAIV, investigated computationally; the second, around positions 72–82, follows another E-rich region (corresponding to peptide SDSEEEAGEEVW investigated computationally) but contains instead a 'VWIG' motif that is highly conserved in all C lectins sequenced in biomineralized organisms (eggshell, sea-urchin spicules). By analogy with the structure of SCA-1, it is likely that this region is helical. Therefore, this points towards two preservation mechanisms: mineral-binding ('EEE' region) and dehydration ('VWIG'). Dehydration is also observed (water loss) as PThigh frequency in this region, further supporting this hypothesis

Extracted OES matrix heated at 140°C

SCA-1 and SCA-2 were not detected in OES matrix heated for 24 hr and 120 hr. Therefore here we show the results from the 2 hr and 8 hr heating experiments (Figure 3—figure supplement 1 and 2).

Around 50% of the sequence of SCA-1 survives 2 hr heating at 140°C in the absence of the mineral phase; the regions covered with higher frequency are: around residues 40−50, between two beta strands; and around residues 83–100, thus containing the D93-D96 motif. The dominant modification in this region, rather counter-intuitively, is water loss (−18.01 Da). By 8 hr heating this region is however almost disappeared; the AHLASIHT peptide (38–45) between the two alpha-helices is the most long-lived SCA-1 region when accelerated diagenesis takes place in water. SCA-2 appears to survive better during the experiment; most of the sequence can be detected after 8 hr heating, albeit with low spectral counts.

Overall, the patterns of degradation of SCA-1 and SCA-2 in water, as well as the signal detected from the whole OES matrix, are sufficiently different from those detected in fossil OES to support the hypothesis that different preservation mechanisms and degradation pathways operate in the OES and in aqueous environment (Figure 3—figure supplement 1 and 2). However, the survival of highly-acidic regions in SCA-1 combined with the frequency of observed dehydration might also support the idea that these peptides stick to the glass surface of the vial during the high-temperature experiments, and are thus protected from the water.

Appendix 5

Authenticity of the ancient sequences

A. Independent replication

In order to authenticate the fossil peptide sequences obtained from the oldest OES examined in this study (from Laetoli) and to obtain any additional information on possible bias introduced by our sample preparation and analytical methods, a subsample of bleached powder from each OES (LOT 13901, 13902, 13898) was sent to the Proteomics Laboratory, Center for Geogenetics, University of Copenhagen. The matrix was re-extracted and analysed as detailed in the Materials and methods section.

The results from the three analyses were merged in a single MGF file and searched against the Struthioniformes database using the same parameters as described for all other OES samples considered in this study. For comparison, a search was also performed on all Laetoli subsamples analysed in Oxford, combined in PEAKS to yield a single output deriving from all fractions. The results are reported in and Appendix 5—table 1.

Two of us (JTO and BD) performed manual de novo identification of the sequences from raw product ion spectra without prior knowledge of the PEAKS assignments (Supplementary file 2 reports all the raw spectra, manually annotated).

Appendix 5—table 1

Peptides identified in all Laetoli OES samples, prepared in York (analysed in Oxford) and Copenhagen. (*Gla) = this residue was found to be in the decarboxylated form, i.e. glutamate residues that have been post-translationally modified by vitamin K-dependent carboxylation to form gamma-carboxyglutamic acid (Gla), which binds calcium. We conducted manual de novo analyses of all product ion spectra and determined either complete or partial sequences for all of them, independently identifying sequences assigned (assisted de novo) by the software (PEAKS Studio).

https://doi.org/10.7554/eLife.17092.034
SequenceProteinLaboratoryScoreMS/MS count
AGAHLASIHTSEEHRSCA-1Copenhagen47.144
HYSALDDDDYPKGKSCA-1Copenhagen35.692
AGAHLASIHSCA-1Copenhagen31.453
ERNAFICKSCA-1Copenhagen28.121
GNCYGYFRSCA-1Copenhagen28.11
DVWIGLFRSCA-1Copenhagen26.385
ALDDDDYPKSCA-1York39.3928
ALDDDDYPKGSCA-1York41.3414
DDDDYPKGKSCA-1York40.793
DDDYPKGKSCA-1York32.891
HYSALDDDDYPKSCA-1York51.091
KHYSALDDDDYPKSCA-1York34.862
LDDDDYPKSCA-1York34.3512
LDDDDYPKGSCA-1York35.36
LDDDDYPKGKSCA-1York35.663
SALDDDDYPKSCA-1York41.0410
SALDDDDYPKGSCA-1York39.155
YSALDDDDYPKSCA-1York34.463
YSALDDDDYPKGSCA-1York31.93
RAEAWCRSCA-1York30.651
CYGFFPQELSWRSCA-2Copenhagen30.981
KPFICEYRTSCA-2Copenhagen25.031
GE(*Gla)EVWIGLHRPLGRSCA-2York37.332
LDYGSWYRSCA-2York35.11
AGE(*Gla)EVWIGLHRPLGRSCA-2York34.642

The results in Appendix 5—table 1 and Supplementary file 2 show that:

  1. 'DDDD'-containing peptides are found in samples prepared in Copenhagen, providing independent validation of the York results. However, the spectral count for these peptides is higher in the York set.

  2. Only one extra SCA-1 peptide (RAEAWCR, 26–32) was found in the combined York samples, while five additional SCA-1 sequences (AGAHLASIHTSEEHR and AGAHLASIH, ERNAFICK, GNCYGYFR, DVWIGLFR) were identified in the Copenhagen set. We attribute this to different sample preparation protocols and to the larger sample size used in Copenhagen.

  3. SCA-2 peptides CYGFFPQELSWR, KPFICEYRT were detected in the Copenhagen set, while GEEVWIGLHRPLGR/AGEEVWIGLHRPLGR and LDYGSWYR were detected in the York set. The spectral count was low in both sets.

B. Amino acid analyses

All samples were analysed for chiral amino acids, and the concentrations were always well above the limit of detection (Figure 4). Low concentrations in the oldest samples would have indicated that contamination is likely even in a theoretically closed system, but our values for Wonderwerk, Olduvai and Laetoli are ~150–200 times higher than the typical blank values calculated on a normal RP-HPLC analytical run in the NEaar laboratory (University of York) (Demarchi et al., 2011, 2015; Pierini et al., 2016). The concentration values are typically affected by a larger error than D/L values due to calculation errors introduced during sample preparation, explaining the variability observed. The Laetoli samples retain between 50–70% of the amino acids measured in modern OES. While leaching from the carbonate matrix cannot be completely excluded, in laboratory diagenesis experiments this leaching amounts to only ~0.5% of the original THAA concentration in 72-hr bleached modern OES heated at high temperature (Crisp et al., 2013). Therefore, we attribute the observed loss to decomposition processes affecting the amino acids in these highly-degraded samples. Furthermore, all D/L values and% FAA (and Appendix 4—table 1and 2) are consistent with the age of the samples, excluding the possibility that the proteins analysed are simply contamination.

C. Carry-over analysis

The analysis of procedural blanks analysed before and after each sample is fundamental for evaluating the authenticity of the fossil sequences.

We detected low levels (low number of spectra) of actin, ubiquitin/polyubiquitin, aggrecan core protein, vitelline membrane proteins, iron-binding protein, serotransferrin, von Willebrand factor, BPI-fold containing family B member 4, tenascin, neuronal pentraxin, apolipoprotein D, ovomucoid, serum albumin, endoplasmin, neural proliferation differentiation and control protein 1 and tubulin in most of the LC-MS/MS blanks. More significantly, both SCA-1 and SCA-2 were detected (high sequence coverage, high number of spectra) in blanks surrounding modern and fossil samples from Elands Bay Cave and Pinnacle Point (Figure 4). As SCA-1 and SCA-2 are the two major proteins in OES and the number of spectra in the blanks decreases with the age of the samples, this is indicative of carry-over, i.e. high-abundance peptides from a sample being retained in the LC column and being eluted during the successive blank run. This would obviously create a major issue for any claims of ancient sequences from fossil samples. Indeed, SCA sequences identified in blanks look remarkably similar to 'old' sequences (Figure 4D), because peptides with ion-binding characteristics (such as the 'DDDD'-containing peptide in SCA-1) would also exhibit equal affinity for the solid phase of the LC column. In order to exclude that the signal detected in ancient samples by LC-MS/MS may be due to carry-over we took the following precautions:

  1. A procedural blank was run before and after each sample and the signal obtained for samples and blanks was compared directly in order to ensure that a 200x fold decrease in signal intensity was observed

  2. Modern OES was analysed in November 2014; Elands Bay Cave and Pinnacle Point samples were analysed in December 2014

  3. Subsamples from Laetoli were analysed in two separate batches, >4 months after the Elands Bay Cave/Pinnacle Point series: April 2015 - 13901E, 13902T; June 2015: 13901T and 13901N, 13902E, 13898E and 13898T. The results were replicated independently in the separate analyses.

  4. Wonderwerk OES was analysed in August 2015.

  5. Olduvai OES was analysed in March 2016.

A more refined analysis was carried out on the blanks in order to estimate the extent of carry-over, because spectral counts data only give limited quantitative information. Firstly, two representative younger fossil samples (Elands Bay Cave LOT 1823 and Elands Bay Cave LOT 1819) and the blank analysed between the two runs were considered. The total ion chromatogram (Figure 4B) shows an obvious reduction in signal intensity between the sample and the following blank.

The ion chromatogram (Figure 4E) for peptide LDDDDYPK was extracted and compared for the three samples. The signal between the first sample and the blank decreases by a factor of 1330, indicating that a similar further decrease is expected between the blank and the following sample. However, in the following sample (EBC_1819) the signal intensity increases by a factor of 1254 compared to the previous blank. Therefore, the extent of carry-over is negligible but detectable by modern mass spectrometers, indicating that a qualitative analysis is insufficient to address sample carry-over. The signal in the blank is just above the threshold at which a MS/MS of this peptide is triggered, therefore the likelihood of the further carried over peptide resulting in a MS/MS in the following (sample) run is extremely low. As a consequence, the LDDDDYPK peptides identified in sample EBC_1819 will all be genuine IDs.

For the Laetoli and Wonderwerk OES samples (Laetoli shown in Figure 4F), a more thorough investigation was carried out using the software Progenesis QI (nonlinear Dynamics, Waters) and generating XICs for all features in all samples after alignment in order to obtain an estimate of the relative abundance of each identified peptide across samples. The abundances of all 'DDDD'-containing peptides (derived from natural cleavage, and trypsin/elastase digestion) in the April-15 and June-15 batches were summed in order to compensate for the different ionization potential of different species. These were then compared across samples and blanks and the fold changes calculated. The calculated maximal carry-over was well below 1% and is likely to be lower due to poor signal to noise ratios especially in the blank runs. Assuming that the carry-over from a blank into the following sample is again 1%, the effective carry-over from sample to sample can be estimated to be below 0.01%. Therefore, this is further supporting evidence for the authenticity of the peptide sequences in these samples.

D. Damage patterns

Post translational modifications (PTMs) of fossil proteins can result from diagenesis processes and therefore the expected trend is one of increase with increasing age of the sample for comparable amounts of sequence preservation. Labile PTMs originally present on the molecule (e.g. phosphorylation) are however expected to be lost over time. Finally, preparation-induced modifications (e.g. carbamidomethylation) should be found in all samples, irrespective of their age, if that part of the sequence is intact. We examined the patterns of PTMs in SCA-1 and SCA-2 for all the samples analysed in this study. The graphs in Supplementary file 3 show that the most common PTMs in bleached OES are:

  • Carbamidomethylation (57.02 Da) on cysteines, as a consequence of sample preparation (reduction-alkylation of disulfide bridges)

  • Deamidation (+0.98 Da) of Asn, Gln

  • Decomposition of Arg to ornithine (−42.02 Da)

  • Pyroglutamic acid formation from either N-terminal Gln or Glu by loss of ammonia/water (−17.01 Da, 18.01 Da)

  • Oxidation of Met, Trp, His (single/double: 15.99 Da, 31.99 Da), or hydroxylation of Lys, Pro, Arg, Tyr (15.99 Da)

  • Phosphorylation of Ser, Thr (79.97 Da)

  • Amidation of the C-terminus (−0.98 Da)

Other modifications were also detected less frequently; the full list and position of each PTM is given in Supplementary file 4. Low spectral counts from sample 4613 (Pinnacle Point) result in lower PTM frequencies.

For SCA-1 diagenesis-induced modifications increase in frequency in fossil samples as compared with modern samples, and then decrease for progressively older Pleistocene samples due to loss of protein sequence preservation. This increases confidence in the authenticity of these older sequences reported. SCA-2 shows a very similar pattern: deamidation of Asn and Gln and oxidation of Met, Trp and His are clearly present throughout, and their frequency increases with age (Supplementary file 3).

E. Volatiles

A number of peaks were present in the subfossil Laetoli sample that were not present in the blank, which corresponded with sulfur-containing VOCs in particular. The GC/QTOFMS has a mass resolution of <3 ppm, allowing extracted ion chromatograms to be used to isolate these sulfur-containing hydrocarbons from the complex VOC mix. The mass spectra across each suspected thiol / thioether peak were compared to the NIST MS library and the similarity and reverse fits for identified species are shown in Appendix 5—table 2. These values show how well the measured mass spectrum matches the library spectrum and are given out of 1000. Due to the low concentrations, even after background subtraction, some contaminant ions are still present, resulting in low similarity fits (comparison of the measured to library spectrum). However, by using the reverse fit (where the library spectrum is compared to the measured spectrum) much higher fits are achieved, since background contaminant ions are ignored. Therefore, the sulfur-containing VOCs have been assigned structures based on their accurate mass measurements, and whether the MS similarity or reverse fits are greater than 700. The Kovats retention index order of the identified species is consistent with previous observations (NIST chemistry webbook), increasing the certainty in the identifications.

The volatile S-containing alkanes identified, absent in the procedural blank, are consistent with degradation of organic matter within a closed system under anoxic conditions, and would not be retained over these timescales within an open system. The calcium carbonate biomineral must therefore effectively form a closed system for the entrapped organic material, enabling predictable degradation. Analysis of the volatile fraction therefore confirmed both the extreme degradation observed by amino acid racemization and predicted based on the thermal age, but also highlighted the superiority of ostrich eggshell as a closed system.

Appendix 5—table 2

Crushed eggshell sulfur-containing VOC emissions from 2.7 Ma OES (Laetoli LOT 13901). NA = no structural isomer can be determined.

https://doi.org/10.7554/eLife.17092.035
CompoundRetention time (min)Measured m/zChemical formulaNIST MS similarityReverse fit
Ethane thiol/dimethylsulfide3.4262.0196C2H6SNANA
Methylthioethane5.5976.0354C3H8S726821
Diethylsulfide8.2490.0511C4H10S725834
1-methylthiopropane8.6390.0509C4H10S646721
Unknown isomer8.86104.0663C5H12SNANANA
2-ethylthio-propane9.66104.0664C5H12S535723
2-methyl-1-(methylthio)-propane10.26104.0670C5H12S636798
ethylpropylsulfide10.85104.0671C5H12S628801
2-methyl-2-(methylthio)-butane11.89118.0823C6H14S567777
1-(ethylthio)-2-methyl-propane12.18118.0824C6H14S817856
2-methyl-3-(methylthio)-butane12.286118.0823C6H14S600820
Unknown isomer12.791118.0826C6H14SNANA
3-methyl-1-(methylthio)-butane12.821118.0827C6H14S779872
Tetrahydro-2,5-dimethylthiophene13.06116.0670C6H12S657741
Tetrahydro-2,5-dimethylthiophene13.14116.0670C6H12S622707

F. Ancient DNA

A single-stranded library was built (Gansauge and Meyer, 2013) and sequenced on an Illumina HiSeq 2500. Of the 282,924 reads sequenced, 68,204 remained after quality filtering and filtering for reads larger than 30 bp using AdapterRemoval (Lindgreen, 2012). Remaining reads were mapped to an ostrich reference nuclear and mitochondrial genomes (Zhang et al., 2015) using BWA-aln (Version: 0.7.5a-r405) with default settings (Li and Durbin, 2009) and the results examined using SAMtools (Li et al., 2009). No sequences were successfully aligned to the reference genomes.

Data availability

The following data sets were generated
    1. Haile J
    (2016) BioProject ID PRJNA314978
    Publicly available at NCBI BioSample (accession no: SAMN04546447).
    1. Harding J
    2. Hall S
    3. Freeman C
    (2016) Simulations on calcite -protein interactions
    Publicly available at The University of Sheffield figshare (https://sheffield.figshare.com/).

References

    1. Abdou AM
    2. Sato K
    3. Kim M
    (2013)
    Bioactive Food Peptides in Health and Disease
    Functional proteins and peptides of hen’s egg origin, Bioactive Food Peptides in Health and Disease, INTECH Open Access Publisher, 10.5772/54030.
    1. Berthelot D
    (1898)
    Sur le mélange des gaz
    Comptes Rendus 126:1703–1706.
    1. Burrus LW
    2. Olwin BB
    (1989)
    Isolation of a receptor for acidic and basic fibroblast growth factor from embryonic chick
    The Journal of Biological Chemistry 264:18647–18653.
  1. Book
    1. Collins M
    2. Riley M
    (2000)
    Amino acid racemization in biominerals: the impact of protein degradation and loss
    In: Goodfriend G, editors. Perspectives in Amino Acid and Protein Geochemistry. Oxford University Press. pp. 120–142.
    1. Deino AL
    (2011)
    Paleontology and Geology of Laetoli Human Evolution in Context . Vertebrate Paleobiology and Paleoanthropology Series
    77–97, 40Ar/ 39 Ar Dating of Laetoli, Tanzania., Paleontology and Geology of Laetoli Human Evolution in Context . Vertebrate Paleobiology and Paleoanthropology Series, Netherlands, Springer, 10.1007/978-90-481-9956-3_4.
    1. Ecker M
    2. Botha-Brink J
    3. Lee-Thorp J
    4. Piuz A
    5. Horwitz L
    (2015)
    Changing Climates, Ecosystems and Environments Within Arid Southern Africa and Adjoining Regions: Palaeoecology of Africa 33
    95–115, Ostrich eggshell as a source of Palaeoenvironmental information in the arid interior of South Africa: A case study from Wonderwerk Cave, Changing Climates, Ecosystems and Environments Within Arid Southern Africa and Adjoining Regions: Palaeoecology of Africa 33, 10.1201/b19410-8.
    1. Hagelberg E
    2. Hofreiter M
    3. Keyser C
    (2015) Introduction. Ancient DNA: the first three decades
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 370:20130371.
    https://doi.org/10.1098/rstb.2013.0371
    1. Hansen J
    2. Sato M
    3. Russell G
    4. Kharecha P
    (2013) Climate sensitivity, sea level and atmospheric carbon dioxide
    Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371:20120294.
    https://doi.org/10.1098/rsta.2012.0294
    1. Harrison T
    (2011)
    Geology, Geochronology, Paleoecology and Paleoenvironment, Volume 1
    Paleontology and geology of laetoli: Human evolution in context, Geology, Geochronology, Paleoecology and Paleoenvironment, Volume 1, Netherlands, Springer.
    1. Harrison T
    (2011)
    Fossil Hominins and the Associated Fauna, Volume 2
    Paleontology and geology of laetoli: Human evolution in context, Fossil Hominins and the Associated Fauna, Volume 2, Netherlands, Springer.
  2. Data
    1. Karl TR
    2. Williams CN
    3. Quinlan FT
    4. Boden TA
    (authors) (1990) United States historical climatology network (HCN) serial temperature and precipitation data
    Environmental Science Division, Publication No. 3404, Carbon Dioxide Information and Analysis Center, Oak Ridge National Laboratory, Oak Ridge, TN, 389.
    1. Logan GA
    2. Collins MJ
    3. Eglinton G
    (1991)
    Taphonomy: Releasing the Data Locked in the Fossil Record. Springer London, Limited
    1–24, Preservation of organic biomolecules, Taphonomy: Releasing the Data Locked in the Fossil Record. Springer London, Limited, 10.1007/978-1-4899-5034-5_1.
  3. Conference
    1. Parkington JE
    (1980)
    The Elands Bay cave sequence: cultural stratigraphy and subsistence strategies
    Proceedings of the 8th Pan-African Congress of Prehistory and Quaternary Studies, Nairobi. pp. 315–320.
    1. Patnaik R
    (2009)
    Ostrich-like eggshells from a 10. 1 million-yr-old Miocene ape locality, Haritalyangar, Himachal Pradesh, India
    Current Science 96:1485–1495.
    1. Rassart E
    2. Bedirian A
    3. Do Carmo S
    4. Guinard O
    5. Sirois J
    6. Terrisse L
    7. Milne R
    (2000) Apolipoprotein D
    Biochimica Et Biophysica Acta (BBA) - Protein Structure and Molecular Enzymology 1482:185–198.
    https://doi.org/10.1016/S0167-4838(00)00162-X

Article and author information

Author details

  1. Beatrice Demarchi

    BioArCh, Department of Archaeology, University of York, York, United Kingdom
    Contribution
    BD, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    beatrice@palaeo.eu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8398-4409
  2. Shaun Hall

    Department of Material Science and Engineering, University of Sheffield, Sheffield, United Kingdom
    Contribution
    SH, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  3. Teresa Roncal-Herrero

    Department of Physics, University of York, York, United Kingdom
    Contribution
    TR-H, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  4. Colin L Freeman

    Department of Material Science and Engineering, University of Sheffield, Sheffield, United Kingdom
    Contribution
    CLF, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    c.l.freeman@sheffield.ac.uk
    Competing interests
    The authors declare that no competing interests exist.
  5. Jos Woolley

    BioArCh, Department of Archaeology, University of York, York, United Kingdom
    Contribution
    JWo, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  6. Molly K Crisp

    Department of Chemistry, University of York, York, United Kingdom
    Contribution
    MKC, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  7. Julie Wilson

    1. Department of Chemistry, University of York, York, United Kingdom
    2. Department of Mathematics, University of York, York, United Kingdom
    Contribution
    JWi, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  8. Anna Fotakis

    Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    Contribution
    AF, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  9. Roman Fischer

    Advanced Proteomics Facility, Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    RF, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  10. Benedikt M Kessler

    Advanced Proteomics Facility, Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
    Contribution
    BMK, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  11. Rosa Rakownikow Jersie-Christensen

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    RRJ-C, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  12. Jesper V Olsen

    Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
    Contribution
    JVO, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  13. James Haile

    Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, United Kingdom
    Contribution
    JH, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  14. Jessica Thomas

    1. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    2. Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor, United Kingdom
    Contribution
    JT, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  15. Curtis W Marean

    1. Institute of Human Origins, SHESC, Arizona State University, Tempe, United States
    2. Centre for Coastal Palaeoscience, Nelson Mandela Metropolitan University, Port Elizabeth, South Africa
    Contribution
    CWM, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  16. John Parkington

    Department of Archaeology, University of Cape Town, Cape Town, South Africa
    Contribution
    JP, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  17. Samantha Presslee

    BioArCh, Department of Archaeology, University of York, York, United Kingdom
    Contribution
    SP, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  18. Julia Lee-Thorp

    Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, United Kingdom
    Contribution
    JL-T, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  19. Peter Ditchfield

    Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, United Kingdom
    Contribution
    PD, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  20. Jacqueline F Hamilton

    Wolfson Atmospheric Chemistry Laboratories, Department of Chemistry, University of York, York, United Kingdom
    Contribution
    JFH, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  21. Martyn W Ward

    Wolfson Atmospheric Chemistry Laboratories, Department of Chemistry, University of York, York, United Kingdom
    Contribution
    MWW, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  22. Chunting Michelle Wang

    Wolfson Atmospheric Chemistry Laboratories, Department of Chemistry, University of York, York, United Kingdom
    Contribution
    CMW, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  23. Marvin D Shaw

    Wolfson Atmospheric Chemistry Laboratories, Department of Chemistry, University of York, York, United Kingdom
    Contribution
    MDS, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  24. Terry Harrison

    Center for the Study of Human Origins, Department of Anthropology, New York University, New York, United States
    Contribution
    TH, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  25. Manuel Domínguez-Rodrigo

    Department of Prehistory, Complutense University of Madrid, Madrid, Spain
    Contribution
    MD-R, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  26. Ross DE MacPhee

    Department of Mammalogy, American Museum of Natural History, New York, United States
    Contribution
    RDEM, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  27. Amandus Kwekason

    National Museum of Tanzania, Dar es Salaam, Tanzania
    Contribution
    AK, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  28. Michaela Ecker

    Research Laboratory for Archaeology and the History of Art, University of Oxford, Oxford, United Kingdom
    Contribution
    ME, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9581-1882
  29. Liora Kolska Horwitz

    National Natural History Collections, Faculty of Life Sciences, The Hebrew University, Jerusalem, Israel
    Contribution
    LKH, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  30. Michael Chazan

    1. Department of Anthropology, University of Toronto, Toronto, Canada
    2. Evolutionary Studies Institute, University of the Witwatersrand, Braamfontein, South Africa
    Contribution
    MC, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  31. Roland Kröger

    Department of Physics, University of York, York, United Kingdom
    Contribution
    RK, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  32. Jane Thomas-Oates

    1. Department of Chemistry, University of York, York, United Kingdom
    2. Centre of Excellence in Mass Spectrometry, University of York, New York, United States
    Contribution
    JT-O, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  33. John H Harding

    Department of Material Science and Engineering, University of Sheffield, Sheffield, United Kingdom
    Contribution
    JHH, Conception and design, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    For correspondence
    j.harding@sheffield.ac.uk
    Competing interests
    The authors declare that no competing interests exist.
  34. Enrico Cappellini

    Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
    Contribution
    EC, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  35. Kirsty Penkman

    Department of Chemistry, University of York, York, United Kingdom
    Contribution
    KP, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  36. Matthew J Collins

    BioArCh, Department of Archaeology, University of York, York, United Kingdom
    Contribution
    MJC, Conception and design, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    For correspondence
    matthew.collins@york.ac.uk
    Competing interests
    The authors declare that no competing interests exist.

Funding

Engineering and Physical Sciences Research Council (EP/I001514/1)

  • Beatrice Demarchi
  • Teresa Roncal-Herrero
  • Colin L Freeman
  • Roland Kröger
  • John H Harding
  • Matthew J Collins

Natural Environment Research Council (NERC NE/G004625/1)

  • Beatrice Demarchi
  • Molly K Crisp
  • Julia Lee-Thorp
  • Kirsty Penkman
  • Matthew J Collins

Arts and Humanities Research Council (AHRC AH/L006979)

  • Beatrice Demarchi
  • Matthew J Collins

European Research Council (PERG07-GA-2010-268429)

  • Beatrice Demarchi

European Research Council (SMILEY FP7-NMP-2012-SMALL-6-310637)

  • Teresa Roncal-Herrero
  • Roland Kröger

Engineering and Physical Sciences Research Council (EP/K000225/1)

  • Colin L Freeman
  • John H Harding

Engineering and Physical Sciences Research Council (EP/L000202)

  • Colin L Freeman
  • John H Harding

Novo Nordisk (NNF14CC0001)

  • Rosa Rakownikow Jersie-Christensen
  • Jesper V Olsen

Danish National Research Foundation (DNRF94)

  • James Haile

National Science Foundation (BCS-0524087)

  • Curtis W Marean

Hyde Family Foundations

  • Curtis W Marean

John Templeton Foundation

  • Curtis W Marean

Arizona State University

  • Curtis W Marean

National Science Foundation (BCS-1138073)

  • Curtis W Marean

National Science Foundation (547414)

  • Samantha Presslee
  • Ross DE MacPhee
  • Matthew J Collins

National Geographic Society

  • Terry Harrison
  • Amandus Kwekason

Leakey Foundation

  • Terry Harrison
  • Amandus Kwekason

National Science Foundation (BCS-9903434)

  • Terry Harrison
  • Amandus Kwekason

National Science Foundation (BCS-0309513)

  • Terry Harrison
  • Amandus Kwekason

National Science Foundation (BCS-0216683)

  • Terry Harrison
  • Amandus Kwekason

Tanzanian Department of Antiquities (Permit)

  • Manuel Domínguez-Rodrigo

Social Sciences and Humanities Research Council of Canada

  • Liora Kolska Horwitz
  • Michael Chazan

Leverhulme Trust

  • Kirsty Penkman

Spanish Ministry of Economy and Competitivity (HAR2013-45246-C3-1-P)

  • Manuel Domínguez-Rodrigo

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors are grateful to Tom Gilbert, Jessica Hendy and Darrell Kaufman for useful discussion and support. Additional funding: work at the Centre for GeoGenetics was financed by the Danish National Research Foundation (DNRF94).

Copyright

© 2016, Demarchi et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 8,325
    views
  • 1,254
    downloads
  • 185
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Beatrice Demarchi
  2. Shaun Hall
  3. Teresa Roncal-Herrero
  4. Colin L Freeman
  5. Jos Woolley
  6. Molly K Crisp
  7. Julie Wilson
  8. Anna Fotakis
  9. Roman Fischer
  10. Benedikt M Kessler
  11. Rosa Rakownikow Jersie-Christensen
  12. Jesper V Olsen
  13. James Haile
  14. Jessica Thomas
  15. Curtis W Marean
  16. John Parkington
  17. Samantha Presslee
  18. Julia Lee-Thorp
  19. Peter Ditchfield
  20. Jacqueline F Hamilton
  21. Martyn W Ward
  22. Chunting Michelle Wang
  23. Marvin D Shaw
  24. Terry Harrison
  25. Manuel Domínguez-Rodrigo
  26. Ross DE MacPhee
  27. Amandus Kwekason
  28. Michaela Ecker
  29. Liora Kolska Horwitz
  30. Michael Chazan
  31. Roland Kröger
  32. Jane Thomas-Oates
  33. John H Harding
  34. Enrico Cappellini
  35. Kirsty Penkman
  36. Matthew J Collins
(2016)
Protein sequences bound to mineral surfaces persist into deep time
eLife 5:e17092.
https://doi.org/10.7554/eLife.17092

Share this article

https://doi.org/10.7554/eLife.17092

Further reading

    1. Biochemistry and Chemical Biology
    2. Evolutionary Biology
    Adam F Wallace, James D Schiffbauer
    Insight

    Some fragments of ancient protein are less prone to degradation because they bind strongly to the surfaces of minerals.

    1. Evolutionary Biology
    Edited by Ian Baldwin et al.
    Collection Updated

    eLife has published papers on topics as diverse as paleoproteomics, ancient insects and the discovery of a new hominin species.