An interview about FAIR software, workflows, and virtual research environments (VREs) / science gateways with Sandra Gesing, currently a Senior Research Scientist and Scientific Outreach and Diversity, Equity, and Inclusion (DEI) Lead at the Discovery Partners Institute at the University of Illinois, Chicago.
https://galaxyproject.org/
https://dpi.uillinois.edu/
https://sciencegateways.org/
https://www.rd-alliance.org/groups/fair-virtual-research-environments-wg
https://doi.org/20.500.14132/chris -->
https://doi.org/20.500.14132/chris?noredirect -->
https://www.dona.net/team/christophe-blanchi
Digital Object Identifier Resolution Protocol (DO-IRP): https://www.dona.net/sites/default/files/2022-06/DO-IRPV3.0--2022-06-30.pdf
DIKW pyramid / DIKW hierarchy - https://en.wikipedia.org/wiki/DIKW_pyramid
"Data becomes information when it is stored *in* a given *formation*."
From B. Fong and D. I. Spivak, “Seven Sketches in Compositionality: An Invitation to Applied Category Theory,” Ch. 3 - Databases, arXiv, Oct. 12, 2018. doi: 10.48550/arXiv.1803.05316.

"There are only three things we can do with data. We can accrete data by adding it to an existing collection, reduce data by discarding information from an existing collection, or reshape data by placing it in a different kind of collection."
From Z. Tellman, *Elements of Clojure*, Ch. 4 - Composition. Monee, IL: Lulu.com, 2019.

types of information: situational, methodological, philosophical (epistemological, axiological, ontological)
From Dorian Taylor, "2022-05-11 types of information", (May 11, 2022). Accessed: Sep. 27, 2022. [Online Video]. Available: https://www.youtube.com/watch?v=zNUNgZ6RTmQ

Inductions vs deductions vs abductions
Informed by M. K. Bergman, A Knowledge Representation Practionary: Guidelines Based on Charles Sanders Peirce. Cham: Springer International Publishing, 2018. doi: 10.1007/978-3-319-98092-8.

"programs must be written for people to read, and only incidentally for machines to execute."
From preface to first edition (and included in subsequent editions) of H. Abelson, G. J. Sussman, and J. Sussman, *Structure and interpretation of computer programs*, Cambridge, Mass.: MIT Press.
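The three data operations in the *Elements of Clojure* quote above (accrete, reduce, reshape) can be illustrated with a minimal Python sketch. The records and field names here are hypothetical and exist only for illustration; they do not come from the episode.

```python
# Hypothetical starting collection (illustrative data only).
records = [
    {"id": 1, "kind": "sample"},
    {"id": 2, "kind": "assay"},
]

# Accrete: add data to an existing collection.
accreted = records + [{"id": 3, "kind": "sample"}]

# Reduce: discard information from an existing collection.
reduced = [r for r in accreted if r["kind"] == "sample"]

# Reshape: place the same data in a different kind of collection
# (here, a list of dicts becomes a dict keyed by id).
reshaped = {r["id"]: r["kind"] for r in accreted}
```

Each step builds a new collection rather than mutating the previous one, which keeps the three operations easy to tell apart.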
`.split()`s on strings and `filter`s on `None`
I fought the Law and the Law won
I fought the Law and the Law won
I needed spec compliance; I got none
I fought the Law and the Law won
I fought the Law and the Law won

I varied my output with the latest fad
Breakin' every downstream run
Needed Postel more than I ever had
I fought the Law and the Law won
I fought the Law and the

Scatterin' parsing like a shotgun
I fought the Law and the Law won
I fought the Law and the Law won
I lost robustness and I lost my fun
I fought the Law and the Law won
I fought the Law and the Law won

I varied my output with the latest fad
Breakin' every downstream run
Needed Postel more than I ever had
I fought the Law and the Law won
I fought the Law and the
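The lyrics above lean on Postel's Law (be conservative in what you send, liberal in what you accept). As a minimal sketch, assuming a hypothetical `tags` field that upstream producers emit inconsistently (as `None`, as a comma-separated string, or as a list), the `.split()` and `None` filtering can live in one reader instead of being scattered across every downstream run. The function names `read_tags` and `write_record` are made up for this illustration.

```python
def read_tags(value):
    """Liberal reader: accept None, a comma-separated string, or an iterable of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        value = value.split(",")
    # Drop None entries and blanks; trim surrounding whitespace.
    return [tag.strip() for tag in value if tag is not None and tag.strip()]


def write_record(name, tags):
    """Conservative writer: 'tags' is always a list of strings, never None or a bare string."""
    return {"name": name, "tags": read_tags(tags)}
```

For example, `read_tags("a, b,,c")` and `read_tags(["a", None, " b "])` both come back as clean lists, and `write_record("x", None)` emits an empty list rather than `None`, so consumers need no special cases.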
- Linked Data
- Project Jupyter (Notebook, Lab, etc.)
- UI Blocks: Block Protocol
- Personal Knowledge Graphs: Roam, Logseq, Obsidian
- Solid: decentralized data stores
- Resource Description Framework (RDF)
- Twitter: Martynas, AtomGraph
- LinkedDataHub (Apache-2.0 license)
- AtomGraph: Website, GitHub
I was thinking about FAIR-enabling resources and wanted to distinguish between things that actually have to be running in order for data to be alive and for you to actually find it, access it, interoperate with it, and reuse it, versus "one-time" things that those services will need.
Just about a week ago,
I set out to download.
Seekin' supplementary data,
lookin' for a pot of gold.

Things got bad, and things got worse,
I guess you will know the tune.
Oh lord, stuck data mining again.

Rode in on semantics,
I'll be hand-waving out if I go.
Trying controlled vocabularies,
must've been seven of 'em or more.
No corresponding authors
have replied to my emails yet.
Oh lord, I'm stuck data mining again.

The man from Stack Overflow
said I was on my way.
My code kept raising exceptions.
I was reading tracebacks for days.
I wanted to run a one-off benchmark.
Looks like my plans fell through.
Oh lord, stuck data mining again.

If I only had metadata
that was machine-actionable
every time I've had a dataset
that I's told was interoperable.
You know I'd catch the FAIR train
and breeze through my planned reuse.
Oh lord, I'm stuck data mining again.
Oh lord, I'm stuck data mining again.
Oh give me mappings, lots of mappings, with resolving URIs. Don’t silo me in.
Let me prance through semantics of namespaces that I love. Don’t silo me in.
Let me use an open protocol to access these bytes, and for metadata promise me you’ll keep on the lights. Authenticate me repeatedly, but give clear usage rights. Don’t silo me in.
Just give me data bare. Let me reuse my old CPUs and mint my URIs.
With my own software, let me wander over yonder with least surprise.
I want to probe the provenance of metadata rich and plural, and represent my knowledge to be machine actionable. And I can’t look at schemas if they’re not interoperable. Don’t silo me in.
* [Materials Project](https://materialsproject.org/)
* [Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE)](https://ess-dive.lbl.gov/)
* [National Microbiome Data Collaborative (NMDC)](https://microbiomedata.org/)
* [W3C Provenance (PROV) specs](https://www.w3.org/TR/prov-overview/)
* [Research Equals (R=)](https://www.researchequals.com/)
* [JSON-LD](https://json-ld.org/)
* [Ecological Metadata Language (EML)](https://eml.ecoinformatics.org/)
* [DataCite](https://datacite.org/)
* [OSTI](https://www.osti.gov/)
* [DOI](https://www.doi.org/)
* [schema.org](https://schema.org/)
* [OAuth](https://oauth.net/2/)
* [OpenID Connect (OIDC)](https://openid.net/connect/)
* [OpenAPI](https://www.openapis.org/)
* [REST](https://en.wikipedia.org/wiki/Representational_state_transfer)
* [IGSN](https://www.igsn.org/)
* [Data Observation Network for Earth (DataONE)](https://www.dataone.org/)
* [Frictionless Data](https://frictionlessdata.io/)
Materials Project (MP) website: https://materialsproject.org/
Novel Materials Discovery (NOMAD) Laboratory: https://nomad-lab.eu/
Contributor Roles Taxonomy: https://credit.niso.org/
Authentication resources (FAIR A1.2):
- https://portier.github.io/using.html
- https://github.com/simov/grant
- https://docs.konghq.com/

U.S. Department of Energy resources:
- Office of Scientific and Technical Information (OSTI) Data ID Service: https://www.osti.gov/data-services
- https://www.energy.gov/science/office-science-pure-data-resources

Connecting with Patrick:
- https://www.linkedin.com/in/tschaume/
- https://twitter.com/tschaume
- https://appliedenergyscience.lbl.gov/people/patrick-huck
The FAIR Implementation Profile (FIP) Ontology: https://w3id.org/fair/fip/terms/FIP-Ontology
Linked Open Vocabularies (LOV): https://lov.linkeddata.es/dataset/lov/
FAIRSharing: https://fairsharing.org/
PageRank of Linked Open Vocabularies (LOV): https://donnywinston.com/posts/pagerank-of-linked-open-vocabularies-lov/
Principles of Open Scholarly Infrastructure (POSI): https://openscholarlyinfrastructure.org/
# Component 1: Entities/Activities
Type: Entity
Type: Activity
Relation: Generation/Invalidation (E-Act)
Relation: Usage (Act-E)
Relation: Communication (Act1-[E]-Act2)
Relation: Trigger/Starter of Start of Act (trigger E, starter Act)
Relation: Trigger/Ender of End of Act (trigger E, ender Act)
# Component 2: Derivations
Relation: Derivation (E-E, E-Act)
Relation: Revision (E-E)
Relation: Quotation (E-E)
Relation: Primary Source (E-E)
# Component 3: Agents, Responsibility, and Influence
Type: Agent
Relation: Attribution (E-Agt)
Relation: Association (Act-Agt (role), Act-E (plan))
Relation: Delegation (Agt-Agt, optionally scoped to an Act) - acted on behalf of
Relation: Influencer/Influencee ({E,Act,Agt}-[usage,start,end,generation,invalidation,communication,derivation,attribution,association,delegation]-{E,Act,Agt})

3 core types: entities, activities, agents. “Instantaneous events” are put in context of activities.
With respect to "time instants":
- generation is at instant of completion of production
- usage is at instant of beginning of utilization
- start, when activity is deemed started, is an instant
- end, when activity is deemed ended, is an instant
- invalidation is at instant of start of destruction, cessation, or expiry

10 influencing relations (not including 3 included subtypes of derivation: (1) [was] revision [of], (2) quotation ("was quoted from"), (3) [had] primary source).
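The Communication relation in the notes above (Act1-[E]-Act2) can be read as: one activity used an entity that another activity generated. A minimal Python sketch of that reading follows, using plain dataclasses rather than any PROV library. The class names, the `communications` helper, and the `ex:` identifiers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    """Entity was generated by activity (E-Act)."""
    entity: str
    activity: str


@dataclass(frozen=True)
class Usage:
    """Activity used entity (Act-E)."""
    activity: str
    entity: str


def communications(generations, usages):
    """Return sorted (informed, informant) activity pairs linked by a shared entity."""
    return sorted(
        {
            (u.activity, g.activity)
            for g in generations
            for u in usages
            if g.entity == u.entity
        }
    )


# Hypothetical provenance records.
generations = [
    Generation("ex:raw-data", "ex:collect"),
    Generation("ex:plot", "ex:analyze"),
]
usages = [Usage("ex:analyze", "ex:raw-data")]

informed_by = communications(generations, usages)
```

Here `informed_by` holds the single pair `("ex:analyze", "ex:collect")`: the analysis was informed by the collection because it used the raw data that the collection generated.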
The Creative Commons suite of licenses: CC0, CC BY, CC BY-SA, CC BY-ND, CC BY-NC, CC BY-NC-SA, CC BY-NC-ND.
Code licenses: Server Side Public License, Affero GPL (AGPL), Lesser GPL (LGPL), Mozilla Public License (MPL), Business Source License (used e.g. by Sentry, <https://github.com/getsentry/sentry/blob/master/LICENSE>), Elastic License (for Elasticsearch), Apache 2.0, BSD, MIT. Spectrum of user freedom and redistributor freedom.
"The CRAPL: An academic-strength open source license": <https://matt.might.net/articles/crapl/>
In the W3C Provenance Ontology:
https://www.w3.org/TR/prov-o/#wasDerivedFrom
The HTML Anchor Element:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
Heather Hedden, "Foundation for a Knowledge Graph Taxonomy Design Best Practices", slides at https://zenodo.org/record/6510205
Teodora Petkova, "The Dialogic Potential of the Web of Data", slides at https://zenodo.org/record/6518557
Tim Berners-Lee's bag of chips
Archival Resource Key (ARK) specification (section on policy metadata): https://datatracker.ietf.org/doc/html/draft-kunze-ark-34#section-5.1.1.
Permanence Levels and the Archives for NIH NLM's Permanent Web Documents: https://www.nlm.nih.gov/pubs/techbull/ma05/ma05_archive.html.