Data Science

Python Text Analysis: Topic Modeling

April 4, 2024, 10:00am
In this part, we study unsupervised learning of text data. This is a standalone workshop that builds on the two-part text analysis series.

A Basic Introduction to Hierarchical Linear Modeling

March 4, 2024
by Mingfeng Xue. Hierarchical Linear Modeling (HLM) is an extension of linear models that offers an approach to analyzing data with nested levels. This blog explains HLM's advantages over traditional linear regression models, particularly in handling clustered data and multilevel predictors. Using an example from educational research, the blog walks through model implementation and interpretation. It shows how HLM accommodates both independent variables measured at different levels and hierarchically structured data, providing insight into their effects on the outcome variable. Recommended resources help readers go further with HLM techniques.

R Fundamentals: Parts 1-4 (Evening Workshop)

March 5, 2024, 4:00pm
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-source RStudio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.

What Are Vowels Made Of? Graphing a Classic Dataset with R

February 13, 2024
by Anna Björklund. Vowels are all around us. Mainstream US English has around twelve unique vowels. How can our brains tell these sounds apart? This blog post will help you answer this question by plotting vowel data from a classic American English dataset by Peterson and Barney (1952).

How can we use big data from iNaturalist to address important questions in Entomology?

February 26, 2024
by Leah Lee. Large-scale geographic data on insect diversity over time can be used to answer important questions in Entomology. Open-source, open-access citizen science platforms like iNaturalist generate huge amounts of data on species diversity and distribution at accelerating rates. However, unstructured citizen science data contain inherent biases and need to be used with care. One way to validate big data from iNaturalist is to cross-check it against systematically collected data, such as museum specimens.

Jailynne Estevez

Consulting Drop-In Hours: Fri 3pm-5pm

Consulting Areas: Python, SQL, Stata, HTML / CSS, Javascript, Google AppScripts, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Python Programming, Surveys, Sampling & Interviews, Text Analysis, Bash or Command Line, Excel, Git or Github

Quick-tip: the fastest way to speak to a consultant is to first ...

R Fundamentals: Parts 1-4

March 5, 2024, 10:00am
This workshop is a four-part introductory series that will teach you R from scratch with clear introductions, concise examples, and support documents. You will learn how to download and install the open-source RStudio software, understand data and basic manipulations, import and subset data, explore and visualize data, and understand the basics of automation in the form of loops and functions. After completing this workshop, you will have the foundational understanding needed to create, organize, and use workflows for your own research.

Anna Björklund

Data Science Fellow
Linguistics

I am a fifth-year PhD student in the Department of Linguistics with an areal interest in the Wintuan languages, traditionally spoken in the northern Sacramento Valley and now undergoing revitalization. My primary research interests are in leveraging archival recordings for the phonetic analysis of these under-documented languages, as well as designing tools to assist in their revitalization. I have worked as a linguistic consultant for the Paskenta Band of Nomlaki Indians since 2020 and the Wintu Tribe of Northern California since 2022. I received my MA in linguistics from UC...

Chirag Manghani

Consulting Drop-In Hours: Wed 1pm-3pm

Consulting Areas: Python, R, SQL, Stata, SAS, LaTeX, HTML / CSS, Javascript, C++, APIs, Cloud & HPC Computing, Cybersecurity & Data Security, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Sources, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Software Tools, Text Analysis, Web Scraping, Regression Analysis, Software Output Interpretation, Bash or Command Line, Excel, Git or Github, Qualtrics, RStudio, RStudio...

Nicolas Nunez-Sahr

Consulting Drop-In Hours: By appointment only

Consulting Areas: Python, R, SQL, C++, APIs, Databases & SQL, Data Manipulation and Cleaning, Data Science, Data Visualization, Deep Learning, Machine Learning, Natural Language Processing, Python Programming, R Programming, Text Analysis, Regression Analysis, Software Output Interpretation, Bash or Command Line, Git or Github, RStudio, Google Cloud, PostgreSQL, Python Django
