Title: | Consensus Seriation for Binary Data |
---|---|
Description: | Determining consensus seriations for binary incidence matrices, using a two-step process of Procrustes-fit correspondence analysis for heuristic selection of partial seriations and iterative regression to establish a single consensus. Contains the Lakhesis Calculator, a graphical platform for identifying seriated sequences. Collins-Elliott (2024) <https://volweb.utk.edu/~scolli46/sceLakhesis.pdf>. |
Authors: | Stephen A. Collins-Elliott [aut, cre] |
Maintainer: | Stephen A. Collins-Elliott <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.2 |
Built: | 2025-02-17 05:31:37 UTC |
Source: | https://github.com/scollinselliott/lakhesis |
The Kendall-Doran measure of concentration (Kendall 1963; Doran 1971). In a seriated matrix, this function computes the total number of cells between the first and last non-zero value, column by column.
conc_col(obj)
obj |
A seriated binary matrix. |
The measure of concentration.
Doran J (1971).
“Computer Analysis of Data from the la Tène Cemetery at Münsingen-Rain.”
In Hodson FR, Kendall DG, Táutu P (eds.), Mathematics in the Archaeological and Historical Sciences, 422–431.
Edinburgh University Press, Edinburgh.
Kendall DG (1963).
“A Statistical Approach to Flinders Petrie's Sequence Dating.”
Bulletin of the International Statistical Institute, 40, 657–680.
data("quattrofontanili")
conc_col(quattrofontanili)
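As an illustration of the measure described above, the following base R sketch counts, for each column, the cells spanned from the first to the last non-zero entry and sums those spans. The helper name `kd_span_sum` is hypothetical, and counting both endpoints is an assumption; `conc_col` itself may differ in such details.

```r
# Hypothetical helper (not part of the package): sum over columns of the
# number of cells from the first to the last non-zero entry, inclusive.
kd_span_sum <- function(m) {
  spans <- apply(m, 2, function(v) {
    idx <- which(v != 0)
    if (length(idx) == 0) return(0)   # an empty column contributes nothing
    max(idx) - min(idx) + 1
  })
  sum(spans)
}

# Toy 4 x 3 matrix whose 1s are already grouped along the diagonal.
seriated <- matrix(c(1, 1, 0, 0,
                     0, 1, 1, 0,
                     0, 0, 1, 1), nrow = 4)
shuffled <- seriated[c(1, 3, 2, 4), ]  # swap rows 2 and 3

kd_span_sum(seriated)  # compact ordering
kd_span_sum(shuffled)  # scattered ordering gives a larger total
```

A lower total indicates that occurrences are more tightly grouped, which is why the measure can be used to compare alternative orderings of the same matrix.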
The concentration coefficient $\kappa$, which extends the Kendall-Doran measure of concentration to include rows and then weights the total measure by the total sum of values in the matrix. See conc_col.
conc_kappa(obj)
obj |
A seriated binary matrix. |
The coefficient of concentration.
data("quattrofontanili")
conc_kappa(quattrofontanili)
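A rough base R sketch of how such a coefficient could be assembled is given below. The exact weighting used by `conc_kappa` is not stated here, so dividing the summed row and column spans by the matrix total is an assumption, and the names `span` and `kappa_sketch` are hypothetical.

```r
# Hypothetical helper: cells from the first to the last non-zero entry.
span <- function(v) {
  idx <- which(v != 0)
  if (length(idx) == 0) 0 else max(idx) - min(idx) + 1
}

# Assumed weighting (NOT taken from the package source):
# (column spans + row spans) / total sum of the matrix.
kappa_sketch <- function(m) {
  (sum(apply(m, 2, span)) + sum(apply(m, 1, span))) / sum(m)
}

m <- matrix(c(1, 1, 0, 0,
              0, 1, 1, 0,
              0, 0, 1, 1), nrow = 4)

kappa_sketch(m)                   # well-seriated toy matrix
kappa_sketch(m[c(1, 3, 2, 4), ])  # same data, rows 2 and 3 swapped
```

Weighting by the matrix total makes values comparable between matrices with different numbers of occurrences, which a raw span count is not.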
Performs a goodness-of-fit test on individual row and column elements using deviance, with a quadratic-logistic model fit to row and column occurrences. In the case of perfect separation of 0/1 values, an NA value is assigned. Results are reported as p-values for each row and column.
element_eval(obj)

## S3 method for class 'matrix'
element_eval(obj)

## S3 method for class 'incidence_matrix'
element_eval(obj)
obj |
A seriated binary matrix. |
A list containing results in data frames for row and column elements:
RowFit
a data frame containing
id
Row element
p.val
p-values of the row elements
ColFit
a data frame containing
id
Column element
p.val
p-values of the column elements
data("quattrofontanili")
element_eval(quattrofontanili)
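To make the idea of a quadratic-logistic fit concrete, here is a minimal base R sketch for a single element. The occurrence vector is made up, and referring the residual deviance to a chi-squared distribution is an assumption about the form of the test; `element_eval` may compute its p-values differently.

```r
# Occurrences (0/1) of one hypothetical column element along the seriated rows.
occ <- c(0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0)
pos <- seq_along(occ)

# Quadratic-logistic model: the probability of presence can rise and then
# fall along the sequence.
fit <- glm(occ ~ pos + I(pos^2), family = binomial)

# Assumed test: residual deviance referred to a chi-squared distribution
# with the residual degrees of freedom.
p_val <- pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)
p_val
```

If the 0s and 1s were perfectly separated along the sequence, the logistic fit would not have finite coefficients, which is consistent with the NA value described above.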
Take an incidence matrix and convert it to a data frame of two columns, where the first column represents the row elements of the incidence matrix and the second column represents the column elements of the incidence matrix. Each row pair represents the incidence (or occurrence) of that row and column element together.
im_long(obj)

## S3 method for class 'matrix'
im_long(obj)

## S3 method for class 'incidence_matrix'
im_long(obj)
obj |
An incidence matrix. |
A data frame of two columns (row and column of the incidence matrix), in which each row of the data frame represents a pair of a row element and a column element that occur together.
data(quattrofontanili)
qf <- im_long(quattrofontanili)

# to export for uploading into the Lakhesis Calculator, use write.table() to
# remove both row and column names:
# write.table(qf, file = 'qf.csv', row.names = FALSE, col.names = FALSE, sep = ",")
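The wide-to-long conversion can be pictured with a small base R sketch. The toy matrix and its labels are made up, and this is only an illustration of the format, not the code behind `im_long`.

```r
# Toy incidence matrix with hypothetical row and column labels.
m <- matrix(c(1, 0, 1,
              0, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("tomb_A", "tomb_B"),
                            c("type_1", "type_2", "type_3")))

# Positions of all 1s, as (row index, column index) pairs.
idx <- which(m == 1, arr.ind = TRUE)

# One line per occurrence: row element, column element.
long <- data.frame(row = rownames(m)[idx[, "row"]],
                   col = colnames(m)[idx[, "col"]],
                   stringsAsFactors = FALSE)
long
```

The long table has exactly as many lines as there are 1s in the matrix, so no information about occurrences is lost in the conversion.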
From two incidence matrices, create a single incidence matrix. The matrices may contain the same row or column elements.
im_merge(obj1, obj2)

## S3 method for class 'matrix'
im_merge(obj1, obj2)

## S3 method for class 'incidence_matrix'
im_merge(obj1, obj2)
obj1, obj2 |
Two incidence matrices of any size. |
A single incidence matrix.
data(quattrofontanili)
qf1 <- quattrofontanili[1:20, 1:40]
qf1 <- qf1[rowSums(qf1) != 0, colSums(qf1) != 0]
qf2 <- quattrofontanili[30:50, 20:60]
qf2 <- qf2[rowSums(qf2) != 0, colSums(qf2) != 0]
im_merge(qf1, qf2)
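One way to think about merging is as a union over row and column labels. The base R sketch below assumes that an occurrence recorded in either input is kept in the result; the function name `merge_incidence` is hypothetical and `im_merge` may resolve overlaps differently.

```r
# Hypothetical helper: union of labels, cell-wise maximum where inputs overlap.
merge_incidence <- function(a, b) {
  rows <- union(rownames(a), rownames(b))
  cols <- union(colnames(a), colnames(b))
  out <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
  out[rownames(a), colnames(a)] <- a
  overlap <- out[rownames(b), colnames(b), drop = FALSE]
  out[rownames(b), colnames(b)] <- pmax(overlap, b)
  out
}

a <- matrix(c(1, 0, 0, 1), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
b <- matrix(c(1, 1, 0, 1), nrow = 2,
            dimnames = list(c("r2", "r3"), c("c2", "c3")))

merged <- merge_incidence(a, b)
merged
```

Cells for which neither input has information (here `r1` with `c3`) are filled with 0, which is an assumption of this sketch.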
Wrapper around the read_csv function from the readr package (Wickham et al. 2024). Reads a .csv file in which the first column represents row elements and the second column represents column elements, and converts it into an incidence matrix.
im_read_csv(
  filename,
  header = FALSE,
  characterencoding = "iso-8859-1",
  remove.hapax = FALSE
)

## S3 method for class 'incidence_matrix'
plot(im_seriated)
filename |
The filename to be uploaded (must be in |
header |
If the |
characterencoding |
File encoding as used by |
remove.hapax |
Remove any row or column which has a sum of 1 (i.e., is only attested once), since they do not directly contribute to the result of the seriation. Default is FALSE. |
A matrix of binary values (0 = row/column occurrence is absent; 1 = row/column occurrence is present).
Wickham H, Hester J, Bryan J (2024). readr: Read Rectangular Text Data. R package version 2.1.5, https://github.com/tidyverse/readr, https://readr.tidyverse.org.
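The conversion from a two-column file to an incidence matrix can be sketched in base R as follows. This swaps in `read.csv` for readr's `read_csv` so that the example has no dependencies; the inline data and labels are made up, and the single-pass hapax removal is an assumption about what `remove.hapax` does.

```r
# Hypothetical two-column input: row element, column element (no header).
csv_text <- paste("tomb_A,type_1", "tomb_A,type_2", "tomb_B,type_2",
                  "tomb_B,type_3", "tomb_C,type_3", "tomb_C,type_2",
                  sep = "\n")
pairs <- read.csv(text = csv_text, header = FALSE,
                  col.names = c("row", "col"), stringsAsFactors = FALSE)

# Cross-tabulate the pairs and cap counts at 1 to obtain binary incidences.
im <- unclass(table(pairs$row, pairs$col))
im[im > 1] <- 1L

# Assumed reading of remove.hapax: drop rows and columns attested only once.
im_trim <- im[rowSums(im) > 1, colSums(im) > 1, drop = FALSE]
im_trim
```

Here `type_1` occurs only once and is dropped, leaving a 3 x 2 matrix.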
Create an ideal reference matrix of well-seriated values of the same size as the input matrix.
im_ref(obj)

## S3 method for class 'matrix'
im_ref(obj)
obj |
A matrix of size |
A matrix of size with 1s along the diagonal. If
, 1s are placed from cell
to
, with 0 in all other cells.
im_ref(matrix(NA, 5, 5))
im_ref(matrix(1, 7, 12))
This function returns the row and column consensus seriation for a list object of the strands class, containing their rankings and coefficients of association and concentration.
lakhesize(strands, ...)

## S3 method for class 'strands'
lakhesize(strands, pbar = TRUE)

## Default S3 method:
lakhesize(strands, pbar = TRUE)

## S3 method for class 'lakhesis'
plot(result, display = "im_seriated")
strands |
A |
pbar |
Whether to display a progress bar. Default is TRUE. |
Consensus seriation is achieved by iterative simple linear regression to handle NA values in each strand. To initialize, a regression is performed pairwise, with every strand as the dependent variate and every other strand as the independent
variate. The independent variate's rankings are then regressed onto
. If
, the mean of
and
is used. Then, the values of the dependent variate and those of the regressed independent variate are re-ranked, and the result serves as the dependent variate on the next iteration. The process is repeated, regressing each strand which yields the lowest concentration measure.
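A single regression step of the kind described above might look like the following base R sketch. It rests on one reading of the procedure: ranks of the dependent strand are predicted from the independent strand, averaged with the observed rank where both exist, and then re-ranked. The function name `merge_step` and the toy rankings are hypothetical, and the package's own iteration and strand selection are not reproduced.

```r
# Hypothetical helper: combine a dependent ranking y with an independent
# ranking x, both of which may contain NA for missing elements.
merge_step <- function(y, x) {
  fit <- lm(y ~ x)  # fitted on the complete pairs only
  y_hat <- predict(fit, newdata = data.frame(x = x))  # NA where x is NA
  combined <- ifelse(is.na(y), y_hat,
                     ifelse(is.na(y_hat), y, (y + y_hat) / 2))
  rank(combined, na.last = "keep")  # elements absent from both stay NA
}

y <- c(1, 2, NA, 4, 5, NA)   # ranks in the dependent strand
x <- c(2, 4, 6, NA, 10, NA)  # ranks in the independent strand

merge_step(y, x)
```

The third element, which is missing from `y`, receives a position from the regression on `x`, while the sixth element, missing from both strands, remains NA.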
A list of class lakhesis containing the following:
row
A seriated vector of row elements.
col
A seriated vector of column elements.
coef
A data frame containing the following columns:
Strand
The number of the strand.
Agreement
The measure of agreement, i.e., how well each strand accords with the consensus seriation. Using the square of Spearman's rank correlation coefficient, $\rho^2$, between each strand and the consensus ranking, agreement is computed as the product of $\rho^2$ for their row and column rankings, $\rho_r^2 \rho_c^2$.
Concentration
The concentration coefficient $\kappa$, which provides a measure of the optimality of each strand (see conc_kappa).
im_seriated
The seriated incidence matrix, of class incidence_matrix.
data("qfStrands")
x <- lakhesize(qfStrands, pbar = FALSE)
# summary(x)
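The agreement coefficient can be illustrated with base R's `cor`. The rankings below are made up, and `sq_spearman` is a hypothetical stand-in for the package's `spearman_sq`; the sketch only shows the product of squared rank correlations for rows and columns, with incomplete pairs dropped.

```r
# Hypothetical stand-in: squared Spearman correlation on complete pairs.
sq_spearman <- function(a, b) {
  cor(a, b, method = "spearman", use = "complete.obs")^2
}

# Made-up rankings of one strand against a consensus ranking.
strand_rows    <- c(1, 2, 3, NA, 4)  # element 4 is absent from the strand
consensus_rows <- c(1, 2, 3, 4, 5)
strand_cols    <- c(2, 1, 3, 4)
consensus_cols <- c(1, 2, 3, 4)

agreement <- sq_spearman(strand_rows, consensus_rows) *
  sq_spearman(strand_cols, consensus_cols)
agreement
```

Because the row order of the strand matches the consensus exactly, the product here is driven entirely by the single swap among the columns.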
Launch the Lakhesis Calculator, a graphical interface to explore binary matrices via correspondence analysis, select potentially well-seriated sequences, and perform consensus seriation. The interface is made with ggplot2, shiny, shinydashboard, and bslib (Wickham 2016; Chang et al. 2024; Chang and Borges Ribeiro 2021; Sievert et al. 2024).
LC()
Input is done in the calculator via a "long" format: a two-column .csv file giving pairs of row and column incidences. See im_read_csv for details. Conversion of a pre-existing incidence matrix to long format can be performed with im_long.
Results can be downloaded from the calculator as an .rds file containing a list of the following:
consensus
The consensus seriations, PCA, coefficients of agreement and concentration, and seriated incidence matrix (see lakhesize).
strands
The strands selected by the investigator.
Opens the Lakhesis Calculator.
Chang W, Borges Ribeiro B (2021).
shinydashboard: Create Dashboards with 'Shiny'.
https://CRAN.R-project.org/package=shinydashboard.
Chang W, Cheng J, Allaire JJ, Sievert C, Schloerke B, Xie Y, Allen J, McPherson J, Dipert A, Borges B (2024).
shiny: Web Application Framework for R.
R package version 1.8.1.9001; https://github.com/rstudio/shiny, https://shiny.posit.co.
Sievert C, Cheng J, Aden-Buie G (2024).
bslib: Custom ‘Bootstrap’ ‘Sass’ Themes for ‘shiny’ and ‘rmarkdown’.
R package version 0.7.0, https://github.com/rstudio/bslib, https://rstudio.github.io/bslib/.
Wickham H (2016).
ggplot2: Elegant Graphics for Data Analysis.
Springer, New York.
Fit scores of correspondence analysis on an incidence matrix to those produced by a reference matrix which contains an ideal seriation, using a Procrustes method (on the reference matrix, see im_ref). Rotation is determined by minimizing the Euclidean distance from each row score to the nearest reference row score. Correspondence analysis is performed using the ca package (Nenadic and Greenacre 2007).
## S3 method for class 'procrustean'
plot(result)

ca_procrustes(obj)

## S3 method for class 'matrix'
ca_procrustes(obj)

## S3 method for class 'incidence_matrix'
ca_procrustes(obj)
obj |
An incidence matrix of size n x k. |
A list object of class strand containing the following:
ref
The Procrustes-fit coordinates of the scores of the reference seriation.
x
The Procrustes-fit coordinates of the row scores of the data.
y
The Procrustes-fit coordinates of the column scores of the data.
Nenadic O, Greenacre MJ (2007). “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software, 20, 1–13. doi:10.18637/jss.v020.i03.
data("quattrofontanili")
s <- ca_procrustes(quattrofontanili)
# print(s)
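The rotation criterion described above, minimizing the distance from each point to its nearest reference point, can be sketched with a simple grid search over angles in base R. The parabola-shaped reference scores, the one-degree grid, and the helper names are all assumptions made for illustration; the package may search differently and may also consider reflections.

```r
# Rotate row-vector points counter-clockwise by angle theta.
rotate <- function(pts, theta) {
  rot <- matrix(c(cos(theta), -sin(theta),
                  sin(theta),  cos(theta)), nrow = 2)
  pts %*% rot
}

# Sum of distances from each point to its nearest reference point.
nearest_sum <- function(pts, ref) {
  sum(apply(pts, 1, function(p) {
    min(sqrt((ref[, 1] - p[1])^2 + (ref[, 2] - p[2])^2))
  }))
}

# Hypothetical reference scores on a parabola, and data scores that are the
# same points turned by a quarter circle.
t_ref <- seq(-1, 1, length.out = 21)
ref <- cbind(t_ref, t_ref^2)
dat <- rotate(ref, pi / 2)

# Try every angle on a one-degree grid and keep the best one.
thetas <- seq(0, 2 * pi, length.out = 361)[-361]
losses <- sapply(thetas, function(th) nearest_sum(rotate(dat, th), ref))
best_theta <- thetas[which.min(losses)]
best_theta  # the angle that undoes the quarter turn
```

A grid search is slow but transparent; a nearest-point criterion of this kind does not require the data and reference points to be matched one to one.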
Obtain a ranking of row and column scores projected onto a reference curve of an ideal seriation (row and column scores are ranked separately). Scores of correspondence analysis are first fit to those produced by a reference matrix containing an ideal seriation using a Procrustes method, and are then projected onto the reference curve. Rotation is determined by minimizing the Euclidean distance from each row score to the nearest reference row score. Correspondence analysis is performed using the ca package (Nenadic and Greenacre 2007).
## S3 method for class 'strand'
plot(strand, display = "ca")

ca_procrustes_ser(obj, samples = 10^5)

## S3 method for class 'incidence_matrix'
ca_procrustes_ser(obj, samples = 10^5)

## S3 method for class 'matrix'
ca_procrustes_ser(obj, samples = 10^5)
obj |
An incidence matrix of size n x k. |
samples |
Number of samples to use for plotting points along the polynomial curve. Default is 10^5. |
A list of class strand containing the following:
$dat
A data frame with the following columns:
Procrustes1, Procrustes2
The location of the point on the biplot after fitting.
CurveIndex
The orthogonal projection of the point onto the reference curve, given as the index of the point sampled along the curve.
Distance
The squared Euclidean distance of the point to the nearest point on the reference curve.
Rank
The ranking of the row or column, a range of 1:nrow and 1:ncol.
Type
Either row or col.
sel
Data frame column used in the shiny app to indicate whether the point is selected in the biplot/curve projection.
$im_seriated
The seriated incidence matrix, of class incidence_matrix.
Nenadic O, Greenacre MJ (2007). “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software, 20, 1–13. doi:10.18637/jss.v020.i03.
data("quattrofontanili")
s <- ca_procrustes_ser(quattrofontanili)
# print(s)
# summary(s)
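The projection and ranking step can be sketched in base R by sampling a curve densely and assigning each point the index of its nearest sample. The parabola, the number of samples, and the three points below are made up for illustration, and the sketch is not the package's implementation.

```r
# Hypothetical reference curve y = x^2, sampled at 1001 points.
t_grid <- seq(-1, 1, length.out = 1001)
curve_pts <- cbind(t_grid, t_grid^2)

# Three made-up fitted scores.
pts <- rbind(c(0.5, 0.3), c(-0.8, 0.6), c(0.0, 0.1))

# For each point: index of the nearest curve sample and its squared distance.
proj <- t(apply(pts, 1, function(p) {
  d2 <- (curve_pts[, 1] - p[1])^2 + (curve_pts[, 2] - p[2])^2
  i <- which.min(d2)
  c(CurveIndex = i, Distance = d2[i])
}))

# Ranking along the curve follows the order of the curve indices.
ranks <- rank(proj[, "CurveIndex"])
cbind(proj, Rank = ranks)
```

A larger number of samples gives a finer resolution of positions along the curve at the cost of more distance computations, which is the trade-off governed by an argument such as `samples`.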
Three seriated strands selected from the quattrofontanili data, identified by the package author as an example for the documentation of functions.
data("qfStrands")
A list containing data frames output by ca.procrustes.curve.
data("qfStrands")
print(qfStrands)
The seriation of tombs from necropoleis at Veii, primarily Quattro Fontanili, but also Valle la Fata, Vaccareccia, and Picazzano, in southern Etruria, established by Close-Brooks and Ridgway (1979).
data("quattrofontanili")
A seriated incidence matrix of 81 rows (tombs) and 82 columns (types).
Data entered from Close-Brooks and Ridgway (1979), an English translation of the authors' original publication in Notizie degli Scavi (1963). Descriptions of types may be found in that paper.
Close-Brooks J, Ridgway D (1979). “Veii in the Iron Age.” In Ridgway D, Ridgway FR (eds.), Italy Before the Romans, 95–127. Academic Press, London.
data("quattrofontanili")
print(quattrofontanili)
The square of Spearman's rank correlation coefficient applied to two rankings (Spearman 1904). Rows with NA values are automatically removed.
spearman_sq(r1, r2)
r1, r2 |
Two vectors of paired ranks. |
The square of Spearman's rank correlation coefficient with NA values removed.
Spearman C (1904). “The Proof and Measurement of Association between Two Things.” American Journal of Psychology, 15, 72–101. doi:10.2307/1412159.
# e.g., for two partial seriations:
x <- c(1, 2, 3, 4, NA, 5, 6, NA, 7.5, 7.5, 9)
y <- c(23, 17, 19, NA, 21, 22, 25, 26, 27, 36, 32)
spearman_sq(x, y)
Given a list of strands, remove a row or column element and re-run seriation by correspondence analysis with Procrustes fitting (ca_procrustes_ser) to generate a new list of strands that exclude the specified elements. If the resulting strand lacks sufficient points to perform correspondence analysis, that strand is deleted in the output.
strand_add(strand, ...)

## S3 method for class 'strand'
strand_add(strand, strands)
strand |
An object of class |
strands |
A |
A list of class strands.
From a list of strands produced by ca_procrustes_ser, extract two matrices containing the ranks of the rows and columns. The row/column elements are contained in the rows, and the strands are contained in the columns. NA values are entered where a given row/column element is missing from that strand.
strand_extract(strands, ...)

## S3 method for class 'strands'
strand_extract(strands)
strands |
A |
A list of two matrices:
Row
A matrix of the ranks of the row elements.
Col
A matrix of the ranks of the column elements.
data("quattrofontanili")
data("qfStrands")
strand_extract(qfStrands)
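The layout of the resulting rank matrices can be pictured with a small base R sketch. The two toy strands and their `id` and `Rank` columns are assumptions made for illustration and do not reproduce the structure of a real strands object.

```r
# Two hypothetical strands, each ranking a partly different set of elements.
strands <- list(
  data.frame(id = c("a", "b", "c"), Rank = c(1, 2, 3),
             stringsAsFactors = FALSE),
  data.frame(id = c("b", "c", "d"), Rank = c(2, 1, 3),
             stringsAsFactors = FALSE)
)

# All elements attested in at least one strand.
ids <- sort(unique(unlist(lapply(strands, `[[`, "id"))))

# One column per strand; NA where an element is absent from that strand.
rank_mat <- sapply(strands, function(s) s$Rank[match(ids, s$id)])
rownames(rank_mat) <- ids
rank_mat
```

Keeping NA for missing elements, rather than dropping them, preserves a common row order across strands so that their rankings can be compared pairwise.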
Given a list of strands, remove a row or column element and re-run seriation by correspondence analysis with Procrustes fitting (ca_procrustes_ser) to generate a new list of strands that exclude the specified elements. If the resulting strand lacks sufficient points to perform correspondence analysis, that strand is deleted in the output.
strand_suppress(strands, ...)

## S3 method for class 'strands'
strand_suppress(strands, ...)

## Default S3 method:
strand_suppress(strands, elements)
strands |
A |
elements |
A vector of one or more row or column ids to suppress. |
A list of the strands.
data("quattrofontanili")
data("qfStrands")
strand_suppress(qfStrands, "QF II 15-16")
strand_suppress(qfStrands, c("QF II 15-16", "I", "XIV"))