-
Notifications
You must be signed in to change notification settings - Fork 0
/
SDSS DR12 Fetching data using R.Rmd
91 lines (60 loc) · 3.3 KB
/
SDSS DR12 Fetching data using R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
SDSS DR12 - Fetching data using R
========================================================
Bart Buelens, 26 March 2015.
[www.astrostatistics.org](http://www.astrostatistics.org/projects/sdss-dr-12)
Data Release 12 (DR12) of the Sloan Digital Sky Survey is now available (since early 2015). We obtain some data using SQL queries executed from within R and conduct an initial exploration of the data.
Data are available from the [SDSS SkyServer](http://skyserver.sdss.org/dr12/en/tools/toolshome.aspx) which contains a lot of information about accessing SDSS data. Of particular relevance is a tutorial on [how to fetch data using SQL](http://skyserver.sdss.org/dr12/en/help/howto/search/searchhowtohome.aspx).
Fetching data using SQL queries
-------------------------------
First we load some libraries used in this document.
```{r, message=FALSE}
library(RCurl)
library(ggplot2)
```
We formulate a SQL query, for example the following.
```{r}
mySqlQuery =
"SELECT objID, htmID, z, ra, dec, cx, cy, cz, class, subClass, type, survey, programname
FROM SpecPhoto
WHERE (htmid*37 & 0x000000000000FFFF) < (65)"
```
This query will select the listed columns from the SpecPhoto view. The WHERE statement makes a random selection of 0.1% of the data. How this works is explained [here](http://www.sdss.org/dr12/tutorials/random/).
Very useful is the [schema browser](http://skyserver.sdss.org/dr12/en/help/browser/browser.aspx) giving an overview of all tables and view of the data release. For example, details of the SpecPhoto view are shown [here](http://skyserver.sdss.org/dr12/en/help/browser/browser.aspx#&&history=description+SpecPhoto+V).
The SQL query is executed from within R and returns text in csv format, which can be written to disk:
```{r}
mySqlQuery = gsub(pattern="\n",replacement=" ",x=mySqlQuery)
urlBase = "http://skyserver.sdss.org/dr12/en/tools/search/x_sql.aspx"
X = getForm(urlBase, cmd = mySqlQuery, format = "csv")
write(X,file="SDSSsample.csv")
```
We now have a data file containg the requested SDSS data. The above need only to be conducted once. Thereafter, the data can be loaded from the local hard disk.
Initial data exploration
------------------------
The only aim of this short section is to ascertain that the SQL query has produced sensible data that can be used in other analyses.
Start clean and load the data file:
```{r}
rm(list=ls()) # clear everything in the workspace
A = read.table("SDSSsample.csv", header=TRUE, sep=",", dec=".", comment.char="#")
```
We now have data frame A containing our data. Let's look at what we have:
```{r}
str(A)
xtabs(~class, A)
xtabs(~survey, A)
xtabs(~survey+class, A)
```
Plot positions on celestial sphere using the Mollweide projection. Colour code objects by their class: galaxy, quasar or star.
```{r}
ggplot() +
geom_point(data=A, aes(x=ra, y=dec, colour=class), size=1) +
scale_colour_hue(l=50) +
coord_map("mollweide",xlim=c(0,360),ylim=c(-90,90)) +
scale_y_continuous(breaks = seq(-90, 90, 30)) +
scale_x_continuous(breaks = seq(0, 360, 60)) +
theme(axis.title = element_blank()) +
theme(axis.ticks = element_blank(), axis.text = element_blank()) +
theme(panel.border = element_blank()) +
theme(panel.grid.major = element_line(colour="darkgrey")) +
ggtitle("SDSS DR12 coverage") +
theme(plot.title = element_text(size = rel(2)))
```