OS X is based on Mach and BSD. Like Mach and most BSD UNIX systems, it contains an advanced scheduler based on the CMU Mach 3 scheduler. This chapter describes the scheduler from the perspective of both a kernel programmer and an application developer attempting to set scheduling parameters.
This chapter begins with the Overview of Scheduling, which describes the basic concepts behind Mach scheduling at a high level, including real-time priority support.
The second section, Using Mach Scheduling From User Applications, describes how to access certain key Mach scheduler routines from user applications and from other parts of the kernel outside the scheduler.
The third section, Kernel Thread APIs, explains scheduler-related topics, including how to create and terminate kernel threads, and describes the BSD spl macros and their limited usefulness in OS X.
Overview of Scheduling
The OS X scheduler is derived from the scheduler used in OSFMK 7.3. In general, much documentation about prior implementations applies to the scheduler in OS X, although you will find numerous differences. The details of those differences are beyond the scope of this overview.
Mach scheduling is based on a system of run queues at various priorities that are handled in different ways. The priority levels are divided into four bands according to their characteristics, as described in Table 10-1.
Priority Band | Characteristics
---|---
Normal | Normal application thread priorities.
System high priority | Threads whose priority has been raised above normal threads.
Kernel mode only | Reserved for threads created inside the kernel that need to run at a higher priority than all user-space threads (I/O Kit work loops, for example).
Real-time threads | Threads whose priority is based on getting a well-defined fraction of total clock cycles, regardless of other activity (in an audio player application, for example).
Threads can migrate between priority levels for a number of reasons, largely as an artifact of the time sharing algorithm used. However, this migration is within a given band.
Threads marked as being real-time priority are also special in the eyes of the scheduler. A real-time thread tells the scheduler that it needs to run for A cycles out of the next B cycles. For example, it might need to run for 3000 out of the next 7000 clock cycles in order to keep up. It also tells the scheduler whether those cycles must be contiguous. Using long contiguous quanta is generally frowned upon but is occasionally necessary for specialized real-time applications.
The kernel will make every effort to honor the request, but since this is soft real-time, it cannot be guaranteed. In particular, if the real-time thread requests something relatively reasonable, its priority will remain in the real-time band, but if it lies blatantly about its requirements and behaves in a compute-bound fashion, it may be demoted to the priority of a normal thread.
Changing a thread's priority to turn it into a real-time priority thread using Mach calls is described in more detail in Using Mach Scheduling From User Applications.
In addition to the raw Mach RPC interfaces, some aspects of a thread's priority can be controlled from user space using the POSIX thread priority API. The POSIX thread API is able to set thread priority only within the lowest priority band (0–63). For more information on the POSIX thread priority API, see Using the pthreads API to Influence Scheduling.
Why Did My Thread Priority Change?
There are many reasons that a thread's priority can change. This section attempts to explain the root cause of these thread priority changes.
A real-time thread, as mentioned previously, is penalized (and may even be knocked down to normal thread priority) if it exceeds its time quantum without blocking repeatedly. For this reason, it is very important to make a reasonable guess about your thread's workload if it needs to run in the real-time band.
Threads that are heavily compute-bound are given lower priority to help minimize response time for interactive tasks, so that high-priority compute-bound threads cannot monopolize the system and prevent lower-priority I/O-bound threads from running. Even at a lower priority, the compute-bound threads still run frequently, since the higher-priority I/O-bound threads do only a short amount of processing, block on I/O again, then allow the compute-bound threads to execute.
All of these mechanisms are operating continually in the Mach scheduler. This means that threads are frequently moving up or down in priority based upon their behavior and the behavior of other threads in the system.
Using Mach Scheduling From User Applications
There are three basic ways to change how a user thread is scheduled. You can use the BSD pthreads
API to change basic policy and importance. You can also use Mach RPC calls to change a task's importance. Finally, you can use RPC calls to change the scheduling policy to move a thread into a different scheduling band. This is commonly used when interacting with CoreAudio.
The pthreads
API is a user space API, and has limited relevance for kernel programmers. The Mach thread and task APIs are more general and can be used from anywhere in the kernel. The Mach thread and task calls can also be called from user applications.
Using the pthreads
API to Influence Scheduling
OS X supports a number of policies at the POSIX threads API level. If you need real-time behavior, you must use the Mach thread_policy_set
call. This is described in Using the Mach Thread API to Influence Scheduling.
The pthreads
API adjusts the priority of threads within a given task. It does not necessarily impact performance relative to threads in other tasks. To increase the priority of a task, you can use nice
or renice
from the command line or call getpriority
and setpriority
from your application.
The API provides two functions: pthread_getschedparam and pthread_setschedparam. Their prototypes look like this:
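Per POSIX, as declared in <pthread.h> (struct sched_param comes from <sched.h>, which <pthread.h> pulls in):

```c
#include <pthread.h>

int pthread_getschedparam(pthread_t thread, int *policy,
                          struct sched_param *param);
int pthread_setschedparam(pthread_t thread, int policy,
                          const struct sched_param *param);
```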
The arguments for pthread_getschedparam are straightforward. The first argument is a thread ID, and the others are pointers to memory where the results will be stored.

The arguments to pthread_setschedparam are not as obvious, however. As with pthread_getschedparam, the first argument is a thread ID.

The second argument to pthread_setschedparam is the desired policy, which can currently be one of SCHED_FIFO (first in, first out), SCHED_RR (round robin), or SCHED_OTHER. The SCHED_OTHER policy is generally used for extra policies that are specific to a given operating system, and should thus be avoided when writing portable code.
The third argument is a structure that contains various scheduling parameters.
Here is a basic example of using pthreads functions to set a thread's scheduling policy and priority.
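A minimal sketch, assuming a POSIX environment (the helper name set_my_thread_priority is ours, not part of any API):

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Request round-robin scheduling for the calling thread at the given
 * relative priority. Returns 0 on success, -1 on failure. */
int set_my_thread_priority(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(struct sched_param));
    sp.sched_priority = priority;
    if (pthread_setschedparam(pthread_self(), SCHED_RR, &sp) != 0) {
        fprintf(stderr, "Failed to change priority.\n");
        return -1;
    }
    return 0;
}
```

Note that pthread_setschedparam can fail (for example with EPERM) if the process lacks the privilege to select SCHED_RR, so the return value should always be checked.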
This code snippet sets the scheduling policy for the current thread to round-robin scheduling, and sets the thread's relative importance within the task to the value passed in through the priority argument.

For more information, see the manual page for pthread.
Using the Mach Thread API to Influence Scheduling
This API is frequently used in multimedia applications to obtain real-time priority. It is also useful in other situations where the pthread scheduling API cannot be used or does not provide the needed functionality.

The API consists of two functions, thread_policy_set and thread_policy_get.
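Their declarations, as found in <mach/thread_policy.h> on OS X (shown for reference; these are Mach-specific and will not compile on other systems):

```c
#include <mach/mach_types.h>
#include <mach/thread_policy.h>

kern_return_t thread_policy_set(thread_act_t            thread,
                                thread_policy_flavor_t  flavor,
                                thread_policy_t         policy_info,
                                mach_msg_type_number_t  count);

kern_return_t thread_policy_get(thread_act_t            thread,
                                thread_policy_flavor_t  flavor,
                                thread_policy_t         policy_info,
                                mach_msg_type_number_t *count,
                                boolean_t              *get_default);
```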
The parameters of these functions are roughly the same, except that the thread_policy_get function takes pointers for the count and the get_default arguments. The count is an inout parameter, meaning that it is interpreted as the maximum amount of storage (in units of int32_t) that the calling task has allocated for the return, but it is also overwritten by the scheduler to indicate the amount of data that was actually returned.
These functions get and set several parameters, according to the thread policy chosen. The possible thread policies are listed in Table 10-2.
Policy | Meaning
---|---
THREAD_STANDARD_POLICY | Default value.
THREAD_TIME_CONSTRAINT_POLICY | Used to specify real-time behavior.
THREAD_PRECEDENCE_POLICY | Used to indicate the importance of computation relative to other threads in a given task.
The following code snippet shows how to set the priority of a thread to tell the scheduler that it needs real-time performance. The example values provided in comments are based on the estimated needs of esd (the Esound daemon).
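A sketch of such a snippet, assuming OS X headers; the helper name set_realtime is ours, and the values in comments are the illustrative esd-derived figures discussed next:

```c
#include <mach/mach_init.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#include <stdio.h>

/* Ask the scheduler for time-constraint (real-time) scheduling for the
 * calling thread. period, computation, and constraint are in Mach
 * absolute time units. Returns 1 on success, 0 on failure. */
int set_realtime(int period, int computation, int constraint)
{
    struct thread_time_constraint_policy ttcpolicy;
    thread_port_t threadport = pthread_mach_thread_np(pthread_self());

    ttcpolicy.period      = period;      /* e.g. HZ/160  */
    ttcpolicy.computation = computation; /* e.g. HZ/3300 */
    ttcpolicy.constraint  = constraint;  /* e.g. HZ/2200 */
    ttcpolicy.preemptible = 1;           /* cycles need not be contiguous */

    if (thread_policy_set(threadport,
            THREAD_TIME_CONSTRAINT_POLICY,
            (thread_policy_t)&ttcpolicy,
            THREAD_TIME_CONSTRAINT_POLICY_COUNT) != KERN_SUCCESS) {
        fprintf(stderr, "set_realtime() failed.\n");
        return 0;
    }
    return 1;
}
```

Calling set_realtime(HZ/160, HZ/3300, HZ/2200) produces the example request analyzed in the following paragraphs.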
The time values are in terms of Mach absolute time units. Since these values differ on different CPUs, you should generally use numbers relative to HZ (a global variable in the kernel that contains the current number of ticks per second). You can either handle this conversion yourself by dividing this value by an appropriate quantity or use the conversion routines described in Using Kernel Time Abstractions.

Say your computer reports 133 million for the value of HZ. If you pass the example values given as arguments to this function, your thread tells the scheduler that it needs approximately 40,000 (HZ/3300) out of the next 833,333 (HZ/160) bus cycles. The preemptible value (1) indicates that those 40,000 bus cycles need not be contiguous. However, the constraint value (HZ/2200) tells the scheduler that there can be no more than 60,000 bus cycles between the start of computation and the end of computation.
Note: Because the constraint sets a maximum bound for computation, it must be larger than the value for computation.
A straightforward example using this API is code that displays video directly to the framebuffer hardware. It needs to run for a certain number of cycles every frame to get the new data into the frame buffer. It can be interrupted without worry, but if it is interrupted for too long, the video hardware starts displaying an outdated frame before the software writes the updated data, resulting in a nasty glitch. Audio has similar behavior, but since it is usually buffered along the way (in hardware and in software), there is greater tolerance for variations in timing, to a point.
Another policy call is THREAD_PRECEDENCE_POLICY. This is used for setting the relative importance of non-real-time threads. Its calling convention is similar, except that its structure is thread_precedence_policy, which contains only one field, an integer_t called importance. While this is a signed 32-bit value, the minimum legal value is zero (IDLE_PRI). Threads set to IDLE_PRI will execute only when no other thread is scheduled to execute.
In general, larger values indicate higher priority. The maximum limit is subject to change, as are the priority bands, some of which have special purposes (such as real-time threads). Thus, in general, you should use pthreads APIs to achieve this functionality rather than using this policy directly unless you are setting up an idle thread.
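A sketch of a THREAD_PRECEDENCE_POLICY call, again assuming OS X headers (the helper name set_my_thread_importance is ours):

```c
#include <mach/mach_init.h>
#include <mach/thread_policy.h>
#include <pthread.h>

/* Set the relative importance of the calling (non-real-time) thread
 * within its task. importance may be 0 (IDLE_PRI) or a positive value. */
kern_return_t set_my_thread_importance(integer_t importance)
{
    struct thread_precedence_policy precedence;
    thread_port_t threadport = pthread_mach_thread_np(pthread_self());

    precedence.importance = importance;
    return thread_policy_set(threadport,
               THREAD_PRECEDENCE_POLICY,
               (thread_policy_t)&precedence,
               THREAD_PRECEDENCE_POLICY_COUNT);
}
```

Passing an importance of 0 sets up an idle thread, as described above.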
Using the Mach Task API to Influence Scheduling
This relatively simple API is not particularly useful for most developers. However, it may be beneficial if you are developing a graphical user interface for Darwin. It also provides some insight into the prioritization of tasks in OS X. It is presented here for completeness.
The API consists of two functions, task_policy_set and task_policy_get.

As with thread_policy_set and thread_policy_get, the parameters are similar, except that the task_policy_get function takes pointers for the count and the get_default arguments. The count argument is an inout parameter. It is interpreted as the maximum amount of storage that the calling task has allocated for the return, but it is also overwritten by the scheduler to indicate the amount of data that was actually returned.
These functions get and set a single parameter, that of the role of a given task, which changes the way the task's priority gets altered over time. The possible roles of a task are listed in Table 10-3.
Role | Meaning
---|---
TASK_UNSPECIFIED | Default value.
TASK_RENICED | This is set when a process is executed with nice or renice.
TASK_FOREGROUND_APPLICATION | GUI application in the foreground. There can be more than one foreground application.
TASK_BACKGROUND_APPLICATION | GUI application in the background.
TASK_CONTROL_APPLICATION | Reserved for the dock or equivalent (assigned FCFS).
TASK_GRAPHICS_SERVER | Reserved for the window server or equivalent.
The following code snippet shows how to set the priority of a task to tell the scheduler that it is a foreground application (regardless of whether it really is).
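A sketch, assuming OS X headers (the helper name set_task_foreground is ours; the TASK_CATEGORY_POLICY flavor and task_category_policy_data_t structure come from <mach/task_policy.h>):

```c
#include <mach/mach_init.h>
#include <mach/task_policy.h>

/* Tell the scheduler that the current task is a foreground GUI
 * application, regardless of whether it really is. */
kern_return_t set_task_foreground(void)
{
    task_category_policy_data_t info;

    info.role = TASK_FOREGROUND_APPLICATION;
    return task_policy_set(mach_task_self(),
               TASK_CATEGORY_POLICY,
               (task_policy_t)&info,
               TASK_CATEGORY_POLICY_COUNT);
}
```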
Kernel Thread APIs
The OS X scheduler provides a number of public APIs. While many of these APIs should not be used, the APIs to create, destroy, and alter kernel threads are of particular importance. While not technically part of the scheduler itself, they are inextricably tied to it.
The scheduler directly provides certain services that are commonly associated with the use of kernel threads, without which kernel threads would be of limited utility. For example, the scheduler provides support for wait queues, which are used in various synchronization primitives such as mutex locks and semaphores.
Creating and Destroying Kernel Threads
The recommended interface for creating threads within the kernel is through the I/O Kit. It provides IOCreateThread, IOThreadSelf, and IOExitThread functions that make it relatively painless to create threads in the kernel.
The basic functions for creating and terminating kernel threads are:
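As declared in <IOKit/IOLib.h> (shown for reference; IOThreadFunc is a pointer to a function taking a single void * argument):

```c
#include <IOKit/IOLib.h>

IOThread IOCreateThread(IOThreadFunc function, void *argument);
IOThread IOThreadSelf(void);
void     IOExitThread(void);
```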
With the exception of IOCreateThread (which is a bit more complex), the I/O Kit functions are fairly thin wrappers around Mach thread functions. The types involved are also very thin abstractions. IOThread is really the same as thread_t.

The IOCreateThread function creates a new thread that immediately begins executing the function that you specify. It passes a single argument to that function. If you need to pass more than one argument, you should dynamically allocate a data structure and pass a pointer to that structure.
For example, the following code creates a kernel thread and executes the function myfunc in that thread:
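A sketch, assuming kernel context and I/O Kit headers (myfunc and myarg are placeholders, as in the text):

```c
#include <IOKit/IOLib.h>

/* Thread entry point: receives the single argument that was passed
 * to IOCreateThread. */
static void myfunc(void *myarg)
{
    /* ... do the thread's work here ... */
    IOExitThread();
}

static IOThread start_mythread(void *myarg)
{
    /* The new thread begins executing myfunc(myarg) immediately. */
    return IOCreateThread(myfunc, myarg);
}
```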
One other useful function is thread_terminate. This can be used to destroy an arbitrary thread (except, of course, the currently running thread). This can be extremely dangerous if not done correctly. Before tearing down a thread with thread_terminate, you should lock the thread and disable any outstanding timers against it. If you fail to deactivate a timer, a kernel panic will occur when the timer expires.
With that in mind, you may be able to terminate a thread as follows:
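A sketch (thread is assumed to be a thread you have already locked and whose outstanding timers you have disabled, as described above):

```c
/* thread: a thread_t that has been locked and whose outstanding
 * timers have been disabled. */
thread_terminate(getact_thread(thread));
```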
Here, thread is of type thread_t. In general, you can only be assured that you can kill yourself, not other threads in the system. The function thread_terminate takes a single parameter of type thread_act_t (a thread activation). The function getact_thread takes a thread shuttle (thread_shuttle_t) or thread_t and returns the thread activation associated with it.
SPL and Friends

BSD-based and Mach-based operating systems contain legacy functions designed for basic single-processor synchronization. These include functions such as splhigh, splbio, splx, and other similar functions. Because these functions are not sufficient for synchronization on multiprocessor (SMP) systems, they are of little use as synchronization tools in OS X.

If you are porting legacy code from earlier Mach-based or BSD-based operating systems, you must find an alternate means of providing synchronization. In many cases, this is as simple as taking the kernel or network funnel. In parts of the kernel, the use of spl functions does nothing, but causes no harm if you are holding a funnel (and results in a panic if you are not). In other parts of the kernel, spl macros are actually used. Because spl cannot necessarily be used for its intended purpose, it should not be used in general unless you are writing code in a part of the kernel that already uses it. You should instead use alternate synchronization primitives such as those described in Synchronization Primitives.
Wait Queues and Wait Primitives
The wait queue API is used extensively by the scheduler and is closely tied to the scheduler in its implementation. It is also used extensively in locks, semaphores, and other synchronization primitives. The wait queue API is both powerful and flexible, and as a result is somewhat large. Not all of the API is exported outside the scheduler, and parts are not useful outside the context of the wait queue functions themselves. This section documents only the public API.
The wait queue API includes the following functions:
Most of the functions and their arguments are straightforward and are not presented in detail. However, a few require special attention.
Most of the functions take an event_t as an argument. These can be arbitrary 32-bit values, which leads to the potential for conflicting events on certain wait queues. The traditional way to avoid this problem is to use the address of a data object that is somehow related to the code in question as that 32-bit integer value.
For example, if you are waiting for an event that indicates that a new block of data has been added to a ring buffer, and if that ring buffer's head pointer was called rb_head, you might pass the value &rb_head as the event ID. Because wait queue usage does not generally cross address space boundaries, this is generally sufficient to avoid any event ID conflicts.

Notice the functions ending in _locked. These functions require that your thread be holding a lock on the wait queue before they are called. Functions ending in _locked are equivalent to their nonlocked counterparts (where applicable) except that they do not lock the queue on entry and may not unlock the queue on exit (depending on the value of unlock). The remainder of this section does not differentiate between locked and unlocked functions.

The wait_queue_alloc and wait_queue_init functions take a policy parameter, which can be one of the following:

- SYNC_POLICY_FIFO: first-in, first-out
- SYNC_POLICY_FIXED_PRIORITY: policy based on thread priority
- SYNC_POLICY_PREPOST: keep track of the number of wakeups where no thread was waiting, and allow threads to immediately continue executing without waiting until that count reaches zero. This is frequently used when implementing semaphores.

You should not use the wait_queue_init function outside the scheduler. Because a wait queue is an opaque object outside that context, you cannot determine the appropriate size for allocation. Thus, because the size could change in the future, you should always use wait_queue_alloc and wait_queue_free unless you are writing code within the scheduler itself.
Similarly, the functions wait_queue_member, wait_queue_member_locked, wait_queue_link, wait_queue_unlink, and wait_queue_unlink_one are operations on subordinate queues, which are not exported outside the scheduler.

The function wait_queue_member determines whether a subordinate queue is a member of a queue.

The functions wait_queue_link and wait_queue_unlink link and unlink a given subordinate queue from its parent queue, respectively.

The function wait_queue_unlink_one unlinks the first subordinate queue in a given parent and returns it.

The function wait_queue_assert_wait causes the calling thread to wait on the wait queue until it is either interrupted (by a thread timer, for example) or explicitly awakened by another thread. The interruptible flag indicates whether this function should allow an asynchronous event to interrupt waiting.

The function wait_queue_wakeup_all wakes up all threads waiting on a given queue for a particular event.

The function wait_queue_peek_locked returns the first thread from a given wait queue that is waiting on a given event. It does not remove the thread from the queue, nor does it wake the thread. It also returns the wait queue where the thread was found. If the thread is found in a subordinate queue, other subordinate queues are unlocked, as is the parent queue. Only the queue where the thread was found remains locked.

The function wait_queue_pull_thread_locked pulls a thread from the wait queue and optionally unlocks the queue. This is generally used with the result of a previous call to wait_queue_peek_locked.

The function wait_queue_wakeup_identity_locked wakes up the first thread that is waiting for a given event on a given wait queue and starts it running, but leaves the thread locked. It then returns a pointer to the thread. This can be used to wake the first thread in a queue and then modify unrelated structures based on which thread was actually awakened before allowing the thread to execute.

The function wait_queue_wakeup_one wakes up the first thread that is waiting for a given event on a given wait queue.

The function wait_queue_wakeup_thread wakes up a given thread if and only if it is waiting on the specified event and wait queue (or one of its subordinates).

The function wait_queue_remove wakes a given thread without regard to the wait queue or event on which it is waiting.
Copyright © 2002, 2013 Apple Inc. All Rights Reserved. Terms of Use | Privacy Policy | Updated: 2013-08-08
Contributors: Meeta Mistry, Radhika Khetani
Approximate time: 80 minutes
Learning Objectives
- Describe the different components of the MACS2 peak calling algorithm
- Describe the parameters involved in running MACS2
- List and describe the output files from MACS2
Peak calling, the next step in our workflow, is a computational method used to identify areas in the genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing experiment.
For ChIP-seq experiments, what we observe from the alignment files is a strand asymmetry with read densities on the +/- strand, centered around the binding site. The 5' ends of the selected fragments will form groups on the positive- and negative-strand. The distributions of these groups are then assessed using statistical measures and compared against background (input or mock IP samples) to determine if the site of enrichment is likely to be a real binding site.
Image source: Wilbanks and Facciotti, PLoS One 2010

There are various tools available for peak calling. One of the more commonly used peak callers is MACS2, and we will demonstrate it in this session. Note that in this session the terms ‘tag' and sequence ‘read' are used interchangeably.

NOTE: Our dataset is investigating two transcription factors, and so our focus is on identifying short degenerate sequences that present as punctate binding sites. ChIP-seq analysis algorithms are specialized in identifying one of two types of enrichment (or have specific methods for each): broad peaks or broad domains (i.e. histone modifications that cover entire gene bodies) or narrow peaks (i.e. a transcription factor binding). Narrow peaks are easier to detect, as we are looking for regions that have higher amplitude and are easier to distinguish from the background, compared to broad or dispersed marks. There are also ‘mixed' binding profiles which can be hard for algorithms to discern. An example of this is the binding behavior of PolII, which binds at the promoter and across the length of the gene, resulting in mixed signals (narrow and broad).
MACS2
A commonly used tool for identifying transcription factor binding sites is named Model-based Analysis of ChIP-seq (MACS). The MACS algorithm captures the influence of genome complexity to evaluate the significance of enriched ChIP regions. Although it was developed for the detection of transcription factor binding sites it is also suited for larger regions.
MACS improves the spatial resolution of binding sites through combining the information of both sequencing tag position and orientation. MACS can be easily used either for the ChIP sample alone, or along with a control sample which increases specificity of the peak calls. The MACS workflow is depicted below. In this lesson, we will describe the steps in more detail.
Removing redundancy
MACS provides different options for dealing with duplicate tags at the exact same location, that is, tags with the same coordinates and the same strand. The default is to keep a single read at each location. The auto option, which is very commonly used, tells MACS to calculate the maximum number of tags at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff. An alternative is the all option, which keeps every tag. If an integer is specified, then at most that many tags will be kept at the same location. This redundancy handling is applied consistently to both the ChIP and input samples.
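As an illustration, the option that selects this behavior in macs2 callpeak is --keep-dup (the file and sample names below are placeholders):

```shell
# Keep the number of duplicates expected by chance (binomial test, p < 1e-5):
macs2 callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample1 --keep-dup auto

# Or keep every tag:
macs2 callpeak -t chip.bam -c input.bam -f BAM -g hs -n sample1 --keep-dup all
```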
Why worry about duplicates? Reads with the same start position are considered duplicates. These duplicates can arise from experimental artifacts, but can also contribute to genuine ChIP-signal.

- The bad kind of duplicates: If initial starting material is low, this can lead to overamplification of this material before sequencing. Any biases in PCR will compound this problem and can lead to artificially enriched regions. Blacklisted (repeat) regions with ultra-high signal will also be high in duplicates. Masking these regions prior to analysis can help remove this problem.
- The good kind of duplicates: You can expect some biological duplicates with ChIP-seq, since you are only sequencing a small part of the genome. This number can increase if your depth of coverage is excessive or if your protein only binds to a few sites. If there is a good proportion of biological duplicates, removal can lead to an underestimation of the ChIP signal.

The take-home:

- Consider your enrichment efficiency and sequencing depth. Try to inspect your non-deduplicated data in a genome browser: bona fide peaks will have multiple overlapping reads with offsets, while samples with only PCR duplicates will stack up perfectly without offsets. A possible solution for distinguishing biological duplicates from PCR artifacts would be to include UMIs in your experimental setup.
- Retain duplicates for differential binding analysis.
- If you are expecting binding in repetitive regions, use paired-end sequencing and keep duplicates.
Otherwise, best practice is to remove duplicates prior to peak calling.
Modeling the shift size
The tag density around a true binding site should show a bimodal enrichment pattern (or paired peaks). MACS takes advantage of this bimodal pattern to empirically model the shifting size to better locate the precise binding sites.
To find paired peaks to build the model, MACS first scans the whole dataset searching for highly significant enriched regions. This is done using only the ChIP sample! Given a sonication size (bandwidth) and a high-confidence fold-enrichment (mfold), MACS slides two bandwidth windows across the genome to find regions with tags more than mfold enriched relative to a random tag genome distribution.
MACS randomly samples 1,000 of these high-quality peaks, separates their positive and negative strand tags, and aligns them by the midpoint between their centers. The distance between the modes of the two peaks in the alignment is defined as ‘d' and represents the estimated fragment length. MACS shifts all the tags by d/2 toward the 3' ends to the most likely protein-DNA interaction sites.
Scaling libraries
For experiments in which sequence depth differs between input and treatment samples, MACS linearly scales the total control tag count to be the same as the total ChIP tag count. The default behaviour is for the larger sample to be scaled down.
Effective genome length
To calculate λBG from the tag count, MACS2 requires the effective genome size, or the size of the genome that is mappable. Mappability is related to the uniqueness of the k-mers at a particular position in the genome. Low-complexity and repetitive regions have low uniqueness, which means low mappability. Therefore, we need to provide the effective genome length to correct for the loss of true signal in low-mappability regions.

How do I obtain the effective genome length?

The MACS2 software has some pre-computed values for commonly used organisms (human, mouse, worm and fly). If you wanted, you could compute a more accurate value based on your organism and build. The deepTools docs have additional pre-computed values for more recent builds, but also some good materials on how to go about computing it.
Peak detection
After MACS shifts every tag by d/2, it then slides across the genome using a window size of 2d to find candidate peaks. The tag distribution along the genome can be modeled by a Poisson distribution. The Poisson is a one parameter model, where the parameter λ is the expected number of reads in that window.
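The Poisson model above can be made concrete with a small calculation: the p-value of a candidate window is the probability of seeing at least the observed tag count under a Poisson distribution with mean λ. A sketch in Python (the function name poisson_sf is ours):

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): the chance of observing at least
    k tags in a window whose expected (background) count is lam."""
    # Accumulate P(X <= k-1) term by term, computing each P(X = i)
    # iteratively to avoid large factorials.
    p = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += p
        p *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

# A window with 30 tags where the background predicts 5 is far more
# surprising than one with 8 tags:
print(poisson_sf(30, 5.0))  # very small p-value
print(poisson_sf(8, 5.0))   # more modest
```

Note that MACS itself refines this by using a dynamic local λ (the maximum over several window sizes around the candidate peak) rather than a single genome-wide background rate.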
Peak calling, the next step in our workflow, is a computational method used to identify areas in the genome that have been enriched with aligned reads as a consequence of performing a ChIP-sequencing experiment.
For ChIP-seq experiments, what we observe from the alignment files is a strand asymmetry with read densities on the +/- strand, centered around the binding site. The 5' ends of the selected fragments will form groups on the positive- and negative-strand. The distributions of these groups are then assessed using statistical measures and compared against background (input or mock IP samples) to determine if the site of enrichment is likely to be a real binding site.
Image source: Wilbanks and Faccioti, PLoS One 2010
There are various tools that are available for peak calling. One of the more commonly used peak callers is MACS2, and we will demonstrate it in this session. Note that in this Session the term ‘tag' and sequence ‘read' are used interchangeably.
NOTE: Our dataset is investigating two transcription factors and so our focus is on identifying short degenerate sequences that present as punctate binding sites. ChIP-seq analysis algorithms are specialized in identifying one of two types of enrichment (or have specific methods for each): broad peaks or broad domains (i.e. histone modifications that cover entire gene bodies) or narrow peaks (i.e. a transcription factor binding). Narrow peaks are easier to detect as we are looking for regions that have higher amplitude and are easier to distinguish from the background, compared to broad or dispersed marks. There are also ‘mixed' binding profiles which can be hard for algorithms to discern. An example of this is the binding properties of PolII, which binds at promotor and across the length of the gene resulting in mixed signals (narrow and broad).
MACS2
A commonly used tool for identifying transcription factor binding sites is named Model-based Analysis of ChIP-seq (MACS). The MACS algorithm captures the influence of genome complexity to evaluate the significance of enriched ChIP regions. Although it was developed for the detection of transcription factor binding sites it is also suited for larger regions.
MACS improves the spatial resolution of binding sites through combining the information of both sequencing tag position and orientation. MACS can be easily used either for the ChIP sample alone, or along with a control sample which increases specificity of the peak calls. The MACS workflow is depicted below. In this lesson, we will describe the steps in more detail.
Removing redundancy
MACS provides different options for dealing with duplicate tags at the exact same location, that is, tags with the same coordinates on the same strand. The default is to keep a single read at each location. The `auto` option, which is very commonly used, tells MACS to calculate the maximum number of tags allowed at the exact same location based on a binomial distribution, using 1e-5 as the p-value cutoff. An alternative is the `all` option, which keeps every tag. If an integer is specified, then at most that many tags will be kept at the same location. The chosen setting is applied consistently to both the ChIP and input samples.
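As a sketch of how these settings map onto the command line (file and prefix names below are hypothetical placeholders), the de-duplication behaviour is controlled by the `--keep-dup` flag of `macs2 callpeak`:

```shell
# Illustrative only; 'sample.bam' is a placeholder file name
macs2 callpeak -t sample.bam -n dedup_default --keep-dup 1     # default: one tag per location
macs2 callpeak -t sample.bam -n dedup_auto    --keep-dup auto  # binomial test, p-value cutoff 1e-5
macs2 callpeak -t sample.bam -n dedup_all     --keep-dup all   # retain every tag
```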
Why worry about duplicates?
Reads with the same start position are considered duplicates. These duplicates can arise from experimental artifacts, but can also contribute to genuine ChIP-signal.
- The bad kind of duplicates: If the initial starting material is low, this can lead to overamplification of this material before sequencing. Any biases in PCR will compound this problem and can lead to artificially enriched regions. Blacklisted (repeat) regions with ultra-high signal will also be high in duplicates. Masking these regions prior to analysis can help remove this problem.
- The good kind of duplicates: You can expect some biological duplicates with ChIP-seq since you are only sequencing a small part of the genome. This number can increase if your depth of coverage is excessive or if your protein binds to only a few sites. If a good proportion of the duplicates are biological, their removal can lead to an underestimation of the ChIP signal.
The take-home:
- Consider your enrichment efficiency and sequencing depth. Inspect your non-deduplicated data in a genome browser to discriminate between the two: bona fide peaks will have multiple overlapping reads with offsets, while samples with only PCR duplicates will stack up perfectly without offsets. A possible way to distinguish biological duplicates from PCR artifacts is to include UMIs in your experimental setup.
- Retain duplicates for differential binding analysis.
- If you are expecting binding in repetitive regions, use paired-end sequencing and keep duplicates.
Otherwise, best practice is to remove duplicates prior to peak calling.
Modeling the shift size
The tag density around a true binding site should show a bimodal enrichment pattern (or paired peaks). MACS takes advantage of this bimodal pattern to empirically model the shifting size to better locate the precise binding sites.
To find paired peaks to build the model, MACS first scans the whole dataset searching for highly significant enriched regions. This is done using only the ChIP sample! Given a sonication size (`bandwidth`) and a high-confidence fold-enrichment (`mfold`), MACS slides two `bandwidth` windows across the genome to find regions with tags more than `mfold` enriched relative to a random tag genome distribution.
MACS randomly samples 1,000 of these high-quality peaks, separates their positive and negative strand tags, and aligns them by the midpoint between their centers. The distance between the modes of the two peaks in the alignment is defined as ‘d' and represents the estimated fragment length. MACS shifts all the tags by d/2 toward the 3' ends to the most likely protein-DNA interaction sites.
Scaling libraries
For experiments in which sequence depth differs between input and treatment samples, MACS linearly scales the total control tag count to be the same as the total ChIP tag count. The default behaviour is for the larger sample to be scaled down.
Effective genome length
To calculate λBG from the tag count, MACS2 requires the effective genome size, i.e. the size of the genome that is mappable. Mappability is related to the uniqueness of the k-mers at a particular position in the genome. Low-complexity and repetitive regions have low uniqueness, which means low mappability. Therefore we need to provide the effective genome length to correct for the loss of true signal in low-mappability regions.
How do I obtain the effective genome length?
The MACS2 software has some pre-computed values for commonly used organisms (human, mouse, worm and fly). If you wanted, you could compute a more accurate value based on your organism and genome build. The deepTools docs have additional pre-computed values for more recent builds, along with some good material on how to go about computing it.
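For instance, MACS2 accepts either a shortcut species flag or an explicit number for `-g` (the shortcut values below are the ones listed in the MACS2 help text; `sample.bam` and `input.bam` are placeholder names):

```shell
# Shortcut effective genome sizes built into MACS2:
#   hs = 2.7e9 (human), mm = 1.87e9 (mouse), ce = 9e7 (worm), dm = 1.2e8 (fly)
macs2 callpeak -t sample.bam -c input.bam -g hs -n human_run

# For any other organism or genome build, pass the number directly:
macs2 callpeak -t sample.bam -c input.bam -g 1.3e+8 -n custom_run
```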
Peak detection
After MACS shifts every tag by d/2, it then slides across the genome using a window size of 2d to find candidate peaks. The tag distribution along the genome can be modeled by a Poisson distribution. The Poisson is a one parameter model, where the parameter λ is the expected number of reads in that window.
Instead of using a uniform λ estimated from the whole genome, MACS uses a dynamic parameter, λlocal, defined for each candidate peak. The lambda parameter is estimated from the control sample and is deduced by taking the maximum value across various window sizes:
λlocal = max(λBG, λ1k, λ5k, λ10k).
In this way lambda captures the influence of local biases, and is robust against occasional low tag counts at small local regions. Possible sources for these biases include local chromatin structure, DNA amplification and sequencing bias, and genome copy number variation.
A region is considered to have a significant tag enrichment if the p-value < 1e-5 (this default can be changed). This is a Poisson distribution p-value based on λ.
Overlapping enriched peaks are merged, and each tag position is extended ‘d' bases from its center. The location in the peak with the highest fragment pileup, hereafter referred to as the summit, is predicted as the precise binding location. The ratio between the ChIP-seq tag count and λlocal is reported as the fold enrichment.
Estimation of false discovery rate
Each peak is considered an independent test and thus, when we encounter thousands of significant peaks detected in a sample we have a multiple testing problem. In MACSv1.4, the FDR was determined empirically by exchanging the ChIP and control samples. However, in MACS2, p-values are now corrected for multiple comparison using the Benjamini-Hochberg correction.
Running MACS2
We will be using the newest version of this tool, MACS2. The underlying algorithm for peak calling remains the same as before, but it comes with some enhancements in functionality.
Setting up
To run MACS2, we will first start an interactive session using 1 core (do this only if you don't already have one) and load the macs2 module:
We will also need to create a directory for the output generated from MACS2:
Now change directories to the `results` folder:
Since we only created a filtered BAM file for a single sample, we will need to copy over the BAM files for all 6 samples. We have created these for you, and you can copy them over using the command below:
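A minimal sketch of this setup, assuming a workshop-style directory layout (all paths here are assumptions; substitute your own):

```shell
# Create the output directory for MACS2 results and move into the results folder
mkdir -p ~/chipseq/results/macs2
cd ~/chipseq/results

# Copy over the pre-made filtered BAM files (source path is hypothetical)
cp /n/groups/hbctraining/chip-seq/bowtie2/*aln.bam .
```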
MACS2 parameters
There are seven major functions available in MACS2 serving as sub-commands. We will only cover `callpeak` in this lesson, but you can use `macs2 COMMAND -h` to find out more, if you are interested.
`callpeak` is the main function in MACS2 and can be invoked by typing `macs2 callpeak`. If you type this command without parameters, you will see a full description of commandline options. Here is a shorter list of the commonly used ones:
Input file options
- `-t`: The IP data file (this is the only REQUIRED parameter for MACS)
- `-c`: The control or mock data file
- `-f`: format of input file; default is 'AUTO', which will allow MACS to decide the format automatically
- `-g`: mappable genome size, which is defined as the genome size which can be sequenced; some precompiled values are provided
Output arguments
- `--outdir`: MACS2 will save all output files into the specified folder
- `-n`: The prefix string for output files
- `-B/--bdg`: store the fragment pileup, control lambda, -log10(p-value) and -log10(q-value) scores in bedGraph files
Shifting model arguments
- `-s`: size of sequencing tags; by default, MACS will use the first 10 sequences from your input treatment file to determine it
- `--bw`: the bandwidth used to scan the genome ONLY for model building; can be set to the expected sonication fragment size
- `--mfold`: upper and lower limits for model building
Peak calling arguments
- `-q`: q-value (minimum FDR) cutoff
- `-p`: p-value cutoff (instead of q-value cutoff)
- `--nolambda`: do not consider the local bias/lambda at peak candidate regions
- `--broad`: broad peak calling
NOTE: Relaxing the q-value does not behave as expected in this case since it is partially tied to peak widths. Ideally, if you relaxed the thresholds, you would simply get more peaks but with MACS2 relaxing thresholds also results in wider peaks.
Now that we have a feel for the different ways we can tweak our command, let's set up the command for our run on Nanog-rep1:
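A hedged sketch of that command, assuming chr12-subset BAM files named as below (the file names and the effective genome size are assumptions for this reduced dataset):

```shell
# Call peaks on the Nanog replicate 1 IP against its matched input control
macs2 callpeak -t H1hesc_Nanog_Rep1_aln.bam \
    -c H1hesc_Input_Rep1_aln.bam \
    -f BAM -g 1.3e+8 \
    -n Nanog-rep1 \
    --outdir macs2
```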
The tool is quite verbose, so you should see lines of text being printed to the terminal describing each step that is being carried out. If that runs successfully, go ahead and re-run the same command, but this time let's capture that information into a log file using `2>` to redirect the standard error to a file:
Ok, now let's do the same peak calling for the rest of our samples:
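One way to sketch this for the remaining samples is a simple loop (the sample and input file names are assumptions), capturing standard error to a log file as above:

```shell
# Call peaks for each remaining sample; adjust names and matched inputs to your data
for SAMPLE in Nanog-rep2 Pou5f1-rep1 Pou5f1-rep2
do
    macs2 callpeak -t ${SAMPLE}_aln.bam \
        -c Input_aln.bam \
        -f BAM -g 1.3e+8 \
        -n ${SAMPLE} --outdir macs2 2> macs2/${SAMPLE}-macs2.log
done
```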
MACS2 Output files
File formats
Before we start exploring the output of MACS2, we'll briefly talk about the new file formats you will encounter.
narrowPeak:
A narrowPeak (.narrowPeak) file is used by the ENCODE project to provide called peaks of signal enrichment based on pooled, normalized (interpreted) data. It is a BED 6+4 format, which means the first 6 columns of a standard BED file with 4 additional fields:
WIG format:
Wiggle format (WIG) allows the display of continuous-valued data in a track format. Wiggle format is line-oriented. It is composed of declaration lines and data lines, and requires a separate wiggle track definition line. There are two options for formatting wiggle data: variableStep and fixedStep. These formats were developed to allow the file to be written as compactly as possible.
BedGraph format:
The BedGraph format also allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data. This track type is similar to the wiggle (WIG) format, but unlike the wiggle format, data exported in the bedGraph format are preserved in their original state. For the purposes of visualization, these can be interchangeable.
MACS2 output files
Let's first move the log files to the `log` directory:
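Assuming the logs were written as `*-macs2.log` inside the MACS2 results directory and the log directory sits two levels up (the relative paths are assumptions about the project layout), this could look like:

```shell
# Move the MACS2 run logs out of the results directory into the log directory
mv *-macs2.log ../../logs/
```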
Now, there should be 6 files output to the results directory for each of the 4 samples, so a total of 24 files:
- `_peaks.narrowPeak`: BED6+4 format file which contains the peak locations together with peak summit, p-value and q-value
- `_peaks.xls`: a tabular file which contains information about called peaks. Additional information includes pileup and fold enrichment
- `_summits.bed`: peak summit locations for every peak. To find the motifs at the binding sites, this file is recommended
- `_model.R`: an R script which you can use to produce a PDF image of the model based on your data and the cross-correlation plot
- `_control_lambda.bdg`: bedGraph format for input sample
- `_treat_pileup.bdg`: bedGraph format for treatment sample
Let's first obtain a summary of how many peaks were called in each sample. We can do this by counting the lines in the `.narrowPeak` files:
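Since each called peak occupies one line, `wc -l` gives a quick per-sample summary (the file location pattern is an assumption):

```shell
# One line per peak, so the line count is the peak count for each sample
wc -l macs2/*_peaks.narrowPeak
```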
We can also generate plots using the R script file that was output by MACS2. There is a `_model.R` script in the directory. Let's load the R module and run the R script on the command line using the `Rscript` command as demonstrated below:
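A sketch of that step (the module names are cluster-specific assumptions; the script name follows the `-n` prefix used earlier):

```shell
# Load R and its compiler dependency, then render the model PDF
module load gcc/6.2.0 R/3.4.1
Rscript Nanog-rep1_model.R
```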
NOTE: We need to load the `gcc/6.2.0` module before loading R. You can find out which modules need to be loaded first by using `module spider R/3.4.1`.
Now you should see a PDF file in your current directory by the same name. Create the plots for each of the samples and move them over to your laptop using Filezilla.
Open up the pdf file for Nanog-rep1. The first plot illustrates the distance between the modes from which the shift size was determined.
The second plot is the cross-correlation plot. This is a graphical representation of the Pearson correlation of positive- and negative- strand tag densities, shifting the strands relative to each other by increasing distance. We will talk about this in more detail in the next lesson.
NOTE: SPP is another very commonly used tool for narrow peak calling. While we will not be going through the steps for this peak caller in this workshop, we do have a lesson on SPP that we encourage you to browse through if you are interested in learning more.
This lesson has been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.