Linux ABI
Motivation:
When I read ./linux/types.h, there is a comment saying:
aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.
What is an ABI:
It stands for Application Binary Interface.
ABI is important when it comes to applications that use external libraries. If a program is built to use a particular library and that library is later updated, you don't want to have to re-compile that application (and from the end-user's standpoint, you may not have the source).
If the updated library uses the same ABI, then your program will not need to change.
The interface to the library (which is all your program really cares about) is the same even though the internal workings may have changed. Two versions of a library that have the same ABI are sometimes called "binary-compatible" since they have the same low-level interface (you should be able to replace the old version with the new one and not have any major problems).
Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source-compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.
For this reason, library writers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If you expand, say, a 16-bit data structure field into a 32-bit field, then already-compiled code that uses that data structure will no longer access that field (or any field after it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation, and if the data structure changes, then these offsets no longer point to what the code expects them to point to, and the results are unpredictable at best.
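To make the offset problem concrete, here is a small illustration using Python's ctypes (the struct names are invented for the example). Widening one field shifts the offset of every field after it, so code compiled against the old layout reads the wrong bytes:

import ctypes

# Old layout: a 16-bit field followed by another member.
class OldHeader(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint16),
                ("size", ctypes.c_uint16)]

# New layout: "flags" widened to 32 bits.
class NewHeader(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint32),
                ("size", ctypes.c_uint16)]

print(OldHeader.size.offset)  # 2
print(NewHeader.size.offset)  # 4 -- code built for the old layout still reads offset 2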
That's why the comment says again:
aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.
Let's keep going.
ABI is not necessarily something you will explicitly provide unless you are expecting people to interface with your code using assembly. It is not language-specific either, since (for example) a C application and a Pascal application will use the same ABI after they are compiled.
Furthermore:
Discussing the ABI also means discussing the ELF file format, because the ELF format defines the interface between the operating system and the application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets.
This is how the application communicates important information about itself to the operating system. If you build a program in a different binary format, the operating system will not be able to interpret the binary file or run the application. This is one big reason why applications written for one operating system will not run on another: they need to be either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.
The whole comment:
aligned_u64 should be used in defining kernel<->userspace ABIs to avoid common 32/64-bit compat problems.
64-bit values align to 4-byte boundaries on x86_32 (and possibly other 32-bit architectures) and to 8-byte boundaries on 64-bit architectures. The new aligned_u64 type enforces 8-byte alignment, so that structs containing aligned_u64 values have the same layout on 32-bit and 64-bit architectures.
No conversions are then necessary between 32-bit user space and a 64-bit kernel.
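A quick way to see the problem is to mimic the two layouts with Python's ctypes (a sketch; the struct names are invented, and _pack_ = 4 approximates the i386 rule that 64-bit integers need only 4-byte alignment):

import ctypes

# Layout a 32-bit x86 compiler would produce: the u64 is aligned to 4 bytes.
class Msg32(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("type", ctypes.c_uint32),
                ("value", ctypes.c_uint64)]

# Layout a 64-bit compiler would produce: the u64 is aligned to 8 bytes,
# so 4 bytes of padding appear after "type".
class Msg64(ctypes.Structure):
    _fields_ = [("type", ctypes.c_uint32),
                ("value", ctypes.c_uint64)]

print(ctypes.sizeof(Msg32), Msg32.value.offset)  # 12 4
print(ctypes.sizeof(Msg64), Msg64.value.offset)  # 16 8 (on a 64-bit host)

With aligned_u64, both sides get the second layout, so a 32-bit process and a 64-bit kernel agree on every offset and no conversion is needed.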
Dynamic Memory Allocation - Jemalloc
What I want to know is not how to install it or how to use it; I just want to know how it works.
Deep Dive into Neutron
Neutron ML2
The Modular Layer 2 (ML2) plugin is a framework allowing OpenStack Networking to simultaneously utilize the variety of layer 2 networking technologies found in complex real-world data centers.
It currently works with the existing L2 agents:
- openvswitch
- linuxbridge
- hyperv
like:
[ Neutron Server {ML2 Plugin} ]
|_ [ Host A {linuxbridge agent} ]
|_ [ Host B{hyperv agent} ]
|_ [ Host C{openvswitch agent} ]
|_ [ Host D{openvswitch agent} ]
* Existing ML2 plugin works with existing agents
* Separate agents for linuxbridge, openvswitch, and hyperv.
ML2 is intended to replace and deprecate the monolithic plugins associated with those L2 agents, and eventually to combine the separate agents into a single modular agent,
like:
[ Neutron Server {ML2 Plugin} ]
|_ [ Host A {modular agent} ]
|_ [ Host B {modular agent} ]
|_ [ Host C {modular agent} ]
|_ [ Host D {modular agent} ]
* Combine open source agents
* Have a single agent which can support linuxbridge and openvswitch
* Pluggable drivers for additional vswitches, infiniband, sr-iov, etc
The ml2 framework is also intended to greatly simplify adding support for new L2 networking technologies, requiring much less initial and ongoing effort than would be required to add a new monolithic core plugin. A modular agent may be developed as a follow-on effort.
Architecture
ML2 Plugin {
    TypeDriver {
        local, flat, vlan, gre, vxlan
    },
    MechanismDriver {
        OpenvSwitch, Hyper-V, OpenDaylight, Arista, Cisco Nexus
    },
    ExtensionDriver {
    }
}
TypeDriver
Each available network type is managed by an ml2 TypeDriver. TypeDrivers maintain any needed type-specific network state, and perform provider network validation and tenant network allocation.
The ml2 plugin currently includes type drivers for the local, flat, vlan, gre, and vxlan network types. The TypeDriver interface:
// ./neutron/plugins/ml2/driver_api.py
38 class TypeDriver(object):
....
56 def get_type(self):
Get driver's network type.
....
64 def initialize(self):
Perform driver initialization
....
74 def is_partial_segment(self, segment):
Return True if segment is a partially specified segment
....
82 def validate_provider_segment(self, segment):
Validate attributes of a provider network segment
....
102 def reserve_provider_segment(self, session, segment):
Reserve resource associated with a provider network segment
....
117 def allocate_tenant_segment(self, session):
Allocate resource for a new tenant network segment
....
133 def release_segment(self, session, segment):
Release network segment
....
147 def get_mtu(self, physical):
Get driver's network MTU
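As a sketch of how this interface gets used, here is a toy type driver (hypothetical; not part of the Neutron tree) that implements the methods listed above for an imaginary 'noop' network type:

from neutron.plugins.ml2 import driver_api as api

class NoopTypeDriver(api.TypeDriver):
    def get_type(self):
        return 'noop'                      # the network type this driver manages

    def initialize(self):
        pass                               # read config, set up allocation state

    def is_partial_segment(self, segment):
        return False                       # no attributes left for ml2 to fill in

    def validate_provider_segment(self, segment):
        pass                               # raise InvalidInput on bad attributes

    def reserve_provider_segment(self, session, segment):
        return segment                     # mark the requested segment as in use

    def allocate_tenant_segment(self, session):
        return {api.NETWORK_TYPE: 'noop'}  # hand out a free segment, or None

    def release_segment(self, session, segment):
        pass                               # return the segment to the pool

    def get_mtu(self, physical):
        return 0                           # 0 means no MTU opinion

A real driver would be registered as a 'neutron.ml2.type_drivers' entry point so that TypeManager can load it by name.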
Mechanism Drivers
Each networking mechanism is managed by an ml2 MechanismDriver. The MechanismDriver is responsible for taking the information established by the TypeDriver and ensuring that it is properly applied given the specific networking mechanisms that have been enabled.
The MechanismDriver interface currently supports the:
- creation
- update
- deletion
of network and port resources. For every action that can be taken on a resource, the mechanism driver exposes two methods:
- ACTION_RESOURCE_precommit
- ACTION_RESOURCE_postcommit
The precommit method:
Used by mechanism drivers to validate the action being taken and make any required changes to the mechanism driver's private database.
Should not block, and therefore cannot communicate with anything outside of Neutron.
The postcommit method:
Is responsible for appropriately pushing the change to the resource to the entity responsible for applying that change.
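As an illustration, a minimal mechanism driver might look like this (a hypothetical sketch, not a real driver; only initialize() must be overridden, the other methods default to no-ops):

from neutron.plugins.ml2 import driver_api as api

class LoggerMechanismDriver(api.MechanismDriver):
    def initialize(self):
        pass

    def create_network_precommit(self, context):
        # Runs inside the DB transaction: validate, touch only our own
        # tables; raising here aborts the whole create operation.
        if not context.current['name']:
            raise Exception("network needs a name")

    def create_network_postcommit(self, context):
        # Runs after the transaction commits: now it is safe to call out
        # to the backend (agent, controller, switch) to apply the change.
        print("pushing network %s to backend" % context.current['id'])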
// ./neutron/plugins/ml2/driver_api.py
548 class MechanismDriver(object):
....
570 def initialize(self):
Perform driver initialization
....
579 def create_network_precommit(self, context):
Allocate resources for a new network
....
592 def create_network_postcommit(self, context):
Create a network
....
605 def update_network_precommit(self, context):
Update resources of a network
....
623 def update_network_postcommit(self, context):
Update a network
....
641 def delete_network_precommit(self, context):
Delete resources for a network
....
655 def delete_network_postcommit(self, context):
Delete a network
....
669 def create_subnet_precommit(self, context):
Allocate resources for a new subnet
....
682 def create_subnet_postcommit(self, context):
Create a subnet
....
695 def update_subnet_precommit(self, context):
Update resources of a subnet
....
713 def update_subnet_postcommit(self, context):
Update a subnet
....
731 def delete_subnet_precommit(self, context):
Delete resources for a subnet
....
745 def delete_subnet_postcommit(self, context):
Delete a subnet
....
759 def create_port_precommit(self, context):
Allocate resources for a new port
....
771 def create_port_postcommit(self, context):
Create a port
....
783 def update_port_precommit(self, context):
Update resources of a port
....
800 def update_port_postcommit(self, context):
Update a port
....
818 def delete_port_precommit(self, context):
Delete resources of a port
....
830 def delete_port_postcommit(self, context):
Delete a port
....
844 def bind_port(self, context):
Attempt to bind a port
....
882 def check_vlan_transparency(self, context):
Check if the network supports vlan transparency
ExtensionDriver
An extension driver extends the core resources implemented by the ML2 plugin with additional attributes.
// ./neutron/plugins/ml2/driver_api.py
893 class ExtensionDriver(object):
....
905 def initialize(self):
Perform driver initialization
....
915 def extension_alias(self):
Supported extension alias
....
923 def process_create_network(self, plugin_context, data, result):
Process extended attributes for create network
....
937 def process_create_subnet(self, plugin_context, data, result):
Process extended attributes for create subnet
....
951 def process_create_port(self, plugin_context, data, result):
Process extended attributes for create port
....
965 def process_update_network(self, plugin_context, data, result):
Process extended attributes for update network
....
979 def process_update_subnet(self, plugin_context, data, result):
Process extended attributes for update subnet
....
993 def process_update_port(self, plugin_context, data, result):
Process extended attributes for update port
....
1007 def extend_network_dict(self, session, base_model, result):
Add extended attributes to network dictionary
....
1021 def extend_subnet_dict(self, session, base_model, result):
Add extended attributes to subnet dictionary
....
1035 def extend_port_dict(self, session, base_model, result):
Add extended attributes to port dictionary
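For a feel of how an extension driver plugs in, here is a toy sketch (hypothetical) that tacks one extra attribute onto network resources:

from neutron.plugins.ml2 import driver_api as api

class ColorExtensionDriver(api.ExtensionDriver):
    def initialize(self):
        pass

    @property
    def extension_alias(self):
        return 'color'                     # alias of the API extension we implement

    def process_create_network(self, plugin_context, data, result):
        # Take the extended attribute from the API request ("data"),
        # persist it somewhere, and echo it back in the response ("result").
        result['color'] = data.get('color')

    def extend_network_dict(self, session, base_model, result):
        # Called whenever a network is fetched: add our attribute.
        result.setdefault('color', None)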
How Is the Plugin Loaded?
// ./neutron/plugins/ml2/managers.py
In the TypeManager class, each TypeDriver function from driver_api.py is called:
39 class TypeManager(stevedore.named.NamedExtensionManager):
53 self._check_tenant_network_types(cfg.CONF.ml2.tenant_network_types)
54 self._check_external_network_type(cfg.CONF.ml2.external_network_type)
58 network_type = ext.obj.get_type()
170 driver.obj.initialize()
204 return driver.obj.is_partial_segment(segment)
213 driver.obj.validate_provider_segment(segment)
221 return driver.obj.reserve_provider_segment(session, segment)
225 return driver.obj.allocate_tenant_segment(session)
249 driver.obj.release_segment(session, segment)
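The loading itself is done by stevedore: TypeManager (and MechanismManager below) subclass stevedore's NamedExtensionManager, which resolves the configured driver names to setuptools entry points and instantiates them. A rough sketch of that mechanism (the driver names here are just examples):

from stevedore import named

manager = named.NamedExtensionManager(
    namespace='neutron.ml2.type_drivers',  # entry-point group from setup.cfg
    names=['flat', 'vlan'],                # driver names from the ml2 config
    invoke_on_load=True)                   # instantiate each driver class

for ext in manager:
    print(ext.name, ext.obj)               # ext.obj is the driver instance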
In the MechanismManager class, each MechanismDriver function from driver_api.py is called:
284 class MechanismManager(stevedore.named.NamedExtensionManager):
318 driver.obj.initialize()
364 def create_network_precommit(self, context):
378 def create_network_postcommit(self, context):
392 def update_network_precommit(self, context):
405 def update_network_postcommit(self, context):
419 def delete_network_precommit(self, context):
490 def update_subnet_postcommit(self, context):
504 def delete_subnet_precommit(self, context):
517 def delete_subnet_postcommit(self, context):
535 def create_port_precommit(self, context):
562 def update_port_precommit(self, context):
589 def delete_port_precommit(self, context):
602 def delete_port_postcommit(self, context):
620 def bind_port(self, context):
Let's focus on the TypeManager class.
To be continued...
How Does a Ceph OSD Handle a Request from a Client
It's a good starting point for completely understanding how Ceph works.
This page will be updated soon with a more thorough explanation based on further analysis.
/*
* Operation started.
*/
#./src/osd/OSD.cc
5676 bool OSD::dispatch_op_fast(OpRequestRef& op, OSDMapRef& osdmap)
5677 {
...
5700 // client ops
5701 case CEPH_MSG_OSD_OP:
5702 handle_op(op, osdmap);
...
/*
* The OSD starts to handle the request from the client:
*/
7948 void OSD::handle_op(OpRequestRef& op, OSDMapRef& osdmap)
7949 {
...
8093 if (pg) {
8094 op->send_map_update = share_map.should_send;
8095 op->sent_epoch = m->get_map_epoch();
8096 enqueue_op(pg, op);
...
8160 void OSD::enqueue_op(PG *pg, OpRequestRef& op)
8161 {
...
8167 pg->queue_op(op);
#./src/osd/PG.cc
1835 void PG::queue_op(OpRequestRef& op)
1836 {
...
/*
* The op is queued on the OSD's op work queue, to be dequeued later by a worker thread.
*/
1847 osd->op_wq.queue(make_pair(PGRef(this), op));
/*
* A worker thread dequeues the op and hands it to the PG to read the data.
*/
#./src/osd/OSD.cc
8303 /*
8304 * NOTE: dequeue called in worker thread, with pg lock
8305 */
8306 void OSD::dequeue_op(
8307 PGRef pg, OpRequestRef op,
8308 ThreadPool::TPHandle &handle)
8309 {
...
8343 pg->do_request(op, handle);
/*
* A transaction is created on the basis of the op context.
*/
#./src/osd/ReplicatedPG.cc
1238 void ReplicatedPG::do_request(
1239 OpRequestRef& op,
1240 ThreadPool::TPHandle &handle)
1241 {
...
1293 do_op(op); // do it now
1294 break;
...
1361 /** do_op - do an op
1362 * pg lock will be held (if multithreaded)
1363 * osd_lock NOT held.
1364 */
1365 void ReplicatedPG::do_op(OpRequestRef& op)
1366 {
...
1697 OpContext *ctx = new OpContext(op, m->get_reqid(), m->ops, obc, this);
...
1760 execute_ctx(ctx);
...
2218 void ReplicatedPG::execute_ctx(OpContext *ctx)
2219 {
...
2289 int result = prepare_transaction(ctx);
...
5668 int ReplicatedPG::prepare_transaction(OpContext *ctx)
5669 {
...
5680 // prepare the actual mutation
5681 int result = do_osd_ops(ctx, ctx->ops);
...
3348 int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops)
3349 {
...
3463 } else if (pool.info.require_rollback()) {
3464 ctx->pending_async_reads.push_back(
3465 make_pair(
3466 boost::make_tuple(op.extent.offset, op.extent.length, op.flags),
3467 make_pair(&osd_op.outdata, new FillInExtent(&op.extent.length))));
3468 dout(10) << " async_read noted for " << soid << dendl;
...
2094 ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0, false);
...
2345 // read or error?
2346 if (ctx->op_t->empty() || result < 0) {
...
2348 if (result == 0)
...
2351 if (ctx->pending_async_reads.empty()) {
...
2353 } else {
...
/*
* Read operation starts.
*/
2355 ctx->start_async_reads(this);
...
107 // OpContext
108 void ReplicatedPG::OpContext::start_async_reads(ReplicatedPG *pg)
109 {
110 inflightreads = 1;
111 pg->pgbackend->objects_read_async(
#./src/osd/ReplicatedBackend.cc
277 void ReplicatedBackend::objects_read_async(
278 const hobject_t &hoid,
279 const list<pair<boost::tuple<uint64_t, uint64_t, uint32_t>,
280 pair<bufferlist*, Context*> > > &to_read,
281 Context *on_complete)
282 {
...
302 new AsyncReadCallback(r, on_complete)));
...
114 new OnReadComplete(pg, this));
...
93 struct OnReadComplete : public Context {
94 ReplicatedPG *pg;
95 ReplicatedPG::OpContext *opcontext;
96 OnReadComplete(
97 ReplicatedPG *pg,
98 ReplicatedPG::OpContext *ctx) : pg(pg), opcontext(ctx) {}
99 void finish(int r) {
100 if (r < 0)
101 opcontext->async_read_result = r;
102 opcontext->finish_read(pg);
103 }
104 ~OnReadComplete() {}
105 };
...
117 void ReplicatedPG::OpContext::finish_read(ReplicatedPG *pg)
118 {
119 assert(inflightreads > 0);
120 --inflightreads;
121 if (async_reads_complete()) {
122 assert(pg->in_progress_async_reads.size());
123 assert(pg->in_progress_async_reads.front().second == this);
124 pg->in_progress_async_reads.pop_front();
125 pg->complete_read_ctx(async_read_result, this);
126 }
127 }
...
5911 void ReplicatedPG::complete_read_ctx(int result, OpContext *ctx)
5912 {
5913 MOSDOp *m = static_cast<MOSDOp*>(ctx->op->get_req());
5914 assert(ctx->async_reads_complete());
...
5942 reply->set_result(result);
5943 reply->add_flags(CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK);
5944 osd->send_message_osd_client(reply, m->get_connection());
5945 close_op_ctx(ctx, 0);
5946 }
OpenStack Manual Installation - 4th Topic: To Build the Cinder Component
Notice: This series assumes kernel 3.10.0-229.
1. What Cinder provides.
It provides block storage which allows block devices to be exposed and connected to compute instances for expanded storage, better performance and integration with enterprise storage platforms.
It provides persistent block level storage devices for use with the OpenStack compute instances which can be exposed to applications as well.
The block storage system manages the creation, attaching and detaching of the block devices to servers. Block storage volumes are fully integrated into the OpenStack compute and the dashboard allowing for cloud users to manage their own storage requirements.
2. The capabilities.
Block storage is appropriate for performance sensitive scenarios such as database storage, expandable file systems, or providing a server with access to raw block level storage.
Snapshot management provides powerful functionality for backing up data stored on block storage volumes.
3. Architecture.
1. cinder-api
To authenticate and route requests throughout the block storage system.
2. cinder-scheduler
To schedule and route volume create requests to the appropriate volume service.
3. cinder-volume
To manage block storage devices, specifically the back-end devices themselves.
Cinder deployments will make use of a messaging queue to route information between the cinder processes as well as a database to store volume state.
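Once the services are up, requests enter through cinder-api. For example, a client can create a volume with python-cinderclient (a rough sketch; the credentials, tenant, and auth URL are placeholders):

from cinderclient import client

cinder = client.Client('1', 'admin', 'password', 'admin',
                       'http://<keystone ip>:5000/v2.0')

vol = cinder.volumes.create(size=1, display_name='test-vol')
print(vol.id, vol.status)  # cinder-scheduler picks a cinder-volume backend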
4. Installation.
4.1. To install package
# yum -y install openstack-cinder
4.2. To do the initial setup.
# cp /etc/cinder/cinder.conf /etc/cinder/cinder.conf.org
# cp /usr/share/cinder/cinder-dist.conf /etc/cinder/cinder.conf
# source /root/keystonerc_admin
# openstack-db --init --service cinder --password cinder --rootpw cinder
# keystone user-create --name cinder --pass cinder
# keystone user-role-add --user cinder --role admin --tenant services
# keystone service-create --name cinder --type volume --description "OpenStack Block Storage Service"
# export CINDER_SERVICE_ID=`keystone service-list | grep cinder | awk '{print $2}'`
/* Ip address should be cinder's */
# keystone endpoint-create \
--service-id ${CINDER_SERVICE_ID} \
--publicurl 'http://<ip address>/v1/%(tenant_id)s' \
--adminurl 'http://<ip address>/v1/%(tenant_id)s' \
--internalurl 'http://<ip address>/v1/%(tenant_id)s'
# keystone service-create --name=cinderv2 --type=volumev2 --description="Cinder Volume Service V2"
# export CINDER_SERVICE_ID=`keystone service-list | grep cinderv2 | awk '{print $2}'`
/* Ip address should be cinder's */
# keystone endpoint-create \
--service-id ${CINDER_SERVICE_ID} \
--publicurl 'http://<ip address>/v2/%(tenant_id)s' \
--adminurl 'http://<ip address>/v2/%(tenant_id)s' \
--internalurl 'http://<ip address>/v2/%(tenant_id)s'
4.3. To configure Cinder
# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_tenant_name services
# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_user cinder
# crudini --set /etc/cinder/cinder.conf keystone_authtoken admin_password cinder
# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_userid rabbitadmin
# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_password rabbitpass
/* Ip address should be message broker's */
# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_host <ip address>
# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_use_ssl True
# crudini --set /etc/cinder/cinder.conf DEFAULT rabbit_port 5671
4.4. To apply the configuration and start the services
# systemctl enable openstack-cinder-api
# systemctl enable openstack-cinder-scheduler
# systemctl enable openstack-cinder-volume
# openstack-service start cinder
# openstack-status