shen2
This user hasn't shared any profile information
shen2
购买了Linode 512主机,架了一个VPN翻墙
1两年前买的HostMonster主机快要到期了,准备物色新主机。由于需求越来越复杂,我需要一个可以完全自己配置的环境,于是基于Xen的VPS主机Linode成了最佳选择,再加上它最近开通了东京数据中心,从国内ping只有120ms,绝对是翻墙越货最佳之选。
花了一个小时就搞定了信用卡支付,ubuntu系统的安装,nginx/php5/mysql/phpmyadmin安装,在国内的独立服务器上已经干过无数次了。基本上都快成为本能了。
值得一提的是在东京机房使用ubuntu的apt-get,需要修改一下源。日本最快的源是jp.archive.ubuntu.com,批量将us.archive.ubuntu.com替换成jp.archive.ubuntu.com就可以了。
接下来,就是配置VPN了,这才是VPS主机最大的价值。
- 先安装pptpd:
apt-get install pptpd - 配置pptpd:
修改/etc/pptpd.conf配置文件,将以下两行的注释去掉
localip 192.168.0.1
remoteip 192.168.0.201-245
- 修改/etc/ppp/options配置文件,根据服务器所在的网络设置DNS服务器,我的Linode东京节点的DNS服务器设置是:
ms-dns 106.187.34.20
ms-dns 106.187.35.20
ms-dns 106.187.36.20
我后来发现似乎不设置DNS也可以,系统会直接使用服务器/etc/resolve.conf中的设置。 - 修改/etc/ppp/chap-secrets文件,设置密码
username pptpd password *
至此vpn已经可以直接拨通了,注意,在ubuntu上连接VPN要把Use Point-to-Point encryption (MPPE)的选项勾上。 - 修改/etc/sysctl.conf
net.ipv4.ip_forward=1 - 修改/etc/rc.local,加入:
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 -j MASQUERADE
然后重启,就大功告成啦!twitter/facebook的打开速度嗷嗷快。
在header信息中隐藏php信息
0在php渲染的网页header信息中,会包含php的版本号信息,比如: X-Powered-by: php/5.3.3,这有些不安全,有些黑客可能采用扫描的方式,批量寻找低版本的php服务器,利用php漏洞(比如最近流行的hash冲突)来攻击服务器。
php.ini中有一个选项可以控制是否暴露这个信息,那就是:
expose_php = On
默认值是On,改成Off之后,就不会显示php版本信息啦。
或者在php-fpm的pool配置文件中设置:
php_admin_flag[expose_php] = off
重负荷nginx的几个关键配置参数
1不知不觉网站PV就爆发了。nginx压力越来越大,一些默认参数就显得不够用了。
我们的主服务器硬件配置非常健壮(双路至强5620 + 48GB内存 + SSD),理论上可以承受每天过500万的PV,当然,前提是优化得够好。
简单罗列一下优化过的几个参数:
- ulimit -n 65535这个参数位于/etc/default/nginx 中,或者/etc/init.d/nginx 文件中直接设置。
默认是1024,意思是最多打开的文件个数。1024怎么够,至少开到8192,网上很多文章都直接开到了65535。
- worker_processes 8;
worker数量,位于nginx.conf头部,一般来说有几个cpu核心开几个,不算超线程。 - worker_rlimit_nofile 65535;
位于nginx.conf头部,也是文件数量限制,直接开大吧。 - worker_connections 4096;
位于nginx.conf中,默认是1024,也不够。 - 另外,建议编译nginx的时候顺带编译status模块,以便监视性能。
location /status/ {
stub_status on;
}
小小炫耀一下服务器的status参数,想当年刚用nginx的时候,Writing参数不是0就是1,现在已经这么大了:
Active connections: 2140 server accepts handled requests 14727928 14727602 24524267 Reading: 92 Writing: 28 Waiting: 2020
用insert delayed减少阻塞时间
0高并发地进行insert经常导致阻塞很长时间。
有些场景下我们并不需要知道insert之后的结果。因此可以采用insert delayed方法。
insert delayed是myisam对与sql标准的扩展,在innodb引擎下并不支持。
insert delayed之后,系统会立刻返回结果,使用last_inserted_id方法不能获得insert delayed得到的主键ID
Tumblr的技术架构:150亿月PV,比twitter更难扩展
0With over 15 billion page views a month Tumblr has become an insanely popular blogging platform. Users may like Tumblr for its simplicity, its beauty, its strong focus on user experience, or its friendly and engaged community, but like it they do.
Growing at over 30% a month has not been without challenges. Some reliability problems among them. It helps to realize that Tumblr operates at surprisingly huge scales: 500 million page views a day, a peak rate of ~40k requests per second, ~3TB of new data to store a day, all running on 1000+ servers.
One of the common patterns across successful startups is the perilous chasm crossing from startup to wildly successful startup. Finding people, evolving infrastructures, servicing old infrastructures, while handling huge month over month increases in traffic, all with only four engineers, means you have to make difficult choices about what to work on. This was Tumblr’s situation. Now with twenty engineers there’s enough energy to work on issues and develop some very interesting solutions.
Tumblr started as a fairly typical large LAMP application. The direction they are moving in now is towards a distributed services model built around Scala, HBase, Redis, Kafka, Finagle, and an intriguing cell based architecture for powering their Dashboard. Effort is now going into fixing short term problems in their PHP application, pulling things out, and doing it right using services.
The theme at Tumblr is transition at massive scale. Transition from a LAMP stack to a somewhat bleeding edge stack. Transition from a small startup team to a fully armed and ready development team churning out new features and infrastructure. To help us understand how Tumblr is living this theme is startup veteran Blake Matheny, Distributed Systems Engineer at Tumblr. Here’s what Blake has to say about the House of Tumblr:
Site: http://www.tumblr.com/
Stats
- 500 million page views a day
- 15B+ page views month
- ~20 engineers
- Peak rate of ~40k requests per second
- 1+ TB/day into Hadoop cluster
- Many TB/day into MySQL/HBase/Redis/Memcache
- Growing at 30% a month
- ~1000 hardware nodes in production
- Billions of page visits per month per engineer
- Posts are about 50GB a day. Follower list updates are about 2.7TB a day.
- Dashboard runs at a million writes a second, 50K reads a second, and it is growing.
Software
- OS X for development, Linux (CentOS, Scientific) in production
- Apache
- PHP, Scala, Ruby
- Redis, HBase, MySQL
- Varnish, HA-Proxy, nginx,
- Memcache, Gearman, Kafka, Kestrel, Finagle
- Thrift, HTTP
- Func – a secure, scriptable remote control framework and API
- Git, Capistrano, Puppet, Jenkins
Hardware
- 500 web servers
- 200 database servers (many of these are part of a spare pool we pulled from for failures)
- 47 pools
- 30 shards
- 30 memcache servers
- 22 redis servers
- 15 varnish servers
- 25 haproxy nodes
- 8 nginx
- 14 job queue servers (kestrel + gearman)
Architecture
- Tumblr has a different usage pattern than other social networks.
- With 50+ million posts a day, an average post goes to many hundreds of people. It’s not just one or two users that have millions of followers. The graph for Tumblr users has hundreds of followers. This is different than any other social network and is what makes Tumblr so challenging to scale.
- #2 social network in terms of time spent by users. The content is engaging. It’s images and videos. The posts aren’t byte sized. They aren’t all long form, but they have the ability. People write in-depth content that’s worth reading so people stay for hours.
- Users form a connection with other users so they will go hundreds of pages back into the dashboard to read content. Other social networks are just a stream that you sample.
- Implication is that given the number of users, the average reach of the users, and the high posting activity of the users, there is a huge amount of updates to handle.
- Tumblr runs in one colocation site. Designs are keeping geographical distribution in mind for the future.
- Two components to Tumblr as a platform: public Tumblelogs and Dashboard
- Public Tumblelog is what the public deals with in terms of a blog. Easy to cache as its not that dynamic.
- Dashboard is similar to the Twitter timeline. Users follow real-time updates from all the users they follow.
- Very different scaling characteristics than the blogs. Caching isn’t as useful because every request is different, especially with active followers.
- Needs to be real-time and consistent. Should not show stale data. And it’s a lot of data to deal with. Posts are only about 50GB a day. Follower list updates are 2.7TB a day.
- Most users leverage Tumblr as tool for consuming of content. Of the 500+ million page views a day, 70% of that is for the Dashboard.
- Dashboard availability has been quite good. Tumblelog hasn’t been as good because they have a legacy infrastructure that has been hard to migrate away from. With a small team they had to pick and choose what they addressed for scaling issues.
Old Tumblr
- When the company started on Rackspace it gave each custom domain blog an A record. When they outgrew Rackspace there were too many users to migrate. This is 2007. They still have custom domains on Rackspace. They route through Rackspace back to their colo space using HAProxy and Varnish. Lots of legacy issues like this.
- A traditional LAMP progression.
- Historically developed with PHP. Nearly every engineer programs in PHP.
- Started with a web server, database server and a PHP application and started growing from there.
- To scale they started using memcache, then put in front-end caching, then HAProxy in front of the caches, then MySQL sharding. MySQL sharding has been hugely helpful.
- Use a squeeze everything out of a single server approach. In the past year they’ve developed a couple of backend services in C: an ID generator and Staircar, using Redis to power Dashboard notifications
- The Dashboard uses a scatter-gather approach. Events are displayed when a user access their Dashboard. Events for the users you follow are pulled and displayed. This will scale for another 6 months. Since the data is time ordered sharding schemes don’t work particularly well.
New Tumblr
- Changed to a JVM centric approach for hiring and speed of development reasons.
- Goal is to move everything out of the PHP app into services and make the app a thin layer over services that does request authentication, presentation, etc.
- Scala and Finagle Selection
- Internally they had a lot of people with Ruby and PHP experience, so Scala was appealing.
- Finagle was a compelling factor in choosing Scala. It is a library from Twitter. It handles most of the distributed issues like distributed tracing, service discovery, and service registration. You don’t have to implement all this stuff. It just comes for free.
- Once on the JVM Finagle provided all the primitives they needed (Thrift, ZooKeeper, etc).
- Finagle is being used by Foursquare and Twitter. Scala is also being used by Meetup.
- Like the Thrift application interface. It has really good performance.
- Liked Netty, but wanted out of Java, so Scala was a good choice.
- Picked Finagle because it was cool, knew some of the guys, it worked without a lot of networking code and did all the work needed in a distributed system.
- Node.js wasn’t selected because it is easier to scale the team with a JVM base. Node.js isn’t developed enough to have standards and best practices, a large volume of well tested code. With Scala you can use all the Java code. There’s not a lot of knowledge of how to use it in a scalable way and they target 5ms response times, 4 9s HA, 40K requests per second and some at 400K requests per second. There’s a lot in the Java ecosystem they can leverage.
- Internal services are being shifted from being C/libevent based to being Scala/Finagle based.
- Newer, non-relational data stores like HBase and Redis are being used, but the bulk of their data is currently stored in a heavily partitioned MySQL architecture. Not replacing MySQL with HBase.
- HBase backs their URL shortner with billions of URLs and all the historical data and analytics. It has been rock solid. HBase is used in situations with high write requirements, like a million writes a second for the Dashboard replacement. HBase wasn’t deployed instead of MySQL because they couldn’t bet the business on HBase with the people that they had, so they started using it with smaller less critical path projects to gain experience.
- Problem with MySQL and sharding for time series data is one shard is always really hot. Also ran into read replication lag due to insert concurrency on the slaves.
- Created a common services framework.
- Spent a lot of time upfront solving operations problem of how to manage a distributed system.
- Built a kind of Rails scaffolding, but for services. A template is used to bootstrap services internally.
- All services look identical from an operations perspective. Checking statistics, monitoring, starting and stopping all work the same way for all services.
- Tooling is put around the build process in SBT (a Scala build tool) using plugins and helpers to take care of common activities like tagging things in git, publishing to the repository, etc. Most developers don’t have to get in the guts of the build system.
- Front-end layer uses HAProxy. Varnish might be hit for public blogs. 40 machines.
- 500 web servers running Apache and their PHP application.
- 200 database servers. Many database servers are used for high availability reasons. Commodity hardware is used an the MTBF is surprisingly low. Much more hardware than expected is lost so there are many spares in case of failure.
- 6 backend services to support the PHP application. A team is dedicated to develop the backend services. A new service is rolled out every 2-3 weeks. Includes dashboard notifications, dashboard secondary index, URL shortener, and a memcache proxy to handle transparent sharding.
- Put a lot of time and effort and tooling into MySQL sharding. MongoDB is not used even though it is popular in NY (their location). MySQL can scale just fine..
- Gearman, a job queue system, is used for long running fire and forget type work.
- Availability is measured in terms of reach. Can a user reach custom domains or the dashboard? Also in terms of error rate.
- Historically the highest priority item is fixed. Now failure modes are analyzed and addressed systematically. Intention is to measure success from a user perspective and an application perspective. If part of a request can’t be fulfilled that is account for
- Initially an Actor model was used with Finagle, but that was dropped. For fire and forget work a job queue is used. In addition, Twitter’s utility library contains a Futures implementation and services are implemented in terms of futures. In the situations when a thread pool is needed futures are passed into a future pool. Everything is submitted to the future pool for asynchronous execution.
- Scala encourages no shared state. Finagle is assumed correct because it’s tested by Twitter in production. Mutable state is avoided using constructs in Scala or Finagle. No long running state machines are used. State is pulled from the database, used, and writte n back to the database. Advantage is developers don’t need to worry about threads or locks.
- 22 Redis servers. Each server has 8 – 32 instances so 100s of Redis instances are used in production.
- Used for backend storage for dashboard notifications.
- A notification is something like a user liked your post. Notifications show up in a user’s dashboard to indicate actions other users have taken on their content.
- High write ratio made MySQL a poor fit.
- Notifications are ephemeral so it wouldn’t be horrible if they were dropped, so Redis was an acceptable choice for this function.
- Gave them a chance to learn about Redis and get familiar with how it works.
- Redis has been completely problem free and the community is great.
- A Scala futures based interface for Redis was created. This functionality is now moving into their Cell Architecture.
- URL shortener uses Redis as the first level cache and HBase as permanent storage.
- Dashboard’s secondary index is built around Redis.
- Redis is used as Gearman’s persistence layer using a memcache proxy built using Finagle.
- Slowly moving from memcache to Redis. Would like to eventually settle on just one caching service. Performance is on par with memcache.
Internal Firehose
- Internally applications need access to the activity stream. An activity steam is information about users creating/deleting posts, liking/unliking posts, etc. A challenge is to distribute so much data in real-time. Wanted something that would scale internally and that an application ecosystem could reliably grow around. A central point of distribution was needed.
- Previously this information was distributed using Scribe/Hadoop. Services would log into Scribe and begin tailing and then pipe that data into an app. This model stopped scaling almost immediately, especially at peak where people are creating 1000s of posts a second. Didn’t want people tailing files and piping to grep.
- An internal firehose was created as a message bus. Services and applications talk to the firehose via Thrift.
- LinkedIn’s Kafka is used to store messages. Internally consumers use an HTTP stream to read from the firehose. MySQL wasn’t used because the sharding implementation is changing frequently so hitting it with a huge data stream is not a good idea.
- The firehose model is very flexible, not like Twitter’s firehose in which data is assumed to be lost.
- The firehose stream can be rewound in time. It retains a week of data. On connection it’s possible to specify the point in time to start reading.
- Multiple clients can connect and each client won’t see duplicate data. Each client has a client ID. Kafka supports a consumer group idea. Each consumer in a consumer group gets its own messages and won’t see duplicates. Multiple clients can be created using the same consumer ID and clients won’t see duplicate data. This allows data to be processed independently and in parallel. Kafka uses ZooKeeper to periodically checkpoint how far a consumer has read.
Cell Design for Dashboard Inbox
- The current scatter-gather model for providing Dashboard functionality has very limited runway. It won’t last much longer.
- The solution is to move to an inbox model implemented using a Cell Based Architecture that is similar to Facebook Messages.
- An inbox is the opposite of scatter-gather. A user’s dashboard, which is made up posts from followed users and actions taken by other users, is logically stored together in time order.
- Solves the scatter gather problem because it’s an inbox. You just ask what is in the inbox so it’s less expensive then going to each user a user follows. This will scale for a very long time.
- Rewriting the Dashboard is difficult. The data has a distributed nature, but it has a transactional quality, it’s not OK for users to get partial updates.
- The amount of data is incredible. Messages must be delivered to hundreds of different users on average which is a very different problem than Facebook faces. Large date + high distribution rate + multiple datacenters.
- Spec’ed at a million writes a second and 50K reads a second. The data set size is 2.7TB of data growth with no replication or compression turned on. The million writes a second is from the 24 byte row key that indicates what content is in the inbox.
- Doing this on an already popular application that has to be kept running.
- Cells
- A cell is a self-contained installation that has all the data for a range of users. All the data necessary to render a user’s Dashboard is in the cell.
- Users are mapped into cells. Many cells exist per data center.
- Each cell has an HBase cluster, service cluster, and Redis caching cluster.
- Users are homed to a cell and all cells consume all posts via firehose updates.
- Each cell is Finagle based and populates HBase via the firehose and service requests over Thrift.
- A user comes into the Dashboard, users home to a particular cell, a service node reads their dashboard via HBase, and passes the data back.
- Background tasks consume from the firehose to populate tables and process requests.
- A Redis caching layer is used for posts inside a cell.
- Request flow: a user publishes a post, the post is written to the firehose, all of the cells consume the posts and write that post content to post database, the cells lookup to see if any of the followers of the post creator are in the cell, if so the follower inboxes are updated with the post ID.
- Advantages of cell design:
- Massive scale requires parallelization and parallelization requires components be isolated from each other so there is no interaction. Cells provide a unit of parallelization that can be adjusted to any size as the user base grows.
- Cells isolate failures. One cell failure does not impact other cells.
- Cells enable nice things like the ability to test upgrades, implement rolling upgrades, and test different versions of software.
- The key idea that is easy to miss is: all posts are replicated to all cells.
- Each cell stores a single copy of all posts. Each cell can completely satisfy a Dashboard rendering request. Applications don’t ask for all the post IDs and then ask for the posts for those IDs. It can return the dashboard content for the user. Every cell has all the data needed to fulfill a Dashboard request without doing any cross cell communication.
- Two HBase tables are used: one that stores a copy of each post. That data is small compared to the other table which stores every post ID for every user within that cell. The second table tells what the user’s dashboard looks like which means they don’t have to go fetch all the users a user is following. It also means across clients they’ll know if you read a post and viewing a post on a different device won’t mean you read the same content twice. With the inbox model state can be kept on what you’ve read.
- Posts are not put directly in the inbox because the size is too great. So the ID is put in the inbox and the post content is put in the cell just once. This model greatly reduces the storage needed while making it simple to return a time ordered view of an users inbox. The downside is each cell contains a complete copy of call posts. Surprisingly posts are smaller than the inbox mappings. Post growth per day is 50GB per cell, inbox grows at 2.7TB a day. Users consume more than they produce.
- A user’s dashboard doesn’t contain the text of a post, just post IDs, and the majority of the growth is in the IDs.
- As followers change the design is safe because all posts are already in the cell. If only follower posts were stored in a cell then cell would be out of date as the followers changed and some sort of back fill process would be needed.
- An alternative design is to use a separate post cluster to store post text. The downside of this design is that if the cluster goes down it impacts the entire site. Using the cell design and post replication to all cells creates a very robust architecture.
- A user having millions of followers who are really active is handled by selectively materializing user feeds by their access model (see Feeding Frenzy).
- Different users have different access models and distribution models that are appropriate. Two different distribution modes: one for popular users and one for everyone else.
- Data is handled differently depending on the user type. Posts from active users wouldn’t actually be published, posts would selectively materialized.
- Users who follow millions of users are treated similarly to users who have millions of followers.
- Cell size is hard to determine. The size of cell is the impact site of a failure. The number of users homed to a cell is the impact. There’s a tradeoff to make in what they are willing to accept for the user experience and how much it will cost.
- Reading from the firehose is the biggest network issue. Within a cell the network traffic is manageable.
- As more cells are added cells can be placed into a cell group that reads from the firehose and then replicates to all cells within the group. A hierarchical replication scheme. This will also aid in moving to multiple datacenters.
On Being a Startup in New York
- NY is a different environment. Lots of finance and advertising. Hiring is challenging because there’s not as much startup experience.
- In the last few years NY has focused on helping startups. NYU and Columbia have programs for getting students interesting internships at startups instead of just going to Wall Street. Mayor Bloomberg is establishing a local campus focused on technology.
Team Structure
- Teams: infrastructure, platform, SRE, product, web ops, services.
- Infrastructure: Layer 5 and below. IP address and below, DNS, hardware provisioning.
- Platform: core app development, SQL sharding, services, web operations.
- SRE: sits between service team and web ops team. Focused on more immediate needs in terms of reliability and scalability.
- Service team: focuses on things that are slightly more strategic, that are a month or two months out.
- Web ops: responsible for problem detection and response, and tuning.
Software Deployment
- Started with a set of rsync scripts that distributed the PHP application everywhere. Once the number of machines reached 200 the system started having problems, deploys took a long time to finish and machines would be in various states of the deploy process.
- The next phase built the deploy process (development, staging, production) into their service stack using Capistrano. Worked for services on dozens of machines, but by connecting via SSH it started failing again when deploying to hundreds of machines.
- Now a piece of coordination software runs on all machines. Based around Func from RedHat, a lightweight API for issuing commands to hosts. Scaling is built into Func.
- Build deployment is over Func by saying do X on a set of hosts, which avoids SSH. Say you want to deploy software on group A. The master reaches out to a set of nodes and runs the deploy command.
- The deploy command is implemented via Capistrano. It can do a git checkout or pull from the repository. Easy to scale because they are talking HTTP. They like Capistrano because it supports simple directory based versioning that works well with their PHP app. Moving towards versioned updates, where each directory contains a SHA so it’s easy to check if a version is correct.
- The Func API is used to report back status, to say these machines have these software versions.
- Safe to restart any of their services because they’ll drain off connections and then restart.
- All features run in dark mode before activation.
Development
- Started with the philosophy that anyone could use any tool that they wanted, but as the team grew that didn’t work. Onboarding new employees was very difficult, so they’ve standardized on a stack so they can get good with those, grow the team quickly, address production issues more quickly, and build up operations around them.
- Process is roughly Scrum like. Lightweight.
- Every developer has a preconfigured development machine. It gets updates via Puppet.
- Dev machines can roll changes, test, then roll out to staging, and then roll out to production.
- Developers use vim and Textmate.
- Testing is via code reviews for the PHP application.
- On the service side they’ve implemented a testing infrastructure with commit hooks, Jenkins, and continuous integration and build notifications.
Hiring Process
- Interviews usually avoid math, puzzles, and brain teasers. Try to ask questions focused on work the candidate will actually do. Are they smart? Will they get stuff done? But measuring “gets things done” is difficult to assess. Goal is to find great people rather than keep people out.
- Focused on coding. They’ll ask for sample code. During phone interviews they will use Collabedit to write shared code.
- Interviews are not confrontational, they just want to find the best people. Candidates get to use all their tools, like Google, during the interview. The idea is developers are at their best when they have tools so that’s how they run the interviews.
- Challenge is finding people that have the scaling experience they require given Tumblr’s traffic levels. Few companies in the world are working on the problems they are.
- Example, for a new ID generator they needed A JVM process to generate service responses in less the 1ms at a rate at 10K requests per second with a 500 MB RAM limit with High Availability. They found the serial collector gave the lowest latency for this particular work load. Spent a lot of time on JVM tuning.
- On the Tumblr Engineering Blog they’ve posted memorials giving their respects for the passing of Dennis Ritchie & John McCarthy. It’s a geeky culture.
Lessons learned
- Automation everywhere.
- MySQL (plus sharding) scales, apps don’t.
- Redis is amazing.
- Scala apps perform fantastically.
- Scrap projects when you aren’t sure if they will work.
- Don’t hire people based on their survival through a useless technological gauntlet. Hire them because they fit your team and can do the job.
- Select a stack that will help you hire the people you need.
- Build around the skills of your team.
- Read papers and blog posts. Key design ideas like the cell architecture and selective materialization were taken from elsewhere.
- Ask your peers. They talked to engineers from Facebook, Twitter, LinkedIn about their experiences and learned from them. You may not have access to this level, but reach out to somebody somewhere.
- Wade, don’t jump into technologies. They took pains to learn HBase and Redis before putting them into production by using them in pilot projects or in roles where the damage would be limited.
I’d like to thank Blake very much for the interview. He was very generous with his time and patient with his explanations. Please contact me if you would like to talk about having your architecture profiled.
mysql查询中利用索引的机制
5今天遇到一个奇怪的问题,明明已经建立了索引,select语句的explain也表明会利用这个索引,可是结果偏偏没有用索引,最后扫描了全表。
两个结构完全一样的sql语句:
sql1: select * from table where col_a = 123 and col_b in (‘foo’,'bar’) order by id desc;
sql2: select * from table where col_a = 456 and col_b in (‘foo’,'bar’) order by id desc;
结果sql1选择利用了col_a的索引,速度很快,sql2利用了主键ID的索引,扫描了全表(40w行)。
仔细分析,发现数据库中,col_a=456的记录数有近1万条,而col_a=123的记录数只有几条。
于是就清楚了,mysql选择索引不仅仅依据查询结构和索引结构,还会根据索引大概估算选择每种索引的数据量,然后选择他认为最快的索引。
可能是主键索引会比普通index更快,所以mysql最后选择了数据量跟大的id索引。
那么,如何解决这个问题呢?
很简单,只要在order语句里写多个键即可,比如:order by col_a, id desc
如何知道header中的Cache-Control信息
0在nginx的fastcgi_params配置文件中增加一行。
fastcgi_param HTTP_CACHE_CONTROL $upstream_http_cache_control;
然后就可以在$_SERVER['HTTP_CACHE_CONTROL']中得到cache-control的值了。
这个有什么用?嘿嘿,你说呢?
jQuery1.6之前的版本,$.map不支持对象
0长期写php,让我几乎忘了序列数组和关联数组的区别。
对于一个JS对象,我想当然地就用了$.map。
后来才发现,1.6之前的版本,map只能接受Array参数,不接受Object。
好吧,这么基本的功能,这么有用的函数,怎么1.6才引入。
prototype污染了javascript原生的Array对象导致for(in)出错
1前两天遇到一个问题,程序里写了类似这样的代码:
var arr = [1,2,3,4,5];
for(var i in arr){
…
}
在自己网站上都对,但是在有些别人网页上就是不行。
检查之后发现,for的时候除了1,2,3,4,5之外,还多出来一堆成员函数。
原来是他们的网页上用了prototype。prototype直接修改了Array.prototype,给Array加了一堆函数。终于明白为什么jQuery一直强调不要污染原生对象了。
解决办法就是老老实实地用 或者用jQuery的$.each
for(var i; i
考虑了良久,本博近日开始更新
2已经两年不写博客了。主要是因为我想低调一些,作为一个创业者,最重要的事情就是专注地做好自己的事情。作为一个产品经理,整天对其他人的产品评头论足,只会显出自己的无知。
前段时间和几个刚认识的朋友聊天,说起他们曾经看过一个叫沈二铺子的博客,但是后来不知道为什么就不更新了。当时我就震惊了~~~真的没想到这一亩三分地还真的有人看。
最近我越来越觉得技术的积累非常重要,尤其是对一个团队来说。一个人犯过的错误,不应该让第二个人再犯;一个人解决过的问题,不应该让第二个人再去解决。每当解决一个技术问题的时候,我就在想,这个坑,别人遇到一定也会栽。于是就特别想找个地方把这个问题记录下来,以免团队里的其他人再次遇到。不管是wiki,trac,google docs,微博,轻博客,都不是很合适,微博过于浮躁,表达能力有限,wiki又不适合分享和讨论。
我竟然忘了,其实我还有博客嘛。虽然说我一度想把自己积累的这些经验只在团队内部分享,但是想来想去,作为一个满怀理想的程序猿,拯救世界的技术宅,我希望能通过自己的努力,帮更多的程序猿少走弯路,多快好省地写程序,造福大众。所以我决定,无保留地分享我的技术心得。
特别声明,虽然我的博客绝大多数都会以技术讨论为主,但是我本人大部分的时间和精力其实都花在产品上,技术只是做产品经理的同时附加的产物,所有技术都是为产品做支撑的。只是我不会在博客里讨论产品,想聊产品的人,欢迎到中关村来请我吃饭。
最新评论