
Project Sonar


Project Sonar started in September of 2013 with the goal of improving security through the active analysis of public networks. For the first few months, we focused almost entirely on SSL, DNS, and HTTP enumeration. This uncovered all sorts of interesting security issues and contributed to a number of advisories and research papers. The SSL and DNS datasets were especially good at identifying assets for a given organization, often finding systems that the IT team had no inkling of. At this point, we had a decent amount of automation in place, and decided to start the next phase of Project Sonar, scanning UDP services.


While we had received a few opt-out requests for HTTP scans in the past, these were completely eclipsed by the number of folks requesting to be excluded after our UDP probes generated an alert on their IDS or firewall. Handling opt-out requests became a part-time job that was rotated across the Labs team. We tried to roll out exclusions within a few minutes of receiving a request, and often succeeded, but it came at the cost of getting other work done. At the end of the day, the value of the data, and our ability to improve public security, depended on having consistent scan data across a range of services. As of mid-December, the number of opt-out requests has leveled off, and we have had a chance to start digging into the results.


There was some good news for a change:

  • VxWorks systems with an internet-exposed debug service have dropped from a peak of ~300,000 in 2010 to ~61,000 in late 2014
  • Servers with the IPMI protocol exposed to the internet have dropped from ~250,000 in June to ~214,000 in December 2014
  • NTP daemons with monlist enabled have decreased by somewhere between 25% and 50% (our data doesn't quite agree with ShadowServer's)


The bad news is that most of the other stats stayed relatively constant across six months:

  • Approximately 200,000 Microsoft SQL Servers are still responding to UDP pings and many of these are end-of-life versions
  • Over 15,000,000 devices expose SIP to the internet and about half of these are from a single vendor in a single region.


One odd trend was a consistent increase in the number of systems exposing NAT-PMP to the internet. This number has increased from just over 1 million in June to 1.3 million in December. Given that NAT-PMP is never supposed to be internet facing, this points to even more exposure in 2015.


We conducted over 330 internet-wide UDP scans in 2014, covering 13 different UDP probes, and generating over 96 gigabytes of compressed scan data. All of this data is now immediately available along with a brand new wiki that documents what we scan and how to use the published data.


2015 is looking like a great year for security research!



Mozilla's Firefox and Thunderbird recently removed 1024-bit certificate authority (CA) certificates from their trusted store. This change was announced to the various certificate authorities in May of this year and shipped with Firefox 32 on September 2nd. This change was a long time coming, as the National Institute of Standards and Technology (NIST) recommended that 1024-bit RSA keys be deprecated in 2010 and disallowed after 2013. A blog post provided a list of the specific certificates that would no longer be trusted starting with Firefox 32.


There is little disagreement that 1024-bit RSA keys may already be crackable by adversaries with the resources of nation states. As technology marches on, the security of 1024-bit keys will continue to deteriorate and become accessible to operators of relatively small clusters of commodity hardware. In the case of a CA key, successfully factoring the RSA modulus would allow an adversary to sign any certificate just as the CA in question would. This would allow impersonation of any "secure" web site, so long as the software you use still trusts these keys.
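As a quick aside (our illustration, not from the original post), the RSA key size of the certificate a single site presents can be checked with a few lines of Python, assuming the third-party cryptography package is installed. Note that this inspects only the leaf certificate; Mozilla's change concerns the 1024-bit CA certificates further up the chain.

# Hypothetical helper, not part of the original post: report the RSA key size
# of the certificate presented by a host, using the stdlib ssl module and the
# third-party "cryptography" package (>= 3.1 for the backend-less API).
import ssl
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import rsa

def leaf_key_bits(host, port=443):
    pem = ssl.get_server_certificate((host, port))       # PEM-encoded leaf cert
    cert = x509.load_pem_x509_certificate(pem.encode("ascii"))
    key = cert.public_key()
    if isinstance(key, rsa.RSAPublicKey):
        return key.key_size                               # e.g. 1024, 2048, 4096
    return None                                           # non-RSA key (ECDSA, etc.)

if __name__ == "__main__":
    print(leaf_key_bits("example.com"))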


This is certainly a welcome change, but how many sites are going to be affected by the removal of these CA certificates, and how many of those sites have certificates that aren't due to expire anytime soon? Fortunately, there is a way to answer these questions.


In June of 2012, the University of Michigan began scanning the Internet and collecting SSL certificates from all sites that responded on port 443. At Rapid7, we began our own certificate collection in September of 2013 as part of Project Sonar, and have been conducting weekly scans since.


Both sets of scans record the entire certificate chain, including the intermediate CA keys that Mozilla recently removed from the trusted store. We loaded approximately 150 scans into a Postgres database, resulting in over 65 million unique certificates, and started crunching the data.


The first question, how many sites are affected, was relatively easy to answer. We searched the certificate chain for each of the roughly 20 million web sites we index, checking whether any of the SHA1 hashes listed in the blog post were present in the signing chain. After several minutes, Postgres returned 107,535 sites using a certificate signed by one of the soon-to-be-untrusted CA certificates. That is a relatively large number of sites and represents roughly half a percent of all of the web sites in our database.


The next question we wanted to explore was how long the certificates signed by these 1024-bit CA keys would remain in use. This proved to be informative and painted a clearer picture of the impact. We modified the first query and grouped the sites by certificate expiration date, rounded to the start of the month. The monthly counts of affected sites, grouped by expiration date, demonstrated the full extent of the problem.
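The post doesn't include the queries themselves; the sketch below shows roughly what such a grouped query could look like from Python with psycopg2. The table and column names (certs, chain_sha1s, not_valid_after) and the placeholder hash list are hypothetical stand-ins, not the actual Sonar schema or Mozilla's list.

# Illustrative sketch only: table/column names and the hash list are
# hypothetical, not the real Sonar schema.
import psycopg2

BLOCKED_SHA1S = ["<sha1-from-mozilla-announcement>", "<another-sha1>"]

conn = psycopg2.connect("dbname=sonar")
cur = conn.cursor()
cur.execute(
    """
    SELECT date_trunc('month', not_valid_after) AS expiry_month,
           count(*)                             AS affected_sites
      FROM certs
     WHERE chain_sha1s && %s   -- any blocked CA hash appears in the signing chain
     GROUP BY expiry_month
     ORDER BY expiry_month
    """,
    (BLOCKED_SHA1S,),
)
for month, count in cur.fetchall():
    print(month.date(), count)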


The resultant data, shown in part in the graph below, makes it clear that the problem isn't nearly as bad as the initial numbers indicated, since a great many of the certificates have already expired and the rest will do so over the next year. Surprisingly, over 13,000 web sites presented a certificate that expired in July of this year. Digging into these, we found that almost all had been issued to Vodafone and expired on July 1st. These expired certificates still appear to be in use today.



The graph below demonstrates that the majority of affected certificates have already expired and those that haven't expired are due to expire in the next year. We have excluded certificates from the graph that expired prior to 2013 for legibility.




While Mozilla's decision will affect a number of sites, most of the affected sites that are still active present certificates that have already expired, and those shouldn't be trusted on that basis alone.


In summary, the removal of trust for these certificates is a sound decision based upon NIST recommendations, and while it initially appeared that a great many sites would be affected, the majority of these sites either have certificates that have already expired or certificates that expire within the next year. We hope that Chrome and other browsers will also drop these certificates to eliminate the potential risk posed by these 1024-bit CA keys.


Going forward, we are now tracking the certificates presented by SMTP, IMAP, and POP services, and will keep an eye on those as the data rolls in. If you still use a 1024-bit RSA key for any other purpose, such as Secure Shell (SSH) or PGP, it is past time to consider that key obsolete and start rolling out stronger keys of at least 2048 bits, or ECC-based keys where available.
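As an example of that kind of audit (our illustration, not from the original post), the modulus size of an existing OpenSSH RSA public key can be checked with the third-party cryptography package:

# Our illustration: check the RSA modulus size of an OpenSSH public key.
# Requires the third-party "cryptography" package (>= 3.1 for this API).
from cryptography.hazmat.primitives.serialization import load_ssh_public_key

with open("id_rsa.pub", "rb") as fh:    # path is an example
    key = load_ssh_public_key(fh.read())
print(key.key_size)                     # anything under 2048 bits is overdue for replacement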


- Labs


Project Sonar: One Month Later

Posted by hdmoore, Oct 30, 2013

It has been a full month since we launched Project Sonar, and I wanted to provide a quick update about where things are, the feedback we have received, and where we are going from here.


We have received a ton of questions from interested contributors about the legal risk of internet-wide scanning. These risks are real, but differ widely by region, country, and type of scan. We can't provide legal advice, but we have obtained help from the illustrious Marcia Hofmann, who has written a great blog post describing the issues involved. As always, every situation is different, and we do recommend getting legal counsel before embarking on your own scans. If you don't have the appetite (or budget) to hire a lawyer, you can still get involved on the research side by downloading and analyzing publicly available datasets from Scans.IO.


Currently, we are running regular scans for SSL certificates, IPv4 reverse DNS lookups, and most recently, HTTP GET requests. Our current challenge is automating the pipeline between the job scheduler and the final upload to the Scans.IO portal. We should have the process worked out and the new datasets publicly available in the next couple of weeks. As the processing side improves, we will continue to add new protocols and types of probes to our recurring scans. If you have any ideas for what you would like to see covered, please leave a comment below or get in touch directly.



Last month Rapid7 Labs launched Project Sonar, a community effort to improve internet security through widespread scanning and analysis of public-facing computer systems. Through this project, Rapid7 is actively running large-scale scans to create datasets, sharing that information with others in the security community, and offering tools to help them create datasets, too.


Others in the security field are doing similar work. This fall, a research team at the University of Michigan introduced ZMap, an open-source tool capable of scanning the entire IPv4 address space. Errata Security has also launched Masscan, which can scan the entire internet in three minutes.


These scans have great benefits—the more information we collect and share about the security of the internet, the better equipped everyone will be to fix the problems they discover.


Of course, this sort of ambitious security research raises some daunting questions. What are the legal implications of scanning all the public-facing computers on the internet? The answer—as with many legal questions involving technology—isn’t clear.


In the United States, the federal law most likely to come into play is the Computer Fraud and Abuse Act, a computer trespass statute. Several provisions of the CFAA are particularly relevant in the context of widespread scanning. They make it illegal to:


  • “intentionally access[] a computer without authorization or exceed[] authorized access, and thereby obtain[] . . . information from any protected computer[.]” § 1030(a)(2)(C). (Notably, this is the broadest provision in the statute and the one most frequently abused by overzealous prosecutors.)


  • “knowingly cause[] the transmission of a program, information, code, or command, and as a result of such conduct, intentionally cause[] damage without authorization, to a protected computer[.]” § 1030(a)(5)(A).


  • “intentionally access[] a protected computer without authorization, and as a result of such conduct, recklessly cause[] damage[.]” § 1030(a)(5)(B).


  • “intentionally access[] a protected computer without authorization, and as a result of such conduct, cause[] damage and loss[.]” § 1030(a)(5)(C).


The CFAA is both a criminal and civil statute. Violations can result in criminal prosecution, fines, and prison time. In addition, private parties harmed by violations can sue for money damages or injunctive relief (i.e., a court order forbidding or demanding certain behavior).


There are similar computer crime laws in many states, as well as other countries.


The exact contours of the CFAA are a mystery. While many of the law’s prohibitions hinge on accessing a computer “without authorization” or in a manner that “exceeds authorized access,” the law doesn’t clearly explain what these phrases mean. “Without authorization” isn’t defined at all. The term “exceeds authorized access” means “to access a computer with authorization and to use such access to obtain or alter information in the computer that the accesser is not entitled so to obtain or alter.” § 1030(e)(6). Unfortunately, the CFAA doesn’t say what it means to access a computer “with authorization,” so this definition also leaves a lot to be desired.


This lack of clarity creates a great deal of legal grey area. In certain troubling cases, the courts have found that accessing a public-facing computer can amount to a CFAA violation despite the fact that no technical barrier was breached. Just last fall, Andrew Auernheimer was convicted of conspiracy to violate the CFAA when another person ran a script to scrape iPad users’ email addresses from unsecured AT&T servers. That result is currently on appeal, and will hopefully be overturned. (Disclosure: I am a member of Auernheimer’s defense team.)


While legal uncertainty is a fact of life for security researchers, there are ways to reduce the risk of angering someone enough to make an issue of your research, as well as the possibility that a court might rule that your research has violated computer crime law. Rapid7 Labs, the ZMap research team, and Errata Security have all chosen to take certain steps to reduce the likelihood of legal trouble. The ZMap team has also published an excellent set of scanning best practices.


These researchers have:


  • Been transparent about the nature of their research and the public benefits of it. Network operators may not mind scans if they know who’s doing it and why.


  • Avoided research tactics that could cause disruption to someone else’s computer network. This decreases the likelihood of causing “damage” or “loss,” both of which are elements of certain CFAA offenses.


  • Respected exclusion requests from network operators who didn’t want their systems to be scanned.  One might also consider responding to exclusion requests by providing information about the public benefits of the community’s research and see if the requester still wants to be excluded.


The vague language of the CFAA and many state computer crime laws creates a great deal of room for interpretation. There are many questions about whether certain conduct is legal, and these questions do not have simple, straightforward answers.


Given the inherent ambiguity of computer crime statutes and the law in general, it’s never possible to know for sure that scanning public-facing computers won’t create legal problems for you. If you’re planning to contribute to Project Sonar, you should consult an attorney who can advise you about your particular situation.

I wanted to share a brief example of using a full scan of IPv4 to estimate the exposure level of a vulnerability. Last week, Craig Young, a security researcher at Tripwire, wrote a blog post about a vulnerability in the ReadyNAS network storage appliance. In an interview with Threatpost, Craig mentioned that although Netgear produced a patch in July, a quick search via SHODAN indicates that many users are still vulnerable, leaving them exposed to any attacker who can diff the patched and unpatched firmware.


This seemed like a great opportunity to review our Project Sonar HTTP results and tease out recent exposure rates from a single-pass scan of the IPv4 internet. The first thing I did was grab the patched and unpatched firmware, unzip the archives, extract them with binwalk, and run a quick diff between the two. The vulnerability is obvious on line 17 of the diff and is the result of an attacker-supplied $section variable being interpreted as arbitrary Perl code. Given that the web server runs as root and Metasploit is quite capable of exploiting Perl code injection vulnerabilities, this seems like a low bar to exploit.


Identifying ReadyNAS devices ended up being fairly easy. In response to a GET request on port 80, a device will respond with the following static HTML.


<meta http-equiv="refresh" content="0; url=/shares/">



Copyright 2007, NETGEAR

All rights reserved.



Our scan data is in the form of base64-encoded responses stored as individual lines of JSON:


{"host": "A.B.C.D", "data": "SFRUUC8xLjAgNDAwIEJhZCB...", "port": 80}


I wrote a quick script to process this data via stdin, match ReadyNAS devices, and print out the IP address and Last-Modified date from the header of the response. I ran the raw scan output through this script and made some coffee. The result from our October 4th scan consisted of 3,488 lines of results. This is a little different from the numbers listed by SHODAN, but the difference can be explained by DHCP, multiple merged scans, and the fact that the ReadyNAS web interface is most commonly accessed over SSL on port 443. The results looked like the following:

Thu, 07 Oct 2010 00:53:51 GMT
Thu, 07 Oct 2010 00:53:51 GMT
Tue, 02 Jul 2013 01:42:23 GMT
Mon, 29 Aug 2011 23:04:43 GMT
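The quick script itself isn't included in the post; a minimal reconstruction, based on the JSON format and the static HTML marker shown above, might look like this:

# Reconstruction of the kind of filter described above (the original script is
# not included in the post). Reads Sonar HTTP records as JSON lines on stdin,
# keeps responses that look like the ReadyNAS landing page, and prints the IP
# address and Last-Modified header, tab-separated.
import base64
import json
import sys

MARKER = b"url=/shares/"    # from the static ReadyNAS HTML above

for line in sys.stdin:
    record = json.loads(line)
    data = base64.b64decode(record["data"])
    if MARKER not in data or b"NETGEAR" not in data:
        continue
    last_modified = ""
    for header in data.split(b"\r\n"):
        if header.lower().startswith(b"last-modified:"):
            last_modified = header.split(b":", 1)[1].strip().decode()
            break
    print(record["host"] + "\t" + last_modified)

Feeding the decompressed scan file to a filter like this on stdin and redirecting the output to readynas.txt would reproduce the kind of IP and date listing used below.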


The interesting part about the Last-Modified header is that it seems to correlate with specific firmware versions. Version 4.2.24 was built on July 2nd, 2013 and we can assume that all versions prior to that are unpatched.


$ cat readynas.txt  | perl -pe 's/\d+\.\d+\.\d+\.\d+\t//' | sort | uniq -c | sort -rn

    717 Tue, 02 Jul 2013 01:33:54 GMT

    510 Tue, 02 Jul 2013 01:42:23 GMT

    429 Fri, 24 Aug 2012 22:55:26 GMT

    383 Wed, 05 Sep 2012 07:33:52 GMT

    212 Mon, 13 Jul 2009 20:56:46 GMT

    209 Mon, 29 Aug 2011 23:04:43 GMT

    200 Fri, 02 Sep 2011 00:51:04 GMT

    189 Sat, 06 Nov 2010 00:10:06 GMT

    133 Thu, 02 May 2013 17:00:27 GMT

    112 Thu, 31 May 2012 18:40:25 GMT



If we exclude all results with a Last-Modified date equal to or newer than July 2nd, 2013, we end up with 2,257 of 3,488 devices vulnerable; in other words, approximately 65% of the ReadyNAS devices exposing their web interface to the internet on port 80 are remotely exploitable.
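A quick sketch of that tally (our own illustration, assuming readynas.txt contains one IP address, a tab, and the Last-Modified date per line):

# Our illustration of the cut-off tally: count entries whose Last-Modified
# date predates the 4.2.24 firmware build of July 2nd, 2013.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

PATCH_DATE = datetime(2013, 7, 2, tzinfo=timezone.utc)

total = vulnerable = 0
for line in open("readynas.txt"):
    _ip, _tab, date = line.rstrip("\n").partition("\t")
    if not date:
        continue
    total += 1
    if parsedate_to_datetime(date) < PATCH_DATE:
        vulnerable += 1

print("%d/%d (%.0f%%) likely unpatched" % (vulnerable, total, 100.0 * vulnerable / total))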


It isn't clear whether these stats would change significantly if the same scan was performed on port 443 or how the exposure rate has changed since this particular scan was run. This does give us a starting point to figure out how popular these devices are and what specific industries and regions are most affected, but I will leave that to a future blog post.


If you are interested in seeing an exploit in action, Craig is hosting a demo on Tuesday, October 29th.



… Or if scanning is not your thing, take a look at the data provided by others and share your views on what it means and what we can do about it.  Apply your learnings to your own environment – how are you exposed? Can you help other people with the knowledge you’ve gained?  Can they help you?


This is the point behind Project Sonar – we believe that if we work together we can achieve great things and make the internet more secure.  Unfortunately though, at the moment there isn’t much collaboration and internet scanning is seen as a fairly niche activity of hardcore security researchers. And what they keep finding is that there is widespread insecurity across the internet.


We believe that the only way we can effectively address this is by working together, sharing information, teaching and challenging each other. Not just researchers, but all security professionals. To help you get started, we’ve created and highlighted some free scanning tools, and we’re sharing a LOT of data from the research we’ve conducted over the last year. You can find links to everything in HD's blog here.


Why not have a look through it and see how it applies to your environment? We hope you’re not affected by the issues, but the chances are you might be, and it’s better to know so you can take action to protect yourself.  And then help others learn from your experience.


And help spread the word – challenge your friends and colleagues to get involved. Create a weekend project to #ScanAllTheThings. Tweet about it. Then post it to your LinkedIn and Facebook pages and encourage others to join in. You can kick off a new scanning project; you can analyze existing data sets; you can suggest action plans for fixing bugs or share your security horror stories. There are so many ways to get involved. You can do anything, but please get involved and do SOMETHING!


Together we can #ScanAllTheThings and start coming up with practical solutions.


Scanning All The Things

Posted by rep, Sep 26, 2013



Over the past year, the Rapid7 Labs team has conducted large-scale analysis of the data coming out of the Critical.IO and Internet Census 2012 scanning projects. This revealed a number of widespread security issues and painted a gloomy picture of an internet rife with insecurity. The problem is, this isn't news, and the situation continues to get worse. Rapid7 Labs believes the only way to make meaningful progress is through data sharing and collaboration across the security community as a whole. As a result, we launched Project Sonar at DerbyCon 3.0 and urged the community to get involved with the research and analysis effort. To make this easier, we highlighted various free tools and shared a huge amount of our own research data for analysis.


Below is a quick introduction to why internet-wide scanning is important, its history and goals, followed by a discussion of its feasibility and current best practices for those who want to join the party.


Gain visibility and insight


A few years ago, internet-wide surveys were still deemed infeasible, or at least too expensive to be worth the effort. Only a few projects have mapped out aspects of the internet - for example, the IPv4 Census published by the University of Southern California in 2006, which sent ICMP echo requests to all IPv4 addresses between 2003 and 2006 to collect statistics and trends about IP allocation. A more recent example of such research is the Internet Census 2012, which was accomplished through illegal means by the "Carna Botnet," a botnet of over 420,000 infected systems.


The EFF SSL Observatory investigated "publicly-visible SSL certificates on the Internet in order to search for vulnerabilities, document the practices of Certificate Authorities, and aid researchers interested in the web's encryption infrastructure". Another example is the research on widespread vulnerabilities in serial port servers published by HD Moore, based on data from the Critical.IO internet scanning project.


The EFF Observatory, and even the botnet-powered Internet Census data, help people understand trends and allow researchers to prioritize their work based on the actual usage of devices and software. Raising awareness about widespread vulnerabilities through large-scale scanning efforts yields better insight into the service landscape of the Internet, and hopefully allows both the community and companies to mitigate risks more efficiently.


We believe that more people will run scanning efforts in the future, and we consider them valuable to both researchers and companies: research into problems and bugs raises awareness, and companies gain visibility into their assets. Even though probing, scanning, and data collection can be beneficial, they carry risks and should always be conducted with care and according to best practices. We provide more information on that below - and we will share all of our data with the community to reduce data duplication and bandwidth usage among similar efforts.



Feasibility and costs


As mentioned, this kind of research was once considered to be very costly or even infeasible. The Census projects either ran over a long time (two months) or used thousands of devices. With the availability of better hardware and clever software, internet-wide scanning has become much easier and cheaper in recent years. The ZMap open source network scanner was built for this purpose and allows a GbE-connected server to reach the entire Internet - all IPv4 addresses - within 45 minutes. It achieves this by generating over a million packets per second when configured to use the full bandwidth; of course, this requires hardware that can sustain that throughput in both packet generation and response processing. Other projects go even further - Masscan generates 25 million packets per second using dual 10 GE links and thus could reach the entire address space in 3 minutes.
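A quick back-of-the-envelope check of those numbers (our own arithmetic; the 3.7 billion figure is a rough estimate of the routable space after excluding reserved ranges):

# Rough sanity check of the scan durations mentioned above (our arithmetic).
FULL_IPV4 = 2 ** 32        # ~4.29 billion addresses, upper bound
ROUTABLE = 3.7e9           # rough estimate after excluding reserved/unrouted ranges

print(FULL_IPV4 / (45 * 60))          # ~1.6M packets/sec to cover IPv4 in 45 minutes
print(ROUTABLE / 500_000 / 3600)      # ~2.1 hours at the 500k packets/sec suggested below
print(FULL_IPV4 / 25_000_000 / 60)    # ~2.9 minutes at Masscan's 25M packets/sec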


So this means that, technically, one can do Internet-wide scans with a single machine - if the hardware is good enough. Of course, this is not the only requirement - a few others are needed as well:

  • The network equipment at the data center needs to be able to cope with the packet rates
  • The hosting company or ISP needs to allow this kind of action on their network (terms of service)
  • As "port scanning" is considered to be an "attack" by a large amount of IDSs and operators, this activity will generate a lot of abuse complaints to the hoster/ISP and you. Thus one needs to notify the hoster beforehand and agree on boundaries, packet rates, bandwidth limits and their view on abuse complaints.


The network equipment and the abuse-handling process in particular are difficult to assess beforehand. We went through a couple of options before we were able to establish a good communication channel with our hosters and were thus allowed to conduct this kind of project using their resources. We saw network equipment fail at packet rates over 100k/sec, high packet loss in other setups at rates over 500k/sec, and hosters locking our accounts even after we had notified them about the process beforehand.


Settings and resource recommendations


We decided that for most scans it is actually not that important to bring the duration below a few hours. Of course, one might not get a true "snapshot" of the state of the Internet if the scan takes longer - but in our opinion the trade-offs regarding error rates, packet loss, and impact on remote networks far outweigh the benefit of higher speed.


  • Always coordinate with the hosting company / ISP - make sure they monitor their equipment health and throughput and define limits for you
  • Don't scan unrouted (special) ranges within the IPv4 address space - the Zmap project compiled a list of these (also to be found on Wikipedia)
  • Benchmark your equipment before doing full scans - we list some recommendations below, but testing is key
  • Don't exceed 500k packets per second on one machine - this rate worked on most of the dedicated servers we tested and keeps scan time around 2 hours (still needs coordination with the hoster)
  • Distribute work across multiple source IPs and multiple source hosts - this reduces abuse complaints and allows you to use lower packet rates to achieve the same scan duration (lower error-rate)
  • When virtual machines are used keep in mind I/O delays due to the virtualization layer - use lower packet rates < 100k/sec (coordinate with hoster beforehand)
  • If possible, randomize target order to reduce loads on individual networks (Zmap provides this feature in a clever way)



Best practices


If one plans to do Internet-wide scanning, perhaps the most important aspect is to employ best practices and not to interfere with the availability of other people's resources. The ZMap project put together a good list of these, and we summarize it here for the sake of completeness:


  • Coordinate not only with the hosting company but also with abuse reporters and other network maintainers
  • Review any applicable laws in your country/state regarding scanning activity - possibly coordinate with law enforcement
  • Provide possible opt-out for companies / network maintainers and exclude them from further scanning after a request
  • Explain scanning purpose and project goals clearly on a website (or similar) and refer involved people to it
  • Reduce packet rates and scan frequencies as much as possible for your research goal to reduce load and impact on networks and people



Implementation details


After covering the theoretical background and discussing goals and best practices, we want to mention a few of our implementation choices.


For port scanning, we make use of the aforementioned excellent Zmap software. The authors did a great job on the project, and the clever IP randomization based on iterating over a cyclic group reduces load and impact on networks while keeping very little state. Although Zmap provides almost everything needed, we use it only as a SYN scanner and do not implement probe modules within Zmap itself. The reachable hosts/ports are collected from Zmap and then processed on other systems using custom clients or possibly even Nmap NSE scripts, depending on the scan goal.


As an example, to download SSL/TLS certificates from HTTPS web servers, we run a 443/TCP Zmap scan and feed the output to a couple of other systems that immediately connect to those ports and download the certificates. This choice allowed us to implement simple custom code that is able to handle both SSLv2 and the latest TLS. Since slightly below 1% of the Internet has port 443 open, we have to handle around 5,000 TCP connections per second when running Zmap at 500k packets per second. These 5,000 TCP connections per second are distributed across a few systems to reduce error rates.
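Our actual collector is custom code that also speaks SSLv2; purely as a simplified illustration of the pipeline shape, a worker fed with responsive IPs from Zmap's output might look like this:

# Simplified illustration only - the real Sonar collector is custom code that
# also handles SSLv2. Reads one responsive IP per line on stdin (as produced
# by a Zmap 443/TCP scan) and prints the PEM certificate each host presents.
import concurrent.futures
import socket
import ssl
import sys

def fetch_cert(ip, port=443, timeout=10):
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # we want the certificate, not a validated session
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock) as tls:
                der = tls.getpeercert(binary_form=True)
        return ip, ssl.DER_cert_to_PEM_cert(der)
    except OSError:
        return ip, None

ips = [line.strip() for line in sys.stdin if line.strip()]
with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
    for ip, pem in pool.map(fetch_cert, ips):
        if pem:
            sys.stdout.write("# %s\n%s" % (ip, pem))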


Another example is our DNS lookup job. When looking up massive numbers of DNS records (for example, all common names found in the certificates), we use a relatively large number of virtual machines across the world to reduce the load on DNS resolvers. In the implementation we use the excellent c-ares library from the cURL project. You can find our mass DNS resolver using c-ares in Python here.
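The c-ares based resolver is linked above; purely to show the shape of the job, a stand-in bulk resolver using only the standard library could look like this (concurrency and output format are illustrative):

# Stand-in for a bulk resolver, for illustration only - the actual Sonar
# resolver uses the c-ares library. Reads one hostname per line on stdin and
# prints name/address pairs, tab-separated.
import concurrent.futures
import socket
import sys

def resolve(name):
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
        return name, sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return name, []

names = [line.strip() for line in sys.stdin if line.strip()]
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
    for name, addrs in pool.map(resolve, names):
        for addr in addrs:
            print(name + "\t" + addr)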




In our opinion, visibility into publicly available services and software is severely lacking. Research like the EFF Observatory leads to better knowledge about problems and allows us to improve the security of the Internet overall. Datasets from efforts such as the Internet Census 2012, even though obtained through illegal means, provide fertile ground for researchers to find vulnerabilities and misconfigurations. If we were able to obtain datasets like this on a regular basis through legal means and without noticeable impact on equipment, it would allow the security community to research trends, statistics, and problems of the current public Internet.


Companies can also use these kinds of datasets to gain visibility into their assets and public facing services. They can reduce risks of misconfiguration and learn about common problems with devices and software.


We intend to coordinate with other research institutions and scanning projects so that we can provide the data to everyone. We also want to establish boundaries and limits on scan rates and frequencies as part of these best practices, in order to reduce the impact of this kind of research on networks and IT staff.


By leveraging the data, together we can hopefully make the Internet a little safer in the future.


Welcome to Project Sonar!

Posted by hdmoore, Sep 26, 2013

Project Sonar is a community effort to improve security through the active analysis of public networks. This includes running scans across public internet-facing systems, organizing the results, and sharing the data with the information security community. The three components to this project are tools, datasets, and research.


Please visit the Sonar Wiki for more information.