Speech to Text on iPhone vs. Pixel (twitter.com/jamescham)
370 points by tosh on May 27, 2020 | hide | past | favorite | 129 comments


Some hackers have been trying to reuse Google's offline speech recognition models within other software toolkits, https://hackaday.io/project/164399-android-offline-speech-re...

> Especially the offline part is very appealing to me, as it should to any privacy conscious mind. Unfortunately this speech recognizer is only available to Pixel owners at this time. Since GBoard uses TensorFlow Lite, and the blog post is also mentioning the use of this library, I was wondering if I could get my hands on the model, and import it in my own projects, maybe even using LWTNN.

Recent (May 2020) news suggests that these models may be coming to Chromium, which would make them widely accessible for offline transcription and dictation, e.g. WebRTC or video captions, https://hackaday.io/project/164399-android-offline-speech-re...

> Google is building speech recognition into Chromium, to bring a feature called Live Caption to the browser. To transcribe videos playing in the browser a new API is slowly being introduced: SODA ... What is especially interesting is that it seems it will be using the same language packs and RNNT models as the Recorder and GBoard apks

In Aug 2019, the Live Transcribe engine was open-sourced, https://github.com/google/live-transcribe-speech-engine & https://opensource.googleblog.com/2019/08/bringing-live-tran...


That's not the engine though; it's just an API wrapper that enables infinite streams over the 5-minute session length limit of the Google cloud API.


https://github.com/google/live-transcribe-speech-engine/blob...

> Extensible to offline models

Hopefully a future version can work without the cloud API.
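For illustration, the session-chaining trick such a wrapper relies on can be sketched in a few lines. This is a hedged, made-up sketch (the function names, the `recognize` callback, and the chunk timing are all invented here, not taken from the repo); it only shows the idea of restarting a capped cloud session and stitching the transcripts together:

```python
# Toy sketch of chaining capped cloud sessions into one "infinite" stream.
# All names and parameters below are illustrative, not from Google's code.

SESSION_LIMIT_S = 5 * 60  # the 5-minute cap mentioned above

def chain_sessions(chunks, recognize, chunk_seconds=1.0, limit_s=SESSION_LIMIT_S):
    """Split a continuous audio-chunk stream into capped sessions
    and stitch the per-session transcripts back together."""
    transcripts = []
    session = []
    elapsed = 0.0
    for chunk in chunks:
        session.append(chunk)
        elapsed += chunk_seconds
        if elapsed >= limit_s:  # restart before hitting the API limit
            transcripts.append(recognize(session))
            session, elapsed = [], 0.0
    if session:  # flush the final partial session
        transcripts.append(recognize(session))
    return " ".join(transcripts)
```

A real implementation would also overlap sessions slightly and reconcile partial transcripts at the seam so no words are lost at the boundary.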


Yep, that will be fun. Hopefully we will soon see the same hardware accelerators in laptops as currently exist in smartphones. I bet in the future, neural models will constantly run in the background for some tasks.


Laptops urgently need to have a hardware switch for the microphone. A mobile phone should have one too - but that will never happen. At least the laptop is large enough that one little switch is not a problem (unless you are Apple and even essential peripherals are a problem).


>Apple has brought its hardware microphone disconnect security feature to its latest iPads.

The microphone disconnect security feature aims to make it far more difficult for hackers to use malware or a malicious app to eavesdrop on a device’s surroundings.

The feature was first introduced to Macs by way of Apple’s T2 security chip last year. The security chip ensured that the microphone was physically disconnected from the device when the user shuts their MacBook lid.

https://techcrunch.com/2020/04/03/apple-hardware-microphone-...


As someone who uses his laptop in clamshell mode most of the time, this was an anti-feature for me.

Having to literally lift the lid and MacOS readjusting the display every time I want to have a meeting was frustrating. It got to the point where I just bought a cheap USB microphone just to be done with it.

I support having a hardware switch for it so that I can have the choice.


For me it certainly is a feature. I want the microphone off when I close the lid.

I mean, the microphone is under the lid anyway, so you're not doing whoever you're talking to any favours by using it in clamshell mode.


It would take Apple just one change in the terms and conditions to exclude something from the software switch. First it would be only law enforcement, then diagnostic features, then selected trusted partners, then anyone with money. If you think Apple would never risk its reputation like that, just wait until their market value stops growing. That is why we need a hardware switch.


Pinephone and Librem 5 have these afaik.

Purism laptops do as well, I believe.


There are already some novel applications of this. Google's call screening service is extremely useful for unknown numbers that might be a call I'm expecting from a new number. I can't recall the last time I actually listened to a voicemail. I know that voicemail transcription works elsewhere, though I can't speak to how well, but I do know that on the Pixel, transcription is really quite impressively accurate.

I'm sure it's not much different than the test shown in the twitter post, but I've enjoyed calling myself from another phone and then screening the call and doing my best micro machines guy fast-talking impression just to see how well it really can transcribe conversation at real-time speeds.


Call screening has saved me so much time that it's one of the main things stopping me from switching to iPhone.


iPhones have had speech to text for voicemails for years. Long enough that I can't remember a time without them.


That's quite different from call screening, where it actually picks up the call for you and transcribes what they're saying live and you can choose to pick up the call or ignore it based on their reply.


Sadly only available if your carrier supports it. Here in the Netherlands it doesn’t work with one of the largest carriers (Vodafone) for example. I also don’t know if it’s still English-only.


Is voicemail popularity some US thing? I actually don't know anybody who uses it.


I don't expect voicemail from other human beings, but I do expect it from my bank, my doctor, the government, my landlord, my plumber, my child's school, the repair shop my car is in, etc. For local, in-person-visitable companies, a phone call is still the #1 way they update you on things. And, as they make most of these update calls while everyone is at work, inevitably these calls all become voice messages.


At least in Western Europe, there is near zero chance you’ll get an unexpected call from any of those. Either you’re the one calling, or updates will be sent via email / message / WhatsApp / some specialized app. Probably another efficiency consequence of human labor being insanely expensive.


I'm in the UK, and my dentist and doctor both communicate by phone, including appointment reminders. The garage I use has a website but no email, and they call if you're waiting on a part (not exactly unexpected but close enough). My car insurance, home insurance, and bank (much to my dismay) all semi-regularly phone me; I've had an unplanned call from each of the ones I've listed in the last 6-12 months.


And on the other side, in the US I still expect calls and voicemails like was suggested, but the two bike repair shops I frequent and my dentist have switched to texting, while my medical provider emails.


It's very common with people at or over roughly 40. We weren't raised on texting and social media. Usage of voicemail only goes up with the age of my friends. Vocal communication is very efficient when clarity is important.


+1 on this, plus it's very well integrated and backed by audio when in doubt.


My iPhone transcribes voicemails so slowly that most times it's faster and easier to just listen to it.


That perfectly describes most interactions with Siri.

Is it worth the hassle to try, with a reasonable chance of failure and then having to do it manually anyway, or should I just cut to the chase and do it myself?

Unless I'm setting a timer, probably just do it myself.


I enabled Type-to-Siri in accessibility, mostly for asking what song is playing.

You can still Hey Siri for voice.


Similar offline versus online privacy thing for music recognition.

My Pixel is sat here charging. Billie Eilish is singing in the background, by the time she sings "I should have known..." it has recognised the music and passively displayed "No Time To Die by Billie Eilish" on the lock screen.

Google's team were building an ML model for music recognition and they realised: the smallest model fits on a phone. We don't need to spin this up as a cloud service; we can just deliver it to the phones of people who want it as a feature. "Now your phone can tell you what that music is".

Of course down the road this awareness of context allows even cleverer agent interaction. "Which Bond was that?" you ask as Adele sings "Skyfall". "In the movie Skyfall, James Bond was played by Daniel Craig".


How in the world is that possible with 97 million+ songs in existence?


I believe their model only works for the top 50k songs in your region. My Pixel pretty rarely fails to recognise music playing around me, so that seems to be a large enough number.
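For intuition about why a small regional catalog is workable on-device, here is a toy, Shazam-style sketch: hash fingerprint values into an inverted index and vote on the best match. Everything here (names, the integer fingerprints) is invented for illustration; Google's actual recognizer is a compact neural model, not this literal scheme:

```python
# Toy on-device music matching: inverted fingerprint index + voting.
from collections import Counter

def build_index(catalog):
    """catalog: {song: [fingerprint ints]} -> inverted index {fp: {songs}}."""
    index = {}
    for song, fps in catalog.items():
        for fp in fps:
            index.setdefault(fp, set()).add(song)
    return index

def identify(index, sample_fps):
    """Vote each matching fingerprint; return the best-supported song, or None."""
    votes = Counter()
    for fp in sample_fps:
        for song in index.get(fp, ()):
            votes[song] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Even with tens of thousands of songs, an index like this stays small enough to ship and query locally, which is the rough shape of the trade-off the commenter describes.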


More likely the clever part will be you will get more ads for spy cameras and the like.


Buried in the replies is a comparison with iOS’s on device speech recognition enabled, which paints a somewhat (though not entirely) different story: https://twitter.com/BenLumenDigital/status/12657182691908321...


I don't know what the "new feature" bit is about; I've been using offline dictation on iOS for years. Anytime I need to dictate something very long, I turn the phone to airplane mode so that I can dictate without Apple inevitably deciding, at some point in the middle of my dictation, to delete my entire message and begin to retype it in slow motion (has anybody else experienced this? I've been dealing with this for at least five years, since an early version of their online dictation).

Unfortunately, airplane mode is the only way to enable off-line dictation. Or, is there another way?

The first version of dictation had no online mode at all; when Apple added that, presumably as a Slowloris attack on the entire world, it ruined the entire experience.


Yes. It is absolutely horrible. You can press-hold on the mic icon to force it to offline mode.


Oh! Good to know!


OMG thank you!


If you want to enable Apple's much-better offline dictation, you can press-hold on the microphone icon, then select "English (US)" or even any other language.

Apple's online, presumably neural network based speech transcription service is horrible. It is an absolute disaster. Slow, unresponsive, drops syllables randomly, and provides no good feedback to the user. It is hard to believe that it was designed by someone who actually intended this product to be consumed by real users. It fails even basic comparisons against the predecessor that it was intended to replace.


Apple is still behind on accuracy, whether using on-device recognition or the server connection.


Yes, though keep in mind that Twitter/Hacker News power users are often very privacy conscious. I think people are happy about offline mode regardless of how relatively fast or accurate it is.


How "offline" is the offline mode?

Will it contact servers to send statistics and samples when back online?


Since I got my Pixel 4, I've found myself using more and more speech to text as... it's just that quick and easy. Every so often, I try to use speech to text on my iPad Pro and I have an experience pretty similar to Poor Michael Geer over here.


+1. The Apple experience is frustrating at best and usually flat out wrong, especially for use cases it is supposed to be good at, like hands-free driving, where accuracy is paramount to the use case's success.


I basically can't look at my phone screen when dictating full sentences; it does something to derail me and I basically cannot finish a sentence, similar to how delayed auditory feedback (DAF) [1] trips people up. Am I alone in this?

[1] https://en.wikipedia.org/wiki/Delayed_Auditory_Feedback


I loved it when the "Good Mythical Morning" guys demonstrated the effect for entertainment.

https://www.youtube.com/watch?v=TB2rEddp-Oo


Yeah. I’m convinced the current model is just too confusing. But I really wish there were new interaction patterns that took advantage of low latency speech recognition...


Latency matters a lot, and it's one of the least-appreciated aspects of UX design.

I (accidentally) dropped my desktop Linux system into pure console mode the other day and realized how much FASTER it felt just because of the improvement in keyboard latency.


Some applications have noticeably better keyboard latency than others on my Linux system. Kitty, Emacs and Chromium beat most Qt apps and Firefox by a mile. Also, latency is much better for me in Wayland than in (composited) X (sway vs i3). Not sure why that is.


The Pixels get a lot of hate from people who haven't used them, but I think they are great phones and I'll never buy a Samsung phone again.

There are so many nice software touches on them that you just have to experience to appreciate.


> I think they are great phones and I'll never buy a Samsung phone again.

These are separate statements.

You can dislike the Pixel but still consider it better than Samsung.

Personally I've had one Samsung phone and it nearly put me off Android entirely; had I not previously had a OnePlus One, I would have considered the entire ecosystem to be pretty garbage.

Samsung is a very low bar for software/ecosystem.


I'd love to say I've had the same experience, but unfortunately that isn't the case. I'm on my fourth Pixel 3 because of hardware issues (the first time it was my fault, all the other ones not really). My previous Pixel 2 worked flawlessly for two years. I'm starting to think of going back to Samsung after this experience.

Which is a shame, because I do love how unencumbered with bullshit features and spamware the phones are.


This is not specific to a phone model. It works reasonably well on the venerable Nexus line of phones too, as on any Android device that uses Gboard. I still won't actually use it for anything, but it works within a limited vocabulary. It's still not what I would call generally useful, although I'm sure it is very helpful for those with disabilities.


Can I try this on an older Android tablet? I thought it was a TF-Lite thing under the hood.


I have an extension in the Chrome store [0] that brings dictation into GMail. I piggyback on the Web Speech API, which in the case of Chrome uses Google's servers.

Considering the audio stream upload, processing, and network jitter/lag, the speed at which text results come back is simply incredible. I don't remember the exact timing, but it was a roundtrip of roughly 40-100 ms, which is... crazy/magic.

I made a small experiment with this same Chrome speech-to-text engine which triggered a Google image search and showed near-real-time image results for the spoken words. The slowest part of that Rube Goldberg machine was the Google Images search + loading the images. [1]

I also.... "secretly" believe that very fast speech recognition is one of "the" secrets to building a smarter / better digital voice assistant. It's one of the key components (with a ton of groundbreaking NLP and/or the right regexes) that might allow closing the "strange feeling" you get when talking to the robot... voice... and... it... answers....... not ... exactly when...... you were expecting it.... to.

I'm super mega busy these days but also super mega interested in this. Reach out? :-)

[0] https://chrome.google.com/webstore/detail/dictation-for-gmai...

[1] https://www.instagram.com/p/BwqFQgWFsYu/
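That "answers... not exactly when you were expecting it to" feeling is largely an endpointing problem: deciding quickly that the speaker has finished. As a purely illustrative toy (the threshold, frame size, and names below are all invented, not any vendor's algorithm), a minimal energy-based endpointer looks like this:

```python
# Toy energy-based endpointing: the utterance is judged finished after
# `silence_s` seconds of low-energy frames following some speech.

def endpoint(frame_energies, threshold=0.1, frame_s=0.02, silence_s=0.5):
    """Return the index of the frame where the utterance ends, or None."""
    needed = round(silence_s / frame_s)  # quiet frames required to stop
    quiet = 0
    spoke = False
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            spoke, quiet = True, 0       # speech resets the silence counter
        else:
            quiet += 1
            if spoke and quiet >= needed:
                return i
    return None
```

The shorter you can safely make `silence_s` (real systems use learned endpointers rather than a fixed threshold), the less dead air there is before the assistant responds, which is exactly the latency the parent comment is chasing.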


Latency is super important.... High latency is one of the largest problems with modern software.

But what about transcription accuracy? I mainly use Android but also use an iPhone.... I have found transcription accuracy is so much higher on my iPhone than on Android.

I pulled out my old BlackBerry (Android) when I sent my Android phone in for repair recently. The voice transcription via BlackBerry's keyboard is hands down better than either stock Google or iPhone. It surprisingly feels like a regression going back to the new phone (other than speed and battery life).


Did you watch the video? The iPhone went back and retroactively inserted errors into its transcribed text.


This could also be a different quality of the microphone.


As an iPhone user, I very often find speech recognition far too slow. It is to the point that I rarely use the feature because it often causes frustration. The Google performance here is amazing.

Is this a feature only of Pixel phones?


The more recent Pixel devices have a local accelerator for different machine learning tasks, they brand them Visual Core and more recently Neural Core. https://www.androidauthority.com/google-pixel-4-neural-core-... has more details and a lot of the speed up is probably tied to these being onboard.


I just tested on my old OnePlus 5t, I find it's slower than the Pixel in this video, but still much faster than the iPhone.

It's not unique to the Pixel, it's just that it's a faster phone than my 5t.

But even though it's just a little slower than the Pixel, it makes a big difference. The closer it is to real-time, the better it seems to get.


Some iOS Keyboards (SwiftKey) and apps like Draft support speech to text via Microsoft, which seems to work very well.


A printed dictionary captures extended history of human language. An audio recognition model based on global mass surveillance can snapshot similar history. After a good model exists and can be run entirely offline, the (sunk) privacy costs are dwarfed by new value that can be created by speech-to-text enablement of human expression.

This is what Google has done and it was a monumental human achievement, deserving a place in history alongside Gutenberg, due to the small size of the model that could operate fully offline on power and space constrained mobile phones.

Compare this approach (partial mass surveillance generating privacy-preserving offline models) with the approaches of competitors like Alexa & Siri: both use mass surveillance for model training, but neither of them make their models available for offline and privacy-preserving public use.


Wow, speech to text does not work anywhere near that well when I speak. Australian accent perhaps?


I was wondering about that. And also about other languages, because I see Thai speakers using speech-to-text on their Android phones and it seems to work really well in terms of speed and accuracy.


Somewhat related: the Google Translate app uses older voices for some languages, Google Assistant uses the newer, better ones. Not sure why, but if you're using that feature, you should use Google Assistant ("be my translator, please")


The live captions (speech-to-text) on Google Meet are also extremely useful; this is one of those features Google is just years ahead in.


Would be nice to test something opensource alongside. Like https://github.com/alphacep/vosk-api which runs on Android and iPhone offline.


QUESTION

How hard, on a scale of 1 to impossible, would it be for Apple or someone else to just grab the model Google is using from the Pixel, reverse engineer it, and steal all their years of research?


My understanding is that Google’s big advantage is that they’ve collected so much good, annotated voice data.


It would probably be easy for any on-device model, but it would be illegal.


What part of it is illegal? What if they reverse engineered the model, and then understood the fundamentals of how it worked, and implemented and trained the same architecture with different data?

Or trained their own architecture with data sampled from the Google model?

Is it "stealing" the data, the architecture, the parameters, or the act of reverse engineering and productizing the knowledge?


That's what patents are for. The tricky part is figuring out what aspects were obvious or not to an industry professional.

Models are also usually black boxes, and the techniques used are published.


In case anyone didn't notice, this is comparing Gboard (Google's virtual keyboard) on iOS versus Android. It's NOT comparing Apple's voice recognition tech with Google's.


That's actually (sneakily) not the case. Gboard does have voice transcription, but it's triggered by pressing the microphone button at the upper right corner of the keyboard. Apple's voice transcription is still triggered from the bottom right button, even if a 3rd party keyboard is being used.


Thanks. I stand corrected.


It is confusing, but it is the system (iOS) speech-to-text being used here. You have to tap the Google logo to get Google's speech-to-text. https://www.cultofmac.com/469485/google-gboard-improve-ios-d...


So this is Google's product on both devices? I thought this was a demonstration of Apple's built-in speech-to-text vs Android's built-in speech-to-text.

If it's the former, this is less interesting, except for the observation that speed of conversion really is valuable.


You missed mwest217's reply above explaining that it is indeed Apple's speech-to-text engine being used here.


Thanks, I don’t think that comment had been posted when I asked my question.


For me it seems like Google voice typing is getting worse and worse. It capitalizes random Words in a sentence Like this. It specifically ignores the word "o'clock".


Google pixel 3a, reading your comment: "for me it seems like Google voice typing is getting worse and worse. It capitalizes random words in a sentence like this. It's specifically ignores the word o clock."


See! I don't know who at Google has this personal vendetta but it has to be on purpose. You say "We start at three o'clock" and it just types "We start at 3" when what you want it to type is "We start at 3:00". You can sit there saying "one o'clock, two o'clock, three o'clock, four o'clock" and it just types out "1234". It knows the word I'm using but ignores it on purpose.

It's a real issue for me as "4:00" has three significant digits, it's specific, whereas "4" might mean anything between 3:50 and 4:10.
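The behaviour the commenter wants could, in principle, be a simple post-processing pass over the transcript. A toy sketch (this is not how Gboard works internally; the word list and names are invented for illustration):

```python
# Toy transcript post-processing: map spoken "<number> o'clock" to "N:00".
import re

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
         "twelve": 12}

def normalize_oclock(text):
    """Rewrite 'three o'clock' as '3:00' instead of dropping the word."""
    pattern = r"\b(" + "|".join(WORDS) + r")\s+o'?clock\b"
    return re.sub(pattern,
                  lambda m: f"{WORDS[m.group(1).lower()]}:00",
                  text, flags=re.IGNORECASE)
```

So `normalize_oclock("We start at three o'clock")` yields `"We start at 3:00"`, which is the specific three-significant-digit form the commenter is asking for.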


It's a pain, but can you say "four colon zero zero?"


Nope! Google ignores that too and you just get "400".


Google Pixel 2 XL: "for me it seems like Google voice typing is getting worse and worse it capitalizes random words in a sentence like this it specifically ignores the word okay"


I'd like to see these side-by-side with someone typing with thumbs or typing with a full keyboard and see who really wins.


It's amusing (from the comments) how much people are biased towards Apple.

1. Apple does it online, Google does it offline, hence Apple being slower is okay. But why is Apple's transcription less accurate, then?

2. Google violates privacy because it used 411 data to train its models, hence its speech transcription quality is better. But Apple is the one doing it online; are we sure it's not using the voice data too, similar to how it was doing with Siri?

It will sound like I am trying to defend/promote Google here, but that's very far from my intent.

As an iPhone user since the first iPhone, I just want iPhones to be better. I use iPhones for the same privacy concerns as many of you, but let's not give a pass to Apple. Let's demand higher quality from them so that they spend the time, money and effort to improve. I don't want to feel like I am compromising on a worse overall experience in favor of better privacy when I buy my next phone.


Why do you say "I use iPhones for the same privacy concerns as many of you"? In the past I have heard rumors that this is not true; with a quick search I found a recent article on the rumors I had heard: https://www.forbes.com/sites/gordonkelly/2020/05/21/apple-ip...

PS: Either way, I agree with most of your comment. :)


Because the alternative is Android, which is very difficult to de-Google. Even if you do de-Google you need to open yourself up to a swath of security concerns. Why would I trust some internet account that is giving me a binary that compromises my phone for me? Should I trust that it will only do what they say it will do because they are a good person? Would they have nothing to gain for having complete control of my device?

So, if the choice of secure phone is between stock Android and stock iPhone, then I think iPhone is the lesser of two privacy evils.


A degoogled Android phone is also for all intents and purposes useless to many people - my work 2fa, work profile, banking apps, even some games all require Google services.


Same with a degoogled iOS device. In that respect, they are no different.


GrapheneOS is a de-googled Android and secure.


What’s the % of users who use that? Until you can buy a de-googled Android phone straight from the store it’s kind of irrelevant to the comparison for the majority of non technical users.


I would assume ROMs like LineageOS would be viable as well.


iOS having better privacy is a myth. iOS sends your location to Apple every time the GPS is used, and you can't turn it off. You cannot install apps on your phone without telling Apple which apps they are, and if you want to develop your own apps for your device without having to reinstall weekly, you have to hand over payment information.

Stock Android devices from nearly any vendor do not suffer from these problems, and reputable vendors do even better (like in this case, where even voice transcription does not send data off device).


Those are different things than what most people talking about iOS privacy mean by “privacy.”

The thing people usually mean by privacy is “security of personal data and metadata”—i.e. being able to use your phone to break the (perhaps unjust) law, without a state actor being able to then prove you broke the law by forensically analyzing your phone.

Phones already leak a lot of circumstantial forensic evidence just by being phones. They talk to cell towers, for instance. So there’s a certain level of information leakage you’re accepting by doing something private on a phone in the first place.

The point of choosing one phone over another, for its privacy, should be to secure the phone in all the other ways—to prevent any information from leaking that can be prevented from leaking while retaining the functionality of the phone.

In that regard, iOS is usually considered the winner.

(Also, iOS is frequently considered the winner just by the fact that Apple devices can't be interfered with by OEMs at the behest of state actors; that is, the OEM is always Apple, and so the only applicable state actor is the US. If I'm e.g. a Canadian diplomat in China and my phone breaks, I'm not going to trust a Chinese-OEM Android phone, but I might be able to trust a phone I send a plainclothes gofer to buy me from a Chinese Apple Store.)


> In that regard, iOS is usually considered the winner.

That, too, is based in myth. An order of magnitude more iPhones have had malware than the Android phones available for purchase in Europe and the US, and Android vulnerabilities are more expensive than iOS vulnerabilities, so if your standard of privacy is protection from state actors, you should prefer Android devices.

> If I’m e.g. a Canadian diplomat in China and my phone breaks, I’m not going to trust a Chinese-OEM Android phone, but I might be able to trust a phone I send a plainclothes gofer to buy me from a Chinese Apple Store.)

If you're in China, you're in trouble because the CCP has access to all your iCloud data and any iMessage messages you send while there. Nobody is suggesting that you buy a Chinese OEM phone with who knows what modifications. Just get a Blackberry, Nokia, Google, or Android One device, and you'll be in a much better privacy situation than if you got an iPhone.


> More iPhones have had malware than Android phones available for purchase in Europe and the US by an order of magnitude

I'm assuming here that you're taking "standard precautions":

1. You get new phones from a trustworthy source, e.g. the official storefront of the relevant company.

2. You buy in person, so that the phone can't be intercepted in transit because some watchlist redirected it based on who you are.

3. You get someone likely to not be on any such watchlist to buy your phone for you (i.e. you hire some kid off the street to go into the store for you, and hand them a wad of cash to pay with.) This is to ensure that, for as long as possible, the phone's MAC address doesn't end up automatically associated with your activities. (It will eventually; but that's why you burn your phones pretty often.)

Under such rules, you won't get any "bonus with purchase" rootkits on the device. The device will only have a rootkit if all such devices in the current market have rootkits.

> Just get a Blackberry, Nokia, Google, or Android One device

You actually cannot buy Android phones in China—even from these brands—which haven't been passed through the hands of a Chinese distributor at the "OEM customization, root-of-trust-not-yet-signed" stage. (Heck, you can't even buy a Nintendo Switch in China without it going through the hands of Tencent.) Every one of these phones has a "Chinese edition" with different firmware, and that edition is the only one available for sale in China.

(How does it work for Android One phones? IIRC, the phones with these editions hit a different Android One firmware-update server, one run by the Chinese government. They still get "stock" Android firmware... in the sense that the only changes are a potential rootkit.)

The reason iPhones are trustable in that situation is that Apple has constrained their infrastructure such that they only have one firmware. That means that any China-specific customizations have to be built into that single global firmware, and activated by software (i.e. by choosing your "Region" in the phone setup.) And that means that your own diplomatic home office can inspect all such customizations using the full resources of your own state counterintelligence apparatus, and then give you the go-ahead (or not) for using the phone with such customizations.

The Chinese Android phone firmware is only distributed within China, so it's much harder to be sure you capture every version of it for security-analysis. And, even if you do, it may not tell you much about what they can do to specific people, as it may just contain "generalized backdoors" (e.g. cellular-carrier-triggered automatic firmware-update push) such that any code that actually spies on people is only pushed to the devices of People of Interest (likely with additional logic to delete itself if the phone leaves the country), such that it's nearly impossible to exfiltrate the device from China back to your own counterintelligence.

-----

But the larger consideration, if we're talking about e.g. populist recruitment into civic-action groups, is that none of those Western Android phone brands are popular in China, compared to Chinese-owned brands. Both because the state controls the advertising (so the Chinese brands get product placement on Chinese TV, and the Western brands do not); and because the Chinese phones are just plain cheaper for the same level of features (which mostly is down to vertical integration and simplified logistics with component manufacturers.) So, sure, you can try to buy one; but it'll be hard to find anyone carrying one. And you can't just rely on a random stranger to have one. Heck, just buying a phone from a Western phone-brand probably puts you on a watch-list.

...other than Apple, because Apple is unavoidably still seen as a fashionable brand in China, despite the Chinese government's best efforts to quash this sentiment.


> I'm assuming here that you're taking "standard precautions":

Under the standard of "standard precautions," iOS has had multiple orders of magnitude more malware infections than Android, instead of just one or two, due to XcodeGhost.

Also, my statement was about people outside of China, as you can see by the end of the quoted sentence, so many of your reasonable precautions don't apply.

> You actually cannot buy Android phones in China [snip]

You misread my point. My point was that if you're in China, you're already screwed, no matter which device you legally purchase in China. For the rest of us outside of China (including you in Canada and me in the US), Android devices are clearly superior for privacy as I have shown earlier.

> IIRC, the phones with these editions hit a different Android One firmware-update server, one run by the Chinese government.

Android One phones are not legally sold in China because the update server is run by Google, which has no servers in China.


Apple does not do the transcription online, at least not on any modern iPhone. Turn on airplane mode and disconnect wifi and give it a try yourself.


It requires iOS 13. Here's the code that does it:

    let req = SFSpeechAudioBufferRecognitionRequest()
    if #available(iOS 13, *) {
        req.requiresOnDeviceRecognition = true
    }
And the docs link: https://developer.apple.com/documentation/speech/sfspeechrec...


I think it does both. Online has much better accuracy in my experience.


Wow, does anyone know why it's slow on the iPhone?


iPhone is doing this online, sending data packets back to Apple.

Pixel is doing this completely offline.


I'll add another non-technical reason: ML is a Google core-competency, and has been for some time now[1], but it is/was not central to Apple.

1. Google 411 was introduced 13 years ago, in 2007, and look at the ML papers published and techniques Google pioneered since then.


iPhone is doing it on device as well.


Fairly certain that isn't true. On my wife's iPhone, she was never able to use voice to text while in airplane mode. Maybe that's changed with the latest 11 but it definitely didn't work on the X.


Go to Settings -> General -> Keyboards. If your language and phone support it, it should say "You can use Dictation for {language} when you are not connected to the internet." below Dictation Languages. It works since the 6s.


Is it a side-effect of Google expertise in analyzing people’s private data in order to transform it into something useful to sell ads?


The iPhone keyboard seems not to be the original one.

Is this really calling Apple’s transcription service? And does it maybe even add some latency?


Granted, neither Google nor Apple really works well with my voice for some reason...


Must be a Pixel 4. My Pixel 3 result is arguably worse than the iOS one. It's too painful to ever use.


Twitter video is a mess. I can't read anything in this video.


I will never support google in anyway, at least in tension fully, but Apple voice to text is absolutely a palling. I just dictated this message here is my iPhone, so I will leave it as is without fixing the fuck ups.


And Apple could literally do this faster than Google due to much better onboard hardware. They're just years behind at AI at this point. It's getting embarrassing. Some _individual researchers_ can do a better job than their entire speech-to-text team is doing here.

Although to me, neither example is useful without:

1. Automatic punctuation

2. Robust recognition in noisy environments

And neither system is capable of that yet, although Google's system is better at #2.


I'm curious about the privacy policies behind both engines and their development.

Google's gotten an advantage in many fields by simply trampling over users' rights. Apple has been more respectful of user rights, and that's kind of made life harder for them too.


Here's my controversial post of the day:

I'm okay with Google trampling on my rights, in exchange for the things it gives. Personalized search results, better ad hoc translation, all of these things work well only because they have my data. And for me, I like these things enough that I'm perfectly okay with the trade.


In this case however the transcription is done fully offline on device - that is why it is so fast. Yes Google may have trampled on your voicemail data and 411 calls to create a model that works so amazingly well across different accents and languages - but it is forgivable in this case given how good and useful it is!


> I'm okay with Google trampling on my rights, in exchange for the things it gives.

That's fine; you're a consenting adult renouncing your right to privacy, and that's your prerogative.

The problem is, Google is trampling on the rights of _all_ users by default, without any consent (consent, by definition, must be an INFORMED choice).


> And for me

The controversy is that you don't get to choose. It is extremely hard to avoid giving your data to Google if you don't want to make that trade. You can't use an Android phone, you can't use ubiquitous services like Google Search, Youtube, GMail, or Google Maps, you can't use any website without being sure to block Google Analytics, Adsense, reCaptcha, and whatever else they own.

It's possible, but it's very hard, and requires constant vigilance. It is not a free choice in any meaningful sense.


New Pixels (I think 3, but it may only be 4) have an inference chip of some form. All of this is done on-device. They ported this to older devices, but it's not as fast (but still offline).

The Assistant is one place (probably the only place) where Google has a better privacy story than most (including Apple). Notwithstanding the phone's offline capabilities, you can tell the home devices (which use online transcription) "that was not for you" and they will delete what they heard.


>All of this is done on-device

Having an on-device inference chip doesn't preclude them from sending back "telemetry" to improve their models. Unless they explicitly say that they don't send back data (or there's a toggle for that), my default assumption is that they do.


This is definitely true; the key thing here is that they don't have to (and the Assistant does work without service). In addition, there's no guarantee that "that was not for you" will actually delete details about the last command, but Apple doesn't even attempt this. You have no idea if something Siri shouldn't have heard will end up in a training dataset (disclaimer: I'm saying this in ignorance of the Siri privacy policy).


If I recall, Apple’s speech recognition relies on their servers while Pixel is on device.


Not necessarily user data. Google owns reCAPTCHA and YouTube. Think about that.


Well, the iPhone is doing this online, while the Pixel is doing it offline, which explains the latency difference. (English only; some languages still require a connection on the Pixel.)

I think dictation on iPhone is good enough, at least 10 times better than the pre-machine-learning era. But it is also not as good as Google's. Partly because Google has more data and started research way earlier, and partly because this isn't a fundamental strength of Apple.

I am pretty sure Apple is working on offline dictation. If Google can do it on a Snapdragon with a much smaller transistor budget, there is no reason why Apple can't do it with more transistors.


That just makes it even worse!

Look how much better the Pixel is doing on accuracy too. It's almost flawless, while the iPhone, despite the advantage of online processing, is getting maybe 10% wrong and writing complete nonsense words in there. Who the hell is "Poor Michael Geer"?

(for clarity, I'm an iPhone user and won't be switching because privacy, but this makes me jealous)


I keep trying to dictate text messages on my iPhone while driving, and it keeps driving me nuts. Not usable.


As pointed out on another comment, this is using Google's 3rd party keyboard, so in effect it's google vs. google.


No, it's still using Apple's native speech-to-text feature.


iOS can do offline speech recognition (for English at least). It is even available to apps as an API. The quality is not great.
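For reference, a minimal sketch of what using that API looks like (assumes an iOS 13+ deployment target and that speech-recognition permission has already been requested; `transcribeOffline` is just a name I made up):

    import Speech

    // Transcribe an audio file entirely on-device, never touching Apple's servers.
    func transcribeOffline(fileURL: URL) {
        guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
              recognizer.supportsOnDeviceRecognition else {
            print("On-device recognition unavailable for this locale")
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        request.requiresOnDeviceRecognition = true  // force offline; fails rather than falling back to the network
        recognizer.recognitionTask(with: request) { result, _ in
            if let result = result, result.isFinal {
                print(result.bestTranscription.formattedString)
            }
        }
    }

Note `supportsOnDeviceRecognition` varies by locale, which matches the observation that offline support is English-first.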


iPhone is doing it on device ("Offline" as you say) as well



